USB Protocol Information

From OpenKinect
Revision as of 09:23, 8 March 2011 by Zarvox

General Overview

This page should be superseded by the information in the camera section of Protocol Documentation, which is substantially more up to date, relevant, and informative.

Depth Camera

RGB data should follow similar conventions, but will need to be analyzed. The depth camera returns values with 11 bits of precision. Information is still needed on whether the returned data is fixed, or whether some sort of processing algorithm operates on it.

Relevant Bits of Code

Let's take a look at what is actually interesting in the code. There is a lot of Beagle interface library and Qt code to sift through.

The most important piece of data gathered from the libkinect project is the following struct:

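 #include <stdint.h>
 struct __attribute__ ((packed)) frame_info {
     uint8_t  magic[2];
     uint8_t  resv_0;     // ??
     uint8_t  cmd;        // 0x71 - new frame, 0x72 - current frame, 0x75 - EOF
     uint8_t  unk_42;
     uint8_t  pkt_seq;
     uint8_t  unk_06;     // 04 on EOF
     uint8_t  unk_e0;     // 78 on EOF
     uint32_t time_code;  // all 0 on new frame
     uint8_t  data[];     // payload following the 12-byte header
 };
 // Private members of the QThread subclass that processes the data
 // (not part of the packet itself):
 int     _frame_pos;
 uint8_t _frame[422400];  // 640*480 pixels * 11 bits / 8 = 422400 bytes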

This is a 12-byte header followed by a variable-length data array, and it forms the basis of the Kinect's depth camera protocol. As you can tell, there is definitely some magic inside this structure: several fields are still unidentified. Some of them are most likely control/status commands and will be identified later on. _frame_pos and _frame are private member variables of the QThread class that processes the data.
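A minimal sketch in C, assuming the packet layout shown in the struct above: decoding the header field by field avoids depending on the GCC-specific packed attribute and makes the payload-length arithmetic explicit. The names parse_frame_hdr and struct frame_hdr are hypothetical, not libkinect code, and reading time_code as little-endian is an assumption (it matches what casting to the packed struct yields on a little-endian x86 host).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_INFO_HDR_LEN 12 /* 2+1+1+1+1+1+1+4 bytes, as in frame_info */

/* Same fields as frame_info, minus the flexible data[] member. */
struct frame_hdr {
    uint8_t  magic[2];
    uint8_t  resv_0;
    uint8_t  cmd;       /* 0x71 new frame, 0x72 current frame, 0x75 EOF */
    uint8_t  unk_42;
    uint8_t  pkt_seq;
    uint8_t  unk_06;
    uint8_t  unk_e0;
    uint32_t time_code;
};

/* Decode the 12-byte header from raw packet bytes.
 * ASSUMPTION: time_code is stored little-endian.
 * Returns the payload length, or -1 if the packet is shorter than a header. */
static long parse_frame_hdr(const uint8_t *pkt, size_t len, struct frame_hdr *h)
{
    assert(pkt != NULL && h != NULL);
    if (len < FRAME_INFO_HDR_LEN)
        return -1;
    memcpy(h->magic, pkt, 2);
    h->resv_0  = pkt[2];
    h->cmd     = pkt[3];
    h->unk_42  = pkt[4];
    h->pkt_seq = pkt[5];
    h->unk_06  = pkt[6];
    h->unk_e0  = pkt[7];
    h->time_code = (uint32_t)pkt[8]
                 | (uint32_t)pkt[9]  << 8
                 | (uint32_t)pkt[10] << 16
                 | (uint32_t)pkt[11] << 24;
    return (long)(len - FRAME_INFO_HDR_LEN); /* bytes left for data[] */
}
```

The returned length is exactly the "total packet length minus header" quantity used when copying data[] into the frame buffer.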

All the magic happens at line 100 of source.cpp, where the process_usb_data function is called with a pointer to the sniffed traffic. To process an image, we do the following:

  1. If cmd == 0x71, a new frame is starting: reset _frame_pos to 0.
  2. Get the length of the data array. This is the total length of the packet minus the 12-byte header described by the struct above.
  3. memcpy the data array into _frame, using _frame_pos as the base index, then advance _frame_pos by that length.
  4. If cmd == 0x75, we have reached the end of the current frame; time to send it to an output of some sort.
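The four steps above can be sketched as a small C state machine. This is an illustration under assumptions, not libkinect's process_usb_data: struct frame_assembler and feed_packet are hypothetical names standing in for the QThread's _frame_pos/_frame members, cmd is read from byte offset 3 as laid out in the struct, and the overflow check is an addition the described algorithm does not mention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN    12
#define FRAME_SIZE 422400 /* 640*480 pixels * 11 bits / 8 */

enum { CMD_NEW_FRAME = 0x71, CMD_CUR_FRAME = 0x72, CMD_EOF = 0x75 };

/* Hypothetical stand-in for the QThread's _frame_pos and _frame members. */
struct frame_assembler {
    size_t  frame_pos;
    uint8_t frame[FRAME_SIZE];
};

/* Feed one packet (12-byte header + payload).
 * Returns 1 when an EOF packet completes a frame, 0 otherwise,
 * and -1 for a packet that is too short or would overflow the frame. */
static int feed_packet(struct frame_assembler *a, const uint8_t *pkt, size_t len)
{
    assert(a != NULL && pkt != NULL);
    if (len < HDR_LEN)
        return -1;
    uint8_t cmd = pkt[3];                     /* offset of cmd in frame_info */
    if (cmd == CMD_NEW_FRAME)                 /* step 1: start a new frame   */
        a->frame_pos = 0;
    size_t data_len = len - HDR_LEN;          /* step 2: payload length      */
    if (data_len > FRAME_SIZE - a->frame_pos) /* guard, not in the original  */
        return -1;
    memcpy(a->frame + a->frame_pos, pkt + HDR_LEN, data_len); /* step 3 */
    a->frame_pos += data_len;
    return cmd == CMD_EOF ? 1 : 0;            /* step 4: frame complete      */
}
```

When feed_packet returns 1, the first frame_pos bytes of frame hold the reassembled frame, ready to hand to the display code. Note that the EOF packet's own payload is copied before the frame is reported complete, matching the order of steps 3 and 4.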

A possible improvement to this algorithm would be to monitor pkt_seq and time_code to confirm that data packets arrive in sequence. Obviously, if the data is corrupt you have bigger problems, but it might be nice. We still need to figure out the magic.

Once we have the data, we generate proper RGB pixels for basic display in the QWidget:

  1. Reverse the bit order of the data (0b10000000 becomes 0b00000001). This is bit-level rather than byte-level endianness.
  2. Shove it into an RGB bitstream. Since we're only dealing with monochrome data, each value is just duplicated three times for convenience.
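The two display steps can be sketched in C as follows. The reverse[] table that libkinect uses is not shown on this page; build_reverse_table is a hypothetical reconstruction of a table with the property described in step 1 (0x80 maps to 0x01), and gray_to_rgb is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a 256-entry table mapping each byte to its bit-reversed value,
 * e.g. 0x80 (0b10000000) -> 0x01 (0b00000001). */
static void build_reverse_table(uint8_t table[256])
{
    for (int v = 0; v < 256; v++) {
        uint8_t r = 0;
        for (int b = 0; b < 8; b++)
            if (v & (1 << b))
                r |= (uint8_t)(1u << (7 - b)); /* bit b moves to bit 7-b */
        table[v] = r;
    }
}

/* Monochrome -> RGB: write the same 8-bit value into R, G and B. */
static void gray_to_rgb(const uint8_t *gray, size_t n, uint8_t *rgb)
{
    assert(gray != NULL && rgb != NULL);
    for (size_t i = 0; i < n; i++) {
        rgb[3*i + 0] = gray[i];
        rgb[3*i + 1] = gray[i];
        rgb[3*i + 2] = gray[i];
    }
}
```

Building the table once and indexing it per byte is the usual trade of 256 bytes of memory for avoiding an eight-iteration loop on every pixel.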

As a note, there's some bit manipulation going on to turn the data into pixel values. Each pixel is 11 bits, which gives 2048 possible values (0 to 2047). Kind of nifty. You can see the bit-shifting inside the libkinect source, documented here for posterity:

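 #include <stdint.h>
 #include <string.h> // memcpy
 // Assumed context: data points to the packed 11-bit frame
 // (640*480*11/8 = 422400 bytes), reverse is a 256-entry uint8_t
 // bit-reversal table, and rgb is a uint8_t buffer of 640*480*3 bytes.
 for (int p = 0; p < 640*480; p++) {
   uint32_t pixel = 0; // value of pixel
   // Load 4 bytes starting at the byte holding this pixel's first bit.
   // memcpy avoids an unaligned uint32_t dereference; the shift below
   // assumes a little-endian host, as the original pointer cast did.
   // For the last pixel this reads 2 bytes past the 422400-byte frame.
   memcpy(&pixel, data + (p*11/8), sizeof(pixel));
   pixel >>= (p*11 % 8); // move this pixel's first bit to bit 0
   pixel  &= 0x7ff;      // keep 11 bits
   uint8_t pix_low  = (pixel & 0x00ff) >> 0;
   uint8_t pix_high = (pixel & 0xff00) >> 8;
   pix_low  = reverse[pix_low];
   pix_high = reverse[pix_high];
   pixel = (pix_low << 8) | (pix_high); // bit-reversed 16-bit value
   pixel >>= 5;                         // reversed 11 bits back at bit 0
   // Image drops the 3 MSbs: rgb holds uint8_t, so only the low 8 bits survive
   rgb[3*p+0] = 255-pixel;
   rgb[3*p+1] = 255-pixel;
   rgb[3*p+2] = 255-pixel;
 }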

Looks fairly routine, but I'll be the first to admit bit-wise operations make my head hurt, doubly so for efficient bit-wise operations.
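For anyone whose head also hurts: on a little-endian host, the uint32_t load, shift and mask in the loop above just pull bits [11p, 11p+11) out of a bit stream in which bit k is bit k % 8 of byte k / 8. The sketch below spells that out one bit at a time, next to a faster version that assembles the bytes explicitly so it neither depends on host byte order nor reads past the end of the buffer. All three function names are hypothetical; pack11 exists only to build test data. It covers the 11-bit extraction only, not the later reverse[] and >> 5 step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Slow, explicit version: collect pixel p's 11 bits one at a time. */
static uint16_t unpack11_slow(const uint8_t *data, size_t p)
{
    assert(data != NULL);
    uint16_t value = 0;
    for (int b = 0; b < 11; b++) {
        size_t k = p * 11 + b;                        /* absolute bit index */
        unsigned bit = (data[k / 8] >> (k % 8)) & 1u; /* LSB-first in byte  */
        value |= (uint16_t)(bit << b);
    }
    return value;
}

/* Faster version: build a little-endian word from at most 4 bytes,
 * never reading beyond len, then shift and mask as the loop above does. */
static uint16_t unpack11_fast(const uint8_t *data, size_t len, size_t p)
{
    assert(data != NULL);
    size_t byte = p * 11 / 8;
    uint32_t word = 0;
    for (size_t i = 0; i < 4 && byte + i < len; i++)
        word |= (uint32_t)data[byte + i] << (8 * i);
    return (uint16_t)((word >> (p * 11 % 8)) & 0x7ff);
}

/* Hypothetical test helper: pack n 11-bit values LSB-first into out,
 * which must hold (n*11 + 7) / 8 bytes. */
static void pack11(const uint16_t *vals, size_t n, uint8_t *out)
{
    memset(out, 0, (n * 11 + 7) / 8);
    for (size_t p = 0; p < n; p++)
        for (int b = 0; b < 11; b++) {
            size_t k = p * 11 + b;
            out[k / 8] |= (uint8_t)(((vals[p] >> b) & 1u) << (k % 8));
        }
}
```

Both versions return the same values; the slow one is only meant for reading and for checking the fast one.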

RGB Camera

Stub for future update!