Protocol Documentation

From OpenKinect
Revision as of 05:21, 18 November 2010 by JoshB (talk | contribs) (Motor Initialization: Removed old faulty info)

OpenKinect Collaboration

Please check back often; additional insight and knowledge will be documented here as it is gained.


USB Communication

USB communication is currently under development, with most communication taking place using pyusb and libusb.


  • 04 - NUI Motor
  • 07 E1 - NUI Camera - RGB
  • 07 E2 - NUI Camera - Depth
  • 06 - NUI Audio

Control Packet Structure

All motor/LED commands are sent via control transfers. More information on the basic structure of these packets at

 Control Transfer (8-bytes) Request: 
 Type (1 byte)
 Request (1 byte)
 Value (2 bytes)
 Index (2 bytes)
 Length (2 bytes)

For read packets (Type & 0x80 ?) length is the length of the response.
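
As a rough illustration of the layout above, the setup packet can be written as a C struct. The struct and function names here are hypothetical (they do not come from any Kinect source), and the read check simply tests bit 7 of the type byte as suggested by "Type & 0x80".

```c
#include <stdbool.h>
#include <stdint.h>

/* The 8-byte setup packet of a control transfer, field by field.
 * Names are illustrative only. */
struct control_setup {
    uint8_t  type;    /* request type; bit 7 set means device-to-host (read) */
    uint8_t  request; /* request code, e.g. 0x31 or 0x06 */
    uint16_t value;
    uint16_t index;
    uint16_t length;  /* for reads: length of the expected response */
};

/* True when the packet is a read, i.e. (Type & 0x80) is non-zero. */
static bool control_is_read(const struct control_setup *s)
{
    return (s->type & 0x80) != 0;
}
```

With this check, a packet with type 0xC0 counts as a read and one with type 0x40 counts as a write.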

Motor Initialization

Verify readiness? Send: 0xC0, 0x10, 0x0000, 0x0000, 0x0001 Response: 0x22

The joint range of the Kinect motor is +31 degrees (up) and -31 degrees (down).

Setting Joint Position

The joint position can be set by sending:

 type     request  value                    index   data   length
 0x40     0x31     2*desired_angle_degrees  0x0     empty  0

Warning: Sending the motor past 31 degrees has been shown to cause the motor to stall. There are no built-in safeguards to prevent this kind of command from being sent. Because of the motor's low torque and power consumption, damage is unlikely, but this should be avoided to ensure maximum component life.
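
Since the device does not guard against out-of-range angles, the caller can clamp before sending. The sketch below computes only the Value field (2 * degrees) for request 0x31. The helper name is hypothetical, and encoding negative angles as a 16-bit two's complement number is an assumption that this page does not confirm.

```c
#include <stdint.h>

#define KINECT_TILT_LIMIT_DEG 31 /* joint range stated above */

/* Hypothetical helper: clamp the requested angle to +/-31 degrees and
 * return the 16-bit Value field (2 * degrees).
 * ASSUMPTION: negative angles are sent as two's complement. */
static uint16_t tilt_value(int degrees)
{
    if (degrees > KINECT_TILT_LIMIT_DEG)
        degrees = KINECT_TILT_LIMIT_DEG;
    if (degrees < -KINECT_TILT_LIMIT_DEG)
        degrees = -KINECT_TILT_LIMIT_DEG;
    /* Unsigned conversion wraps modulo 65536, giving two's complement. */
    return (uint16_t)(2 * degrees);
}
```

The returned value would go into the Value field of the 0x40 / 0x31 transfer, with index 0 and no data.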

Setting LED

The LED can be set by sending:

 type     request  value        index   data   length
 0x40     0x06     led_option   0x0     empty  0

where the led_options are enumerated as follows:

   LED_OFF    = 0,
   LED_GREEN  = 1,
   LED_RED    = 2,
   LED_YELLOW = 3,

Reading Joint State

The joint state information is grouped in with the accelerometer data and is stored in bytes 8 and 9 (buf[8] and buf[9]) of the return from:

 type     request  value  index   data    length
 0xC0     0x32     0x0    0x0     buf     10

Byte 8 (buf[8]) yields the angle:

 positive_angle_degrees = value/2
 negative_angle_degrees = (255-value)/2

Byte 9 (buf[9]) yields the following status codes:

 0x0 - stopped
 0x1 - reached limits
 0x4 - moving
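
A minimal sketch of decoding these two bytes follows, applying the formulas above literally. The names are hypothetical, and treating byte values of 0x80 and above as downward (negative) angles is an assumption: this page does not say where the positive range ends.

```c
#include <stdint.h>

/* Status codes listed above (names are illustrative). */
enum tilt_status {
    TILT_STOPPED  = 0x0,
    TILT_AT_LIMIT = 0x1,
    TILT_MOVING   = 0x4
};

/* Convert buf[8] to signed degrees using the formulas above.
 * ASSUMPTION: values >= 0x80 encode negative (downward) angles. */
static int tilt_degrees(uint8_t value)
{
    if (value < 0x80)
        return value / 2;         /* positive_angle_degrees */
    return -((255 - value) / 2);  /* negative_angle_degrees, sign added */
}
```

buf[9] can then be compared directly against the enum values.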

Reading Accelerometer

The accelerometer data is stored as big-endian two-byte pairs for x, y, and z:

 ux = ((uint16_t)buf[2] << 8) | buf[3];
 uy = ((uint16_t)buf[4] << 8) | buf[5];
 uz = ((uint16_t)buf[6] << 8) | buf[7];

The accelerometer documentation states there are 819 counts/g.
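
Putting the byte pairs and the 819 counts/g figure together, one axis could be converted to g roughly as below. The function name is hypothetical, and reading the 16-bit value as a signed two's complement number is an assumption (the snippet above only shows the unsigned assembly).

```c
#include <stdint.h>

#define KINECT_COUNTS_PER_G 819.0 /* figure quoted above */

/* Hypothetical helper: convert one axis to g. hi_index is the index of
 * the high byte in buf (2 for x, 4 for y, 6 for z).
 * ASSUMPTION: the 16-bit value is signed (two's complement). */
static double accel_axis_g(const uint8_t *buf, int hi_index)
{
    int32_t raw = ((int32_t)buf[hi_index] << 8) | buf[hi_index + 1];
    if (raw >= 0x8000)
        raw -= 0x10000; /* sign-extend from 16 bits */
    return raw / KINECT_COUNTS_PER_G;
}
```

For a device lying flat, one axis would be expected to read close to 1 g in magnitude and the other two close to 0.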


The depth camera returns values with 11 bits of precision.

RGB data should follow similar conventions but will need to be analyzed. RGB frames are significantly bigger and encoded using a Bayer pattern.

Relevant Bits of Code

The most important piece of data gathered from the libkinect project is the following struct:

 struct __attribute__ ((packed)) frame_info {
   uint8_t  magic[2];   // "RB"
   uint8_t  control;    // 00 means incoming data; other values (if any) are control codes
   uint8_t  cmd;        // if control=0: 71 or 81 - new frame, 72 or 82 - current frame, 75 or 85 - EOF (7x - depth, 8x - color)
   uint8_t  SeqNum;
   uint8_t  pkt_seq;
   uint8_t  unk_06;     // 04 on EOF
   uint8_t  unk_e0;     // 78 on EOF
   uint32_t time_stamp; // all 0 on new frame; packets for one frame have the same timestamp
   uint8_t  data[];     // payload; a flexible array must be the last member
 };

 // Reassembly state kept by the receiver, not part of the packet:
 int     _frame_pos;
 uint8_t _frame[422400];

This is a 12-byte struct followed by a data array, and it forms the basis of the Kinect's depth/color camera protocol. As you can tell, there is definitely some magic inside this structure. Some of the unknown fields are most likely control/status codes and will be determined later on. The _frame_pos and _frame variables are private members used to reassemble the data into a frame.

To process an image we do the following, after pushing the data into the above struct:

  1. If cmd == 0x71 we have reached the start of a new frame. Set our _frame_pos index to 0.
  2. We get the length of the data array. This is the total length of the packet minus the length of the header information in the above struct.
  3. We memcpy the data array into our _frame array, using _frame_pos as a base index position, then advance _frame_pos by the data length.
  4. If cmd == 0x75 we have reached the end of the current frame; time to send it to an output of some sort.
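
The steps above can be sketched in plain C as follows. This is not the libkinect code: the struct and function names are hypothetical, the header is read by byte offset (magic at 0-1, cmd at 3) rather than through the packed struct, and the bounds check is an addition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN   12     /* header size of frame_info */
#define FRAME_LEN 422400 /* depth frame payload size */

struct depth_assembler {
    size_t  pos;              /* plays the role of _frame_pos */
    uint8_t frame[FRAME_LEN]; /* plays the role of _frame */
};

/* Feed one raw depth packet. Returns 1 when the packet closed a frame
 * (cmd 0x75), 0 otherwise, and -1 if the packet is malformed or would
 * overflow the frame buffer. */
static int depth_feed(struct depth_assembler *a,
                      const uint8_t *pkt, size_t pkt_len)
{
    if (pkt_len < HDR_LEN || pkt[0] != 'R' || pkt[1] != 'B')
        return -1;

    uint8_t cmd = pkt[3];
    size_t data_len = pkt_len - HDR_LEN; /* step 2 */

    if (cmd == 0x71)                     /* step 1: restart at offset 0 */
        a->pos = 0;
    if (data_len > FRAME_LEN - a->pos)   /* added safety check */
        return -1;

    memcpy(a->frame + a->pos, pkt + HDR_LEN, data_len); /* step 3 */
    a->pos += data_len;

    return cmd == 0x75 ? 1 : 0;          /* step 4 */
}
```

When depth_feed returns 1, a->frame holds the reassembled frame and a->pos its length. The assembler is large, so it is better allocated statically or on the heap than on the stack.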

There are 242 packets per frame for the depth camera (including the 0x71 and 0x75 packets). All packets are 1760 bytes except the 0x75 packet, which is 1144 bytes. Minus the headers, that gives 422400 bytes of data.

There are 162 packets per frame for the color camera (including the 0x81 and 0x85 packets). All packets are 1920 bytes except the 0x85 packet, which is 24 bytes. Minus the headers, that gives 307200 bytes of data.
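
These totals can be cross-checked with a few lines of arithmetic, assuming every packet carries the 12-byte header described above. The function name is made up for this example.

```c
#include <stddef.h>

#define HDR_LEN 12 /* header size of frame_info */

/* Payload bytes in one frame: (packets - 1) full-size packets plus one
 * shorter final packet, minus one header per packet. */
static size_t frame_payload(size_t packets, size_t full_len, size_t last_len)
{
    size_t total = (packets - 1) * full_len + last_len;
    return total - packets * HDR_LEN;
}
```

For depth, frame_payload(242, 1760, 1144) is 241*1760 + 1144 - 242*12 = 422400, which also equals 640*480*11/8, the size of a 640x480 frame at 11 bits per pixel. For color, frame_payload(162, 1920, 24) is 161*1920 + 24 - 162*12 = 307200, or one byte per pixel at 640x480.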

Possible improvements to this algorithm could include monitoring pkt_seq and time_stamp to check that data packets arrive in sequence. Obviously if the data is corrupt you have bigger problems, but it might be nice. Need to figure out the magic.

After we have the data, for basic display in the QWidget and to generate proper RGB pixels:

  1. Reverse the bit order of each byte. (0b10000000 becomes 0b00000001)
  2. Shove it into an RGB bitstream; since we're only dealing with monochrome data, it's just duplicated three times for convenience.

As a note, there's some bit manipulation going on to get it into a pixel value. Each pixel is 11 bits, which gives it 2048 possible values (0 to 2047), it looks like. Kind of nifty. You can see the bit-shifting going on inside the libkinect source, documented here for posterity:

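 for (int p = 0, bit_offset = 0; p < 640*480; p++, bit_offset += 11) {
   uint32_t pixel = 0; // value of pixel
   // Read the 4 bytes that contain this pixel's 11 bits (unaligned read;
   // for the last pixel this reaches 2 bytes past the 422400 bytes of frame data)
   pixel   = *((uint32_t *)(data + bit_offset/8));
   pixel >>= (bit_offset % 8);
   pixel  &= 0x7ff;
   uint8_t pix_low  = (pixel & 0x00ff) >> 0;
   uint8_t pix_high = (pixel & 0xff00) >> 8;
   // reverse[] is a 256-entry table mapping a byte to its bit-reversed value
   pix_low  = reverse[pix_low];
   pix_high = reverse[pix_high];
   pixel = (pix_low << 8) | (pix_high);
   pixel >>= 5;
   // Image drops the 3 MSbs
   rgb[3*p+0] = 255-pixel;
   rgb[3*p+1] = 255-pixel;
   rgb[3*p+2] = 255-pixel;
 }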

Looks fairly routine, but I'll be the first to admit bit-wise operations make my head hurt, doubly so for efficient bit-wise operations.
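
The loop above relies on a reverse lookup table that is not shown on this page. One plausible way to build it is sketched below; the init function name is made up, and the actual libkinect table may be generated differently or hard-coded.

```c
#include <stdint.h>

/* reverse[v] holds v with its 8 bits in the opposite order. */
static uint8_t reverse[256];

/* Fill the table once at startup (hypothetical helper). */
static void init_reverse_table(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t r = 0;
        for (int bit = 0; bit < 8; bit++) {
            if (v & (1 << bit))
                r |= (uint8_t)(0x80 >> bit); /* bit 0 -> bit 7, and so on */
        }
        reverse[v] = r;
    }
}
```

After init_reverse_table() has run, reverse[0x80] is 0x01, matching the example given in step 1 above.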

RGB Camera

The RGB Camera follows the same frame format as the depth camera, with a minor difference of cmd being 0x8x instead of 0x7x.

The frame output of the RGB camera is a 640x480 Bayer pattern, so the total frame size is 640*480=307200 bytes.

The Bayer pattern is laid out as: RG, GB.
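
To make the RG, GB layout concrete, the sketch below returns which color a given pixel samples. It assumes the 2x2 tile starts at the top-left pixel (0, 0) with red, which this page does not state explicitly; the names are illustrative.

```c
enum bayer_color { BAYER_R, BAYER_G, BAYER_B };

/* Color sampled at pixel (x, y) for an RG / GB tile.
 * ASSUMPTION: the tile is anchored at (0, 0), so even rows read
 * R G R G ... and odd rows read G B G B ... */
static enum bayer_color bayer_color_at(unsigned x, unsigned y)
{
    if ((y & 1) == 0)
        return (x & 1) == 0 ? BAYER_R : BAYER_G;
    return (x & 1) == 0 ? BAYER_G : BAYER_B;
}
```

A demosaicing step would use this to decide which two color channels have to be interpolated from neighboring pixels at each position.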

Control EP

The control endpoint uses a different header:

 struct control_hdr {
   uint8_t  magic[2];    // can be "RB" for input and "GM" for output (command)
   uint16_t len;         // data length in words
   uint16_t cmd;
   uint16_t cmd2_or_tag; // cmd and tag from the GM packet should be matched in the response RB packet
   uint16_t data[];
 };

The control endpoint most likely initializes the IR projector and cameras. The commands remain unknown so far.

NUI Audio

So far not much has been determined about the audio interface. It looks as if there might be bi-directional streams going back and forth; the Xbox might be exporting some of its own audio to the Kinect for some type of echo cancelation. Filtering out game noises certainly would be a clever thing to do.

Getting audio working might be challenging, but rewarding if built-in echo cancelation is present.