Protocol Documentation

From OpenKinect
Revision as of 12:54, 8 December 2010 by 87.208.37.149 (talk) (Relevent Bits of Code)

OpenKinect Collaboration

Please check back often; additional insight and knowledge will be documented here as it is gained.

Links

USB Communication

USB communication support is currently under development, with most communication taking place using pyusb and libusb.

Devices

  • 04 - NUI Motor
  • 07 E1 - NUI Camera - RGB
  • 07 E2 - NUI Camera - Depth
  • 06 - NUI Audio

NB Do not assume that Motor and Audio devices will always be present, as it is possible to run the camera board standalone.

Control Packet Structure

All motor/LED commands are sent via control transfers. More information on the basic structure of these packets is available at http://www.beyondlogic.org/usbnutshell/usb6.shtml#SetupPacket

 Control Transfer (8-bytes) Request: 
 Type (1 byte)
 Request (1 byte)
 Value (2 bytes)
 Index (2 bytes)
 Length (2 bytes)

For read packets (presumably those with Type & 0x80 set), Length is the length of the response.
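As a rough illustration of the layout above, the following sketch packs the five fields into the 8 bytes that go on the wire and checks the direction bit. The struct and function names are made up for this example and do not come from any Kinect library; the little-endian byte order of the 16-bit fields follows the USB specification.

```c
#include <stdint.h>

/* Illustrative names only; the fields mirror the 8-byte setup packet. */
struct usb_setup {
    uint8_t  request_type;  /* Type    (1 byte)  */
    uint8_t  request;       /* Request (1 byte)  */
    uint16_t value;         /* Value   (2 bytes) */
    uint16_t index;         /* Index   (2 bytes) */
    uint16_t length;        /* Length  (2 bytes) */
};

/* Serialize to wire order; 16-bit fields are sent low byte first. */
static void usb_setup_pack(const struct usb_setup *s, uint8_t out[8])
{
    out[0] = s->request_type;
    out[1] = s->request;
    out[2] = (uint8_t)(s->value & 0xff);
    out[3] = (uint8_t)(s->value >> 8);
    out[4] = (uint8_t)(s->index & 0xff);
    out[5] = (uint8_t)(s->index >> 8);
    out[6] = (uint8_t)(s->length & 0xff);
    out[7] = (uint8_t)(s->length >> 8);
}

/* Bit 7 of the type byte set means device-to-host, i.e. a read. */
static int usb_setup_is_read(const struct usb_setup *s)
{
    return (s->request_type & 0x80) != 0;
}
```

With a USB library such as libusb these fields are normally passed as separate arguments to the control transfer call rather than packed by hand.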

Motor Initialization

Verify readiness? Send: 0xC0, 0x10, 0x0000, 0x0000, 0x0001 Response: 0x22

The joint range of the Kinect motor is +31 degrees (up) and -31 degrees (down).

Tilting the Camera Up and Down

The pitch (up/down angle) of the camera can be set by sending:

 type     request  value                    index   data   length
 0x40     0x31     2*desired_angle_degrees  0x0     empty  0

The desired angle is relative to the horizon, not relative to the base. The Kinect uses its accelerometers to detect when it has reached the correct angle, and then stops. So if the Kinect is on a hill and you set the desired angle to 0 degrees, the camera will become level, not parallel to the hill.

The value is actually in the range -128 to +128, where -128 falls a bit short of -90 degrees and +128 falls a bit short of +90 degrees, so it is not exactly 2 x the desired angle. The mapping is a bit rough, presumably because the accelerometers aren't calibrated and no two accelerometers perform identically. The value corresponds to the 9th byte of the 10-byte accelerometer report, which reports the current angle from -128 to +128.

Warning: Sending the motor past 31 degrees relative to the base has been shown to stall the motor. There is no way of knowing the angle relative to the base, since angles are measured relative to the horizon. There are no automatic safeguards to prevent this kind of command from being sent, and manual safeguards that restrict the range of values sent are impossible, since we don't know the angle of the base. To prevent stalling, it is necessary to monitor the 10-byte accelerometer report. The last byte of the report is the motor status: 0 means stationary and OK, 1 means stopped because the motor couldn't go any further, 4 means moving normally, and 8 means pausing briefly before trying again because the motor couldn't go further. The second byte (out of 10) gives the current level of strain on the motor, which applications can monitor; 0 means no strain. Due to the low torque and power consumption, no damage is likely, but stalling should be avoided to ensure maximum component life.
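A minimal sketch of computing the value field for the tilt request (0x31), assuming the approximate two-units-per-degree rule described above. The function name is hypothetical. Clamping to the +/-31 degree joint range only limits the request relative to the horizon; as the warning explains, it cannot guarantee the motor stays within its range on a tilted base, so the status byte still needs to be monitored.

```c
#include <stdint.h>

/* Hypothetical helper: degrees relative to the horizon -> value field
 * for request 0x31, using the rough 2-units-per-degree mapping. */
static int16_t tilt_value_from_degrees(int degrees)
{
    /* Clamp to the quoted joint range. This is a courtesy limit only;
     * it does not account for the angle of the base. */
    if (degrees > 31)
        degrees = 31;
    if (degrees < -31)
        degrees = -31;
    return (int16_t)(2 * degrees);
}
```

The result is signed; when it is placed in the 16-bit value field of the control transfer it should be cast to uint16_t, so that for example -2 goes out as 0xFFFE.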

Setting LED

The LED can be set by sending:

 type     request  value        index   data   length
 0x40     0x06     led_option   0x0     empty  0

where the led_options are enumerated as follows:

   LED_OFF    = 0,
   LED_GREEN  = 1,
   LED_RED    = 2,
   LED_YELLOW = 3, (actually orange)
   LED_BLINK_YELLOW = 4, (actually orange)
   LED_BLINK_GREEN = 5,
   LED_BLINK_RED_YELLOW = 6 (actually red/orange)

Other colours are possible by rapidly swapping the value to 2 different colours hundreds of times per second.

Reading Joint State

The joint state information is grouped in with the accelerometer data and is stored in bytes 8 and 9 (counting from zero) of the return from:

 type     request  value  index   data    length
 0xC0     0x32     0x0    0x0     buf     10

The 8th byte (buf[8]) yields:

 positive_angle_degrees = value/2
 negative_angle_degrees = (255-value)/2

Please note that this is not the angle of the motor; it is the angle of the Kinect itself in degrees (basically accelerometer data translated).

The 9th byte (buf[9]) yields the following status codes:

 0x0 - stopped
 0x1 - reached limits
 0x4 - moving
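The decoding above could be sketched as follows. All names are illustrative. One assumption is labeled in the code: buf[8] is treated as a signed two's complement byte, which for negative angles differs from the (255-value)/2 formula by half a degree in magnitude. The 0x8 status comes from the warning in the tilt section rather than from the list above.

```c
#include <stdint.h>

/* Status codes documented on this page (names are made up here). */
enum tilt_status {
    TILT_STOPPED  = 0x0,
    TILT_AT_LIMIT = 0x1,
    TILT_MOVING   = 0x4,
    TILT_RETRYING = 0x8   /* mentioned in the motor stall warning */
};

/* Angle of the Kinect relative to the horizon, in degrees.
 * ASSUMPTION: buf[8] is a signed 8-bit value in half-degree units. */
static double tilt_degrees(const uint8_t buf[10])
{
    int v = buf[8];
    if (v >= 128)
        v -= 256;   /* reinterpret as two's complement */
    return v / 2.0;
}

/* Raw motor status code from the last byte of the report. */
static int tilt_status_code(const uint8_t buf[10])
{
    return buf[9];
}
```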

Reading Accelerometer

The accelerometer data is stored as two-byte pairs (high byte first) for x, y, and z:

 ux = ((uint16_t)buf[2] << 8) | buf[3];
 uy = ((uint16_t)buf[4] << 8) | buf[5];
 uz = ((uint16_t)buf[6] << 8) | buf[7];

The accelerometer documentation (http://www.kionix.com/Product%20Sheets/KXSD9%20Product%20Brief.pdf) states that there are 819 counts/g.
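Combining the byte layout with the 819 counts/g figure, a reading can be converted to units of g roughly as sketched below. The helper name and the COUNTS_PER_G macro are invented for this example, and treating the 16-bit value as signed two's complement is an assumption that this page does not confirm.

```c
#include <stdint.h>

#define COUNTS_PER_G 819.0   /* from the KXSD9 product brief */

/* Convert one axis to g. offset is 2 for x, 4 for y, 6 for z.
 * ASSUMPTION: the 16-bit reading is signed two's complement. */
static double accel_axis_g(const uint8_t *buf, int offset)
{
    uint16_t raw = (uint16_t)(((uint16_t)buf[offset] << 8) | buf[offset + 1]);
    int counts = raw >= 0x8000 ? (int)raw - 0x10000 : (int)raw;
    return counts / COUNTS_PER_G;
}
```

For a Kinect sitting level and at rest, one axis would be expected to read close to 1 g and the other two close to 0, within the limits of the uncalibrated sensor.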

Cameras

The depth camera returns values with 11 bits of precision.

RGB data should follow similar conventions but will need to be analyzed. RGB frames are significantly bigger and encoded using a Bayer pattern.

Relevant Bits of Code

The most important piece of data gathered from the libkinect project is the following struct:


 struct __attribute__ ((packed)) frame_info {
   uint8_t  magic[2]; //"RB"
   uint8_t  control; // 00 - means incoming data, other values (if any) are control codes
   uint8_t  cmd; // if control=0,  71 or 81- new frame, 72  or 82- current frame, 75 or 85 - EOF (7x-depth, 8x-color)
   uint8_t  SeqNum;
   uint8_t  pkt_seq;
   uint8_t  unk_06; // 04 on EOF
   uint8_t  unk_e0; // 78 on EOF
   uint32_t time_stamp; // all 0 on new frame. packets for one frame has the same timestamp
   uint8_t  data[];
 };
 
   int     _frame_pos;
   uint8_t _frame[422400];


This 12-byte header is followed by a data array and forms the basis of the Kinect's depth/color camera streaming protocol. The structure clearly contains some magic values; some of them are most likely control/status codes and will be determined later on. The _frame_pos and _frame variables are private members used to reassemble the frame data.

To process an image we do the following, after pushing the data into the above struct:

  1. If cmd == 0x71, a new frame is starting. Reset the _frame_pos index to 0.
  2. Get the length of the data array. This is the total length of the packet minus the length of the header in the above struct.
  3. memcpy the data array into the _frame array, using _frame_pos as the base index, then advance _frame_pos by the data length.
  4. If cmd == 0x75, we have reached the end of the current frame; time to pass it to an output of some sort.
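The steps above can be sketched as a small reassembly routine for the depth stream. This is not the libkinect code; the struct and function names are made up, the header is read by byte offset (cmd is the fourth byte of the 12-byte header), and the bounds checks are additions that the original steps do not mention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN     12       /* size of struct frame_info before data[] */
#define FRAME_BYTES 422400   /* size of the _frame buffer above */

struct frame_assembler {
    size_t  pos;                 /* plays the role of _frame_pos */
    uint8_t frame[FRAME_BYTES];  /* plays the role of _frame */
};

/* Feed one raw depth packet. Returns 1 when the packet ends a frame
 * (cmd 0x75), 0 if more packets are expected, -1 if it is rejected. */
static int assembler_feed(struct frame_assembler *a,
                          const uint8_t *pkt, size_t pkt_len)
{
    if (pkt_len < HDR_LEN || pkt[0] != 'R' || pkt[1] != 'B')
        return -1;

    uint8_t cmd = pkt[3];
    size_t data_len = pkt_len - HDR_LEN;        /* step 2 */

    if (cmd == 0x71)                            /* step 1: new frame */
        a->pos = 0;
    if (a->pos + data_len > FRAME_BYTES)        /* added safety check */
        return -1;

    memcpy(a->frame + a->pos, pkt + HDR_LEN, data_len);   /* step 3 */
    a->pos += data_len;

    return cmd == 0x75;                         /* step 4: end of frame */
}
```

The same routine would apply to the color stream with 0x81 and 0x85 in place of 0x71 and 0x75 and a 307200-byte buffer.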

There are 242 packets per frame for the depth camera (including the 0x71 and 0x75 packets). All packets are 1760 bytes except the 0x75 packet, which is 1144 bytes. After subtracting the 12-byte headers, this gives 422400 bytes of data.

There are 162 packets per frame for the color camera (including the 0x81 and 0x85 packets). All packets are 1920 bytes except the 0x85 packet, which is 24 bytes. After subtracting the 12-byte headers, this gives 307200 bytes of data.
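The packet arithmetic can be checked with a one-line helper (the function name is illustrative). It also shows where the depth figure comes from: 640*480 pixels at 11 bits each is 422400 bytes.

```c
#include <stddef.h>

/* Payload bytes per frame: (packets - 1) full-size packets plus one
 * short final packet, minus one header per packet. */
static size_t frame_payload_bytes(size_t packets, size_t full_len,
                                  size_t last_len, size_t hdr_len)
{
    return (packets - 1) * full_len + last_len - packets * hdr_len;
}

/* frame_payload_bytes(242, 1760, 1144, 12) == 422400  (depth)
 * frame_payload_bytes(162, 1920,   24, 12) == 307200  (color) */
```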

Possible improvements to this algorithm could include monitoring pkt_seq and time_stamp to verify that data packets arrive in sequence. Obviously if the data is corrupt you have bigger problems, but it might be nice. Need to figure out the magic.

After we have the data, for basic display in the QWidget and to generate proper RGB pixels:

  1. Reverse the bit order of each byte. (0b10000000 becomes 0b00000001)
  2. Shove it into an RGB bitstream; since we're only dealing with monochrome data, each value is just duplicated three times for convenience.

As a note, there's some bit manipulation going on to get it into a pixel value. Each pixel is 11 bits, which gives it 2048 possible values (0 to 2047), it looks like. Kind of nifty. You can see the bit-shifting going on inside the libkinect source, documented here for posterity:

 for (int p = 0, bit_offset = 0; p < 640*480; p++, bit_offset += 11) {
   uint32_t pixel = 0; // value of pixel
 
   pixel   = *((uint32_t *)(data+(p*11/8)));
   pixel >>= (p*11 % 8);
   pixel  &= 0x7ff;
 
   uint8_t pix_low  = (pixel & 0x00ff) >> 0;
   uint8_t pix_high = (pixel & 0xff00) >> 8;
 
   pix_low  = reverse[pix_low];
   pix_high = reverse[pix_high];
 
   pixel = (pix_low << 8) | (pix_high);
   pixel >>= 5;
 
   // Image drops the 3 MSbs
   rgb[3*p+0] = 255-pixel;
   rgb[3*p+1] = 255-pixel;
   rgb[3*p+2] = 255-pixel;
 }

Looks fairly routine, but I'll be the first to admit bit-wise operations make my head hurt, doubly so for efficient bit-wise operations.

RGB Camera

The RGB Camera follows the same frame format as the depth camera, with a minor difference of cmd being 0x8x instead of 0x7x.

The frame output of the RGB camera is a 640x480 Bayer pattern, so the total frame size is 640*480=307200 bytes.

The Bayer pattern is laid out as: RG, GB.
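For demosaicing, it helps to know which color each byte of the frame samples. The sketch below assumes the RG, GB tile starts at the top-left pixel (0, 0), so even rows alternate red/green and odd rows alternate green/blue; the names are made up for this example.

```c
enum bayer_color { BAYER_RED, BAYER_GREEN, BAYER_BLUE };

/* Color sampled at pixel (x, y) for an RG/GB tiling.
 * ASSUMPTION: the tile origin is pixel (0, 0). */
static enum bayer_color bayer_color_at(int x, int y)
{
    if ((y & 1) == 0)
        return (x & 1) == 0 ? BAYER_RED : BAYER_GREEN;
    return (x & 1) == 0 ? BAYER_GREEN : BAYER_BLUE;
}
```

The byte for pixel (x, y) in the 307200-byte frame would be at index y*640 + x.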

Control EP

The control endpoint uses a different header:

 struct control_hdr {
   uint8_t magic[2]; //can be "RB" for Input and "GM" for Output (command)
   uint16_t len; //data length in words
   uint16_t cmd; 
   uint16_t cmd2_or_tag; //cmd and tag from GM packet should be matched in response RB packet
   uint16_t  data[];
 };

The control endpoint most likely initializes the IR projector and cameras.

Control Commands

Here are some of the commands we've fuzzed or found in USB logs.

Setting Values

control_hdr.cmd = 0x0003;

control_hdr.tag = NONCE;

control_hdr.len = 0x0002;

uint16_t data[] = { CommandID, value };


Reading Values

control_hdr.cmd = 0x0002;

control_hdr.tag = NONCE;

control_hdr.len = 0x0001;

uint16_t data[] = { CommandID, 0x0000 };


Replies

The reply returns the original packet header with the magic bytes set to RB. You must do a USB read after you've written the command to ask for the reply. The first uint16_t of the reply data will be the status of the command: 0x0000 is success, 0x0005 is failure. On a read, it will return the currently set value as the second uint16_t. When reads fail with 0x0005, we assume the command does not exist.
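A sketch of building a "GM" command packet and checking the status word of its "RB" reply, following the control_hdr layout above. Function names are invented for this example. The little-endian byte order is an assumption, though it is consistent with the raw USB dumps on this page (which begin 0c 00 for a count of 0x000c). Actually sending and receiving the bytes is left to the USB library in use.

```c
#include <stddef.h>
#include <stdint.h>

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write an 8-byte header plus nwords data words into out.
 * Returns the packet size in bytes, or 0 if out is too small. */
static size_t build_command(uint8_t *out, size_t cap, uint16_t cmd,
                            uint16_t tag, const uint16_t *data,
                            uint16_t nwords)
{
    size_t total = 8 + 2 * (size_t)nwords;
    if (cap < total)
        return 0;
    out[0] = 'G';
    out[1] = 'M';
    put_le16(out + 2, nwords);   /* len, in 16-bit words */
    put_le16(out + 4, cmd);
    put_le16(out + 6, tag);
    for (size_t i = 0; i < nwords; i++)
        put_le16(out + 8 + 2 * i, data[i]);
    return total;
}

/* Returns the status word (0x0000 success, 0x0005 failure), or -1 if
 * the buffer is not an RB reply matching the given cmd and tag. */
static int reply_status(const uint8_t *reply, size_t len,
                        uint16_t cmd, uint16_t tag)
{
    if (len < 10 || reply[0] != 'R' || reply[1] != 'B')
        return -1;
    if (get_le16(reply + 4) != cmd || get_le16(reply + 6) != tag)
        return -1;
    return get_le16(reply + 8);
}
```

For example, opening the depth stream as described in the command table below would use cmd 0x0003 with data { 0x0006, 0x0002 }.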

Note that Windows error value 5 is ERROR_ACCESS_DENIED. Not sure if that's relevant.


Command Default Valid Range     Behavior
0x0000 0x01 Replies 05 00 01 00 unless written value > 2, in which case the IR cam turns off and glview hangs.
0x0005 0x00 Color Stream Control

0: Disable stream

1: Open RGB Stream

2: ?

3: Open IR Stream

0x0006 0x00 Depth Stream Control

0: Disable stream

1: (also opens depth stream)

2: Open Depth Stream

3: (also opens depth stream)

0x000c 0x00 0x0000-0x0005 RGB Image Format

0x0000 = Bayer

0x0001 = ?

0x0002 = ?

0x0003 = ?

0x0004 = ?

0x0005 = UYVY. Must be used in conjunction with the 15 Hz framerate

0x000d 0x00 RGB Image Resolution

0: small

1: standard (640x480)

2: ?? Only partial frames, but a LOT of data. Might be bandwidth limited

0x000e 0x00 15, 30 RGB Framerate

0x1e (30): 30 fps

0x0f (15): 15 fps

0x0011 0x01 Nothing visible happens, but the only accepted input values are 0 and 1
0x0012 0x01 Depth Stream Format

0: uncompressed 16 bit depth stream between 0x0000 and 0x07ff (2^11-1). Causes bandwidth issues; will drop packets.

1: differential/RLE compressed 11 bit depth stream

2: 10-bit stream

3: 11-bit stream

0x0013 0x01 0x0000-0x0002 Depth Stream Resolution

0: small

1: standard (640x480)

2: lots of data - haven't gotten a coherent frame out of it.

0x0014 0x00  ?, 30 Depth Framerate

0x1e (30): 30 fps

0x0015 0x1e Send a 0, it replies 05 00 00 00 (not 05 00 01 00 as with most other commands). Otherwise, it appears to accept the command with no visible effects. (values 1-50 return success, others 05 00 00 00)
0x0016 0x0001 0, 1 Depth Smoothing

LSB = 0: Smoothing Disabled

LSB = 1: Smoothing Enabled (default)

0x0017 0x0000 0, 1 Depth H-Flip

0: Regular feed

1: Flipped Horizontally

0x0019 0x0000 0x0000 IR Stream Format (unconfirmed, inferred from RGB & depth commands)
0x001a 0x01 0x0000-0x0002 IR Stream Resolution

0: small

1: standard (640x480)

2: lots of data - haven't gotten a coherent frame out of it.

0x001b 0x00  ?, 30 IR Framerate (unconfirmed, inferred from RGB & depth commands)

0x1e (30): 30 fps

0x0024 0x01 Unknown, but has nonzero default value
0x002d 0x01 Unknown, but has nonzero default value
0x0047 0x00 Enable RGB hflip. Send 0: normal orientation. Send anything else: RGB data is hflipped (but keeps the same Bayer pattern, so the pretty demosaicing looks like garbage)
0x0048 0x00 Send 0: normal depth buffer behavior. Send 1: Get weird data from depth stream. Full frame, but only small patches. Unknown depth coding.
0x0100 0x0001 0x0000-0x0001 unknown function
0x0101 0x0000 0x0000-0xFFFF unknown function
0x0102 0x0000 0x0000-0x0001 unknown function
0x0103 0x008d (Interesting: mine reads 0x008a - zarvox) All attempted values fail to set.
0x0104 0x012c 0x0000-0xFFFF unknown function
0x0105 0x0000 (mine reads 0x005a - zarvox) 0x0000-0xFFFF unknown function
0x0106 0x01f4 0x0000-0xFFFF unknown function
0x0107 0x0bb8 0x0000-0xFFFF unknown function
0x0108 0x0000 0x0000-0x0001 unknown function
0x0109 0x002a 0x0000-0xFFFF unknown function
0x010a 0x001b 0x0000-0xFFFF unknown function
0x010b 0x0008 0x0000-0xFFFF unknown function
0x010c 0x0003 0x0000-0xFFFF unknown function
0x010d 0x00fa 0x0000-0x???? Setting to 0xFFFF crashes the command endpoint
0x010e 0x0004 0x0000-0x00FF unknown function
0x010f 0x2710 0x0000-0xFFFF unknown function
0x0110 0x0004 0x0000-0xFFFF unknown function
0x0111 0x0008 0x0000-0xFFFF unknown function
0x0112 0x1388 0x0000-0xFFFF unknown function
0x0113 0x0078 0x0000-0xFFFF unknown function
0x0114 0x03e8 0x0000-0xFFFF unknown function
0x0115 0x3a98 0x0000-0xFFFF unknown function
0x0116 0x0064 0x0000-0x00FF unknown function
0x0117 0x00b7 0x0000-0x00FF unknown function
0x0118 0x006c 0x0000-0x00FF unknown function
0x0119 0x00ca 0x0000-0x00FF unknown function
0x011a 0x00f5 0x0000-0x00FF unknown function
0x011b 0x0027 0x0000-0x00FF unknown function
0x011c 0x0005 unknown function

Further command discovery is still a work-in-progress.

Color CMOS Camera Register Access

The USB dumps show us that the XBox is modifying the color CMOS sensor's registers manually. The sensor is very similar to the mt9v112 [1], but it is larger, and some of the registers don't align with that document.

RGB Camera config strings

Header:

control_hdr.cmd = 0x0095;

control_hdr.tag = NONCE;

control_hdr.len = sizeof(data) / sizeof(uint16_t);


There appear to be a number of strings that affect the RGB feed that take the format:

uint16_t data[] = { RegisterCount, Address0, Value0, Address1, Value1, Address2, Value2 ... };


RegisterCount is the number of Address/Value pairs in the command. The maximum number seems to be 0xC (12).

To write a register, set the high bit (0x8000). To read a register, do not set the high bit.


Here are a couple of raw strings from the USB dump:

0c0021800080048000050380000407808e0208805f000b80460039821605578264025882e0025c8210155d82151a3b82e604

0c000280680001801c002581050005810300478130109d81ae3c5381102054814060558180a05681c0d05781e0f0588100ff

0C0005810100478130109D81AE345381102054814060558180A05681C0D05781E0F0588100FF068182742E8244102F820091
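To make the raw strings easier to read, the sketch below splits one into its register operations. It assumes each 16-bit word appears low byte first in the hex dump, which matches the leading 0c00 for a count of 12. All names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct reg_op {
    uint16_t address;   /* register address with the write flag removed */
    uint16_t value;
    int      is_write;  /* nonzero if bit 0x8000 was set on the address */
};

static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read the i-th 16-bit word, low byte first. Returns -1 on a bad digit. */
static long hex_word(const char *hex, size_t i)
{
    int d[4];
    for (int k = 0; k < 4; k++) {
        d[k] = hex_digit(hex[4 * i + k]);
        if (d[k] < 0)
            return -1;
    }
    return (long)((d[0] << 4) | d[1]) | ((long)((d[2] << 4) | d[3]) << 8);
}

/* Decode { RegisterCount, Address0, Value0, ... } from a hex dump.
 * Returns the number of pairs written to ops, or -1 if malformed. */
static int decode_config_string(const char *hex, struct reg_op *ops,
                                size_t max_ops)
{
    size_t len = strlen(hex);
    if (len < 4 || len % 4 != 0)
        return -1;
    long count = hex_word(hex, 0);
    if (count < 0 || (size_t)count > max_ops)
        return -1;
    if (len / 4 != 1 + 2 * (size_t)count)
        return -1;
    for (size_t k = 0; k < (size_t)count; k++) {
        long addr = hex_word(hex, 1 + 2 * k);
        long val  = hex_word(hex, 2 + 2 * k);
        if (addr < 0 || val < 0)
            return -1;
        ops[k].address  = (uint16_t)(addr & 0x7fff);
        ops[k].value    = (uint16_t)val;
        ops[k].is_write = (addr & 0x8000) != 0;
    }
    return (int)count;
}
```

Decoding the first string above this way yields 12 writes, starting with register 0x0021 set to 0x8000 and including register 0x0007 set to 0x028E.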


We're not 100% sure what everything is, but here's what we think we know.

Address Default Behavior
0x0000 0x148C Sensor Core R0:0—0x000 – Chip Version (Read Only)
0x0001 0x001C Sensor Core R1:0—0x001 – Row Start

The first row to be read out (not counting dark rows that may be read). To window the image down, set this register to the starting Y value. Setting a value less than 8 is not recommended since the dark rows should be read using Reg0x022.

0x0002 0x0068 Sensor Core R2:0—0x002 – Column Start

The first column to be read out (not counting dark columns that may be read). To window the image down, set this register to the starting X value. Setting a value below 0x18 is not recommended since readout of dark columns should be controlled by Reg0x022.

0x0003 0x0400(1024) Sensor Core R3:0—0x003 - Row Count

Number of rows in the image to be read out (not counting dark rows or border rows that may be read).

0x0004 0x0500(1280) Sensor Core R4:0—0x004 - Column Count

Number of columns in image to be read out (not counting dark columns or border columns that may be read).

0x0007 0x028D Exposure control? 0x028E seems to be 33ms, and 0x0000 is about 500ms. XBox Init changes this to 0x28E.
0x0008 0x005F
0x000B 0x0000 XBox Init changes this to 0x0046
0x0021 0x8000
0x0105 0x0003 R5:1—0x105 – Aperture Correction

Aperture correction scale factor used for sharpening.

Bit 3 Enables automatic sharpness reduction control (see R51:2 0x233).

Bits 2:0 Sharpening factor:

000: No sharpening.

001: 25% sharpening.

010: 50% sharpening.

011: 75% sharpening. (Default)

100: 100% sharpening.

101: 125% sharpening.

110: 150% sharpening.

111: 200% sharpening.
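The bit fields of this register can be decoded with a small lookup, sketched below with made-up function names. The percentages are the ones listed above, which come from the mt9v112 description and may not hold exactly for the Kinect's sensor.

```c
#include <stdint.h>

/* Sharpening factor in percent, from bits 2:0 of register 0x105. */
static int sharpening_percent(uint16_t reg)
{
    static const int percent[8] = { 0, 25, 50, 75, 100, 125, 150, 200 };
    return percent[reg & 0x7];
}

/* Bit 3: automatic sharpness reduction control enabled. */
static int auto_sharpness_reduction(uint16_t reg)
{
    return (reg >> 3) & 1;
}
```

The default value 0x0003 decodes to 75% sharpening with automatic reduction disabled.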

0x0106 0x648E R6:1—0x106 – Operating Mode Control (Read/Write)

XBox Init sets this to 0x7482

This register specifies the operating mode of the IFP.

Bit 15 Enables manual white balance. (Default=1)

The user can set the base matrix and color channel gains. This bit must be asserted and de-asserted with a frame in between to force new color correction settings to take effect.

Bit 14 Enables auto exposure. (Default=1)

Bit 13 Enables on-the-fly defect correction. (Default=1)

Bit 12 Reserved—obsolete. The user should write a “0” to this bit. (Default=0)

Bit 11 not used. - Note that this is the description from the mt9v112; the Kinect sets this bit to true in init strings, and it may do something on this variation of the part.

Bit 10 Enables lens shading correction. (Default=0)

Bits 9:8 Reserved. (Default=0)

Bit 7 Enables automatic flicker detection. (Default=0, XBox init sequence sets to 1)

Bit 6 Reserved for future expansion. (Default=0)

Bit 5 Reserved. (Default=0)

Bit 4 Bypasses color correction matrix. (Default=0)

0: Normal color processing.

1: Outputs “raw” color bypassing color correction.

Bits 3:2 Auto exposure back light compensation control.

00: Auto exposure sampling window is specified by R38:2 and R39:2 (“large window”). (XBox init)

01: Auto exposure sampling window is specified by R43:2 and R44:2 (“small window”). (Default)

1X: Auto exposure sampling window is specified by the weighted sum of the large window

Bit 1 Enables auto white balance.

0: Freezes white balance at current values. (Default)

1: Enables auto white balance. (XBox init)

Bit 0 Reserved for future expansion. (Default=1)

0x0125 0x0005

R37:1—0x125 – Color Saturation Control (Read/Write)

Bit 5:3 Specify overall attenuation of the color saturation.

000: Full color saturation (Default)

001: 75% of full saturation

010: 50% of full saturation

011: 37.5% of full saturation

100: 25% of full saturation

101: 150% of full saturation

110: Black and white


Bit 2:0 Specify color saturation attenuation at high luminance (linearly increasing attenuation from no attenuation to monochrome at luminance of 224).

000: No attenuation.

001: Attenuation starts at luminance of 216.

010: Attenuation starts at luminance of 208.

011: Attenuation starts at luminance of 192.

100: Attenuation starts at luminance of 160.

101: Attenuation starts at luminance of 96. (Default)

0x0147 0x2040 XBox Init sets this to 0x1030
0x0153 0x0E04 XBox Init sets this to 0x2010. RGB Gain Ramp. Elements 1,0. Controls dark pixels.
0x0154 0x4C28 XBox Init sets this to 0x6040. RGB Gain Ramp. Elements 3,2.
0x0155 0x9777 XBox Init sets this to 0xA080. RGB Gain Ramp. Elements 5,4. Controls medium brightness pixels.
0x0156 0xC7B1 XBox Init sets this to 0xD0C0. RGB Gain Ramp. Elements 7,6.
0x0157 0xEEDB XBox Init sets this to 0xF0E0. RGB Gain Ramp. Elements 9,8.
0x0158 0xFF00 XBox Init sets this to 0xFF00. RGB Gain Ramp. Element 10,11. Controls brights and 11 seems to affect bloom somehow.
0x019D 0x3CAE Seen both 0x3CAE and 0x34AE set for this address
0x022E 0x0C44 XBox Init sets this to 0x1044.
0x022F 0x9120 XBox Init sets this to 0x9100.
0x0239 0x0690 XBox Init sets this to 0x0516.
0x023B 0x03DE XBox Init sets this to 0x04E6.
0x0257 0x0267 XBox Init sets this to 0x0264.
0x0258 0x02E1 XBox Init sets this to 0x02E0.
0x025C 0x1610 XBox Init sets this to 0x1510.
0x025D 0x1A14 XBox Init sets this to 0x1A15.

NUI Audio

So far not much has been determined about the audio interface. It looks as if there might be bi-directional streams shooting back and forth; the XBox might be exporting some of its own audio to the Kinect for some type of echo cancellation. Filtering out game noises certainly would be a clever thing to do.

Getting audio working might be challenging, but rewarding if built-in echo cancellation is present.

 <+marcan> anyway, a long time ago (before I actually got the depth/rgb stuff to work, actually) I pretty much figured out the marvell init procedure
 <+marcan> i.e. download of live firmware, booting, download of the cancellation filter data
 <+marcan> buut I didn't write it up because I'm lazy :P