Protocol Documentation
OpenKinect Collaboration
Please check back often; as additional insight and knowledge are gained, they will be documented here.
Links
- https://github.com/adafruit/Kinect/blob/master/usbmotor.py - Python motor control code - does the "move up and down" action.
- https://gist.github.com/670533 - Sample python code
- https://github.com/adafruit/Kinect/tree/master/USBlogs/ - Some USB logs, courtesy of adafruit, hosted on the wonderful large binary file distribution site, github, for easy downloading.
USB Communication
USB communication is currently under development, with most communication taking place using pyusb and libusb.
Devices
- 04 - NUI Motor
- 07 E1 - NUI Camera - RGB
- 07 E2 - NUI Camera - Depth
- 06 - NUI Audio
NB: Do not assume that the Motor and Audio devices will always be present, as it is possible to run the camera board standalone.
Control Packet Structure
All motor/LED commands are sent via control transfers. More information on the basic structure of these packets is available at http://www.beyondlogic.org/usbnutshell/usb6.shtml#SetupPacket
Control Transfer Request (8 bytes):
Field | Size |
---|---|
Type | 1 byte |
Request | 1 byte |
Value | 2 bytes |
Index | 2 bytes |
Length | 2 bytes |
For read packets (those with bit 0x80 set in Type), Length is the length of the response.
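As a minimal sketch of the layout above, the five fields can be packed into the 8-byte setup packet with Python's `struct` module. The function names `build_setup_packet` and `is_read` are made up for illustration; the little-endian field order follows the standard USB setup packet. In practice libusb or pyusb builds this packet for you, so this is only meant to make the byte layout concrete.

```python
import struct

def build_setup_packet(request_type, request, value, index, length):
    """Pack Type, Request, Value, Index, Length into 8 little-endian bytes."""
    return struct.pack("<BBHHH", request_type, request, value, index, length)

def is_read(request_type):
    """Bit 0x80 of Type set means the transfer is a read (device to host)."""
    return bool(request_type & 0x80)

# The motor readiness query listed under Motor Initialization:
# 0xC0, 0x10, 0x0000, 0x0000, 0x0001
packet = build_setup_packet(0xC0, 0x10, 0x0000, 0x0000, 0x0001)
print(packet.hex())  # c010000000000100
```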
Motor Initialization
Verify readiness?
Send: 0xC0, 0x10, 0x0000, 0x0000, 0x0001
Response: 0x22
The joint range of the Kinect motor is +31 degrees (up) and -31 degrees (down).
Tilting the Camera Up and Down
The pitch (up/down angle) of the camera can be set by sending:
request type | request | value | index | data | length |
---|---|---|---|---|---|
0x40 | 0x31 | 2*desired_angle_degrees | 0x0 | empty | 0 |
The desired angle is relative to the horizon, not relative to the base. The Kinect uses its accelerometers to detect when it has reached the correct angle, and then stops. So if the Kinect is on a hill and you set the desired angle to 0 degrees, the camera will become level, not parallel to the hill.
The value is actually in the range -128 to +128, where -128 corresponds to an angle slightly short of -90 degrees (that is, slightly more positive) and +128 to an angle slightly short of +90 degrees, so it is not exactly twice the desired angle. The mapping is rough, presumably because the accelerometers are uncalibrated and no two accelerometers perform identically. The value corresponds to the 9th byte (buf[8]) of the 10-byte accelerometer report, which reports the current angle on the same -128 to +128 scale.
Warning: Sending the motor past 31 degrees relative to the base has been shown to stall the motor. There is no way of knowing the angle relative to the base, since angles are measured relative to the horizon. There are no automatic safeguards to prevent this kind of command from being sent, and manual safeguards that restrict the range of values sent are impossible, since we don't know the angle of the base. To prevent stalling, it is necessary to monitor the 10-byte accelerometer report. The last byte of the report is the motor status:
- 0: stationary and OK
- 1: stopped because the motor couldn't go any further
- 4: moving normally
- 8: pausing briefly before trying again because the motor couldn't go further
The second byte (out of 10) gives the current level of strain on the motor, which applications can monitor; 0 means no strain. Due to the low torque and power consumption, no damage is likely, but stalling should be avoided to ensure maximum component life.
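A rough sketch of building the tilt command, under two assumptions that are not confirmed on this page: negative values are sent as 16-bit two's complement in the value field, and the value is simply twice the angle (which, as noted above, is only approximate). The name `tilt_request` is hypothetical. It returns the arguments in the order pyusb's `ctrl_transfer` takes them, so nothing here touches hardware.

```python
def tilt_request(angle_degrees):
    """Return (request type, request, value, index, data) for a tilt command.

    angle_degrees is relative to the horizon, not to the base.
    """
    raw = int(round(2 * angle_degrees))
    if not -128 <= raw <= 128:
        raise ValueError("tilt value must be within -128..+128")
    # Assumption: negatives are encoded as 16-bit two's complement.
    value = raw & 0xFFFF
    return (0x40, 0x31, value, 0x0, [])

print(tilt_request(15))   # (64, 49, 30, 0, [])
```

With a pyusb device handle for the NUI Motor (here called `dev`, obtained however your code opens the device), the command could then be sent as `dev.ctrl_transfer(*tilt_request(15))`. This does not protect against stalling; the motor status byte still has to be monitored.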
Setting LED
The LED can be set by sending:
request type | request | value | index | data | length |
---|---|---|---|---|---|
0x40 | 0x06 | led_option | 0x0 | empty | 0 |
where the led_options are enumerated as follows:
LED_OFF = 0, LED_GREEN = 1, LED_RED = 2, LED_YELLOW = 3, LED_BLINK_YELLOW = 4, LED_BLINK_GREEN = 5, LED_BLINK_RED_YELLOW = 6
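The enumeration above maps naturally onto a Python `IntEnum`. This is only a sketch: the class name `Led` and the helper `led_request` are made up for illustration, and the returned tuple is in the argument order of pyusb's `ctrl_transfer`.

```python
from enum import IntEnum

class Led(IntEnum):
    OFF = 0
    GREEN = 1
    RED = 2
    YELLOW = 3
    BLINK_YELLOW = 4
    BLINK_GREEN = 5
    BLINK_RED_YELLOW = 6

def led_request(option):
    """Return (request type, request, value, index, data) for an LED command.

    Led(option) raises ValueError for values outside the enumeration.
    """
    return (0x40, 0x06, int(Led(option)), 0x0, [])

print(led_request(Led.BLINK_GREEN))  # (64, 6, 5, 0, [])
```

Given a pyusb handle `dev` for the NUI Motor device, `dev.ctrl_transfer(*led_request(Led.RED))` would send the command.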
Reading Joint State
The joint state information is grouped in with the accelerometer data and is stored in bytes buf[8] and buf[9] (counting from zero) of the 10-byte response to:
request type | request | value | index | data | length |
---|---|---|---|---|---|
0xC0 | 0x32 | 0x0 | 0x0 | buf | 10 |
the 8th byte (buf[8]) yields:
positive_angle_degrees = value/2
negative_angle_degrees = (255-value)/2
Please note that this is not the angle of the motor; it is the angle of the Kinect itself in degrees (basically accelerometer data translated).
the 9th byte (buf[9]) yields the following status codes:
- 0x0 - stopped
- 0x1 - reached limits
- 0x4 - moving
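A small sketch of decoding the joint state from the 10-byte report. One assumption is labeled in the code: buf[8] is read as a signed 8-bit value, which matches the -128 to +128 description earlier on this page but differs by half a degree from the (255-value)/2 form above. The name `parse_joint_state` is hypothetical.

```python
MOTOR_STATUS = {0x0: "stopped", 0x1: "reached limits", 0x4: "moving"}

def parse_joint_state(buf):
    """Decode tilt angle (degrees) and motor status from the 10-byte report."""
    if len(buf) != 10:
        raise ValueError("expected a 10-byte report")
    raw = buf[8]
    # Assumption: buf[8] is a signed byte (two's complement).
    signed = raw - 256 if raw > 127 else raw
    status = MOTOR_STATUS.get(buf[9], "unknown (0x%x)" % buf[9])
    return {"tilt_degrees": signed / 2, "status": status}

report = bytes([0, 0, 0, 0, 0, 0, 0, 0, 30, 0x4])
print(parse_joint_state(report))  # {'tilt_degrees': 15.0, 'status': 'moving'}
```

With pyusb, the report itself would come from something like `buf = dev.ctrl_transfer(0xC0, 0x32, 0x0, 0x0, 10)`, where `dev` is a handle for the NUI Motor device.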
Reading Accelerometer
The accelerometer data is stored in two byte pairs for x,y, and z:
ux = ((uint16_t)buf[2] << 8) | buf[3];
uy = ((uint16_t)buf[4] << 8) | buf[5];
uz = ((uint16_t)buf[6] << 8) | buf[7];
The accelerometer documentation (http://www.kionix.com/Product%20Sheets/KXSD9%20Product%20Brief.pdf) states there are 819 counts/g.
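To turn the raw counts into units of g, one possible sketch is below. It assumes the three big-endian 16-bit values are signed (the snippet above only shows how the bytes are combined, not their signedness) and uses the 819 counts/g figure from the data sheet. `parse_accelerometer` is a made-up name.

```python
import struct

COUNTS_PER_G = 819

def parse_accelerometer(buf):
    """Return (x, y, z) acceleration in g from the 10-byte report."""
    # Assumption: buf[2..7] hold three big-endian signed 16-bit counts.
    x, y, z = struct.unpack(">hhh", bytes(buf[2:8]))
    return (x / COUNTS_PER_G, y / COUNTS_PER_G, z / COUNTS_PER_G)

# A device lying flat might report roughly 1 g on one axis:
report = bytes([0, 0, 0x00, 0x00, 0x00, 0x00, 0x03, 0x33, 0, 0])
print(parse_accelerometer(report))  # (0.0, 0.0, 1.0)
```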
Cameras
The depth camera returns values with 11 bits of precision.
RGB data should follow similar conventions but will need to be analyzed. RGB frames are significantly bigger and encoded using a Bayer pattern.
Relevant Bits of Code
The most important piece of data gathered from the libkinect project is the following struct:
struct __attribute__ ((packed)) frame_info {
    uint8_t magic[2]; //"RB"
    uint8_t control; // 00 - means incoming data, other values (if any) are control codes
    uint8_t cmd; // if control=0, 71 or 81- new frame, 72 or 82- current frame, 75 or 85 - EOF (7x-depth, 8x-color)
    uint8_t SeqNum;
    uint8_t pkt_seq;
    uint8_t unk_06; // 04 on EOF
    uint8_t unk_e0; // 78 on EOF
    uint32_t time_stamp; // all 0 on new frame. packets for one frame has the same timestamp
    uint8_t data[];
};
int _frame_pos;
uint8_t _frame[422400];
This is a 12-byte header struct followed by a data array, and it forms the basis of the Kinect's depth/color camera protocol. As you can tell, there is definitely some magic inside this structure. Some of the unknown fields are most likely control/status codes and will be determined later on. The _frame_pos and _frame variables are private members used to reassemble the data into a frame.
To process an image we do the following, after pushing the data into the above struct:
- If cmd == 0x71, a new frame is starting (the previous frame is over). Set our _frame_pos index to 0.
- We get the length of the data array. This is the total length of the packet minus the length of the header information in the above struct.
- We memcpy the data array into our _frame array, using _frame_pos as a base index position.
- If cmd == 0x75, we have reached the end of the current frame; time to shoot it to an output of some sort.
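The steps above can be sketched in Python as follows. This is not the libkinect code, just an illustration of the same idea. The class name `FrameAssembler` is made up, advancing the write position after each copy is implied rather than stated by the list, and decoding time_stamp as little-endian is an assumption (the field is not used here).

```python
import struct

# 12-byte header: magic, control, cmd, SeqNum, pkt_seq, unk_06, unk_e0, time_stamp
HEADER = struct.Struct("<2sBBBBBBI")  # little-endian time_stamp is an assumption

class FrameAssembler:
    def __init__(self, new_frame_cmd=0x71, eof_cmd=0x75, frame_size=422400):
        self.new_frame_cmd = new_frame_cmd
        self.eof_cmd = eof_cmd
        self.frame = bytearray(frame_size)
        self.pos = 0

    def feed(self, packet):
        """Consume one packet; return the frame bytes on EOF, otherwise None."""
        if len(packet) < HEADER.size:
            return None
        magic, control, cmd, _seq, _pkt_seq, _u6, _ue0, _ts = HEADER.unpack_from(packet)
        if magic != b"RB" or control != 0:
            return None  # not incoming image data
        if cmd == self.new_frame_cmd:
            self.pos = 0  # start of a new frame
        data = packet[HEADER.size:]
        end = self.pos + len(data)
        if end > len(self.frame):
            self.pos = 0  # too much data for one frame, drop it
            return None
        self.frame[self.pos:end] = data
        self.pos = end
        if cmd == self.eof_cmd:
            done = bytes(self.frame[:self.pos])
            self.pos = 0
            return done
        return None
```

For the color stream, the same class could be created with `FrameAssembler(0x81, 0x85, 307200)`.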
There are 242 packets per frame for the depth camera (including the 0x71 and 0x75 packets). All packets are 1760 bytes except the 0x75 packet, which is 1144 bytes. Minus headers, that gives 422400 bytes of data.
There are 162 packets per frame for the color camera (including the 0x81 and 0x85 packets). All packets are 1920 bytes except the 0x85 packet, which is 24 bytes. Minus headers, that gives 307200 bytes of data.
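The packet arithmetic can be checked quickly, using the 12-byte header size of the struct above. The helper name `frame_payload_bytes` is made up for this check.

```python
HEADER_SIZE = 12

def frame_payload_bytes(packets, full_size, last_size):
    """Total data bytes in a frame after removing one header per packet."""
    total = (packets - 1) * full_size + last_size
    return total - packets * HEADER_SIZE

print(frame_payload_bytes(242, 1760, 1144))  # 422400 (depth)
print(frame_payload_bytes(162, 1920, 24))    # 307200 (color)
print(640 * 480 * 11 // 8)                   # 422400, i.e. 640x480 pixels at 11 bits each
```

So the depth payload is exactly one 640x480 frame of packed 11-bit values, and the color payload is one byte per pixel at 640x480.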
Possible improvements to this algorithm could include monitoring pkt_seq and time_stamp to check that data packets arrive in sequence. Obviously, if the data is corrupt you have bigger problems, but it might be nice. Need to figure out the magic.
After we have the data, for basic display in the QWidget and to generate proper RGB pixels:
- Reverse the bit order within each byte of the data. (0b10000000 becomes 0b00000001)
- Shove it into an RGB bitstream; since we're only dealing with monochrome data, it's just duplicated three times for convenience.
As a note, there's some bit manipulation going on to get it into a pixel value. Each pixel is 11 bits, which gives it 2048 possible values (0 to 2047), it looks like. Kind of nifty. You can see the bit-shifting going on inside the libkinect source, documented here for posterity:
for (int p = 0, bit_offset = 0; p < 640*480; p++, bit_offset += 11) {
    uint32_t pixel = 0; // value of pixel
    pixel = *((uint32_t *)(data+(p*11/8)));
    pixel >>= (p*11 % 8);
    pixel &= 0x7ff;
    uint8_t pix_low = (pixel & 0x00ff) >> 0;
    uint8_t pix_high = (pixel & 0xff00) >> 8;
    pix_low = reverse[pix_low];
    pix_high = reverse[pix_high];
    pixel = (pix_low << 8) | (pix_high);
    pixel >>= 5;
    // Image drops the 3 MSbs
    rgb[3*p+0] = 255-pixel;
    rgb[3*p+1] = 255-pixel;
    rgb[3*p+2] = 255-pixel;
}
Looks fairly routine, but I'll be the first to admit bit-wise operations make my head hurt, doubly so for efficient bit-wise operations.
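For anyone who finds the C loop hard to follow, here is a straight Python transliteration of it, split into two steps. It assumes that `reverse` in the C code is a 256-entry table mapping each byte to its bit-reversed value, and that the 32-bit read is little-endian (as it would be on x86). The names `REVERSE`, `unpack_pixels` and `to_gray` are made up. It makes no claim about whether this is the right way to interpret the depth stream, only that it does the same thing as the loop above.

```python
# REVERSE[b] is byte b with its bit order reversed (0b10000000 -> 0b00000001)
REVERSE = [int(format(b, "08b")[::-1], 2) for b in range(256)]

def unpack_pixels(data, count):
    """Extract `count` 11-bit values the same way the C loop does."""
    out = []
    for p in range(count):
        bit = p * 11
        # Read up to 4 bytes as a little-endian word, like the uint32_t load.
        word = int.from_bytes(data[bit // 8: bit // 8 + 4], "little")
        pixel = (word >> (bit % 8)) & 0x7FF
        low = REVERSE[pixel & 0xFF]
        high = REVERSE[(pixel >> 8) & 0xFF]
        out.append(((low << 8) | high) >> 5)
    return out

def to_gray(value):
    """255 - value stored into a uint8_t, which keeps only the low 8 bits."""
    return (255 - value) & 0xFF
```

Each gray value would then be written three times (R, G, B) per pixel, as in the C code.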
RGB Camera
The RGB Camera follows the same frame format as the depth camera, with a minor difference of cmd being 0x8x instead of 0x7x.
The frame output of the RGB camera is a 640x480 Bayer pattern, so the total frame size is 640*480=307200 bytes.
The Bayer pattern is laid out as: RG, GB.
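As an illustration of the RG, GB layout, the sketch below picks the color of a given pixel and builds a crude half-resolution RGB image by collapsing each 2x2 cell into one pixel. It assumes the pattern starts with R at the top-left pixel (0, 0). The function names are made up, and this is far simpler than a proper demosaicing filter.

```python
def bayer_channel(x, y):
    """Color filter at pixel (x, y) for an RG / GB pattern starting at (0, 0)."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"

def demosaic_half(frame, width=640, height=480):
    """Collapse each 2x2 Bayer cell into one (r, g, b) tuple."""
    rows = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            r = frame[y * width + x]
            g = (frame[y * width + x + 1] + frame[(y + 1) * width + x]) // 2
            b = frame[(y + 1) * width + x + 1]
            row.append((r, g, b))
        rows.append(row)
    return rows
```

For a full 307200-byte frame, `demosaic_half(frame)` returns 240 rows of 320 tuples.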
Control EP
The control endpoint uses a different header:
struct control_hdr {
    uint8_t magic[2]; //can be "RB" for Input and "GM" for Output (command)
    uint16_t len; //data length in words
    uint16_t cmd;
    uint16_t cmd2_or_tag; //cmd and tag from GM packet should be matched in response RB packet
    uint16_t data[];
};
The control endpoint most likely initializes the IR projector and cameras.
Commands
zarvox did some work to determine a few useful commands, documented below. For each of the below commands, len=2, cmd=3, and tag is a nonce. data[0] is the "Command" and data[1] is the argument to that command, or the "written data" in the event that these are just register pokes. The responses are usually 00 00 on success and 05 00 01 00 on invalid input.
Command | Behavior |
---|---|
0x0000 | Replies 05 00 01 00 unless written value > 2, in which case the IR cam turns off and glview hangs. |
0x0005 | Enable RGB streaming. Send 0 to stop RGB cam streaming, and anything else to start streaming. |
0x0006 | Enable depth streaming. Send 0 to stop depth cam streaming; anything else to start streaming. |
0x000c | Just write 0. If you write a 1, you muck up the stream (134 packets per buf before receiving flag 85), but can recover by writing 0 or 5. |
0x000d | Write 1, otherwise, produce lots of data on the RGB stream with invalid magic. |
0x000e | 0x1e for 30 Hz Bayer; 0x1f gives 15 Hz 320x480 UYVY for the left half of the screen. |
0x0011 | Nothing visible happens, but the only accepted input values are 0 and 1 |
0x0012 | Set depth camera stream format. So far, we know that letting data[1] = 2 gives a packed 10-bit depth stream, and data[1] = 3 gives a packed 11-bit depth stream. data[1] = 1 appears to give a stream with variable numbers of packets. I suspect this is a compressed format, as blacking out sections of the depth camera results in fewer packets per frame. data[1] = 0 gives a long stream of uint16_t with value between 0 and 0x07ff (2^11-1), but many packets have badly broken headers (bad magic, bad flags). |
0x0013 | Send a 0, and the depth camera output produces a weird cloned appearance. Send a 1, and the depth camera works correctly. Send a 2, and the depth stream starts producing packets with invalid magic bytes - we may be parsing them wrong. Send anything else and it gets rejected. |
0x0014 | Only accepts 0x1e, as far as I've tested. |
0x0015 | Send a 0, it replies 05 00 00 00 (not 05 00 01 00 as with most other commands). Otherwise, it appears to accept the command with no visible effects. (values 1-50 return success, others 05 00 00 00) |
0x0016 | If LSB of the written value is 1, a smoothing feature on the depth camera is activated (this is the default). If it's 0, the smoothing is disabled. |
0x0017 | Enable depth hflip. Send 1: depth data is horizontally mirrored. Send anything else: depth data is oriented as usual. |
0x001a | Behaves like 0x0013. |
0x001b | Corrupts the current depth and color frame. Tearing shows up on color - potentially an image-read sync command. Value doesn't matter, always returns 05 00 01 00. |
0x0047 | Enable RGB hflip. Send 0: normal orientation. Send anything else: RGB data is hflipped (but keeps the same Bayer pattern, so the pretty demosaicing looks like garbage) |
0x0048 | Send 0: normal depth buffer behavior. Send 1: Get weird data from depth stream. Full frame, but only small patches. Unknown depth coding. |
Further command discovery is still a work-in-progress.
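A sketch of building one of these command packets and unpacking a reply, based on the control_hdr struct and the len=2, cmd=3 convention described above. The 16-bit fields are assumed to be little-endian, which this page does not state outright. The function names are made up, and how the bytes get to and from the device is deliberately left out.

```python
import struct

def build_command(tag, command, argument):
    """Build a "GM" packet with len=2 words, cmd=3, and data = [command, argument]."""
    # Assumption: all 16-bit fields are little-endian.
    return struct.pack("<2sHHHHH", b"GM", 2, 3, tag, command, argument)

def parse_reply(packet):
    """Split a reply into (magic, cmd, tag, data words)."""
    magic, length, cmd, tag = struct.unpack_from("<2sHHH", packet)
    data = struct.unpack_from("<%dH" % length, packet, 8)
    return magic, cmd, tag, data

# Enable depth streaming (command 0x0006, argument 1) with tag 7:
print(build_command(7, 0x0006, 1).hex())  # 474d0200030007000600010 0 without the space
```

The tag in the reply should match the tag that was sent, so a caller would compare `parse_reply(reply)[2]` against its own nonce.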
NUI Audio
So far not much has been determined about the audio interface. It looks as if there might be bi-directional streams shooting back and forth; the Xbox might be exporting some of its own audio to the Kinect for some type of echo cancellation. Filtering out game noises certainly would be a clever thing to do.
Getting audio working might be challenging, but rewarding if built-in echo cancellation is present.