There are several places in the article that state errors at various distances. It would be very nice to have a reference, and to be able to look at the original data itself.
--Jbrownkramer 15:08, 10 January 2012 (GMT)
If anyone is interested in a data set I made of depth measurements PM me.
I used my CNC machine to move a target away from the Kinect in 0.25 inch steps, from 23.5 to 79.0 inches.
A linear fit on 1/distance (distance in cm) gives f(x) = -0.0000288276676796794x + 0.0314486952642188 with R^2 = 0.999929851.
There's one spot about halfway through the dataset where I think I made a positioning mistake, so the remaining data points are shifted slightly. If this is corrected, the fit should be even better.
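A fit like the one quoted above is easy to reproduce with NumPy. Here is a minimal sketch; since the measurement set isn't published here, it uses synthetic data generated from the quoted coefficients as a stand-in:

```python
import numpy as np

# Synthetic stand-in for the CNC measurement set: raw depth readings d
# versus 1/distance (distance in cm). The coefficients quoted above are
# used to generate the fake data.
a_true, b_true = -0.0000288276676796794, 0.0314486952642188
d = np.arange(400, 900)           # hypothetical raw depth readings
inv_dist = a_true * d + b_true    # 1/distance in 1/cm

# Linear least-squares fit of 1/distance against the raw reading.
a_fit, b_fit = np.polyfit(d, inv_dist, 1)
```

With real (noisy) measurements the recovered coefficients would carry uncertainty; fitting 1/distance rather than distance is what makes the relationship linear in the raw reading.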
Hi, I would like to point out that it would be very convenient to represent the depth calculations as a standard perspective transform. That would fit nicely into existing 3D graphics engines because they have many built in functions for transforming to and from screen space. Considering that Microsoft developed the Kinect for computer games, this approach makes a lot of sense.
The Direct3D perspective projection matrix contains the elements w, h, Q, -Qn, and 1, where Q = f/(f-n), n is the near-plane distance, and f is the far-plane distance. The depth value calculation is
1. depth = Q - 1/z * Qn
and transforms z in [n,f] to [0,1]. Inverting eqn 1 gives
2. 1/z = 1/n - 1/Qn * depth
The Kinect generally returns values in the range of 0.3 to 10 meters, and the encoded depth value is 10 bits with an extra bit for overflow. So I will make the following declaration: the values returned by a Kinect can be interpreted as 10-bit Direct3D depth values with the near plane set at 0.3 meters and the far plane set at 5 meters. Here, the term "depth value" has a precise meaning in the computer graphics community, and we discard the notion of rawDisparity (although we acknowledge that stereo disparity is the source of these values). Now, I will show that this model matches the ROS calibrations. Plugging n=0.3, f=5 into eqn 2 gives
3. 1/z = 3.3333333333333335 - 3.1333333333333337 * depth
Remember that depth is encoded as a 10 bit number, d. So, replace depth with d/1024
4. 1/z = 3.3333333333333335 - 0.0030598958333333337 * d
Eqn 4 agrees with the ROS calibration to within 1%. To me, using eqn 2 for an uncalibrated Kinect is much more useful because it gives a familiar and more general representation. An off-the-shelf game engine should be able to work with Kinect data very easily by simply setting n=0.3 and f=5 and using its normal utility functions to convert from screen space to view, world, or model space. This also makes it easy to use Kinect data on a GPU by upconverting it to a D15S1 depth texture (the stencil bit can be used to mark bad pixels).
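As a sketch, eqn 2 with n = 0.3 and f = 5 can be wrapped into a converter from the raw 11-bit reading to metric depth. Treating readings with the overflow bit set as invalid is my own assumption, not part of the derivation above:

```python
# Convert a raw Kinect reading d (10-bit value plus an overflow bit) to
# depth in meters, using eqn 2 with near plane n = 0.3 m, far plane f = 5 m.
N, F = 0.3, 5.0
Q = F / (F - N)                  # Direct3D perspective term, Q = f/(f-n)

def raw_to_meters(d):
    if d >= 1024:                # overflow bit set: treat as invalid (assumption)
        return None
    depth = d / 1024.0           # normalized Direct3D depth value in [0, 1]
    # eqn 2: 1/z = 1/n - depth/(Q*n), then invert to get z
    return 1.0 / (1.0 / N - depth / (Q * N))
```

A quick sanity check: d = 0 maps to exactly 0.3 m (the near plane), and d approaching 1024 approaches 5 m (the far plane), matching the declaration above.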
Paulu 01:09, 6 April 2011 (CEST)
x = (i - w / 2) * (z + minDistance) * scaleFactor * (w/h)
I removed the * (w/h) from the calculation of the x coordinate because, if I'm correct, that factor would assume the camera pixels aren't square: an object that looks square in the picture would not come out square (e.g. if it spans the same number of pixels in the x and y directions, the formula would compute different real-world lengths).
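For reference, here is a sketch of the pixel-to-world conversion with the w/h factor dropped, using the minDistance and scaleFactor names from the formula above. The constant values are placeholders for illustration, not calibrated values:

```python
# Hypothetical constants for illustration; real calibration values may differ.
W, H = 640, 480                  # depth image dimensions in pixels
MIN_DISTANCE = -10.0             # minDistance from the formula (placeholder)
SCALE_FACTOR = 0.0021            # scaleFactor from the formula (placeholder)

def pixel_to_world(i, j, z):
    """Map pixel (i, j) with depth z to world x, y, z, assuming square pixels."""
    x = (i - W / 2) * (z + MIN_DISTANCE) * SCALE_FACTOR   # no w/h correction
    y = (j - H / 2) * (z + MIN_DISTANCE) * SCALE_FACTOR
    return x, y, z
```

With square pixels, an offset of k pixels in x and an offset of k pixels in y yield the same real-world length, which is the point of removing the w/h factor.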