The projector uses two diffraction gratings to produce a constellation
of dots. The camera sees these dots from a slightly different
perspective (due to the 10cm or so spacing). Due to this, different
depths produce slightly horizontally shifted positions for the dots. The
Kinect performs subpixel analysis to determine the center position of
each dot, then compares it to a reference calibration "image" with every
dot position, created in the factory. It then determines depth for each
dot by comparing the horizontal displacement.
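If that is what's going on, recovering depth per dot is just stereo triangulation against the factory reference image. A rough sketch of the idea (the focal length, baseline and reference depth below are invented placeholders, not the real Kinect parameters):

    # Depth from horizontal dot displacement, assuming plain stereo
    # triangulation against a reference image taken at a known depth z_ref.
    # f (pixels), baseline (metres) and z_ref are made-up numbers.
    def depth_from_disparity(x_observed, x_reference, f=580.0, baseline=0.075, z_ref=1.0):
        d_ref = f * baseline / z_ref              # disparity baked into the reference image
        d = d_ref + (x_observed - x_reference)    # add the measured subpixel shift
        return f * baseline / d                   # nearer surfaces -> larger disparity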
The arguments I think support this theory are:
- Multiple kinects don't interfere except at a small fraction of points.
This makes sense, since the chip would ignore extra dots not in the
calibration image, except where they overlap and break that depth sample.
- The laser is temperature-controlled with a Peltier element (both for
heating and cooling), presumably for wavelength stability. This makes
sense because, since the constellation is produced by a diffraction
grating, different wavelengths would alter the dot spacing and throw off
the calibration.
- It has been said that the Kinect is highly sensitive to mechanical
stress on the internal structure, i.e. applying a bending force to the
metal frame that joins the projector and camera causes very noticeable
changes in the depth image. This also agrees with some sort of accurate
and sensitive calibration being involved, and the perspective between
camera and projector being critical.
It's still just guesswork though, and it certainly doesn't match that
particular patent (though we don't know if PrimeSense have been working
on several different methods to do this kind of thing). Thoughts?
--
Hector Martin (hec...@marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc
This description is certainly supported by the PrimeSense patent (most
accounts say PrimeSense licensed the tech in the Kinect to Microsoft):
http://www.google.com/patents/about?id=eTKuAAAAEBAJ&dq=primesense
They describe a system that casts a reference field and then
cross-correlates the viewed field with reference depth images. This
basically comes down to decomposing the viewed field into regions that
correlate strongly with each of the calibrated depth images.
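As a rough illustration of that idea (not the actual PrimeSense implementation), picking the best-matching calibrated depth for one patch of the viewed field might look like this, with normalized cross-correlation standing in for whatever correlation measure they really use:

    import numpy as np

    def best_reference_depth(window, ref_windows, ref_depths):
        # window: small 2-D patch cut from the viewed IR image
        # ref_windows: same-sized patches from each calibrated reference image
        # ref_depths: the depth associated with each reference image
        def ncc(a, b):
            a = a - a.mean()
            b = b - b.mean()
            return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        scores = [ncc(window, r) for r in ref_windows]
        return ref_depths[int(np.argmax(scores))]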
This seems to suggest that each kinect needs to be "factory calibrated".
That isn't ideal for mass-marketing but that wouldn't rule out the
possibility. Hector's other supporting claims are consistent with this
patent as well (sensitivity to bending, lack of major interference, etc).
I'm reading through more recent primesense patents to see if there are
any improvements on this approach (i.e. some that wouldn't require
individual device calibration).
-Andy
>On 11/25/10 7:03 AM, Hector Martin wrote:
>> That's a very interesting theory. Here's a different one that I've had
>> up until now:
>>
>> The projector uses two diffraction gratings to produce a constellation
>> of dots. The camera sees these dots from a slightly different
>> perspective (due to the 10cm or so spacing). Due to this, different
>> depths produce slightly horizontally shifted positions for the dots. The
>> Kinect performs subpixel analysis to determine the center position of
>> each dot, then compares it to a reference calibration "image" with every
>> dot position, created in the factory. It then determines depth for each
>> dot by comparing the horizontal displacement.
>This description is certainly supported by the primesense patent (most
>accounts say primesense licensed to Microsoft for the tech in kinect).
>
>http://www.google.com/patents/about?id=eTKuAAAAEBAJ&dq=primesense
>
>They describe a system that casts a reference field and then
>cross-correlates the viewed field with reference depth images. This
>basically comes down to decomposing the viewed field into regions that
>correlate strongly with each of the calibrated depth images.
>
>This seems to suggest that each kinect needs to be "factory calibrated".
>That isn't ideal for mass-marketing but that wouldn't rule out the
>possibility. Hector's other supporting claims are consistent with this
>patent as well (sensitivity to bending, lack of major interference, etc).
There is an EEPROM on the camera board, but it doesn't contain a great deal of data - about 600-odd
bytes. My guess is some of it is to do with laser power calibration - the biggest block is 480 bytes,
and it looks like it's grouped as 30 blocks of 16 bytes.
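If it really is 30 records of 16 bytes, something along these lines would split the block up for inspection - the offset, record size and big-endian guess are just from eyeballing the dump, nothing official:

    import struct

    def parse_records(blob, offset=0x400, count=30):
        # Split the suspected calibration table into 16-byte records and read
        # each one as eight big-endian 16-bit words (layout is a guess).
        return [struct.unpack(">8H", blob[offset + 16 * i : offset + 16 * (i + 1)])
                for i in range(count)]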
I followed it without any problems.
> - What happens in the case of multiple kinects?
If I'm understanding right, the parallax has no effect on the
calculation. In other words, the distance between the light source (D1,
D2) and the camera doesn't matter. I imagine a large difference in z
would matter, but from a single Kinect laser to the PrimeSense camera
the z does not change -- so it doesn't matter.
OK, if I'm not too far off, then Kinect #1 could see dots from Kinect
#2 and still calculate the right result. But because the z is
different, there is probably going to be a fixed offset after the
calculation. I'd add some sort of Bayesian filter that evaluates the
probability that a dot is from Kinect #2 instead of Kinect #1,
and go with the result with minimum mean squared error. For a more
advanced approach, I'd model the surface with NURBS and minimize the
coefficients at each control point to find the optimal fit.
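As a toy version of that minimum-mean-squared-error step (the expected positions are hypothetical values that would come from each unit's calibration, not anything real):

    def classify_dot(dot_xy, expected_1, expected_2):
        # Guess which Kinect a detected dot belongs to by which unit's
        # predicted (calibrated) position it lands closest to, i.e. the
        # smaller squared error. Inputs are hypothetical (x, y) tuples.
        e1 = (dot_xy[0] - expected_1[0]) ** 2 + (dot_xy[1] - expected_1[1]) ** 2
        e2 = (dot_xy[0] - expected_2[0]) ** 2 + (dot_xy[1] - expected_2[1]) ** 2
        return 1 if e1 <= e2 else 2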
> - What's special about the distance between the laser and the camera?
I think it doesn't matter as long as z is the same. However, I bet
factory calibration includes finding the "offset" (bias) of the
measurements.
I do think the laser output power is an important part of the
calibration - at least for A, and probably a scale & bias measurement.
That probably explains the (relatively expensive) Peltier junction used
to precisely control the laser output power.
> - Would it matter if they were closer together or farther apart?
Closer dots will have a brighter Pb (background pixel), so the
signal-to-noise ratio will be better. When dividing Z = Pd / (A * Pb),
that means accuracy is better when the dots have a brighter Pb. (This
makes sense intuitively as well.) It could mean that cooling the
PrimeSense camera (think overclockers) and boosting the laser output
power would dramatically increase the range.
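Taking that Z = Pd / (A * Pb) model at face value, here is a toy calculation of why a fixed amount of sensor noise hurts less when Pb is brighter (all numbers invented):

    def z_from_intensity(Pd, Pb, A=0.5):
        return Pd / (A * Pb)

    noise = 2.0                              # assumed fixed sensor noise, in counts
    for Pb in (20.0, 200.0):                 # dim vs bright background pixel
        Pd = 100.0
        rel_err = noise / Pd + noise / Pb    # first-order relative error of a ratio
        print(Pb, z_from_intensity(Pd, Pb), rel_err)   # brighter Pb -> smaller relative error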
> - Could we make a kinect microscope by putting a magnifying lens in
> front of the camera and the laser?
Yes, but I bet you would have to recalibrate.
Cheers,
David
The Peltier controls wavelength - power will be done by adjusting drive current, using an optical
feedback mechanism, typically a photodiode in the laser diode package.
The question is why wavelength is so critical - the performance of the pattern generator, or the
bandwidth of the filter on the camera?
I wonder if the dot pattern spacing may be a function of wavelength - someone who knows about
diffraction may be able to answer this.
I have seen my Kinect not getting a depth image until the Peltier has stabilised.
Considering the illuminator produces a geometric pattern from a point source, I find it hard to
believe that it is making significant use of the strength of illumination, as opposed to the geometry,
to determine the depth. The size of a group of points of known angular spacing should give you all the
info you need to calculate distance.
Reflectivity of targets would have a bigger effect on intensity than proximity would.
Maybe this article can help:
http://en.wikipedia.org/wiki/Time-of-flight_camera
Calibration would be quite quick.
With a few reference points the system can interpolate between those
points to map every single camera pixel to a vector.
Given the small amount of EEPROM needed to store every single pixel
vector, they could probably do that too.
There is no need to do fast mapping of camera pixel + laser dot
position -> xyz. This can be done (I used to do it) in real time
using floating-point calculation.
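A minimal sketch of that interpolation, assuming a coarse grid of calibrated per-pixel direction vectors (the grid spacing and layout are made up):

    import numpy as np

    def pixel_ray(u, v, grid, step=40):
        # Bilinearly interpolate a direction vector for pixel (u, v) from a
        # coarse grid of calibrated unit vectors sampled every 'step' pixels.
        gu, gv = u / step, v / step
        j, i = int(gu), int(gv)
        fu, fv = gu - j, gv - i
        ray = (grid[i, j]         * (1 - fu) * (1 - fv) +
               grid[i, j + 1]     * fu       * (1 - fv) +
               grid[i + 1, j]     * (1 - fu) * fv +
               grid[i + 1, j + 1] * fu       * fv)
        return ray / np.linalg.norm(ray)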
ACC
--
Adam Crow BEng (hons) MEngSc MIEEE
Technical Director
DC123 Pty Ltd
Suite 10, Level 2
13 Corporate Drive
HEATHERTON VIC 3202
http://dc123.com
phone: 1300 88 04 07
fax: +61 3 9923 6590
int: +61 3 8689 9798
It does - the diffraction angle scales with wavelength (this is why CDs produce shiny rainbows).
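A quick back-of-the-envelope check with the grating equation, using an assumed pitch rather than anything measured from a Kinect:

    import math

    d = 20e-6                              # assumed grating pitch: 20 micrometres
    for lam in (828e-9, 832e-9):           # a few nm of laser wavelength drift
        theta = math.asin(lam / d)         # first-order grating equation: sin(theta) = lambda / d
        print(lam, math.degrees(theta))    # the whole dot pattern scales with wavelength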
If the Kinect only has a small EEPROM to store calibration data, then my
guess is they're storing a series of parameters that characterize the
manufacturing deviations of the system, instead of the raw position of
each dot.
So, maybe each of the 9 "subconstellations" (each segment around a
brighter central dot - the zero-order beam for that portion, which
originally came from the primary grating) has a very well-defined
pattern (maybe even algorithmic), but I'm guessing they can't really
precisely know the position of each one. So all they'd have to do is
store the affine parameters for each of the 9 segments and then derive
the position of each dot from whatever algorithm.
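Storing affine parameters per segment might look something like this, assuming six parameters per subconstellation and an algorithmically generated ideal pattern (pure guesswork about the actual format):

    import numpy as np

    def segment_dots(ideal_dots, params):
        # Map the "ideal" (algorithmic) dot positions of one subconstellation
        # through a stored 2-D affine transform: params = (a, b, c, d, tx, ty).
        a, b, c, d, tx, ty = params
        A = np.array([[a, b], [c, d]])
        return ideal_dots @ A.T + np.array([tx, ty])   # ideal_dots: (N, 2) array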
FFS - how many times does this need repeating - KINECT DOES NOT USE TOF!!!!!
It uses a standard CMOS image sensor, which can't do TOF, and the laser is not modulated.
Address 0 :
C0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00 02 A5 A5 00 00 00 00 00 00 00 00 00 00 00 00
00 52 00 00 00 B1 FF FF 00 00 00 00 00 60 00 16
07 8C 00 00 00 04 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 01
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 16 FF FE 00 5F 02 0E 00 31 00 00 00 00 00 00
Address 0x400
00 F0 00 8B 00 63 01 39 01 75 00 00 00 00 00 00
00 F1 00 8C 00 63 01 6E 01 6E 00 00 00 00 00 00
00 F2 00 8C 00 62 01 6E 01 6E 00 00 00 00 00 00
00 F3 00 8C 00 62 01 6E 01 6E 00 00 00 00 00 00
00 F4 00 8B 00 61 01 36 01 6E 00 00 00 00 00 00
00 F5 00 8B 00 61 01 36 01 6E 00 00 00 00 00 00
00 F6 00 8B 00 61 01 32 01 6B 00 00 00 00 00 00
00 F7 00 8B 00 61 01 28 01 68 00 00 00 00 00 00
00 F8 00 8B 00 60 01 25 01 5E 00 00 00 00 00 00
00 F9 00 8B 00 60 01 25 01 93 00 00 00 00 00 00
00 FA 00 8B 00 60 01 25 02 62 00 00 00 00 00 00
00 DD 00 8B 00 68 01 5A 01 CF 00 00 00 00 00 00
00 DE 00 8B 00 68 01 5A 01 D2 00 00 00 00 00 00
00 DF 00 8A 00 16 00 D2 03 84 00 00 00 00 00 00
00 E0 00 8B 00 64 01 43 01 B4 00 00 00 00 00 00
00 E1 00 8B 00 6D 01 36 01 78 00 00 00 00 00 00
00 E2 00 8B 00 6C 01 36 01 C2 00 00 00 00 00 00
00 E3 00 8C 00 6A 01 8C 01 C8 00 00 00 00 00 00
00 E4 00 8B 00 47 00 78 03 84 00 00 00 00 00 00
00 E5 00 88 00 55 00 DC 03 84 00 00 00 00 00 00
00 E6 00 88 00 1A 00 D5 01 D6 00 00 00 00 00 00
00 E7 00 8C 00 3F 01 61 01 AE 00 00 00 00 00 00
00 E8 00 8C 00 56 01 78 01 BB 00 00 00 00 00 00
00 E9 00 8C 00 5C 01 86 01 C2 00 00 00 00 00 00
00 EA 00 8B 00 60 01 4A 01 8C 00 00 00 00 00 00
00 EB 00 8B 00 60 01 43 01 7F 00 00 00 00 00 00
00 EC 00 8B 00 62 01 3C 01 78 00 00 00 00 00 00
00 ED 00 8B 00 62 01 39 01 75 00 00 00 00 00 00
00 EE 00 8B 00 64 01 39 01 AA 00 00 00 00 00 00
00 EF 00 8C 00 63 01 75 01 75 00 00 00 00 00 00
Address 0x800
00 17 00 00 00 20 00 00 0A 10 00 00 16 04 01 01
00 17 00 00 00 21 00 00 00 00 00 00 00 00 00 00
00 17 00 00 00 20 00 00 05 E0 00 00 16 04 01 01
00 17 00 00 00 21 00 00 00 00 00 00 00 00 00 00
00 7D FF FF FF FF 00 00 00 00 00 00 00 00 00 00
00 7D 00 00 00 02 00 00 00 00 00 00 00 00 00 00
00 7D 00 00 00 07 00 00 00 00 00 00 00 00 00 00
FF FF FF FF 00 21 00 00 00 00 00 00 00 00 00 00
Slackening the screws on the illuminator, and rotating it slightly, just the amount that the screws
allow within their holes, progressively narrows the depth image field of view from full to a narrow
vertical strip about 10% of the normal width.
Slackening some more and panning left/right shifts the depth values without noticeably affecting the
FOV or the geometry.
Panning up/down shifts the FOV left/right - not the image, just the part of the image that remains
visible after rotating as above.
The amount by which even small movements affect the image suggests that some post-assembly
calibration would be necessary.
It also shows that a rigid metal mounting plate is essential in maintaining good alignment between
the illuminator and the sensor.
Could you save the raw binary content to a file and attach it to the mail?
Then it would be easier for anyone here to hexdump it.
Thanks.
I wonder if this is one of those things that varies between people, like high frequency hearing - is
there anyone here with normal vision who _doesn't_ see the red pattern when looking towards the
illuminator?