aligning depth to external camera


Kyle McDonald

unread,
Apr 28, 2011, 10:30:03 AM4/28/11
to openk...@googlegroups.com, James George
hi everyone,

i'm working on projecting the kinect point cloud down to an rgb camera.

we're using a custom mount
http://www.flickr.com/photos/49322752@N00/sets/72157626569601694/
it's fairly easy to make, and super solid. here is a diagram in
sketchup http://sketchup.google.com/3dwarehouse/details?mid=13dd4e1f541420bb4366ba10874ef691

first step is to calibrate both the cameras for their intrinsic
parameters and distortion, and then their extrinsic parameters, using
chessboard calibration. i have that part working correctly with a very
low reprojection error. i'm covering up the IR projector with a
diffuser to get the best results.

the next step is to project the kinect depth image into a real space
point cloud. this takes two steps:

1 converting raw depth values to real depth values
2 2d->3d projection, akin to OpenNI's ConvertProjectiveToRealWorld

for step 1, i've been using stéphane magnenat's equation from
https://groups.google.com/group/openkinect/browse_thread/thread/31351846fd33c78/e98a94ac605b9f21?lnk=gst&q=stephane&pli=1

i know this is modeled on the original ROS raw/depth data, but does
anyone know how universal it is? is it really accurate across all
kinects? how much does it disagree with the OpenNI raw->depth
conversion? has anyone tried reverse engineering the OpenNI raw->depth
conversion?

for step 2, i'm using the following:

float fx = tanf(fov.x / 2) * 2;
float fy = tanf(fov.y / 2) * 2;
float xReal = ((x - principalPoint.x) / imageSize.width) * z * fx;
float yReal = ((y - principalPoint.y) / imageSize.height) * z * fy;

which can be done in a single matrix operation a la glpclview.c
LoadVertexMatrix() i suppose. any tips would be great here. i think
the principal point should be slightly offset to compensate for the
difference between the ir and depth images (given on the ROS site as
-4.8 x -3.9) but i'm not sure where to put this. should it modify the
principal point?
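
for context, here's a rough sketch of step 2 applied to a whole depth image, spelled out as a per-pixel loop rather than a matrix (names like rawToMillimeters and depthToPointCloud are just placeholders, and the opencv types are only for convenience):

#include <opencv2/core.hpp>
#include <vector>
#include <math.h>
#include <stdint.h>

float rawToMillimeters(uint16_t raw); // the step-1 raw->depth conversion

// sketch: apply the per-pixel projection above to a full raw depth frame.
// fov, principalPoint and imageSize come from the depth camera's intrinsic
// calibration.
void depthToPointCloud(const uint16_t* rawDepth, std::vector<cv::Point3f>& cloud,
                       cv::Point2f fov, cv::Point2f principalPoint, cv::Size imageSize) {
    float fx = tanf(fov.x / 2) * 2;
    float fy = tanf(fov.y / 2) * 2;
    cloud.clear();
    for (int y = 0; y < imageSize.height; y++) {
        for (int x = 0; x < imageSize.width; x++) {
            float z = rawToMillimeters(rawDepth[y * imageSize.width + x]);
            if (z <= 0) continue; // skip pixels with no valid depth reading
            float xReal = ((x - principalPoint.x) / imageSize.width) * z * fx;
            float yReal = ((y - principalPoint.y) / imageSize.height) * z * fy;
            cloud.push_back(cv::Point3f(xReal, yReal, z));
        }
    }
}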

finally, the last step is to project the point cloud onto the rgb
image and sample the colors. for this i'm using cv::projectPoints()
which is also kind enough to handle lens undistortion.
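
and a small sketch of that last step, in case it's useful to anyone following along (rvec/tvec are the depth->rgb extrinsics from the chessboard calibration; all names here are placeholders, not from any particular library):

#include <opencv2/opencv.hpp>
#include <vector>

// sketch: project a point cloud (in the depth camera's frame) into the rgb image
// and grab a nearest-neighbor color for each point. cv::projectPoints applies the
// rgb camera's distortion, so no separate distortion handling is needed.
std::vector<cv::Vec3b> samplePointColors(const std::vector<cv::Point3f>& cloud,
                                         const cv::Mat& rvec, const cv::Mat& tvec,
                                         const cv::Mat& rgbIntrinsics,
                                         const cv::Mat& rgbDistortion,
                                         const cv::Mat& rgbImage) {
    std::vector<cv::Point2f> projected;
    cv::projectPoints(cloud, rvec, tvec, rgbIntrinsics, rgbDistortion, projected);
    std::vector<cv::Vec3b> colors(cloud.size(), cv::Vec3b(0, 0, 0));
    for (size_t i = 0; i < projected.size(); i++) {
        int x = (int) projected[i].x, y = (int) projected[i].y;
        if (x >= 0 && x < rgbImage.cols && y >= 0 && y < rgbImage.rows)
            colors[i] = rgbImage.at<cv::Vec3b>(y, x);
    }
    return colors;
}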

is this whole process correct?

the main thing i'm skeptical about right now is converting raw to
depth values. i have a suspicion that it varies from kinect to kinect,
and that this can cause some variation in the accuracy of this whole
process.

thanks,
kyle

Kyle McDonald

unread,
Apr 29, 2011, 1:06:02 AM4/29/11
to openk...@googlegroups.com
tl;dr: what is the state-of-the-art for raw->depth conversion?

and how much does it vary per kinect?

kyle

Kyle McDonald

unread,
Apr 29, 2011, 10:37:41 AM4/29/11
to openk...@googlegroups.com, James George
i just wrote an app to capture a chessboard simultaneously from a
color camera and the kinect depth camera. i moved the chessboard from
~50cm to ~90cm (which is the most important range for me) and
simultaneously stored the raw kinect data and the chessboard distance
(as given by cv::solvePnP). the standard deviation of the difference between
my measured distance and the distance given by stéphane's equation is 4 mm, which
is well below the noise threshold of the kinect in my experience.

in other words, i can confirm for at least one other kinect that
stéphane's model is accurate for the near range, and that it's not
simply 'tuned' to the ROS data.

here is the data on pastebin: http://pastebin.com/pnVkzHUJ
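
for anyone curious, the chessboard distance measurement is basically just solvePnP plus the norm of the translation (a sketch, not the exact capture app; variable names are placeholders):

#include <opencv2/opencv.hpp>
#include <vector>

// sketch: boardPoints are the chessboard corners in board coordinates (in mm),
// imageCorners are the detected corners from cv::findChessboardCorners, and
// cameraMatrix/distCoeffs come from the intrinsic calibration.
double chessboardDistance(const std::vector<cv::Point3f>& boardPoints,
                          const std::vector<cv::Point2f>& imageCorners,
                          const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs) {
    cv::Mat rvec, tvec;
    cv::solvePnP(boardPoints, imageCorners, cameraMatrix, distCoeffs, rvec, tvec);
    return cv::norm(tvec); // distance from the camera to the board origin, in mm
}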

i'm still interested in any other insight/feedback on the other
calibration stuff!

kyle

vinot

unread,
Apr 29, 2011, 3:00:45 PM4/29/11
to OpenKinect

hi,

afaik the process is correct.

if you use openni instead of freenect, you don't need to calibrate the
kinect yourself.

take a look at nicolas burrus' work, especially his calibration (based
on ros), since the extrinsic calibration could be easier than with
openni.

another question is about the stability of the kinect calibration:
microsoft delivers a card to calibrate it (with ps, I suppose), so I
imagine it must need to be recalibrated. This is why I prefer freenect:
I can control the calibration each time.

very nice work, please keep us informed about your progress

toni

Kyle McDonald

unread,
Apr 29, 2011, 11:31:49 PM4/29/11
to openk...@googlegroups.com
thanks toni,

i've read through nicolas' work, it's been very helpful. openni won't
work here, because i'm using an external (non-kinect) camera.

but this does lead me to two other burning questions i've had for a
few months now:

1 does anyone understand how the xbox calibration card works? when/why
it's used?

2 how does openni align the color and depth images? either all kinects
are similar enough that the extrinsics between the rgb + ir cam are
hardcoded, or every kinect is different -- in which case they must be
reporting their in-factory calibration data...

best,
kyle

drew.m...@gmail.com

unread,
Apr 30, 2011, 1:57:35 AM4/30/11
to openk...@googlegroups.com
On Fri, Apr 29, 2011 at 8:31 PM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
> thanks toni,
>
> i've read through nicolas' work, it's been very helpful. openni won't
> work here, because i'm using an external (non-kinect) camera.
>
> but this does lead me to two other burning questions i've had for a
> few months now:
>
> 1 does anyone understand how the xbox calibration card works? when/why
> it's used?

Haven't read anything on this. Nor have I seen/heard of anyone
actually using it ever. I'm curious too.

> 2 how does openni align the color and depth images? either all kinects
> are similar enough that the extrinsics between the rgb + ir cam are
> hardcoded, or every kinect is different -- in which case they must be
> reporting their in-factory calibration data...

It's the latter. Commands 0x40 and 0x41 on the Kinect return the
registration and frame padding information respectively [1]. OpenNI
has an algorithm that utilizes these parameters to register the depth
image to the RGB image [2].

Reimplementing these in libfreenect is actually on my list of TODO
items, but fairly far down, so I'd love any help offered. Any takers?
:)

> best,
> kyle

-Drew

[1] - https://github.com/avin2/SensorKinect/commit/48f6059b232840e7aa4d671c7dfa3545fff907d8#diff-6
[2] - https://github.com/PrimeSense/Sensor/blob/master/Source/XnDeviceSensorV2/Registration.cpp#L503

Carlos Roberto

unread,
Apr 30, 2011, 7:21:00 AM4/30/11
to openk...@googlegroups.com
I bought my kinect in January, and only after a couple of weeks of playing, one day I received a message asking me to get the calibration card and do some things with it. And that happened only once. Anyway, I don't know what makes the Xbox ask the player to calibrate the kinect.
Regards.

Carlos Roberto
Software Eng. Consultant @ IBM


Kyle McDonald

unread,
Apr 30, 2011, 12:36:43 PM4/30/11
to openk...@googlegroups.com
wow, this is incredible + good news! i had my suspicions...

it looks like this algorithm is based on the fact that the images are
almost rectified: it's just the fov that's different. they're taking
each pixel in the depth image and using two lookup tables
(m_pRegistrationTable and m_pDepthToShiftTable) to figure out where it
should be in the rgb image, and saving that to an image buffer.
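
roughly, i read the per-pixel loop as something like this (a very loose sketch of the idea, not the actual primesense code -- the table layouts here are just assumptions for illustration):

#include <stdint.h>

// sketch: regX/regY hold a precomputed target position for each depth pixel,
// and depthToShift holds an extra horizontal offset per depth value.
void registerDepthToRgb(const uint16_t* depth, uint16_t* registered,
                        const int* regX, const int* regY, const int* depthToShift,
                        int width, int height) {
    for (int i = 0; i < width * height; i++) {
        uint16_t d = depth[i];
        if (d == 0) continue;                 // no reading at this pixel
        int nx = regX[i] + depthToShift[d];   // slide horizontally by a depth-dependent amount
        int ny = regY[i];
        if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
        int j = ny * width + nx;
        if (registered[j] == 0 || d < registered[j])
            registered[j] = d;                // keep the nearest point when pixels collide
    }
}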

i'm not familiar with setting custom registers/sending custom commands
using libfreenect. but once i have access to the registration +
padding information i would love to write some code to align the
images similar to how this openni snippet is working.

in my opinion, it's super important for off the shelf computer vision
use: once you have aligned color+depth images, it opens things up a
bit. optical flow is a bit awkward on the depth image, but happens
naturally on the color image, for example.

if you could walk me through getting this data via libfreenect, on or
off list, that'd be great. i'd love to contribute.

thanks,
kyle

drew.m...@gmail.com

unread,
May 2, 2011, 5:44:59 AM5/2/11
to openk...@googlegroups.com
On Sat, Apr 30, 2011 at 9:36 AM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
> wow, this is incredible + good news! i had my suspicions...
>
> it looks like this algorithm is based on the fact that the images are
> almost rectified: it's just the fov that's different. they're taking
> each pixel in the depth image and using two lookup tables
> (m_pRegistrationTable and m_pDepthToShiftTable) to figure out where it
> should be in the rgb image, and saving that to an image buffer.
>
> i'm not familiar with setting custom registers/sending custom commands
> using libfreenect. but once i have access to the registration +
> padding information i would love to write some code to align the
> images similar to how this openni snippet is working.

Excellent! This is why teamwork is great! Here's the part you wanted: [1]. :)

> in my opinion, it's super important for off the shelf computer vision
> use: once you have aligned color+depth images, it opens things up a
> bit. optical flow is a bit awkward on the depth image, but happens
> naturally on the color image, for example.

I agree. Registration would make everything much neater.

> if you could walk me through getting this data via libfreenect, on or
> off list, that'd be great. i'd love to contribute.

Start with my branch linked in [1] (it's just one commit atop the
as-of-writing master/unstable) and call freenect_get_reg_info() and
freenect_get_reg_pad_info(), which will return structs equivalent in
structure to XnRegistrationInformation1080 and
XnRegistrationPaddingInformation, respectively. Let me know if you
need more guidance than that; I'm zarvox on IRC and the rest of my
contact information is on the wiki.

I can't guarantee that whatever will eventually go into libfreenect
will bear exactly the same function signatures, but I'll promise that
the structs themselves will have the same layout (if not the same
names). Ideally, they'd be invisible to the end user - the user would
just call an enable/disable registration function, and the driver
would handle all this behind the scenes. But in the meantime, I think
what I've written is the quickest/easiest way to get the data in your
hands for development/testing; we can always clean everything up
before we merge.

> thanks,
> kyle

Best,
Drew

[1] - https://github.com/zarvox/libfreenect/tree/registration

Kyle McDonald

unread,
May 2, 2011, 6:12:57 PM5/2/11
to openk...@googlegroups.com
i'm 'porting' the primesense registration code into something more
self contained/removing dependencies, so libfreenect can use it
easily. i have a few questions.

there are two registration routines, for chips 1000 and 1080. i'm
assuming kinect is 1080, and the reference sensor is 1000, because
1080 deals with color and 1000 doesn't.

there are two important lookup tables used by Apply1080:
m_pDepthToShiftTable and m_pRegistrationTable

m_pDepthToShiftTable is built by BuildDepthToShiftTable, and this
function grabs three parameters whose values i don't know; it
looks like it's asking the kinect for them:

m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE,
&dPlanePixelSize);
m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE, &nPlaneDsr);
XnDepthPixel nMaxDepth = m_pStream->GetDeviceMaxDepth();

i found some fov relationships between the zpps and zpd that at least
gives me an approximate ratio between the two (.00083 ~= zpps / zpd)
but i don't think that's enough to generate the depth to shift table.
so i'm also going to need these values.

for nMaxDepth, i'm pretty sure it's just 2048 (or maybe lower), but it
looks like it's polling the kinect for that value as well.

Apply1080 internally needs to know whether the depth image is mirrored
(m_pDepthStream->IsMirrored()). how can i check this property with
libfreenect?

i think it's about 250 lines for a pretty straight port that removes
primesense dependencies, but it could probably be reduced to 150-200
lines. i have a version that's compiling and running, but not
producing anything reasonable (probably because of the made up zpps
and zpd values).

tl;dr, i need three more things to make this work:
ZERO_PLANE_PIXEL_SIZE, ZERO_PLANE_DISTANCE, GetDeviceMaxDepth()

kyle

drew.m...@gmail.com

unread,
May 2, 2011, 7:34:14 PM5/2/11
to openk...@googlegroups.com
On Mon, May 2, 2011 at 3:12 PM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
> i'm 'porting' the primesense registration code into something more
> self contained/removing dependencies, so libfreenect can use it
> easily. i have a few questions.
>
> there are two registration routines, for chips 1000 and 1080. i'm
> assuming kinect is 1080, and the reference sensor is 1000, because
> 1080 deals with color and 1000 doesn't.

Yes, that is correct. :)

> there are two important lookup tables used by Apply1080:
> m_pDepthToShiftTable and m_pRegistrationTable
>
> m_pDepthToShiftTable is built by BuildDepthToShiftTable, and this
> function grabs three parameters whose values i don't know; it
> looks like it's asking the kinect for them:
>
> m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE,
> &dPlanePixelSize);
> m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE, &nPlaneDsr);
> XnDepthPixel nMaxDepth = m_pStream->GetDeviceMaxDepth();

Don't know these off the top of my head; I'll look into finding out
more for you when I can.

> i found some fov relationships between the zpps and zpd that at least
> gives me an approximate ratio between the two (.00083 ~= zpps / zpd)
> but i don't think that's enough to generate the depth to shift table.
> so i'm also going to need these values.
>
> for nMaxDepth, i'm pretty sure it's just 2048 (or maybe lower), but it
> looks like it's polling the kinect for that value as well.

MaxDepthValue appears to be a settable parameter in OpenNI which
defaults to 10000. Not sure what the units are on that, though.

> Apply1080 internally needs to know whether the depth image is mirrored
> (m_pDepthStream->IsMirrored()). how can i check this property with
> libfreenect?

Ahh, that's another feature that we need to add configurable support
for. For now, isMirrored() will always be false, and in the future,
they'll be member variables of the freenect_device struct (I'll
probably call them video_mirrored and depth_mirrored).

> i think it's about 250 lines for a pretty straight port that removes
> primesense dependencies, but it could probably be reduced to 150-200
> lines. i have a version that's compiling and running, but not
> producing anything reasonable (probably because of the made up zpps
> and zpd values).
>
> tl;dr, i need three more things to make this work:
> ZERO_PLANE_PIXEL_SIZE, ZERO_PLANE_DISTANCE, GetDeviceMaxDepth()
>
> kyle

Awesome progress! I'll poke around and see what I can find. :)

-Drew

Joshua Blake

unread,
May 2, 2011, 7:47:46 PM5/2/11
to openk...@googlegroups.com


On May 2, 2011 6:13 PM, "Kyle McDonald" <ky...@kylemcdonald.net> wrote:
>
> i'm 'porting' the primesense registration code into something more
> self contained/removing dependencies, so libfreenect can use it
> easily.

Please make sure that you are not copy/pasting any sensor code and that you don't copy the style/naming accidentally. The licenses are incompatible and we want to avoid any issues there. Otherwise, sounds like you are going in the right direction so keep up the great work!

Josh

Kyle McDonald

unread,
May 3, 2011, 12:24:58 AM5/3/11
to openk...@googlegroups.com
i don't think there should be any license issues. right now i'm
working in a completely separate openFrameworks project with parts of
the Sensor code and parts of libfreenect, using recorded raw + color
data. once i can see things working, i'll rewrite the algorithm from
scratch. it's just a lot of moving data around and interpolating
things.

MaxDepthValue at 10000 makes sense. in my experience openni prefers mm
units (which are actually very nice, because they allow you to store
distance in a short int while allowing for the necessary precision and
range). so 10000 would be 10 meters, which is probably the farthest
i've seen the kinect operate at.

my original assumption 2048 was based on an inverted understanding.
the Sensor code creates a table of distance->raw values, then uses
this to interpolate into a raw->distance table. 2048 would make sense
if it was the other way around (a la the glview m_gamma LUT).

isMirrored() seems to change depending on whether you've ever used
openni or not. i remember a discussion about this... but anyway, for
now i'll assume mirrored is false.

all i know about the two other params i need are in XnStreamParams.h:

/** XN_DEPTH_TYPE */
#define XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE "ZPD"
/** Real */
#define XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE "ZPPS"

but i have no idea how to actually get those values. Sensor is such a
complex web of XnProperty, XnIntProperty, XnActualIntProperty... by
the time i've found something that actually does anything i forget
what i'm looking for.

kyle

drew.m...@gmail.com

unread,
May 3, 2011, 1:11:05 PM5/3/11
to openk...@googlegroups.com
On Mon, May 2, 2011 at 9:24 PM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
> i don't think there should be any license issues. right now i'm
> working in a completely separate openFrameworks project with parts of
> the Sensor code and parts of libfreenect, using recorded raw + color
> data. once i can see things working, i'll rewrite the algorithm from
> scratch. it's just a lot of moving data around and interpolating
> things.

Good, good. I probably ought to rename all the fields of those
structs...thanks for the reminder, JoshB.

> MaxDepthValue at 10000 makes sense. in my experience openni prefers mm
> units (which are actually very nice, because they allow you to store
> distance in a short int while allowing for the necessary precision and
> range). so 10000 would be 10 meters, which is probably the farthest
> i've seen the kinect operate at.
>
> my original assumption 2048 was based on an inverted understanding.
> the Sensor code creates a table of distance->raw values, then uses
> this to interpolate into a raw->distance table. 2048 would make sense
> if it was the other way around (a la the glview m_gamma LUT).

Ah, that makes sense.

> isMirrored() seems to change depending on whether you've ever used
> openni or not. i remember a discussion about this... but anyway, for
> now i'll assume mirrored is false.

Yep, this was an issue before, but I fixed it back in February for
unstable, and the commit has now reached master: [1]. So even if
OpenNI enables it, we disable it again now.

> all i know about the two other params i need are in XnStreamParams.h:
>
> /** XN_DEPTH_TYPE */
> #define XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE          "ZPD"
> /** Real */
> #define XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE        "ZPPS"
> but i have no idea how to actually get those values. Sensor is such a
> complex web of XnProperty, XnIntProperty, XnActualIntProperty... by
> the time i've found something that actually does anything i forget
> what i'm looking for.

Yes, this is a mess to walk through. Thank goodness for cscope.

It took a while, but I've tracked it down - it's one of the so-called
"Fixed Parameters" that I need to pull from the device, so that's
another piece of USB communication that I'll need to implement. It'll
probably be unique per-device, but immutable. I'll look into that as
soon as I can.

In the meantime, assume that I will (eventually) provide a function
that will return a struct called ZeroPlaneInfo that looks like:

struct ZeroPlaneInfo {
    float distance;
    float pixel_size;
};

One last thing: you asked on IRC for a list of my Kinect's parameters;
here's the data from the two Kinects I have access to, in the OpenNI
naming scheme. [2]

-Drew

[1] - https://github.com/OpenKinect/libfreenect/commit/9b533d4c0253e2af5bb0ac65e05ec1d155f09203
[2] - http://pastebin.com/YDrNsYgC

Kyle McDonald

unread,
May 4, 2011, 7:10:02 PM5/4/11
to openk...@googlegroups.com
good to see the parameters aren't hugely different between cameras,
but they are slightly different... :) the ones that look very
different are kind of misleading, because they're actually less than
32 bits packed into an int32. there's a simple function in Sensor that
decodes (shifts + masks) them. a counterpart might already be in
libfreenect, i'll have to look.

it would be worth renaming the parameters of the structs if just for
the sake of making them cleaner :) in a way, i think libfreenect is
catching up with a lot of what's already available from PrimeSense --
but the difference is:

1 it's a community effort, aimed at getting people to develop a shared
understanding of the hardware and techniques involved
2 it's designed for "user-developers" (hackers) who don't necessarily
have a CS degree / don't recite design patterns in their sleep...

and small things like struct field names can be a great place for that
difference to shine.

re mirroring: good to hear. i think i've seen it flip back after using
openni recently, so i know it's working then.

those two 'fixed parameters' (the ZeroPlaneInfo struct) are exactly
what i need. if you can poll them and get me some numbers to start
with, even if they're specific to your camera, that would be hugely
helpful. i can't really move further till i have them.

in other news, i have the kinect calibrated nicely to an external
camera using opencv:

http://www.flickr.com/photos/kylemcdonald/5686302302/in/photostream

which i personally think is super exciting. the code is on github
already but needs some more cleaning to make it hacker friendly.

kyle

Joshua Blake

unread,
May 4, 2011, 8:13:46 PM5/4/11
to openk...@googlegroups.com
On Wed, May 4, 2011 at 7:10 PM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
i think libfreenect is
catching up with a lot of what's already available from PrimeSense --
but the difference is:

1 it's a community effort, aimed at getting people to develop a shared
understanding of the hardware and techniques involved
2 it's designed for "user-developers" (hackers) who don't necessarily
have a CS degree / don't recite design patterns in their sleep...
 
Great description of the project and the advantages. I think it's very exciting that we have a fairly large and continuously growing base of people who understand pretty complicated details of Kinect and are exposing this knowledge to general developers. It's also exciting that people are motivated to invest in learning some of these concepts that are typically reserved for PhD-level researchers.

drew.m...@gmail.com

unread,
May 5, 2011, 2:39:08 AM5/5/11
to openk...@googlegroups.com
On Wed, May 4, 2011 at 4:10 PM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
> good to see the parameters aren't hugely different between cameras,
> but they are slightly different... :) the ones that look very
> different are kind of misleading, because they're actually less than
> 32 bits packed into an int32. there's a simple function in Sensor that
> decodes (shifts + masks) them. a counterpart might already be in
> libfreenect, i'll have to look.

Oh right, I forgot to mask all the fields out, sorry. There's not a
function for that in libfreenect yet, but it might be worth adding
either an inline function or a #define for it.

> it would be worth renaming the parameters of the structs if just for
> the sake of making them cleaner :) in a way, i think libfreenect is
> catching up with a lot of what's already available from PrimeSense --
> but the difference is:
>
> 1 it's a community effort, aimed at getting people to develop a shared
> understanding of the hardware and techniques involved
> 2 it's designed for "user-developers" (hackers) who don't necessarily
> have a CS degree / don't recite design patterns in their sleep...
>
> and small things like struct field names can be a great place for that
> difference to shine.

I completely agree with you and JoshB here. I also look forward to an
open-source implementation of the Microsoft skeleton tracking
algorithm. Whenever that will happen.

> those two 'fixed parameters' (the ZeroPlaneInfo struct) are exactly
> what i need. if you can poll them and get me some numbers to start
> with, even if they're specific to your camera, that would be hugely
> helpful. i can't really move further till i have them.

After much confusion (the device returns a lot more data than
sizeof(XnFixedParams)! what on earth?), it is done. Because of that
wonkiness, I can't actually guarantee that these are the right
numbers, but they were the only spot that looked like four floats in a
row, so I think they'll be correct. Pull the commit from my
registration branch [1].

My lab's Kinect's numbers, for reference, are:

distance: 120.000000
pixel_size: 0.104200

> in other news, i have the kinect calibrated nicely to an external
> camera using opencv:
>
> http://www.flickr.com/photos/kylemcdonald/5686302302/in/photostream
>
> which i personally think is super exciting. the code is on github
> already but needs some more cleaning to make it hacker friendly.

Nice work! :) I really should learn more about OpenCV some time...

And just for warning: I'll be gone for a week or so as I'm attending a
conference out of the country, so I might be a bit latent in replies
and unable to actually test code since I probably won't bring my
Kinect with me. Apologies in advance. On the bright side, I was able
to get this part done for you before I leave. :)

-Drew

[1] - https://github.com/zarvox/libfreenect/tree/registration

Joshua Blake

unread,
May 5, 2011, 10:22:45 AM5/5/11
to openk...@googlegroups.com
On Thu, May 5, 2011 at 2:39 AM, drew.m...@gmail.com <drew.m...@gmail.com> wrote:
I completely agree with you and JoshB here.  I also look forward to an
open-source implementation of the Microsoft skeleton tracking
algorithm.  Whenever that will happen.
 
 
This is one of the things I've been quietly working on. I've already implemented the algorithm, but it isn't real-time yet and I need to add some more features before open sourcing it. See attached image where I trained it on several labeled hand images and then recognized the input depth data through the decision tree forest.
 
I'm working on this for work purposes so it's branded as part of InfoStrat.MotionFx, but I do have approval to open source it and integrate patches from the community once I achieve a certain milestone. (InfoStrat.MotionFx is my larger WPF Kinect integration project hosted [1], videos available [2].)
 
handposerecognition.png

Kyle McDonald

unread,
May 5, 2011, 1:09:06 PM5/5/11
to openk...@googlegroups.com
thanks!

i just pulled your branch and get the same values for my kinect (120,
.104200) which isn't so surprising.

i plugged them in, and it's definitely drawing something now. but it's
not quite right (see attached).

something i'm skeptical about is these two defines i took from Sensor:

#define XN_CMOS_VGAOUTPUT_XRES 1280
#define XN_SENSOR_DEPTH_RGB_CMOS_DISTANCE 2.4

but when i run your new code i get:

fDCmosEmitterDistance: 7.500000
fDCmosRCmosDistance: 2.300000

which makes me think the second define above is wrong.

i also wonder why XN_CMOS_VGAOUTPUT_XRES is 1280 instead of 640. if it
has to do with the depth + color being interlaced, or if it's for some
other reason?

changing those two to 640 and 2.3 gets me a little closer, but it's
still not right.

if anyone wants to dig through what i'm doing, here's some code:

http://pastebin.com/WJQwj6J5

i haven't posted it in a compile-able state since it's not really working yet.

one final thing i'm confused by is that this is what it looks like
with mirrored = true. mirrored = false is way more wrong.

i think i might have to go through compiling Sensor and digging deeper
there first so i know what to expect...

kyle

On Thu, May 5, 2011 at 2:39 AM, drew.m...@gmail.com
<drew.m...@gmail.com> wrote:

Screen shot 2011-05-05 at 1.01.42 PM.png

Nicolas Burrus

unread,
May 5, 2011, 1:13:04 PM5/5/11
to openk...@googlegroups.com
Wow, that would be so nice. I guess many people are looking forward to a
fully open-source implementation without binary-only dynamic libs that
are so difficult to integrate and impossible to debug.

And I'm dreaming about a libfreenect-like simple API with just
get_user_masks(context, depth_image) and get_skeleton(context,
depth_image, user_masks, user_id) and not a forest of nodes and
automagical inits, which are probably useful to some people, but
definitely a pain for most.

Good luck :)

Florian Echtler

unread,
May 6, 2011, 3:34:01 AM5/6/11
to openk...@googlegroups.com
On Thu, 2011-05-05 at 13:09 -0400, Kyle McDonald wrote:
> i just pulled your branch and get the same values for my kinect (120,
> .104200) which isn't so surprising.
>
> i plugged them in, and it's definitely drawing something now. but it's
> not quite right (see attached).
Just a brief question for my understanding: the depth-color alignment
shown in your attached picture is calculated solely from internal data
given by the Kinect and not by some external calibration, e.g. with a
chessboard, correct?

Thanks, Florian
--
SENT FROM MY DEC VT50 TERMINAL

drew.m...@gmail.com

unread,
May 6, 2011, 3:39:28 AM5/6/11
to openk...@googlegroups.com
On Thu, May 5, 2011 at 10:09 AM, Kyle McDonald <ky...@kylemcdonald.net> wrote:
> thanks!
>
> i just pulled your branch and get the same values for my kinect (120,
> .104200) which isn't so surprising.

Cool.

> i plugged them in, and it's definitely drawing something now. but it's
> not quite right (see attached).
>
> something i'm skeptical about is these two defines i took from Sensor:
>
> #define XN_CMOS_VGAOUTPUT_XRES 1280
> #define XN_SENSOR_DEPTH_RGB_CMOS_DISTANCE 2.4
>
> but when i run your new code i get:
>
> fDCmosEmitterDistance: 7.500000
> fDCmosRCmosDistance:   2.300000
>
> which makes me think the second define above is wrong.

Curious indeed. I guess keep testing until it looks right? :P

> i also wonder why XN_CMOS_VGAOUTPUT_XRES is 1280 instead of 640. if it
> has to do with the depth + color being interlaced, or if it's for some
> other reason?

The RGB and IR sensors in hardware *are* actually 1280x1024. That's
too large a frame to stream at 30fps over USB2, though, so RGB gets
squished down to 640x480, and depth loses resolution when it computes
horizontal shift of the speckle pattern. I'm not sure what's
happening specifically here, though.

> changing those two to 640 and 2.3 gets me a little closer, but it's
> still not right.
>
> if anyone wants to dig through what i'm doing, here's some code:
>
> http://pastebin.com/WJQwj6J5
>
> i haven't posted it in a compile-able state since it's not really working yet.
>
> one final thing i'm confused by is that this is what it looks like
> with mirrored = true. mirrored = false is way more wrong.

Actually, I may have gotten the mirroring boolean value completely
inverted in libfreenect due to the way glview draws the textures, so
that may be the error there. I assumed that sending a 0 meant "not
mirrored" and sending a 1 meant "mirrored" but the opposite may be
true. TODO.

> i think i might have to go through compiling Sensor and digging deeper
> there first so i know what to expect...

If you do go traipsing through Sensor code, make sure you're reading
avin2's SensorKinect code, which has some modifications for the
Kinect.

-Drew

Kyle McDonald

unread,
May 6, 2011, 10:36:42 AM5/6/11
to openk...@googlegroups.com
@florian, no, this is just using internal parameters from the kinect,
like openni. no chessboards.

@drew, i was aware that the sensor is 1280x1024, but as far as i can
tell the image is being remapped to a 640x480 image space, so i don't
think that's what it's referring to here.

regarding the mirroring, good to know it might be backwards :) that
would explain a lot.

i'll assume you're referring to avin2's code in this repo:
https://github.com/avin2/SensorKinect

i'll try it out. reversing code that already works should help a bunch
in recreating the correction.

kyle

Kyle McDonald

unread,
May 6, 2011, 12:55:02 PM5/6/11
to openk...@googlegroups.com
i know this is probably not the right way to do it, but it's the first
thing that came to mind.

i decided to litter Registration.cpp in SensorKinect with some printfs
in the functions like BuildDepthToShiftTable to give me an idea of
what's going on.

then:

cd SensorKinect/Platform/Linux-x86/Build and run make
cd SensorKinect/Platform/Linux-x86/Redist and sudo ./install.sh
cd OpenNI-Bin-MacOSX-v1.1.0.41/Samples/Bin/Release and ./NiViewer

and in NiViewer i selected the "depth->image" setting to make sure
it's doing the registration.

i must have something wrong, because i can't see those printouts
anywhere in my system logs.

or maybe i'm completely misunderstanding -- the way i assumed things
worked is that SensorKinect is building into a module that's used by
OpenNI, and created when you run NiViewer.

which part do i have backwards?

kyle

Drew Fisher

unread,
May 6, 2011, 1:02:54 PM5/6/11
to openk...@googlegroups.com
----- Original message -----
> i know this is probably not the right way to do it, but it's the first
> thing that came to mind.
>
> i decided to litter Registration.cpp in SensorKinect with some printfs
> in the functions like BuildDepthToShiftTable to give me an idea of
> what's going on.
>
> then:
>
> cd SensorKinect/Platform/Linux-x86/Build and run make
> cd SensorKinect/Platform/Linux-x86/Redist and sudo ./install.sh
> cd OpenNI-Bin-MacOSX-v1.1.0.41/Samples/Bin/Release and ./NiViewer
>
> and in NiViewer i selected the "depth->image" setting to make sure
> it's doing the registration.
>
> i must have something wrong, because i can't see those printouts
> anywhere in my system logs.
>
> or maybe i'm completely misunderstanding -- the way i assumed things
> worked is that SensorKinect is building into a module that's used by
> OpenNI, and created when you run NiViewer.
>
> which part do i have backwards?
>
> kyle

Huh. I've actually never built/run OpenNI so far, but that sounds right, given my limited knowledge of how it works. If that's mistaken, someone else explain it to us both! :P

-Drew

Joshua Blake

unread,
May 6, 2011, 1:06:08 PM5/6/11
to openk...@googlegroups.com
That sounds about right, but the printf output might be redirected to nowhere since it's running inside the driver. Since you're littering the project anyway, you might as well fprintf to a hard-coded file location to create a log.
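
For example, something quick and dirty like this dropped into BuildDepthToShiftTable (the path is just an arbitrary hard-coded location, and it assumes <cstdio> is available in that file):

FILE* log = fopen("/tmp/sensorkinect.log", "a");
if (log) {
    // dPlanePixelSize and nPlaneDsr are the values Kyle mentioned grabbing above
    fprintf(log, "dPlanePixelSize=%f nPlaneDsr=%d\n", dPlanePixelSize, (int) nPlaneDsr);
    fclose(log);
}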

Kyle McDonald

unread,
May 23, 2011, 10:16:15 PM5/23/11
to openk...@googlegroups.com
i think i'm kind of stuck here.

as far as i can tell, changing the source of SensorKinect is not
having any effect on NiViewer

i tried changing this line:

XnDouble dPelSize = 1.0 / (dPlanePixelSize * nXScale * S2D_PEL_CONST);

to use 10 instead of 1, which should totally destroy the registration.

but when i run NiViewer and set it to use registration, everything is fine.

is there anyone here who has successfully compiled and used
SensorKinect and OpenNI together? if so, how were they set up?

i feel like i really need to cross-check some more things about how
SensorKinect is working in order to get the same kind of registration
in libfreenect.

kyle

Joshua Blake

unread,
May 23, 2011, 11:36:53 PM5/23/11
to openk...@googlegroups.com
I've compiled Sensor and OpenNI. Keep in mind that when you recompile the driver you need to completely uninstall/delete the old driver before the new version will be registered properly. 

Kyle McDonald

unread,
May 24, 2011, 3:21:25 AM5/24/11
to openk...@googlegroups.com
awesome, that was the missing part. reading through install.sh more
closely was very informative.

here's a pic of things working http://j.mp/ifTdbw

i got a ton of info back from the camera, and everything agrees with
the assumptions i described above.

the last step that i was forgetting (which is kind of obvious in
retrospect) is that i assumed depthToShift was working with the raw
10/11-bit depth values. in fact, it's using the millimeter values. i
should have guessed this given XN_DEVICE_MAX_DEPTH 10000. while i
should have seen it sooner, it's also kind of silly: if you account
for raw->depth in the LUT itself, you have a LUT that is 20% of the
original size (<2048 vs 10000). as i refactor the code i'll be sure to
incorporate this optimization.

actually, i'd be really curious to hear why this conversion (raw
disparity to 'shift') wouldn't just be linear.

at the moment, i'm using stephane's equation for raw->depth:

// raw is the 11-bit kinect value; returns an approximate distance in millimeters
const float k1 = 127.50, k2 = 2842.5, k3 = 1.1863;
inline float rawToMillimeters(uint16_t raw) {
    return k1 * tan((raw / k2) + k3);
}

but because the exact constants here are based on the kinect's
parameters, i'm going to use a variation of the algorithm inside
XnShiftToDepthUpdate. it's an interesting read, because it doesn't
have any trigonometric functions -- apparently the tan() is a better
approximation, even though it isn't actually the correct model? would love to
hear thoughts on this too.

kyle

Joshua Blake

unread,
May 24, 2011, 8:34:38 AM5/24/11
to openk...@googlegroups.com

Great Kyle! I'm looking forward to making SensorKinect and NITE obsolete and this is a big part.

Marcos Slomp

unread,
May 24, 2011, 10:38:45 PM5/24/11
to openk...@googlegroups.com
> here's a pic of things working http://j.mp/ifTdbw

Beautiful results! Good job!

> actually, i'd be really curious to hear why this conversion (raw
> disparity to 'shift') wouldn't just be linear.


Not sure if I understand what you mean by "shift".
Would it be the scale/offset you have to apply to fetch the proper (aligned) color value in the video image?
If this is the case, my guess would be that the non-linear behavior is due to some discrepancies in the lenses of video and depth cameras.

Kyle McDonald

unread,
May 25, 2011, 1:13:47 AM5/25/11
to openk...@googlegroups.com
afaict, primesense uses the term 'shift' to refer to two different things:

1 the raw depth values returned by the sensor (disparity)
2 the offset required to register a rectified depth image to the rgb image

here i'm just saying that the relationship between the two should be
linear, i think, since the rectification handles any undistortion.

today i've spent a little time reading XnShiftToDepthUpdate() and
converting it into a single shorter function:

double dPlanePixelSize = 0.1042; // ZERO_PLANE_PIXEL_SIZE (zpps)
uint64_t nPlaneDsr = 120;        // ZERO_PLANE_DISTANCE (zpd)
double planeDcl = 7.5;           // pConfig->fEmitterDCmosDistance
int32_t paramCoeff = 4;          // pConfig->nParamCoeff
int32_t constShift = 200;        // pConfig->nConstShift
int32_t shiftScale = 10;         // pConfig->nShiftScale

// primesense-style raw -> depth conversion, returning millimeters
uint16_t RawToDepth(uint16_t raw) {
    double fixedRefX = ((raw - (paramCoeff * constShift)) / paramCoeff) - 0.375;
    double metric = fixedRefX * dPlanePixelSize;
    return shiftScale * ((metric * nPlaneDsr / (planeDcl - metric)) + nPlaneDsr);
}

this function serves as the primesense-endorsed replacement for
stephane's equation.

i checked this model against stephane's, and there is at most a 55 mm
discrepancy within the usable range, centered around 2.5 meters. the
std dev between the models is ~17 mm.

in other words, if you have something at 2.5 meters, openni will tell
you it's 55 mm away from where stephane's equation tells you.
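
if anyone wants to reproduce that comparison, it's just a sweep over the raw range with both functions (a sketch that assumes rawToMillimeters() from earlier in the thread and RawToDepth() above are in scope):

#include <stdio.h>
#include <math.h>
#include <stdint.h>

int main() {
    float maxDiff = 0;
    for (uint16_t raw = 300; raw < 1050; raw++) { // roughly the usable near-to-far range
        float diff = fabsf(rawToMillimeters(raw) - RawToDepth(raw));
        if (diff > maxDiff) maxDiff = diff;
    }
    printf("max disagreement: %.1f mm\n", maxDiff);
    return 0;
}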

the only trick with this function is that there are four more
parameters i had to grab from the kinect: fEmitterDCmosDistance,
nParamCoeff, nConstShift, nShiftScale. they look to me like they're
the same for every kinect, but if someone could confirm this that
would be great.

on a separate topic, something i'm more worried about is the
CreateDXDYTables() function. it seems to be using all the input
parameters along with incrementalFitting50() to do some kind of
polynomial fitting using lots of integer math. i personally don't have
the mathematical insight to rewrite this function in a novel way. and,
if i understand the licensing situation correctly, because Sensor is
LGPL and libfreenect is dual GPL/Apache, that means libfreenect cannot
include this code... it would be kind of frustrating to make all this
progress and then have it be some kind of awkward plugin to
libfreenect.

kyle

Kyle McDonald

unread,
May 28, 2011, 3:04:07 PM5/28/11
to openk...@googlegroups.com
tl;dr: i'm not smart enough to rewrite CreateDXDYTables() in a novel
way. due to the licensing situation, the code i'm working on probably
can't be incorporated into libfreenect.

that said, i'll put together a demo inside libfreenect (instead of
inside openFrameworks, where i've been working) and hopefully some
smarter people can take a look.

other ideas are very welcome.

kyle

Florian Echtler

unread,
Jul 12, 2011, 7:45:07 AM7/12/11
to openk...@googlegroups.com, Kyle McDonald
[digging up old thread again]

Kyle, I've got some free time on my hands for the next few weeks and
would like to experiment a bit with your standalone registration code,
which I consider _extremely_ useful - would love to see that
integrated into libfreenect.

Do you have the source available somewhere?

Florian


Kyle McDonald

unread,
Jul 13, 2011, 12:20:31 AM7/13/11
to Florian Echtler, openk...@googlegroups.com
Hi Florian,

I've attached the openFrameworks code I was using.

Early last week I wrapped this code into a single class that does
registration, but unfortunately (through a complex chain of events) I
don't have access to the computer with that code right now.

The parameters in the code are tuned to my Kinect. To get those
parameters, I used Drew's branch that polls them. I also checked them
against OpenNI by adding code to SensorKinect to report the same
parameters.

I would really like to clean, integrate, and contribute this code to
libfreenect. I know a lot of people who use OpenNI only because it
does registration (and then proceed to bang their head on their
keyboard every time their OSX machine freezes).

I suppose the only issue is licensing:

- PrimeSense Sensor is LGPL 3
- libfreenect is dual Apache 2/GPL 2

From what I read, it sounds like PrimeSense Sensor code could be used
in libfreenect if it's not a derivative work. I think this means that if
we could write a wrapper around their code then it's ok to include it in
libfreenect. On the other hand, if it's derivative work (the way I've
done it) I think you're required to allow "modification for the
customer's own use and reverse engineering for debugging such
modifications". Which is allowed by both libfreenect licenses.

I'm not a lawyer or a licensing person though. Maybe someone else can
clarify this? It looks like the Sensor code is trying to be flexible,
and so is libfreenect. What's keeping them from borrowing from each
other/playing nicely together?

Kyle

testApp.cpp
testApp.h

Joshua Blake

unread,
Jul 13, 2011, 2:02:19 AM7/13/11
to openk...@googlegroups.com, Florian Echtler
Kyle,
 
The licenses are incompatible so from a strict legal sense we cannot just copy the code into libfreenect, as you know. Also of consideration is that algorithms are not copyrightable; however, implementations of algorithms are copyrightable.
 
There are two paths we could take:
1) Ask PrimeSense for permission/authorization to include the methods in question verbatim. (I could handle this ask if desired.)
2) Use Clean Room Design (http://en.wikipedia.org/wiki/Clean_room_design) to create a description or specification of the algorithm and then independently implement the algorithm.
 
#2 is harder and a bit tedious but frees us of derivative work claims.
In creating a specification for the algorithm, we don't necessarily need to understand the underlying principles or derive the algorithm from mathematical truths. All someone needs to do is write out in pseudocode or plain English: "first you need to take this array we read from the device, and loop over it. Within the loop, perform this mathematical operation:
y1 = x1 * 2.4 + x2 whatever etc..."
 
Basically, if you examine the original code and implement the idea to produce the same results using another language (English/pseudo-code/Javascript/C#), it is a new work. Since the source and target languages are both C/C++, we just need a level of indirection through a second language. The reason is that the person analyzing the original code will typically be too familiar with it to think of new variable names and such to write an independent work.
Josh
 

Florian Echtler

unread,
Jul 13, 2011, 11:04:44 AM7/13/11
to Kyle McDonald, Joshua Blake, openk...@googlegroups.com
Kyle, thanks for your code - I've done some experiments and it seems to
work nicely (I've removed all openframeworks dependencies and put in the
registration parameters from my own Kinect).

Some questions:

- the Apply1080 method expects the depthInput data to be in what unit -
millimeters? depthOutput is consequently also in mm?

- what's the exact purpose of the depthTo[RGB]Shift table?

- the RGB-CMOS distance #define using 2.4 is different from the
registration data value of 2.3 (as you've already noted in the code). I
don't really see any difference between the two variants - did you?

As a first step towards inclusion, I've decided to rewrite the central
CreateDXDYTables function - please see attachment. I'm not a lawyer, but
I believe this is sufficiently different from the original to qualify for
inclusion in libfreenect. If desired, I can push this to my branch at
https://github.com/floe/libfreenect

Additionally, I think there's some room for improvement in the rest of
the code which might also qualify the result for inclusion in
libfreenect. E.g. the raw2depth function might best be handled by a
16-bit LUT, there are some defines which should be fetched directly from
the camera etc.
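
For illustration, that LUT could be as small as 2048 entries, built once at startup (a sketch; rawToMillimeters() here stands in for whatever raw->depth conversion ends up being used, and out-of-range raw values would still need to be clamped or marked invalid):

#include <stdint.h>

float rawToMillimeters(uint16_t raw); // whichever raw->depth model is chosen

static uint16_t raw2mm[2048];

void buildRawToDepthTable() {
    for (int raw = 0; raw < 2048; raw++)
        raw2mm[raw] = (uint16_t) rawToMillimeters((uint16_t) raw);
}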

Florian

createdxdytables.c

Kyle McDonald

unread,
Jul 13, 2011, 11:42:40 AM7/13/11
to Florian Echtler, Joshua Blake, openk...@googlegroups.com
@josh, thanks for explaining!

i still don't understand why they're legally incompatible, but i'll
take your word for it.

if you could contact primesense to ask about this specific chunk of
code (Registration.cpp) that would be ideal. then we can resort to the
alternative only if we have to.

there's another bit of code that would be good too, but i haven't
hunted it down. it's the code that implements
xnConvertProjectiveToRealWorld. as best i can tell, it's spread
throughout OpenNI rather than being in a single location. for now we
have a good replacement for that anyway that has been posted to the
mailing list in the past (but should also, in my opinion, be included
in libfreenect).

@florian, awesome!

any instance of real-world depth information in OpenNI that i've seen
is always mm. so yes, depthInput and depthOutput are mm uint16_t (kind
of convenient for in-place processing).

depthToRgbShift table (which is named a few things throughout the
code) is the cornerstone of the registration algorithm. once you have
your image undistorted using the DXDYTable, then you need to
horizontally slide your depth data over. the amount that you slide it
over is related to the depth value itself. instead of doing this
calculation on every depth value, they create a LUT to do it faster.
depthToRgbShift is that LUT. think about this process as the inverse
of depth-from-disparity.

switching the 2.4/2.3 thing causes a minor measurable difference in
the results, but i haven't been able to determine which one is "more
correct". we could try reporting it to the original github repo as an
issue and maybe they can comment on it there?

this is a really positive development. here's to hoping we can make
libfreenect as powerful as OpenNI, but more
stable/understandable/community driven :)

best,
kyle
