i'm working on projecting the kinect point cloud down to an rgb camera.
we're using a custom mount
http://www.flickr.com/photos/49322752@N00/sets/72157626569601694/
it's fairly easy to make, and super solid. here is a diagram in
sketchup http://sketchup.google.com/3dwarehouse/details?mid=13dd4e1f541420bb4366ba10874ef691
first step is to calibrate both the cameras for their intrinsic
parameters and distortion, and then their extrinsic parameters, using
chessboard calibration. i have that part working correctly with a very
low reprojection error. i'm covering up the IR projector with a
diffuser to get the best results.
the next step is to project the kinect depth image into a real space
point cloud. this takes two steps:
1 converting raw depth values to real depth values
2 2d->3d projection, akin to OpenNI's ConvertProjectiveToRealWorld
for step 1, i've been using stéphane magnenat's equation from
https://groups.google.com/group/openkinect/browse_thread/thread/31351846fd33c78/e98a94ac605b9f21?lnk=gst&q=stephane&pli=1
i know this is modeled on the original ROS raw/depth data, but does
anyone know how universal it is? is it really accurate across all
kinects? how much does it disagree with the OpenNI raw->depth
conversion? has anyone tried reverse engineering the OpenNI raw->depth
conversion?
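for reference, this is the shape of the conversion i'm using for step
1. it's just a sketch: the constants are the ones usually quoted from
stéphane's thread, and whether they really hold across kinects is
exactly what i'm asking.

#include <cmath>
#include <cstdint>

// raw 11-bit kinect disparity -> distance in meters. constants are the
// commonly quoted ones; they may need per-kinect tuning.
inline float rawToMeters(uint16_t raw) {
    return 0.1236f * std::tan(raw / 2842.5f + 1.1863f);
}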
for step 2, i'm using the following:
// fov is the depth camera's field of view, in radians
float fx = tanf(fov.x / 2) * 2;
float fy = tanf(fov.y / 2) * 2;
// (x, y) is the depth pixel, z is the real-world depth at that pixel
float xReal = ((x - principalPoint.x) / imageSize.width) * z * fx;
float yReal = ((y - principalPoint.y) / imageSize.height) * z * fy;
which can be done in a single matrix operation a la glpclview.c
LoadVertexMatrix() i suppose. any tips would be great here. i think
the principal point should be slightly offset to compensate for the
difference between the ir and depth images (given on the ROS site as
-4.8 x -3.9) but i'm not sure where to put this. should it modify the
principal point?
finally, the last step is to project the point cloud onto the rgb
image and sample the colors. for this i'm using cv::projectPoints()
which is also kind enough to handle lens undistortion.
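concretely, that last step looks something like this. it's a sketch
with my own variable names: rvec/tvec are the depth->rgb extrinsics
from the chessboard calibration, and the sampling is just
nearest-neighbor.

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Vec3b> sampleColors(const std::vector<cv::Point3f>& cloud, // points in the depth camera frame
                                    const cv::Mat& rvec, const cv::Mat& tvec, // depth -> rgb extrinsics
                                    const cv::Mat& rgbCameraMatrix,
                                    const cv::Mat& rgbDistCoeffs,
                                    const cv::Mat& rgbImage) {
    std::vector<cv::Point2f> imagePoints;
    cv::projectPoints(cloud, rvec, tvec, rgbCameraMatrix, rgbDistCoeffs, imagePoints);
    std::vector<cv::Vec3b> colors(cloud.size(), cv::Vec3b(0, 0, 0));
    for (size_t i = 0; i < imagePoints.size(); i++) {
        int px = cvRound(imagePoints[i].x), py = cvRound(imagePoints[i].y);
        if (px >= 0 && px < rgbImage.cols && py >= 0 && py < rgbImage.rows) {
            colors[i] = rgbImage.at<cv::Vec3b>(py, px); // nearest-neighbor color sample
        }
    }
    return colors;
}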
is this whole process correct?
the main thing i'm skeptical about right now is converting raw to
depth values. i have a suspicion that it varies from kinect to kinect,
and that this can cause some variation in the accuracy of this whole
process.
thanks,
kyle
and how much does it vary per kinect?
kyle
in other words, i can confirm for at least one other kinect that
stéphane's model is accurate for the near range, and that it's not
simply 'tuned' to the ROS data.
here is the data on pastebin: http://pastebin.com/pnVkzHUJ
i'm still interested in any other insight/feedback on the other
calibration stuff!
kyle
i've read through nicholas' work, it's been very helpful. openni won't
work here, because i'm using an external (non-kinect) camera.
but this does lead me to two other burning questions i've had for a
few months now:
1 does anyone understand how the xbox calibration card works? when/why
it's used?
2 how does openni align the color and depth images? either all kinects
are similar enough that the extrinsics between the rgb + ir cam are
hardcoded, or every kinect is different -- in which case they must be
reporting their in-factory calibration data...
best,
kyle
Haven't read anything on this. Nor have I seen/heard of anyone
actually using it ever. I'm curious too.
> 2 how does openni align the color and depth images? either all kinects
> are similar enough that the extrinsics between the rgb + ir cam are
> hardcoded, or every kinect is different -- in which case they must be
> reporting their in-factory calibration data...
It's the latter. Commands 0x40 and 0x41 on the Kinect return the
registration and frame padding information respectively [1]. OpenNI
has an algorithm that utilizes these parameters to register the depth
image to the RGB image [2].
Reimplementing these in libfreenect is actually on my list of TODO
items, but fairly far down, so I'd love any help offered. Any takers?
:)
> best,
> kyle
-Drew
[1] - https://github.com/avin2/SensorKinect/commit/48f6059b232840e7aa4d671c7dfa3545fff907d8#diff-6
[2] - https://github.com/PrimeSense/Sensor/blob/master/Source/XnDeviceSensorV2/Registration.cpp#L503
it looks like this algorithm is based on the fact that the images are
almost rectified: it's just the fov that's different. they're taking
each pixel in the depth image and using two lookup tables
(m_pRegistrationTable and m_pDepthToShiftTable) to figure out where it
should be in the rgb image, and saving that to an image buffer.
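roughly, my mental model of that loop is something like this. the
table names and layouts here are my guesses for illustration, not the
actual openni structures.

#include <cstdint>
#include <cstring>

void registerDepthToRgb(const uint16_t* depth,              // depth image
                        uint16_t* registered,               // output, aligned to the rgb image
                        const uint16_t* registrationTable,  // per-pixel target (x, y), interleaved
                        const uint16_t* depthToShiftTable,  // shift, indexed by depth value
                        int w, int h) {
    std::memset(registered, 0, w * h * sizeof(uint16_t));
    for (int i = 0; i < w * h; i++) {
        uint16_t z = depth[i];
        if (z == 0) continue;                               // no reading at this pixel
        int x = registrationTable[2 * i + 0] + depthToShiftTable[z];
        int y = registrationTable[2 * i + 1];
        if (x >= w || y >= h) continue;
        uint16_t& out = registered[y * w + x];
        if (out == 0 || z < out) out = z;                   // keep the nearest point on collisions
    }
}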
i'm not familiar with setting custom registers/sending custom commands
using libfreenect. but once i have access to the registration +
padding information i would love to write some code to align the
images similar to how this openni snippet is working.
in my opinion, it's super important for off the shelf computer vision
use: once you have aligned color+depth images, it opens things up a
bit. optical flow is a bit awkward on the depth image, but happens
naturally on the color image, for example.
if you could walk me through getting this data via libfreenect, on or
off list, that'd be great. i'd love to contribute.
thanks,
kyle
Excellent! This is why teamwork is great! Here's the part you wanted: [1]. :)
> in my opinion, it's super important for off the shelf computer vision
> use: once you have aligned color+depth images, it opens things up a
> bit. optical flow is a bit awkward on the depth image, but happens
> naturally on the color image, for example.
I agree. Registration would make everything much neater.
> if you could walk me through getting this data via libfreenect, on or
> off list, that'd be great. i'd love to contribute.
Start with my branch linked in [1] (it's just one commit atop the
as-of-writing master/unstable) and call freenect_get_reg_info() and
freenect_get_reg_pad_info(), which will return structs equivalent in
structure to XnRegistrationInformation1080 and
XnRegistrationPaddingInformation, respectively. Let me know if you
need more guidance than that; I'm zarvox on IRC and the rest of my
contact information is on the wiki.
I can't guarantee that whatever will eventually go into libfreenect
will bear exactly the same function signatures, but I'll promise that
the structs themselves will have the same layout (if not the same
names). Ideally, they'd be invisible to the end user - the user would
just call an enable/disable registration function, and the driver
would handle all this behind the scenes. But in the meantime, I think
what I've written is the quickest/easiest way to get the data in your
hands for development/testing; we can always clean everything up
before we merge.
> thanks,
> kyle
Best,
Drew
[1] - https://github.com/zarvox/libfreenect/tree/registration
there are two registration routines, for chips 1000 and 1080. i'm
assuming kinect is 1080, and the reference sensor is 1000, because
1080 deals with color and 1000 doesn't.
there are two important lookup tables used by Apply1080:
m_pDepthToShiftTable and m_pRegistrationTable
m_pDepthToShiftTable is built by BuildDepthToShiftTable. and this
function grabs three parameters whose values i don't know, and it
looks like it's asking the kinect for them:
m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE,
&dPlanePixelSize);
m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE, &nPlaneDsr);
XnDepthPixel nMaxDepth = m_pStream->GetDeviceMaxDepth();
i found some fov relationships between the zpps and zpd that at least
gives me an approximate ratio between the two (.00083 ~= zpps / zpd)
but i don't think that's enough to generate the depth to shift table.
so i'm also going to need these values.
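for what it's worth, here's the relationship i'm guessing at, as a
sketch. the assumption that zpps is the pixel size at the full
1280-wide cmos resolution (so the effective 640-res pixel size is
2*zpps) is mine, so take the factor of 2 with a grain of salt.

#include <cmath>
#include <cstdio>

int main() {
    double ratio = 0.00083;          // ~ zpps / zpd, the approximate ratio i found
    double f = 1.0 / (2.0 * ratio);  // focal length in 640x480 pixels, if my assumption holds
    double hfov = 2.0 * atan(320.0 / f);
    double vfov = 2.0 * atan(240.0 / f);
    printf("f = %.1f px, hfov = %.1f deg, vfov = %.1f deg\n",
           f, hfov * 180.0 / 3.14159265, vfov * 180.0 / 3.14159265);
    return 0;
}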
for nMaxDepth, i'm pretty sure it's just 2048 (or maybe lower), but it
looks like it's polling the kinect for that value as well.
Apply1080 internally needs to know whether the depth image is mirrored
(m_pDepthStream->IsMirrored()). how can i check this property with
libfreenect?
i think it's about 250 lines for a pretty straight port that removes
primesense dependencies, but it could probably be reduced to 150-200
lines. i have a version that's compiling and running, but not
producing anything reasonable (probably because of the made up zpps
and zpd values).
tl;dr, i need three more things to make this work:
ZERO_PLANE_PIXEL_SIZE, ZERO_PLANE_DISTANCE, GetDeviceMaxDepth()
kyle
Yes, that is correct. :)
> there are two important lookup tables used by Apply1080:
> m_pDepthToShiftTable and m_pRegistrationTable
>
> m_pDepthToShiftTable is built by BuildDepthToShiftTable. and this
> function grabs three parameters whose values i don't know, and it
> looks like it's asking the kinect for them:
>
> m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE,
> &dPlanePixelSize);
> m_pStream->GetProperty(XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE, &nPlaneDsr);
> XnDepthPixel nMaxDepth = m_pStream->GetDeviceMaxDepth();
Don't know these off the top of my head; I'll look into finding out
more for you when I can.
> i found some fov relationships between the zpps and zpd that at least
> gives me an approximate ratio between the two (.00083 ~= zpps / zpd)
> but i don't think that's enough to generate the depth to shift table.
> so i'm also going to need these values.
>
> for nMaxDepth, i'm pretty sure it's just 2048 (or maybe lower), but it
> looks like it's polling the kinect for that value as well.
MaxDepthValue appears to be a settable parameter in OpenNI which
defaults to 10000. Not sure what the units are on that, though.
> Apply1080 internally needs to know whether the depth image is mirrored
> (m_pDepthStream->IsMirrored()). how can i check this property with
> libfreenect?
Ahh, that's another feature that we need to add configurable support
for. For now, isMirrored() will always be false, and in the future,
they'll be member variables of the freenect_device struct (I'll
probably call them video_mirrored and depth_mirrored).
> i think it's about 250 lines for a pretty straight port that removes
> primesense dependencies, but it could probably be reduced to 150-200
> lines. i have a version that's compiling and running, but not
> producing anything reasonable (probably because of the made up zpps
> and zpd values).
>
> tl;dr, i need three more things to make this work:
> ZERO_PLANE_PIXEL_SIZE, ZERO_PLANE_DISTANCE, GetDeviceMaxDepth()
>
> kyle
Awesome progress! I'll poke around and see what I can find. :)
-Drew
On May 2, 2011 6:13 PM, "Kyle McDonald" <ky...@kylemcdonald.net> wrote:
>
> i'm 'porting' the primesense registration code into something more
> self contained/removing dependencies, so libfreenect can use it
> easily.
Please make sure that you are not copy/pasting any sensor code and that you don't copy the style/naming accidentally. The licenses are incompatible and we want to avoid any issues there. Otherwise, sounds like you are going in the right direction so keep up the great work!
Josh
MaxDepthValue at 10000 makes sense. in my experience openni prefers mm
units (which are actually very nice, because they allow you to store
distance in a short int while allowing for the necessary precision and
range). so 10000 would be 10 meters, which is probably the farthest
i've seen the kinect operate at.
my original assumption 2048 was based on an inverted understanding.
the Sensor code creates a table of distance->raw values, then uses
this to interpolate into a raw->distance table. 2048 would make sense
if it was the other way around (a la the glview m_gamma LUT).
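to illustrate what i mean by 'the other way around', here's a sketch
of a raw-indexed LUT. it only needs ~2048 entries, and the constants
are the usual stephane ones (in mm here), so treat the exact numbers
loosely.

#include <cmath>
#include <cstdint>

static uint16_t rawToMm[2048];

void buildRawToMm() {
    const float k1 = 123.6f, k2 = 2842.5f, k3 = 1.1863f;
    for (int raw = 0; raw < 2048; raw++) {
        float mm = k1 * std::tan(raw / k2 + k3);
        rawToMm[raw] = (mm > 0.0f && mm < 10000.0f) ? (uint16_t) mm : 0; // clamp to the 10 m max
    }
}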
isMirrored() seems to change depending on whether you've ever used
openni or not. i remember a discussion about this... but anyway, for
now i'll assume mirrored is false.
all i know about the two other params i need are in XnStreamParams.h:
/** XN_DEPTH_TYPE */
#define XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE "ZPD"
/** Real */
#define XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE "ZPPS"
but i have no idea how to actually get those values. Sensor is such a
complex web of XnProperty, XnIntProperty, XnActualIntProperty... by
the time i've found something that actually does anything i forget
what i'm looking for.
kyle
Good, good. I probably ought to rename all the fields of those
structs...thanks for the reminder, JoshB.
> MaxDepthValue at 10000 makes sense. in my experience openni prefers mm
> units (which are actually very nice, because they allow you to store
> distance in a short int while allowing for the necessary precision and
> range). so 10000 would be 10 meters, which is probably the farthest
> i've seen the kinect operate at.
>
> my original assumption 2048 was based on an inverted understanding.
> the Sensor code creates a table of distance->raw values, then uses
> this to interpolate into a raw->distance table. 2048 would make sense
> if it was the other way around (a la the glview m_gamma LUT).
Ah, that makes sense.
> isMirrored() seems to change depending on whether you've ever used
> openni or not. i remember a discussion about this... but anyway, for
> now i'll assume mirrored is false.
Yep, this was an issue before, but I fixed it back in February for
unstable, and the commit has now reached master: [1]. So even if
OpenNI enables it, we disable it again now.
> all i know about the two other params i need are in XnStreamParams.h:
>
> /** XN_DEPTH_TYPE */
> #define XN_STREAM_PROPERTY_ZERO_PLANE_DISTANCE "ZPD"
> /** Real */
> #define XN_STREAM_PROPERTY_ZERO_PLANE_PIXEL_SIZE "ZPPS"
> but i have no idea how to actually get those values. Sensor is such a
> complex web of XnProperty, XnIntProperty, XnActualIntProperty... by
> the time i've found something that actually does anything i forget
> what i'm looking for.
Yes, this is a mess to walk through. Thank goodness for cscope.
It took a while, but I've tracked it down - it's one of the so-called
"Fixed Parameters" that I need to pull from the device, so that's
another piece of USB communication that I'll need to implement. It'll
probably be unique per-device, but immutable. I'll look into that as
soon as I can.
In the meantime, assume that I will (eventually) provide a function
that will return a struct called ZeroPlaneInfo that looks like:
struct ZeroPlaneInfo {
    float distance;
    float pixel_size;
};
One last thing: you asked on IRC for a list of my Kinect's parameters;
here's the data from the two Kinects I have access to, in the OpenNI
naming scheme. [2]
-Drew
[1] - https://github.com/OpenKinect/libfreenect/commit/9b533d4c0253e2af5bb0ac65e05ec1d155f09203
[2] - http://pastebin.com/YDrNsYgC
it would be worth renaming the parameters of the structs if just for
the sake of making them cleaner :) in a way, i think libfreenect is
catching up with a lot of what's already available from PrimeSense --
but the difference is:
1 it's a community effort, aimed at getting people to develop a shared
understanding of the hardware and techniques involved
2 it's designed for "user-developers" (hackers) who don't necessarily
have a CS degree / don't recite design patterns in their sleep...
and small things like struct field names can be a great place for that
difference to shine.
re mirroring: good to hear. i think i've seen it flip back after using
openni recently, so i know it's working then.
those two 'fixed parameters' (the ZeroPlaneInfo struct) are exactly
what i need. if you can poll them and get me some numbers to start
with, even if they're specific to your camera, that would be hugely
helpful. i can't really move further till i have them.
in other news, i have the kinect calibrated nicely to an external
camera using opencv:
http://www.flickr.com/photos/kylemcdonald/5686302302/in/photostream
which i personally think is super exciting. the code is on github
already but needs some more cleaning to make it hacker friendly.
kyle
Oh right, I forgot to mask all the fields out, sorry. There's not a
function for that in libfreenect yet, but it might be worth adding
either an inline function or a #define for it.
> it would be worth renaming the parameters of the structs if just for
> the sake of making them cleaner :) in a way, i think libfreenect is
> catching up with a lot of what's already available from PrimeSense --
> but the difference is:
>
> 1 it's a community effort, aimed at getting people to develop a shared
> understanding of the hardware and techniques involved
> 2 it's designed for "user-developers" (hackers) who don't necessarily
> have a CS degree / don't recite design patterns in their sleep...
>
> and small things like struct field names can be a great place for that
> difference to shine.
I completely agree with you and JoshB here. I also look forward to an
open-source implementation of the Microsoft skeleton tracking
algorithm. Whenever that will happen.
> those two 'fixed parameters' (the ZeroPlaneInfo struct) are exactly
> what i need. if you can poll them and get me some numbers to start
> with, even if they're specific to your camera, that would be hugely
> helpful. i can't really move further till i have them.
After much confusion (the device returns a lot more data than
sizeof(XnFixedParams)! what on earth?), it is done. Because of that
wonkiness, I can't actually guarantee that these are the right
numbers, but they were the only spot that looked like four floats in a
row, so I think they'll be correct. Pull the commit from my
registration branch [1].
My lab's Kinect's numbers, for reference, are:
distance: 120.000000
pixel_size: 0.104200
> in other news, i have the kinect calibrated nicely to an external
> camera using opencv:
>
> http://www.flickr.com/photos/kylemcdonald/5686302302/in/photostream
>
> which i personally think is super exciting. the code is on github
> already but needs some more cleaning to make it hacker friendly.
Nice work! :) I really should learn more about OpenCV some time...
And just for warning: I'll be gone for a week or so as I'm attending a
conference out of the country, so I might be a bit latent in replies
and unable to actually test code since I probably won't bring my
Kinect with me. Apologies in advance. On the bright side, I was able
to get this part done for you before I leave. :)
-Drew
[1] - https://github.com/zarvox/libfreenect/tree/registration
> I completely agree with you and JoshB here. I also look forward to an
> open-source implementation of the Microsoft skeleton tracking
> algorithm. Whenever that will happen.
i just pulled your branch and get the same values for my kinect (120,
.104200) which isn't so surprising.
i plugged them in, and it's definitely drawing something now. but it's
not quite right (see attached).
something i'm skeptical about is these two defines i took from Sensor:
#define XN_CMOS_VGAOUTPUT_XRES 1280
#define XN_SENSOR_DEPTH_RGB_CMOS_DISTANCE 2.4
but when i run your new code i get:
fDCmosEmitterDistance: 7.500000
fDCmosRCmosDistance: 2.300000
which makes me think the second define above is wrong.
i also wonder why XN_CMOS_VGAOUTPUT_XRES is 1280 instead of 640. if it
has to do with the depth + color being interlaced, or if it's for some
other reason?
changing those two to 640 and 2.3 gets me a little closer, but it's
still not right.
if anyone wants to dig through what i'm doing, here's some code:
http://pastebin.com/WJQwj6J5
i haven't posted it in a compile-able state since it's not really working yet.
one final thing i'm confused by is that this is what it looks like
with mirrored = true. mirrored = false is way more wrong.
i think i might have to go through compiling Sensor and digging deeper
there first so i know what to expect...
kyle
On Thu, May 5, 2011 at 2:39 AM, drew.m...@gmail.com
<drew.m...@gmail.com> wrote:
> I also look forward to an open-source implementation of the Microsoft
> skeleton tracking algorithm. Whenever that will happen.
And I'm dreaming about a libfreenect-like simple API with just
get_user_masks(context, depth_image) and get_skeleton(context,
depth_image, user_masks, user_id) and not a forest of nodes and
automagical inits, which are probably useful to some people, but
definitly a pain for most.
Good luck :)
Thanks, Florian
--
SENT FROM MY DEC VT50 TERMINAL
Cool.
> i plugged them in, and it's definitely drawing something now. but it's
> not quite right (see attached).
>
> something i'm skeptical about is these two defines i took from Sensor:
>
> #define XN_CMOS_VGAOUTPUT_XRES 1280
> #define XN_SENSOR_DEPTH_RGB_CMOS_DISTANCE 2.4
>
> but when i run your new code i get:
>
> fDCmosEmitterDistance: 7.500000
> fDCmosRCmosDistance: 2.300000
>
> which makes me think the second define above is wrong.
Curious indeed. I guess keep testing until it looks right? :P
> i also wonder why XN_CMOS_VGAOUTPUT_XRES is 1280 instead of 640. if it
> has to do with the depth + color being interlaced, or if it's for some
> other reason?
The RGB and IR sensors in hardware *are* actually 1280x1024. That's
too large a frame to stream at 30fps over USB2, though, so RGB gets
squished down to 640x480, and depth loses resolution when it computes
horizontal shift of the speckle pattern. I'm not sure what's
happening specifically here, though.
> changing those two to 640 and 2.3 gets me a little closer, but it's
> still not right.
>
> if anyone wants to dig through what i'm doing, here's some code:
>
> http://pastebin.com/WJQwj6J5
>
> i haven't posted it in a compile-able state since it's not really working yet.
>
> one final thing i'm confused by is that this is what it looks like
> with mirrored = true. mirrored = false is way more wrong.
Actually, I may have gotten the mirroring boolean value completely
inverted in libfreenect due to the way glview draws the textures, so
that may be the error there. I assumed that sending a 0 meant "not
mirrored" and sending a 1 meant "mirrored" but the opposite may be
true. TODO.
> i think i might have to go through compiling Sensor and digging deeper
> there first so i know what to expect...
If you do go traipsing through Sensor code, make sure you're reading
avin2's SensorKinect code, which has some modifications for the
Kinect.
-Drew
@drew, i was aware that the sensor is 1280x1024, but as far as i can
tell the image is being remapped to a 640x480 image space, so i don't
think that's what it's referring to here.
regarding the mirroring, good to know it might be backwards :) that
would explain a lot.
i'll assume you're referring to avin2's code in this repo:
https://github.com/avin2/SensorKinect
i'll try it out. reversing code that already works should help a bunch
in recreating the correction.
kyle
i decided to litter Registration.cpp in SensorKinect with some printfs
in the functions like BuildDepthToShiftTable to give me an idea of
what's going on.
then:
cd SensorKinect/Platform/Linux-x86/Build and run make
cd SensorKinect/Platform/Linux-x86/Redist and sudo ./install.sh
cd OpenNI-Bin-MacOSX-v1.1.0.41/Samples/Bin/Release and ./NiViewer
and in NiViewer i selected the "depth->image" setting to make sure
it's doing the registration.
i must have something wrong, because i can't see those printouts
anywhere in my system logs.
or maybe i'm completely misunderstanding -- the way i assumed things
worked is that SensorKinect is building into a module that's used by
OpenNI, and created when you run NiViewer.
which part do i have backwards?
kyle
Huh. I've actually never built/run OpenNI so far, but that sounds right, given my limited knowledge of how it works. If that's mistaken, someone else explain it to us both! :P
-Drew
as far as i can tell, changing the source of SensorKinect is not
having any effect on NiViewer
i tried changing this line:
XnDouble dPelSize = 1.0 / (dPlanePixelSize * nXScale * S2D_PEL_CONST);
to use 10 instead of 1, which should totally destroy the registration.
but when i run NiViewer and set it to use registration, everything is fine.
is there anyone here who has successfully compiled and used
SensorKinect and OpenNI together? if so, how were they set up?
i feel like i really need to cross-check some more things about how
SensorKinect is working in order to get the same kind of registration
in libfreenect.
kyle
here's a pic of things working http://j.mp/ifTdbw
i got a ton of info back from the camera, and everything agrees with
the assumptions i described above.
the last step that i was forgetting (which is kind of obvious in
retrospect) is that i assumed depthToShift was working with the raw
10/11-bit depth values. in fact, it's using the millimeter values. i
should have guessed this given XN_DEVICE_MAX_DEPTH 10000. while i
should have seen it sooner, it's also kind of silly: if you fold the
raw->depth conversion into the LUT itself, you get a LUT that is 20%
of the original size (<2048 entries vs 10000). as i refactor the code
i'll be sure to incorporate this optimization.
actually, i'd be really curious to hear why this conversion (raw
disparity to 'shift') wouldn't just be linear.
at the moment, i'm using stephane's equation for raw->depth:
// raw 11-bit disparity -> depth in millimeters
const float k1 = 127.50, k2 = 2842.5, k3 = 1.1863;
inline float rawToMillimeters(uint16_t raw) {
    return k1 * tan((raw / k2) + k3);
}
but because the exact constants here are based on the kinect's
parameters, i'm going to use a variation of the algorithm inside
XnShiftToDepthUpdate. it's an interesting read, because it doesn't
have any trigonometric functions -- apparently the tan() is a better
approximation, even though it isn't actually the correct model? would love to
hear thoughts on this too.
kyle
Great, Kyle! I'm looking forward to making SensorKinect and NITE obsolete, and this is a big part of that.
to clarify, the two quantities i mean are:
1 the raw depth values returned by the sensor (disparity)
2 the offset required to register a rectified depth image to the rgb image
here i'm just saying that the relationship between the two should be
linear, i think. the rectification should handle any undistortion.
today i've spent a little time reading XnShiftToDepthUpdate() and
converting it into a single shorter function:
double dPlanePixelSize = 0.1042;
uint64_t nPlaneDsr = 120;
double planeDcl = 7.5;    // pConfig->fEmitterDCmosDistance
int32_t paramCoeff = 4;   // pConfig->nParamCoeff
int32_t constShift = 200; // pConfig->nConstShift
int32_t shiftScale = 10;  // pConfig->nShiftScale

uint16_t RawToDepth(uint16_t raw) {
    // cast before dividing so the fractional part of the reference x survives
    double fixedRefX = ((double) (raw - paramCoeff * constShift) / paramCoeff) - 0.375;
    double metric = fixedRefX * dPlanePixelSize;
    return shiftScale * ((metric * nPlaneDsr / (planeDcl - metric)) + nPlaneDsr);
}
this function serves as the primesense-endorsed replacement for
stephane's equation.
i checked this model against stephane's, and there is at most a 55 mm
discrepancy within the usable range, centered around 2.5 meters. the
std dev between the models is ~17 mm.
in other words, if you have something at 2.5 meters, openni will tell
you it's 55 mm away from where stephane's equation tells you.
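for reference, the comparison was basically the following. this isn't
the exact code i ran, just the shape of it, and the range i treated as
'usable' (roughly 0.5 m to 4 m) is a judgment call.

#include <math.h>
#include <stdio.h>

int main() {
    const double k1 = 127.50, k2 = 2842.5, k3 = 1.1863;   // stephane-style constants i'm using
    const double zpps = 0.1042, zpd = 120, dcl = 7.5;     // values from my kinect
    const double paramCoeff = 4, constShift = 200, shiftScale = 10;
    double maxDiff = 0, sum = 0, sumSq = 0;
    int n = 0;
    for (int raw = 400; raw <= 1000; raw++) {             // roughly 0.5 m to 4 m
        double stephane = k1 * tan(raw / k2 + k3);
        double fixedRefX = (raw - paramCoeff * constShift) / paramCoeff - 0.375;
        double metric = fixedRefX * zpps;
        double openni = shiftScale * ((metric * zpd / (dcl - metric)) + zpd);
        double diff = openni - stephane;
        maxDiff = fmax(maxDiff, fabs(diff));
        sum += diff;
        sumSq += diff * diff;
        n++;
    }
    double mean = sum / n;
    printf("max |diff| = %.1f mm, std dev = %.1f mm\n",
           maxDiff, sqrt(sumSq / n - mean * mean));
    return 0;
}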
the only trick with this function is that there are four more
parameters i had to grab from the kinect: fEmitterDCmosDistance,
nParamCoeff, nConstShift, nShiftScale. they look to me like they're
the same for every kinect, but if someone could confirm this that
would be great.
on a separate topic, something i'm more worried about is the
CreateDXDYTables() function. it seems to be using all the input
parameters along with incrementalFitting50() to do some kind of
polynomial fitting using lots of integer math. i personally don't have
the mathematical insight to rewrite this function in a novel way. and,
if i understand the licensing situation correctly, because Sensor is
LGPL and libfreenect is dual GPL/Apache, that means libfreenect cannot
include this code... it would be kind of frustrating to make all this
progress and then have it be some kind of awkward plugin to
libfreenect.
kyle
that said, i'll put together a demo inside libfreenect (instead of
inside openFrameworks, where i've been working) and hopefully some
smarter people can take a look.
other ideas are very welcome.
kyle
Kyle, I've got some free time on my hands for the next few weeks and
would like to experiment a bit with your standalone registration code
which I'm considering _extremely_ useful - would love to see that
integrated into libfreenect.
Do you have the source available somewhere?
Florian
I've attached the openFrameworks code I was using.
Early last week I wrapped this code into a single class that does
registration, but unfortunately (through a complex chain of events) I
don't have access to the computer with that code right now.
The parameters in the code are tuned to my Kinect. To get those
parameters, I used Drew's branch that polls them. I also checked them
against OpenNI by adding code to SensorKinect to report the same
parameters.
I would really like to clean, integrate, and contribute this code to
libfreenect. I know a lot of people who use OpenNI only because it
does registration (and then proceed to bang their head on their
keyboard every time their OSX machine freezes).
I suppose the only issue is licensing:
- PrimeSense Sensor is LGPL 3
- libfreenect is dual Apache 2/GPL 2
From what I read, it sounds like PrimeSense Sensor code could be used
in libfreenect if it's not derivative work. I think this means that if
we could write a wrapper to their code then it's ok to include it in
libfreenect. On the other hand, if it's derivative work (the way I've
done it) I think you're required to allow "modification for the
customer's own use and reverse engineering for debugging such
modifications". Which is allowed by both libfreenect licenses.
I'm not a lawyer or a licensing person though. Maybe someone else can
clarify this? It looks like the Sensor code is trying to be flexible,
and so is libfreenect. What's keeping them from borrowing from each
other/playing nicely together?
Kyle
Some questions:
- the Apply1080 method expects the depthInput data to be in what unit -
millimeters? depthOutput is consequently also in mm?
- what's the exact purpose of the depthTo[RGB]Shift table?
- the RGB-CMOS distance #define using 2.4 is different from the
registration data value of 2.3 (as you've already noted in the code). I
don't really see any difference between the two variants - did you?
As a first step towards inclusion, I've decided to rewrite the central
CreateDXDYTables function - please see attachment. I'm not a lawyer, but
I believe this is sufficiently different to the original to qualify for
inclusion in libfreenect. If desired, I can push this to my branch at
https://github.com/floe/libfreenect
Additionally, I think there's some room for improvement in the rest of
the code which might also qualify the result for inclusion in
libfreenect. E.g. the raw2depth function might best be handled by a
16-bit LUT, there are some defines which should be fetched directly from
the camera etc.
Florian
i still don't understand why they're legally incompatible, but i'll
take your word for it.
if you could contact primesense to ask about this specific chunk of
code (Registration.cpp) that would be ideal. then we can resort to the
alternative only if we have to.
there's another bit of code that would be good too, but i haven't
hunted it down. it's the code that implements
xnConvertProjectiveToRealWorld. as best i can tell, it's spread
throughout OpenNI rather than being in a single location. for now we
have a good replacement for that anyway that has been posted to the
mailing list in the past (but should also, in my opinion, be included
in libfreenect).
@florian, awesome!
any instance of real-world depth information in OpenNI that i've seen
is always mm. so yes, depthInput and depthOutput are mm uint16_t (kind
of convenient for in-place processing).
depthToRgbShift table (which is named a few things throughout the
code) is the cornerstone of the registration algorithm. once you have
your image undistorted using the DXDYTable, then you need to
horizontally slide your depth data over. the amount that you slide it
over is related to the depth value itself. instead of doing this
calculation on every depth value, they create a LUT to do it faster.
depthToRgbShift is that LUT. think about this process as the inverse
of depth-from-disparity.
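to spell out that 'inverse of depth-from-disparity' point, here's the
pinhole intuition i have in my head. this is not the Sensor code, just
the idea: the slide is a disparity, so it falls off as 1/z. the focal
length and baseline values are my guesses from the numbers earlier in
the thread (and i'm assuming the 2.3/2.4 cmos distance is in cm).

#include <cstdint>
#include <vector>

std::vector<int> buildDepthToRgbShift(int maxDepthMm = 10000) {
    const double fx = 575.8;        // depth camera focal length in pixels, ~zpd / (2 * zpps)
    const double baselineMm = 23.0; // ir<->rgb cmos distance, assuming the 2.3 value is cm
    std::vector<int> shift(maxDepthMm + 1, 0);
    for (int z = 1; z <= maxDepthMm; z++) {
        shift[z] = (int) (fx * baselineMm / z + 0.5); // pixels to slide horizontally at depth z
    }
    return shift;
}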
switching the 2.4/2.3 thing causes a minor measurable difference in
the results, but i haven't been able to determine which one is "more
correct". we could try reporting it to the original github repo as an
issue and maybe they can comment on it there?
this is a really positive development. here's to hoping we can make
libfreenect as powerful as OpenNI, but more
stable/understandable/community driven :)
best,
kyle