A first, very simplified, Linux kernel driver for the Kinect sensor
device is now available, so you can now use it as a ridiculously
expensive webcam with any v4l2 application.
Here's the code:
http://git.ao2.it/gspca_kinect.git/
And here's some background about it (meant also for non OpenKinect
folks):
http://ao2.it/blog/2010/12/06/kinect-linux-kernel-driver
As you can see this driver is just some "advanced copy&paste" from
libfreenect, plus reducing the packet scan routine to the bare minimum.
Taking some code from libfreenect should be OK as it comes under GPLv2
(dual-licensed with an Apache license) but let me know if you think
there could be any issues.
The gspca framework proved invaluable once again (thanks
Jean-François); for this simple proof-of-concept driver it took care of
the whole isoc transfer setup for us.
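For the curious, the subdriver side boils down to a packet scan callback
along these lines; this is a very condensed sketch, the header layout and
flag values below are placeholders, the real thing is in the repository:

/* Condensed sketch of a gspca packet-scan callback; the stream header
 * fields and flag values here are illustrative, not the real ones. */
#include "gspca.h"

struct pkt_hdr {
        u8 magic[2];    /* per-packet signature */
        u8 flag;        /* start/middle/end of frame marker */
        /* sequence number, timestamp, etc. omitted */
};

static void sd_pkt_scan(struct gspca_dev *gspca_dev, u8 *data, int len)
{
        struct pkt_hdr *hdr = (struct pkt_hdr *)data;

        if (len < sizeof(*hdr))
                return;

        data += sizeof(*hdr);
        len -= sizeof(*hdr);

        if (hdr->flag == 0x01)                  /* first packet of a frame */
                gspca_frame_add(gspca_dev, FIRST_PACKET, data, len);
        else if (hdr->flag == 0x05)             /* last packet of a frame */
                gspca_frame_add(gspca_dev, LAST_PACKET, data, len);
        else
                gspca_frame_add(gspca_dev, INTER_PACKET, data, len);
}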
Now the hard part begins, here's a loose TODO-list:
- Discuss the "fragmentation problem":
  * the webcam kernel driver and the libusb backend of libfreenect
    are not going to conflict with each other in practice, but code
    duplication could be avoided to some degree; we could start
    listing the advantages and disadvantages of a v4l2 backend as
    opposed to a libusb backend for video data in libfreenect (don't
    think in terms of userspace/kernelspace for now).
* Would exposing the accelerometer as an input device make sense
too? The only reason for that is to use the data in already
existing applications. And what about led and motor?
If we agree that the kernel driver approach is not too dangerous
for libfreenect's future, then we could start talking about
submitting code to mainline Linux.
- Check if gspca can handle two video nodes for the same USB device
in a single driver (Kinect sensor uses ep 0x81 for color data and
ep 0x82 for depth data).
- Decide if we want two separate video nodes, or a
combined RGB-D data stream coming from a single video device node.
(I haven't even looked at the synchronization logic yet).
- If a combined format is chosen, settle on a format also usable
  by future RGB-D devices.
Comments and criticism are more than welcome.
Regards,
Antonio
--
Antonio Ospite
http://ao2.it
PGP public key ID: 0x4553B001
A: Because it messes up the order in which people normally read text.
See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
I've got a patchset in the works for this; just trying to view Y16 from
userspace first...
> * Would exposing the accelerometer as an input device make sense
> too? The only reason for that is to use the data in already
> existing applications. And what about led and motor?
Embedded applications may find this helpful...
> If we agree that the kernel driver approach is not too dangerous
> for libfreenect's future, then we could start talking about
> submitting code to mainline Linux.
Always a good idea. I'm pretty sure the motion project [1] could make
use of a v4l device that spits out smoothed depth data...
> - Check if gspca can handle two video nodes for the same USB device
> in a single driver (Kinect sensor uses ep 0x81 for color data and
> ep 0x82 for depth data).
wip.
> - Decide if we want two separate video nodes, or a
> combined RGB-D data stream coming from a single video device node.
> (I haven't even looked at the synchronization logic yet).
There is a timestamp field with every frame passed on by the Kinect; can
that be passed up to v4l?
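For reference, v4l2 already has a per-buffer timestamp that the driver
fills in when it completes a frame, so the question is really whether the
Kinect's own counter can be propagated into it. A minimal userspace sketch
of where that timestamp shows up (assuming mmap streaming is already set
up on video_fd):

/* Sketch: dequeue one frame and read the driver-provided timestamp. */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int dequeue_and_print_ts(int video_fd)
{
        struct v4l2_buffer buf;

        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;

        if (ioctl(video_fd, VIDIOC_DQBUF, &buf) < 0)
                return -1;

        printf("frame %u at %ld.%06ld\n", buf.sequence,
               (long)buf.timestamp.tv_sec, (long)buf.timestamp.tv_usec);

        return ioctl(video_fd, VIDIOC_QBUF, &buf);      /* requeue it */
}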
Also, if we decide to stay with two video nodes, some udev magic could
create:
/dev/kinect/depth -> ../videoX
/dev/kinect/camera -> ../videoY
/dev/kinect/ir -> ../videoZ
...
For consistent naming.
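Something like the following could do it; this is only a sketch, and the
card names are whatever the driver ends up reporting (here assumed to be
"Kinect camera" and "Kinect depth"):

# /etc/udev/rules.d/99-kinect.rules (sketch, attribute values assumed)
SUBSYSTEM=="video4linux", ATTR{name}=="Kinect camera", SYMLINK+="kinect/camera"
SUBSYSTEM=="video4linux", ATTR{name}=="Kinect depth", SYMLINK+="kinect/depth"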
thx,
Jason.
> On 12/06/2010 10:18 PM, Antonio Ospite wrote:
[...]
> > Now the hard part begins, here's a loose TODO-list:
> > - Discuss the "fragmentation problem":
> > * the webcam kernel driver and the libusb backend of libfreenect
> > are not going to conflict with each other in practice, but code
> > duplication could be avoided to some degree; we could start
> > listing the advantages and disadvantages of a v4l2 backend as
> > opposed to a libusb backend for video data in libfreenect (don't
> > think in terms of userspace/kernelspace for now).
>
> I think that being able to use the kinect as just a webcam in apps like cheese,
> skype and google chat is a major advantage of the kernel driver. I also
> think that in the long run it is not useful to have 2 different drivers.
>
Well, keep in mind that libfreenect is meant to be a portable layer
across different platforms, so the libusb backend would be around even
if the project ended up not using it on Linux.
[...]
>
> > - Check if gspca can handle two video nodes for the same USB device
> > in a single driver (Kinect sensor uses ep 0x81 for color data and
> > ep 0x82 for depth data).
>
> Currently gspca cannot handle 2 streaming endpoints / 2 video nodes for 1
> USB device. I've been thinking about this and I think that the simplest
> solution is to simply pretend the Kinect's video functionality consists of
> 2 different devices / USB interfaces in a multifunction device. Even though
> it does not, what I'm proposing is for the Kinect driver to call
> gspca_dev_probe twice with 2 different sd_desc structures, thus creating
> 2 /dev/video nodes, frame queues and isoc management variables.
>
> This means that the alt_xfer function in gspca.c needs to be changed to not
> always return the first isoc ep. We need to add an ep_nr variable to the
> cam struct and when that is set alt_xfer should search for the ep with that
> number and return that (unless the wMaxPacketSize for that ep is 0 in the
> current alt setting in which case NULL should be returned).
>
Thanks for these directions; it doesn't sound too hard to do.
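For illustration, that alt_xfer() change could look roughly like this
(only a sketch against gspca.c; the ep_nr field in struct cam does not
exist yet, and 0 would keep the current behaviour):

/* Sketch: let a subdriver pin the isoc endpoint by number (e.g. 1 or 2
 * for ep 0x81/0x82); ep_nr == 0 falls back to "first matching ep". */
static struct usb_host_endpoint *alt_xfer(struct usb_host_interface *alt,
                                          int xfer, u8 ep_nr)
{
        struct usb_host_endpoint *ep;
        int i;

        for (i = 0; i < alt->desc.bNumEndpoints; i++) {
                ep = &alt->endpoint[i];

                if ((ep->desc.bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) != xfer)
                        continue;

                if (ep_nr && (ep->desc.bEndpointAddress &
                              USB_ENDPOINT_NUMBER_MASK) != ep_nr)
                        continue;

                /* requested ep is not usable in this alt setting */
                if (le16_to_cpu(ep->desc.wMaxPacketSize) == 0)
                        return NULL;

                return ep;
        }
        return NULL;
}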
> > - Decide if we want two separate video nodes, or a
> > combined RGB-D data stream coming from a single video device node.
> > (I haven't even looked at the synchronization logic yet).
>
> I think 2 separate nodes is easiest, also see above.
>
That's the way I am going for now.
> On Monday, December 06, 2010 22:18:47 Antonio Ospite wrote:
[...]
> > Now the hard part begins, here's a loose TODO-list:
> > - Discuss the "fragmentation problem":
> > * the webcam kernel driver and the libusb backend of libfreenect
> > are not going to conflict with each other in practice, but code
> > duplication could be avoided to some degree; we could start
> > listing the advantages and disadvantages of a v4l2 backend as
> > opposed to a libusb backend for video data in libfreenect (don't
> > think in terms of userspace/kernelspace for now).
> > * Would exposing the accelerometer as an input device make sense
> > too?
>
> How do other accelerometer drivers do this?
>
An input device, of course; the question was more with regard to
libfreenect than Linux...
> > The only reason for that is to use the data in already
> > existing applications. And what about led and motor?
>
> We are talking about LED(s?) on the webcam and the motor controlling the webcam?
> That is typically also handled via v4l2, usually by the control API.
>
I have to check whether the control API fits this case; the LED (only
one) and motor are on another USB device: the Kinect sensor appears as
a hub with several distinct devices (camera, motor/LED/accel, audio)
plugged into it.
[...]
> > - Decide if we want two separate video nodes, or a
> > combined RGB-D data stream coming from a single video device node.
> > (I haven't even looked at the synchronization logic yet).
>
> My gut feeling is that a combined RGB-D stream is only feasible if the two
> streams as received from the hardware are completely in sync. If they are in
> sync, then it would probably simplify the driver logic to output a combined
> planar RGB+D format (one plane of RGB and one of D). Otherwise two nodes are
> probably better.
>
Agreed.
> > - If a combined format is chosen, settle on a format also usable
> > by future RGB-D devices.
>
> In general the video format should match what the hardware supplies. Any
> format conversions should take place in libv4lconvert. Format conversions do
> not belong in kernel space, that's much better done in userspace.
>
OK, so if I wanted to visualize the depth data in a general v4l2 app,
then libv4lconvert should provide some conversion routine to a 2D image
format like the "RGB heat map" in libfreenect; the question here then
becomes:
Is it OK to have depth data, which strictly speaking is not video
data, coming out of a video device node?
Are there any other examples of such "abuses" in kernel drivers right
now?
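To make the first point concrete, such a libv4lconvert-style routine could
be as simple as the following; purely illustrative, the function name and
scaling are mine, and libfreenect's actual heat map uses a proper lookup
table:

/* Illustrative only: map 11-bit depth samples to a crude RGB24 ramp. */
#include <stdint.h>

static void depth11_to_rgb24(const uint16_t *depth, uint8_t *rgb, int pixels)
{
        int i;

        for (i = 0; i < pixels; i++) {
                unsigned int d = depth[i] & 0x7ff;      /* 11 valid bits */
                uint8_t v = 255 - (d >> 3);             /* nearer = brighter */

                *rgb++ = v;             /* R */
                *rgb++ = v / 2;         /* G, just to tint it */
                *rgb++ = 255 - v;       /* B */
        }
}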
> Hope this helps. Sounds like this is a cool device :-)
>
Yeah, I played a little bit with accelerometers and image stabilization
(a primordial version of it), and it is fun:
http://blip.tv/file/get/Ao2-KinectImageStabilization247.webm
Good work! Here are my thoughts on the kernel driver:
I don't think we should move Kinect support exclusively over to the
current standard kernel interfaces (v4l, input, ALSA...), as they are too
generic and people like to experiment with and use the features of the
Kinect to their fullest. However, kernel drivers do have their
advantages, and of course people also want to use the Kinect as a
standard input device where it makes sense.
I think the best solution would be to have a kernel driver that
implements the basics of the Kinect (v4l RGB video for webcam use,
mainly, and possibly a simple depth feed), as well as provides the
streaming layer for libfreenect to use (iso packet header parsing and
frame reassembly, but no data processing). Libfreenect would hook into
this as a backend (the current abstraction is at the USB level but this
could be changed) and therefore get the benefits of kernel-side iso
handling and streaming (probably 10x less context switching and more
reliable performance for loaded systems) while still letting the lib
handle configuration and letting us experiment with the different modes
and formats without messing with the kernel code. And without
libfreenect, users get to use the Kinect as a regular webcam.
I'm not sure what this "power" kernel interface would look like, but it
could be v4l2 as long as we can make sure we can expose all the Kinect
specifics at a lower level (basically weird/compressed/packed frame
formats, the timestamps, and raw control commands/responses). Also, I
haven't checked, but I assume v4l2 supports select()/poll() for frames
(this is critical in order to interoperate with the way libusb does
things, for our other subdevices). Another advantage of not exposing all
the Kinect functionality at high level via v4l2 methods is that we don't
have to duplicate that code in the kernel (otherwise libfreenect would
just become one huge switch at the high level to select between doing
most everything in the kernel and doing most everything in userspace,
and then what's the point). The goal would be a thin kernel driver in
libfreenect mode, plus a thick but basic mode for general compatibility
with other v4l apps.
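(If it does, waiting for a frame would look roughly like this; sketch
only, assuming buffers are already queued and streaming is on:)

/* Sketch: block until the capture node has a frame, then dequeue it. */
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int wait_for_frame(int video_fd, struct v4l2_buffer *buf)
{
        struct pollfd pfd = { .fd = video_fd, .events = POLLIN };

        if (poll(&pfd, 1, -1) <= 0 || !(pfd.revents & POLLIN))
                return -1;

        memset(buf, 0, sizeof(*buf));
        buf->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf->memory = V4L2_MEMORY_MMAP;

        return ioctl(video_fd, VIDIOC_DQBUF, buf);
}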
I don't think we need an accelerometer kernel driver, but if someone has
a use case for accelerometer data delivered via evdev with existing code
it could be done. There's not much point in making libfreenect talk to
it, though; for that subdevice we should just ask libusb to unbind the
kernel driver if one gets written.
Audio is still up in the air, but an ALSA driver would of course make
sense for general microphone use; once we figure out what we want to do
with audio this will become clearer. I suspect that getting echo
cancellation to work at all will be near impossible without a hard
realtime framework extending outside the kernel (i.e. JACK), and that
won't play nice with multiple sound cards, so I think advanced features
have the greatest chance of working with a setup like this, and thus
without using a kernel driver:
ALSA card <-- JACK <-> kinect-jackmic <-> libfreenect <-> Kinect
                            |
                            +--> echo-cancelled audio to other JACK clients
i.e. we *could* expose the Kinect as a full-duplex ALSA sound card but
at best you'd need a proper realtime framework and server program doing
the echo cancellation and such, and somehow make software talk to two
ALSA cards at once (Kinect and the main soundcard), and a way to set up
the echo cancellation parameters (not to mention firmware issues), so
what's the point, just make libfreenect talk straight to the hardware.
Simple mic use (no echo cancellation, no downloading the filter matrix,
just a 4-channel mic), on the other hand, makes perfect sense as a
bog-standard ALSA driver.
libusb-1.0 support will always be in there as a lowest common
denominator and to support other OSes, of course.
FWIW, a combined video stream is hard because the sources aren't
framelocked, not to mention the video streams aren't aligned (and as far
as I can tell, no, the Kinect's standard firmware cannot do either of
these, even though its chipset can). I think figuring out how to align
things both temporally and spatially belongs squarely in userspace, so
the Kinect should show up as two v4l2 devices: one for video, and one
for depth. This also lets us play with RGB/IR and depth parameters fully
independently (they really are two separate streams) and makes perfect
logical sense given how the Kinect hardware is implemented.
On 12/06/2010 10:18 PM, Antonio Ospite wrote:
> Hi,
>
> a first, very simplified, linux kernel driver for the Kinect sensor
> device is now available, so you now can use it as a ridiculously
> expensive webcam with any v4l2 application.
>
> Here's the code:
> http://git.ao2.it/gspca_kinect.git/
>
> And here's some background about it (meant also for non OpenKinect
> folks):
> http://ao2.it/blog/2010/12/06/kinect-linux-kernel-driver
>
> As you can see this driver is just some "advanced copy&paste" from
> libfreenect, plus reducing the packet scan routine to the bare minimum.
> Taking some code from libfreenect should be OK as it comes under GPLv2
> (dual-licensed with an Apache license) but let me know if you think
> there could be any issues.
>
> The gspca framework proved invaluable once again (thanks
> Jean-François); for this simple proof-of-concept driver it took care of
> the whole isoc transfer setup for us.
--
Hector Martin (hec...@marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc
Hi,
> As you can see this driver is just some "advanced copy&paste" from
> libfreenect, plus reducing the packet scan routine to the bare minimum.
> Taking some code from libfreenect should be OK as it comes under GPLv2
> (dual-licensed with an Apache license) but let me know if you think
> there could be any issues.
This is actually precisely the reason why I made sure we had GPLv2 as an
option :)
Cheers,