First off: WARNING: RECENT CHANGES WILL PROBABLY BREAK ANY SOFTWARE
USING LIBFREENECT. If I have your attention now, good.
I've finally gotten around to implementing library support for
multiple resolutions and enabled the 1280x1024 mode for the RGB camera
and IR camera in libfreenect. I've submitted these changes upstream
[1] and qDot has merged them into unstable, but I'd like some extra
review for a couple of reasons:
1) This patchset adds, removes, and breaks API. A brief overview:
I've done away with static defines that no longer make sense with
multiple resolution support, like FREENECT_FRAME_PIX. I've added a
clear way to query the appropriate size and dimensions of a buffer
(see freenect_get_video_frame_size). You should set a resolution
(FREENECT_RESOLUTION_{LOW,MEDIUM,HIGH}) in addition to the pixel format
before trying to start the video stream. I'd like to hear if the API
changes are sound, or if anyone has suggestions for other things to
leave in, add, remove, or change.
2) Bindings authors, in addition to compile fixes, I need your help to
expose this functionality to users. I currently have compile fixes
for fakenect and cppview; c_sync will need some love (primarily around
the new set-a-resolution-before-streaming constraint). Since python
depends on c_sync, until c_sync is fixed, RGB in python will be broken
too. All the other non-C languages will probably need a bit of work
too.
3) I want other people's suggestions on how best to show off this
support in a demo. glview is getting more and more complex and is
difficult for beginners to follow; and throwing high resolution in the
mix would also require a much larger window (or we'd just see the
high-res picture downsampled). Rather than try to make a
do-everything demo, I've added a new example, hiview, which is
basically a simpler viewer that has two separate windows and resizes
the video window on demand. It also demonstrates correct usage of the
new API.
Obviously, I'd like to ensure a smooth transition with as little
trouble as possible, so I welcome all suggestions, feedback, and
patches. If you're the author of a set of bindings, it'd be great if
we could talk.
In short: if something in unstable recently broke your app or
libfreenect, it's probably my fault.
libfreenect users: Bear with me while I break everything now. It
beats breaking API later, after a versioned release.
libfreenect devs: If you test these changes, give feedback, and fix
your bindings, I'd really appreciate it. Otherwise, I'll try to read
up on the various bindings and see if I can't get them at least
functional again myself.
Many thanks,
Drew Fisher (zarvox)
- Can you give an estimate of the effective depth resolution at
1280x1024? I'd suppose it's not much larger than at 640x480, since the
effective resolution is determined by the speckle pattern?
- AFAIK it's not possible to disable one of the two streams (RGB/depth)
to get more bandwidth for the other, correct?
Thanks,
Florian
About 10 FPS. It saturates the same isochronous pipe that gets used
for the 640x480@30fps video.
-Drew
I would have done that, except that the IR camera doesn't use 640x480
- it uses 640x488, and I wanted to avoid people doing manual malloc()s
of buffers that would surprise them by being 8 rows too small, causing
segfaults. I'm also not entirely sure what the low resolution setting
is (I suspect it's 320x240, but I haven't got it working yet), so
assigning it a number would be premature.
Your points are still valid, though; we could call it
FREENECT_RESOLUTION_VGA and FREENECT_RESOLUTION_SXGA if you think
that'd be better, and people wouldn't make the "it's not actually
exactly VGA" mistake.
-Drew
IR camera, not depth camera. This is the raw, unprocessed IR stream
that we get at 1280x1024. The depth stream is completely unaffected
by this patchset (well, except for the removal of FREENECT_FRAME_PIX).
> - AFAIK it's not possible to disable one of the two streams (RGB/depth)
> to get more bandwidth for the other, correct?
That's correct. Both isochronous transfer "pipes" are fixed-size, as
described by the USB Endpoint Descriptors; the Kinect merely stuffs
larger (and fewer) frames into one of the pipes, as quickly as it can.
-Drew
Probably best to stick with 640x480, unless you've got a reason not
to. 1280x1024 has a third the framerate, is more likely to drop
frames (more packets to lose, but the same threshold for unacceptable
packet loss), is more expensive to do computer vision on (more
pixels), and has the strange constraint that you can't run 1280x1024
IR and the depth camera at the same time, which kinda defeats the
point of the Kinect in the first place.
Hmm, that should probably go into the documentation, somewhere.
> 2) If different use case scenarios exist what are they and what is the impact of not supporting them.
I'm not sure what you're asking with this question. My thoughts were:
I'll try to have this library/driver support anything the hardware can
actually do.
> I view the PS1080 as a Visual Processing Unit for processing 3D depth information input into a computer, similar to the way we view a GPU for processing 3D data output from a computer.
>
> The resolution and performance will increase over time until we hit a standard similar to 1080P, but we need to always be able to specify the resolution manually. Examples: bandwidth (we may not be using USB but the internet), disk storage (maybe we want to save the data), etc.
I'm not the one who picked the resolutions to support - the Kinect
firmware only offers a couple. If you want to process/trim frames
down to an arbitrary resolution, that could probably be better done
with libv4l or something. Further, you ask the driver:
freenect_get_video_frame_size(FREENECT_VIDEO_RGB, FREENECT_RESOLUTION_MEDIUM);
and it replies with a
struct freenect_frame_size {
    int width;   /* 640 */
    int height;  /* 480 */
    int bytes;   /* 921600 == 640*480*3 */
};
So it's not like you don't know the resolution of the frames you're
getting - it's just available at runtime, rather than compile time.
Maybe that's still too abstract or confusing; I was aiming for a
resolution-agnostic, pixel format-agnostic way for people to determine
frame information at runtime, which would simplify looping over rows
or columns for processing. It made some of the library internals
simpler - you can refactor all the messy frame information into
freenect_get_video_frame_size(), rather than leaving a pile of if-else
blocks scattered all over the library.
Does that answer your question?
-Drew
Good to hear.
> I would go one step further and have libfreenect expose a list of
> "frame-modes" and let users cycle/iterate through them.
> const frame_desc* freenect_get_first_video_mode();
> const frame_desc* freenect_get_next_video_mode(const frame_desc*);
> a frame_desc structure could look like this:
> struct frame_desc
> {
> enum mode = { VIDEO_BAYER, VIDEO_RGB, VIDEO_IR, ... };
> int width;
> int height;
> int pitch;
> int pixeldepth;
> int bytes;
> };
> I believe that something like this would make the API more scalable.
> One could also thing to have some "shortcuts" to some frame_desc* via the
> signature you have just posted:
> frame_desc*
> get_first_video_mode(FREENECT_VIDEO_RGB, FREENECT_RESOLUTION_MEDIUM);
I'm a little confused by this part, but I've never really done real
API design, so bear with me.
I'm assuming that frame_desc would obviate freenect_frame_size as it
is more informative. The purpose of freenect_get_first_video_mode()
and freenect_get_next_video_mode() would be for enumerating all the
format combinations that the driver supports at runtime.
Would actually setting the resolution or pixel format remain in the
current set_video_resolution and set_video_format functions, or
should there be a different API for that? If the former: we probably
need the frame_desc struct to contain the freenect_video_resolution
and freenect_video_format enum values, so they can be provided to the
appropriate setters. If the latter: what might it look like? I don't
really like the idea of having the driver dealing with validating
whether an arbitrary user-provided frame_desc is valid or matches one
we provide - I'd rather have a couple functions to set the couple
knobs we allow users to tweak.
So, with that in mind, here's what I propose to do:
1) Rename freenect_frame_size to freenect_video_mode or something like
that, and add several more informative fields, including the
freenect_video_format and freenect_video_resolution values that the
driver associates with this particular mode:
struct freenect_video_mode {
    freenect_video_format fmt;
    freenect_video_resolution res;
    int width;
    int height;
    int bytes;
    int fps;      // Well, a best-case approximate value, anyway.
    //int pitch;  // Personally, I'd actually rather just mandate
                  // contiguous buffers for simplicity. Other opinions?
    int pixeldepth;
};
2) Add functions:
freenect_video_mode freenect_get_first_video_mode()
freenect_video_mode freenect_get_next_video_mode(freenect_video_mode)
to facilitate enumeration of supported modes. There can be at most
one mode with a particular (freenect_video_format,
freenect_video_resolution) pair. (Open question: how do we represent
that we've gone through all the available video modes?
freenect_video_mode.bytes = 0?) I'll also add a convenience function:
freenect_video_mode freenect_get_video_mode(freenect_video_format,
freenect_video_resolution)
which will return the unique mode matching that (format, resolution) pair.
3) Change freenect_get_current_video_frame_size() to
freenect_video_mode freenect_get_current_video_mode(freenect_device *dev)
Alternately, remove this convenience function entirely, and rework the
bindings that still use it to track this sort of thing internally.
4) Leave the freenect_set_video_* functions as-is.
5) Fix all the examples to match.
Comments?
Thanks,
Drew
TODO: fix the rest of the bindings so they work as before. My pull
request currently covers c_sync, C++, AS3, and OpenCV. Matlab appears
to only use c_sync directly, so it should be fine. I still need to
take care of (or receive assistance with) Python, C#, Java, and Ruby.
Note that this is only bringing the bindings back to their
functionality before the API was switched out from under them, and
does not include exposing the new high resolution support. That will
require each maintainer to break (or at least add to) their language's
bindings.
Many thanks for your patience. Hopefully we'll have a release soon!
-Drew Fisher
[1] - https://github.com/zarvox/libfreenect/commit/844e17b90aa62581598dba602a98b7d80baaecd0
[2] - https://github.com/OpenKinect/libfreenect/pull/203