High resolution support (warning: API breakage!)

drew.m...@gmail.com

unread,

Feb 15, 2011, 12:15:13 AM2/15/11

to openk...@googlegroups.com

Hey all,

First off: WARNING: RECENT CHANGES WILL PROBABLY BREAK ANY SOFTWARE
USING LIBFREENECT. If I have your attention now, good.

I've finally gotten around to implementing library support for
multiple resolutions and enabled the 1280x1024 mode for the RGB camera
and IR camera in libfreenect. I've submitted these changes upstream
[1] and qDot has merged them into unstable, but I'd like some extra
review for a couple of reasons:

1) This patchset adds, removes, and breaks API. A brief overview:
I've done away with static defines that no longer make sense with
multiple resolution support, like FREENECT_FRAME_PIX. I've added a
clear way to query the appropriate size and dimensions of a buffer
(see freenect_get_video_frame_size). You should set a resolution
(FREENECT_RESOLUTION_{LOW,MEDIUM,HIGH} in addition to the pixel format
before trying to start the video stream. I'd like to hear if the API
changes are sound, or if anyone has suggestions for other things to
leave in, add, remove, or change.

2) Bindings authors, in addition to compile fixes, I need your help to
expose this functionality to users. I currently have compile fixes
for fakenect and cppview; c_sync will need some love (primarly around
the new set-a-resolution-before-streaming constraint). Since python
depends on c_sync, until c_sync is fixed, RGB in python will be broken
too. All the other non-C languages will probably need a bit of work
too.

3) I want other people's suggestions on how best to show off this
support in a demo. glview is getting more and more complex and is
difficult for beginners to follow; and throwing high resolution in the
mix would also require a much larger window (or we'd just see the
high-res picture downsampled). Rather than try to make a
do-everything demo, I've added a new example, hiview, which is
basically a simpler viewer that has two separate windows and resizes
the video window on demand. It also demonstrates correct usage of the
new API.

Obviously, I'd like to ensure a smooth transition with as little
trouble as possible, so I welcome all suggestions, feedback, and
patches. If you're the author of a set of bindings, it'd be great if
we could talk.

In short: if something in unstable recently broke your app or
libfreenect, it's probably my fault.
libfreenect users: Bear with me while I break everything now. It
beats breaking API later, after a versioned release.
libfreenect devs: If you test these changes, give feedback, and fix
your bindings, I'd really appreciate it. Otherwise, I'll try to read
up on the various bindings and see if I can't get them at least
functional again myself.

Many thanks,
Drew Fisher (zarvox)

[1] - https://github.com/OpenKinect/libfreenect/pull/203

Florian Echtler

unread,

Feb 15, 2011, 1:59:29 AM2/15/11

to openk...@googlegroups.com

On Mon, 2011-02-14 at 21:15 -0800, drew.m...@gmail.com wrote:
> I've finally gotten around to implementing library support for
> multiple resolutions and enabled the 1280x1024 mode for the RGB camera
> and IR camera in libfreenect. I've submitted these changes upstream
> [1] and qDot has merged them into unstable, but I'd like some extra
> review for a couple of reasons:

Awesome, I didn't know the hardware was capable of that. What's the
framerate at this resolution?

Florian
--
SENT FROM MY DEC VT50 TERMINAL

drew.m...@gmail.com

unread,

Feb 15, 2011, 2:37:59 AM2/15/11

to openk...@googlegroups.com

About 10 FPS. It saturates the same isochronous pipe that gets used
for the 640x480@30fps video.

-Drew

David Garcia Garzon

unread,

Feb 15, 2011, 7:52:57 AM2/15/11

to OpenKinect

Shouldn't a more specific resolution macro have more sense? i mean
FREENECT_RESOLUTION_{LOW,MEDIUM,HIGH} is too relative. Maybe including
numeric dimensions in the macro name is more future proof and
informative. For example:
FREENECT_RESOLUTION_1280x1024.

David.

On 15 feb, 06:15, "drew.m.fis...@gmail.com" <drew.m.fis...@gmail.com>
wrote:

Florian Echtler

unread,

Feb 15, 2011, 8:05:22 AM2/15/11

to openk...@googlegroups.com

Thanks Drew - two more brief questions:

- Can you give an estimate of the effective depth resolution at
1280x1024? I'd suppose it's not much larger than at 640x480, since the
effective resolution is determined by the speckle pattern?

- AFAIK it's not possible to disable one of the two streams (RGB/depth)
to get more bandwidth for the other, correct?

Thanks,

Nink

unread,

Feb 15, 2011, 8:51:14 AM2/15/11

to openk...@googlegroups.com

Probably the two questions are

1) What's the optimal resolutions or is it dependant on specific use case scenarios.

2) If different use case scenarios exist what are they and what is the impact of not supporting them.

I view the PS1080 as a Visual Processing Unit for processing 3D depth information input into a computer similiar to the way we view a GPU for processing 3D data output from a computer.

The resolution and performance will increase over time until we hit a standard similar to 1080P but we need to always be able to specify the resolution manually. Examples are bandwidth-we may not be using usb but internet, disk storage - maybe we want to save the data etc
Sent from my BlackBerry

drew.m...@gmail.com

unread,

Feb 15, 2011, 12:29:01 PM2/15/11

to openk...@googlegroups.com

On Tue, Feb 15, 2011 at 4:52 AM, David Garcia Garzon
<david....@upf.edu> wrote:
> Shouldn't a more specific resolution macro have more sense? i mean
> FREENECT_RESOLUTION_{LOW,MEDIUM,HIGH} is too relative. Maybe including
> numeric dimensions in the macro name is more future proof and
> informative. For example:
> FREENECT_RESOLUTION_1280x1024.

I would have done that, except that the IR camera doesn't use 640x480
- it uses 640x488, and I wanted to avoid people doing manual malloc()s
of buffers that would surprise them by being 8 rows too small, causing
segfaults. I'm also not entirely sure what the low resolution setting
is (though I bet it's 320x240, I haven't got it working yet), so
assigning it a number would be premature.

Your points are still valid, though; we could call it
FREENECT_RESOLUTION_VGA and FREENECT_RESOLUTION_SXGA if you think
that'd be better, and people wouldn't make the "it's not actually
exactly VGA" mistake.

-Drew

drew.m...@gmail.com

unread,

Feb 15, 2011, 12:32:59 PM2/15/11

to openk...@googlegroups.com

On Tue, Feb 15, 2011 at 5:05 AM, Florian Echtler <fl...@butterbrot.org> wrote:
> Thanks Drew - two more brief questions:
>
> - Can you give an estimate of the effective depth resolution at
> 1280x1024? I'd suppose it's not much larger than at 640x480, since the
> effective resolution is determined by the speckle pattern?

IR camera, not depth camera. This is the raw, unprocessed IR stream
that we get at 1280x1024. The depth stream is completely unaffected
by this patchset (well, except for the removal of FREENECT_FRAME_PIX).

> - AFAIK it's not possible to disable one of the two streams (RGB/depth)
> to get more bandwidth for the other, correct?

That's correct. Both isochronous transfer "pipes" are fixed-size, as
described by the USB Endpoint Descriptors; the Kinect merely stuffs
larger (and fewer) frames into one of the pipes, as quickly as it can.

-Drew

drew.m...@gmail.com

unread,

Feb 15, 2011, 1:00:46 PM2/15/11

to openk...@googlegroups.com

On Tue, Feb 15, 2011 at 5:51 AM, Nink <nin...@gmail.com> wrote:
> Probably the two questions are
>
> 1) What's the optimal resolutions or is it dependant on specific use case scenarios.

Probably best to stick with 640x480, unless you've got a reason not
to. 1280x1024 has a third the framerate, is more likely to drop
frames (more packets to lose, but the same threshold for unacceptable
packet loss), is more expensive to do computer vision on (more
pixels), and has the strange constraint that you can't run 1280x1024
IR and the depth camera at the same time, which kinda defeats the
point of the Kinect in the first place.

Hmm, that should probably go into the documentation, somewhere.

> 2) If different use case scenarios exist what are they and what is the impact of not supporting them.

I'm not sure what you're asking with this question. My thoughts were:
I'll try to have this library/driver support anything the hardware can
actually do.

> I view the PS1080 as a Visual Processing Unit for processing 3D depth information input into a computer similiar to the way we view a GPU for processing 3D data output from a computer.
>
> The resolution and performance will increase over time until we hit a standard similar to 1080P but we need to always be able to specify the resolution manually. Examples are bandwidth-we may not be using usb but internet, disk storage - maybe we want to save the data etc

I'm not the one who picked the resolutions to support - the Kinect
firmware only offers a couple. If you want to process/trim frames
down to an arbitrary resolution, that could probably be better done
with libv4l or something. Further, you ask the driver:

freenect_get_video_frame_size(FREENECT_VIDEO_RGB, FREENECT_RESOLUTION_MEDIUM);

and it replies with a

struct freenect_frame_size {
int width = 640;
int height = 480;
int bytes = 921600; // 640*480*3
}

So it's not like you don't know the resolution of the frames you're
getting - it's just available at runtime, rather than compile time.
Maybe that's still too abstract or confusing; I was aiming for a
resolution-agnostic, pixel format-agnostic way for people to determine
frame information at runtime, which would simplify looping over rows
or columns for processing. It made some of the library internals
simpler - you can refactor all the messy frame information into
freenect_video_get_frame_size(), rather than leaving a pile of if-else
all over the library.

Does that answer your question?

-Drew

Marcos Slomp

unread,

Feb 15, 2011, 8:50:48 PM2/15/11

to openk...@googlegroups.com

> (...) It made some of the library internals

> simpler - you can refactor all the messy frame information into
> freenect_video_get_frame_size(), rather than leaving a pile of if-else
> all over the library.

I agree with this design because polluting the API with lots of macros is unnecessary.

I would go one step further and have libfreenect expose a list of "frame-modes" and let users cycle/iterate through them.

const frame_desc* freenect_get_first_video_mode();

const frame_desc* freenect_get_next_video_mode(const frame_desc*);

a frame_desc structure could look like this:

struct frame_desc

{

enum mode = { VIDEO_BAYER, VIDEO_RGB, VIDEO_IR, ... };

int width;

int height;

int pitch;

int pixeldepth;

int bytes;

};

I believe that something like this would make the API more scalable.

One could also thing to have some "shortcuts" to some frame_desc* via the signature you have just posted:

frame_desc* get_first_video_mode(FREENECT_VIDEO_RGB, FREENECT_RESOLUTION_MEDIUM);

drew.m...@gmail.com

unread,

Feb 17, 2011, 4:27:37 AM2/17/11

to openk...@googlegroups.com

On Tue, Feb 15, 2011 at 5:50 PM, Marcos Slomp <msl...@gmail.com> wrote:
>> (...) It made some of the library internals
>> simpler - you can refactor all the messy frame information into
>> freenect_video_get_frame_size(), rather than leaving a pile of if-else
>> all over the library.
> I agree with this design because polluting the API with lots of macros
> is unnecessary.

Good to hear.

> I would go one step further and have libfreenect expose a list of
> "frame-modes" and let users cycle/iterate through them.
> const frame_desc* freenect_get_first_video_mode();
> const frame_desc* freenect_get_next_video_mode(const frame_desc*);
> a frame_desc structure could look like this:
> struct frame_desc
> {
>   enum mode = { VIDEO_BAYER, VIDEO_RGB, VIDEO_IR, ... };
>   int width;
>   int height;
>   int pitch;
>   int pixeldepth;
>   int bytes;
> };
> I believe that something like this would make the API more scalable.
> One could also thing to have some "shortcuts" to some frame_desc* via the
> signature you have just posted:
> frame_desc*
> get_first_video_mode(FREENECT_VIDEO_RGB, FREENECT_RESOLUTION_MEDIUM);

I'm a little confused by this part, but I've never really done real
API design, so bear with me.

I'm assuming that frame_desc would obviate freenect_frame_size as it
is more informative. The purpose of freenect_get_first_video_mode()
and freenect_get_next_video_mode() would be for enumerating all the
format combinations that the driver supports at runtime.

Would actually setting the resolution or pixel format would remain in
the current set_video_resolution and set_video_format functions, or
should there be a different API for that? If the former: we probably
need the frame_desc struct to contain the freenect_video_resolution
and freenect_video_format enum values, so they can be provided to the
appropriate setters. If the latter: what might it look like? I don't
really like the idea of having the driver dealing with validating
whether an arbitrary user-provided frame_desc is valid or matches one
we provide - I'd rather have a couple functions to set the couple
knobs we allow users to tweak.

So, with that in mind, here's what I propose to do:

1) Rename freenect_frame_size to freenect_video_mode or something like
that, and add several more informative fields, including the
freenect_video_format and freenect_video_resolution values that the
driver associates with this particular mode:

struct freenect_video_mode {
freenect_video_format fmt;
freenect_video_resolution res;
int width;
int height;
int bytes;
int fps; // Well, a best-case approximate value, anyway.
//int pitch; // Personally, I'd actually rather just mandate
contiguous buffers for simplicity. Other opinions?
int pixeldepth;
};

2) Add functions:

freenect_video_mode freenect_get_first_video_mode()
freenect_video_mode freenect_get_next_video_mode(freenect_video_mode)

to facilitate enumeration of supported modes. There can be at most
one mode with a particular (freenect_video_format,
freenect_video_resolution) pair. (Open question: how do represent
that we've gone through all the available video modes?
freenect_video_mode.bytes = 0?) I'll also add a convenience function:

freenect_video_mode freenect_get_video_mode(freenect_video_format,
freenect_video_resolution)

which will return the unique

3) Change freenect_get_current_video_frame_size() to

freenect_video_mode freenect_get_current_video_mode(freenect_device *dev)

Alternately, remove this convenience function entirely, and rework the
bindings that still use it to track this sort of thing internally.

4) Leave the freenect_set_video_* functions as-is.

5) Fix all the examples to match.

Comments?

Thanks,
Drew

Marcos Slomp

unread,

Feb 17, 2011, 10:44:04 PM2/17/11

to openk...@googlegroups.com

> I'm a little confused by this part, but I've never really done real
> API design, so bear with me.

Neither did I =)

> //int pitch; // Personally, I'd actually rather just mandate

> contiguous buffers for simplicity. Other opinions?

I agree with contiguous buffers; the pitch may or not be helpful in the future, but currently it is redundant data.

> Would actually setting the resolution or pixel format would remain in
> the current set_video_resolution and set_video_format functions, or
> should there be a different API for that? If the former: we probably
> need the frame_desc struct to contain the freenect_video_resolution
> and freenect_video_format enum values, so they can be provided to the
> appropriate setters. If the latter: what might it look like? I don't
> really like the idea of having the driver dealing with validating
> whether an arbitrary user-provided frame_desc is valid or matches one
> we provide - I'd rather have a couple functions to set the couple
> knobs we allow users to tweak.

Here is my API idea:

We could "force" the user on using only frame_desc that are supported by libfreenect.

If we return a "const frame_desc*" (pointer), which should be allocated/initialized/managed by libfreenect, identifying a video mode is just a matter of doing a pointer comparison.

The "const" qualifier will prevent the user on writing on it (err, well, sort of...)

This is not a elegant solution (since we are actually exposing some memory-space of libfreenect to the client), but get the job done fairly quickly.

If the implicit memory exposure is really a problem, we could have something like:

int freenect_get_number_of_video_modes(...)

and then:

const frame_desc freenect_describe_video_mode(int i)

With an API like this, the "iterator-terminator" comes naturally, and we could then let the user do:

freenect_set_video_mode(const frame_desc* fd)

Internally, the freenect_set_video_mode(const frame_desc) will just peek over the "freenect_video_format fmt" and "freenect_video_resolution res" fields, ignoring width and height.

Even if the user decides to mess around width and height or whatever, all libfreenect wants is fmt and res.

We could then also expose some shortcuts:

const frame_desc freenect_find_video_mode(freenect_video_format fmt, freenect_video_resolution res)

Any thoughts?

drew.m...@gmail.com

unread,

Mar 16, 2011, 4:33:02 AM3/16/11

to openk...@googlegroups.com

Okay, after some off-list discussion, Marcos Slomp and I have come up
with the API in [1]. I've updated my pull request [2] accordingly.

TODO: fix the rest of the bindings so they work as before. My pull
request currently covers c_sync, C++, AS3, and OpenCV. Matlab appears
to only use c_sync directly, so it should be fine. I still need to
take care of (or receive assistance with) Python, C#, Java, and Ruby.

Note that this is only bringing the bindings back to their
functionality before the API was switched out from under them, and
does not include exposing the new high resolution support. That will
require the bindings maintainers breaking each API's bindings (or at
least, adding to them).

Many thanks for your patience. Hopefully we'll have a release soon!

-Drew Fisher

[1] - https://github.com/zarvox/libfreenect/commit/844e17b90aa62581598dba602a98b7d80baaecd0
[2] - https://github.com/OpenKinect/libfreenect/pull/203

Reply all

Reply to author

Forward