Hi,
Here is a new demo we made that I think might be of interest. It uses
the Kinect color image and depth map to achieve augmented reality with
occlusion and object manipulation. Without further ado, here is the
link to the demo (April 2012):
http://youtu.be/OTmHtAaQD_c
Now the description: we work on augmented reality (AR), where a virtual
object is added to a live video stream as if it were really part of the
scene. Last year we developed a Unity3D plugin that creates the AR
illusion. Basically, it grabs the webcam image and computes the camera
position with respect to a marker. Unity then simply shows the webcam
image superimposed with a render of the virtual object from the correct
viewpoint. This video shows what it looks like:
http://youtu.be/npuubLGpU1A
Then we wanted to let the user manipulate the state of the object,
rotating and resizing it with his hands. The Kinect is perfect for
that: its skeletal tracking measures where the user is relative to the
camera (hands, head, etc.). While doing so, we were confronted with the
occlusion problem. That is, if the hand is closer to the camera than
the virtual object, the hand must remain visible (we must not render
the virtual object over the hand). Again, the Kinect measures the
distance of everything it sees, the depth map, i.e. the z-distance.
Similarly, when a video card renders a virtual scene it computes
occlusion from the depth buffer, i.e. the z-distance of every object in
the virtual scene. The occlusion algorithm is then obvious: in the
shader, discard the virtual fragment wherever z_real < z_virtual (a
one-line clip()).
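The per-pixel test is easy to illustrate outside a shader. Here is a
minimal NumPy sketch of the same comparison (the array names and depth
values are made up for the example; in the actual plugin this test runs
in the fragment shader on the GPU):

```python
import numpy as np

# Toy stand-ins for the two depth sources, in millimetres (values are
# invented for illustration, not taken from a real Kinect frame).
real_depth = np.array([[800.0, 800.0],      # hand, in front of the object
                       [1500.0, 1500.0]])   # wall, behind the object
virtual_depth = np.full((2, 2), 1000.0)     # virtual object at 1 m

# Same decision the shader makes with clip(): wherever the real scene is
# closer than the virtual object, drop the virtual fragment and show the
# camera image instead.
show_real = real_depth < virtual_depth
print(show_real)
```

The top row (the hand) masks out the virtual object; the bottom row (the
wall) lets it show through.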
Thus, we used the Kinect (with the Microsoft SDK, but any other SDK
with skeletal tracking would do) to manage occlusion and to manipulate
the virtual objects interactively. Rendering is done in Unity3D Free;
occlusion is managed in a Unity shader (this is very fast since the
computation runs on the graphics card); manipulation is done in Unity3D
from the Kinect's skeletal information.
The biggest problem we faced was speed: transferring the depth map to
the shader was time-consuming. We had to resize the depth map to
512x512 to allow faster manipulation. While doing so, we were
confronted with a new problem: the occlusion clip was highlighting the
object's contours. Occlusion was correct, but when the hand was behind
the virtual object we could see through the virtual object at the
object boundary. The effect was kind of cool, but unwanted.
After some investigation we found out that OpenCV was using bilinear
interpolation to resize the depth map image. This was creating
intermediate depth values between the background and the object
boundaries.
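The effect is easy to reproduce without OpenCV. This tiny NumPy sketch
downsamples a one-row slice across a depth edge the way a bilinear
filter does (by averaging neighbours) versus nearest-neighbor; the
depth values are invented for the example:

```python
import numpy as np

# A 1-pixel slice across a depth edge: background at 2000 mm, hand at
# 600 mm (made-up values).
row = np.array([2000.0, 2000.0, 2000.0, 600.0, 600.0, 600.0])

# Downsampling by 2 with a bilinear-style filter blends neighbouring
# pixels, inventing a 1300 mm depth that exists nowhere in the scene:
# exactly the phantom contour we saw.
bilinear = row.reshape(-1, 2).mean(axis=1)   # -> 2000, 1300, 600

# Nearest-neighbor keeps only original depth values, so no phantom depth.
nearest = row[::2]                           # -> 2000, 2000, 600

print(bilinear)
print(nearest)
```

The 1300 mm pixel is closer than a virtual object at, say, 1500 mm, so
the occlusion clip punches a see-through rim around the hand.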
Forcing nearest-neighbor interpolation helped, but the contours were
still there: the shader itself was interpolating the values when we
sampled the depth from the texture. Texture coordinates are indexed
from 0 to 1 instead of 0 to width (or 0 to height). We had to snap the
fractional part of the coordinate to make sure it fell at the center of
a texture pixel, avoiding interpolation.
While doing so, we found out that Unity3D indexes pixel coordinates
from the pixel corner, not the center: an offset of 0.5 must be added
to the coordinate in the native image range.
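For the record, the coordinate fix can be written out in a few lines.
This is an illustrative Python version (the helper name is ours; in
practice the arithmetic lives inside our shader):

```python
def pixel_center_uv(x, y, width, height):
    """Snap an integer texel coordinate to its center in [0, 1] UV space.

    Sampling at (x / width, y / height) lands on a texel corner, where
    the hardware blends neighbouring texels. Adding 0.5 in native pixel
    units before normalizing lands on the texel center, where no
    interpolation happens.
    """
    u = (float(int(x)) + 0.5) / width
    v = (float(int(y)) + 0.5) / height
    return u, v

# Texel (3, 0) of our 512x512 depth map:
print(pixel_center_uv(3, 0, 512, 512))  # (3.5 / 512, 0.5 / 512)
```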
We are interested in comments from the community. While the technology
is cool, useful applications are difficult to find.