NUIse - a proof-of-concept driver for the Kinect audio core

drew.m...@gmail.com

Mar 7, 2011, 7:49:56 PM
to openk...@googlegroups.com
I spent some quality time this weekend reverse-engineering the Kinect
audio device. This proof-of-concept driver is the fruit of that
labor. :)

Check it out at https://github.com/zarvox/nuise

License is GPLv3+ for now.

Special thanks go out to Adafruit for the USB logs I used and
Sebastian Ortiz [1] for doing some preliminary work and proposing a
very similar interpretation of much of the data.

I know there's TONS to clean up still, and some data that I still
don't fully understand. I intend to eventually get this into
libfreenect, but for now, it is not of sufficient quality to be
developed in that tree. Protocol documentation [2] is still a
work-in-progress. If anyone knows much about Complex Empirical Mode
Decomposition, that might help us understand how the calibration
works. Lastly, if anyone can record some USB traces of the Kinect
doing audio calibration, that would also be fantastic.

Enjoy! [3]

Drew Fisher (zarvox)

[1] - http://www.keyboardmods.com/2011/02/kinect-audio-reverse-engineering.html
[2] - http://openkinect.org/wiki/Protocol_Documentation#NUI_Audio
[3] - http://zarvox.org/kinect/recording.wav

Sebastian Ortiz

Mar 7, 2011, 7:58:12 PM
to openk...@googlegroups.com

Congrats on getting it all working. I've got a dump of the calibration, but unfortunately only in exported CSV form. I'll hunt around to see if I can find the original .tdc; otherwise I can get you new traces this Thursday.

-S

Sebastian Ortiz

Mar 10, 2011, 4:52:58 AM
to openk...@googlegroups.com, drew.m...@gmail.com
I acquired an xbox a little earlier than planned. Unfortunately my personal server isn't really set up, so all I've got for you are some megaupload links, sorry. 

Audio setup has 4 steps:
(1)Background noise check: http://www.megaupload.com/?d=PUM8DDYE
(2)Xbox volume check: http://www.megaupload.com/?d=PUM8DDYE  (next time, I'll split 1 and 2)
(3)Microphone calibration: http://www.megaupload.com/?d=9HIXRHAA
(4)Speech detection test: http://www.megaupload.com/?d=X9MCVDCH

Would it be helpful if I calibrate the system (while saving the TDCs) then power cycle the xbox and dump the initialization sequence? It seems like that might be a first step towards understanding how you go from the 4 items above to the presumed CEMD data blob.

All of the dumps linked above have a 700Hz tone in the background that's quiet enough to pass the background noise check. I threw that in there because I thought it might make it easier to make out patterns, but I'll take another set tomorrow without the tone. During the background noise check, you're expected to be as quiet as possible. During the volume check, the Xbox plays a sequence of stereo audio bursts. A similar set of audio bursts is also played during the microphone calibration. During the speech detection test you're asked to read a series of verbal prompts ("one, two, three", etc.). I'll try to get a video recording of the whole process up sometime tomorrow, if I get the chance.
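
For anyone who wants to check whether the 700Hz tone is visible in a chunk of decoded audio, here is a minimal sketch using the Goertzel algorithm (my choice of technique, not something taken from the traces). It assumes the samples have already been extracted from the dump as plain numbers, and it assumes a 16 kHz sample rate, which is a guess and not a confirmed fact about the device. The name goertzel_power is just an illustrative helper.

```python
import math

ASSUMED_RATE_HZ = 16000  # assumption: the actual sample rate is not stated here


def goertzel_power(samples, freq_hz, rate_hz=ASSUMED_RATE_HZ):
    """Return the squared magnitude of `samples` at `freq_hz` (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / rate_hz)
    s_prev = 0.0   # state one step back
    s_prev2 = 0.0  # state two steps back
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


if __name__ == "__main__":
    # Synthetic stand-in for decoded audio: a quiet 700 Hz sine, 0.1 s long.
    n = 1600
    tone = [0.1 * math.sin(2.0 * math.pi * 700 * i / ASSUMED_RATE_HZ)
            for i in range(n)]
    print("power at 700 Hz: ", goertzel_power(tone, 700))
    print("power at 1500 Hz:", goertzel_power(tone, 1500))
```

If the tone is present, the power at 700 Hz should be far larger than at a nearby unrelated frequency. If the real sample rate differs from the assumed one, the tone will show up at a different apparent frequency.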

As you mentioned earlier, input from someone familiar with the usage of CEMD in the context of audio would be helpful. I can't even begin to guess what's in the CEMD blob. Would it be, for example, 100 complex points of the top 5 IMFs for each channel? It'd also be helpful if an audio pro could comment on whether CEMD might be a red herring here, i.e. maybe CEMD stands for something else, and maybe the blob is something like TDOA data computed from the audio bursts? Just a thought.
   
-S

drew.m...@gmail.com

Mar 10, 2011, 2:22:53 PM
to sebo...@gmail.com, openk...@googlegroups.com
On Thu, Mar 10, 2011 at 1:52 AM, Sebastian Ortiz <sebo...@gmail.com> wrote:
> I acquired an xbox a little earlier than planned. Unfortunately my personal
> server isn't really set up, so all I've got for you are some megaupload
> links, sorry.
> Audio setup has 4 steps:
> (1)Background noise check: http://www.megaupload.com/?d=PUM8DDYE
> (2)Xbox volume check: http://www.megaupload.com/?d=PUM8DDYE  (next time,
> I'll split 1 and 2)
> (3)Microphone calibration: http://www.megaupload.com/?d=9HIXRHAA
> (4)Speech detection test: http://www.megaupload.com/?d=X9MCVDCH

Sweet. These will take a while to read through, but I'm sure they'll
be very helpful. Many thanks.

I've mirrored them on my server at [1], [2], [3]. I can't guarantee
they'll be up forever, but as long as they don't eat too much
bandwidth I'll leave them up.

> Would it be helpful if I calibrate the system (while saving the TDCs) then
> power cycle the xbox and dump the initialization sequence? It seems like
> that might be a first step towards understanding how you go from the 4 items
> above to the presumed CEMD data blob.

Yes, that would probably be good, although I suspect that the system
will upload the calibration blob at the end of the calibration
sequence. I won't know until I've looked through the logs, though.
Either way, it'd be good to see if there's some default blob
used, and whether that blob persists across device connections.

> All of the dumps linked above have a 700Hz tone in the background that's
> quiet enough to pass the background noise check. I threw that in there
> because I thought it might make it easier to make out patterns, but I'll
> take another set tomorrow without the tone. During the background noise
> check, you're expected to be as quiet as possible. During the volume check,
> the Xbox plays a sequence of stereo audio bursts. A similar set of audio
> bursts is also played during the microphone calibration. During the speech
> detection test you're asked to read a series of verbal prompts-
> "one,two,three",etc. I'll try to get a video recording of the whole process
> up sometime tomorrow, if I get the chance.

Thanks for the information!

> As you mentioned earlier, input from someone familiar with the usage of CEMD
> in the context of audio would be helpful. I can't even begin to guess what's
> in the CEMD blob. Would it be, for example, 100 complex points of the top 5
> IMFs for each channel?  It'd also be helpful if an audio pro could comment
> on whether CEMD might be a red herring here- i.e. maybe CEMD stands for
> something else, maybe the blob is something like TDOA data computed from the
> audio bursts? Just a thought.

Made some small progress on interpreting the data (actually, Hector
did this himself at some point back in November, but to my knowledge
it wasn't written up anywhere outside of IRC logs, so I didn't notice
until after I had duplicated his work): it's a long series of 32-bit
floats between -0.5 and 0.5. There are 200 blocks of 2048 bytes (512
samples) each. The 2nd block, the 22nd block, and so on (every block
whose number is 2 mod 20) have much greater magnitude than the rest.
I converted the data to a WAV file [4] by multiplying each sample by
2**31, flooring the result to an integer, and sticking a WAV header
on it. Maybe it'll help to visualize the waveforms with Audacity.
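
For reference, the conversion described above can be sketched roughly like this. It is not the exact script used to make [4]: the little-endian byte order, the mono layout, and the 16 kHz rate written into the WAV header are all assumptions, and the function names are only illustrative. block_peaks is an extra helper for spotting the blocks with larger magnitude.

```python
import math
import struct
import wave

BLOCK_BYTES = 2048                     # block size from the description above
SAMPLES_PER_BLOCK = BLOCK_BYTES // 4   # 512 32-bit floats per block
ASSUMED_RATE_HZ = 16000                # assumption: only affects the WAV header


def decode_floats(blob):
    """Unpack raw bytes as 32-bit floats (little-endian is an assumption)."""
    count = len(blob) // 4
    return list(struct.unpack("<%df" % count, blob[:count * 4]))


def to_pcm32(samples):
    """Scale each sample by 2**31, floor it, and clamp to signed 32-bit."""
    out = []
    for s in samples:
        v = math.floor(s * 2**31)
        out.append(max(-2**31, min(2**31 - 1, v)))
    return out


def write_wav(path, pcm, rate=ASSUMED_RATE_HZ):
    """Write signed 32-bit mono PCM samples to a WAV file at `path`."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(4)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%di" % len(pcm), *pcm))


def block_peaks(samples, block_len=SAMPLES_PER_BLOCK):
    """Return the peak absolute value of each block, in order."""
    return [max(abs(s) for s in samples[i:i + block_len])
            for i in range(0, len(samples), block_len)]
```

Typical use would be samples = decode_floats(blob), then write_wav("filters.wav", to_pcm32(samples)). Note that block_peaks returns a zero-based list, so the 2nd and 22nd blocks appear at indices 1 and 21.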

Maybe I'll hear/see something similar in the logs.

> -S

-Drew

[1] - http://zarvox.org/kinect/acceptable_700hz_background_and_music_check.tdc
[2] - http://zarvox.org/kinect/microphone_calibration_700hz.tdc
[3] - http://zarvox.org/kinect/speech_detection_test.tdc
[4] - http://zarvox.org/kinect/filters.wav
