Hey community!
I'm working with WebRTC on Windows / Android and (in the distant future) iOS. My goal is to build a 3D virtual room where a number of people can talk to each other.
I want to implement spatialized sound, which means I need to capture the audio stream of every peer separately and pass it to an audio engine that positions the audio sources in space.
So I have to avoid the common scheme where all audio tracks are mixed together and played through a single output. To get started, I built a simple project that handles the signaling, ICE and so on; I've got it working, and now I'm facing the main problem.
First of all, I found AudioTrackSinkInterface and wrote a dummy implementation of it, so I can see that it delivers samples from the remote audio stream. However, the OnData() callback receives several parameters that I'm afraid could change over time; here's a typical log line from my handler:
MyAudioTrackSink::OnData() : bits_per_sample=16 sample_rate=48000 number_of_channels=2 number_of_frames=480
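For reference, my dummy sink is roughly the following (just a sketch; the OnData() signature is the one from api/media_stream_interface.h in my checkout, and the include path may differ between revisions):

class MyAudioTrackSink : public webrtc::AudioTrackSinkInterface {
 public:
  // Dummy implementation: just logs the parameters of every chunk it receives.
  void OnData(const void* audio_data,
              int bits_per_sample,
              int sample_rate,
              size_t number_of_channels,
              size_t number_of_frames) override {
    std::printf(
        "MyAudioTrackSink::OnData() : bits_per_sample=%d sample_rate=%d "
        "number_of_channels=%zu number_of_frames=%zu\n",
        bits_per_sample, sample_rate, number_of_channels, number_of_frames);
  }
};

// Attached to the remote track roughly like this:
//   remote_audio_track->AddSink(&my_sink);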
So here are my questions:
1. Does the audio sink provide samples from the very last step of audio processing, i.e. the point where they are about to be sent to the audio device, or is it the plain output of the codec (e.g. Opus)? If the latter, should I expect the sample rate to change over time depending on the connection?
2. Is there a not-too-sophisticated way to bypass output to the device while keeping the audio sink working? In short, I don't want WebRTC to open the audio device for playout at all; I want to handle audio output myself. For now I'm a bit horrified by the prospect of having to implement my own AudioDeviceModule to achieve that...
3. Not that important, but why doesn't the audio sink work for the local audio stream? I found that for the local audio track the AddSink() virtual method has an empty implementation - why? The reason I'm asking is that I'll probably want to add some reverb to the local actor's voice and send the 'wet' signal of that reverb to the output (I'm aware of the latency issues), so the user feels like they're in a hall where their voice reflects off the walls.
4. How do I interpret the data passed to AudioTrackSinkInterface::OnData()? =) I mean, is number_of_frames actually a number of samples? Total, or per channel? And how are the samples of the different channels laid out: interleaved ('121212') or planar ('111222')? (See the sketch after this list for what I plan to do with the interleaved case.)
5. And finally, why do I get two channels in the audio sink callback? Maybe this answers my question #1, meaning the audio sink sits at the point where WebRTC generates the final output in a form suitable for the device? If so, do both channels carry the same waveform, i.e. is it actually one channel cloned to both left and right?
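In case it helps to show what I'm trying to do: if the answer to #4 is that the data is interleaved 16-bit PCM and number_of_frames is counted per channel, then I'd split it for my audio engine roughly like this (only a sketch under those assumptions, not code from my project):

#include <cstddef>
#include <cstdint>
#include <vector>

// Assumes: 16-bit samples, interleaved layout ('121212'),
// and number_of_frames counted per channel.
std::vector<std::vector<int16_t>> Deinterleave(const void* audio_data,
                                               size_t number_of_channels,
                                               size_t number_of_frames) {
  const int16_t* samples = static_cast<const int16_t*>(audio_data);
  std::vector<std::vector<int16_t>> channels(
      number_of_channels, std::vector<int16_t>(number_of_frames));
  for (size_t frame = 0; frame < number_of_frames; ++frame) {
    for (size_t ch = 0; ch < number_of_channels; ++ch) {
      channels[ch][frame] = samples[frame * number_of_channels + ch];
    }
  }
  return channels;  // one buffer per channel, ready to feed to the audio engine
}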
I think that's it for now...
Thanks in advance for your help; even short answers will be appreciated!