Hey community!
I'm working with WebRTC on Windows / Android and (in the distant future) iOS. My goal is to build a 3D virtual room where a number of people can talk to each other.
I want to implement spatialized sound, which means I need to capture the audio stream of every peer separately and pass it to an audio engine that positions the audio sources in space.
So I have to avoid the common scheme where all audio tracks are mixed together and played through a single output. To get started, I built a simple project that handles the signaling, ICE and so on; I've got it working, and now I'm facing the main problem.
First of all, I found AudioTrackSinkInterface and wrote a dummy implementation of it, so I can see that it delivers samples from the remote audio stream. However, the OnData() callback receives several parameters that I'm afraid could change over time; here's a typical log line from my handler:
MyAudioTrackSink::OnData() : bits_per_sample=16 sample_rate=48000 number_of_channels=2 number_of_frames=480
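For reference, my dummy sink is roughly the following (just a sketch; the OnData() signature is the one from api/media_stream_interface.h in my checkout, and the include path may differ between revisions):

class MyAudioTrackSink : public webrtc::AudioTrackSinkInterface {
 public:
  // Dummy implementation: just logs the parameters of every chunk it receives.
  void OnData(const void* audio_data,
              int bits_per_sample,
              int sample_rate,
              size_t number_of_channels,
              size_t number_of_frames) override {
    std::printf(
        "MyAudioTrackSink::OnData() : bits_per_sample=%d sample_rate=%d "
        "number_of_channels=%zu number_of_frames=%zu\n",
        bits_per_sample, sample_rate, number_of_channels, number_of_frames);
  }
};

// Attached to the remote track roughly like this:
//   remote_audio_track->AddSink(&my_sink);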
So here are my questions:
1. Does the audio sink provide samples from the very last step of audio processing, i.e. the point where they are about to be sent to the audio device, or is it the plain output of the codec (e.g. Opus)? If the latter, should I expect the sample rate to change over time depending on the connection?
2. Is there a not-too-sophisticated way to bypass output to the device while keeping the audio sink working? In short, I don't want WebRTC to open the audio device for playout at all; I want to handle audio output myself. For now I'm a bit horrified by the prospect of having to implement my own AudioDeviceModule to achieve that...
3. Not that important, but why doesn't the audio sink work for the local audio stream? I found that for the local audio track the AddSink() virtual method has an empty implementation - why? The reason I'm asking is that I'll probably want to add some reverb to the local actor's voice and send the 'wet' signal of that reverb to the output (I'm aware of the latency issues), so the user feels like they're in a hall where their voice reflects off the walls.
4. How do I interpret the data passed to AudioTrackSinkInterface::OnData()? =) I mean, is number_of_frames actually a number of samples? Total, or per channel? And how are the samples of the different channels laid out: interleaved ('121212') or planar ('111222')? (See the sketch after this list for what I plan to do with the interleaved case.)
5. And finally, why do I get two channels in the audio sink callback? Maybe this answers my question #1, meaning the audio sink sits at the point where WebRTC generates the final output in a form suitable for the device? If so, do both channels carry the same waveform, i.e. is it actually one channel cloned to both left and right?
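In case it helps to show what I'm trying to do: if the answer to #4 is that the data is interleaved 16-bit PCM and number_of_frames is counted per channel, then I'd split it for my audio engine roughly like this (only a sketch under those assumptions, not code from my project):

#include <cstddef>
#include <cstdint>
#include <vector>

// Assumes: 16-bit samples, interleaved layout ('121212'),
// and number_of_frames counted per channel.
std::vector<std::vector<int16_t>> Deinterleave(const void* audio_data,
                                               size_t number_of_channels,
                                               size_t number_of_frames) {
  const int16_t* samples = static_cast<const int16_t*>(audio_data);
  std::vector<std::vector<int16_t>> channels(
      number_of_channels, std::vector<int16_t>(number_of_frames));
  for (size_t frame = 0; frame < number_of_frames; ++frame) {
    for (size_t ch = 0; ch < number_of_channels; ++ch) {
      channels[ch][frame] = samples[frame * number_of_channels + ch];
    }
  }
  return channels;  // one buffer per channel, ready to feed to the audio engine
}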
I think that's it for now...
Thanks in advance for your help; even short answers will be appreciated!