Dear developers,
I am developing VoIP software that uses WebRTC as its audio processing library. So far I have only used the audio_processing module under modules/. It works well, except that I have to calculate the delay between the far-end audio stream and the near-end stream myself. Currently I obtain the delay by other means, pass it to the echo canceller via webrtc::AudioProcessing::set_stream_delay_ms(int delay), and then call AnalyzeReverseStream(&far_frame) and ProcessStream(&near_frame) to do the echo cancellation.
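To make this concrete, here is a stripped-down version of my per-10-ms loop (a minimal sketch only: the include paths and exact signatures are from the WebRTC revision I happen to have checked out and may differ in yours, and far_frame / near_frame are filled by my own capture and playout code):

```cpp
// Include paths are from an older WebRTC tree layout and may differ by revision.
#include "webrtc/modules/audio_processing/include/audio_processing.h"
#include "webrtc/modules/interface/module_common_types.h"  // webrtc::AudioFrame

void ProcessOneFrame(webrtc::AudioProcessing* apm,
                     webrtc::AudioFrame* far_frame,
                     webrtc::AudioFrame* near_frame,
                     int delay_ms /* measured externally by my own code */) {
  // Feed the render (far-end) signal so the AEC can model the echo path.
  apm->AnalyzeReverseStream(far_frame);

  // Tell the AEC how far apart the two streams are; this is the value
  // I currently have to compute myself.
  apm->set_stream_delay_ms(delay_ms);

  // Process the capture (near-end) signal in place; the echo should be
  // removed from near_frame here.
  apm->ProcessStream(near_frame);
}
```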
So I have a few questions:
1. Am I using audio_processing correctly? Do I really have to calculate the delay myself, or did I miss something?
2. Recently I noticed the function webrtc::EchoCancellation::GetDelayMetrics(). According to its description, I can get the delay by calling it. My understanding is that I should call this function first and then feed the resulting delay to set_stream_delay_ms(). I ran some tests with recordings (PCM files), but it never gave me the right delay, and of course no echo was cancelled at all. Is my understanding of this function correct? (A sketch of what I tried is below.)
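For reference, this is roughly what I tried (simplified from my test program; I may well be misusing the API, which is exactly what I am asking about, and the signature of GetDelayMetrics() in my revision takes two int pointers, which may not match yours):

```cpp
webrtc::EchoCancellation* ec = apm->echo_cancellation();
ec->Enable(true);
ec->enable_delay_logging(true);  // my assumption: needed before metrics are collected

// ... run AnalyzeReverseStream()/ProcessStream() over a number of frames ...

int median_ms = 0;
int std_ms = 0;
if (ec->GetDelayMetrics(&median_ms, &std_ms) == webrtc::AudioProcessing::kNoError) {
  // My assumption: median_ms is what I should pass to set_stream_delay_ms()
  // for the following frames. This is the step that does not seem to work.
  apm->set_stream_delay_ms(median_ms);
}
```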
3. I know there is a VoiceEngine in WebRTC, and I had a look at the source code. It seems to be a layer above modules, so my understanding is that VoE calls functions in modules to do the audio processing. I went through all the functions in VoiceEngine; I can see how to set or get the status of the echo canceller and the other audio processors, but I could not find any function that calls AnalyzeReverseStream and ProcessStream, which is where the processing work is actually done. This is very confusing. If I am not using the audio_processing module directly, how can I do the processing with VoiceEngine alone? (The status-setting calls I did find look like the sketch below.)
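Here is the kind of thing I found in VoiceEngine (again only a sketch, assuming the old VoE sub-interfaces; interface names, include paths, and enum values may differ between revisions):

```cpp
#include "webrtc/voice_engine/include/voe_base.h"
#include "webrtc/voice_engine/include/voe_audio_processing.h"

void ConfigureAec() {
  webrtc::VoiceEngine* voe = webrtc::VoiceEngine::Create();
  webrtc::VoEBase* base = webrtc::VoEBase::GetInterface(voe);
  webrtc::VoEAudioProcessing* apm_if =
      webrtc::VoEAudioProcessing::GetInterface(voe);

  base->Init();
  // I can turn the AEC on/off and pick a mode...
  apm_if->SetEcStatus(true, webrtc::kEcAec);
  // ...but I cannot see where VoiceEngine itself ends up calling
  // AnalyzeReverseStream()/ProcessStream() on the captured and
  // played-out audio.

  apm_if->Release();
  base->Release();
  webrtc::VoiceEngine::Delete(voe);
}
```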
Thank you so much for your help!!