We here at Blackboard have been working on a virtual classroom based on WebRTC clients in Chrome. It is working well, except for lip sync: it is usually very good, but all too often it drifts, sometimes by seconds.
We use a technique similar to Jitsi's: multiple SSRC values (aka multi-stream or multi-track) in each session (audio & video), with the SDP indicating, via msid values, which pairs belong together and thus should be associated for lip sync. So, after going over and over our own code, it came time to try to figure out what WebRTC expected, since we were sure we were "compliant" with the RFCs.
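For illustration, the pairing in our SDP looks roughly like this (Plan B style; the SSRCs and ids here are made up, not from our actual offer):

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=ssrc:111111 msid:stream-A audio-track-A
a=ssrc:333333 msid:stream-B audio-track-B
m=video 9 UDP/TLS/RTP/SAVPF 100
a=ssrc:222222 msid:stream-A video-track-A
a=ssrc:444444 msid:stream-B video-track-B

Tracks sharing a stream id (stream-A, stream-B) are the ones we expect to be lip-synced with each other.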
So, digging into the source, I am convinced that WebRTC does use the "normal" RFC 3550 technique for lip sync, utilising the RTP timestamp and NTP time information in RTCP sender reports to adjust the relative delays of the audio and video streams. And if we limit the peer connection to a single stream each for audio and video, it does in fact work exactly as expected; various bits of custom log output we added have confirmed this.
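For reference, the computation we expect is the standard RFC 3550 mapping of RTP timestamps onto the sender's NTP clock via the sender reports. A minimal sketch of the idea (my own illustration, not the WebRTC source; function names and clock rates are just for the example):

// Minimal sketch of the RFC 3550 mapping: each RTCP SR ties an RTP
// timestamp to an NTP wall-clock time, letting the receiver place audio
// and video on one timeline and measure their relative delay.
#include <cstdint>

struct SenderReport {
  double ntp_seconds;      // NTP time from the SR, as seconds
  uint32_t rtp_timestamp;  // RTP timestamp sampled at that same instant
};

// Map an incoming RTP timestamp to sender wall-clock time, using the
// latest SR for that SSRC and the stream's clock rate (typically 90000
// for video, 48000 for Opus audio).
double RtpToNtp(uint32_t rtp_ts, const SenderReport& sr, int clock_rate) {
  // Unsigned subtraction then signed cast handles timestamp wrap-around.
  int32_t diff = static_cast<int32_t>(rtp_ts - sr.rtp_timestamp);
  return sr.ntp_seconds + static_cast<double>(diff) / clock_rate;
}

// Relative A/V offset in ms: positive means the video sample was captured
// later than the audio sample; the sync module should then delay whichever
// stream is ahead.
double AvOffsetMs(uint32_t audio_rtp_ts, const SenderReport& audio_sr,
                  uint32_t video_rtp_ts, const SenderReport& video_sr) {
  double audio_capture = RtpToNtp(audio_rtp_ts, audio_sr, 48000);
  double video_capture = RtpToNtp(video_rtp_ts, video_sr, 90000);
  return (video_capture - audio_capture) * 1000.0;
}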
However, when multiple SSRCs are present, what I discovered is that the information was not getting from the RTCP packet handlers (e.g. RTCPReceiver::HandleSenderReceiverReport) to where it is needed in the ViESyncModule class. Further trawling through the code led me to ViEBaseImpl::ConnectAudioChannel, and the place it is called from contains the following comment:
// Connect the voice channel, if there is one.
// TODO(perkj): The A/V is synched by the receiving channel. So we need to
// know the SSRC of the remote audio channel in order to fetch the correct
// webrtc VoiceEngine channel. For now- only sync the default channel used
// in 1-1 calls.
Which is really not good for us. :-(
Does anyone know the status of this? Is a fix imminent? Is no one even aware of the problem? Well, at least perkj was aware of it!
Thanks for any info.
Robert.