I opened a bug for Firefox some time ago, and they have since fixed it (it is no longer reproducible in Firefox 49). You can find more info both in the post I opened at the time and in the related issue I opened on their bug tracker:
https://bugzilla.mozilla.org/show_bug.cgi?id=1256679
This still happens with Chrome, though. The cause appears to be googJitterBufferMs, which for some reason gets bumped to about 300-500ms and only decreases again after a few minutes. I initially suspected broken RTCP handling on my end, but I double-checked and everything seems fine there. As I wrote in the Firefox-related posts, the problem seems more evident when the video bitrate is low and is barely noticeable at higher bitrates, although I'm not sure whether that is actually related.
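For anyone who wants to watch the jitter buffer while reproducing this, here is a rough sketch, not a tested tool. googJitterBufferMs comes from Chrome's legacy, non-standard stats; this sketch instead assumes the standardized inbound-rtp fields jitterBufferDelay (cumulative, in seconds) and jitterBufferEmittedCount, and the helper name videoJitterBufferMs is made up for illustration. It averages the delay over the interval between two getStats() samples, so a temporary bump stays visible instead of being diluted into a lifetime average.

```javascript
// Sketch (assumption): average jitter buffer delay per inbound video stream over
// the interval between two RTCStatsReport samples. Uses the standard inbound-rtp
// fields jitterBufferDelay (cumulative seconds) and jitterBufferEmittedCount,
// not Chrome's legacy googJitterBufferMs. Reports are Map-like (values(), get()).
function videoJitterBufferMs(report, previous = new Map()) {
  const result = new Map();
  for (const stats of report.values()) {
    if (stats.type !== "inbound-rtp" || stats.kind !== "video") continue;
    if (typeof stats.jitterBufferDelay !== "number") continue;
    const prev = previous.get(stats.id);
    // Subtract the previous sample so a temporary bump is not diluted into the
    // lifetime average.
    const delaySec = stats.jitterBufferDelay - (prev ? prev.jitterBufferDelay : 0);
    const frames = stats.jitterBufferEmittedCount - (prev ? prev.jitterBufferEmittedCount : 0);
    // Milliseconds per emitted frame; 0 when no frames were emitted in the interval.
    result.set(stats.id, frames > 0 ? (delaySec * 1000) / frames : 0);
  }
  return result;
}
```

In the browser it could be called about once per second with the result of `await pc.getStats()`, passing the previous report as the second argument, where `pc` is a hypothetical name for the receiving RTCPeerConnection.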
I'll post some of the details again here. It is easy to reproduce, if you want to check it yourself:
1. open https://janus.conf.meetecho.com/echotest (a simple gateway demo that just sends you back whatever you send to it)
2. use the "bandwidth" control to cap it at 128kbps (as mentioned, it seems more evident at lower bitrates: that control forces REMB feedback at 128000)
3. mute the local video track: echotest.webrtcStuff.myStream.getVideoTracks()[0].enabled=false
4. wait some time and unmute it again: echotest.webrtcStuff.myStream.getVideoTracks()[0].enabled=true
5. you'll see that video is now delayed, while audio is still ok.
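Steps 3 and 4 can also be scripted, so the muted interval is the same on every run. A minimal sketch: muteThenUnmute and sleep are hypothetical helpers, and the 30-second pause in the usage line below is an arbitrary choice, since the steps above only say to wait "some time".

```javascript
// Hypothetical helper: resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper automating steps 3-4: disable the first video track of a
// MediaStream, wait mutedMs milliseconds, then enable it again.
async function muteThenUnmute(stream, mutedMs) {
  const [track] = stream.getVideoTracks();
  if (!track) throw new Error("stream has no video track");
  track.enabled = false; // step 3: mute the local video track
  await sleep(mutedMs);  // step 4: wait some time...
  track.enabled = true;  // ...and unmute it again
  return track;
}
```

On the echotest page it could be run from the console as `muteThenUnmute(echotest.webrtcStuff.myStream, 30000)`.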
The problem is not in how we send back the stream we receive, as the RTP timestamps are all correct: I verified this with Wireshark. I also ran more tests: I forwarded the incoming RTP (unencrypted) to a gstreamer script, and I repeated the same test in an SFU scenario with users joining *after* the unmute happened, who therefore didn't witness the mute/unmute. In all of those cases the stream was in real time and not delayed at all, while it was delayed for the users who were already there when the unmute happened. This means the issue only affects a receiver that gets a stream being muted and then unmuted again.
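For reference, the kind of timestamp check described above can be sketched as follows. It assumes packets exported from a capture as `{ rtpTs, arrivalMs }` pairs in arrival order, with a 90 kHz video clock; analyzeTiming is a hypothetical helper. It computes each packet's transit time relative to the first packet, which should stay roughly constant if the RTP timestamps track the sender's clock (and network delay is stable), plus the RFC 3550 interarrival jitter estimate. Note that the RFC 3550 estimator is what RTCP receiver reports use, not the algorithm Chrome's jitter buffer uses internally.

```javascript
// Hypothetical helper. Each packet: { rtpTs: 32-bit RTP timestamp,
// arrivalMs: capture time in milliseconds }, in arrival order.
function analyzeTiming(packets, clockRate = 90000) {
  const transits = [];
  let jitter = 0; // RFC 3550 interarrival jitter, in RTP timestamp units
  let prevTransit = null;
  let unwrapped = 0; // RTP timestamp relative to the first packet, unwrapped
  let prevTs = null;
  for (const p of packets) {
    if (prevTs !== null) {
      // Difference modulo 2^32, so a timestamp wraparound is not a huge jump.
      let step = (p.rtpTs - prevTs) >>> 0;
      if (step >= 0x80000000) step -= 0x100000000; // treat as a backward step
      unwrapped += step;
    }
    prevTs = p.rtpTs;
    // Transit time = arrival time minus RTP time, both in timestamp units.
    const transit = (p.arrivalMs * clockRate) / 1000 - unwrapped;
    if (prevTransit !== null) {
      const d = Math.abs(transit - prevTransit); // |D(i-1, i)|
      jitter += (d - jitter) / 16;               // J += (|D| - J) / 16
    }
    prevTransit = transit;
    transits.push(transit);
  }
  // Transit relative to the first packet, in ms: a steady value means the RTP
  // timestamps advance in step with the capture clock.
  const relDelayMs = transits.map((t) => ((t - transits[0]) * 1000) / clockRate);
  return { relDelayMs, jitterMs: (jitter * 1000) / clockRate };
}
```

A stream whose relative delay stays flat across the mute/unmute, as seen by a non-browser receiver, points at the receiver rather than the sender.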
My guess is that this is related to the change in framerate that occurs during the mute/unmute, which eventually confuses the receiver. I'm not sure why this is more evident at lower bitrates; maybe it's just because packets are sent less frequently, or because less data is involved.
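To check the framerate hypothesis against a capture, the frame rate can be estimated from the RTP timestamps alone, since all packets of one video frame carry the same timestamp. A rough sketch, assuming in-order packets without retransmissions and a 90 kHz video clock; frameRates is a hypothetical helper name.

```javascript
// Hypothetical helper: instantaneous frame rate between consecutive video
// frames, derived from the RTP timestamps of packets in sending order.
function frameRates(packetTimestamps, clockRate = 90000) {
  const fps = [];
  let prev = null;
  for (const ts of packetTimestamps) {
    // A timestamp change marks the first packet of a new frame.
    if (prev !== null && ts !== prev) {
      const delta = (ts - prev) >>> 0; // tick difference modulo 2^32
      fps.push(clockRate / delta);
    }
    prev = ts;
  }
  return fps;
}
```

A sudden drop in the computed rate around the mute, followed by a jump back at the unmute, would be the framerate change described above.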
Is there anything I can provide to have this looked into and, if the issue is in Chrome, fixed?
Thanks,
Lorenzo