A/V synchronization issue over time or on low-end CPUs


seba...@tribe.pm

Mar 22, 2017, 1:33:52 PM
to meetecho-janus
I'm posting my issue here because I can't resolve it on my own.

During WebRTC 1-to-1 (or 1-to-many) connections I randomly get an A/V synchronization issue. I'm using Janus v0.2.2 and the latest GitHub code, with a mobile application linked against WebRTC (the December 2016 release) from CocoaPods, https://cocoapods.org/pods/WebRTC, and the same revision on Android.

Rooms are created with:
- Bitrate 512 kbit/s
- FirFreq 10
- AudioCodec opus
- VideoCodec vp8
- AudioLevelExt true
- VideoOrientExt true
- PlayoutDelayExt true
- No recording

Janus is linked with:
- rabbitmq-c v0.8.0
- usrsctp latest
- boringssl latest
- janus-gateway latest (tried v0.2.1 and v0.2.2 but still no luck)
- libsrtp v2.0

Janus is running in a Docker container and uses CoTURN for STUN only (as do the clients).

The issue appears quickly on 4G/3G networks or on mobile devices with a weak CPU. Video can be played several seconds later than audio and never resyncs (e.g., by dropping frames to catch up with the audio again).
The WebRTC library used is from December 2016 (the latest one, from February 2017, has some video quality issues).
One last thing: I renegotiate the SDP and change the b=AS: parameter to lower the bitrate when an additional peer joins (there can be up to 8), e.g., 256k with two peers, 128k for 3 and 4 peers, etc. So I'm rewriting the SDP on the fly to limit the maximum bandwidth.

I tried to trace/debug the RTP timestamping for the publisher, and I noticed this in janus_videoroom.c (the VideoRoom plugin):

janus_rtp_header_update(packet->data, &listener->context, TRUE, 4500); // For Video
janus_rtp_header_update(packet->data, &listener->context, FALSE, 960); // For Audio

With my patch, I could see that the video timestamp delta from the publisher is not fixed: it can be anywhere from 0 to 3880, while the audio delta seems to be fixed at 960 with Opus.
I saved the latest RTP timestamp from the publisher (audio and video separately) and applied the same delta to the listeners' RTP, but it didn't help: video and audio froze after about 1 second and Janus printed a lot of errors.

Perhaps I'm doing something terribly wrong. Is there a reason why the RTP timestamp deltas are set to fixed values for listeners rather than applying the deltas from the publisher?
In theory, in P2P mode without Janus the endpoint would be the final peer. I also see that the timestamp delta between two video packets from the publisher is sometimes 0 (very strange).
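
For reference, RTP timestamps count ticks of the media clock, not milliseconds. The sketch below converts the deltas mentioned above into durations, assuming the usual 90 kHz clock for VP8 video and 48 kHz for Opus; the helper names are illustrative and not part of Janus. A delta of 0 between two video packets is normal when one frame is split across several RTP packets, since they all carry the same timestamp.

```c
#include <stdint.h>

#define VIDEO_CLOCK_HZ 90000u  /* assumed RTP clock rate for VP8 */
#define AUDIO_CLOCK_HZ 48000u  /* assumed RTP clock rate for Opus */

/* Difference between two RTP timestamps, correct across 32-bit wrap-around. */
static uint32_t rtp_ts_delta(uint32_t prev, uint32_t curr) {
    return curr - prev;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Convert a timestamp delta (in clock ticks) to microseconds. */
static uint64_t rtp_ticks_to_us(uint32_t ticks, uint32_t clock_hz) {
    return ((uint64_t)ticks * 1000000u) / clock_hz;
}
```

With these assumptions, 960 audio ticks are 20 ms, 4500 video ticks are 50 ms, and the largest video delta observed above (3880) is about 43 ms.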

If someone could help me debug this, it would be greatly appreciated (I don't know where to start at the moment).

Thank you

seba...@tribe.pm

Mar 22, 2017, 2:10:41 PM
to meetecho-janus
More info:

- I set dtls_mtu = 1200 to avoid some big NACK issues
- I set max_nack_queue=200
- force-bundle = true
- force-rtcp-mux = true

Lorenzo Miniero

Mar 24, 2017, 9:39:27 AM
to meetecho-janus
The 4500/960 things are not responsible for this. They're only involved in case of context/publisher switching, which you most likely are not using.

Not sure what you're using the SDP b=AS attribute for: you can do dynamic publishing bitrate adaptation in the VideoRoom using "configure" requests with a target bitrate, without doing any renegotiation, fake or otherwise. We use that extensively to also auto-adapt to network issues (e.g., many slow-links --> configure a lower bitrate).

Try recording and then post-processing the .mjr files to see if you notice the same desync in the resulting file too.

L.

seba...@tribe.pm

Mar 24, 2017, 10:27:07 AM
to meetecho-janus
Hi Lorenzo,

Thank you for your answer. The 4500/960 thing, as you call it, is invoked for every RTP packet received from a publisher.
If I read the code correctly, every incoming RTP packet from a publisher is relayed to all listeners:

                packet.data = rtp;
                packet.length = len;
                packet.is_video = video;
                /* Backup the actual timestamp and sequence number set by the publisher, in case switching is involved */
                packet.timestamp = ntohl(packet.data->timestamp);
                packet.seq_number = ntohs(packet.data->seq_number);
                /* Go */
                g_slist_foreach(participant->listeners, janus_videoroom_relay_rtp_packet, &packet);

So janus_videoroom_relay_rtp_packet is called with the packet received from a publisher. Then, in janus_videoroom_relay_rtp_packet:

        /* Make sure there hasn't been a publisher switch by checking the SSRC */ <-- I don't understand this comment, as you are just testing whether the packet is a video packet
        if(packet->is_video) {

Then finally, if it's a video packet, you call:

janus_rtp_header_update(packet->data, &listener->context, TRUE, 4500);

In rtp.c, the function janus_rtp_header_update checks if(video):

        if(video) {
                if(ssrc != context->v_last_ssrc) {
                        /* Video SSRC changed: update both sequence number and timestamp */
                        JANUS_LOG(LOG_VERB, "Video SSRC changed, %"SCNu32" --> %"SCNu32"\n",
                                context->v_last_ssrc, ssrc);
                        context->v_last_ssrc = ssrc;
                        context->v_base_ts_prev = context->v_last_ts;
                        context->v_base_ts = timestamp;
                        context->v_base_seq_prev = context->v_last_seq;
                        context->v_base_seq = seq;
                }
                if(context->v_seq_reset) {
                        /* Video sequence number was paused for a while: just update that */
                        context->v_seq_reset = FALSE;
                        context->v_base_seq_prev = context->v_last_seq;
                        context->v_base_seq = header->seq_number;
                }
                /* Compute a coherent timestamp and sequence number */
                context->v_last_ts = (timestamp-context->v_base_ts) + context->v_base_ts_prev+step; <---- Here, you adapt the context->v_last_ts with the step that is equal to 4500
                context->v_last_seq = (seq-context->v_base_seq)+context->v_base_seq_prev+1;
                /* Update the timestamp and sequence number in the RTP packet */
                header->timestamp = htonl(context->v_last_ts);   <---- And finally set the new timestamp computed with step on the RTP header for the listener
                header->seq_number = htons(context->v_last_seq);

So if I understand your code correctly, the RTP header is updated for every video and audio packet received from the publisher before it is sent to each listener, using the listener-specific v_base_ts plus the step (4500 for video, 960 for audio). Tell me if I'm wrong.
If I log the updated header->timestamp for a listener, I see that it's rewritten for every video packet. That's why I'm asking about the fixed steps (4500/960).

I'm using SDP renegotiation because I'm changing both the max bitrate and the resolution via getUserMedia (not only the bitrate). I'm not aware of a request that can change the resolution dynamically, the way the configure request does for the target bitrate. The A/V desync seems to occur frequently after video freezes (slow links).

I've started recording some streams to see whether the problem comes from the publisher or from Janus. I'll let you know the result after processing them with janus-pp-rec.

Thank you Lorenzo.
Sebastien.

Lorenzo Miniero

Mar 24, 2017, 10:35:56 AM
to meetecho-janus
I know it's called for every packet, it needs to work like that. Again, those steps are only relevant when there's an SSRC switch (in the VideoRoom case, you switched a listener to a completely different publisher with "switch"), which likely means a completely different sequence number and timestamp. Since we need them to be consistent to make the recipient think it's still the same stream, we calculate how to adapt the original values to what they would look like in the context of this listener instead. Since we don't know what the gaps are, we make assumptions (4500 for video, 960 for audio) *only* when switching. The fact we sum them every time has no relevance at all, as timestamps are not an absolute measure of time, but relative.
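
A minimal sketch of why the constant step does not distort timing. It reproduces only the timestamp arithmetic from the rtp.c excerpt quoted earlier in the thread, with a simplified context struct; `ts_ctx` and `rewrite_ts` are illustrative names, not the Janus API. Because the same offset is added to every packet between two SSRC changes, the difference between consecutive rewritten timestamps equals the difference between the original ones.

```c
#include <stdint.h>

/* Simplified stand-in for the video fields of the per-listener RTP context. */
typedef struct {
    uint32_t last_ssrc;
    uint32_t base_ts;       /* publisher timestamp at the last SSRC change */
    uint32_t base_ts_prev;  /* last timestamp sent before that change */
    uint32_t last_ts;       /* last timestamp sent to the listener */
} ts_ctx;

/* Same arithmetic as the excerpt: out = (in - base_ts) + base_ts_prev + step. */
static uint32_t rewrite_ts(ts_ctx *ctx, uint32_t ssrc, uint32_t in_ts, uint32_t step) {
    if (ssrc != ctx->last_ssrc) {
        /* New source: remember where the old stream ended and the new one starts */
        ctx->last_ssrc = ssrc;
        ctx->base_ts_prev = ctx->last_ts;
        ctx->base_ts = in_ts;
    }
    ctx->last_ts = (in_ts - ctx->base_ts) + ctx->base_ts_prev + step;
    return ctx->last_ts;
}
```

Feeding one SSRC with input timestamps 100000, 103000 and 106880 yields outputs whose deltas are 3000 and 3880, identical to the input. Only on an SSRC change does the output jump by exactly `step` relative to the last packet sent.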

 
> I'm using SDP renegotiation because I'm changing both the max bitrate and the resolution via getUserMedia (not only the bitrate). I'm not aware of a request that can change the resolution dynamically, the way the configure request does for the target bitrate. The A/V desync seems to occur frequently after video freezes (slow links).



We don't support renegotiations yet, so not sure if something may be breaking when you involve those. You may want to try unpublishing/republishing instead to see if that helps.

As to the out-of-sync, as I've said recording and post-processing might tell you if the issue is on the encoder side, rather than in Janus (maybe the slow links are caused by the device failing to keep up with the encoding process?). 

L.

seba...@tribe.pm

Mar 29, 2017, 10:34:12 AM
to meetecho-janus
Hi Lorenzo and the list,

Thank you for your reply. Following your comments, I turned recording on for all streams to try to understand what's happening.
My conclusions:
- On an old device (an iPhone 5, for example) with the WebRTC library, there is a desync of about 1 second after the first minute; I think it's related to the device's weak CPU. I recorded the session, used janus-pp-rec to convert the recordings to .opus and .webm, then remuxed them with ffmpeg: the desync is already present in the publisher's source. So I think the problem comes from the client-side libwebrtc.
- I then tried the Meetecho VideoRoom demo available on the Meetecho Janus website, with two Chrome browsers on a recent MacBook Air. After about 50 minutes on average, a desync appears (about 500 ms to 1000 ms), with video played after the audio. So I think there is (perhaps) an incompatibility in RTP muxing between the WebRTC library (as implemented in the latest Chrome release) and Janus. I don't know yet where the problem is.

This problem seems to be amplified when the device has a slow CPU.
I tried disabling renegotiation and implementing a smoothing algorithm that lowers the bitrate when a slow_link message occurs (with configure/bitrate). The bitrate is set according to the request (the current bitrate adapts to the value I send), but the desync still occurs.
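
A minimal sketch of such a backoff, with hypothetical parameters: the 4/5 factor and the 64 kbit/s floor are illustrative, not values from this thread. Each slow_link event scales the current target down, clamps it to the floor, and formats the body of a VideoRoom "configure" request carrying the new bitrate in bits per second.

```c
#include <stdint.h>
#include <stdio.h>

#define BACKOFF_NUM 4u          /* hypothetical: reduce to 4/5 of the current bitrate */
#define BACKOFF_DEN 5u
#define MIN_BITRATE 64000u      /* hypothetical floor, in bits per second */

/* Next target bitrate after a slow_link event. */
static uint32_t backoff_bitrate(uint32_t current_bps) {
    uint64_t next = ((uint64_t)current_bps * BACKOFF_NUM) / BACKOFF_DEN;
    return next < MIN_BITRATE ? MIN_BITRATE : (uint32_t)next;
}

/* Write the JSON body of a "configure" request; returns the snprintf result. */
static int format_configure(char *buf, size_t len, uint32_t bitrate_bps) {
    return snprintf(buf, len, "{\"request\":\"configure\",\"bitrate\":%u}",
                    (unsigned)bitrate_bps);
}
```

Starting from the 512 kbit/s room setting, the first slow_link would lower the target to 409600 bit/s. A real client would also need a way to ramp back up once the link recovers, which this sketch leaves out.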

Do you think there is an incompatibility, or (perhaps) a bug, between the WebRTC library implemented by Google and your RTP timestamping implementation?

Let me know, because I would like to help troubleshoot this in Janus. By the way, in P2P mode between two devices (1-to-1), there is no desync at all.

Thanks,
Sebastien

Lorenzo Miniero

Mar 29, 2017, 11:00:53 AM
to meetecho-janus
My guess is you're hitting the same issue Ju Ju was describing here:

https://groups.google.com/forum/#!topic/meetecho-janus/SsGLnihikhI

You may want to check if you're talking about the same thing. My guess is something RTCP-related rather than RTP related, but I have no way to look into anything before next week, as I'm abroad for a conference and full-time busy. If you guys can look into logs (especially Chrome logs, after you enable webrtc debugging) that would help.

L.

Ju Ju

Mar 30, 2017, 8:17:05 AM
to meetecho-janus
Hi Sebastien,

> - I then tried the Meetecho VideoRoom demo available on the Meetecho Janus website, with two Chrome browsers on a recent MacBook Air. After about 50 minutes on average, a desync appears (about 500 ms to 1000 ms), with video played after the audio. So I think there is (perhaps) an incompatibility in RTP muxing between the WebRTC library (as implemented in the latest Chrome release) and Janus. I don't know yet where the problem is.

I think you have seen the same issue, but from another angle:
you -> using a device with a weak CPU
me -> using HD resolution + a very high bitrate (> 2 Mbit/s)

You can send me an email and I will give you access to our platform, where the same Meetecho demos are available with no bandwidth limitation. You can then try the VideoRoom in HD with high bandwidth, and you will notice it is not just a desync, it's full-on stuttering!

I still wonder why so few people notice this!

J

Ju Ju

Mar 30, 2017, 8:18:28 AM
to meetecho-janus

> If you guys can look into logs (especially Chrome logs, after you enable webrtc debugging) that would help.

Can you paste an example of the kind of logs I should look for? I have never noticed logs about RTCP in Chrome (but I didn't pay attention to this kind of message either).

Lorenzo Miniero

Mar 30, 2017, 11:40:25 AM
to meetecho-janus
No idea as I'm not familiar with Chrome debugging. You'll probably have to enable WebRTC logging, as I believe it's disabled by default. I remember launching Chrome with options like this in the past:

--enable-logging --v=1 --vmodule=*/libjingle/*=3,*=0

but I don't know if they're still the same, and on Windows they may differ. You may want to search and/or ask on discuss-webrtc on how to do that.

L.

Mirko Brankovic

Mar 30, 2017, 1:19:33 PM
to meetecho-janus
I think there is a checkbox to start 'recording log' in webrtc-internals, at least on Linux...

--
You received this message because you are subscribed to the Google Groups "meetecho-janus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to meetecho-janus+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ju Ju

Mar 30, 2017, 2:49:18 PM
to Mirko Brankovic, meetecho-janus
You're right:

Enable diagnostic packet and event recording
"A diagnostic packet and event recording can be used for analyzing various issues related to thread starvation, jitter buffers or bandwidth estimation. Two types of data are logged. First, incoming and outgoing RTP headers and RTCP packets are logged. These do not include any audio or video information, nor any other types of personally identifiable information (so no IP addresses or URLs)."

But it is binary data, unreadable.

This feature seems to be meant for attaching more debug info when submitting a bug.

However, you can also record "the PeerConnection updates and stats data", but I really don't see how to make use of it!

I can paste mine from when the issue occurs, if someone knows where to look.

J



Lorenzo Miniero

Mar 30, 2017, 3:03:13 PM
to meetecho-janus, mirkobr...@gmail.com
Try asking info on discuss-webrtc.

L.

Ju Ju

Mar 31, 2017, 11:18:27 AM
to Lorenzo Miniero, meetecho-janus, mirkobr...@gmail.com
Hello Lorenzo,

I posted there, but no one is answering…

To help us understand what is going on with Chrome, I took a close look at the webrtc-internals counters using the EchoTest demo.

Can you see the 3 "drops" in the received frame rate on the graphs? In the real world this means a video freeze (look at the "frame rate received" in the second graph).

You can see the same drop in the buffer: this means the number of "good" packets, i.e. packets that could be used for the video, has decreased.

But look at the NACK, FIR and PLI graphs:

No NACKs were sent, but PLIs were!

Now here is something interesting:

Packet losses were recorded, but no NACK was sent and no FIR was sent, yet PLIs were sent!

Maybe Janus does not handle PLI as well as NACK?
Maybe this is a starting point for investigation, what do you think?

J-

Lorenzo Miniero

Mar 31, 2017, 11:21:29 AM
to meetecho-janus, lmin...@gmail.com, mirkobr...@gmail.com
On Friday, March 31, 2017 at 17:18:27 UTC+2, Ju Ju wrote:
> Maybe Janus does not handle PLI as well as NACK?
> Maybe this is a starting point for investigation, what do you think?



Of course we don't handle PLI: we don't encode media ourselves, so what would we do with it? When it arrives we simply forward it to the other peer, which is definitely what happens in the EchoTest demo. I don't think we do it in the VideoRoom in order not to flood the publisher (if there are 300 viewers a publisher can get a ton of PLIs at the same time), which is why we have the "FIR/PLI frequency" setting.
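
The flooding concern can be illustrated with a generic throttle. This is a sketch of a different approach from the periodic "FIR/PLI frequency" setting mentioned above, and it is not Janus code; `kf_limiter` and its fields are hypothetical names. It lets at most one keyframe request per interval through to the publisher, however many viewers ask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most one keyframe request per min_interval_us. */
typedef struct {
    int64_t min_interval_us;   /* minimum spacing between forwarded requests */
    int64_t last_sent_us;      /* time of the last forwarded request */
    bool sent_once;            /* false until the first request is forwarded */
} kf_limiter;

/* Returns true if a PLI/FIR arriving at now_us should be forwarded to the publisher. */
static bool kf_limiter_allow(kf_limiter *l, int64_t now_us) {
    if (l->sent_once && now_us - l->last_sent_us < l->min_interval_us)
        return false;          /* too soon after the previous one: drop it */
    l->last_sent_us = now_us;
    l->sent_once = true;
    return true;
}
```

With a 10 second interval, 300 viewers sending a PLI within the same second would result in a single request reaching the publisher.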

Lorenzo


 

Mirko Brankovic

Mar 31, 2017, 11:24:39 AM
to meetecho-janus
Ju Ju,
I'm wondering what is happening with the CPU on your local client when the problem occurs. Does it spike to 100% on all cores for an extended period?

Ju Ju

Mar 31, 2017, 11:26:07 AM
to Lorenzo Miniero, meetecho-janus, mirkobr...@gmail.com




> Of course we don't handle PLI: we don't encode media ourselves, so what would we do with it? When it arrives we simply forward it to the other peer, which is definitely what happens in the EchoTest demo.
If you just forward them, the EchoTest demo should work well if the issue came from there. So that cannot be the cause; it could only have explained the problem in the VideoRoom.

> I don't think we do it in the VideoRoom in order not to flood the publisher (if there are 300 viewers a publisher can get a ton of PLIs at the same time), which is why we have the "FIR/PLI frequency" setting.
I tried many settings, from 1 to 15, without any impact.

So do you at least have an idea why Chrome is not sending NACKs to Janus when it detects lost packets?


J -

Lorenzo Miniero

Mar 31, 2017, 11:33:03 AM
to meetecho-janus, lmin...@gmail.com, mirkobr...@gmail.com


On Friday, March 31, 2017 at 17:26:07 UTC+2, Ju Ju wrote:
> So do you at least have an idea why Chrome is not sending NACKs to Janus when it detects lost packets?



I don't know how Chrome implements stuff. I guess they probably decide to send a PLI instead of a NACK under some circumstances. This blog post describes how it works: http://www.rtcbits.com/2017/03/retransmissions-in-webrtc.html

L.
 


Ju Ju

Mar 31, 2017, 11:38:19 AM
to meetecho-janus
No, it remains under 50% all the time.
J-








Ju Ju

Mar 31, 2017, 11:42:05 AM
to Lorenzo Miniero, meetecho-janus, mirkobr...@gmail.com


> I don't know how Chrome implements stuff. I guess they probably decide to send a PLI instead of a NACK under some circumstances. This blog post describes how it works: http://www.rtcbits.com/2017/03/retransmissions-in-webrtc.html

OK, thanks, I will read it.

Update: Chrome is sending NACKs, but the video freeze always occurs when a PLI is sent, as shown below.



Lorenzo Miniero

Mar 31, 2017, 11:45:07 AM
to meetecho-janus, lmin...@gmail.com, mirkobr...@gmail.com
That's the expected reaction. Chrome fails to decode an image, which in the UI is perceived as a freeze, and so sends a PLI to get a new picture to start from. PLIs are a consequence of the freeze, not the cause. What causes the freeze I don't know.
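
To tell these feedback messages apart when inspecting traffic, here is a minimal sketch that classifies a single, already-decrypted RTCP packet from its first two bytes. It uses the standard values from RFC 4585 and RFC 5104 (generic NACK: PT 205, FMT 1; PLI: PT 206, FMT 1; FIR: PT 206, FMT 4). The function and enum names are illustrative, and the sketch looks only at the first packet of a compound RTCP packet.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { FB_NONE, FB_NACK, FB_PLI, FB_FIR } rtcp_fb_type;

/* Classify a single RTCP packet by packet type and feedback message type (FMT). */
static rtcp_fb_type rtcp_classify(const uint8_t *buf, size_t len) {
    if (buf == NULL || len < 2)
        return FB_NONE;
    if ((buf[0] >> 6) != 2)        /* RTP/RTCP version must be 2 */
        return FB_NONE;
    uint8_t fmt = buf[0] & 0x1F;   /* low 5 bits of the first byte */
    uint8_t pt = buf[1];           /* RTCP packet type */
    if (pt == 205 && fmt == 1) return FB_NACK;   /* RTPFB, generic NACK */
    if (pt == 206 && fmt == 1) return FB_PLI;    /* PSFB, picture loss indication */
    if (pt == 206 && fmt == 4) return FB_FIR;    /* PSFB, full intra request */
    return FB_NONE;
}
```

A NACK asks for specific lost packets to be retransmitted, while a PLI or FIR asks the encoder for a fresh keyframe, which is why a PLI typically follows a decoding failure.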

L.



 


Ju Ju

Mar 31, 2017, 11:50:00 AM
to meetecho-janus, lmin...@gmail.com, mirkobr...@gmail.com


> That's the expected reaction. Chrome fails to decode an image, which in UI is perceived as a freeze, and so sends a PLI to get a new picture to start from. PLIs are a consequence of the freeze, not the cause. What causes the freeze I don't know.

OK, so back to square one :(
One more piece of information I forgot to mention: the PLI is seen on the sender side (so Janus has forwarded it correctly). Maybe it is received too late to be honored.
 