Bad audio quality


Oct 29, 2020, 11:59:42 AM
to kurento

We are having an audio quality problem and I am wondering if maybe one of you already has experience with this, or has ideas about it.

We are using KMS 6.13.0 (with our own memory leak fix). We still have to run our experiments with KMS 6.14.0 and/or the nightly build.

What we are trying to do is this:
* we have our own RTSP tool generating Opus-encoded 48 kHz stereo audio
* an RTSP Player endpoint, consuming our RTSP audio
  * we want low latency and predictable resource consumption,
  * so we configure it with networkCache=0 and useEncodedMedia=true
* a media pipeline
* a WebRTC endpoint
* a Firefox or Chrome browser

With this setup, the sound in the browser is very choppy and robotic.

If we configure the Player endpoint with useEncodedMedia=false, allowing Kurento to transcode, we get good audio quality in the browser, but a delay builds up over time.

If we record the output of our RTSP tool directly (with an ffmpeg commandline),
we get good audio quality in the recording.

If we create a Player endpoint with an opus audio file (instead of RTSP), we get good audio quality in the browser.

If we create a packet capture of the WebRTC stream, decrypt this and extract the opus data from the capture into a file (using the opusrtp commandline tool), we get good audio quality in the extracted opus file.

It is quite baffling and hard to understand and we are looking for ideas to tackle this.

One thing we noticed in the Chrome webrtc-internals dump: with useEncodedMedia=true there is a lot of jitter, a higher average jitter buffer delay, and many concealed samples and concealmentEvents, compared to when Kurento is allowed to transcode.

Any insights, ideas, remarks, questions are welcome!

Thanks in advance,
Erik Cumps

Juan Navarro

Oct 29, 2020, 1:29:30 PM
The first thing that comes to mind: might raising networkCache from 0 to some other value have an impact? In principle, networkCache=0 means that all RTP packets had better arrive exactly at their expected times in the RTSP stream, or else they will be dropped. So even a slight amount of jitter or delay in the network might cause packets to be dropped when they arrive at the PlayerEndpoint.

"networkCache" translates directly to the "latency" property of the GStreamer "rtspsrc" element:

which, after checking the source code, ends up being set as the "latency" property on "rtpbin" and on the "rtpjitterbuffer" inside it:

There is no other processing or buffering done by KMS regarding this value, so the effect of setting networkCache=0 can be studied directly with respect to these GStreamer elements.
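
To make the effect of a zero latency concrete, here is a minimal sketch of a playout deadline check. It is a simplified model and an assumption, not GStreamer's actual jitter buffer algorithm; the function name and the arrival times are hypothetical.

```python
def late_packets(arrivals_ms, interval_ms, latency_ms):
    """Return the indices of packets that miss their playout deadline.

    Simplified model (an assumption, not GStreamer's algorithm): packet i
    is due at first_arrival + latency + i * interval, and counts as late
    when it arrives after that deadline.
    """
    start = arrivals_ms[0] + latency_ms
    return [
        i for i, arrival in enumerate(arrivals_ms)
        if arrival > start + i * interval_ms
    ]


# Hypothetical arrival times (ms) for packets that were sent every 20 ms.
arrivals = [0.0, 21.0, 39.5, 62.0, 80.0, 103.0]

print(late_packets(arrivals, 20.0, 0.0))   # no buffering at all
print(late_packets(arrivals, 20.0, 50.0))  # 50 ms of buffering
```

In this model, any packet that arrives even slightly after its nominal slot is late when the latency is 0, while a modest latency absorbs the same variation.
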
You received this message because you are subscribed to the Google Groups "kurento" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
To view this discussion on the web visit

Erik Cumps

Oct 29, 2020, 1:55:40 PM
Thanks for the quick response Juan!

I ran a quick experiment with networkCache set to 1000, and another with 10000.

The result is... weird.

First there is silence for the length of the cache (that's expected).

Next, audio starts, with much better quality for a short time (a second or two).

Then the audio quality degrades again, and on top of that the audio seems to slow down, speed up, slow down... (that's unexpected)


Oct 30, 2020, 10:27:35 AM
to kurento
Another bit of info:

A packet capture shows the RTP packets are being sent at the expected constant rate of 50 packets/second (one packet every 20 ms).

However, when I grep for the "rtparrival %u, rtptime %u" messages in the kurento logs,
I get an unexpected and surprisingly big variance for the rtparrival times. For example:

Sorry it's an image, right now it's the only way I know to get this properly formatted as a table.

The "rtparrival" and "rtptime" columns contain the rtparrival and associated rtptime values from the logs.
The "Darr" column contains the deltas between two consecutive rtparrival values.
Likewise, the "Dtime" column contains the deltas between two consecutive rtptime values.

There are also some statistics included: minimum, maximum, average, standard deviation, count of values below
960, count of values above 960, total count.

As you can see rtparrival time is varying wildly. We cannot explain this.

We guess this variance influences the transmit timing of these packets on the webrtc stream...


Oct 30, 2020, 10:37:03 AM
to kurento
Ok, so that picture didn't come through. I'll try with text but it probably won't come out nice.

rtparrival  rtptime       Darr      736    Dtime      960        ← MIN
 2097      1673029895              1106              1601        ← MAX
 3096      1673030855      999      960     960       960        ← AVERAGE
 3920      1673031815      824      100     960        15        ← STDEV
 4879      1673032775      959      440     960         0        ← below 960
 5939      1673033735     1060     1287     960         2        ← above 960
 6948      1673034695     1009     1728     960      1728        ← COUNT
 7758      1673035655      810              960           
 8723      1673036615      965              960           
 9790      1673037575     1067              960           
10810      1673038535     1020              960           
11606      1673039495      796              960           
12594      1673040455      988              960           
13598      1673041415     1004              960           
14633      1673042375     1035              960           
15441      1673043335      808              960           
16432      1673044295      991              960           
17452      1673045255     1020              960           
18527      1673046215     1075              960           
19293      1673047175      766              960           
20258      1673048135      965              960           
21337      1673049095     1079              960           
22356      1673050055     1019              960           
23130      1673051015      774              960           
24104      1673051975      974              960           
25175      1673052935     1071              960           
26210      1673053895     1035              960           
26979      1673054855      769              960           
27962      1673055815      983              960           
29040      1673056775     1078              960            
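
The Darr column and its summary statistics can be recomputed from the rtparrival values, as in the sketch below. It uses only the 29 rows shown in the table above, so its results describe this excerpt and will not match the statistics columns, which cover all 1728 values of the capture.

```python
import statistics

# rtparrival values copied from the table excerpt above (RTP clock ticks).
rtparrival = [
    2097, 3096, 3920, 4879, 5939, 6948, 7758, 8723, 9790, 10810,
    11606, 12594, 13598, 14633, 15441, 16432, 17452, 18527, 19293,
    20258, 21337, 22356, 23130, 24104, 25175, 26210, 26979, 27962,
    29040,
]

# Darr: difference between each pair of consecutive arrival values.
darr = [later - earlier for earlier, later in zip(rtparrival, rtparrival[1:])]

print("min      ", min(darr))
print("max      ", max(darr))
print("average  ", round(statistics.mean(darr), 2))
print("stdev    ", round(statistics.stdev(darr), 2))
print("below 960", sum(d < 960 for d in darr))
print("above 960", sum(d > 960 for d in darr))
print("count    ", len(darr))
```
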


Juan Navarro

Oct 30, 2020, 11:35:13 AM
Both the picture and the ASCII table looked exactly the same for me, and it looks like a properly formatted table, so I'm not sure why you said it looks wrong. It's 7 columns, with columns 4 and 6 having the summary stats of Darr and Dtime, respectively, right?

I assume you mean the log message from rtpsource.c, right? (It's the only one I could find with that format):

GST_LOG ("rtparrival %u, rtptime %u, clock-rate %d, diff %d, jitter: %f"

"rtptime" is the RTP Timestamp that the sender wrote in the RTP header. It reflects the sampling instant of the first audio sample in the payload, measured on the sender's media clock.

"rtparrival" is again an RTP Timestamp (I'm using this name to make it clear that it's not necessarily a simple measurement in milliseconds) of the instant when the packet was received by the local Rtp implementation. This is, if I understand correctly, the RtpBin in your PlayerEndpoint. Both readings of rtptime and rtparrival seem correct to me.

The difference between consecutive rtptime values is expected to be always the same for audio, because the sender is expected to send, as you mentioned, a constant rate of audio packets. This is the 960 you see; it is expressed in RTP Timestamp units, and can be converted to "clock time" by dividing by the clock-rate of the media, 48000 for Opus. So 960 / 48000 = 20 milliseconds of separation between packets, which seems about right.
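
The tick-to-time conversion can be written down as a small helper. This is only a sketch; the function and constant names are made up for illustration.

```python
OPUS_RTP_CLOCK_RATE = 48000  # Hz; the clock-rate mentioned above for Opus


def ticks_to_ms(ticks, clock_rate=OPUS_RTP_CLOCK_RATE):
    """Convert a duration in RTP clock ticks to milliseconds."""
    return ticks * 1000.0 / clock_rate


# Nominal spacing between packets, and the resulting packet rate.
print(ticks_to_ms(960))            # 20.0
print(OPUS_RTP_CLOCK_RATE // 960)  # 50

# Largest Darr value reported in the table, in milliseconds.
print(round(ticks_to_ms(1106), 2))
```
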

rtparrival, on the other hand, is the time when the packet arrived; this, being UDP, will suffer from all typical network conditions of UDP: packets will sometimes arrive late, sometimes arrive too early, maybe get duplicated, or get lost altogether. That's why buffering and reordering packets at the receiver is mandatory, a task that is done by the "rtpjitterbuffer" element. Your capture shows exactly that, with some variation on the receiving times for each packet; that variation is the jitter.
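
For reference, RFC 3550 (section 6.4.1) defines a running interarrival jitter estimate from exactly these two quantities. The sketch below implements that formula in floating point and feeds it the first rows of the table; it makes no claim about how the "jitter" field in the GStreamer log line is scaled.

```python
def interarrival_jitter(arrivals, timestamps):
    """Running interarrival jitter as defined in RFC 3550, section 6.4.1.

    Both sequences are in RTP clock ticks and must have the same length.
    """
    jitter = 0.0
    for i in range(1, len(arrivals)):
        # D: change in relative transit time between consecutive packets.
        d = (arrivals[i] - arrivals[i - 1]) - (timestamps[i] - timestamps[i - 1])
        # Exponential smoothing with gain 1/16, as specified by the RFC.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# First five rows of the table (rtparrival, rtptime).
arrivals = [2097, 3096, 3920, 4879, 5939]
timestamps = [1673029895, 1673030855, 1673031815, 1673032775, 1673033735]

print(interarrival_jitter(arrivals, timestamps))
```

A stream whose packets arrive exactly 960 ticks apart yields a jitter of 0 with this estimator.
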

There is a very instructive detail here: notice how even with packets coming earlier or later than expected, overall their average arrival time is still 960 ticks, matching the difference on the sender side!

Given that your biggest difference between packets is 1106 ticks, i.e. around 23 ms, it seems that you would need to set a jitter buffer latency of at least 23 ms. I would use 50 ms, just to be safe. Of course, this would only apply to the network segment you have studied, and with the same network conditions.

If the jitter buffer latency is bigger than this, which it is in your case, then the arrival time is smoothed out by the jitter buffer, and this is definitely not affecting the rest of the pipeline, so the issue must be somewhere else.

Erik Cumps

Nov 4, 2020, 12:32:09 PM
Thanks for confirming the table layout is coming through ok.

Just realized I forgot to mention another important detail that better explains why this is so strange: these rtp
packets are not flowing over the network, they are flowing between processes on the same machine: both our
RTSP source and the kurento media server are running on the same host, so packets are not leaving the machine.

To be more precise: our RTSP Player Endpoint accesses the RTSP stream using an "rtsp://" URI.
All processes are running in the same LXC container.


Erik Cumps
NUCLeUS System Architect


Juan Navarro

Nov 13, 2020, 9:06:50 AM
On 13/11/20 11:29, wrote:
A quick status update:
  • we found the root cause of our problem
  • it had nothing to do with kurento
For those of you who are curious as to what the problem was:

There was a mismatch between the incoming raw audio frame size and the opus encoding frame size,
which resulted in a bad encoding cadence causing irregular encoded frame intervals.

We remedied this by ensuring that the incoming audio frame size and the opus encoding frame size are
the same, or that the incoming frame size is a divisor of the encoding frame size.
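
The cadence effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions: raw frames arrive at real-time pace, and the encoder emits one frame as soon as enough samples are buffered. The frame sizes 480 and 1024 are hypothetical examples; the thread does not say which sizes were actually involved.

```python
def encoded_frame_times_ms(input_frame, encoder_frame, n_inputs, sample_rate=48000):
    """Times (ms) at which a buffering encoder can emit encoded frames.

    Simplified model (an assumption): raw frames of `input_frame` samples
    arrive at real-time pace, and one encoded frame is emitted as soon as
    `encoder_frame` samples are available in the buffer.
    """
    times = []
    buffered = 0
    for i in range(1, n_inputs + 1):
        buffered += input_frame
        now_ms = i * input_frame * 1000.0 / sample_rate
        while buffered >= encoder_frame:
            buffered -= encoder_frame
            times.append(now_ms)
    return times


def intervals(times):
    """Spacing between consecutive emission times, rounded to 0.01 ms."""
    return [round(b - a, 2) for a, b in zip(times, times[1:])]


# 480 is a divisor of 960: encoded frames come out exactly 20 ms apart.
print(intervals(encoded_frame_times_ms(480, 960, 8)))

# 1024 is not a divisor of 960: the spacing no longer matches the 20 ms
# that each encoded frame represents, and two frames come out back to back.
print(intervals(encoded_frame_times_ms(1024, 960, 16)))
```

In the second case every encoded frame still advances the RTP timestamp by 960 ticks, but the emission times do not follow a steady 20 ms cadence.
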

Your message was caught for no apparent reason in the Google spam filter; I've approved it now and dropped the other attempts you made.
Did you get any kind of notification from Google Groups when your message was not appearing in the group? One is configured to be shown, but I'm not sure it is being sent to users.

Regarding your message: thank you for sharing the root cause of your issue! Other users might find it very useful in the future. I might also add a section about encoding in the Troubleshooting Guide, as encoding misconfigurations can be a real source of headaches.


Erik Cumps

Nov 13, 2020, 9:32:08 AM
Hi Juan,

Thanks for clearing that up, I was getting a little confused... :)

No, I did not receive any indication at all from Google Groups, neither when posting directly on the site nor when sending directly via email.

