libnice global lock issue


emmanu...@dazzl.tv

Oct 27, 2017, 4:02:24 AM
to meetecho-janus
Hi Lorenzo, hi all,

I just wanted to know if there is any fix yet for this libnice global lock issue. We are experiencing a lot of packet loss/NACKs when forwarding high-bitrate streams, which in some situations is not acceptable. Is there any decision regarding this: switch to another lib? Try to fix the problem?

Cheers,

Emmanuel.

Lorenzo Miniero

Oct 27, 2017, 4:44:58 AM
to meetecho-janus
Hi Emmanuel,

I'm not aware of any fix, which in any case is up to libnice, not us. I posted on their mailing list some time ago:


and I found out about an issue on their tracker about this:


but I'm not sure any progress has been made. You may want to contribute to the discussion there to help get this fixed. One of the participants in that issue contributed a fork where he hacked the code to turn the global lock into a per-agent one:


We did something similar internally and did see improvements, but again, these are hacks: we're not familiar with the library internals, and so there's no guarantee that this doesn't break the stability of the library or introduce unpredictable behaviour.

I did investigate alternatives, and I wasn't happy or comfortable with any of them. Apart from this issue, I'm in general very happy with libnice and its clean API. There's pjnath, which sounded like a good solution (Asterisk uses it), but it has hardcoded limits on the number of candidates, and if you go beyond them (very easy in the WebRTC world) it just crashes. To avoid that, you have to manually patch and install the library (which I believe is what Asterisk does), but I have no intention of maintaining my own fork or making the installation of Janus any harder than it already is. There's libre too, but they have different ICE stacks that they're only now starting to merge and change (https://github.com/creytiv/re/wiki/ICE-Stacks-merge), which means that until that's done it's hard to pin down an API we can reliably use now and in the future.

I also considered writing my own ICE stack, and did some experimentation before the summer that led me to interesting results, but then I had to put it on ice (pun intended!). As you can guess, it's a huge effort, and something we don't have the time nor the resources to address as of now. Should someone want to sponsor this effort, it would definitely make it easier and quicker, and it's something I'd be glad to talk about.

Hope this helps,
Lorenzo

Emmanuel Riou

Oct 27, 2017, 12:47:41 PM
to Lorenzo Miniero, meetecho-janus
Hi Lorenzo,

Thanks for your detailed answer! It's cool to have a complete status regarding this.
I then did more testing this afternoon, trying to narrow down exactly what my problem is.
My testbed is quite simple: I have 1 input stream at quite a heavy bitrate (around 6-7 Mbps), and 1 output stream. In this case Janus receives a lot of NACKs from the receiver of the output stream, while I don't have any packet loss on the receive side (verified with a Wireshark capture too). I'd have to do more testing, but it seems I start receiving these NACKs with bitrates higher than 4 Mbps.
My first intuition was that my problem was due to this libnice issue I had heard about. So I asked the question on this forum, and I added a function to libnice called "nice_agent_send_data_nolock" for sending data packets only, which basically calls nice_socket_send_messages without holding any lock, so a very simple function. I replaced all the calls to nice_agent_send in janus_ice_send_thread with calls to my new function.
I was quite confident it would fix my problem, but it didn't!
Then I tried something else: I tried to get rid completely of this "per handle data packet queue" in Janus. Basically, I modified janus_ice_relay_rtp to send packets directly instead of pushing them into the queue. And no more NACKs in this case...
So it's probably a lock contention issue related to the async_queue_push/pop functions, or could it be something else?
I don't have time to do more investigation right now, and won't be able to get back to this until Thursday next week, but I wanted to share this with you anyway, even if I don't have a clear conclusion yet!

Cheers,

Emmanuel.


--
You received this message because you are subscribed to a topic in the Google Groups "meetecho-janus" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/meetecho-janus/8_XGmAnOv-4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to meetecho-janus+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lorenzo Miniero

Oct 27, 2017, 1:00:29 PM
to meetecho-janus
On Friday, October 27, 2017 at 18:47:41 UTC+2, Emmanuel Riou wrote:
Hi Lorenzo,

Thanks for your detailed answer! It's cool to have a complete status regarding this.
I then did more testing this afternoon, trying to narrow down exactly what my problem is.
My testbed is quite simple: I have 1 input stream at quite a heavy bitrate (around 6-7 Mbps), and 1 output stream. In this case Janus receives a lot of NACKs from the receiver of the output stream, while I don't have any packet loss on the receive side (verified with a Wireshark capture too). I'd have to do more testing, but it seems I start receiving these NACKs with bitrates higher than 4 Mbps.
My first intuition was that my problem was due to this libnice issue I had heard about. So I asked the question on this forum, and I added a function to libnice called "nice_agent_send_data_nolock" for sending data packets only, which basically calls nice_socket_send_messages without holding any lock, so a very simple function. I replaced all the calls to nice_agent_send in janus_ice_send_thread with calls to my new function.
I was quite confident it would fix my problem, but it didn't!
Then I tried something else: I tried to get rid completely of this "per handle data packet queue" in Janus. Basically, I modified janus_ice_relay_rtp to send packets directly instead of pushing them into the queue. And no more NACKs in this case...
So it's probably a lock contention issue related to the async_queue_push/pop functions, or could it be something else?
I don't have time to do more investigation right now, and won't be able to get back to this until Thursday next week, but I wanted to share this with you anyway, even if I don't have a clear conclusion yet!



Sending directly is what we did in the past, but we changed that behaviour because, depending on the number of recipients and the throughput, it could keep the thread doing the transmission busy for too long. In fact, assuming you're using something like the VideoRoom, incoming_rtp in the plugin is triggered by the libnice recv callback, and so happens in the related recv thread: if relay_rtp sends packets directly instead of queueing them, that recv thread can only get the next packet after it has physically sent the last one, which can take a while if you have many viewers and/or a lot of data to send; if this interval exceeds the inter-packet arrival intervals, you're never going to catch up, and packets are going to be lost. Besides, there's the SRTP context contention issue: allowing different threads to invoke relay_rtp indiscriminately will cause crashes, as libsrtp is not thread safe, and so you'd need a lock anyway.

All of this led us to make that change some time ago, which resulted in better performance for us. I guess in your case the issue might be partially related to too many threads being active: we spawn two for each user (one for the libnice ICE loop to receive packets, and one for the outgoing queue), so with many viewers sending a lot of data you may have too many context switches. A possible option here might be aggregating participants into fewer threads, e.g., 3-4 threads handling all incoming packets, and 3-4 threads doing the delivery. I think we made a similar experiment in the past, but IIRC this is one of the approaches that actually helped us identify libnice as the bottleneck in our case. In my experiments on a new ICE stack, I was working on an approach that only used a single thread to both send and receive, but unfortunately I never got to the point of seeing how well it performed, so I don't know if it would be a winning choice yet.

If you can figure out what exactly in the send thread + queue is slowing things down for you, e.g., via profiling of some sort, it will definitely help.

Thanks,
Lorenzo

Emmanuel Riou

Nov 8, 2017, 10:35:33 AM
to Lorenzo Miniero, meetecho-janus
Hi Lorenzo, all,

I just had a chance to do some more investigation, in particular some perf profiling. It appears that in my case the bottleneck is in the ice send thread, in the g_list_last/g_list_append functions (g_list_append actually calls g_list_last). Perf reports a total of nearly 50% of time spent in g_list_last. Going through the whole list (actually 3 different lists) over and over is clearly problematic, in particular when the list is long, which is probably the case for me.

It is not the only problem I have, unfortunately. It appears that I also have some fragmentation issues. Basically, I ingest a non-encrypted stream, and when forwarding it, SRTP adds extra bytes that make most packets bigger than the MTU. Packets reach the receiver in the wrong order because fragmentation slows them down (and because some packets are fragmented while others are not). The bad reception order makes the receiver send a lot of NACKs to Janus. The problem is that I can't control the MTU of the sender, so the only solution seems to be not using SRTP on the sending path, which is not very satisfying...

Cheers,

Emmanuel.



Lorenzo Miniero

Nov 9, 2017, 9:30:46 PM
to meetecho-janus
Thanks for digging more into this. I only see a couple of those instances: we use g_list_last to calculate how many bytes we sent in the last second (or whatever no_media_timer is set to), and g_list_append only to update the list of packets to retransmit. I guess the former could be redone without a list at all; I'm not sure about the latter. Anyway, it's a GList, which means it's a doubly-linked list: I'd have expected g_list_last or append to reach the end via ->prev, instead of crawling the whole list the other way around.

Nothing we can do about the MTU, instead: it's up to the source to send smaller packets. Being so close to the MTU, SRTP overhead or not, is still dangerous, as it's not that unusual to encounter network paths with MTUs smaller than average.

L.

Emmanuel Riou

Nov 10, 2017, 4:07:18 AM
to Lorenzo Miniero, meetecho-janus
2017-11-10 3:30 GMT+01:00 Lorenzo Miniero <lmin...@gmail.com>:
Thanks for digging more into this. I only see a couple of those instances: we use g_list_last to calculate how many bytes we sent in the last second (or whatever no_media_timer is set to), and g_list_append only to update the list of packets to retransmit. I guess the former could be redone without a list at all; I'm not sure about the latter. Anyway, it's a GList, which means it's a doubly-linked list: I'd have expected g_list_last or append to reach the end via ->prev, instead of crawling the whole list the other way around.


Hi Lorenzo,

It's true that it's a doubly-linked list, but unfortunately, as mentioned in the GLib documentation, g_list_append/g_list_last don't work the way we'd expect:
"g_list_append() has to traverse the entire list to find the end, which is inefficient when adding multiple elements. A common idiom to avoid the inefficiency is to use g_list_prepend() and reverse the list with g_list_reverse() when all elements have been added."

 
Nothing we can do about the MTU, instead: it's up to the source to send smaller packets. Being so close to the MTU, SRTP overhead or not, is still dangerous, as it's not that unusual to encounter netork paths with MTUs smaller than average.


Yes, clearly the MTU is a problem. There are protocols to discover the maximum MTU over a network path, but they're probably not widely used, and in my case it wouldn't be enough anyway.

Cheers,

Emmanuel.

Lorenzo Miniero

Nov 10, 2017, 4:19:10 AM
to meetecho-janus
On Friday, November 10, 2017 at 10:07:18 UTC+1, Emmanuel Riou wrote:



Hi Lorenzo,

It's true that it's a doubly-linked list, but unfortunately, as mentioned in the GLib documentation, g_list_append/g_list_last don't work the way we'd expect:
"g_list_append() has to traverse the entire list to find the end, which is inefficient when adding multiple elements. A common idiom to avoid the inefficiency is to use g_list_prepend() and reverse the list with g_list_reverse() when all elements have been added."



Not sure how much more efficient that would be: wouldn't g_list_reverse() have to traverse the whole list too, to invert all the ->next and ->prev pointers? It sounds easier and quicker to do g_list_prepend() and not reverse at all, but just make sure the head of the list is still the same as it was before.

Anyway, as a quick test you can try preventing the code from storing packets for retransmission at all, and see how much that improves things for you: if it's considerable, we can think of better ways to do the same thing, e.g., using the method described above.

L.

Emmanuel Riou

Nov 10, 2017, 4:39:31 AM
to Lorenzo Miniero, meetecho-janus
2017-11-10 10:19 GMT+01:00 Lorenzo Miniero <lmin...@gmail.com>:

Not sure how much more efficient that would be: wouldn't g_list_reverse() have to traverse the whole list too, to invert all the ->next and ->prev pointers? It sounds easier and quicker to do g_list_prepend() and not reverse at all, but just make sure the head of the list is still the same as it was before.


g_list_reverse is not efficient either, but as mentioned in the documentation you only do it once, when the list is complete, which doesn't apply to our case. So yes, g_list_prepend instead of g_list_append would be the solution.


 
Anyway, as a quick test you can try preventing the code from storing packets for retransmission at all, and see how much that improves things for you: if it's considerable, we can think of better ways to do the same thing, e.g., using the method described above.


I've done this test already, and it's clearly much better without the retransmit buffer list code (actually I removed all the code that calls g_list_last in ice_send_thread, so also the code you mentioned earlier that calculates the number of bytes sent in the last second). With it, I spend 50% of the time in g_list_last in ice_send_thread. Without it, the perf results are much better: the top function (the SRTP encryption function) doesn't go higher than 15%, and the rest is very low; and again, this is with quite a heavy bitrate (around 6 Mbps).

Lorenzo Miniero

Nov 10, 2017, 4:44:25 AM
to meetecho-janus
On Friday, November 10, 2017 at 10:39:31 UTC+1, Emmanuel Riou wrote:



g_list_reverse is not efficient either, but as mentioned in the documentation you only do it once, when the list is complete, which doesn't apply to our case. So yes, g_list_prepend instead of g_list_append would be the solution.



Yep, g_list_prepend plus making sure the head doesn't change (as I believe a prepend would make the prepended item the head instead).

 

 
Anyway, as a quick test you can try preventing the code from storing packets for retransmission at all, and see how much that improves things for you: if it's considerable, we can think of better ways to do the same thing, e.g., using the method described above.


I've done this test already, and it's clearly much better without the retransmit buffer list code (actually I removed all the code that calls g_list_last in ice_send_thread, so also the code you mentioned earlier that calculates the number of bytes sent in the last second). With it, I spend 50% of the time in g_list_last in ice_send_thread. Without it, the perf results are much better: the top function (the SRTP encryption function) doesn't go higher than 15%, and the rest is very low; and again, this is with quite a heavy bitrate (around 6 Mbps).


That's really interesting! Sorry, didn't know you had made these tests already, I thought you had only done profiling so far (well, *only*, that's a huge task already...).

I'm abroad until the 26th for various events and next week in particular I'll be very busy, so not sure when I'll be able to prepare a patch that includes what we discussed in this exchange. I'll notify here in case I manage to put something together quickly that you and others can test.

Thanks again for this really useful feedback!
L.

Emmanuel Riou

Nov 10, 2017, 4:52:28 AM
to Lorenzo Miniero, meetecho-janus
2017-11-10 10:44 GMT+01:00 Lorenzo Miniero <lmin...@gmail.com>:

Yep, g_list_prepend plus making sure the head doesn't change (as I believe a prepend would make the prepended item the head instead).


Yes, or you can just prepend, and then go through the whole list from the last element only when you retransmit.

 

 

 

That's really interesting! Sorry, didn't know you had made these tests already, I thought you had only done profiling so far (well, *only*, that's a huge task already...).

I'm abroad until the 26th for various events and next week in particular I'll be very busy, so not sure when I'll be able to prepare a patch that includes what we discussed in this exchange. I'll notify here in case I manage to put something together quickly that you and others can test.

Thanks again for this really useful feedback!


You're welcome! Don't hesitate if you think I can help!

Cheers,

Emmanuel.

Lorenzo Miniero

Nov 10, 2017, 11:09:32 PM
to meetecho-janus
Quick and dirty patch that I don't have time to validate or test: https://github.com/meetecho/janus-gateway/pull/1063
Please let me know if you see improvements. Feel free to play with the code to see if you can fix issues, if any, as I won't be able to do much more with it for a while.

L.

Paul Dao

May 6, 2019, 2:37:36 PM
to meetecho-janus
Does anyone know if this fix is in the latest Janus release?

Thanks

Paul

Alessandro Amirante

May 6, 2019, 5:48:38 PM
to Paul Dao, meetecho-janus
If you're referring to the libnice global lock issue as per the thread title, Janus has nothing to do with that; libnice does. Recent versions of libnice (>=0.1.14) got rid of the global lock in most scenarios. IIRC it's still there for ICE-TCP and TURN connections.

If instead you're talking about the ice send thread improvement discussed in this thread, it was merged a long time ago.

A.


Lorenzo Miniero

May 7, 2019, 12:37:40 AM
to Alessandro Amirante, Paul Dao, meetecho-janus
Just as a note, IIRC the libnice versions with the fix start from 0.1.15, so 0.1.14 should still be affected.

Lorenzo

Paul Dao

May 7, 2019, 12:07:30 PM
to meetecho-janus
Hi Lorenzo and Alessandro:

Thanks for the quick replies. We are trying to upgrade to the latest versions of libnice and Janus, and I still want to know if that would resolve the lock issue we ran into.
Alessandro said that libnice got rid of the global lock in most scenarios... but we are using ICE-TCP and TURN connections with Janus.
Is it still safe for us to upgrade?
Which scenarios do you know of that are still affected?

Thanks

Paul D

Alessandro Amirante

May 7, 2019, 12:18:28 PM
to Paul Dao, meetecho-janus
AFAIK, ice-tcp is still affected by the global lock, so the bottleneck will likely still be there. Upgrading is still recommended as they made other improvements and fixes in master. If you're using TURN *for Janus* (i.e., you configured it in janus.jcfg) beware of this bug, though.

A.
