Hi,

love the name! :-D I'll give it a try myself as soon as I can, as I'm always curious about projects that use Janus. :-)

About the freeze issues, the causes may be different, and I'm not sure the log alone can help. It may be a keyframe that has been lost, for instance, causing all differential packets to be discarded by the decoder until a new keyframe is received. Do you configure a FIR/PLI frequency for rooms, or did you disable that part? A regular FIR/PLI (e.g., every 5-10 seconds) may help make sure that in such cases a keyframe is received much sooner: without that, browsers usually send a FIR every 100 seconds instead.
If it's not FIR/PLI related (e.g., video never recovers for some publishers) it may instead be a different issue, e.g., video that is not being received by a publisher anymore, or for some reason not being broadcasted to (all or some?) participants.
One additional reason may be ICE related: if the publisher is using TURN and the binding stops for some reason, none of the media that is being sent would be relayed by the TURN server to Janus anymore. As you can see, there are several potential causes, and the only way to investigate is to check the available sources.
The best way to check this is to look at both the chrome://webrtc-internals (or in case of Firefox, about:webrtc) and the Janus admin API. The latter will provide, for each handle, details about the number of bytes that have been sent/received per media type, and it has a per-second view as well. Of course it also provides other useful pieces of information, such as the ICE state, the currently used candidates, and so on. Using our admin demo page it can be a bit messy to look for the right handle, as you'd have to crawl through all of them until you find the publisher handle you're interested in, and maybe some of the viewers as well (just to verify, for instance, if it's a specific viewer that's not receiving anything anymore or something else). You may want to script something that automates that process for you. Please also note that there are no async events for admin stuff: it's all request/response based, so you need to refresh a query about a handle to get new information about it.
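As a rough sketch of what such a script could look like (the base URL, the admin_secret value, and the IDs here are placeholders, and this assumes the HTTP admin transport is enabled):

```javascript
// Minimal sketch of polling the Janus admin API. There are no async admin
// events, so new info about a handle requires a new query each time.

function adminRequest(request, extra = {}) {
  // Every admin API call is a plain JSON request/response exchange.
  return Object.assign({
    janus: request,
    transaction: Math.random().toString(36).slice(2),
    admin_secret: "janusoverlord"   // placeholder: whatever is configured server-side
  }, extra);
}

// Crawl sessions -> handles -> handle_info for the publisher you care about.
async function handleInfo(baseUrl, sessionId, handleId) {
  const res = await fetch(`${baseUrl}/${sessionId}/${handleId}`, {
    method: "POST",
    body: JSON.stringify(adminRequest("handle_info"))
  });
  // The reply includes the ICE state, the selected candidates and the
  // per-media byte counters for that handle.
  return res.json();
}
```

Run on a timer (e.g., once per second), this gives you the same per-second view as the admin demo page, but only for the handles you actually care about.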
By checking the combined info on the client (internals) and server (admin API) sides, there should be additional pieces to the puzzle.

Hope that helps,
Lorenzo
Replying inline.
On Wednesday, May 6, 2015 at 14:36:18 (UTC+2), Lorenzo Miniero wrote:

Do you configure a FIR/PLI frequency for rooms, or did you disable that part? A regular FIR/PLI (e.g., every 5-10 seconds) may help make sure that in such cases a key frame is received much sooner: without that, browsers usually send a FIR every 100 seconds instead. [...]
We have fir_freq = 10 in all the rooms. So that shouldn't be the problem.
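For reference, our rooms are defined along these lines in janus.plugin.videoroom.cfg (the room number and the other values here are made up for illustration; only the fir_freq line is the relevant one):

```
[1234]
description = Test room
publishers = 6
bitrate = 128000
fir_freq = 10
```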
If it's not FIR/PLI related (e.g., video never recovers for some publishers) it may instead be a different issue, e.g., video that is not being received by a publisher anymore, or for some reason not being broadcasted to (all or some?) participants.
I assume you meant "video not being received FROM a publisher". Anyway, if that happens (let's say a bandwidth problem for one of the participants), do you mean that the problem can propagate to the whole system and cause the whole room to freeze? That would be quite an undesirable behavior.
One additional reason may be ICE related: if the publisher is using TURN and the binding stops for some reason, none of the media that is being sent would be relayed by the TURN server to Janus anymore. As you can see, there are several potential causes, and the only way to investigate is to check the available sources.
We saw this in the logs:
STUN-CLIENT(srflx(IP4:10.100.201.16:58265/UDP|stun.l.google.com:19302)): Timed out
Could it be the culprit, or does it look more like a symptom or a consequence?
If it's the culprit, what's your recommendation?
The best way to check this is to look at both the chrome://webrtc-internals (or in case of Firefox, about:webrtc) and the Janus admin API. [...]
Debugging all that info sounds like too much for us. All we know about WebRTC is that Janus makes everything easy for us. :-)
On Thursday, May 7, 2015 at 12:00:28 UTC+2, Ancor Gonzalez Sosa wrote:

Replying inline.
On Wednesday, May 6, 2015 at 14:36:18 (UTC+2), Lorenzo Miniero wrote:

Do you configure a FIR/PLI frequency for rooms, or did you disable that part? [...]
We have fir_freq = 10 in all the rooms. So that shouldn't be the problem.
If it's not FIR/PLI related (e.g., video never recovers for some publishers) it may instead be a different issue, e.g., video that is not being received by a publisher anymore, or for some reason not being broadcasted to (all or some?) participants.
I assume you meant "video not being received FROM a publisher". Anyway, if that happens (let's say a bandwidth problem for one of the participants), do you mean that the problem can propagate to the whole system and cause the whole room to freeze? That would be quite an undesirable behavior.

Yes, I meant FROM. I thought only some videos were getting frozen, which is why I thought about the possible cause above. If one of the publishers can't send its frames to Janus anymore, all its viewers obviously won't receive them anymore: this translates into frozen video/audio from the viewers' perspective. If a few viewers can't get the video for some reason, the same thing happens for them only.

A complete freeze of all videos for everybody is something different, and not something we have ever experienced. Try to make sure you're not pushing the forced video bitrate too high, especially if you're seeing a lot of NACKs. The Janus videoroom plugin notifies about slow link events (both on the uplink and downlink sides), so you can make use of that feedback to configure the publishers' bitrate accordingly.
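As a sketch of how that feedback could be used on the client side (the handler name, the starting bitrate, the floor and the halving step are all assumptions for illustration, not part of any real deployment):

```javascript
// Hypothetical reaction to the videoroom's slow link feedback: every time
// the uplink is reported as slow, halve the forced publisher bitrate,
// down to a floor.

let currentBitrate = 512000;       // current forced publisher bitrate, in bps
const MIN_BITRATE = 128000;        // don't go below this

function onSlowLink(uplink) {
  if (!uplink) return null;        // only throttle on publisher-side problems here
  currentBitrate = Math.max(MIN_BITRATE, Math.floor(currentBitrate / 2));
  // This object would be sent to the videoroom plugin as the body of a
  // "configure" request for the publisher handle.
  return { request: "configure", bitrate: currentBitrate };
}
```

The returned body would then go out through the publisher handle's regular message channel; the plugin pushes the new bitrate cap to the browser via REMB.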
Hi all,
We're developing Jangouts[1] (some kind of «Google Hangouts» clone) and we're relying on Janus. It seems to work quite ok but, sometimes, video/audio get frozen for everyone and, to be honest, I am not able to find the problem in Janus logs.
I've seen a lot of NACKs and packet retransmissions, so I'm not sure if it has something to do with bandwidth or something like that. I'm attaching the logs of the full session (compressed) just in case someone can spot the problem there.
Please, let me know if you want us to do more tests.
Great, and thanks for doing this, as it also provides a nice opportunity for stress testing! I think I'll be able to chime in at the beginning of next week (sorry, can't make it earlier than that). Of course I encourage all other members of the group to give it a try and help us all make Janus a better place ;-)

About the number of people exceeding 8-9, also take into account potential limitations on the client side, as the CPU/memory consumption there might get to a point where they can't handle it anymore. In such a case, the capture/encoding/transmission might also be affected, which could result in missing packets sent to Janus, Janus asking for retransmission of those lost packets, and the client not being able to cope (and viewers getting a bad experience as a result). So estimating the usage of resources on that side might help exclude some causes.
About the other bits, just some quick thoughts:

(1) the inability to join can depend on several factors... if it's always the same people, try to make sure they can indeed do WebRTC from a network perspective, e.g., that a TURN server is available. Debugging the ICE state on both server and client side might help.
(2) this might mean that the publisher stream was not successfully established for them, and as such their availability was not notified. This might be related to the excessive resource usage we discussed and/or to different issues: again, looking at the admin API for that specific handle might provide more info on the cause of the issue (e.g., ICE failure or something else).
(3) looks like an invalid SRTP context, maybe a result of a destroyed handle that was still being used; I do see a (ctx=0x0) in fact, maybe we should add a check for that;
(4) meaning that some messages were not received/relayed, or that they could not be established?
Thanks again for this experiment, hope to join you soon!
Can you try updating Janus? Pierce provided a patch that should fix the sometimes incorrect and heavy behaviour that Janus had with respect to NACKs and retransmissions, so this might help in your case.
Lorenzo,

These changes are great work.
On Wednesday, May 20, 2015 at 21:45:13 (UTC+2), Wilbert Jackson wrote:

Lorenzo, these changes are great work.
Yes!
Just for the record (Lorenzo already knows because he was there), it also dramatically improved our use case. Not a single crash of Janus and no weird behavior. Moreover, the server was able to handle several more streams than before.
On Thursday, May 21, 2015 at 7:55:15 (UTC+2), Ancor Gonzalez Sosa wrote: [...]
Just another detail about the mentioned test, for the record.

We observed that most clients were transmitting around 200 kbit/s, although the room was forced to 64 kbit/s. We thought it was related to the issues described by Wilbert in this thread (clients ignoring the room threshold after a while). The solution seemed to be to send a "configure" from the client side once in a while to set the bitrate back to 64k (or whatever is set in the room).

I did a preliminary implementation, and the problem seems to be a different one. Neither Firefox nor Chromium seems to ignore the threshold set server-side, so it looks like I don't actually need the client-side-generated extra call to "configure" (which, on the other hand, works nicely for changing the bitrate for any other purpose).

The real problem is that Firefox doesn't seem to allow setting the threshold below 192 kbit/s. Setting a room to 64k, 128k or 192k has the same effect: Chromium honors the limit, while Firefox limits the bitrate to 192k (no matter if the value is lower). With higher bitrates everything works as expected. Sending the "configure" client-side changes nothing: any bitrate lower than 192 is treated as 192 by Firefox, no matter where the REMB originates.
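In code, the behavior we observed boils down to something like this (an assumption based on our tests only, not on documented Firefox behavior; the constant and function names are made up):

```javascript
// REMB values below 192 kbit/s seem to be treated as 192 kbit/s by
// Firefox, while Chromium honors them as-is.

const FIREFOX_OBSERVED_FLOOR = 192000;   // in bps

function effectiveBitrate(requested, isFirefox) {
  if (!isFirefox) return requested;      // Chromium: the REMB value is honored
  return Math.max(requested, FIREFOX_OBSERVED_FLOOR);
}
```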
I hope that information is useful to somebody. Of course, if you know a way to enforce a smaller bitrate in Firefox, I'd be glad to hear it.