Janus server stops working randomly until restart

Shantanu Dhanuka

Dec 5, 2020, 7:56:52 AM
to meetecho-janus
Dear All

We have been developing our solution using the Janus videoroom plugin for the last six months. Everything is set up and the product is now in production. Customers have reported the following issue:

At random times during the day, the Janus server stops working in the middle of a meeting. This shows no correlation with the load on the server or with the internet connection in use. We tested today with a small load of 4 users and the problem still occurred. If we run
sudo systemctl restart Janus
the server starts working again without an issue, in exactly the same environment as before the restart.

To help with analyzing this problem, I am posting my Janus log here (it's a pretty small log): https://pastebin.com/TCyf9X9M

What I can make out from this log file is a number of entries like
ICE failed for component 1 in stream 1
I was advised earlier on this forum that fixing this would require setting up a TURN server. What I want to understand is: if this really is an ICE issue, why does a server restart fix the problem? Restarting the Janus service does not change the server or client IP addresses, so a TURN server being the solution does not make sense to me.

For a clearer understanding, you can even find a list of just the errors and warnings from the above log, over here: https://pastebin.com/qUjEx8k2

The error today was reported at about 15:56 as per this log file, after which at about 16:17 we restarted the server and it worked without a problem.

We have been struggling with this issue for a long time and would really appreciate any help that lets us put this random failure to rest for good. It has become such a menace in production that we are being forced to consider dropping Janus after six months of work on it. Please help.

Regards
Shantanu Dhanuka

Lorenzo Miniero

Dec 7, 2020, 7:02:22 AM
to meetecho-janus
Not sure what you expect us to do, since you don't provide any useful information. You don't even say what version of Janus this is (and it definitely isn't master, since the few lines I see mentioned don't match), how it's configured or used, which plugins, etc. Try master first, then provide more info, and then maybe someone will be able to help. That said, if you expect people to run to you just because you're running out of time, then you may need to realize this is the wrong place for that: this is a best effort community, not a commercial support forum.

Lorenzo

John Burns

Dec 8, 2020, 5:00:36 AM
to meetecho-janus
Hi Shantu,

Not sure if any of these suggestions help, but here are some thoughts:

1. Restarting a server like Janus obviously results in the release and reinitialization of resources. So, if this solves your problem, then your server is getting into an unhealthy state during these 4-5 minutes (for whatever reason).
2. If we assume that the videoroom plugin is pretty solid, given that it is used in production by many teams, then the fault is probably in your front-end code.
3. Examine what you are doing on the browser side. I can see in your pastebin logs that you are repeatedly trying to create a room and getting a "room exists" error. This leads me to think you have some bugs on the JavaScript side.
4. Server load (i.e., CPU utilization) is not necessarily a useful metric here. You should use the Janus admin.js to connect to the server and examine the state of your live sessions.
5. In particular, you could take one single user and, while monitoring the server with admin.js, watch the sessions, plugins, number of publishers, number of subscribers, etc. to see whether they match your understanding of your client application. My guess is that you will see some very surprising states, which may lead you to the front-end bugs causing this problem.
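
To illustrate point 3, here is a minimal sketch of one way to avoid firing "create" over and over for a room that is already there: ask the plugin whether the room exists first, and only build a create request if it does not. It is written in Python purely to show the message bodies; the same bodies would be sent from your JavaScript client. The "exists" and "create" request names and the "exists" field in the reply are as I understand them from the VideoRoom plugin documentation, so please verify them against the Janus version you run. The helper names (exists_request, create_request, next_request) are made up for this example.

```python
from typing import Any, Dict, Optional


def exists_request(room: int) -> Dict[str, Any]:
    """Body of a VideoRoom 'exists' request (assumed plugin API)."""
    return {"request": "exists", "room": room}


def create_request(room: int) -> Dict[str, Any]:
    """Body of a VideoRoom 'create' request for a fixed room id."""
    return {"request": "create", "room": room}


def next_request(room: int, exists_reply: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Decide what to send after the plugin answers the 'exists' request.

    Returns None when the room is already there (nothing to create),
    otherwise the 'create' body to send next.
    """
    if exists_reply.get("exists"):
        return None
    return create_request(room)
```

If every client that joins a meeting runs its own create call, you will see exactly the kind of repeated "room exists" errors that show up in your log, so it is worth checking where in your code the create is triggered.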

When developing your front-end code you should always use the admin interface to closely monitor the backend state. This is something I do all the time, in order to understand the consequences of my client design and implementation.
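
If you would rather script this than click through the admin page, the same information is available from the Admin API over HTTP. Below is a rough sketch, not a tested monitoring tool. It assumes the admin HTTP endpoint is enabled at http://localhost:7088/admin and still uses the stock admin secret; both are assumptions about your setup, so adjust them to your configuration. The list_sessions, list_handles and handle_info requests are Admin API commands as I know them; the function names (build_call, send, dump_state) are just mine.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict, Optional, Tuple

ADMIN_URL = "http://localhost:7088/admin"  # assumption: default admin HTTP port and path
ADMIN_SECRET = "janusoverlord"             # assumption: stock secret, change for your setup


def build_call(command: str,
               session_id: Optional[int] = None,
               handle_id: Optional[int] = None,
               base_url: str = ADMIN_URL,
               secret: str = ADMIN_SECRET) -> Tuple[str, Dict[str, Any]]:
    """Return (url, body) for an Admin API request.

    Over HTTP the session and handle are addressed through the path:
    /admin/<session_id>/<handle_id>.
    """
    url = base_url
    if session_id is not None:
        url += f"/{session_id}"
        if handle_id is not None:
            url += f"/{handle_id}"
    body = {
        "janus": command,
        "transaction": uuid.uuid4().hex,  # any unique string works
        "admin_secret": secret,
    }
    return url, body


def send(url: str, body: Dict[str, Any], timeout: float = 5.0) -> Dict[str, Any]:
    """POST the JSON body and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dump_state() -> None:
    """Print every session, its handles, and the plugin each handle is attached to."""
    sessions = send(*build_call("list_sessions")).get("sessions", [])
    for sid in sessions:
        handles = send(*build_call("list_handles", sid)).get("handles", [])
        for hid in handles:
            info = send(*build_call("handle_info", sid, hid)).get("info", {})
            print(sid, hid, info.get("plugin"))
```

Calling dump_state() with a single user connected, before and after each action in your client, should make it obvious if sessions or handles pile up when you expect them to go away.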

John