Hi Fred!
First, allow me to congratulate you and your team on your great work on the BBB project.
We have installed BBB on a self-hosted bare-metal server with these specs:
BigBlueButton Server 2.2.0-beta-20 (1416)
Kernel version: 4.4.0-161-generic
Distribution: Ubuntu 16.04.6 LTS (64-bit)
Memory: 32903 MB
CPU cores: 24 (Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz)
Network bandwidth: 1 Gbps
On this instance of BBB we run multiple concurrent meetings for educational purposes (university classes). For background, so far:
- we have conducted a BBB meeting with 110 users (109 students and 1 moderator; only 1 webcam was streaming)
- we have conducted 10 concurrent meetings with a total of around 250 participants (60 participants in one meeting, 10 in another, etc.)
Yesterday, while about 5-6 meetings were running (no more than 150 participants in total), from a certain point onward anyone who tried to share a webcam got the error "error 2200 media server failed to process request".
Running top on the server showed the kurento-media-server process consuming all the CPU (the load average went up to 717!). At that point I expected the server to crash. Instead, all the meetings carried on normally (without audio problems), except for the video feature.
In two of the meetings the users shared about 10 webcams per meeting. In each of the other meetings only one webcam was shared.
Because I didn't want to interrupt all these meetings, I didn't run "bbb-conf --restart" (in all other respects the meetings were running normally). At some point the kurento process stopped (without human intervention) and the CPU load average came back to normal. Running systemctl status kurento-media-server showed it running fine (I didn't expect that). I ran systemctl restart kurento-media-server while the meetings were running, but it didn't help.
Finally, after all meetings ended, I ran "bbb-conf --restart" and the problem was solved (I didn't have to reboot the server).
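To have numbers at hand if this happens again, something like the following could be left running next to the meetings. It is only a rough sketch, not a BBB tool: it reads the standard Linux /proc files to log the load average together with the CPU share of the kurento-media-server process. The function names, the 5-second interval and the output format are my own choices for illustration.

```python
import os
import time

# Process name as mentioned above; /proc/<pid>/comm truncates names to 15 characters.
PROCESS_NAME = "kurento-media-server"


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat content."""
    # The command name sits in parentheses and may contain spaces,
    # so only the part after the last ')' is split into fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(prev_ticks, cur_ticks, interval_s, ticks_per_s):
    """CPU usage over the interval; 100.0 means one fully busy core."""
    return 100.0 * (cur_ticks - prev_ticks) / (ticks_per_s * interval_s)


def find_pids(name, proc_root="/proc"):
    """Return the PIDs whose (truncated) command name matches `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name[:15]:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


def total_ticks(pids, proc_root="/proc"):
    """Sum the CPU ticks consumed so far by all given PIDs."""
    total = 0
    for pid in pids:
        try:
            with open(os.path.join(proc_root, str(pid), "stat")) as f:
                total += parse_cpu_ticks(f.read())
        except OSError:
            continue
    return total


def log_samples(count, interval_s=5.0):
    """Print `count` timestamped samples, one every `interval_s` seconds."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    prev = total_ticks(find_pids(PROCESS_NAME))
    for _ in range(count):
        time.sleep(interval_s)
        cur = total_ticks(find_pids(PROCESS_NAME))
        with open("/proc/loadavg") as f:
            load1, load5, load15 = parse_loadavg(f.read())
        print("%s load=%.2f/%.2f/%.2f kurento_cpu=%.1f%%" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), load1, load5, load15,
            cpu_percent(prev, cur, interval_s, ticks_per_s)))
        prev = cur
```

Calling log_samples(720) would cover about an hour at the default interval, and the output can be redirected to a file. If the process restarts between two samples, the CPU figure for that sample will be off (possibly negative), which at least marks the moment of the restart.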
I reviewed all the logs but couldn't find what caused the problem. Searching the internet, I found that the same error has also been appearing on your demo server over the last month.
- Could you please give me any hints so I can be prepared and avoid it next time? Perhaps point me to where to look in the logs to dig out the real cause?
- Is there anything (other than "bbb-conf --restart") that a sysadmin can do to fix it live (without breaking the running meetings) if it happens again?
- I think the specs of this BBB instance should be able to handle a heavy load. I didn't expect a problem to show up that day. What are your thoughts on that?
I am planning to run a meeting with 500 users and 1 moderator (one webcam, the moderator's only), with no other concurrent meetings running at the same time. Doing the math on CPU and bandwidth (1 Gbps line), I assume it has a good chance of going well. I would be very happy to record any data (system or app data) that could help your development and troubleshooting efforts, and I would appreciate any advice based on your experience on how to avoid disruptions or quality degradation.
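The bandwidth side of that math can be sketched roughly as below. The bitrates are placeholders rather than measured BBB values (300 kbit/s per webcam stream and 40 kbit/s per audio stream are assumptions), and the model is simplified: every webcam is forwarded to every other participant, each participant receives one mixed audio stream, and screen sharing and presentation traffic are ignored.

```python
# Placeholder bitrates, NOT measured BBB values; adjust to the real settings.
ASSUMED_VIDEO_KBPS = 300  # per forwarded webcam stream
ASSUMED_AUDIO_KBPS = 40   # per participant audio stream


def outbound_video_streams(users, webcams):
    """Each webcam is sent to every participant except the one sharing it."""
    return webcams * (users - 1)


def estimate_outbound_mbps(users, webcams, video_kbps, audio_kbps):
    """Rough server outbound bandwidth in Mbit/s for a single meeting."""
    video = outbound_video_streams(users, webcams) * video_kbps
    audio = users * audio_kbps
    return (video + audio) / 1000.0


# Planned meeting: 500 users + 1 moderator, only the moderator's webcam.
planned = estimate_outbound_mbps(501, 1, ASSUMED_VIDEO_KBPS, ASSUMED_AUDIO_KBPS)
print("planned meeting: %d video streams, about %.1f Mbit/s outbound"
      % (outbound_video_streams(501, 1), planned))

# For comparison: a 60-user meeting with 10 webcams shared.
busy = estimate_outbound_mbps(60, 10, ASSUMED_VIDEO_KBPS, ASSUMED_AUDIO_KBPS)
print("60 users, 10 webcams: %d video streams, about %.1f Mbit/s outbound"
      % (outbound_video_streams(60, 10), busy))
```

Under these assumptions a single 60-user meeting with 10 webcams already produces more outgoing video streams than the planned 501-user meeting with one webcam, so the number of shared webcams looks like the more important factor for the media server than the raw participant count.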
Thanks in advance.
Regards,
Ioannis