We are using Kurento as an SFU. On a server with 40 vCPUs (Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz), it tends to max out at 100% CPU when there are around 500-700 user sessions in a room.
Looking at the Jitsi Videobridge performance evaluation (https://jitsi.org/jitsi-videobridge-performance-evaluation/), it claims that it can handle "1056 Streams; Bitrate mean: 550.4Mbps; CPU usage mean: 20.3%", which means it can handle 1056 streams with just 2 CPUs (their test machine is equipped with a quad-core Intel® Xeon® E5-1620 v2 @ 3.70GHz CPU).
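To make the gap concrete, the quoted figures can be normalized per busy CPU. This is a rough sketch under stated assumptions: Kurento "sessions" and Jitsi "streams" are treated as comparable units (they may not be), and Jitsi's 20.3% is assumed to be a share of 8 logical CPUs (a quad-core E5-1620 v2 with Hyper-Threading). The function name units_per_busy_cpu is hypothetical.

```python
# Rough per-CPU throughput from the figures quoted in this post.
# Assumptions (not from either vendor): Kurento "sessions" and Jitsi "streams"
# are treated as comparable units, and Jitsi's 20.3% CPU figure is taken as a
# share of 8 logical CPUs (quad-core E5-1620 v2 with Hyper-Threading).


def units_per_busy_cpu(units: float, cpus: int, utilisation: float) -> float:
    """Units served per fully busy CPU, given mean utilisation in (0, 1]."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return units / (cpus * utilisation)


# Kurento: 500-700 sessions saturating 40 vCPUs (E5-2630 v4 @ 2.20GHz).
kurento_low = units_per_busy_cpu(500, 40, 1.0)
kurento_high = units_per_busy_cpu(700, 40, 1.0)

# Jitsi Videobridge: 1056 streams at 20.3% mean CPU (E5-1620 v2 @ 3.70GHz).
jitsi_busy_cpus = 8 * 0.203  # about 1.6 logical CPUs, i.e. the "2 CPUs" above
jitsi = units_per_busy_cpu(1056, 8, 0.203)

if __name__ == "__main__":
    print(f"Kurento: {kurento_low:.1f}-{kurento_high:.1f} sessions per busy vCPU")
    print(f"Jitsi:   {jitsi:.1f} streams per busy logical CPU "
          f"({jitsi_busy_cpus:.2f} CPUs busy)")
    print(f"Gap:     roughly {jitsi / kurento_high:.0f}x-{jitsi / kurento_low:.0f}x")
```

The two machines run at different clock speeds (2.20GHz vs 3.70GHz) and the test workloads are not identical, so the resulting ratio is only an order-of-magnitude indicator, not a benchmark.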
So why is Kurento so CPU-intensive even when we use it as an SFU? Is there anything special we need to do to make it as scalable as Jitsi?
In our last 3 years of running Kurento in production, it has been great for one-on-one sessions and small group conferences of up to 5-10 participants. When we use it to deliver webinars with over 500 attendees, it usually crashes or maxes out the CPU cores. We usually reserve high-end servers with a minimum of 40 cores, but with no luck.
Kurento clustering is another puzzle. Since Twilio's acquisition of Kurento, there has been no official ElasticRTC support and no recent builds available on AWS. I am not sure whether anyone has successfully clustered Kurento using Hazelcast (or Redis?) and optimized it into an enterprise-grade service for end users.
AWS is not cheap. Even if we go with C5 compute-optimized instances, we need to provision servers to meet capacity in advance. Let's say you are a SaaS provider with 2 x c5.2xlarge instances, which brings the total to 16 vCPUs. If each vCPU can handle up to 20 sessions, that is 320 sessions in total. As a SaaS provider, your traffic is unpredictable: either you spin up multiple instances in advance to cover a 1,000-concurrent-session scenario, or you rely on auto scaling to do that. With auto scaling, there is a warm-up period before a new instance comes online, and until then new users joining the rooms may have a bad experience.
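The capacity arithmetic in the previous paragraph can be written as a small planning helper. This is a minimal sketch: instances_needed and its headroom parameter are hypothetical, the 20 sessions per vCPU is the post's rough estimate rather than a measured Kurento limit, and the only instance fact used is that a c5.2xlarge has 8 vCPUs.

```python
import math

# c5.2xlarge provides 8 vCPUs (2 instances = 16 vCPUs, as in the post).
VCPUS_PER_INSTANCE = 8
# The post's rough estimate of sessions per vCPU; not a measured Kurento limit.
SESSIONS_PER_VCPU = 20


def instances_needed(concurrent_sessions: int,
                     sessions_per_vcpu: int = SESSIONS_PER_VCPU,
                     vcpus_per_instance: int = VCPUS_PER_INSTANCE,
                     headroom: float = 0.0) -> int:
    """Instances to provision up front for a peak load.

    headroom is a spare-capacity fraction (0.2 = 20% extra) that can absorb
    new users while auto scaling is still warming up additional instances.
    """
    per_instance = sessions_per_vcpu * vcpus_per_instance
    return math.ceil(concurrent_sessions * (1.0 + headroom) / per_instance)


if __name__ == "__main__":
    print(instances_needed(320))                          # 2
    print(instances_needed(1000))                         # 7
    print(instances_needed(1000, headroom=0.2))           # 8
    print(instances_needed(1000, sessions_per_vcpu=40))   # 4
```

Under these assumptions, a 1,000-session peak needs 7 c5.2xlarge instances at 20 sessions per vCPU, but only 4 if the media server could sustain 40 sessions per vCPU, which is why per-core efficiency dominates the bill.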
Ideally, the number of sessions per core would be higher, so that we save on AWS costs in the long run and can reserve capacity in advance for a predictable workload. However, Kurento clustering is not readily available (it may be possible by reaching out to Kurento's professional services), and Kurento's performance as an SFU is questionable when we compare its numbers with Jitsi Videobridge's.
I hope someone has come across these concerns before. Any feedback or thoughts on this would be highly appreciated.