Hello,

I am benchmarking Janus, specifically the streaming plugin with inbound RTP traffic. The goal is to estimate the maximum number of clients per Janus instance where each client receives a 5 Mbit/s video/audio stream in "real time" (camera-to-screen in less than 400 ms over a transatlantic connection).
In general it works well until the Janus process reaches 100% CPU. In my test Janus runs with the following configuration:
- 2 x Intel Xeon E5-2620 v3 (24 cores)
- 32 GB RAM
- 10 Gbit/s up/down link
- Debian Wheezy

The clients are AWS instances implementing a "WebRTC stress load". You can find more details in our blog post: http://www.cargomedia.ch/2015/10/15/headless-chrome-on-ec2.html
With the current setup I can handle a 5 Mbit/s publisher stream and re-stream it to up to 25 WebRTC clients at 5 Mbit/s each.

Observations:
- Janus uses 100% of a single core. The remaining 23 cores are basically idle.
- It uses 130 Mbit/s of uplink (our maximum is 10 Gbit/s).
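As a sanity check on the uplink figure, here is a minimal sketch of the expected payload bandwidth, assuming the server sends one full copy of the stream to every subscriber. The function names are hypothetical and only used for illustration; the gap between the computed payload and the observed 130 Mbit/s would be protocol overhead.

```python
def expected_egress_mbit(subscribers, stream_mbit):
    """Payload bandwidth in Mbit/s: one copy of the stream per subscriber."""
    return subscribers * stream_mbit


def link_utilisation_percent(used_mbit, link_mbit):
    """Share of the link capacity in use, as a percentage."""
    return used_mbit / link_mbit * 100


payload = expected_egress_mbit(25, 5)  # 25 subscribers at 5 Mbit/s each
print(payload)                         # 125
# Assumption: the 10 Gbit/s link is expressed here as 10_000 Mbit/s.
print(round(link_utilisation_percent(payload, 10_000), 2))
```

So the link is nowhere near saturated, which fits the observation that a single CPU core is the limit and not the network.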
In general, with 25+ clients the delivery delay rises from 400 ms to 3+ seconds.
Is there any chance to optimise that? Could you please advise me on what I am doing wrong?
Hi Lorenzo,

Thanks for all the information.

For the next benchmark I am going to scale vertically on a single bare-metal HPC instance by running 2 scenarios:
- multiple Janus instances
- using Docker

For both cases I expect to observe better CPU utilization. I will share my results as soon as possible.

Answering your questions:
1. It looks like the Janus threads are distributed across all cores, but they consume very little CPU. Only one thread (the main one, I guess) uses maximum CPU (when many clients are connected).
2. I have tried to scale up to 100 clients with 5 Mbit/s each. Up to 25 clients there is a ~400 ms delay. Up to 50 clients it works with a delivery delay of 2-3 s. Up to 100 clients there are missing frames and the delay is ~7 s.
3. I have tried the same with 2 publishers streaming into 2 separate mountpoints. The result is the same, which means the limitation seems to be in distributing to the WebRTC endpoints (I could achieve good quality for up to 25 clients for mp1+mp2).
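For anyone who wants to reproduce point 1 (one busy thread, the rest nearly idle), here is a rough sketch that reads per-thread CPU time from `/proc` on Linux. It is not how I measured it originally, just one possible way to check it. The function names and the `pid` parameter are hypothetical; the field positions follow the documented `/proc/<pid>/stat` layout, where `utime` and `stime` are fields 14 and 15.

```python
import os


def parse_thread_stat(stat_line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    name = stat_line[open_paren + 1:close_paren]
    # Fields after the name start at field 3 (state), so field n is at index n - 3.
    fields = stat_line[close_paren + 1:].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return name, utime + stime


def thread_cpu_ticks(pid):
    """Map thread id -> (name, user + system CPU ticks) for every thread of pid."""
    task_dir = f"/proc/{pid}/task"
    ticks = {}
    for tid in os.listdir(task_dir):
        with open(os.path.join(task_dir, tid, "stat")) as stat_file:
            ticks[int(tid)] = parse_thread_stat(stat_file.read())
    return ticks
```

Calling `thread_cpu_ticks` twice a few seconds apart with the Janus process id and comparing the tick counts per thread should show whether a single thread accounts for almost all of the CPU time.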
Architecture
Server description
2.9 GHz Xeon E5-2690, 32 GB RAM, 10 Gbps Ethernet
Debian Wheezy
Publisher description
Live inbound RTP from GStreamer
2 Mbit/s and 5 Mbit/s incoming streams
Subscriber description
Headless Chrome instances on EC2 http://www.cargomedia.ch/2015/10/15/headless-chrome-on-ec2.html
Up to 75 WebRTC subscribers per AWS instance
Benchmark phases
Single mountpoint per single Janus instance
Two mountpoints per single Janus instance
Two Janus instances with one mountpoint each.
Results
Spreadsheet with results: https://docs.google.com/a/cargomedia.ch/spreadsheets/d/1ViHanQXDHRkLsToYG3SOH2W-7VqZGWLu7nYselx9fWs/edit
For a single or double mountpoint per Janus instance the CPU seems to behave identically. Only one process is active.
The throughput limit was reached at 310 Mbit/s per Janus instance (single and double mountpoint). Above that the video starts stuttering.
With 2 separate Janus instances the performance and throughput doubled. The limit of 310 Mbit/s still exists per single Janus instance, but with 2 Janus instances we could stream up to 600 Mbit/s. Adding further instances seems to work reliably.
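The per-instance limit can be turned into a rough capacity estimate. The sketch below assumes that the 310 Mbit/s ceiling applies to outbound payload only and that throughput scales linearly with the number of instances, which we have only confirmed for two instances so far. The function names are hypothetical.

```python
import math

INSTANCE_LIMIT_MBIT = 310  # per-instance ceiling measured in this benchmark


def max_subscribers(stream_mbit, limit_mbit=INSTANCE_LIMIT_MBIT):
    """Subscribers one Janus instance can serve before hitting the ceiling."""
    return math.floor(limit_mbit / stream_mbit)


def instances_needed(subscribers, stream_mbit, limit_mbit=INSTANCE_LIMIT_MBIT):
    """Janus instances required, assuming linear scaling across instances."""
    return math.ceil(subscribers / max_subscribers(stream_mbit, limit_mbit))


print(max_subscribers(5))        # 62 subscribers at 5 Mbit/s
print(max_subscribers(2))        # 155 subscribers at 2 Mbit/s
print(instances_needed(100, 5))  # 2 instances for 100 subscribers at 5 Mbit/s
```

These numbers are upper bounds from the throughput ceiling alone; latency started to degrade earlier in the first test, so a real deployment would need some headroom.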
Sorry, the spreadsheet should be publicly available now.

My observations:
- Agreed, the throughput limitation is strongly related to the CPU. The second test was done with an upgraded CPU (from Intel Xeon E5-2620 to 2.9 GHz Xeon E5-2690), and I confirm there is a significant increase in performance.
- Also, the threads (peers) are distributed over all cores (with less than 2-3% utilization each), but there is always a single thread (main) which consumes 100% of a core (at higher outbound traffic) for a single or double mountpoint.
- When the mountpoint main thread (single/double test) reaches maximum CPU, it is easy to observe that most of the WebRTC/peer threads are at 0% CPU. It looks like data is not pushed there anymore?
Some compiler optimization would help for sure! However, better distribution of threads per mountpoint would increase throughput significantly, IMO.