Django Channels Load Testing Results


Robert Roskam

Sep 11, 2016, 8:46:52 AM
to Django developers (Contributions to Django itself)
Hello All,

The following is an initial report of Django Channels performance. While this is being shared in other media channels at the moment, I fully expect questions and requests for clarification from this group in particular, and I'll be happy to add anything to the README that helps describe the results.



Robert Roskam

Chris Foresman

Sep 12, 2016, 2:38:59 PM
to Django developers (Contributions to Django itself)
Is this one worker each? I also don't really understand the implications of the results. There's no context to explain the numbers, nor any indication of whether one result is better than another.

Robert Roskam

Sep 12, 2016, 9:41:05 PM
to Django developers (Contributions to Django itself)
Hey Chris,

The goal of these tests is to see how Channels performs with normal HTTP traffic under heavy load, measured against a control. In order to compare accurately, I tried to eliminate variance as much as possible.

So yes, there was one worker each for both the Redis and IPC setups. I provided the supervisor configs, as I figured those would be helpful in describing exactly what commands were run on each system.
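
For anyone reading along, the two channel layer setups look roughly like this in settings.py (a sketch assuming the asgi_redis and asgi_ipc backend packages; the exact hosts, prefixes, and module paths in the test configs may differ):

    # settings.py -- illustrative only; the exact test values may differ.
    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "asgi_redis.RedisChannelLayer",
            "CONFIG": {
                "hosts": [("localhost", 6379)],  # Redis running locally
            },
            "ROUTING": "myproject.routing.channel_routing",  # hypothetical path
        },
    }

    # IPC-backed layer (asgi_ipc package), swapped in for the IPC runs:
    # CHANNEL_LAYERS = {
    #     "default": {
    #         "BACKEND": "asgi_ipc.IPCChannelLayer",
    #         "CONFIG": {"prefix": "mysite"},
    #         "ROUTING": "myproject.routing.channel_routing",
    #     },
    # }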

Does that help bring some context? Or would you like for me to elaborate further on some point?

Thanks,
Robert

Erik Cederstrand

Sep 13, 2016, 3:28:36 AM
to django-d...@googlegroups.com

> Den 13. sep. 2016 kl. 03.41 skrev Robert Roskam <raider...@gmail.com>:
>
> Hey Chris,
>
> The goal of these tests is to see how Channels performs with normal HTTP traffic under heavy load, measured against a control. In order to compare accurately, I tried to eliminate variance as much as possible.
>
> So yes, there was one worker each for both the Redis and IPC setups. I provided the supervisor configs, as I figured those would be helpful in describing exactly what commands were run on each system.
>
> Does that help bring some context? Or would you like for me to elaborate further on some point?

First of all, thanks for taking the time to actually do the measurements! It's insightful and very much appreciated.

At least for me, it took a long time to find out what the graphs were actually showing. In the first graph, maybe the title could be more descriptive, e.g. "Request latency at 300 rps". Also, 300K requests in 10 minutes is 500 rps, but the text says 500 rps. Which is it?

The unit in the second graph is requests per minute, which is inconsistent since the first graph is requests per second. This also makes comparison difficult. Also, it doesn't actually show the requests per minute value unless you break out a calculator, since the X axis stops at 50 seconds.

Also, the lines in the second graph are suspiciously linear - how many measurements were made, at which points in time, what is the jitter? Just show the actual measurements as dots, then I can live with the straight line. That would also show any effects of the autothrottle algorithm.
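
Something along these lines in matplotlib would show it (made-up sample data standing in for the real measurements):

    import numpy as np
    import matplotlib.pyplot as plt

    # Made-up data in place of the real samples: one cumulative request
    # count every 5 seconds, with some jitter added.
    t = np.arange(0, 60, 5)
    counts = 1300 * t + np.random.normal(0, 800, t.size)

    # Fit the straight trend line through the samples.
    slope, intercept = np.polyfit(t, counts, 1)

    plt.plot(t, counts, "o", label="measured samples")  # the actual dots
    plt.plot(t, slope * t + intercept, "-", label="linear fit")
    plt.xlabel("time (s)")
    plt.ylabel("cumulative requests")
    plt.legend()
    plt.show()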

Finally, I have a hard time understanding the latency values - the config shows 1 worker, so I'm assuming serial output. But for Gunicorn, the graph shows ~80.000 rpm, which corresponds to a latency of 0,75ms, while the text says 6ms. Likewise, according to the graph, Redis has a latency of 1,7ms and IPC 6ms, which does not align with the text. Is this the effect of async I/O behind the scenes, or are there multiple threads within the worker?

Erik

Michael Manfre

Sep 13, 2016, 7:13:04 AM
to Django developers (Contributions to Django itself)
Hi Robert,

Thanks for doing this load testing. More context would be useful to help us outside observers understand the potentially different variables. Is Redis running locally, or are you using ElastiCache?

Regards,
Michael Manfre


Erik Cederstrand

Sep 13, 2016, 9:50:18 AM
to django-d...@googlegroups.com

> Den 13. sep. 2016 kl. 09.28 skrev Erik Cederstrand <erik+...@cederstrand.dk>:
>
> First of all, thanks for taking the time to actually do the measurements! It's insightful and very much appreciated.
>
> [...]300K requests in 10 minutes is 500 rps, but the text says 500 rps. Which is it?
                                                                 ^^^^^^^
                                                                 300 rps

Jeez, not even the email whining about inconsistencies can get the numbers right :-)

Erik

Chris Foresman

Sep 14, 2016, 10:21:27 AM
to Django developers (Contributions to Django itself)
Yes. Honestly, just explain what these results mean in words, because I cannot turn these graphs into anything meaningful on my own.

Robert Roskam

Sep 25, 2016, 8:22:15 PM
to Django developers (Contributions to Django itself)
Hey Michael,

Yes, I ran the tests with Redis running locally. I thought this would be the best choice for a 1-to-1 comparison. In my opinion, running the tests against an outside server or service would invalidate the results.

Robert Roskam

Sep 25, 2016, 9:03:09 PM
to Django developers (Contributions to Django itself)

Hey Erik,

You are absolutely correct! It was 500 rps. It's a typo. Very sorry! I'll fix it!

To your other points,


> At least for me, it took a long time to find out what the graphs were actually showing. In the first graph, maybe the title could be more descriptive, e.g. "Request latency at 300 rps". 

That sounds better! It's an accurate description. I appreciate the feedback.


> The unit in the second graph is requests per minute, which is inconsistent since the first graph is requests per second. This also makes comparison difficult. Also, it doesn't actually show the requests per minute value unless you break out a calculator, since the X axis stops at 50 seconds. 

So is your request basically for me to give the slope on each one so that you can interpolate the results from one graph to the other?


> Also, the lines in the second graph are suspiciously linear - how many measurements were made, at which points in time, what is the jitter? Just show the actual measurements as dots, then I can live with the straight line. That would also show any effects of the autothrottle algorithm. 

I'll have to regather that data. I hadn't logged every single response; I was aggregating the results every 5 seconds. I can resample this one.
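
Roughly what I have in mind, assuming the raw response log ends up in a pandas DataFrame (the column name and timestamps here are made up):

    import pandas as pd

    # Made-up raw log: one row per response, indexed by completion time.
    log = pd.DataFrame(
        {"latency_ms": [5.2, 6.1, 4.8, 5.9]},
        index=pd.to_datetime([
            "2016-09-25 20:00:01", "2016-09-25 20:00:03",
            "2016-09-25 20:00:06", "2016-09-25 20:00:09",
        ]),
    )

    # Re-aggregate into 5-second buckets: responses and mean latency per bucket.
    throughput = log.resample("5S").size()
    mean_latency = log["latency_ms"].resample("5S").mean()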



> Finally, I have a hard time understanding the latency values - the config shows 1 worker, so I'm assuming serial output. But for Gunicorn, the graph shows ~80.000 rpm, which corresponds to a latency of 0,75ms, while the text says 6ms. Likewise, according to the graph, Redis has a latency of 1,7ms and IPC 6ms, which does not align with the text. Is this the effect of async I/O behind the scenes, or are there multiple threads within the worker?

I'm hard pressed to understand what you're trying to say. I'm not sure where you got the 80 (or is it 80 thousand?) rps from. If you're trying to sum the values of gunicorn through time, I guess that exposes something I either misrepresented or stated incorrectly. The values presented are cumulative. Perhaps that's not the correct way to present this.

Try the below instead:

[attached: updated throughput graph]

Robert Roskam

Robert Roskam

Sep 25, 2016, 9:23:45 PM
to Django developers (Contributions to Django itself)
Hey Chris,

Sure thing! I'm going to add a little color to this; probably a little more than required.

I have gunicorn for comparison on both graphs because Channels supports HTTP requests, so we wanted to see how it would do against a serious production option. I could have equally used uwsgi; I chose gunicorn out of convenience. It serves as a control for the Redis channels setup.

The main point of comparison is to say: yeah, Daphne has an order of magnitude higher latency than gunicorn, and as a consequence its throughput over the same period is lower. This really shouldn't be surprising. Channels processes an HTTP request, stuffs it into a Redis queue, has a worker pull it out and process it, and then sends a response back through the queue. That has some innate overhead.
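
To make that round trip concrete, a minimal Channels 1.x-style HTTP consumer looks something like this (a sketch, not the actual test code; presumably the tests used the default Django view consumer, which does the equivalent):

    # routing.py -- a sketch of the round trip, not the code under test.
    from channels.handler import AsgiHandler
    from channels.routing import route
    from django.http import HttpResponse

    def http_consumer(message):
        # A worker pulls this message off the channel layer (e.g. Redis),
        # builds a normal Django response, and sends the encoded chunks
        # back to Daphne via the reply channel.
        response = HttpResponse("Hello world!")
        for chunk in AsgiHandler.encode_response(response):
            message.reply_channel.send(chunk)

    channel_routing = [
        route("http.request", http_consumer),
    ]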

You'll note I didn't include IPC in the latency comparison. That's because it's so bad it would make the graph unreadable; you can get a sense of that from its throughput. So don't use IPC on serious production machines. Use it for a dev environment when you don't want a complex setup, or use it with nginx splitting traffic so that only WebSockets go through Channels, if you don't want to run Redis for some reason.



Robert Roskam

Chris Foresman

Sep 26, 2016, 9:59:24 AM
to Django developers (Contributions to Django itself)
Robert,

Thanks! This really does clear things up. The results were a little surprising at first blush, since I believe part of the idea behind Channels is to serve more requests concurrently than a single-threaded approach typically allows. This is why I don't think this benchmark alone is very useful: we already knew serving requests with a single worker would be slower, given the overhead you described. So what does this benchmark get us? Is it merely to characterize the performance difference in the pathological case? I think ramping up the number of workers on a single machine would be an interesting next step, no?

Anyway, thanks for taking the time to do this work and help us understand the results.

ludovic coues

Sep 26, 2016, 3:25:04 PM
to django-d...@googlegroups.com
What you call a pathological case is a small website, running on something like a cheap VPS.



--

Cordialement, Coues Ludovic
+336 148 743 42

Chris Foresman

Sep 26, 2016, 4:30:42 PM
to Django developers (Contributions to Django itself)
Why would you be running a small website in ASGI mode with a single worker? My suspicion is that someone using Django in ASGI mode has a specific reason to do so. Otherwise, why not run it in WSGI mode?

Andrew Godwin

Sep 26, 2016, 4:45:15 PM
to Django developers (Contributions to Django itself)
You might want to run a small site with WebSockets - there are a number of reasons to use ASGI mode, and it's important we make it scale down as well as up.

Andrew


ludovic coues

Sep 26, 2016, 4:53:29 PM
to django-d...@googlegroups.com
For example, a student trying to build an interactive browser game.
From what I understand, ASGI's main objective is to be the standard
for WebSockets with Django.

In my opinion, the tested case is not pathological. It is the default
one: Django configured just barely enough to have things working.

I agree that the benchmark only shows one face of the truth. Maybe ASGI
scales way better than WSGI. Maybe ASGI requires only a fraction of the
CPU or memory required by WSGI. I don't know.

But the use case isn't pathological. If the defaults are the worst
configuration possible, something is wrong.

Erik Cederstrand

Sep 27, 2016, 8:22:34 AM
to django-d...@googlegroups.com
Hi Robert,

> Den 26. sep. 2016 kl. 03.03 skrev Robert Roskam <raider...@gmail.com>:
>
> > The unit in the second graph is requests per minute, which is inconsistent since the first graph is requests per second. This also makes comparison difficult. Also, it doesn't actually show the requests per minute value unless you break out a calculator, since the X axis stops at 50 seconds.
>
> So is your request basically for me to give the slope on each one so that you can interpolate the results from one graph to the other?

One way to compare results from the two tests would be to find out how far from max throughput on graph #2 you measured the latency from graph #1. It's difficult for me to add a 300 rps line in graph #2 because the 1-second mark on the X-axis is impossible to read, and the 1-minute mark is off the axis. Does that make sense?


> > Also, the lines in the second graph are suspiciously linear - how many measurements were made, at which points in time, what is the jitter? Just show the actual measurements as dots, then I can live with the straight line. That would also show any effects of the autothrottle algorithm.
>
> I'll have to regather that data. I had not logged every single response. I was aggregating the results every 5 seconds. I can resample this one.

It's not a big deal, it's just that the straight lines are hiding a lot of possibly interesting information in the distribution of the sample data - is it really linear, what's the variability, does the server slow to a crawl after 15 seconds, etc. Thanks for the attached graph, which clears a lot of this up.


> > Finally, I have a hard time understanding the latency values - the config shows 1 worker, so I'm assuming serial output. But for Gunicorn, the graph shows ~80.000 rpm, which corresponds to a latency of 0,75ms, while the text says 6ms. Likewise, according to the graph, Redis has a latency of 1,7ms and IPC 6ms, which does not align with the text. Is this the effect of async I/O behind the scenes, or are there multiple threads within the worker?
>
> I'm hard pressed to understand what you're trying to say. I'm not sure where you got the 80 (or is it 80 thousand?) rps from. If you're trying to sum the values of gunicorn through time, I guess that exposes something I either misrepresented or stated incorrectly. The values presented are cumulative. Perhaps that's not the correct way to present this.


Sorry, I'm in a locale where we use '.' as thousands separator and ',' as decimal separator. I'll switch to American notation now :-)

I read the blue line for Gunicorn in graph #2 as having returned, in total, 80,000 requests after a duration of 60 seconds (just over the 75,000 mark after ca. 55 seconds). Is that an incorrect interpretation of the graph? If not, then 60 seconds / 80,000 requests * 1000 = 0.75 ms per request, if the requests are processed serially.
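
Or as a quick sanity check in Python (the 80,000 figure is just my reading of the graph):

    # Implied per-request latency if a single worker served requests serially.
    total_requests = 80000  # read off the Gunicorn line in graph #2
    duration_s = 60
    print(1000.0 * duration_s / total_requests)  # -> 0.75 (ms per request)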

Erik

Chris Foresman

Sep 27, 2016, 9:34:48 AM
to Django developers (Contributions to Django itself)
OK, but now we're talking about something that, as far as I can tell, you can't even do with gunicorn. I agree WebSockets are a great reason to consider ASGI mode, but this still feels like an apples-to-oranges comparison. Or is the goal to somehow get Daphne + Django Channels to work as fast as gunicorn + WSGI mode? Is that a reasonable way to think of it?