Macro-benchmark with Django, Flask and AsyncIO (aiohttp.web+API-Hour)


Ludovic Gasc

unread,
Feb 25, 2015, 5:55:24 AM2/25/15
to python...@googlegroups.com
Hi guys,

I'm taking the liberty of sending this e-mail because AsyncIO benchmarks have been discussed on this mailing list.

I've published an improved version of my benchmarks:

Moreover, to reduce the risk of starting a benchmark war in the Python community, this post should help:

Don't hesitate to contact me if you find an error; I'd be really interested.

Regards.

INADA Naoki

unread,
Feb 25, 2015, 10:20:51 AM2/25/15
to Ludovic Gasc, python...@googlegroups.com
It's not a benchmark of sync vs. async. It's a benchmark of keep-alive vs. non-keep-alive.

gunicorn's default (sync) worker is designed as a "backend" server: it
expects a reverse proxy like nginx in front of it.

Since it doesn't support keep-alive, you should use a unix domain socket
when serving a high req/sec rate.
Otherwise, "TIME_WAIT" connections fill all the ephemeral ports.
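A back-of-the-envelope sketch of why non-keep-alive traffic hits this wall. The numbers below are Linux defaults, not measured values:

```shell
# Each non-keep-alive request burns one ephemeral port on the client side,
# which then sits in TIME_WAIT (60s by default on Linux) before reuse.
ports=$((60999 - 32768 + 1))   # default net.ipv4.ip_local_port_range
tw=60                          # default TIME_WAIT duration in seconds
echo "$((ports / tw)) req/s ceiling per client/server address pair"
```

Pushing thousands of req/s with wrk exhausts this budget in a few seconds, which matches the TIME_WAIT symptom described here; unix domain sockets sidestep it because they don't consume TCP ports.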

gunicorn supports several worker engines. Tornado can be used as a *sync*
(WSGI) engine with keep-alive support.
Meinheld is another option: a very fast server written in C.
--
INADA Naoki <songof...@gmail.com>

Ludovic Gasc

unread,
Feb 25, 2015, 11:33:46 AM2/25/15
to INADA Naoki, python-tulip
To avoid TIME_WAIT, I used the "sudo sysctl -w net.ipv4.tcp_tw_recycle=1" command.

About Meinheld, I tested it a little in the past. From my small tests, yes, it improves performance, but not to the same level as the aiohttp+API-Hour version.
Moreover, to my knowledge, almost nobody uses it in production, contrary to Gunicorn. The goal was to compare the standard production setup, as explained in the Django and Flask documentation, with aiohttp.web+API-Hour.

To be honest, Meinheld uses a bit of black magic to try to transform sync code into async code; I wouldn't recommend it in production for a complex Web application.

--
Ludovic Gasc

INADA Naoki

unread,
Feb 25, 2015, 1:49:21 PM2/25/15
to Ludovic Gasc, python-tulip
On Thu, Feb 26, 2015 at 1:33 AM, Ludovic Gasc <gml...@gmail.com> wrote:
> To avoid TIME_WAIT, I used the "sudo sysctl -w net.ipv4.tcp_tw_recycle=1" command.

That avoids the TIME_WAIT problem.
But the difference between keep-alive and non-keep-alive remains very large
even with tcp_tw_recycle=1.


> About Meinheld, I've tested a little bit in the past. From my little tests,
> yes it improves performance, but not at the same level as aiohttp+API-Hour
> version.
> Moreover, to my knowledge, almost nobody use that on production, contrary to
> Gunicorn. The goal was to compare the standard production setup, as
> explained in Django and Flask documentation, with aiohttp.web+API-Hour.

I admit that meinheld is not widely used.
But I think you should use an engine that supports keep-alive to
compare against aiohttp.

How about using nginx for both aiohttp and the sync servers?

wrk -(HTTP on TCP)-> nginx -(uwsgi on unix socket w/o keep-alive)-> uWSGI
wrk -(HTTP on TCP)-> nginx -(HTTP on unix socket w/ keep-alive)->
Gunicorn (Tornado worker)
wrk -(HTTP on TCP)-> nginx -(HTTP on unix socket w/ keep-alive)-> aiohttp

Or how about not using keep-alive on aiohttp?
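The three proxied setups above could look roughly like the following nginx fragment. This is a hypothetical sketch (the upstream name, socket path, and pool size are made up), not a tested config:

```nginx
# Proxy to a backend over a unix socket, keeping upstream connections alive.
upstream app_backend {
    server unix:/tmp/app.sock;
    keepalive 32;                    # pool of idle keep-alive connections to reuse
}

server {
    listen 8080;
    location / {
        proxy_http_version 1.1;          # required for upstream keep-alive
        proxy_set_header Connection "";  # don't forward "Connection: close"
        proxy_pass http://app_backend;
    }
}
```

The `keepalive` directive only takes effect with HTTP/1.1 and a cleared Connection header, hence the two proxy settings.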

>
> To be honest, for me, Meinheld uses a little bit black magic to try to
> transform sync code to async, I don't recommend that on production for a
> complex Web application.

Meinheld's async feature is based on greenlet (like gevent).
But you can use meinheld without touching the async API:
it can be a high-performance sync server that supports keep-alive.

--
INADA Naoki <songof...@gmail.com>

Victor Stinner

unread,
Feb 25, 2015, 5:29:38 PM2/25/15
to INADA Naoki, Ludovic Gasc, python...@googlegroups.com
Hi,

I don't understand everything, but I agree that keep-alive has a huge impact on performance.

You should compare servers that all support keep-alive, or disable keep-alive in the client (or disable it on all servers).
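One way to level the field on the client side: wrk can force per-request connections by sending an explicit header. The host, port, and path below are placeholders:

```shell
# Force non-keep-alive behaviour from the load generator, so every server
# is measured under the same connection discipline.
wrk -t4 -c50 -d30s -H "Connection: close" http://127.0.0.1:8000/agents
```

This is an invocation sketch; it obviously needs wrk installed and a server listening at the given address.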

In a previous job, I backported httplib and xmlrpclib to get keep-alive on Python 2.5 to 2.7, because HTTPS was mandatory and the handshake took between 100 and 900 ms. Our client sent 300 requests or more to load the UI. It's easy to compute the speedup of keep-alive ;-)

Victor

Ludovic Gasc

unread,
Feb 25, 2015, 5:44:56 PM2/25/15
to INADA Naoki, python-tulip
On Wed, Feb 25, 2015 at 7:49 PM, INADA Naoki <songof...@gmail.com> wrote:
> On Thu, Feb 26, 2015 at 1:33 AM, Ludovic Gasc <gml...@gmail.com> wrote:
> > To avoid TIME_WAIT, I used the "sudo sysctl -w net.ipv4.tcp_tw_recycle=1" command.
>
> That avoids the TIME_WAIT problem.
> But the difference between keep-alive and non-keep-alive remains very large
> even with tcp_tw_recycle=1.

I don't agree: in fact, I used two other commands in a boot script:
echo -n 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo -n 1 > /proc/sys/net/ipv4/tcp_tw_reuse

I don't remember all the parameters I've tuned, because I've done a lot of trial and error in my benchmark lab over the past year.
I'm working on pushing the missing parameters into the benchmarks repository.

Without these parameters, you're right, it's catastrophic for Django/Flask: 50-100 HTTP requests per second.
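For reference, a persistent equivalent of those echo commands as a sysctl fragment (the file name is arbitrary). Note that tcp_tw_recycle is known to break clients behind NAT and was removed entirely in Linux 4.12, so this is benchmark-lab material, not general production advice:

```ini
# /etc/sysctl.d/99-benchmark.conf
net.ipv4.tcp_tw_recycle = 1   # aggressive TIME_WAIT recycling (risky; removed in Linux 4.12)
net.ipv4.tcp_tw_reuse = 1     # allow reusing TIME_WAIT sockets for new outgoing connections
```

Apply with `sudo sysctl --system`.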

Thanks for the keep-alive catch: I thought aiohttp.server didn't have keep-alive enabled by default; in fact, that changed recently:
http://aiohttp.readthedocs.org/en/latest/changes.html#id4
  • Server has 75 seconds keepalive timeout now, was non-keepalive by default.

I've disabled keep_alive in api_hour and quickly tested the agents list webservice via localhost: I get 3334.52 req/s instead of 4179 req/s, a 0.233 average latency instead of 0.098, and 884 errors instead of 0.
It isn't a big change compared to the other Web frameworks' values, but it's a change.
Rerunning everything takes time (I have a full-time job and a family); I'll rerun this week-end.



> About Meinheld, I've tested a little bit in the past. From my little tests,
> yes it improves performance, but not at the same level as aiohttp+API-Hour
> version.
> Moreover, to my knowledge, almost nobody use that on production, contrary to
> Gunicorn. The goal was to compare the standard production setup, as
> explained in Django and Flask documentation, with aiohttp.web+API-Hour.

> I admit that meinheld is not widely used.
> But I think you should use an engine that supports keep-alive to
> compare against aiohttp.
>
> How about using nginx for both aiohttp and the sync servers?
>
> wrk -(HTTP on TCP)-> nginx -(uwsgi on unix socket w/o keep-alive)-> uWSGI
> wrk -(HTTP on TCP)-> nginx -(HTTP on unix socket w/ keep-alive)->
> Gunicorn (Tornado worker)
> wrk -(HTTP on TCP)-> nginx -(HTTP on unix socket w/ keep-alive)-> aiohttp
>
> Or how about not using keep-alive on aiohttp?

It's easier for me to rerun without keep-alive enabled.
I'll test with nginx this week-end to see if there's a big change: certainly fewer errors, but certainly more latency.
For a more realistic scenario, the TechEmpower Framework Benchmarks will be more interesting than my benchmark on a kitchen table.

I've published this to counterbalance some bias about AsyncIO.
 

>
> To be honest, for me, Meinheld uses a little bit black magic to try to
> transform sync code to async, I don't recommend that on production for a
> complex Web application.

> Meinheld's async feature is based on greenlet (like gevent).
> But you can use meinheld without touching the async API:
> it can be a high-performance sync server that supports keep-alive.

First, I'll test with nginx before meinheld; nginx is more common in the Python webdev toolbox.
 

> --
> INADA Naoki <songof...@gmail.com>

Antoine Pitrou

unread,
Feb 25, 2015, 6:05:09 PM2/25/15
to python...@googlegroups.com
On Wed, 25 Feb 2015 23:44:33 +0100
Ludovic Gasc <gml...@gmail.com> wrote:
>
> I've disabled keep_alive in api_hour, I quickly tested on agents list
> webservices via localhost, I've 3334.52 req/s instead of 4179 req/s, 0.233
> latency average instead of 0.098 and 884 errors instead of 0 errors.
> It isn't a big change compare to others Web frameworks values, but it's a
> change.

IMO, the fact that you get so many errors indicates that something is
probably wrong in your benchmark setup. It is difficult to believe that
Flask and Django would behave so badly under such a simple (almost
simplistic) workload.

Regards

Antoine.


Aymeric Augustin

unread,
Feb 26, 2015, 2:19:05 AM2/26/15
to Antoine Pitrou, python...@googlegroups.com
If you push concurrency too far — like 5000 threads — I expect
performance to plummet, with results of that kind. I suspect that's
the situation Ludovic's benchmark creates. It's a pathological use
case for threads.

--
Aymeric.




Antoine Pitrou

unread,
Feb 26, 2015, 4:20:15 AM2/26/15
to python...@googlegroups.com
On Thu, 26 Feb 2015 08:19:00 +0100
Aymeric Augustin
<aymeric....@polytechnique.org>
wrote:

> On 26 févr. 2015, at 00:00, Antoine Pitrou <solipsis-xNDA5W...@public.gmane.org> wrote:
>
> > On Wed, 25 Feb 2015 23:44:33 +0100
> > Ludovic Gasc <gml...@gmail.com> wrote:
> >>
> >> I've disabled keep_alive in api_hour, I quickly tested on agents list
> >> webservices via localhost, I've 3334.52 req/s instead of 4179 req/s, 0.233
> >> latency average instead of 0.098 and 884 errors instead of 0 errors.
> >> It isn't a big change compare to others Web frameworks values, but it's a
> >> change.
> >
> > IMO, the fact that you get so many errors indicates that something is
> > probably wrong in your benchmark setup. It is difficult to believe that
> > Flask and Django would behave so badly under such a simple (almost
> > simplistic) workload.
>
> If you push concurrency too far — like 5000 threads — I expect
> performance to plummet and that kind of results.

Shouldn't the server limit the size of the thread pool and queue
incoming connections? That's what e.g. Apache will do.

Regards

Antoine.


Ludovic Gasc

unread,
Feb 26, 2015, 4:25:35 AM2/26/15
to Antoine Pitrou, python-tulip
On Thu, Feb 26, 2015 at 12:00 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Wed, 25 Feb 2015 23:44:33 +0100
> Ludovic Gasc <gml...@gmail.com> wrote:
> >
> > I've disabled keep_alive in api_hour, I quickly tested on agents list
> > webservices via localhost, I've 3334.52 req/s instead of 4179 req/s, 0.233
> > latency average instead of 0.098 and 884 errors instead of 0 errors.
> > It isn't a big change compare to others Web frameworks values, but it's a
> > change.
>
> IMO, the fact that you get so many errors indicates that something is
> probably wrong in your benchmark setup.

If you have a technical clue, I'd be interested.

> It is difficult to believe that
> Flask and Django would behave so badly under such a simple (almost
> simplistic) workload.

Sorry guys, but often things don't have to be complicated to be good: this webservice meets my client's requirements.

As a Computer Science engineer, I learnt at school that to prove my skills I need to build complicated solutions.
For me, that's like shooting yourself in the foot: yes, you prove your value, but eventually you need to change the business logic several times and maintain the code.
If it's very complicated at the beginning, there is little chance it will be simpler at the end.
For example, see the Web frameworks: more and more layers to try to assist developers, but in the end it's not really clear that everybody saves time in the story.

BTW, I want to publish a more complete version of the daemon used in the benchmark, but almost nobody could test it easily: the complete daemon uses Panoramisk to talk to Asterisk, and the setup is a little more complicated than a classical Web architecture.
FYI, I have in the pipe a benchmark comparing an AGI daemon built with Panoramisk against xivo-agid.
 

Regards

Antoine.



Victor Stinner

unread,
Feb 26, 2015, 4:35:52 AM2/26/15
to Aymeric Augustin, Antoine Pitrou, python-tulip
2015-02-26 8:19 GMT+01:00 Aymeric Augustin <aymeric....@polytechnique.org>:
>> IMO, the fact that you get so many errors indicates that something is
>> probably wrong in your benchmark setup. It is difficult to believe that
>> Flask and Django would believe so badly in such a simple (almost
>> simplistic) workload.
>
> If you push concurrency too far — like 5000 threads — I expect
> performance to plummet and that kind of results. I suspect that’s
> the situation Ludovic’s benchmark creates. It's a pathological use
> case for threads.

On Linux, to see the number of threads of a process, you can use the command:

grep ^Threads: /proc/<pid>/status

or:

ls /proc/<pid>/task/ | wc -l

Victor

Ludovic Gasc

unread,
Feb 26, 2015, 4:41:22 AM2/26/15
to Aymeric Augustin, Antoine Pitrou, python-tulip
I don't understand you: to my knowledge, my benchmark doesn't use threads, only processes.
Moreover, what do "too far" and "pathological use" mean here?
What's the difference between my benchmark and a server that receives a lot of requests?
For you, does this use case not happen in production? Or do you maybe have a tip to avoid it?

INADA Naoki

unread,
Feb 26, 2015, 6:01:30 AM2/26/15
to Ludovic Gasc, Aymeric Augustin, Antoine Pitrou, python-tulip
> What's the difference between my benchmark and a server that receives a lot
> of requests ?
> For you, this use case doesn't happen on production ? Or you maybe you have
> a tip to avoid that ?

First, I don't use Gunicorn's default (sync) worker for receiving
requests from clients directly.
It should be used behind nginx or a similar buffering reverse proxy.

Second, I don't use Gunicorn's default worker for high load.
nginx and uWSGI over a unix domain socket are much faster than gunicorn's
sync worker.

Last, to benchmark a classical web stack on a single machine, `wrk -c200`
is too high.
Concurrent connections and concurrent requests are not the same thing at all.
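A rough illustration of that difference, using made-up but plausible numbers (by Little's law, in-flight requests ≈ throughput × latency):

```shell
# 200 open keep-alive connections each issuing a request every ~20 ms imply
# a large offered load, even though a 16-process sync server can only have
# 16 requests truly in flight at any instant; the rest must queue or fail.
connections=200
latency_ms=20
echo "$((connections * 1000 / latency_ms)) implied req/s from $connections connections"
```

So `-c200` describes queueing pressure, not parallelism; the same req/s spread over 50 connections stresses a sync worker pool far less.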

--
INADA Naoki <songof...@gmail.com>

Ludovic Gasc

unread,
Feb 26, 2015, 6:38:09 AM2/26/15
to INADA Naoki, Aymeric Augustin, Antoine Pitrou, python-tulip

On 26 Feb 2015 12:01, "INADA Naoki" <songof...@gmail.com> wrote:
>
> > What's the difference between my benchmark and a server that receives a lot
> > of requests ?
> > For you, this use case doesn't happen on production ? Or you maybe you have
> > a tip to avoid that ?
>
> First, I don't use Gunicorn's default (sync) worker for receiving
> request from client directly.
> It should be used behind nginx or similar buffering reverse proxy.
>
> Second, I don't use Gunicorn's default worker for high load.
> Nginx and uWSGI via unix domain socket is much faster than gunicorn's
> sync worker.

I'll try to test as many scenarios as possible this weekend, at least with nginx.

Guido van Rossum

unread,
Feb 26, 2015, 12:49:46 PM2/26/15
to Ludovic Gasc, Aymeric Augustin, Antoine Pitrou, python-tulip
On Thu, Feb 26, 2015 at 1:41 AM, Ludovic Gasc <gml...@gmail.com> wrote:
[...] To my knowledge, in my benchmark, I don't use threads, only processes.

Maybe that's the reason why the benchmark seems so imbalanced. Processes cause way more overhead. Try comparing async I/O to a thread-based solution (which is what most people use).

--
--Guido van Rossum (python.org/~guido)

Ludovic Gasc

unread,
Feb 26, 2015, 7:12:47 PM2/26/15
to Guido van Rossum, Aymeric Augustin, Antoine Pitrou, python-tulip
Hi all,

I'm trying to address all the remarks before publishing this on my blog.
Please bear with me: the information I've added in this e-mail is a work in progress. I'm not an expert on unix sockets, meinheld, or uwsgi, so a config change could certainly change these values:
  1. "Changing kernel parameters is cheating": no, it isn't; most production deployments recommend it, not only for benchmarks. Examples:
    1. PostgreSQL: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, shared_buffers config (BTW, I'd forgotten to push my kernel settings for PostgreSQL; they're now in the repository)
    2. Nginx: http://wiki.nginx.org/SSL-Offloader, "preparation" section

  2. Debug mode on daemons: all daemons were impacted; I've disabled it. I've relaunched the localhost benchmark on the /agents endpoint and get almost the same values, probably because I'd already disabled logging globally in my benchmarks. Could somebody confirm my diagnosis?

  3. "wrk/wrk2 aren't good tools to benchmark HTTP; they hit too hard": isn't hitting as hard as possible the goal of a benchmark? FYI, most serious benchmark reports on the Web use wrk.

  4. Keep-alive: I've installed nginx on the server and I use a UNIX socket as suggested. I use meinheld (apparently the best async worker for gunicorn) with this command line:

    gunicorn -w 16 -b 0.0.0.0:8000 -k meinheld.gmeinheld.MeinheldWorker -b unix:/tmp/flask.sock --backlog=10240 application:app

    I haven't relaunched everything, only some agents tests, locally and from a remote host. It's more catastrophic than without nginx when I hit it with 4000 req/s: in fact, I get a lot of 404 errors about the socket file, BUT it isn't a config-file error, because at the beginning of the HTTP keep-alive connection I do receive my JSON document. I can send a Wireshark capture if you don't believe me.
    If you have a clue, I'm interested. Tomorrow I'll try to test on a server without any system modifications, to find out whether I have an issue in my system setup or this is the reality. I'll also test more intermediate values to try to find a better scenario for Django/Flask.

    But for your information, with the agents webservices over the network and only 50 connections open at the same time, I have no more errors with Flask.
    These are the values:

    For Nginx + gunicorn + meinheld (with keep alive):

    axelle@GMLUDO-XPS:~$ wrk -t12 -c50 -d30s http://192.168.2.100:18000/agents
    Running 30s test @ http://192.168.2.100:18000/agents
      12 threads and 50 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    74.93ms    9.73ms 153.97ms   79.89%
        Req/Sec    52.98      5.56    73.00     68.97%
      19234 requests in 30.00s, 133.39MB read
    Requests/sec:    641.08
    Transfer/sec:      4.45MB

    For API-Hour:

    axelle@GMLUDO-XPS:~$ wrk -t12 -c50 -d30s http://192.168.2.100:8008/agents
    Running 30s test @ http://192.168.2.100:8008/agents
      12 threads and 50 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    12.82ms    8.81ms  95.62ms   69.17%
        Req/Sec   357.40    139.34     0.92k    71.68%
      118928 requests in 30.00s, 745.16MB read
    Requests/sec:   3964.59
    Transfer/sec:     24.84MB

  5. Moreover (not for this week), I'll ask my company if I can open-source a more complex daemon example, another Flask daemon I've ported to API-Hour where I also had some performance issues, to confirm these values.
    Config files for Nginx are in: https://github.com/Eyepea/API-Hour/tree/master/benchmarks/etc

  6. With the test at 10 queries per second, I get almost the same latency with Gunicorn directly:

    $ wrk2 -t10 -c10 -d30s -R 10 http://192.168.2.100:18000/agents
    Running 30s test @ http://192.168.2.100:18000/agents
      10 threads and 10 connections
      Thread calibration: mean lat.: 22.676ms, rate sampling interval: 51ms
      Thread calibration: mean lat.: 22.336ms, rate sampling interval: 52ms
      Thread calibration: mean lat.: 22.409ms, rate sampling interval: 50ms
      Thread calibration: mean lat.: 21.755ms, rate sampling interval: 48ms
      Thread calibration: mean lat.: 21.366ms, rate sampling interval: 49ms
      Thread calibration: mean lat.: 21.304ms, rate sampling interval: 48ms
      Thread calibration: mean lat.: 21.398ms, rate sampling interval: 44ms
      Thread calibration: mean lat.: 20.078ms, rate sampling interval: 43ms
      Thread calibration: mean lat.: 18.530ms, rate sampling interval: 41ms
      Thread calibration: mean lat.: 19.120ms, rate sampling interval: 41ms
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    20.09ms    3.45ms  28.16ms   64.00%
        Req/Sec     0.99      4.51    25.00     95.35%
      300 requests in 30.01s, 2.08MB read
    Requests/sec:     10.00
    Transfer/sec:     71.00KB

  7. uWSGI: I continue to use nginx for keep-alive with Flask and Django. It works when I call it directly via HTTP, but not via unix sockets; if somebody has a working config file, I'd be interested.
  8. Use threads instead of workers: I'll test tomorrow, because I really need to sleep.
Don't worry, I'll relaunch the full-monty benchmarks this week-end with all the details.
--
Ludovic Gasc

INADA Naoki

unread,
Feb 26, 2015, 8:38:40 PM2/26/15
to Ludovic Gasc, Guido van Rossum, Aymeric Augustin, Antoine Pitrou, python-tulip
> For Nginx + gunicorn + meinheld (with keep alive):
>
> axelle@GMLUDO-XPS:~$ wrk -t12 -c50 -d30s http://192.168.2.100:18000/agents
> Running 30s test @ http://192.168.2.100:18000/agents
> 12 threads and 50 connections
> Thread Stats Avg Stdev Max +/- Stdev
> Latency 74.93ms 9.73ms 153.97ms 79.89%
> Req/Sec 52.98 5.56 73.00 68.97%
> 19234 requests in 30.00s, 133.39MB read
> Requests/sec: 641.08
> Transfer/sec: 4.45MB
>

Good! No errors, and max latency < (avg latency * 10).
That feels like a stable setup.
Thank you for trying meinheld.

Ludovic Gasc

unread,
Mar 1, 2015, 6:11:02 PM3/1/15
to Guido van Rossum, Aymeric Augustin, Antoine Pitrou, python-tulip
Hi,

On a standard server with the same hardware and without any kernel customizations, I get the same proportions between the tests, but lower performance for all daemons.
The only notable thing is that with API-Hour I had around 50 timeouts per round, instead of 0 on my customized server.

About nginx as a frontend and the process/threads comparison: I must finish implementing unix sockets and a threads worker in API-Hour before testing.

Regards.

--
Ludovic Gasc