dnspython performance issues with apache + wsgi + asyncio


Dr. Benjamin Schoenbach

Jan 18, 2022, 2:36:19 AM
to dnspython-users
Hi guys,

We are experiencing performance issues in our dnspython-based web application. We use apache + wsgi as the basic web server framework to serve the application. The app has to process several hundred DNS requests per second concurrently (serving multiple client connections). Currently we are facing a bottleneck in UDP-based DNS queries (less than 20 qps with 32 parallel client connections). Below you will find the code that performs the DNS queries (the asyncio backend is used).

DNS query code:
https://gitlab.com/teamdns/nastng/-/blob/master/nast/queries.py

Complete gitlab project:
https://gitlab.com/teamdns/nastng

Question 1:
Is the code above suited to achieving maximum throughput for dnspython-based queries? Is there a better way (a different backend framework like trio, etc.)?

Question 2:
Do you have similar experience with the performance of concurrent DNS queries in dnspython? Is there any benchmark data available?





Bob Halley

Jan 18, 2022, 9:49:03 AM
to dnspython-users
Dnspython is not blazing fast in any sense, but what you are reporting seems unusually slow. Are you saying 20 UDP qps total, or 20 qps on each of the 32 connections? Also, is the process doing the querying running at 100% of one CPU?

On my mac laptop, I can do about 3700 DNSKEY queries per second, or 2500 qps for an NXDOMAIN with proof to a local BIND nameserver using a technique similar to yours. I wasn't using giant RSA keys, but rather shorter elliptic curve keys. When I do this, it comes close to maxing out the CPU. Spreading the work across multiple python processes would increase throughput.

I used trio for my test at first, as that is my preferred async library at the moment. I also tried with asyncio using tasks like you did, and got similar performance to trio, though when I had a very large number of tasks (e.g. 10,000) at once, the task wait seemed never to return though all I/O had stopped. Not sure what's going on there.

My advice would be to look for more detail about exactly what is or isn't happening.  If you are not CPU-bound in dnspython, it may be there is some other factor.  For example, you might need to rule out being rate-limited by the authoritative servers you are querying.
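
One rough way to check the CPU-bound question is to compare wall-clock time with CPU time for the same piece of work. The sketch below uses only the standard library; `measure`, `busy`, and `waiting` are made-up names for illustration, and the `time.sleep` call merely stands in for time spent waiting on the network or on a rate-limiting server.

```python
import time


def measure(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds).

    cpu_seconds is the CPU time used by this process (all threads),
    so it can exceed wall_seconds in a multi-threaded program.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy(n):
    """Hypothetical CPU-heavy work (stand-in for message parsing etc.)."""
    return sum(i * i for i in range(n))


def waiting(seconds):
    """Hypothetical I/O wait (stand-in for waiting on a slow server)."""
    time.sleep(seconds)


if __name__ == "__main__":
    for label, fn, arg in (("busy", busy, 2_000_000), ("waiting", waiting, 0.5)):
        _, wall, cpu = measure(fn, arg)
        ratio = cpu / wall if wall else 0.0
        print(f"{label}: wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={ratio:.2f}")
```

A cpu/wall ratio near 1 suggests the code itself is the limit; a ratio near 0 suggests the time is going into waiting for something else.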

Dr. Benjamin Schoenbach

Jan 19, 2022, 2:24:26 AM
to dnspython-users
Well, the qps figure mentioned is the total. That is, with 32 simultaneously open connections, server performance drops to 20 qps in total (ultra slow). The app runs in a docker container and reaches 180% load on a virtual machine with 2 vCPUs. We observed a lot of overhead in socket handling due to the repeated creation and closing of sockets in the send procedures. We are therefore trying to improve this by keeping the sockets open (i.e. one common open socket for all queries for UDP sends). Let's see if this leads to higher performance... If anybody has other ideas on the queries code above, they are very welcome.
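
To illustrate the idea of one socket per query versus one shared socket, here is a minimal sketch using only the standard library. It is not the code from the project: the local echo server is a stand-in for a name server, the payloads are not real DNS messages, and all function names are hypothetical. As far as I know, `dns.query.udp` also accepts a `sock` argument for passing in an existing socket, but that should be checked against the dnspython documentation.

```python
import socket
import threading
import time


def start_echo_server():
    """Start a local UDP echo server (stand-in for a name server).

    Returns the (host, port) address it listens on.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port

    def serve():
        while True:
            data, addr = srv.recvfrom(512)
            srv.sendto(data, addr)

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()


def query_new_socket_each_time(payloads, server_addr, timeout=1.0):
    """Create and close one socket per payload."""
    replies = []
    for payload in payloads:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(payload, server_addr)
            replies.append(s.recvfrom(512)[0])
    return replies


def query_shared_socket(payloads, server_addr, timeout=1.0):
    """Reuse a single socket for every payload."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for payload in payloads:
            s.sendto(payload, server_addr)
            replies.append(s.recvfrom(512)[0])
    return replies


if __name__ == "__main__":
    addr = start_echo_server()
    payloads = [i.to_bytes(2, "big") for i in range(2000)]
    for fn in (query_new_socket_each_time, query_shared_socket):
        start = time.perf_counter()
        fn(payloads, addr)
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```

One caveat with a shared socket: every query then leaves from the same source port, which gives up the source-port randomization that helps protect against spoofed responses, so the trade-off needs some thought.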

I also found this discussion regarding performance issues with dnspython (using trio, curio)... unfortunately asyncio is missing.
https://github.com/python-trio/trio/issues/1595

Dr. Benjamin Schoenbach

Jan 20, 2022, 12:48:31 AM
to dnspython-users

Bob Halley

Jan 20, 2022, 8:02:22 AM
to dnspython-users
Your Queries class run() method directly invokes asyncio.run(), which is going to block until the run loop finishes and also incur the costs of setting up and tearing down asyncio's infrastructure.  Usually an asynchronous application has a single run loop for the whole program and everything is coroutines.  Depending on how the rest of your program is structured, this may be causing you to have only one set of Queries running at a time.
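
As a rough sketch of the "single run loop, everything is coroutines" structure, assuming nothing about the rest of the application: `fake_query` below is a hypothetical placeholder that only sleeps instead of sending a DNS query, and `run_all` and its `limit` parameter are names invented for this example.

```python
import asyncio


async def fake_query(name, delay=0.01):
    """Hypothetical stand-in for one asynchronous DNS query."""
    await asyncio.sleep(delay)  # simulates waiting for the response
    return name, "NOERROR"


async def run_all(names, limit=100):
    """Run all queries concurrently on the current event loop.

    The semaphore caps how many queries are in flight at once.
    """
    sem = asyncio.Semaphore(limit)

    async def bounded(name):
        async with sem:
            return await fake_query(name)

    # gather() returns the results in the same order as the inputs.
    return await asyncio.gather(*(bounded(n) for n in names))


if __name__ == "__main__":
    names = [f"host{i}.example." for i in range(500)]
    # One asyncio.run() for the whole batch, not one per set of queries.
    results = asyncio.run(run_all(names))
    print(len(results), "queries completed")
```

The point is that `asyncio.run()` appears once at the top level, and everything below it awaits coroutines on that one loop.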

Dr. Benjamin Schoenbach

Jan 21, 2022, 6:30:44 AM
to dnspython-users

Thanks for the hint. Yes, the use of multiple asyncio.run() calls should be optimized... But we also tried a synchronous threaded implementation of the Queries class (avoiding any asyncio stuff, just dnspython)... it performs ~25% better than with the asyncio backend. However, testing the synchronous threaded implementation of the Queries class in isolation (see code below), we are able to reach ~1500 DNS qps (a mixed set of UDP and TCP queries)... But this seems far away from the benchmark results I have seen of 5000 - 8000 qps with dnspython... really strange.


threaded implementation of Queries class ~ 1500 DNS qps