Dnspython is not blazing fast in any sense, but what you are reporting seems unusually slow. Are you saying 20 UDP qps total, or 20 qps on each of the 32 connections? Also, is the process doing the querying running at 100% of one CPU?

On my Mac laptop, using a technique similar to yours against a local BIND nameserver, I can do about 3700 DNSKEY queries per second, or about 2500 qps for an NXDOMAIN response with proof. I wasn't using giant RSA keys, but rather shorter elliptic curve keys. When I do this, it comes close to maxing out the CPU, so spreading the work across multiple Python processes would increase throughput.

I used trio for my test at first, as that is my preferred async library at the moment. I also tried asyncio with tasks like you did, and got similar performance to trio. However, when I had a very large number of tasks (e.g. 10,000) at once, the task wait never seemed to return even though all I/O had stopped. I'm not sure what's going on there.
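
For comparing numbers, here is a rough sketch of one way to measure total qps while keeping only a fixed number of queries in flight, rather than creating thousands of tasks at once. This is not the code I ran for the figures above. `measure_qps` and `send_query` are names made up for illustration; `send_query` is any coroutine function that performs one query.

```python
import asyncio
import time


async def measure_qps(send_query, total, concurrency=32):
    """Call `send_query()` `total` times with at most `concurrency` in flight.

    Returns (completed, failed, queries_per_second).
    `send_query` is a hypothetical placeholder for a coroutine function
    that sends one DNS query and awaits the response.
    """
    remaining = iter(range(total))  # shared job counter for all workers
    completed = 0
    failed = 0

    async def worker():
        nonlocal completed, failed
        # Each worker takes the next job until the shared iterator is
        # exhausted, so only `concurrency` tasks ever exist at once.
        for _ in remaining:
            try:
                await send_query()
                completed += 1
            except Exception:
                # Timeouts and other errors are counted, not raised,
                # so one bad query does not end the whole run.
                failed += 1

    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    qps = completed / elapsed if elapsed > 0 else 0.0
    return completed, failed, qps


# Possible usage with dnspython (assumes a resolver at 127.0.0.1 and
# example.com as the query name; adjust both for your setup):
#
#   import dns.asyncquery, dns.message
#
#   async def send_query():
#       q = dns.message.make_query("example.com", "DNSKEY", want_dnssec=True)
#       await dns.asyncquery.udp(q, "127.0.0.1", timeout=2)
#
#   print(asyncio.run(measure_qps(send_query, total=10000)))
```

A high `failed` count relative to `completed` would point at timeouts or dropped responses rather than slowness in the client itself.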
My advice would be to look for more detail about exactly what is or isn't happening. If you are not CPU-bound in dnspython, there may be some other limiting factor. For example, you might need to rule out being rate-limited by the authoritative servers you are querying.
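
One quick way to check the CPU-bound question from inside the script is to compare the CPU time the process used with the wall-clock time that passed. The sketch below uses only the standard library; `cpu_utilization` is a hypothetical helper name, not something dnspython provides.

```python
import time


def cpu_utilization(fn, *args, **kwargs):
    """Run `fn` and return (result, cpu_seconds / wall_seconds).

    A ratio near 1.0 suggests the process was busy on one CPU for the
    whole run. A ratio well below 1.0 suggests it was mostly waiting,
    for example on the network or on a rate-limited server.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)
```

For an asyncio program you could wrap the entry point, e.g. `cpu_utilization(asyncio.run, main())`, where `main` stands for your own top-level coroutine.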