Hi, a Motor user began an interesting discussion on the MongoDB-user list.

The summary is this: he's fetching hundreds of URLs concurrently and inserting the results into MongoDB with Motor. Motor throws lots of connection-timeout errors. The problem is getaddrinfo: on Mac, Python only allows one getaddrinfo call at a time. With hundreds of HTTP fetches in progress, there's a long queue waiting for the getaddrinfo lock. Whenever Motor wants to grow its connection pool it has to call getaddrinfo on "localhost", and it spends so long waiting for that call that it times out and thinks it can't reach MongoDB.

Motor's connection-timeout implementation in asyncio is sort of wrong:

    coro = asyncio.open_connection(host, port)
    sock = yield from asyncio.wait_for(coro, timeout)

The timer runs during the call to getaddrinfo, as well as the call to the loop's sock_connect(). This isn't the intention: the timeout should apply only to the connection.

A philosophical digression: The "connection timeout" is a heuristic. "If I've waited N seconds and haven't established the connection, I probably never will. Give up." Based on what they know about their own networks, users can tweak the connection timeout. In a fast network, a server that hasn't responded in 20ms is probably down; but on a global network, 10 seconds might be reasonable. Regardless, the heuristic only applies to the actual TCP connection. Waiting for getaddrinfo is not related; that's up to the operating system.

In a multithreaded client like PyMongo we distinguish the two phases:

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, dummy, sa = res
        sock = socket.socket(af, socktype, proto)
        try:
            sock.settimeout(connect_timeout)

            # THE TIMEOUT ONLY APPLIES HERE.
            sock.connect(sa)
            sock.settimeout(None)
            return sock
        except socket.error as e:
            # Connection refused, or not established within the timeout.
            sock.close()

Here, the call to getaddrinfo isn't timed at all, and each distinct attempt to connect on a different address is timed separately. So this kind of code matches the idea of a "connect timeout" as a heuristic for deciding whether the server is down.

Two questions:

1. Should asyncio.open_connection support a connection timeout that acts like the blocking version above? That is, a connection timeout that does not include getaddrinfo, and restarts for each address we attempt to connect to?
2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The code comment says these are "systems on which getaddrinfo() is believed to not be thread-safe". Has this belief ever been confirmed?
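
To make question 1 concrete, here is a minimal sketch of what such a timeout could look like in user code. It is not Motor's or asyncio's implementation: the function name connect_with_timeout and its parameters are hypothetical, and it is written with modern async/await syntax rather than the "yield from" style shown above. It assumes the loop's getaddrinfo(), sock_connect() and asyncio.wait_for() behave as documented.

```python
import asyncio
import socket


async def connect_with_timeout(host, port, connect_timeout):
    """Hypothetical helper: resolve untimed, then time each attempt separately."""
    loop = asyncio.get_running_loop()

    # Not timed: how long name resolution takes is up to the operating system.
    infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)

    last_error = None
    for af, socktype, proto, _canonname, sa in infos:
        sock = socket.socket(af, socktype, proto)
        sock.setblocking(False)  # sock_connect() requires a non-blocking socket
        try:
            # The timer restarts for every address and covers only the connect.
            await asyncio.wait_for(loop.sock_connect(sock, sa), connect_timeout)
            return sock
        except (OSError, asyncio.TimeoutError) as exc:
            # Connection refused, or not established within the timeout.
            sock.close()
            last_error = exc

    if last_error is None:
        raise OSError("getaddrinfo returned no addresses")
    raise last_error
```

Used from a coroutine, this would look something like "sock = await connect_with_timeout('localhost', 27017, 20.0)". The design mirrors the blocking PyMongo loop: resolution happens once, outside any timer, and a slow or dead address costs at most connect_timeout before the next address is tried.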
It'll go much quicker if you send a PR to the asyncio GitHub project. Thanks!
--Guido (mobile)
W00t! Hopefully when you connect a socket it actually believes the pre-parsed address.
--Guido (mobile)