Hi Adam and Nick,
Thanks for the replies, and sorry for my delayed response; I've been on a bit of vacation.
Nick, I agree with your findings. I've attached a slightly updated example file (with simpler queries) that can run either with MySQLdb directly or with SQLAlchemy. The error doesn't occur when running MySQLdb directly, but it does when running through SQLAlchemy, even with NullPool as the connection pool.
However, as Adam correctly points out, the error message is no longer "No such file or directory" but "Transport endpoint is not connected". I have an example that I've been running since the start, which previously gave the stacktrace I originally posted and now gives "Transport endpoint is not connected" instead. I haven't changed anything on my end, so I have to assume that something has changed on the Google end in the weeks I've been working on this problem.
Adam, regarding your statement that "all available concurrent connections" are saturated: my reading of the documentation (https://cloud.google.com/sql/faq#sizeqps) is that I should be able to have about 4000 concurrent connections. When limiting my "max_concurrent_requests" to 72, I am seeing the attached error rates across 18 instances (since each of my instances is limited to a max of 6 simultaneous requests). That should be well below the 4000-connection limit, by more than an order of magnitude, and yet I am still seeing this error occur. You also mentioned getting 5k tasks in the queue. If the queues are rate limited and concurrent-request limited, the size of the queue backlog shouldn't have any effect on the number of concurrent SQL connections, correct? Or am I misunderstanding something here?
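To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each in-flight request holds at most one SQL connection, which is my assumption rather than something I have measured, and the variable names are just for illustration:

```python
# Rough headroom check. Assumes one SQL connection per in-flight request
# (an assumption, not a measured value).
CONNECTION_LIMIT = 4000        # approximate figure from the Cloud SQL FAQ
MAX_CONCURRENT_REQUESTS = 72   # the queue's max_concurrent_requests setting
CONNECTIONS_PER_REQUEST = 1    # hypothetical; raise it if a request opens more


def worst_case_connections(max_requests, per_request):
    """Upper bound on simultaneously open connections."""
    return max_requests * per_request


peak = worst_case_connections(MAX_CONCURRENT_REQUESTS, CONNECTIONS_PER_REQUEST)
percent_of_limit = 100.0 * peak / CONNECTION_LIMIT
headroom = CONNECTION_LIMIT / float(peak)

print("worst-case open connections: %d" % peak)
print("share of the limit: %.1f%%" % percent_of_limit)
print("headroom factor: %.1fx" % headroom)
```

Under that assumption the backlog size never enters the calculation; only the concurrency cap does.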
Also, just to clarify, I don't believe there to be a "bad" instance floating around out there. Initially, I did not see instances "recover" with subsequent requests, but running at a slower rate (about 6/m) showed that observation to be incorrect. You can, however, see from my attached screenshot that instances vary in their error rate.
I really appreciate both of you getting back to me, and Nick for sticking with this for so long. Unfortunately, work priorities have come up, and since I have a workaround with the MySQLdb route, I'm going to take SQLAlchemy out of the mix and move forward with the working solution. If there are other contributions I can make in my spare time, let me know and I'll happily help. Otherwise, this is me "signing off".
Cheers,
Myles