Hang when >= 15 MS SQL Server requests have been made


Brian Paterni

Dec 28, 2019, 11:12:13 PM
to sqlalchemy
Hi,

I seemingly have a problem with flask/socketio/eventlet/sqlalchemy + MSSQL when >= 15 parallel requests have been made. I've built a test app:


that can be used to reproduce the problem. It will hang if >= 15 parallel requests have been made to the '/api/busy/mssql' endpoint.

I'm not sure if the root cause of the problem is based in the SQL Server ODBC Driver, sqlalchemy, or eventlet, but I've already paid the microsoft support tax only to be told that there's insufficient evidence to indicate the ODBC driver is at fault. So I thought I would post the issue here to see if anybody would be able to help in pinpointing the code that is at fault with this problem.

Once the test app above is running and has a valid SQL server to query, you should be able to reproduce the hang with

seq 15 | parallel -j0 "curl -s localhost:5000/api/busy/mssql && echo {}"

The hang seems to occur consistently on the 15th request. This happens even when the connection pool_size/max_overflow are adjusted away from their default values, which leads me to believe that exhausting the connection pool is not the cause of the problem. There may, though, be some other reason behind the scenes for the hang occurring at the 15th connection(?)

Thanks very much for any help that can be provided in resolving this issue!
:)

Brian Paterni

Dec 28, 2019, 11:18:49 PM
to sqlalchemy
Plus, if it's any help: the hang *does* seem to resolve itself after ~2 hours. That, or it can be side-stepped by sending a SIGINT signal (ctrl-c) to the flask app while it is hung. The SIGINT seems to kill the 15th (hung) request and allows the app to continue processing other requests successfully.

Mike Bayer

Dec 29, 2019, 2:17:24 AM
to noreply-spamdigest via sqlalchemy


On Sat, Dec 28, 2019, at 11:12 PM, Brian Paterni wrote:
Hi,

I seemingly have a problem with flask/socketio/eventlet/sqlalchemy + MSSQL when >= 15 parallel requests have been made. I've built a test app:


I can't run the test app; however, 15 looks like your connection pool at its default size of 5 connections + 10 overflow, with all connections checked out and none returned.
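
To make the arithmetic behind that hypothesis concrete, here is a minimal sketch using only the standard library. `ToyPool` is a hypothetical stand-in, not SQLAlchemy's actual pool implementation; it only models "at most pool_size + max_overflow checkouts at once". Note that the original post reports the hang persisting after the pool settings are changed, so this illustrates the hypothesis rather than confirming it.

```python
import threading


class ToyPool:
    """Toy model of a bounded pool: at most pool_size + max_overflow checkouts."""

    def __init__(self, pool_size=5, max_overflow=10):
        self._slots = threading.BoundedSemaphore(pool_size + max_overflow)

    def checkout(self, timeout):
        # True if a slot was free, False if the wait timed out.
        return self._slots.acquire(timeout=timeout)

    def checkin(self):
        # Return one slot to the pool.
        self._slots.release()


pool = ToyPool()

# Sixteen checkouts with nothing ever checked back in.
granted = [pool.checkout(timeout=0.05) for _ in range(16)]
print(granted.count(True))  # 15
print(granted[15])          # False: the 16th caller would wait for a checkin

# Once a single connection is returned, the next checkout succeeds again.
pool.checkin()
print(pool.checkout(timeout=0.05))  # True
```

In this model the first 15 checkouts succeed and only the next one waits, so a request that never returns its connection would stall whoever comes after the 15th.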

While I strongly recommend against using eventlet with Python DBAPI drivers or SQLAlchemy, when using eventlet or gevent with SQLAlchemy you need to ensure that a full monkeypatch of "threading", "socket", and everything else is performed before anything else is imported. SQLAlchemy's pool uses a port of the Queue class, which relies on threading mutexes, all of which will wreck an eventlet application that did not correctly monkeypatch them.
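
Why the ordering matters can be shown without eventlet at all. The sketch below uses a hypothetical stand-in namespace (`fake_socket`) rather than the real `socket` module: code that grabbed a reference before the patch keeps the blocking version, while code that looks the name up afterwards gets the cooperative one.

```python
import types


def blocking_recv():
    return "blocking"


def green_recv():
    return "cooperative"


# Hypothetical stand-in for a stdlib module such as socket.
fake_socket = types.SimpleNamespace(recv=blocking_recv)

# A library imported *before* patching copies the original reference,
# much as "from socket import recv" would.
early_import = fake_socket.recv

# The monkeypatch replaces the attribute on the module object.
fake_socket.recv = green_recv

# A library imported *after* patching sees the replacement.
late_import = fake_socket.recv

print(early_import())  # blocking
print(late_import())   # cooperative
```

In a real eventlet application the corresponding step is calling `eventlet.monkey_patch()` at the very top of the entry-point module, before any other import that might hold on to `threading` or `socket` objects.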

I'm also not familiar with any driver for MSSQL that supports implicit or explicit async. SQLAlchemy only works with pyodbc or pymssql, neither of which has async support that I'm aware of. What driver are you using?








--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Brian Paterni

Dec 29, 2019, 11:54:59 PM
to sqlalchemy
On Sunday, December 29, 2019 at 1:17:24 AM UTC-6, Mike Bayer wrote:

I can't run the test app; however, 15 looks like your connection pool at its default size of 5 connections + 10 overflow, with all connections checked out and none returned.

Hm, is the issue with running the app possibly something I could help with? I agree the magic 15 figure seems to be related to connection pool exhaustion. The funny thing is that the app still hangs on the 15th connection even if pool_size+overflow are expanded > 15, but not if the pool_size+overflow < 15 (?!)
 

While I strongly recommend against using eventlet with Python DBAPI drivers or SQLAlchemy, when using eventlet or gevent with SQLAlchemy you need to ensure that a full monkeypatch of "threading", "socket", and everything else is performed before anything else is imported. SQLAlchemy's pool uses a port of the Queue class, which relies on threading mutexes, all of which will wreck an eventlet application that did not correctly monkeypatch them.

I'm also not familiar with any driver for MSSQL that supports implicit or explicit async. SQLAlchemy only works with pyodbc or pymssql, neither of which has async support that I'm aware of. What driver are you using?

The test app is careful to initiate eventlet monkey patching before any additional logic/imports (except for the `import os` required to check whether an envvar is set). The problem persists even if the envvar check is taken out and eventlet monkeypatching becomes the absolute first action of the test app.


The driver I'm using to connect to SQL Server is the official ODBC driver from Microsoft (mssql+pyodbc):


Apparently it *does* (or should) support async, as it is mentioned several times in the RELEASE_NOTES shipped with the driver. I'm not sure whether it does so implicitly or explicitly, though.

Mike Bayer

Dec 30, 2019, 10:07:45 AM
to noreply-spamdigest via sqlalchemy


On Sun, Dec 29, 2019, at 11:54 PM, Brian Paterni wrote:
On Sunday, December 29, 2019 at 1:17:24 AM UTC-6, Mike Bayer wrote:

I can't run the test app; however, 15 looks like your connection pool at its default size of 5 connections + 10 overflow, with all connections checked out and none returned.

Hm, is the issue with running the app possibly something I could help with?


Sure, if you can turn it into a single-file, runnable MCVE with zero dependencies other than SQLAlchemy, a single MSSQL Python driver (please note that MS's ODBC driver, while necessary, is not a Python driver by itself), and in this case eventlet, I can run that. However, I think you should be able to reproduce your issue without SQLAlchemy at all, using pyodbc directly, assuming that's the driver you are using.






I agree the magic 15 figure seems to be related to connection pool exhaustion. The funny thing is that the app still hangs on the 15th connection even if pool_size+overflow are expanded > 15, but not if the pool_size+overflow < 15 (?!)

OK, maybe not the pool then. You probably need to do some debugging to figure out where eventlet is hung.
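
One possible starting point for that debugging, sketched with the standard library's `faulthandler` module. The helper name `dump_all_threads` is made up for this example. It shows the Python stack of each OS thread, which is enough to see which Python frame called into a blocking C function; it does not list individual parked greenlets.

```python
import faulthandler
import tempfile


def dump_all_threads():
    """Return the current Python stack of every OS thread as text."""
    with tempfile.TemporaryFile(mode="w+b") as fh:
        # faulthandler writes straight to the file descriptor.
        faulthandler.dump_traceback(file=fh, all_threads=True)
        fh.seek(0)
        return fh.read().decode("utf-8", "replace")


report = dump_all_threads()
print(report)
```

On Unix, `faulthandler.register(signal.SIGUSR1, all_threads=True)` installs the same dump as a signal handler, so sending `kill -USR1 <pid>` to a hung process should print where it is stuck even while Python is blocked inside a C call.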

 


While I strongly recommend against using eventlet with Python DBAPI drivers or SQLAlchemy, when using eventlet or gevent with SQLAlchemy you need to ensure that a full monkeypatch of "threading", "socket", and everything else is performed before anything else is imported. SQLAlchemy's pool uses a port of the Queue class, which relies on threading mutexes, all of which will wreck an eventlet application that did not correctly monkeypatch them.

I'm also not familiar with any driver for MSSQL that supports implicit or explicit async. SQLAlchemy only works with pyodbc or pymssql, neither of which has async support that I'm aware of. What driver are you using?

The test app is careful to initiate eventlet monkey patching before any additional logic/imports (except for the `import os` required to check whether an envvar is set). The problem persists even if the envvar check is taken out and eventlet monkeypatching becomes the absolute first action of the test app.


The driver I'm using to connect to SQL Server is the official ODBC driver from Microsoft (mssql+pyodbc):



That's not a Python driver, that's the native ODBC driver. However, MS does recommend pyodbc, which is linked in that document. If you are using pyodbc, as it looks like you discussed here: https://github.com/eventlet/eventlet/issues/538 , it would need to work with eventlet somehow. pyodbc is not written in Python; it's written in C and does not invoke any async-related native APIs that I'm familiar with (if I'm wrong, feel free to point this out, since I didn't review the source). So it cannot be eventlet-monkeypatched. It can only be adapted to use a non-blocking ODBC API somehow, or be run in a thread pool, which means the call itself is still blocking.
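
A rough sketch of the thread-pool option, using only the standard library. `blocking_query` is a hypothetical stand-in for a C-level driver call, and `run_offloaded` is a made-up helper name. Each call still blocks, but it blocks a worker thread instead of the thread running the event loop.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(n):
    """Stand-in for a C-level driver call that blocks its calling thread."""
    time.sleep(0.05)
    return n * 2


def run_offloaded(args, max_workers):
    # Each call occupies one worker thread until it returns, so at most
    # max_workers calls are in flight at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(blocking_query, args))


results = run_offloaded(range(15), max_workers=15)
print(results[:3])  # [0, 2, 4]
```

eventlet ships its own helper for this pattern, `eventlet.tpool.execute`, which runs a callable in a native thread while the calling greenlet yields. Whether that is enough to make pyodbc behave under eventlet is not something this sketch demonstrates.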



Apparently it *does* (or should) support async, as it is mentioned several times in the RELEASE_NOTES shipped with the driver. I'm not sure whether it does so implicitly or explicitly, though.

Unfortunately, things are not that simple. PostgreSQL, for example, supports a non-blocking API. However, you can't just use psycopg2 out of the box and expect it to work; psycopg2 offers an explicit API for this that has to be adapted, which you can see here: http://initd.org/psycopg/docs/advanced.html#green-support   In order for that API to work with eventlet, you need to use a special eventlet adaptation from here: https://pypi.org/project/psycogreen/

So for any of this to work with pyodbc, a similar layer needs to be created, and I am not familiar with one right now. Per https://github.com/mkleehammer/pyodbc/issues/348, it's not supported, and the issue was closed with no plans to implement it AFAICT. There seems to be a library, aioodbc, but that's for asyncio, not implicit async like eventlet.
 





Brian Paterni

Feb 2, 2020, 3:36:14 PM
to sqlalchemy
On Monday, December 30, 2019 at 9:07:45 AM UTC-6, Mike Bayer wrote:

On Sun, Dec 29, 2019, at 11:54 PM, Brian Paterni wrote:
On Sunday, December 29, 2019 at 1:17:24 AM UTC-6, Mike Bayer wrote:

I can't run the test app; however, 15 looks like your connection pool at its default size of 5 connections + 10 overflow, with all connections checked out and none returned.

Hm, is the issue with running the app possibly something I could help with?


Sure, if you can turn it into a single-file, runnable MCVE with zero dependencies other than SQLAlchemy, a single MSSQL Python driver (please note that MS's ODBC driver, while necessary, is not a Python driver by itself), and in this case eventlet, I can run that. However, I think you should be able to reproduce your issue without SQLAlchemy at all, using pyodbc directly, assuming that's the driver you are using.





Which should be a stripped down version of the flask-app I'd posted before.

You are correct in that the problem persists when using only pyodbc, and as a result I've gone ahead and created an issue with that project in order to try and get at the source of this problem: https://github.com/mkleehammer/pyodbc/issues/694
 

Apparently it *does* (or should) support async, as it is mentioned several times in the RELEASE_NOTES shipped with the driver. I'm not sure whether it does so implicitly or explicitly, though.

Unfortunately, things are not that simple. PostgreSQL, for example, supports a non-blocking API. However, you can't just use psycopg2 out of the box and expect it to work; psycopg2 offers an explicit API for this that has to be adapted, which you can see here: http://initd.org/psycopg/docs/advanced.html#green-support   In order for that API to work with eventlet, you need to use a special eventlet adaptation from here: https://pypi.org/project/psycogreen/


I believe psycogreen (or its intended behavior) has already been integrated into eventlet: https://github.com/eventlet/eventlet/blob/master/eventlet/support/psycopg2_patcher.py

which is probably the reason postgresql has been implicitly working as expected this whole time.

I agree that some additional hoops may need to be jumped through for MSSQL to work as expected, but this hang on >= 15 busy connections is strange. Hopefully it is something that can be band-aided in pyodbc until some kind of genuine async interface can be added to the project...