You can expect to encounter all the same problems with greenlets as
you would with threads, such as deadlocks and race conditions.
This is just a natural consequence of context switches, even
cooperative ones. In my gevent-based modules, I code my greenlets as
I would code threads, but use gevent primitives for synchronization
(e.g., a gevent semaphore instead of a POSIX mutex). That way, when I
or others later change the code so that it calls functions that may
trigger a context switch in places where none happened before, the
code still works properly.
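A minimal sketch of why cooperative switches still need locking. It does not assume gevent is installed, so it uses asyncio, another cooperative scheduler, as a stand-in. Here `await asyncio.sleep(0)` plays the role of a monkey-patched socket call that lets other greenlets run, and `asyncio.Lock` plays the role of a gevent primitive such as `gevent.lock.Semaphore`. The names `add_one` and `run` are hypothetical.

```python
import asyncio

# asyncio stands in for gevent: both switch tasks cooperatively, and only
# where the running task yields (an await here; a patched blocking call
# under gevent).

async def add_one(state, lock=None):
    """Read-modify-write on shared state with a yield in the middle."""
    async def critical_section():
        value = state["counter"]       # read shared state
        await asyncio.sleep(0)         # yield point, like a patched socket call
        state["counter"] = value + 1   # write back a possibly stale value

    if lock is None:
        await critical_section()
    else:
        async with lock:               # cooperative lock, not an OS mutex
            await critical_section()

async def run(n_tasks, use_lock):
    state = {"counter": 0}
    lock = asyncio.Lock() if use_lock else None
    await asyncio.gather(*(add_one(state, lock) for _ in range(n_tasks)))
    return state["counter"]

print("without lock:", asyncio.run(run(100, use_lock=False)))
print("with lock:   ", asyncio.run(run(100, use_lock=True)))
```

Without the lock, every task reads the counter before any task writes it back, so most increments are lost. With the lock, the result is 100. With gevent, the same pattern would wrap the critical section in `with sem:` on a `gevent.lock.Semaphore`.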
In the failure case that I described, I used the open-source
DBUtils.PooledDB to provide and manage a MySQL database connection
pool, so that I wouldn't have to implement the same thing myself. It's
a pretty nifty utility: it can automatically reconnect broken
connections and ping connections, it integrates well with SQL
transaction support, etc. I am told that it works pretty well in truly
multi-threaded apps.
I spent some time trying to debug the problem, but not enough to nail
it down. It did, however, make me realize that monkey.patch_all(),
monkey.patch_socket(), etc. can cause undesirable side effects or
failures in unsuspecting third-party packages that the same app also
needs to use. My theory is that somewhere in DBUtils.PooledDB or
pymysql (my intuition leans towards PooledDB) there is code that
doesn't expect a context switch to happen within a single thread. Such
code updates shared data structures and performs blocking operations
(which may cause a gevent context switch once monkey-patched) in an
order that can corrupt those data structures when greenlets run
concurrently.
I tried a couple of work-arounds in my investigation:
1. Create a new DB connection for each SQL transaction (not using
   PooledDB). This made the failure go away and preserved concurrency,
   but had an undesirable impact on performance.
2. Don't use gevent monkey-patching of the socket module. This made the
   sporadic failure go away, but also disabled concurrency during SQL
   execution.
I settled on work-around #2, as that was the practical solution for my
app at the time.
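To make the theory about ordering concrete, here is a hypothetical toy pool. It is NOT DBUtils.PooledDB or pymysql code and does not diagnose them; it only shows the kind of ordering described above. One checkout method picks a connection, performs a blocking "ping", and only then updates the shared lists. The other claims the connection before any yield point. As in a plain threaded-code mindset, nothing here expects a switch during the ping. asyncio again stands in for gevent, and `ToyPool`, `_ping`, and `two_checkouts` are illustrative names.

```python
import asyncio

class ToyPool:
    """Hypothetical connection pool for illustration; NOT DBUtils.PooledDB."""

    def __init__(self, conns):
        self.idle = list(conns)   # shared structure: connections available
        self.in_use = []          # shared structure: connections handed out

    async def _ping(self, conn):
        # Stand-in for a network round trip. Under gevent monkey-patching,
        # a real socket call here is where another greenlet could run.
        await asyncio.sleep(0)

    async def checkout_fragile(self):
        conn = self.idle[-1]      # 1. choose a connection; state not updated yet
        await self._ping(conn)    # 2. blocking operation -> possible switch
        self.idle.remove(conn)    # 3. update shared state, too late
        self.in_use.append(conn)
        return conn

    async def checkout_safe(self):
        conn = self.idle.pop()    # claim the connection before any yield point...
        self.in_use.append(conn)
        await self._ping(conn)    # ...then do the blocking work
        return conn

async def two_checkouts(safe):
    pool = ToyPool(["conn-a", "conn-b"])
    checkout = pool.checkout_safe if safe else pool.checkout_fragile
    # Run two checkouts concurrently; collect exceptions instead of raising.
    return await asyncio.gather(checkout(), checkout(), return_exceptions=True)

print(asyncio.run(two_checkouts(safe=False)))
print(asyncio.run(two_checkouts(safe=True)))
```

In the fragile version both checkouts choose the same connection, and the second one fails with a `ValueError` when it tries to remove a connection that is already gone. The safe version hands out two distinct connections.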