Concurrent database queries with gevent

Andy

Apr 11, 2010, 6:49:26 AM
to gevent: coroutine-based Python network library
Hi,

I'm interested in using gevent & gevent-mysql to run database queries
concurrently.

I'm using Django and I understand that I need to monkey patch it. But
after that how do I run queries concurrently?

I've read the examples for spawn() and joinall() but am still unsure
how to use them to run the QuerySets and retrieve the results.

Say I have these 2 QuerySets I want to run:

q1 = BlogEntry.objects.filter(name="foo")
q2 = Message.objects.filter(date__gte=datetime(2010, 1, 1))

How do I run them concurrently?

Also this is somewhat outside the scope of gevent, but I'm guessing
some of you must have encountered this before: how do I set up Django
with gevent-mysql or mysql connector instead of the blocking MySQLdb
driver that's officially supported by Django?

Thanks!

mthurlin

Apr 13, 2010, 5:08:27 PM
to gevent: coroutine-based Python network library
> q1 = BlogEntry.objects.filter(name="foo")
> q2 = Message.objects.filter(date__gte=datetime(2010, 1, 1))
>
> How do I run them concurrently?

Spawn two tasks, join on them, and then fetch their results. You could
let one run in the current greenlet, but that doesn't look as pretty :)

# list() forces the lazy QuerySet to evaluate, so the query
# actually runs inside the greenlet
t1 = gevent.spawn(lambda: list(BlogEntry.objects.filter(name="foo")))
t2 = gevent.spawn(lambda: list(Message.objects.filter(date__gte=datetime(2010, 1, 1))))

t1.join(); t2.join()

q1, q2 = t1.value, t2.value

You need to make sure the two queries are performed on separate
connections, but that is Django-specific and I have never used
Django myself.
You might also want to have some error handling, see the docs:
http://www.gevent.org/gevent.html.
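In outline (a sketch with plain functions standing in for the Django
QuerySets above, since there is no database here), the same pattern with
gevent.joinall and error propagation via get() might look like:

```python
import gevent

def query_blog_entries():
    # stand-in for BlogEntry.objects.filter(name="foo")
    return ["entry-1", "entry-2"]

def query_messages():
    # stand-in for Message.objects.filter(date__gte=...)
    return ["msg-1"]

t1 = gevent.spawn(query_blog_entries)
t2 = gevent.spawn(query_messages)
gevent.joinall([t1, t2])    # wait for both greenlets to finish

# .get() returns the result, or re-raises any exception the
# greenlet died with -- unlike .value, which silently hides it
q1, q2 = t1.get(), t2.get()
```

Using get() instead of value is the simplest form of the error handling
mentioned above.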

Andy

Apr 14, 2010, 7:36:10 PM
to gevent: coroutine-based Python network library
Thank you very much.

I don't think Django shares a single database connection across
multiple queries. So that should be fine.

Appreciate your help.

Denis Bilenko

Apr 14, 2010, 11:27:12 PM
to gev...@googlegroups.com
Last time I checked, Django uses a thread-local variable to store its
connections. If you call monkey.patch_all(), that becomes
greenlet-local, so it seems you should be fine. (I would explicitly
check that the queries really do use different connections.)

I wonder whether the overhead of creating a number of short-lived
connections is worth it, though.
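A quick way to see the greenlet-local effect (a sketch: fake_connect is
a hypothetical stand-in for Django opening its per-"thread" connection,
not real Django code). After monkey.patch_all(), threading.local is
greenlet-local, so each greenlet should get its own object:

```python
from gevent import monkey
monkey.patch_all()  # must run before other imports that use threading

import threading
import gevent

local = threading.local()
conns = []

def fake_connect():
    # hypothetical stand-in for Django's per-"thread" connection setup
    if not hasattr(local, "conn"):
        local.conn = object()
    conns.append(local.conn)

gevent.joinall([gevent.spawn(fake_connect) for _ in range(3)])
# three greenlets -> three distinct "connections"
```

With real Django, the analogous check would be comparing the identity of
the connection object seen inside each greenlet.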

Andy

Apr 15, 2010, 7:32:06 AM
to gevent: coroutine-based Python network library
Thanks for the tips.

Can you tell me how to check that different connections are being used
in the queries?

You made a great point about the overhead of creating the connections.
Didn't think of that at all.

How would you recommend dealing with that? Is there a way to have a
pool of connections from which the greenlets can just grab? So instead
of requiring each greenlet to create a new connection, it'd just need
to grab an existing connection from the pool - would that work?

And what would the flow of control be if I don't spawn new greenlets
to run the queries? The database call will return immediately since it
is async, right? So what would the Python thread do next after the
database call returns but there is no result yet?

Is there a recommended way to deal with database queries in gevent? At
first I thought it'd be great to run those queries in parallel but now
that you pointed out the overhead of creating database connections I'm
not so sure anymore.

By the way, after a greenlet is spawned, when will it be destroyed?
When join()/joinall() is called? Or do they not get destroyed at all?

Thanks.


Markus Thurlin

Apr 15, 2010, 7:53:24 AM
to gev...@googlegroups.com

> You made a great point about the overhead of creating the connections.
> Didn't think of that at all.
>
> How would you recommend to deal with that? Is there a way to have a
> pool of connections from which the greenlets can just grab? So instead
> of requiring each greenlet to create a new connection, it'd just need
> to grab an existing connection from the pool - would that work?
>
Yes, a pool is the way to go.

I guess this would be the simplest pool possible:

pool = gevent.queue.Queue()
for i in range(10):
    pool.put_nowait(db.connect())

Then in your greenlets:

conn = pool.get()  # will block until a free connection is available
# ... do your stuff with the connection ...
pool.put_nowait(conn)  # put the connection back so it can be reused elsewhere
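As a self-contained sketch of that pattern (connect() here is a
hypothetical stand-in for a real db.connect(), and the try/finally
guarantees the connection goes back to the pool even if the query
raises):

```python
import gevent
from gevent.queue import Queue

def connect():
    # hypothetical stand-in for db.connect()
    return object()

pool = Queue()
for _ in range(3):          # three connections shared by all greenlets
    pool.put_nowait(connect())

results = []

def run_query(n):
    conn = pool.get()       # blocks until a connection is free
    try:
        results.append(n)   # do your work with conn here
    finally:
        pool.put_nowait(conn)  # always return it, even on error

# five greenlets contend for three connections
gevent.joinall([gevent.spawn(run_query, i) for i in range(5)])
```

Because the queue blocks get() when empty, the pool size naturally caps
how many queries run at once.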


> And what would the flow of control be if I don't spawn new greenlets
> to run the queries? The database call will return immediately since it
> is async, right? So what would the python thread do next after the
> database call returns but there is no result yet?
>

No, the whole point of gevent is to provide a blocking API for each
greenlet. So, the query will block (but only the current greenlet) until
the result is available.
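The effect is easy to see with gevent.sleep, which "blocks" in the same
cooperative sense a monkey-patched database call does: while one
greenlet waits, the others keep running.

```python
import gevent

order = []

def task(name, delay):
    gevent.sleep(delay)   # blocks only this greenlet, not the thread
    order.append(name)

gevent.joinall([gevent.spawn(task, "slow", 0.05),
                gevent.spawn(task, "fast", 0.01)])
# "fast" finishes while "slow" is still waiting
```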

> Is there a recommended way to deal with database queries in gevent? At
> first I thought it'd be great to run those queries in parallel but now
> that you pointed out the overhead of creating database connections I'm
> not so sure anymore.
>

This is not really related to gevent. It all boils down to how you get
Django to use a pool of connections and whether it supports parallel queries.


> By the way, after a greenlet is spawned, when will it be destroyed?
> When join()/joinall() is called? Or they don't get destroyed at all?
>

It will exit like any regular function. "g.join()" is just a way to
block the current greenlet until the greenlet "g" is done.
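That lifecycle can be seen directly on the Greenlet object, for
example:

```python
import gevent

g = gevent.spawn(lambda: "done")
g.join()                  # block until g finishes; g then simply exits
assert g.dead             # the greenlet has run to completion
assert g.value == "done"  # its return value stays available afterwards
```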


/Markus

Andy

Apr 15, 2010, 9:27:55 AM
to gevent: coroutine-based Python network library

On Apr 15, 7:53 am, Markus Thurlin <mar...@esn.me> wrote:

>
> Yes, a pool is the way to go.
>
> I guess this would be the simplest pool possible:
>
> pool = gevent.queue.Queue()
> for i in range(10):
>     pool.put_nowait(db.connect())
>
> Then in your greenlets:
>
> conn = pool.get() # will block until there is a free connection available
> # Do your stuff with the connection
> pool.put_nowait(conn) # put the connection back so it can be reused elsewhere
>

Thanks for the pool example and the explanations.

The problem with Django, though, is that all the database connection
handling is managed by Django itself, not by the application code.

Several people have added connection pooling to Django by taking
SQLAlchemy's pooling module and wiring it into
http://code.djangoproject.com/browser/django/trunk/django/db/backends/mysql/base.py
--
http://jasonrubenstein.blogspot.com/2008/02/quick-and-dirty-database-pooling-in.html
http://node.to/wordpress/2010/02/11/database-connection-pool-solution-for-django-mysql/

But all those are for synchronous operations.

How do I make your database pool work with Django's internal
connection handling and make the queries asynchronous?

I'm definitely out of my depth here. Can anyone help?
