MySQL connection pooling - preferred method??


Cal Leeming [Simplicity Media Ltd]

Jan 24, 2012, 7:11:20 PM
to django-d...@googlegroups.com, Harry Roberts
Hi all,

After spending about 30 minutes looking through old tickets, long discussion threads and various blogs, I'm still not clear on the MySQL connection pooling topic.

To quote Russ: "the capability already exists in third party tools, and they're in a position to do a much better job at it than us because it's their sole focus" [3]

Could a core dev (or anyone else with experience on this) clarify which approach is recommended, given the following requirements:

* Safety (should not cause any strangeness with query cache or ORM)
* Performance (should avoid causing Django to open a new database connection on every request)

I found various ways to accomplish this, one of which was to use SQLAlchemy [1]; another was to stop Django from closing the database connection after each query [2].

I'm hoping this thread will also serve as a final answer for anyone else looking for clarification.

Many thanks

Cal



Cal Leeming [Simplicity Media Ltd]

Jan 25, 2012, 4:01:22 PM
to django-d...@googlegroups.com, Harry Roberts
Damn - no thoughts on this from anyone?

Russell Keith-Magee

Jan 27, 2012, 9:15:43 PM
to django-d...@googlegroups.com, Harry Roberts
Hi Cal,

I'm not exactly sure what it is you're looking for.

The position of the core team has been fairly clear -- there are third
party connection pooling tools that handle connection pooling very
well.

The recommendation of the core team is that you should use these tools.

The alternative is to try to engineer a solution into Django's DB
connection stack. This solution would inevitably be less stable than
one that originates from a project whose sole purpose is implementing
a connection pool.

If you're looking for a recommendation for a connection pooler for
MySQL, that's another matter. Unfortunately, I can't be much help
here; I don't keep on top of developments in the MySQL world, so I
can't comment with any authority.

Yours,
Russ Magee %-)


Cal Leeming [Simplicity Media Ltd]

Jan 28, 2012, 12:52:34 PM
to django-d...@googlegroups.com, Harry Roberts
Hi Russ,

Thanks very much for the reply. I guess ultimately my question was "do any of the connection pooling solutions have an impact on the stability/performance of Django, or more importantly, cause any problems with the ORM?"

I had very little luck finding write-ups by anyone on this, and it seems to be a question often asked.

So I'm going to put time aside to test them all myself, then do a full write up about it - I'll reply back to the thread once done.

Cheers

Cal

Cal Leeming [Simplicity Media Ltd]

Feb 17, 2012, 4:11:57 PM
to django-d...@googlegroups.com
Hi all,

We actually put a patch into production about 2 weeks ago, which seems to have reduced the connection count, whilst being stable and not causing any inconsistency problems.

# Apparently this will stop many connections to MySQL
from django.core import signals
from django.db import close_connection
signals.request_finished.disconnect(close_connection)

Although it's not connection pooling, it does address the original problem of opening lots of connections to the db.

I'd love to see this as a settings.py option. Does anyone else think this would be a good idea? Something like 'persistent': True, maybe?
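
To illustrate the idea: a minimal runnable sketch of such a flag, using a toy stand-in for Django's signal machinery so it runs without a Django install. The PERSISTENT_DB_CONNECTION name is hypothetical, not an existing Django setting:

```python
class Signal:
    """Toy subset of django.dispatch.Signal, for illustration only."""
    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def disconnect(self, receiver):
        self.receivers.remove(receiver)

    def send(self):
        for receiver in self.receivers:
            receiver()

request_finished = Signal()

def close_connection():
    # In Django this closes the per-request DB connection.
    print("closing db connection")

request_finished.connect(close_connection)

# Hypothetical settings flag controlling persistent connections:
PERSISTENT_DB_CONNECTION = True

if PERSISTENT_DB_CONNECTION:
    # Same effect as the patch above: stop closing the connection
    # at the end of every request.
    request_finished.disconnect(close_connection)

request_finished.send()  # connection is no longer closed per request
```

The point of routing it through a setting is that the behaviour becomes an explicit, per-project decision rather than a monkey-patch buried in app code.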

Cal



Florian Apolloner

Feb 17, 2012, 4:58:27 PM
to django-d...@googlegroups.com


On Friday, February 17, 2012 10:11:57 PM UTC+1, Cal Leeming [Simplicity Media Ltd] wrote:
> # Apparently this will stop many connections to MySQL
> from django.core import signals
> from django.db import close_connection
> signals.request_finished.disconnect(close_connection)

This approach has quite a few issues of its own, e.g. for postgres: if the transaction is broken, all following requests will raise a 500. You have to at least reset the connection state to something usable again.

> I'd love to see this as a 'settings.py' option, does anyone else think this would be a good idea?? Something like 'persistent' : True.. maybe?

-1, we already have enough of them ;)

Cheers,
Florian

Cal Leeming [Simplicity Media Ltd]

Feb 17, 2012, 5:08:40 PM
to django-d...@googlegroups.com
On Fri, Feb 17, 2012 at 9:58 PM, Florian Apolloner <f.apo...@gmail.com> wrote:


> On Friday, February 17, 2012 10:11:57 PM UTC+1, Cal Leeming [Simplicity Media Ltd] wrote:
>> # Apparently this will stop many connections to MySQL
>> from django.core import signals
>> from django.db import close_connection
>> signals.request_finished.disconnect(close_connection)

> This approach has quite a few issues of its own, e.g. for postgres: if the transaction is broken, all following requests will raise a 500. You have to at least reset the connection state to something usable again.

Could you elaborate on this a bit more? And would this affect MySQL?
 

> I'd love to see this as a 'settings.py' option, does anyone else think this would be a good idea?? Something like 'persistent' : True.. maybe?

-1, we already have enough of them ;)

Hmm - what about a documentation update, so at least people in the future don't have to go trawling through tons of mailing lists to find this?
 

> Cheers,
> Florian


Javier Guerra Giraldez

Feb 17, 2012, 5:56:46 PM
to django-d...@googlegroups.com
On Fri, Feb 17, 2012 at 4:58 PM, Florian Apolloner
<f.apo...@gmail.com> wrote:
> This approach has quite a few issues on it's own, eg for postgres if the
> transaction is broken all following requests will raise a 500. You have to
> at least reset the connection state to something useable again.

what about not closing the connections, but aborting any pending transaction?
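A minimal sketch of this suggestion, with a toy stand-in for a real DB-API connection (FakeConnection and its flags are illustrative, not Django or MySQLdb API): at the end of each request, instead of closing the connection, roll back whatever transaction was left pending so the next request starts clean.

```python
class FakeConnection:
    """Toy stand-in for a DB-API connection."""
    def __init__(self):
        self.open = True
        self.in_transaction = False

    def execute(self, sql):
        # In a real driver, the first statement implicitly opens
        # a transaction.
        self.in_transaction = True

    def rollback(self):
        self.in_transaction = False

    def close(self):
        self.open = False

connection = FakeConnection()

def reset_connection(**kwargs):
    # Candidate request_finished handler: abort any pending
    # transaction but keep the connection itself alive.
    if connection.in_transaction:
        connection.rollback()

# Simulate a request that dies mid-transaction:
connection.execute("UPDATE t SET x = 1")
reset_connection()

assert connection.open              # reusable by the next request
assert not connection.in_transaction  # no broken transaction left over
```

This avoids the per-request reconnect cost while still addressing Florian's broken-transaction concern, at least for the common case.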

--
Javier

Florian Apolloner

Feb 17, 2012, 6:00:11 PM
to django-d...@googlegroups.com
Hi,


On Friday, February 17, 2012 11:08:40 PM UTC+1, Cal Leeming [Simplicity Media Ltd] wrote:
> Could you elaborate on this a bit more? And would this affect MySQL?

Well, there isn't much more to it than an "ABORT; DISCARD ALL" at the end of the session, to discard changes and clean up (abort) any open (broken) transaction. This way you will have a consistent connection state at the beginning of every request. Regarding MySQL: I try to avoid it, so I can't really comment on that.

> Hmm - what about a documentation update, so at least people in the future don't have to go trawling through tons of mailing lists to find this.

There is never anything wrong with more documentation, but the current solution isn't useful for other databases, I guess. Aside from that, what about databases other than 'default', etc.?

Cheers,
Florian

Florian Apolloner

Feb 17, 2012, 6:01:30 PM
to django-d...@googlegroups.com
Yes, ABORT + DISCARD should do it for postgres (or ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT if pg < 8.2)

Cheers,
Florian

Anssi Kääriäinen

Feb 18, 2012, 5:06:54 AM
to Django developers
On Feb 18, 1:01 am, Florian Apolloner <f.apollo...@gmail.com> wrote:
> Yes, ABORT + DISCARD should do it for postgres (or ABORT; RESET ALL; SET
> SESSION AUTHORIZATION DEFAULT if pg < 8.2)

Inspired by this thread, I did some work for 3rd party database
connection pooling. What I have is at https://github.com/akaariai/django_pooled.
Quick summary: seems to work, except for Django's tests.

Now, there is a problem regarding connection state initialization.
Django doesn't separate creating a connection from initializing its
state; all of it is done in ._cursor(). If the above were to work
reliably, the implementation of ._cursor() should be:

    def _cursor(self):
        if not self.connection:
            self.connection = self.new_connection()
            self.initialize_connection()
        return CursorWrapper(self.connection.cursor())

Now a pooling connection wrapper could just override new_connection()
in a subclass and everything should work. The connection returned from
the pool would still get properly initialized. This change would also
make sense from the point of view of code clarity and consistency
between backends. So, I think doing this refactoring would be a good idea.

Note that the connection-state initialization problem doesn't really
matter in the normal usage. However, in Django's test suite, where the
connection initialization will do different things depending on
overridden settings (settings.USE_TZ for example) things will break.

The above-mentioned change is what I mean when I say that Django
should encourage extensibility: create nicely extensible
implementations. They need not be public API.

BTW, you should not run ABORT + DISCARD ALL as the connection reset
string in PostgreSQL from Python, for two reasons. First, abort ==
rollback, which means psycopg2 will lose track of the transaction
state. Second, DISCARD ALL will reset the connection state, and
because new connections and connection-state initialization are not
separated, the connection state will be incorrect from the second
checkout onwards. Just do connection.rollback(). ABORT + DISCARD ALL
is still the right thing to do in external poolers (pgpool2, pgbouncer
etc).

I think what I have should work for MySQL, too. I have tested it for
PostgreSQL and SQLite3, where things seem to work, except for the
above-mentioned state-init problem.

So, anybody interested in connection pooling should in my opinion work
for making Django's backends more extensible, and then creating a 3rd
party connection pooler. What I have might be a good starting point,
or at least it might give some pointers of what to do.

Note that connection pooling in Python purely for speed does not make
sense. You will get much better results from external pools, which can
view the application as a whole; an in-Django pool is limited to a
one-process-at-a-time view, which isn't good at all. However, there
are some other nice things you could do: reporting of the most
time-consuming/most-used queries; rewriting normal queries to prepared
statements/procedure calls; tracking where you have left transactions
open; sharing connections in auto-commit mode (this would actually
make a _lot_ of sense from a performance standpoint in read-only
views). I did some of those in another pooler experiment:
https://github.com/akaariai/django-psycopg-pooled

- Anssi