I wrote a proof of concept patch to add prepared statement support to
Django for the PostgreSQL backend. Note that it's just a hack to see
if this approach could work at all; I know it's badly written. :)
The patch is quite simple and so far has worked with all queries
generated by Django, for a few different applications. It added a
noticeable speed boost, though I haven't done any repeatable
benchmarks. The main advantage is skipping redundant planning stages
of similar queries -- especially for web pages which may involve
complex queries -- many JOINs etc -- but only fetch a page of 25 rows.
Here's the hack:
https://bitbucket.org/intgr/django-queue/src/308dee4377c6/prepared_initial.patch
Screenshot in action: http://ompldr.org/vODIzdQ/django_prepared.png
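
For readers who want the general idea without opening the patch, here is a
minimal sketch of a cursor wrapper that prepares each distinct SQL string
once per connection. It is not the code from the patch. PreparingCursor,
RecordingCursor, to_dollar_placeholders and the dj_stmt_N naming scheme are
all made up for illustration, and it assumes the driver uses %s placeholders
with %% as an escaped percent sign, as psycopg2 does.

```python
import itertools
import re


def to_dollar_placeholders(sql):
    """Rewrite DB-API '%s' placeholders as PostgreSQL '$1', '$2', ..."""
    counter = itertools.count(1)

    def repl(match):
        # '%%' is an escaped literal percent sign, not a placeholder.
        if match.group(0) == "%%":
            return "%"
        return "$%d" % next(counter)

    return re.sub(r"%%|%s", repl, sql)


class PreparingCursor:
    """Hypothetical wrapper: PREPARE once per SQL string, then EXECUTE."""

    def __init__(self, cursor, prepared):
        self.cursor = cursor
        # Mapping of SQL text -> statement name. It must be tied to the
        # connection, because prepared statements are per-session.
        self.prepared = prepared

    def execute(self, sql, params=()):
        name = self.prepared.get(sql)
        if name is None:
            name = "dj_stmt_%d" % len(self.prepared)
            body = to_dollar_placeholders(sql)
            # Sent without parameters, so no client-side interpolation.
            self.cursor.execute("PREPARE %s AS %s" % (name, body))
            self.prepared[sql] = name
        if params:
            marks = ", ".join(["%s"] * len(params))
            call = "EXECUTE %s (%s)" % (name, marks)
        else:
            call = "EXECUTE %s" % name
        return self.cursor.execute(call, params)


class RecordingCursor:
    """Stand-in for a real cursor so the sketch runs without a database."""

    def __init__(self):
        self.log = []

    def execute(self, sql, params=()):
        self.log.append((sql, tuple(params)))


prepared = {}  # in practice this would live on the persistent connection
raw = RecordingCursor()
cursor = PreparingCursor(raw, prepared)
query = "SELECT id, title FROM blog_post WHERE author_id = %s LIMIT %s"
cursor.execute(query, (7, 25))
cursor.execute(query, (8, 25))  # reuses the statement prepared above
```

The second call sends only an EXECUTE, which is where the planning work is
saved. A real implementation would also have to handle parameter types that
PostgreSQL cannot infer, and would need to forget the mapping whenever the
connection is closed or reset.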
Now I'm wondering how to approach a solution that would be mergeable
into Django core.
I get the impression that Django core developers have been opposed to
built-in connection pooling. However, prepared statements are mostly
useless without persistent connections. Is there any chance that
prepared statements would be accepted into core or is this a show
stopper? I'm willing to study Django's internals and take on a fair
bit of work to implement this feature.
What are the steps to get there?
This is what I currently think needs to be done:
1. Implement some sort of persistent database connection
2. Add prepared statement support for PostgreSQL, default to on for
persistent connections?
3. Exclude queries with LIKE, extra(), etc. that might not work well
with prepared statements
4. Fix up CursorDebugWrapper to make prepared statements more
transparent for the developer
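
As an illustration of step 3 only, the exclusion could start out as a
conservative check on the generated SQL. This is a sketch under assumptions:
should_prepare and its uses_extra flag are hypothetical names, and the
pattern list is a guess at what "might not work well", not a tested rule.

```python
import re

# Hypothetical deny-list: pattern-matching operators whose best plan can
# depend heavily on the actual parameter value.
_SKIP_PATTERN = re.compile(r"\b(?:I?LIKE|SIMILAR\s+TO)\b", re.IGNORECASE)


def should_prepare(sql, uses_extra=False):
    """Return True if this query looks safe to run as a prepared statement.

    uses_extra is an assumed flag that the caller would set when the
    QuerySet was built with extra().
    """
    if uses_extra:
        return False
    return _SKIP_PATTERN.search(sql) is None
```

The check errs on the side of not preparing: a string literal that happens
to contain the word LIKE is also skipped, which costs only the lost
optimization.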
Some more ideas:
* QuerySet method to force enable/disable preparing
* QuerySet or Q attributes to force certain literals to be constants
* Skip SQL building for prepared statements, by caching statements
based on sql.Query attributes
* Similar prepared statement support provided by MySQL
Regards,
Marti
Oh, I forgot to mention, I'm using the patch together with the
persistent connections snippet from here:
http://djangosnippets.org/snippets/1707/
Regards,
Marti
To clarify -- the reason we've historically been opposed to adding
connection pooling to Django is the same reason that we don't include a web
server in Django -- the capability already exists in third party
tools, and they're in a position to do a much better job at it than us
because it's their sole focus. Django doesn't have to be the whole
stack.
In principle, I am (and I suspect the same is true of most of the
core team) open to any suggestion that exposes a performance feature
of the underlying data store. However, absent details, it's
difficult to say whether a proposal would gain traction.
If it's a feature that is only of benefit in pooled environments, the
barrier to entry will be higher. However, it might be enough to
stimulate some discussion into how to improve Django's support for
connection pooled environments -- especially if you can demonstrate
some real-world performance benefits.
Acceptance will also depend on the invasiveness of the change you're
proposing -- if you need to gut SQL compilers to make it work in an
elegant way, the level of enthusiasm probably won't be high.
The user-facing API will also matter -- how do you propose to wrap the
underlying feature in a mechanism that makes sense in ORM terms?
It's also worth pointing out that proposals to add APIs are generally
looked upon more favorably than specific feature additions --
especially for 'edge case' improvements. Adding a feature means the
core team inherits a feature we have to look after; adding an API
lets others implement and maintain the features, and we just have to
keep the API consistent. I don't know if this approach will be viable
in your case, but it's worth considering.
So, the answer is a definite "maybe" :-) If you can provide some more
details, we might be able to provide a more definitive answer.
Yours,
Russ Magee %-)