Proposal for prepared statements API

258 views
Skip to first unread message

Curtis Maloney

unread,
Mar 25, 2014, 8:53:42 AM3/25/14
to django-d...@googlegroups.com
I've been discussing this idea for some time now, and was reminded of it recently... and akaariai has pushed me to put forward this proposal.

Prepared Statements.

The benefit of prepared statements, for those who don't know, is it avoids repeating the time the Query Planner in the DBMS takes to figure out how it will execute the query.  This can sometimes be quite a significant portion of the query execution time.

API:

My idea includes two new classes, and one new queryset method.

One class is a handle for a prepared statement, which is a callable accepting all the bound parameters of the prepared statement.

The other is used for denoting bound parameters in QuerySet construction - so far we've been calling this Param.

The new QuerySet method would be something like "prepare", which returns a prepared statement class.

So, a sample of constructing a prepared statement would be:

ps = MyModel.objects.filter(foo__lt=Param('a').prepare()

The result is now a callable that accepts one parameter - "a".  To invoke the query:

results = ps(a=1000)


Clearly it's early days yet - I've written no code.  And akaariai has pointed out already there's some corners cases which won't work well with existing behaviours (e.g. foo=None being silently translated to foo__isnull=True), but it's best to get this idea under wider public scrutiny earlier, rather than later.

--
C

Anssi Kääriäinen

unread,
Mar 25, 2014, 10:22:59 AM3/25/14
to django-d...@googlegroups.com
On Tuesday, March 25, 2014 2:53:42 PM UTC+2, Curtis Maloney wrote:
ps = MyModel.objects.filter(foo__lt=Param('a').prepare()

The result is now a callable that accepts one parameter - "a".  To invoke the query:

results = ps(a=1000)


Clearly it's early days yet - I've written no code.  And akaariai has pointed out already there's some corners cases which won't work well with existing behaviours (e.g. foo=None being silently translated to foo__isnull=True), but it's best to get this idea under wider public scrutiny earlier, rather than later.

I like this style of prepared statements. It is explicit, and implementation should be achievable without too much added code complexity. I prefer ps.execute(a=1000) personally, but the exact syntax isn't that important at this stage.

There will be a couple of corner cases that will be hard to solve. The problems are around value preparation during .filter() calls and how certain special values are dealt with. Quickly thinking the value preparation (basically get_prep_lookup() call) shouldn't be that much of a problem - it is currently done during .filter() calls, but it should be possible to defer it to execution time.

The foo=None case is something that likely can't be solved. The problem here is that foo=None translates to WHERE foo IS NULL, while foo=1 translates to WHERE foo = 1. These are syntactically different queries, and thus single prepared statement can't handle both of these. There are also cases where isnull=True/False require different join promotion depending if True or False is supplied. These again can't be handled.

I am OK for just documenting these corner cases. They aren't something that should happen too often. The implementation for prepared statements is relatively straightforward (generate SQL, prepare it once, execute using given values), but if the corner case needs to be handled the implementation will be more complex, likely so much more complex that nobody will implement this feature.

In short: +1 for implementing this with documentation of the corner cases.

 - Anssi

Michael Manfre

unread,
Mar 25, 2014, 11:17:13 AM3/25/14
to django-d...@googlegroups.com
Is falling back to a direct queries being considered? Not all backends support prepared statements or recommend using them. The native mssql drivers support prepared statements, but the other drivers django-mssql supports do not. Also, "In SQL Server, the prepare/execute model has no significant performance advantage over direct execution, because of the way SQL Server reuses execution plans." [1]

Given Microsoft's recommendation against using prepared statements, my motivation is geared toward having this be an truly optional feature that will not prevent using 3rd party apps with django-mssql. I do like the proposal with an explicit .execute() and think documenting corner cases is the sane way to proceed. 

Regards,
Michael Manfre




--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/3123fda6-d7b3-46d3-82ec-28ed3003e837%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jeremy Dunck

unread,
Mar 25, 2014, 12:58:55 PM3/25/14
to django-d...@googlegroups.com
On the None -> IS NULL issue, I presume there are, for any given use case, not that many argument permutations of None and not None passed.  I suggest that the PreparedStatement abstraction map to multiple actual prepared statements, one for each None/not None permutation.  Then when executing, you know the value, and you know to use the (None, not None) statement if given (a=None, b=1) for example.

(This sort of mapping of object to different statements may be needed for fallback on backends that don't support prepareds, as well.)

As for deferring get_prep_lookup until execution, do we lose any pruning or other optimization opportunities for normal QuerySet usage if we go that path?





--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Shai Berger

unread,
Mar 25, 2014, 3:36:50 PM3/25/14
to django-d...@googlegroups.com
On IRC, @apollo13 asked some very good questions about the lifecycle of
prepared statements. I would like to elaborate.

Prepared statements usually live on the server, in the context of a session --
which, for Django, means they're only valid in the thread where they were
built; without persistent connections, this would mean "only valid during the
processing of one request", but even with them, connections are sometimes
dropped. So, prepared statement objects cannot really be long-lived and reused
across requests.

So, I suspect the API:

> ps = MyModel.objects.filter(foo__lt=Param('a')).prepare()

Has good chances to become a mini-footgun -- it all-but tells users "prepare
me once, use me forever", but you really can't do that. It will pass tests
(run in a single thread over a single connection), and crash in production --
unless a prepared statement can somehow be aware of the connection state, and
take some measures (I don't currently see which) against it.

The main benefit of prepared statements comes when you can prepare them once
then use them many times (with different parameters); but this is very atypical
for Django apps (within a single request). I think if we want this feature, we
should "brand" its API as something for special uses, and give the right hints
about its link to the database session -- so, something like

qset = MyModel.objects.filter(foo__lt=Param('a'))
ps = connection.prepare(qset)

or maybe even

ps = connection.cursor().prepare(qset)

and also for execution:

results = cursor.execute(ps, a=7)

My 2 cents,

Shai.

Curtis Maloney

unread,
Mar 25, 2014, 6:57:51 PM3/25/14
to django-d...@googlegroups.com
Firstly, I mostly proposed this API in response to others calls for it.  Yes, I'd love to have it, but I'm content to leave it in the "too hard" basket.

That said, it doesn't mean I'm not going to try to solve these issue :)

So:

Firstly -- can we assume anyone using this feature is not a complete novice, and so will take the caveats mentioned into consideration?

Yes, prepared statements are local to their connection/session.  And would be expected to "go away" should the connection drop.  However, in most cases connection drop-out is quite rare [at least, in my experience], and would be even more rare in the case of people using certain connection pooling tools.

Assuming it's not fatal to a transaction, would it be feasible for a prepared statement to work on the assumption is has been prepared, and if it hasn't, prepare itself and continue?  I'd prefer this to, for instance, having PS listen for connection closed signals.

Having to prepare the statement via connection would certainly drive home its binding to it, but I'm not sure many people think about connections as transient [happy to be proven wrong here :)]

Further, as an "expert" feature, would it be unreasonable to limit its use to cases where you know it will benefit, but also only be used in a single connection?  True, this limits its use cases somewhat, but it's still available when of benefit.

As to Jeremy's idea of multiple shapes of a single prepared query based on _potential_ arguments, I disagree.  Much safer [and easier to code] to detail in the documentation that "because we can't guess at values, you can't rely on these ORM shortcuts".

Michael:  Perhaps there are more docs you are referencing, but what you quote there says to me there's little benefit to preparing statements on MSSQL, not that they actively discourage it.  Certainly, I agree there should be the ability for PS to behave as a canned QuerySet in cases where the backend does not support them.  It's safe, easy, and low-surprise.

--
Curtis



--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Florian Apolloner

unread,
Mar 25, 2014, 7:10:20 PM3/25/14
to django-d...@googlegroups.com
On Tuesday, March 25, 2014 11:57:51 PM UTC+1, Curtis Maloney wrote:
Firstly -- can we assume anyone using this feature is not a complete novice, and so will take the caveats mentioned into consideration?

Yes
 
Yes, prepared statements are local to their connection/session.  And would be expected to "go away" should the connection drop.  However, in most cases connection drop-out is quite rare [at least, in my experience], and would be even more rare in the case of people using certain connection pooling tools.

Assume connection drops don't exist for now. How can/will Django know if a query is prepared already and when does it have to prepare it, or do you expect me to issue a PREPARE statement everywhere I use it and handle the error if it already exists?! I can easily see this working for management scripts, but not for web requests which are possible routed through pgpool etc… [That said, I have no real experience with those things, but I'd like to know how this can work]

Assuming it's not fatal to a transaction, would it be feasible for a prepared statement to work on the assumption is has been prepared, and if it hasn't, prepare itself and continue?  I'd prefer this to, for instance, having PS listen for connection closed signals.

See above, how can you reliable determine if this assumption holds up?

Further, as an "expert" feature, would it be unreasonable to limit its use to cases where you know it will benefit, but also only be used in a single connection?  True, this limits its use cases somewhat, but it's still available when of benefit.

What do you mean by "single connection", persistent connections are single connections and as you said above prepared statements are local to the connection/session anyways…

 
As to Jeremy's idea of multiple shapes of a single prepared query based on _potential_ arguments, I disagree.  Much safer [and easier to code] to detail in the documentation that "because we can't guess at values, you can't rely on these ORM shortcuts".

Ditto.

Florian

Curtis Maloney

unread,
Mar 26, 2014, 12:06:48 AM3/26/14
to django-d...@googlegroups.com
On 26 March 2014 10:10, Florian Apolloner <f.apo...@gmail.com> wrote:
On Tuesday, March 25, 2014 11:57:51 PM UTC+1, Curtis Maloney wrote:
Firstly -- can we assume anyone using this feature is not a complete novice, and so will take the caveats mentioned into consideration?

Yes
 
Yes, prepared statements are local to their connection/session.  And would be expected to "go away" should the connection drop.  However, in most cases connection drop-out is quite rare [at least, in my experience], and would be even more rare in the case of people using certain connection pooling tools.

Assume connection drops don't exist for now. How can/will Django know if a query is prepared already and when does it have to prepare it, or do you expect me to issue a PREPARE statement everywhere I use it and handle the error if it already exists?! I can easily see this working for management scripts, but not for web requests which are possible routed through pgpool etc… [That said, I have no real experience with those things, but I'd like to know how this can work]

I had been working from the perspective that as soon as you call prepare() on the QuerySet, it will prepare the statement. 

I can find some bug reports for pgpool to do with unnamed prepares, but [oddly] can't find docs about unnamed prepares.

Even if the first implementation is simply: Provide the API, document the caveats, it's at least a good first step upon which we can test how the various drivers support it, dbs interact, and error conditions explode.
 
Assuming it's not fatal to a transaction, would it be feasible for a prepared statement to work on the assumption is has been prepared, and if it hasn't, prepare itself and continue?  I'd prefer this to, for instance, having PS listen for connection closed signals.

See above, how can you reliable determine if this assumption holds up?

As they say in electricians school: suck it and see. :)
 

Further, as an "expert" feature, would it be unreasonable to limit its use to cases where you know it will benefit, but also only be used in a single connection?  True, this limits its use cases somewhat, but it's still available when of benefit.

What do you mean by "single connection", persistent connections are single connections and as you said above prepared statements are local to the connection/session anyways…

And I would expect the prepared statement to persist between requests in that case.

If I thought we could rely on DB dis/connect signals [maybe we can, I don't know yet] we could teach prepared statements to track that and re-prepare themselves  on first use.

Just throwing ideas out there... seeing which ones excite :)

--
C

Anssi Kääriäinen

unread,
Mar 26, 2014, 4:16:24 AM3/26/14
to django-d...@googlegroups.com
On 03/26/2014 06:06 AM, Curtis Maloney wrote:
 

Further, as an "expert" feature, would it be unreasonable to limit its use to cases where you know it will benefit, but also only be used in a single connection?  True, this limits its use cases somewhat, but it's still available when of benefit.

What do you mean by "single connection", persistent connections are single connections and as you said above prepared statements are local to the connection/session anyways…

And I would expect the prepared statement to persist between requests in that case.

If I thought we could rely on DB dis/connect signals [maybe we can, I don't know yet] we could teach prepared statements to track that and re-prepare themselves  on first use.

Just throwing ideas out there... seeing which ones excite :)
Django must know which connections have which statements prepared, and Django must also know the name for each prepared statement.

We can do this by having a map of SQL for the statement -> name of the prepared statement in the connection. On connection close the known statements are cleaned. An alternate way to do this is to have a connection -> prepared statement name map in the statement object itself. In that case connection close signals must be used to clean the map. This might be a cleaner design as connection object doesn't need to know about prepared statements. As a bonus we don't need to map SQL to statement names. The problem being that the same SQL doesn't necessarily mean we are executing the same plan (for example changes to search_path can cause problems here).

So, when you do ps = qs.prepare() you get a prepared statement object which knows the SQL needed for execution. When you do ps.execute() the statement object will check the used connection for existing prepared statements. If none exists, PREPARE is issued. If one exists, then the statement is reused.

I kind of like the ps = connection.prepare(qs) syntax. But this doesn't really solve anything, the connection variable is thread local, so doing ps = connection.prepare(qs) doesn't mean that the prepare is in effect in other threads. We still need to know which connections have which statements prepared. So, nothing is actually solved this way.

As for the foo=None/foo__isnull=True problems - I think we can do one step better than just documenting these cases, we can actually check if the given parameters lead to incorrect plan and raise an error on execute. I don't think we can prepare different plans for different statements - doing so can potentially mean preparing exponential amount of statements. We can't easily prepare different plans on execute either due to the way Django's QuerySet construction works.

 - Anssi

Aymeric Augustin

unread,
Mar 26, 2014, 10:04:31 AM3/26/14
to django-d...@googlegroups.com
2014-03-26 9:16 GMT+01:00 Anssi Kääriäinen <anssi.ka...@thl.fi>:
We can do this by having a map of SQL for the statement -> name of the prepared statement in the connection. On connection close the known statements are cleaned.

Yes, that seems correct.

Since prepared statements are connection-local, it makes sense to store them on the connection object.

--
Aymeric.

VernonCole

unread,
Mar 27, 2014, 4:11:14 AM3/27/14
to django-d...@googlegroups.com
There was a recent query about prepared statements on the db-sig mailing list, too. Apparently thought is being given to adding such functionality to Psycopg. If such functionality is added, it could be useful to support, I suppose. Some SQL engines apparently benefit from the techinque. MS SQL Server is not one of them, so I have not bothered to add support for them to adodbapi.  If I do so, it will use the same api as mxodbc uses now (a copy of the SQL statement is kept with the cursor). [note: my reading of Microsoft's recommendation is not "don't do that", it is "why bother?".]

Pep-0249 is silent on the subject of how to support prepared statements, so any existing systems are likely to do so differently.  In particular, there can be no expectation that there is any support whatsoever for the concept, so it will have to be emulated where not present (-- i.e. almost everywhere).

Curtis Maloney

unread,
Mar 27, 2014, 6:58:48 AM3/27/14
to django-d...@googlegroups.com
On 27 March 2014 19:11, VernonCole <verno...@gmail.com> wrote:
There was a recent query about prepared statements on the db-sig mailing list, too. Apparently thought is being given to adding such functionality to Psycopg. If such functionality is added, it could be useful to support, I suppose. Some SQL engines apparently benefit from the techinque. MS SQL Server is not one of them, so I have not bothered to add support for them to adodbapi.  If I do so, it will use the same api as mxodbc uses now (a copy of the SQL statement is kept with the cursor). [note: my reading of Microsoft's recommendation is not "don't do that", it is "why bother?".]

Pep-0249 is silent on the subject of how to support prepared statements, so any existing systems are likely to do so differently.  In particular, there can be no expectation that there is any support whatsoever for the concept, so it will have to be emulated where not present (-- i.e. almost everywhere).

Yeah... the only places PEP 249 mentions prepare is in execute and executemany... where I'm fairly sure almost no drivers do "prepare and execute".

As for lack of existing support, I suspect there's a degree of "if you build it, they will come" to this... worst case is to simply implement it as raw SQL.

--
C

Ramiro Morales

unread,
Apr 2, 2014, 9:27:46 AM4/2/14
to django-d...@googlegroups.com
On Thu, Mar 27, 2014 at 5:11 AM, VernonCole <verno...@gmail.com> wrote:
> There was a recent query about prepared statements on the db-sig mailing
> list, too. Apparently thought is being given to adding such functionality to
> Psycopg.

This thread on the psycopg mailing list (and the linked, older
discussion and article) of the status might be of interest regarding
the status of PS support in such DB-API driver:

http://www.postgresql.org/message-id/2014031418...@fetter.org

--
Ramiro Morales
@ramiromorales
Reply all
Reply to author
Forward
0 new messages