A general way to batch SQL queries in Django

1,567 views
Skip to first unread message

Ram Rachum

unread,
Feb 27, 2015, 7:14:05 AM2/27/15
to django-d...@googlegroups.com
Hi guys,

After asking this question on django-users:


And in several other forums, and not finding a solution, I've reached a conclusion: It would be really helpful to allow batching SQL queries in Django. I should preface by saying I'm not very good at SQL and I don't know how the ORM works exactly, so maybe what I'm saying is wrong in some way, if so please correct me.

I know Django already support bulk insertion, but what I want is to make multiple SQL reads, of various kinds, and have them populate multiple model instances. This is important for lowering the number of database roundtrips that Django does.

I gave one example for when this need arises in the above link; another example is how on every request, Django fetches the session, it then parses the session, looks for the user ID, and then fetches the user. This is a waste. If Django could do a combined query that fetches both the session and the user in SQL, this would be best. I'm not good at SQL, so I'm not sure whether it can be done. But that's just one example of where you can save a roundtrip to the database.

Am I right that Django doesn't currently let you do that? Do you think it's possible to make Django do that? 


Thanks,
Ram.

Josh Smeaton

unread,
Feb 27, 2015, 9:12:52 AM2/27/15
to django-d...@googlegroups.com
The concept of batched SELECT statements doesn't really exist in SQL, unless the relations you're selecting have identical column types. Example:

SELECT name, age_in_years FROM person
UNION ALL
SELECT item_name, quantity FROM item;

The UNION here means combine the results of each query into the one result set. A query like this probably isn't useful though, because you have no way of knowing which row belongs to which relation (or model). UNION queries are useful for specific kinds of queries, not as a general purpose batch method.

Can you see the issue with trying to batch queries that aren't exactly the same?

SELECT a, b FROM T1
UNION ALL
SELECT d, e, f from T2;

What would the result set look like if this was possible?

Your example of session and user could be made to use a single query if the session had a foreign key to the user table. It doesn't though, because you don't *need* users for sessions to work. This is why django has to parse the session data to determine whether or not it has to load a user object.

To answer your questions, no, django doesn't let you batch up multiple select statements. It can't, because that's not how SQL works. Connection pooling should help reduce the initialisation time of creating the connection, and multiple select statements can be sent over that single connection though, which is the closest you're going to get.

Regards,

Ram Rachum

unread,
Feb 27, 2015, 9:18:19 AM2/27/15
to django-developers

Thank you for the explanation. I still wonder though: you asked what the result set would look like, given it's two different tables. What I want is multiple result sets, but in only one roundtrip. Is it possible to send multiple SQL queries to the database in one roundtrip? We'd need the second query to use information from the first for the session usecase, but even if we can't use the information from the first query, it'll still be useful for other cases.

Is it possible to send multiple queries in a single roundtrip?

Sent from my phone.

--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/3Lk-HEF16iI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a888777b-de4c-4ef6-b351-afc620601f64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

aRkadeFR

unread,
Feb 27, 2015, 10:20:26 AM2/27/15
to django-d...@googlegroups.com
What do you mean by a single roundtrip?

You can do multiple queries in a single transaction (thus in a single connection to
the DB). By default (correct me if I'm wrong) with PG for example, it does
all the queries in a single transaction while being in a request.
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Tim Chase

unread,
Feb 27, 2015, 12:24:12 PM2/27/15
to django-d...@googlegroups.com
On 2015-02-27 06:12, Josh Smeaton wrote:
> The concept of batched SELECT statements doesn't really exist in
> SQL, unless the relations you're selecting have identical column
> types. Example:
>
> SELECT name, age_in_years FROM person
> UNION ALL
> SELECT item_name, quantity FROM item;
>
> The UNION here means combine the results of each query into the one
> result set. A query like this probably isn't useful though, because
> you have no way of knowing which row belongs to which relation (or
> model)

For the record, I've done this with

SELECT 'Person' AS description, name, age_in_years FROM person
UNION ALL
SELECT 'Item', item_name, quantity FROM item;

which allows me to distinguish them while gaining everything in one
query/result-set.

-tkc



Tom Evans

unread,
Feb 27, 2015, 2:47:05 PM2/27/15
to django-d...@googlegroups.com
On Fri, Feb 27, 2015 at 3:19 PM, aRkadeFR <con...@arkade.info> wrote:
> What do you mean by a single roundtrip?

He means asking the database server to consider multiple queries
(concurrently?), and return data once all of them are available. In
pseudo code:

people, jobs, cities = DB.fetch_multi(
Person.objects.all(),
Job.objects.all(),
City.objects.all())

MySQL's C API supports executing multiple SQL statements in a single
round trip, the data sets are made available in turn to the client API

http://dev.mysql.com/doc/refman/5.7/en/c-api-multiple-queries.html

however, each statement is executed in turn, not concurrently, so the
only speed up you would have is that there is only one query to parse,
and a few small packets less sent to the database server... The
database connection is not torn down between requests to the database.

The only way I could see it having any non-marginal effect is if you
have high latency to your database server. But then you have a lot of
problems.

Cheers

Tom

Michael Manfre

unread,
Feb 27, 2015, 5:41:54 PM2/27/15
to django-d...@googlegroups.com
Stored procedures, at least with MSSQL, provide another way of returning multiple result sets with a single SQL statement. The queries will be parsed and executed faster due to stored procedures being parsed and compiled when created, instead of when executed. A stored procedure will also allow you to get more complex results with a few simple queries, instead of a single complex query. 

Using stored procedures adds complexity to a Django project and should be used sparingly. From my personal experience, the extra server resources to have more instances running to service requests is usually better than having the extra complexity of using stored procedures in a Django project.

Regards,
Michael Manfre

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Ram Rachum

unread,
Feb 28, 2015, 4:06:23 PM2/28/15
to django-developers
Hi everyone,

Thank you for your answers.

Arkade: You mentioned transactions. I believe they're not relevant here because every read is still executed synchronously (i.e. Django process waits for database to respond.)

Tom: Thank you for understanding what I'm looking for :)

You said "The only way I could see it having any non-marginal effect is if you have high latency to your database server. But then you have a lot of problems." Well, I'm using Heroku, so a database query that takes 1ms on my local dev machine takes 3ms there, presumably because on my dev machine the database is running on the same machine as the Django server, where in Heroku the database is on a separate machine. I think that many other people are in the same scenario, so allowing to do multiple queries in the same roundtrip would be helpful to many people. 

Here is what I'm currently thinking about: Can we make Django do multiple separate queries (SQL queries are separated by a semicolon right? Sorry for being a noob) and send them one after another before blocking on reading any of them? Only then when everything is sent, start reading and populating querysets. Is this possible to do with Django?


Thanks for your help,
Ram Rachum.

--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/3Lk-HEF16iI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Shai Berger

unread,
Mar 1, 2015, 3:18:24 AM3/1/15
to django-d...@googlegroups.com

On Saturday 28 February 2015 23:05:56 Ram Rachum wrote:
>
> Here is what I'm currently thinking about: Can we make Django do multiple
> separate queries (SQL queries are separated by a semicolon right? Sorry for
> being a noob) and send them one after another before blocking on reading
> any of them? Only then when everything is sent, start reading and
> populating querysets. *Is this possible to do with Django?*
>
>
Not with Django as it is currently.

For every backend supported by Django, this would require different code.
Sometimes the semantics isn't even equivalent. And from a brief look, it isn't
very clear what kind of limitations this would place on the queries
themselves. For Oracle you can only do this with stored procedures; for PG
there appears to be away to avoid them, with unclear limitations [1]. Either
way, it would be a lot of work and a large maitainability burden, and the
gains don't seem to be all that large.

Sorry,
Shai.

[1] https://blog.dsl-platform.com/multiple-result-sets-alternatives-in-
postgres-3/

Aymeric Augustin

unread,
Mar 1, 2015, 10:53:41 AM3/1/15
to django-d...@googlegroups.com
2015-02-28 22:05 GMT+01:00 Ram Rachum <r...@rachum.com>:
Here is what I'm currently thinking about: Can we make Django do multiple separate queries (SQL queries are separated by a semicolon right? Sorry for being a noob) and send them one after another before blocking on reading any of them? Only then when everything is sent, start reading and populating querysets. Is this possible to do with Django?

The general problem here isn't specific to Django. You want to parallelize some blocking tasks. The easiest solution is to run them in a thread pool. The interface to the database isn't the right level for solving this problem. I think you should tackle it at a higher level.

--
Aymeric.

Guilherme Leal

unread,
Mar 1, 2015, 6:34:03 PM3/1/15
to django-developers
I don't think your problem is related to Django at all, but with the way that SQL work instead. There's no sense in batch a query that return values from your DB (unless you're doing 1 query per connection, witch is a stupid thing to do IMO).

You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
Reply all
Reply to author
Forward
0 new messages