Compacting SQL queries

1 view
Skip to first unread message

Hawkeye

unread,
Sep 21, 2006, 11:09:54 AM9/21/06
to Django developers
Hi all,

I'm wondering if there's a way to generate more compact queries in the
DB layer.

For example, when I have a model:
===============
class Object(models.Model):
bar = models.ForeignKey(...)
foo = models.CharField(...)
something = models.DateTimeField(...)
===============

Object.objects.get(pk=1) results in the following query (I think):
===============
SELECT
`app_object`.`id`,`app_object`.`bar_id`,`app_object`.`foo`,`app_object`.`something`
FROM `app_object` WHERE (`app_object`.`id` = 1)
===============
(138 chars)

This could be turned into:
===============
SELECT `a`.`id`,`a`.`bar_id`,`a`.`foo`,`a`.`something` FROM
`app_object` as `a` WHERE (`a`.`id` = 1)
===============
(100 chars ~ 25% reduction)

Or in the general case (without .values(), all entries are requested):
===============
SELECT * FROM `app_object` as `a` WHERE (`a`.`id` = 1)
===============
(54 chars ~ 60% reduction)

So, my first question is... is this possible?

My second question is... if we can, is there any real value
(specifically for very large sites)?

Thanks,
--Ben

Malcolm Tredinnick

unread,
Sep 21, 2006, 11:20:07 AM9/21/06
to django-d...@googlegroups.com
On Thu, 2006-09-21 at 08:09 -0700, Hawkeye wrote:
[...]

> Or in the general case (without .values(), all entries are requested):
> ===============
> SELECT * FROM `app_object` as `a` WHERE (`a`.`id` = 1)
> ===============
> (54 chars ~ 60% reduction)

This is generally a bad idea, because it relies on column orders not
changing and/or the client-side knowing what the column orders are. It
also assumes that the only fields in the table are the ones known to the
Django model (which is not always true).

> So, my first question is... is this possible?

Nothing's impossible.

>
> My second question is... if we can, is there any real value
> (specifically for very large sites)?

What is the advantage of doing this? The only saving you're making here
is the amount of data you are sending from your client to the database
server. Since that is generally going to be no worse than over a local
network, the difference between, say, 2K and 1.5K (in the big case) is
really nothing, compared to the time to run the query, return the
results, process the request back to the user, etc. Have you benchmarked
something that shows this really speeds things up noticeably?

I'm -1 on this; a fair bit of work for no real gain.

Regards,
Malcolm

Eric Walstad

unread,
Sep 21, 2006, 11:33:56 AM9/21/06
to django-d...@googlegroups.com
Hawkeye wrote:
> ===============
> (100 chars ~ 25% reduction)
...

> My second question is... if we can, is there any real value
> (specifically for very large sites)?

Our site isn't huge, but it's not small, either (~8M records). Network
bandwidth between the web and data servers is very near the bottom of my
list of possible optimizations. Maybe if I had to use 1200 baud packet
radio between them ;) Ah, the good ol' days.

73's
kc6ntp

Michael Radziej

unread,
Sep 21, 2006, 11:52:09 AM9/21/06
to django-d...@googlegroups.com
Malcolm Tredinnick schrieb:

> What is the advantage of doing this? The only saving you're making here
> is the amount of data you are sending from your client to the database
> server. Since that is generally going to be no worse than over a local
> network, the difference between, say, 2K and 1.5K (in the big case) is
> really nothing, compared to the time to run the query, return the
> results, process the request back to the user, etc. Have you benchmarked
> something that shows this really speeds things up noticeably?

It might even be make the total query time longer, since it could
be harder for the database query parser to find the corresponding
tables for the columns.

Anyway, the sql generation code has some heavy problems, and it's
the wrong time to make it any more complicated without fixing
the bugs. Ever tried stuff like:

models.Thing.objects.extra(
tables=["upsy"],
where=["(thing.upsy=upsy.id and upsy.daisy=1)"]) \
| models.Thing.objects.extra(
tables=["upsy"],
where=["(thing.upsy=upsy.id and upsy.daisy=2")])

Even combining normal query sets with joins with "|" or "&"
doesn't work the right way.

The sql generation code is just too simple to find out about the
joined tables, and it's not possible without a major rewrite to
support rewriting the sql query.

For me, -1

Michael

Reply all
Reply to author
Forward
0 new messages