Model.objects.raw() (#11863)

SeanOC

unread,

Sep 28, 2009, 9:24:40 PM9/28/09

to Django developers

Hello All,

During the Djangocon sprints I started to work on a patch which would
add a nicer interface for dealing with raw SQL queries. While there I
talked to RKM about where it should fit into the ORM API and where in
the code base should live. I've finally gotten around to finishing
the code I wrote during the sprint and have posted a patch to the
related ticket (http://code.djangoproject.com/ticket/11863). You can
also get the code from http://github.com/SeanOC/django.

So far the patch is functional and has some basic unit tests but it is
not documented. I'll be working on the documentation this week. In
the mean time, the unit tests are probably the best place too look for
how to use the code.

If anybody would like to look over the code and provide some feedback
it would be greatly appreciated.

Thanks!

-SeanOC

Alex Gaynor

unread,

Sep 28, 2009, 9:29:34 PM9/28/09

to django-d...@googlegroups.com

So my question, and this is something I've been thinking about a lot
of the proposals lately is: why should this be in Django itself. This
looks to me like something that would work perfectly well outside of
Django itself, indeed it's just a custom manager.

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

Russell Keith-Magee

unread,

Sep 28, 2009, 10:25:13 PM9/28/09

to django-d...@googlegroups.com

On Tue, Sep 29, 2009 at 9:29 AM, Alex Gaynor <alex....@gmail.com> wrote:
>
> On Mon, Sep 28, 2009 at 9:24 PM, SeanOC <sean.b....@gmail.com> wrote:
>>
>> Hello All,
>>
>> During the Djangocon sprints I started to work on a patch which would
>> add a nicer interface for dealing with raw SQL queries. While there I
>> talked to RKM about where it should fit into the ORM API and where in
>> the code base should live. I've finally gotten around to finishing
>> the code I wrote during the sprint and have posted a patch to the
>> related ticket (http://code.djangoproject.com/ticket/11863). You can
>> also get the code from http://github.com/SeanOC/django.
>>
>> So far the patch is functional and has some basic unit tests but it is
>> not documented. I'll be working on the documentation this week. In
>> the mean time, the unit tests are probably the best place too look for
>> how to use the code.
>>
>> If anybody would like to look over the code and provide some feedback
>> it would be greatly appreciated.
>>
>> Thanks!
>>
>> -SeanOC
>> >
>>
>
> So my question, and this is something I've been thinking about a lot
> of the proposals lately is: why should this be in Django itself. This
> looks to me like something that would work perfectly well outside of
> Django itself, indeed it's just a custom manager.

Yes, this *could* be done entirely externally. However, I think it is
a manifestation of a pattern that we actively encourage, so therefore
we should provide nice API support for it.

I've lost count of the number of times I've said "ORM != SQL, so just
use a cursor and write raw SQL". However, "write raw SQL" isn't always
trivial, since you don't get back ORM objects. If you want ORM
objects, you have to:
(1) know about the trick of instantiating an object with the
unrolled list version of a cursor, and
(2) ensure that you've got all your columns specified in the right order.

We can fix (1) with some documentation, but (2) is one of those nasty
implementation details that you don't know you've got right until
everything blows up in your face. There's also an edge case where two
versions of a table might end up with different column orders due to
applying different migration strategies on each version. If this
happens, then the cursor approach fails because there is no consistent
column order.

If we can provide API-level support to make it easier to write custom
SQL, I think we should do so.

Yours,
Russ Magee %-)

Eric Florenzano

unread,

Sep 28, 2009, 11:23:33 PM9/28/09

to Django developers

> So my question, and this is something I've been thinking about a lot
> of the proposals lately is: why should this be in Django itself.

I couldn't agree with your sentiment more, in the whole. In fact, to
me, until now all of the proposals aside from Simon's seem better
outside of Django than inside of it.

That being said, this proposal is actually something that I think fits
well within Django core itself, as it really is a logical extension of
the core functionality of the ORM.

Thanks,
Eric Florenzano

David Reynolds

unread,

Sep 29, 2009, 4:27:27 AM9/29/09

to django-d...@googlegroups.com

Russ,

On 29 Sep 2009, at 03:25, Russell Keith-Magee wrote:

> (1) know about the trick of instantiating an object with the
> unrolled list version of a cursor, and

Any chance you could expand upon this?

--
David Reynolds
da...@reynoldsfamily.org.uk

Simon Willison

unread,

Sep 29, 2009, 5:06:19 AM9/29/09

to Django developers

On Sep 29, 2:24 am, SeanOC <sean.b.ocon...@gmail.com> wrote:
> During the Djangocon sprints I started to work on a patch which would
> add a nicer interface for dealing with raw SQL queries. While there I
> talked to RKM about where it should fit into the ORM API and where in
> the code base should live. I've finally gotten around to finishing
> the code I wrote during the sprint and have posted a patch to the
> related ticket (http://code.djangoproject.com/ticket/11863). You can
> also get the code fromhttp://github.com/SeanOC/django.

I'm a big fan of adding this feature - I firmly believe we should be
encouraging people to roll their own SQL where necessary (it's the
ultimate answer to complaints about missing features in the ORM), and
I don't think just telling them to instantiate their own cursor is a
good enough answer.

It's worth talking to Jacob about this - last year he had something
like this in a private branch, but I don't think he ever released it.

Cheers,

Simon

Russell Keith-Magee

unread,

Sep 29, 2009, 5:30:03 AM9/29/09

to django-d...@googlegroups.com

On Tue, Sep 29, 2009 at 4:27 PM, David Reynolds
<da...@reynoldsfamily.org.uk> wrote:
>
> Russ,
>
> On 29 Sep 2009, at 03:25, Russell Keith-Magee wrote:
>
>> (1) know about the trick of instantiating an object with the
>> unrolled list version of a cursor, and
>
>
> Any chance you could expand upon this?

Sure. Assume a simple model:

class Author(Model):
name = CharField()
age = IntegerField()

Now, in your code, get a cursor and issue a SQL statement:

from django.db import connection.
cursor = connection.cursor()

cursor.execte('SELECT id, name, age FROM myapp_author')
result = cursor.fetchone()

At this point, result holds a tuple containing (id, name, age). You
can instantiate an instance of Author by unrolling this tuple:

instance = Author(*result)

if you want multiple instances, use cursor.fetchall():

results = cursor.fetchall()
instances = [Author(*result) for result in results]

You need to be a little careful with the fetchall version because you
could end up with a _lot_ of results - but that's really just the
standard cursor usage warning.

The real caveat: the order of the columns you SELECT *must* match the
order in which the fields are specified in the model. For example, if
you made the following SQL query:

cursor.execute('SELECT id, age, name FROM myapp_author')

the query will work fine, but you'll get a TypeError when you create
the object because 'age' can't be coerced into a string. If the two
fields that are out of order are the same data type, you won't get any
errors at all - you'll just get a badly represented instance. Hilarity
ensues.

This trick is exactly what Django does internally when it constructs
object instances. However, in the Django internals, it is a completely
automated process - Django issues the query and parses the results, so
there's no risk of getting the column order wrong.

There are some ways to work around the column ordering problem. For
example, you can interrogate the cursor to get the name of the columns
that have been returned. This is what Sean has done in his patch to
make the raw() call a little more robust. This code is completely
generic; hence the interest in adding this to core.

Yours,
Russ Magee %-)

Jeremy Dunck

unread,

Sep 29, 2009, 11:02:46 AM9/29/09

to django-d...@googlegroups.com

On Tue, Sep 29, 2009 at 4:30 AM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
...

> This trick is exactly what Django does internally when it constructs
> object instances. However, in the Django internals, it is a completely
> automated process - Django issues the query and parses the results, so
> there's no risk of getting the column order wrong.

Code here; note the treatment of .defer'd querysets in a branch that
uses **dict as well.
http://code.djangoproject.com/browser/django/trunk/django/db/models/query.py#L950

Simon Willison

unread,

Sep 29, 2009, 1:10:21 PM9/29/09

to Django developers

On Sep 29, 2:24 am, SeanOC <sean.b.ocon...@gmail.com> wrote:

> During the Djangocon sprints I started to work on a patch which would
> add a nicer interface for dealing with raw SQL queries. While there I
> talked to RKM about where it should fit into the ORM API and where in
> the code base should live. I've finally gotten around to finishing
> the code I wrote during the sprint and have posted a patch to the
> related ticket (http://code.djangoproject.com/ticket/11863). You can
> also get the code fromhttp://github.com/SeanOC/django.

I've had a bit of a think about this, and there are a couple of things
I'd be very keen on seeing supported by this feature. I have a nasty
feeling they won't be that straight forward though.

The first is that I'd like it to be compatible with deferred loading
of model attributes - I haven't looked in to it, but my hunch is that
this won't be too hard (it might even work in its present form without
any further changes).

The second one is probably a lot harder - it feels to me like this
feature would be much more useful if it could perform select_related
style graph population. One of the key use-cases for raw SQL is
improving performance - writing queries which perform better than the
ORM (SQL optimisations, or queries that hit an alternative view or use
a stored procedure or similar). It's very plausible that a developer
might want to write a custom query that populates a full graph of
objects, just like select_related() does. It would be really useful if
the QuerySet.raw() method let them do that.

I haven't looked at how much work it would be to support graph
population - it will certainly complicate the user-facing API, and it
might require a bunch of work at the ORM level. If it's just too much
hassle I think the feature would be worth including without it, but
it's certainly worth some further investigation.

Cheers,

Simon

SeanOC

unread,

Sep 29, 2009, 2:02:17 PM9/29/09

to Django developers

> The first is that I'd like it to be compatible with deferred loading
> of model attributes - I haven't looked in to it, but my hunch is that
> this won't be too hard (it might even work in its present form without
> any further changes).

The current implementation doesn't support this. It will actually
raise an exception if not all of the fields have been provided. For
now I was looking to get something basic out there to get the ball
rolling. Once I have the docs for the current implementation done
I'll look into what it will take to do this.

>
> The second one is probably a lot harder - it feels to me like this
> feature would be much more useful if it could perform select_related
> style graph population. One of the key use-cases for raw SQL is
> improving performance - writing queries which perform better than the
> ORM (SQL optimisations, or queries that hit an alternative view or use
> a stored procedure or similar). It's very plausible that a developer
> might want to write a custom query that populates a full graph of
> objects, just like select_related() does. It would be really useful if
> the QuerySet.raw() method let them do that.
>
> I haven't looked at how much work it would be to support graph
> population - it will certainly complicate the user-facing API, and it
> might require a bunch of work at the ORM level. If it's just too much
> hassle I think the feature would be worth including without it, but
> it's certainly worth some further investigation.
>

This one also isn't implemented right now. I've had a few thoughts on
how it could be done but they all result in a somewhat ugly API.

The least ugly option I've thought of so far would be to accept an
optional parameter called related_available which would expect a tuple
of tuples. The inner tuple(s) would include a prefix for the columns
with related data and the name of the related field.

I definitely would be interested in hearing other suggestions.

-Sean

mrts

unread,

Oct 2, 2009, 7:35:08 AM10/2/09

to Django developers

Wishful thinking follows.

It would be awesome if one could mix ordinary QuerySet methods
with raw() (or, rather, raw_extra(), see below for that).

Assuming the following models:

class Foo(models.Model):
name = models.CharField(max_length=255)

class FooDates(models.Model):
foo = models.ForeignKey(Foo)
start = models.DateField()
end = models.DateField()

I for one would be definitely struck with awe if
I could write something in the lines of:

RAW_ORDER_BY = """
ORDER BY (SELECT MIN("start")
FROM "myapp_foodates"
WHERE "myapp_foodates"."start" >= %s
AND "myapp_foo.id" = "myapp_foodates"."foo_id")
"""

q = Foo.objects.filter(
foodate__end__gte=datetime.date.now())\
.raw(RAW_ORDER_BY, params=(datetime.date.now(),))\
.distinct()

and q.query.as_sql() would look as follows:

SELECT DISTINCT ...
FROM "myapp_foo"
INNER JOIN "myapp_foodates"
ON ("myapp_foo"."id" = "myapp_foodates"."foo_id")
WHERE "myapp_foodates"."end" >= "2009-10-02"
ORDER BY (
SELECT MIN("start")
FROM "myapp_foodates"
WHERE "myapp_foodates"."start" >= "2009-10-02"
AND "myapp_foo"."id" = "myapp_foodates"."foo_id");

However, I assume that achieving this would be extremely
hard with plain raw().

What probably would work, however, is an extra()-like
raw_extra() that would accept raw parametrized strings
for each of `select`, `where` and `order_by` and plainly
replace the corresponding part in the query builder.

I.e. the example above would read:

q = Foo.objects.filter(
foodate__end__gte=datetime.date.now())\
.raw_extra(order_by=RAW_ORDER_BY,
params=(datetime.date.now(),))\
.distinct()

Best,
Mart Sõmermaa

Rajeev J Sebastian

unread,

Oct 2, 2009, 10:21:25 AM10/2/09

to django-d...@googlegroups.com

Sorry for the noise, but ... this stuff is really good. Great job!

One perhaps silly question: I saw the testAnnotations test, and I was
wondering if the annotation could be any possible SQL supported by the
DB.

For e.g.

"SELECT a.*, db_specific_function(a.id) as my_annotation FROM
raw_query_author a ORDER BY a.id"

Regards
Rajeev J Sebastian

Russell Keith-Magee

unread,

Oct 2, 2009, 10:37:29 AM10/2/09

to django-d...@googlegroups.com

On Fri, Oct 2, 2009 at 7:35 PM, mrts <mrts....@gmail.com> wrote:
>
> Wishful thinking follows.
>
> It would be awesome if one could mix ordinary QuerySet methods
> with raw() (or, rather, raw_extra(), see below for that).

While I can see what you're aiming at here, I'm not really a fan of the idea.

Django's ORM is designed to make the simple cases really simple, but
once you move beyond the basics, you really should be looking to use
raw SQL.

When you try to mash segments of raw SQL into Django's internal query
representation, it's very easy to start breaking things. extra() is
already really fragile when it comes to interacting with other query
features. Increasing the complexity of the SQL segments that you can
inject into a query isn't going to improve the situation.

I'm in favor of raw() specifically because it is only trying to make
the output of raw SQL statements a little more palatable. It's
interesting to me specifically because it doesn't try to integrate
with normal Django queries. It makes it easier to use raw SQL without
increasing the complexity of the query engine. Rather than trying to
work out how to cram a query into Django's ORM, we acknowledge that
Django's ORM has strengths, but for complex cases, raw SQL is still
the language of choice.

I think there are some things that we might be able to do to make
writing the raw SQL a little easier - for example, allowing string
substitutions for common parts of the query:

SELECT %(columns)s FROM %(db_table) ...

but even these I could live without. The important part is to provide
a better answer for users that have a query that exceeds the
capabilities of Django's ORM.

Yours,
Russ Magee %-)

Sean O'Connor

unread,

Oct 2, 2009, 10:42:54 AM10/2/09

to django-d...@googlegroups.com

To be honest this seems like something which would be a lot of work with relatively little gain. For this to work raw() would need to change from a relatively simple bit of code which doesn't need to touch all of the complex query code in the ORM to a complex bit of code which needs to deeply modify the behavior of the query code.

For most use-cases where you simply wish to inject bits of SQL into the query, the existing extra() method works well. Otherwise you've probably already written the query you want to use. To use "advanced extra" type tools you'd have to do additional work to break down your new query into something that fits into these functions. With a simple raw all one would need to do is plug their query into the raw() method.

In addition to making raw() much more complicated, this proposal eliminates or dramatically complicates the possibility of implementing raw() for non-relational back-ends.

If somebody really wants this functionality and can provide an implementation it could certainly be looked at more deeply. Otherwise IMHO this would not be a good thing to add to raw() and have little interest in implementing it.

____________________________
Sean O'Connor
http://seanoc.com

mrts

unread,

Oct 2, 2009, 12:35:32 PM10/2/09

to Django developers

On Oct 2, 5:42 pm, "Sean O'Connor" <sean.b.ocon...@gmail.com> wrote:
> To be honest this seems like something which would be a lot of work with
> relatively little gain. For this to work raw() would need to change from a
> relatively simple bit of code which doesn't need to touch all of the complex
> query code in the ORM to a complex bit of code which needs to deeply modify
> the behavior of the query code.

Perhaps I exposed my thinking process too much. Surely .raw() should
remain
as-is and raw_extra() *added*, being orthogonal to raw() both usage-
and
implementation-wise. And it should certainly be labeled as "only use
this
if you know what you're doing."

As for the usefulness, I believe I'm not the only one who builds
queries
dynamically a lot (e.g. according to GET/POST parameters:
if 'foo' in request.GET: q = q.filter(...)) before evaluating it. For
that
use case, raw() does not help -- I'd have to glue together SQL strings
to
build the query manually, like the ORM does, which is obviously
fragile
(think about tracking joins) and defies the purpose of an ORM. Mixing
.filter() with .raw_extra() would be a cleaner way to execute advanced
SQL.

As for the implementation, I took a superficial glance at
django/db/models/sql/query.py before posting and e.g. the ORDER BY
override
looked quite doable in as_sql(). Let me provide an oversimplified,
naive
example:

--- django/db/models/sql/query.py (revision 11594)
+++ django/db/models/sql/query.py (working copy)
@@ -446,7 +446,9 @@
result.append('HAVING %s' % having)
params.extend(h_params)

- if ordering:
+ if self.raw_order_by:
+ result.append(self.raw_order_by)
+ elif ordering:
result.append('ORDER BY %s' % ', '.join(ordering))

But let me stop right here. If it doesn't look sensible to RKM or you,
let it be, lengthy arguments in similar situations have been futile
(and e.g. in case of
http://code.djangoproject.com/ticket/10697#comment:4 and the
corresponding django-dev thread, brought me bad karma which
I obviously don't have any use of :) ).

---

Finally let me express my thanks that you've taken the trouble of
implementing raw(). The prospect of having it in Django is excellent
news.

mrts

unread,

Oct 5, 2009, 6:48:38 AM10/5/09

to Django developers

On Oct 2, 7:35 pm, mrts <mrts.py...@gmail.com> wrote:
> But let me stop right here.

Contradicting myself, I've pursued this further and
implemented raw_override() for order_by in [1] as a proof of
concept. See [2] for what's different and tests [3] for
usage.

I'd say that for order_by, the implementation is simple and
straightforward and not that much more fragile than extra()
(see tests for the second model in [3] for a complex-ish
example).

As for the usefulness, complex ordering can probably be
solved with existing extra() by SELECTing the complex order
criteria, aliasing them and using that alias in order_by.
But perhaps someone likes or needs the idea and wants to
take it further for other parts of the query.

Once again, over and out on this matter,
Mart Sõmermaa

[1] branch: http://github.com/mrts/django/tree/raw_sql_override
[2] diff: http://github.com/mrts/django/commit/dd127debfed070bb8152f10cb5770b216baee3de
[3] tests: http://github.com/mrts/django/blob/raw_sql_override/tests/modeltests/raw_override/models.py

James Bennett

unread,

Oct 5, 2009, 7:42:09 AM10/5/09

to django-d...@googlegroups.com

On Fri, Oct 2, 2009 at 9:37 AM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> Django's ORM is designed to make the simple cases really simple, but
> once you move beyond the basics, you really should be looking to use
> raw SQL.

Indeed. I look at raw() really as "I know *exactly* the query I want
to run, get the ORM out of my way and just make it easy to get
objects".

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

SeanOC

unread,

Nov 18, 2009, 9:03:24 AM11/18/09

to Django developers

I've posted a new patch to the ticket (http://code.djangoproject.com/
ticket/11863) and have pushed my latest code to
http://github.com/SeanOC/django/tree/ticket11863. All of the issues
which Jacob raised should now be resolved and an initial version of
caching the query results has been implemented. The main remaining
task is to write some documentation. Hopefully I should have a first
pass at having that complete this weekend.

Any feedback or comments would be great.

Thanks!

-Sean

Reply all

Reply to author

Forward