Last call: #11863 (Model.objects.raw)

Jacob Kaplan-Moss

Dec 15, 2009, 5:15:19 PM
to django-d...@googlegroups.com
Hey folks --

Forgot to mention it during the sprint this weekend, but I've pushed a
RC patch to #11863, Model.objects.raw(). If anyone's got any feedback,
let it fly. Otherwise, I'll be checking this in in a couple-three days
or so.

Jacob

Jeremy Dunck

Dec 15, 2009, 6:31:28 PM
to django-d...@googlegroups.com
======
InsuficientFields -> InsufficientFields

======
This bit doesn't seem to be true; it seems that missing fields will
raise InsuficientFields instead. Am I reading it wrong?
"
The ``Person`` objects returned by this query will have their ``first_name``
attributes set correctly, but *will not have any other model fields set*. This
means that accessing ``last_name`` or ``birth_date`` will result in an
``AttributeError``.
"

======
RawQuery.__len__ calls _populate_cache twice. I see the comment about
SQLite, but if that was intended, why not just use len(self._cache) to
start with, since _populate_cache does .fetchall()?

======

RawQuery._populate_cache does fetchall(). This is sort of surprising,
since normal QuerySets go out of their way to avoid fetchall.
RawQuerySets are not as lazy as normal querysets in that normal
querysets do fetchmany. If this was intended, it might be worth
pointing out. In fact, I think RawQuerySet.iterator won't do what
people expect.
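
To make the difference concrete, here is a minimal sketch using the stdlib sqlite3 module. It is only an illustration and not the patch's code; `iter_rows`, the `person` table, and the chunk size are made up for the example.

```python
import sqlite3


def iter_rows(cursor, chunk_size=100):
    """Yield rows lazily, pulling chunk_size rows at a time with fetchmany()."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO person (first_name) VALUES (?)",
    [("Ann",), ("Bob",), ("Cy",), ("Di",), ("Ed",)],
)

# Eager: the whole result set is materialized as soon as fetchall() runs.
eager = conn.execute("SELECT id, first_name FROM person ORDER BY id").fetchall()

# Lazy: rows are only pulled from the cursor as the generator is consumed.
cursor = conn.execute("SELECT id, first_name FROM person ORDER BY id")
lazy = iter_rows(cursor, chunk_size=2)
first = next(lazy)
rest = list(lazy)
```

With the lazy version, a caller that stops after the first row never pays for the rest, which is roughly what people expect from `.iterator()`.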

======
On "admonition:: Model table names" - Person._meta.db_table would have
the value, and it might be better to be more explicit about it. But
it's an "_" API, so maybe we don't want to make it clearer after all.
;-)

======
"You'll use ``%s``"
Is that back-end independent? It looks like it gets evaluated by
whatever backend you're using, and so would depend on DB-API
paramstyle.

======
Typo here:
"It's tempting to write the above query as:: "
>>> query = 'SELECT * FROM myapp_person WHERE last_name = %s', % lname)
->
>>> query = 'SELECT * FROM myapp_person WHERE last_name = %s' % lname
Or did you make it bad syntax on purpose?
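
As I read it, the docs passage is warning against building the query with string formatting at all. A rough sketch of why, using the stdlib sqlite3 module (which takes `?` placeholders natively rather than `%s`); the table contents and the hostile value are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myapp_person (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO myapp_person (last_name) VALUES (?)",
    [("Lennon",), ("McCartney",)],
)

# A hostile "last name" that carries its own SQL.
lname = "'nobody' OR 1 = 1"

# Tempting but unsafe: the value is pasted into the SQL text itself.
unsafe_sql = "SELECT * FROM myapp_person WHERE last_name = %s" % lname
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the driver binds the value, so it is only ever compared as a string.
safe_rows = conn.execute(
    "SELECT * FROM myapp_person WHERE last_name = ?", (lname,)
).fetchall()
```

The formatted query matches every row in the table, while the parameterized one matches none.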

======

RawQuery.validate_sql excludes anything but selects, but Oracle uses
comments to do query hinting. Can an Oracle person confirm that those
hints can't start the query SQL? (Not worth holding up landing, of
course.)

======

Ian Kelly

Dec 15, 2009, 6:38:08 PM
to django-d...@googlegroups.com
On Tue, Dec 15, 2009 at 4:31 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> RawQuery.validate_sql excludes anything but selects, but Oracle uses
> comments to do query hinting.  Can an Oracle person confirm that those
> hints can't start the query SQL?  (Not worth holding up landing, of
> course.)

Hints always immediately follow the SELECT keyword.

http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/sql_elements006.htm#i31713

Jacob Kaplan-Moss

Dec 15, 2009, 7:54:48 PM
to django-d...@googlegroups.com
Thanks for the review, Jeremy.

On Tue, Dec 15, 2009 at 5:31 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> This bit doesn't seem to be true; it seems that missing fields will
> raise InsuficientFields instead.  Am I reading it wrong?

Ah, good catch. I'd intended to remove this behavior as it's overly
strict -- the whole point of ``raw()`` is that it lets you play fast
and loose with best practices. The code now matches the docs.

> RawQuery._populate_cache does fetchall().  This is sort of surprising,
> since normal QuerySets go out of their way to avoid fetchall.
> RawQuerySets are not as lazy as normal querysets in that normal
> querysets do fetchmany.  If this was intended, it might be worth
> pointing out.   In fact, I think RawQuerySet.iterator won't do what
> people expect.

Yeah, this is annoying: SQLite doesn't support cursor.rowcount until
all the rows have been fetched, so supporting a cheap __len__ is hard.

After thinking about it for a while I've decided just to ditch __len__
and return the raw cursor for __iter__. That's closer to the "raw"
database access anyway. Users can always ``len(list(q))`` if they
must. Nothing in the docs mentioned len() anyway, and I can't see it
being all that useful -- as long as you're writing raw SQL, COUNT(*)
is going to be more efficient anyway.
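
A small sketch of the trade-off, using the stdlib sqlite3 module directly rather than anything from the patch (the `person` table is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO person (first_name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)]
)

cursor = conn.execute("SELECT id, first_name FROM person")
# sqlite3 cannot report how many rows a SELECT produced up front.
rowcount_before_fetch = cursor.rowcount

# Option 1: materialize everything, then count in Python.
count_via_list = len(list(cursor))

# Option 2: let the database count, without shipping the rows over.
count_via_sql = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Here `rowcount_before_fetch` comes back as -1, so the only ways to get a count are to fetch every row or to ask the database with COUNT(*).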

> On "admonition:: Model table names" - Person._meta.db_table would have
> the value, and it might be better to be more explicit about it.  But
> it's an "_" API, so maybe we don't want to make it clearer after all.
> ;-)

I'm leaving it out deliberately -- we've avoided stabilizing _meta (so
far), so until we do (or add an official model reflection API) I'm
leaving it out of the docs.

> "You'll use ``%s``"
> Is that back-end independent?  It looks like it gets evaluated by
> whatever backend you're using, and so would depend on DB-API
> paramstyle.

Nope -- ``connection.cursor()`` returns a ``CursorWrapper`` that
translates query styles into '%s' regardless (see
django.db.backends.sqlite3.SQLiteCursorWrapper for one example).
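
To illustrate the idea, here is a hedged sketch of such a translation on top of the stdlib sqlite3 module, whose native paramstyle is `qmark`. `PercentSCursor` is a hypothetical class written for this example, and the conversion trick is an assumption about how a wrapper could work, not a copy of Django's code.

```python
import sqlite3


class PercentSCursor:
    """Hypothetical wrapper: accepts '%s' placeholders, hands sqlite3 '?'."""

    def __init__(self, cursor):
        self.cursor = cursor

    def execute(self, query, params=()):
        # One '?' per parameter; a literal '%' in the query must be written '%%'.
        converted = query % tuple("?" * len(params))
        return self.cursor.execute(converted, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO person (last_name) VALUES ('Lennon')")

cursor = PercentSCursor(conn.cursor())
rows = cursor.execute(
    "SELECT id, last_name FROM person WHERE last_name = %s", ["Lennon"]
).fetchall()
```

The caller always writes `%s`, and the wrapper hides whatever the underlying driver wants.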

> RawQuery.validate_sql excludes anything but selects, but Oracle uses
> comments to do query hinting.

*** boggles ***

(Looks like it's not a problem, though.)


----

New patch uploaded - let me know if you see anything else.

Jacob

Russell Keith-Magee

Dec 15, 2009, 8:24:33 PM
to django-d...@googlegroups.com
Hi Jacob,

A couple of quick notes on the RC2 patch:

* If you have an incomplete selection of fields, the patch currently
marks those fields as None/default values. Couldn't (Shouldn't?) they
be marked as deferred fields?

* Looking slightly forward - what's the integration point for
multidb? One option is to put a using argument on raw::

Person.objects.raw('SELECT ...', using='other')

It would be nice to allow .using(), but while it is easy to allow::

Person.objects.raw('SELECT ...').using('other')

it isn't so easy to allow::

Person.objects.using('other').raw('SELECT ...')

We could jump through some hoops to put raw() on queryset, but raise
exceptions under most uses (i.e., if you've performed any query
modifying operation). However, this is a lot of hoops just to allow an
edge case API use.

Obviously, multidb isn't in scope for this patch, but given the
obvious overlap, I thought I'd ask for opinions.

Other than that, RC2 looks good to me.

Russ %-)

Sean O'Connor

Dec 16, 2009, 12:16:49 AM
to django-d...@googlegroups.com
Big thanks Jacob for picking up my slack and putting the finishing touches on the patch and writing the docs.  Work got crazy and I dropped the ball.  Definitely happy that the work will get completed and put into trunk regardless.

In regard to the deferred fields option, I'll let Jacob speak for his view but I've approached such functionality as "nice to have" for the patch since it's not critical to the patch being useful.  Personally I haven't had the time to figure it out and implement it so my original patch didn't include it.

For the multidb approach I'd lean towards the kwargs approach.  Right now the .raw() code is fairly well insulated from the bulk of the ORM and vice versa.  This keeps the raw() code pretty simple and minimizes the opportunities for introducing new bugs in the ORM.

As far as putting raw() on querysets I'd be pretty against it as well.  It strikes me in a lot of ways as multi-table inheritance does.  People are really going to want it, until they try and use it in the real world, and realize that it was a really bad idea.  While I'm sure Russ or some others around here could work some awesome magic and get it working after a fashion, I don't think it will ever work the way a new user will expect.  What does performing a raw query on a queryset even mean?  In the end I think adding .raw() to queryset would lead to a much more complicated implementation and more confusion for users.

____________________________
Sean O'Connor
http://seanoc.com



--

You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.



Jeremy Dunck

Dec 16, 2009, 6:56:44 AM
to django-d...@googlegroups.com, django-d...@googlegroups.com
On Dec 15, 2009, at 11:16 PM, "Sean O'Connor"
<sean.b....@gmail.com> wrote:

> In regard to the deferred fields option, I'll let Jacob speak for
> his view but I've approached such functionality as "nice to have"
> for the patch since it's not critical to the patch being useful.
> Personally I haven't had the time to figure it out and implement it
> so my original patch didn't include it.

I like the idea of the deferred fields, but if we don't implement it
now, people may come to rely on the AttributeError so that we can't
add deferred later. Perhaps a note in the docs stating our intent to
add deferreds would suffice?


Russell Keith-Magee

Dec 16, 2009, 7:51:36 AM
to django-d...@googlegroups.com
No need for workaround docs - I've just uploaded an RC3 patch that
implements deferred fields.

The one gotcha on this patch is that it now requires that you request
the primary key when you retrieve an object. That is, you can't just
run::

Author.objects.raw('SELECT firstname, lastname FROM author')

You must now include the pk:

Author.objects.raw('SELECT id, firstname, lastname FROM author')

If you don't, you get an exception. Unfortunately, it's an exception
after the SQL has been executed, but that's the only way to know
exactly which columns have been requested.

This is slightly more restrictive than Jacob's RC2 patch - but I think
the RC3 behaviour is preferable. The primary key value is a fairly
essential part of the Django infrastructure. In RC2, if you retrieved
an Author with firstname and lastname, then saved the object, you
would get a new object in the database. RC3 avoids this because the
deferred object has the right primary key.

If the price of avoiding surprises like this is forcing the PK to be
retrieved, I think it's a price worth paying. If you really can't
afford to have the PK retrieved, then I'd argue you aren't retrieving
a Django object; you can still call on raw SQL cursors to accommodate
those use cases.
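
The check described above can be sketched roughly as follows, using the stdlib sqlite3 module. `fetch_objects` and `MissingPrimaryKey` are hypothetical names for this example and do not come from the RC3 patch; plain dicts stand in for model instances.

```python
import sqlite3


class MissingPrimaryKey(Exception):
    """Hypothetical exception for a raw query that omits the pk column."""


def fetch_objects(conn, sql, pk_column="id"):
    """Run sql and return one dict per row, insisting that the pk is selected."""
    cursor = conn.execute(sql)
    # The column list is only known once the query has been executed.
    columns = [description[0] for description in cursor.description]
    if pk_column not in columns:
        raise MissingPrimaryKey("Raw query must include the primary key")
    return [dict(zip(columns, row)) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE author (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
)
conn.execute("INSERT INTO author (firstname, lastname) VALUES ('Ada', 'Lovelace')")

authors = fetch_objects(conn, "SELECT id, firstname, lastname FROM author")

try:
    fetch_objects(conn, "SELECT firstname, lastname FROM author")
    pk_error_raised = False
except MissingPrimaryKey:
    pk_error_raised = True
```

Note that the exception can only be raised after `execute()` has run, since the cursor description is the only source of the column names.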

Yours,
Russ Magee %-)

Anssi Kaariainen

Dec 16, 2009, 9:19:37 AM
to Django developers


On Dec 16, 2:51 pm, Russell Keith-Magee <freakboy3...@gmail.com>
wrote:
One use case where deferred fields aren't so nice is when creating
models which don't have any backing tables in the db. That is, models
with managed = False. These models would be the Django equivalent of
views. In these cases trying to access the field, even to test whether
it is None, would result in a db query and an exception. And probably in
an aborted transaction, too.

Using raw() as a way to perform view operations (complex joins etc.)
is the first use case I thought of when I saw this. Anyway, using
default or None as a value isn't good either. How do you know if you
got that from the DB or not? A nice way to test which fields of the model
were populated, plus marking the non-populated fields as deferred, would
be optimal in my opinion. One use case where you don't necessarily know
which fields are populated and which ones aren't is when you have
multiple raw() queries defined populating different fields of the
model.

Anssi Kaariainen

Jeremy Dunck

Dec 16, 2009, 9:28:58 AM
to django-d...@googlegroups.com
On Wed, Dec 16, 2009 at 8:19 AM, Anssi Kaariainen <akaa...@cc.hut.fi> wrote:
...

> A nice way to test which fields of the model
> were populated, plus marking the non-populated fields as deferred, would
> be optimal in my opinion. One use case where you don't necessarily know
> which fields are populated and which ones aren't is when you have
> multiple raw() queries defined populating different fields of the
> model.

I was with you until the end here, where I'm not sure I follow.
Describe the API you're wishing for?

Sean O'Connor

Dec 16, 2009, 9:34:08 AM
to django-d...@googlegroups.com
Nice work Russ!  Got to love when something goes from "nice to have" to "done".

Anssi, I don't think I understand your use case.  Even if an unmanaged model doesn't have a literal table behind it, it needs something which at least resembles a table (i.e. a view) to query against.  Otherwise the ORM will generate errors regardless of the behavior of .raw().

To answer your question, raw() determines what fields are populated by the names of the columns returned by the database cursor.  Accordingly, if you want to pull different fields using different queries, you probably do so using subqueries in one big SQL statement.  As long as the field names returned by the query match the field names in the model or you provide a translations parameter to describe which query fields go to which model fields it should work.
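
A rough sketch of that column-to-field matching in plain Python. `map_columns` is a hypothetical helper written for this message, not the patch's implementation, and the field list is invented.

```python
def map_columns(columns, field_names, translations=None):
    """Split cursor columns into (column -> field) matches and missing fields."""
    translations = translations or {}
    matched = {}
    for column in columns:
        # A translation, if given, renames the query column to a model field.
        field = translations.get(column, column)
        if field in field_names:
            matched[column] = field
    missing = [name for name in field_names if name not in matched.values()]
    return matched, missing


fields = ["id", "first_name", "last_name", "birth_date"]
matched, missing = map_columns(
    ["pk", "first", "last_name"],
    fields,
    translations={"pk": "id", "first": "first_name"},
)
```

In this example `birth_date` ends up in `missing`, which is the set of fields that would be left unpopulated (or deferred) on the resulting object.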

As far as allowing users to define raw queries for each field to allow them to be deferred, this is way outside the scope of this patch.  The goal of .raw() was to provide some relatively simple syntactic sugar to make a common class of raw queries easier.  It was not intended to completely replace all possible use cases for doing raw queries using a cursor.

____________________________
Sean O'Connor
http://seanoc.com


Jacob Kaplan-Moss

Dec 16, 2009, 9:48:08 AM
to django-d...@googlegroups.com
On Wed, Dec 16, 2009 at 6:51 AM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> No need for workaround docs - I've just uploaded an RC3 patch that
> implements deferred fields.

Sweet! I love it when other people do my work for me...

> The one gotcha on this patch is that it now requires that you request
> the primary key when you retrieve an object.
[...]
> This is slightly more restrictive than Jacob's RC2 patch - but I think
> the RC3 behaviour is preferable.

I agree, and it actually opens up another use for raw() -- fetching
lazy objects where you've just got the ID, say in a materialized view
or whathaveyou.

I'll probably make a change to the docs to emphasize that you need the
primary key a bit more strongly, but I'm quite happy with this change.

Thanks!

Jacob

Anssi Kaariainen

Dec 16, 2009, 11:02:32 AM
to Django developers
On Dec 16, 4:34 pm, "Sean O'Connor" <sean.b.ocon...@gmail.com> wrote:
> Nice work Russ!  Got to love when something goes from "nice to have" to
> "done".
>
> Anssi, I don't think I understand your use case.  Even if an unmanaged model
> doesn't have a literal table behind it, it needs something which at least
> resembles a table (i.e. a view) to query against.  Otherwise the ORM will
> generate errors regardless of the behavior of .raw().

If you have a model which will be populated ONLY using raw queries,
then you don't need a backing view or table. And the use case is
creating something resembling a database view purely in Django.

> To answer your question, raw() determines what fields are populated by the
> names of the columns returned by the database cursor.  Accordingly, if you
> want to pull different fields using different queries, you probably do so
> using subqueries in one big SQL statement.  As long as the field names
> returned by the query match the field names in the model or you provide a
> translations parameter to describe which query fields go to which model
> fields it should work.

But if I don't provide some of the field names in the select clause,
and try to access a field that isn't included in the query, I will get
a ProgrammingError: db table not found.

> As far as allowing users to define raw queries for each field to allow them
> to be deferred, this is way outside the scope of this patch.  The goal of
> .raw() was to provide some relatively simple syntactic sugar to make a
> common class of raw queries easier.  It was not intended to completely
> replace all possible use cases for doing raw queries using a cursor.

I am not suggesting this. What I would like to have is something like
foo.field.is_deferred(). I don't think there is any easy way to test
this currently. This could come in handy in a template for example. You
could use the same template for objects fetched with different raw
queries, and skip deferred fields when showing data about the object.

I haven't done any Django coding, but is_deferred() seems like something
that I might be able to do. I am not sure if this is something that is
needed, though.

Anssi Kääriäinen

Jeremy Dunck

Dec 16, 2009, 11:13:24 AM
to django-d...@googlegroups.com
On Wed, Dec 16, 2009 at 10:02 AM, Anssi Kaariainen <akaa...@cc.hut.fi> wrote:
...
> I am not suggesting this. What I would like to have is something like
> foo.field.is_deferred(). I don't think there is any easy way to test
> this currently. This could come in handy in a template for example. You
> could use the same template for objects fetched with different raw
> queries, and skip deferred fields when showing data about the object.

This won't work, because deferred fields are descriptors, and
accessing foo.field would run the query.

Something you could do is foo.deferred_fields.field_name -> Boolean,
but that seems pretty clunky to me.
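
To show why attribute access alone is enough to trigger the load, here is a plain-Python sketch. `DeferredField` and `fake_loader` are hypothetical stand-ins, not Django's classes, and the "query" is just an entry appended to a list.

```python
class DeferredField:
    """Hypothetical stand-in for a deferred-field descriptor."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # No __set__ is defined, so this is a non-data descriptor: once the
        # value lands in the instance __dict__, later lookups bypass __get__.
        value = self.loader(instance, self.name)
        instance.__dict__[self.name] = value
        return value


queries = []


def fake_loader(instance, name):
    """Pretend to hit the database, recording each 'query' that was run."""
    queries.append(name)
    return "loaded %s" % name


class Entry:
    body = DeferredField("body", fake_loader)

    def __init__(self, headline):
        self.headline = headline


e = Entry("Hello")
queries_before = list(queries)
value = e.body  # merely touching the attribute triggers the load
queries_after_first = list(queries)
e.body  # second access is served from e.__dict__
queries_after_second = list(queries)
```

By the time anything like `.is_deferred()` could be called on `e.body`, the value has already been fetched and is an ordinary string.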

Jacob Kaplan-Moss

Dec 16, 2009, 12:08:47 PM
to django-d...@googlegroups.com
On Wed, Dec 16, 2009 at 10:13 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
> This won't work, because deferred fields are descriptors, and
> accessing foo.field would run the query.
>
> Something you could do is foo.deferred_fields.field_name -> Boolean,
> but that seems pretty clunky to me.

You can get at this information now if you really need to::

>>> e = Entry.objects.defer('body')[0]
>>> [f.attname for f in e._meta.fields if f.attname not in e.__dict__]
['body']

But the point is that deferred fields are an optimization. You
shouldn't need to know which fields are deferred: you should be adding
``defer`` as a last-step optimization once you *know* the fields in
question aren't needed.

IOW, why do I need to inspect the ``Entry`` to figure out what's
deferred? I just *told you* what's deferred in the previous line.

Jacob

Alex Gaynor

Dec 16, 2009, 12:13:14 PM
to django-d...@googlegroups.com
On Wed, Dec 16, 2009 at 12:08 PM, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:
> On Wed, Dec 16, 2009 at 10:13 AM, Jeremy Dunck <jdu...@gmail.com> wrote:
>> This won't work, because deferred fields are descriptors, and
>> accessing foo.field would run the query.
>>
>> Something you could do is foo.deferred_fields.field_name -> Boolean,
>> but that seems pretty clunky to me.
>
> You can get at this information now if you really need to::
>
>    >>> e = Entry.objects.defer('body')[0]
>    >>> [f.attname for f in e._meta.fields if f.attname not in e.__dict__]
>    ['body']
>

A better approach is, IMO:

>>> [f.name for f in e._meta.fields if isinstance(e.__class__.__dict__.get(f.attname), DeferredAttribute)]
["body"]

since it more accurately expresses what you're trying to do (also,
it's crazy longer and looks way more serious).
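
The two checks answer slightly different questions, which a plain-Python sketch can show. Everything here is hypothetical: `DeferredField` stands in for the real descriptor class, and `field_names` stands in for `_meta.fields`.

```python
class DeferredField:
    """Hypothetical deferred descriptor; loads a placeholder value on access."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        instance.__dict__[self.name] = "loaded"
        return instance.__dict__[self.name]


class Entry:
    field_names = ["headline", "body"]  # stands in for _meta.fields
    body = DeferredField("body")

    def __init__(self, headline):
        self.headline = headline


def unloaded_fields(obj):
    """Instance-level check: which fields have no value stored yet?"""
    return [name for name in obj.field_names if name not in obj.__dict__]


def deferred_fields(obj):
    """Class-level check: which fields are declared as deferred descriptors?"""
    class_dict = type(obj).__dict__
    return [
        name for name in obj.field_names
        if isinstance(class_dict.get(name), DeferredField)
    ]


e = Entry("Hello")
before = (unloaded_fields(e), deferred_fields(e))
e.body  # triggers the load
after = (unloaded_fields(e), deferred_fields(e))
```

In this sketch the instance-level check stops reporting `body` once it has been loaded, while the class-level check keeps reporting it as deferred.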


Alex
> But the point is that deferred fields are an optimization. You
> shouldn't need to know which fields are deferred: you should be adding
> ``defer`` as a last-step optimization once you *know* the fields in
> question aren't needed.
>
> IOW, why do I need to inspect the ``Entry`` to figure out what's
> deferred? I just *told you* what's deferred in the previous line.
>
> Jacob



--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me