Re: RawQuerySet as subquery when used with an __in filter

1,150 views
Skip to first unread message

Anssi Kääriäinen

unread,
Sep 4, 2012, 1:26:19 PM9/4/12
to Django developers
On 30 elo, 11:46, Alex Hill <a...@hill.net.au> wrote:
> Hi, and thanks for making Django.
>
> At the moment, when passing a RawQuerySet to an __in filter, the raw query
> is evaluated and the result passed as a parameter to the filter's SQL. It
> would be nice to have the option of using the raw SQL in a subquery
> instead, for performance reasons.
>
> I think in this case the rationale for acceptinghttps://code.djangoproject.com/ticket/14733applies. Raw queries already
> require the developer to know what they're doing - this just makes them a
> little more useful.
>
> I can see two ways of implementing this. The first is simple: just require
> that the query only select one column, and implement _as_sql on RawQuerySet
> to simply return the SQL and params. A flag passed to raw() could let a
> developer explicitly indicate that the query should be used as a subquery
> instead of evaluated.
>
> The second way is more complicated but nicer: wrap the raw SQL in another
> SELECT, selecting the Model's primary key. The advantage of this method is
> that the raw query can select many columns, and therefore remain useful in
> other contexts, while still working as expected with __in. This could then
> be the default behaviour, with the current behaviour achieved by wrapping
> the query in list() as is the case for regular queries.
>
> I much prefer the second approach, but I don't think I'm familiar enough
> with the ORM to implement it without a little guidance from someone more
> knowledgeable.
>
> I'd love to hear some feedback on this idea!

Some possible issues with the second approach:
1. It should not be assumed that the primary key is the field needed
in the inner query. We do have foreign keys to other fields, and one
can use this approach without foreign keys at all.
2. If the query is of the form NOT IN, then we need to also filter
out null values, otherwise no results are returned if the raw query
contains a single null (NOT IN is weird...)
3. Does the wrapping of the subquery into (select field from
(raw_query) where field is not null) cause performance problems? It
could be possible that this prevents the DB's optimizer from working
correctly.
4. If the query doesn't select the required column (that is, it is
deferred), we don't have any way to know that. This seems like a
documentation issue, but worth mentioning.

It would be useful to have this feature, so a +1 to this idea. The
above issues do not seem too severe, so I believe it is possible to
make this work.

- Anssi

Alex Hill

unread,
Sep 6, 2012, 4:22:03 AM9/6/12
to django-d...@googlegroups.com
Hi Anssi,

Thanks for your feedback! Responses inline.


  1. It should not be assumed that the primary key is the field needed
in the inner query. We do have foreign keys to other fields, and one
can use this approach without foreign keys at all.

Hmm, yes. Thinking about it, this seems like the biggest obstacle to implementing the query-wrapping approach.

The reason I proposed just selecting the PK was to mirror the behaviour of QuerySet when passed to __in, as happens here:


But since we're talking about raw SQL, we'd want maximum flexibility - so would we need to introduce a ValuesRawQuerySet in order to select arbitrary columns? It seems that would be the approach that most closely mirrors QuerySet.

  2. If the query is of the form NOT IN, then we need to also filter
out null values, otherwise no results are returned if the raw query
contains a single null (NOT IN is weird...)

Yep.
 
  3. Does the wrapping of the subquery into (select field from
(raw_query) where field is not null) cause performance problems? It
could be possible that this prevents the DB's optimizer from working
correctly.

This would need more testing, but Postgres at least optimises the inner query away nicely. These two queries have the same execution plan:

SELECT ... WHERE foo NOT IN (SELECT foo FROM (SELECT * FROM table WHERE ...) WHERE foo IS NOT NULL)
SELECT ... WHERE foo NOT IN (SELECT foo FROM table WHERE ... AND foo IS NOT NULL)
 
Same goes for the equivalent IN queries.

  4. If the query doesn't select the required column (that is, it is
deferred), we don't have any way to know that. This seems like a
documentation issue, but worth mentioning.

Yes, definitely. The docs page where .raw() is introduced will have to explain this feature with a big warning.
 
It would be useful to have this feature, so a +1 to this idea. The
above issues do not seem too severe, so I believe it is possible to
make this work.

Me too!

Thanks,
Alex

Anssi Kääriäinen

unread,
Sep 7, 2012, 5:34:08 AM9/7/12
to Django developers
On 6 syys, 11:22, Alex Hill <a...@hill.net.au> wrote:
> Hi Anssi,
>
> Thanks for your feedback! Responses inline.
>
>   1. It should not be assumed that the primary key is the field needed
>
> > in the inner query. We do have foreign keys to other fields, and one
> > can use this approach without foreign keys at all.
>
> Hmm, yes. Thinking about it, this seems like the biggest obstacle to
> implementing the query-wrapping approach.
>
> The reason I proposed just selecting the PK was to mirror the behaviour of
> QuerySet when passed to __in, as happens here:
>
> https://github.com/django/django/blob/master/django/db/models/query.p...
>
> But since we're talking about raw SQL, we'd want maximum flexibility - so
> would we need to introduce a ValuesRawQuerySet in order to select arbitrary
> columns? It seems that would be the approach that most closely mirrors
> QuerySet.

The ValuesRawQuerySet might be a good idea. Granted, the use cases for
it are hard to come by apart of support for subqueries... One might
say we have implemented a really fancy way to do cursor.execute();
cursor.fetchall()... :)

Another way is to have a pre_subselect hook, and wrap the raw SQL with
the right column selected automatically using that.

- Anssi

Alex Hill

unread,
Dec 13, 2013, 1:35:09 AM12/13/13
to django-d...@googlegroups.com
Hi Anssi,

Getting back to this, a year later...I've created a branch on my fork and opened a ticket on Trac.

My feature branch: https://github.com/AlexHill/django/compare/master...raw_subqueries

Ticket on Trac, with more responses based on your comments above: https://code.djangoproject.com/ticket/21604

Would love to get some feedback.

Cheers,
Alex

Alex Hill

unread,
Dec 17, 2013, 8:48:15 PM12/17/13
to django-d...@googlegroups.com
Hi all,

In implementing this I ran into a bug in the way SQLite returns column information: https://code.djangoproject.com/ticket/21603

It affects raw queries in general, but this feature in particular due to the way the code generate nested SQL queries.

Cheers,
Alex
Reply all
Reply to author
Forward
0 new messages