On 30 elo, 11:46, Alex Hill <
a...@hill.net.au> wrote:
> Hi, and thanks for making Django.
>
> At the moment, when passing a RawQuerySet to an __in filter, the raw query
> is evaluated and the result passed as a parameter to the filter's SQL. It
> would be nice to have the option of using the raw SQL in a subquery
> instead, for performance reasons.
>
> I think in this case the rationale for acceptinghttps://
code.djangoproject.com/ticket/14733applies. Raw queries already
> require the developer to know what they're doing - this just makes them a
> little more useful.
>
> I can see two ways of implementing this. The first is simple: just require
> that the query only select one column, and implement _as_sql on RawQuerySet
> to simply return the SQL and params. A flag passed to raw() could let a
> developer explicitly indicate that the query should be used as a subquery
> instead of evaluated.
>
> The second way is more complicated but nicer: wrap the raw SQL in another
> SELECT, selecting the Model's primary key. The advantage of this method is
> that the raw query can select many columns, and therefore remain useful in
> other contexts, while still working as expected with __in. This could then
> be the default behaviour, with the current behaviour achieved by wrapping
> the query in list() as is the case for regular queries.
>
> I much prefer the second approach, but I don't think I'm familiar enough
> with the ORM to implement it without a little guidance from someone more
> knowledgeable.
>
> I'd love to hear some feedback on this idea!
Some possible issues with the second approach:
1. It should not be assumed that the primary key is the field needed
in the inner query. We do have foreign keys to other fields, and one
can use this approach without foreign keys at all.
2. If the query is of the form NOT IN, then we need to also filter
out null values, otherwise no results are returned if the raw query
contains a single null (NOT IN is weird...)
3. Does the wrapping of the subquery into (select field from
(raw_query) where field is not null) cause performance problems? It
could be possible that this prevents the DB's optimizer from working
correctly.
4. If the query doesn't select the required column (that is, it is
deferred), we don't have any way to know that. This seems like a
documentation issue, but worth mentioning.
It would be useful to have this feature, so a +1 to this idea. The
above issues do not seem too severe, so I believe it is possible to
make this work.
- Anssi