future of QuerySet.extra()?

688 views
Skip to first unread message

Tim Graham

unread,
Jul 31, 2015, 1:58:12 PM7/31/15
to Django developers (Contributions to Django itself)
In light of the new expressions API, the idea of deprecating QuerySet.extra() has been informally discussed in IRC and elsewhere. I wonder if there is consensus to mark extra() as "unmaintained" and to suggest filing feature requests for functionality that can be performed through extra() but not through other existing QuerySet methods? There are at least several tickets (examples below) of edge cases that don't work with extra(). It seems like a waste of time to leave these tickets as accepted and to triage new issues with extra() if they won't be fixed.

https://code.djangoproject.com/ticket/24142
https://code.djangoproject.com/ticket/19434
https://code.djangoproject.com/ticket/12890

Marc Tamlyn

unread,
Jul 31, 2015, 2:01:20 PM7/31/15
to django-d...@googlegroups.com
I don't know about unmaintained, but I think there's a consensus that .extra() has a horrible API and we should do away with it eventually. That said I think there are still enough things that can't be done without it at present. A lot fewer now we have expressions, but still some.

I'd be happy to put a moratorium on improving it, but we can't deprecate it yet.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6e1be326-3b17-49ca-accf-03eec5ad41ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andres Osinski

unread,
Jul 31, 2015, 2:01:45 PM7/31/15
to django-d...@googlegroups.com
Would the expressions API be able to define fields that are aggregations of complex foreign relationships? I keep using it when I need fields that are the result of weird aggregations over multiple JOINs with some unusual stuff like CASE fields and subqueries.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6e1be326-3b17-49ca-accf-03eec5ad41ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Andrés Osinski

Collin Anderson

unread,
Jul 31, 2015, 2:07:34 PM7/31/15
to Django developers (Contributions to Django itself), marc....@gmail.com
I wonder if there's a way in the docs we can deprecate it as in "we don't recommend you use it", but not actually schedule it for removal.

Tim Graham

unread,
Jul 31, 2015, 4:00:01 PM7/31/15
to Django developers (Contributions to Django itself), cmawe...@gmail.com
I had in mind a documentation note like this:

Use this method as a last resort


This is an old API that we aim to deprecate at some point in the future. Use it only if you cannot express your query using other queryset methods. If you do need to use it, please file a ticket with your use case so that we can enhance the QuerySet API to allow removing extra(). We are no longer improving or fixing bugs for this method.

Marc Tamlyn

unread,
Jul 31, 2015, 5:13:15 PM7/31/15
to django-d...@googlegroups.com

Christophe Pettus

unread,
Jul 31, 2015, 7:02:36 PM7/31/15
to django-d...@googlegroups.com

Michael Manfre

unread,
Jul 31, 2015, 8:52:05 PM7/31/15
to django-d...@googlegroups.com
As some one who has had to abuse .extra() at times to make mssql things work, I support this approach.


For more options, visit https://groups.google.com/d/optout.



--
GPG Fingerprint: 74DE D158 BAD0 EDF8

Anssi Kääriäinen

unread,
Aug 3, 2015, 3:27:14 AM8/3/15
to Django developers (Contributions to Django itself)
You can annotate raw SQL with expressions. I think the only case that can't be done with expressions is addition of extra tables to the query. I am certain we will get a solution to this too in future releases.

 - Anssi

Anssi Kääriäinen

unread,
Aug 3, 2015, 3:27:49 AM8/3/15
to Django developers (Contributions to Django itself), cmawe...@gmail.com
+1

 - Anssi

Tim Graham

unread,
Aug 3, 2015, 7:07:03 AM8/3/15
to Django developers (Contributions to Django itself)
Is there an example we could give in the docs to point readers in that direction?

Anssi Kääriäinen

unread,
Aug 3, 2015, 9:10:00 AM8/3/15
to django-d...@googlegroups.com
On Mon, Aug 3, 2015 at 2:07 PM, Tim Graham <timog...@gmail.com> wrote:
> Is there an example we could give in the docs to point readers in that
> direction?

There doesn't seem to be direct documentation about RawSQL expression.
The RawSQL is once mentioned in the documentionation, but it
definitely needs a bit more in the database functions documentation.

The usage of RawSQL is really simple:
qs.annotate(val=RawSQL("(select col from sometable where othercol
= %s)", (someparam,)))

This is equivalent to qs.extra(select={'val': "select col from
sometable where othercol = %s"}, select_params=(someparam,))

The main benefit of using RawSQL is that you can set output_field if
needed. The main downside is that if you refer to some table alias of
the qs in the raw sql, then it is possible that Django changes that
alias (for example, when the qs is used as a subquery in yet another
query).

Combining raw SQL with alias relabeling support is possible, too with
the following library: https://github.com/akaariai/django-refsql

- Anssi

Tim Graham

unread,
Aug 3, 2015, 5:09:36 PM8/3/15
to Django developers (Contributions to Django itself)
Thanks! Here's a polished version of that: https://github.com/django/django/pull/5095

Let me know if it makes sense.

Shai Berger

unread,
Aug 3, 2015, 5:10:05 PM8/3/15
to django-d...@googlegroups.com
On Monday 03 August 2015 10:27:14 Anssi Kääriäinen wrote:
> You can annotate raw SQL with expressions. I think the only case that can't
> be done with expressions is addition of extra tables to the query.

Also grouping over non-relations, if I am not mistaken.

Shai.

Josh Smeaton

unread,
Aug 3, 2015, 8:28:03 PM8/3/15
to Django developers (Contributions to Django itself), sh...@platonix.com
Can you explain/give an example of grouping over non relations? I'll see if I can translate that to expressions in their current form.

Tai Lee

unread,
Aug 4, 2015, 1:06:40 AM8/4/15
to Django developers (Contributions to Django itself)
I think that while `extra()` gets a bad rap, it is extremely useful when creating generic pluggable apps, and I for one would miss it if it were to be removed.

In a comment on #7231 a few years ago I wrote:

I can't reply to the discussion linked on Google Groups anymore, but wanted to add my comment that I have found many uses for .extra() and also hit the limitation of being unable to add left outer joins to a query generated by the ORM. I would not like to see it disappear in favour of raw SQL.

My current use case is for joining reverse generic relations on queries against a model that does not have a GenericRelation field back to the generic model (because I don't know which models it will be used with -- I can't add models to 3rd party code).

To require the whole query in raw SQL in order to join generic relations would mitigate the pluggability of the generic model in question.

While `raw()` may be allow you to get model instances from any custom SQL, it will often be prohibitively complicated or impossible to provide generic queryset methods that extend 3rd party apps. The RawSQL expression example given would appear to execute a subquery for each row of the result set. It wouldn't seem to help with the example above, unless I'm missing something?

Cheers.
Tai.

Anssi Kääriäinen

unread,
Aug 4, 2015, 5:50:47 AM8/4/15
to django-d...@googlegroups.com
On Tue, Aug 4, 2015 at 8:06 AM, Tai Lee <real....@mrmachine.net> wrote:
> I think that while `extra()` gets a bad rap, it is extremely useful when
> creating generic pluggable apps, and I for one would miss it if it were to
> be removed.

The idea is that expressions should do everything that extra() can do.
There might be cases where this is not yet true, but I hope we will
get there in a couple of releases.

The extra() method should not be better than expressions for reusable
apps. Actually the contrary, with expressions you can generate
different SQL for different databases.

> In a comment on #7231 a few years ago I wrote:
>
>> I can't reply to the discussion linked on Google Groups anymore, but
>> wanted to add my comment that I have found many uses for .extra() and also
>> hit the limitation of being unable to add left outer joins to a query
>> generated by the ORM. I would not like to see it disappear in favour of raw
>> SQL.
>>
>>
>> My current use case is for joining reverse generic relations on queries
>> against a model that does not have a GenericRelation field back to the
>> generic model (because I don't know which models it will be used with -- I
>> can't add models to 3rd party code).
>>
>>
>> To require the whole query in raw SQL in order to join generic relations
>> would mitigate the pluggability of the generic model in question.
>
>
> While `raw()` may be allow you to get model instances from any custom SQL,
> it will often be prohibitively complicated or impossible to provide generic
> queryset methods that extend 3rd party apps. The RawSQL expression example
> given would appear to execute a subquery for each row of the result set. It
> wouldn't seem to help with the example above, unless I'm missing something?

What we want is something like
qs.add_relation(new_rel=Relation(SomeModel, on_clause=some_expression,
allow_nulls=True))

The some_expression variable generates sql for the ON clause, and
allow_nulls=True means the join will be LEFT join. Typically
some_expression could be something like Q(id=F('new_rel__other_id')) &
Q(new_rel__lang='fi').

The effect of this would be that the qs has a new relation called
new_rel to SomeModel, and that relation could be used in other query
constructs (for example .filter(new_rel__somef__lte=10)).

Making this work is high priority to me.

- Anssi

Shai Berger

unread,
Aug 4, 2015, 2:10:38 PM8/4/15
to django-d...@googlegroups.com
Hi,

On Tuesday 04 August 2015 03:28:03 Josh Smeaton wrote:
> On Tuesday, 4 August 2015 07:10:05 UTC+10, Shai Berger wrote:
> > On Monday 03 August 2015 10:27:14 Anssi Kääriäinen wrote:
> > > You can annotate raw SQL with expressions. I think the only case that
> > > can't be done with expressions is addition of extra tables to the query.
> >
> > Also grouping over non-relations, if I am not mistaken.
> >
> Can you explain/give an example of grouping over non relations? I'll see if
> I can translate that to expressions in their current form.
>
The classic database aggregation examples involve aggregation over time: Sum
of sales per quarter (which is, itself, a function over date); average
temparature per month; etc. All these require group-by clauses which name
(expressions over) columns which are not FKs.

There are many similar examples involving "enum columns" -- columns with a
limited set of choices, where the choices are not instances of another model.

As far as I am aware, we can still not do these aggregates in the ORM,

Shai.

Anssi Kääriäinen

unread,
Aug 4, 2015, 3:05:25 PM8/4/15
to django-d...@googlegroups.com
On Tuesday, August 4, 2015, Shai Berger <sh...@platonix.com> wrote:
The classic database aggregation examples involve aggregation over time: Sum
of sales per quarter (which is, itself, a function over date); average
temparature per month; etc. All these require group-by clauses which name
(expressions over) columns which are not FKs.


The following should work:

 qs.annotate(quarter=Quarter('sold_date')).values('quarter').annotate(sum=Sum('amount'))

I recall some changes to how the group by expression is generated for expressions. If I recall correctly the group by should have the expression itself now, not the base columns of the expression.

There are many similar examples involving "enum columns" -- columns with a
limited set of choices, where the choices are not instances of another model.

I'm not sure of this one.

 - Anssi

Josh Smeaton

unread,
Aug 4, 2015, 8:09:30 PM8/4/15
to django-d...@googlegroups.com
You're right about that. You can aggregate over custom expressions. For aggregating over enums you should be able to use case expressions

--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/FojuU0syO8Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Adam Johnson

unread,
Aug 6, 2015, 5:06:38 PM8/6/15
to Django developers (Contributions to Django itself)
I've seen extra() misused in cases where developers came from a background of writing raw SQL and then used it rather than figuring out the (often relatively simple) ORM way of doing it. This is then a big maintenance burden, and the harsher warning against its use is a good idea.

Also thanks for documenting RawSQL, didn't realize it was usable until now so I've been able to get rid of some extra() usage in django-mysql.

Adam Johnson

unread,
Aug 21, 2015, 9:33:33 AM8/21/15
to Django developers (Contributions to Django itself)
Actually I spoke too soon, I excitedly misunderstood what RawSQL does/doesn't do. Unfortunately it requires access to a column, which is not what I need. I'm inserting WHERE conditions like /*specialcomment*/1 - the special comment is translated into query rewriting, leaving 1 in the WHERE condition as a no-op. Unfortunately I can't see a way of injecting the raw SQL into the QuerySet using filter without referring to a column (required to make the ORM extension work without affecting queries too badly) - so extra remains the only solution here, unless I'm missing something... any suggestions welcome ( https://github.com/adamchainz/django-mysql/pull/144/files ).
Reply all
Reply to author
Forward
0 new messages