Ticket #7231, join in extra for querysets ?

40 views
Skip to first unread message

Matthieu Rigal

unread,
Mar 10, 2014, 6:14:03 PM3/10/14
to django...@googlegroups.com
Hi guys,

I wanted to talk about the ticket https://code.djangoproject.com/ticket/7231 in the django-dev forum, but my contribution was just deleted, so I'm posting it here to at least get a chance for a discussion.

One argument of the core devs in the ticket is that extra is a bad thing to have, only there because it was there before raw() came. I think that this is wrong, for example when you want to have all the power of the filters and the sortings but still be able to make complex queries that the ORM can't handle in a decent time.

And I think that the ability to add join to extra would be a great plus. For example in the admin, you may want to display the number of foreign keys for each row, and eventually, two different counts. Two counts as annotate returns wrong results, where as the possibility to add something like the following resolves the problem and reduces dramatically the query time (by factor 120 in my case).

LEFT OUTER JOIN (SELECT COUNT(*) AS "devices_count", "accounts_venuedevice"."venue_id" FROM "accounts_venuedevice" WHERE "accounts_venuedevice"."enabled"=true GROUP BY "accounts_venuedevice"."venue_id" ) AS "enabled_devices" ON ( "accounts_venue"."id" = "enabled_devices"."venue_id") 

On the other hand, having to add the filters and the sorting by hand to the RawQuerySet is a pain in the ass.

Even if extra joins shouldn't be used for production code, it could be very helpful for some edge-cases like for the admin interface.

Best,
Matthieu

Russell Keith-Magee

unread,
Mar 11, 2014, 12:16:19 AM3/11/14
to Django Users
On Tue, Mar 11, 2014 at 2:14 AM, Matthieu Rigal <matt....@googlemail.com> wrote:
Hi guys,

I wanted to talk about the ticket https://code.djangoproject.com/ticket/7231 in the django-dev forum, but my contribution was just deleted, so I'm posting it here to at least get a chance for a discussion.

This statement is a little concerning. We're not in the habit of deleting contributions to django-dev, and we certainly don't censor posts that we don't like (which is what it sounds like we're being accused of). What are you basing your assertion on? 

Is it just that you posted a message, and it didn't appear in the forum? If that's the case, I should let you know that the group is set to "moderate first post" - so your first post to django-dev may take a few hours to get posted to everyone else. This is done as a spam prevention measure.
 
One argument of the core devs in the ticket is that extra is a bad thing to have, only there because it was there before raw() came. I think that this is wrong, for example when you want to have all the power of the filters and the sortings but still be able to make complex queries that the ORM can't handle in a decent time.

And I think that the ability to add join to extra would be a great plus. For example in the admin, you may want to display the number of foreign keys for each row, and eventually, two different counts. Two counts as annotate returns wrong results, where as the possibility to add something like the following resolves the problem and reduces dramatically the query time (by factor 120 in my case).

LEFT OUTER JOIN (SELECT COUNT(*) AS "devices_count", "accounts_venuedevice"."venue_id" FROM "accounts_venuedevice" WHERE "accounts_venuedevice"."enabled"=true GROUP BY "accounts_venuedevice"."venue_id" ) AS "enabled_devices" ON ( "accounts_venue"."id" = "enabled_devices"."venue_id") 

On the other hand, having to add the filters and the sorting by hand to the RawQuerySet is a pain in the ass.

Even if extra joins shouldn't be used for production code, it could be very helpful for some edge-cases like for the admin interface.

The short answer to this -- if you think this is easy, go ahead and produce a patch. There's a reason we haven't done this.

The longer answer:

The core team's position on extra() doesn't exist because we've got some sort of theoretical purity that we want to maintain. It's come as a result of *many* years wrestling with a concept that, at it's core, is badly formed. extra() has been the cause of more headaches and sleepless nights on my part than I care to count. If I could kill it with fire, I would.

As I indicated in a different thread about Marc Tamlyn's Kickstarter project -- Django's ORM is very deliberately *not* a SQL generation engine. When  we talk about aggregates, we don't talk about "GROUP BY" clauses - we are expressing the idea of building an average/sum/etc over a group of objects. When we talk about related objects, we talk about filtering related object, not the construction of JOIN clauses.

This is a deliberate decision. Django's ORM is designed to make 90% of common SQL queries easy to generate. It also had the original idea that it should be possible to express *non* SQL queries as well - although this hasn't actually manifested in practice. 

Django's ORM *does not* have "join", "group by", "having" or other SQL-specific API endpoints -- and it won't gain them without a *massive* shift occurring in the fundamental design of the ORM.

I'm not saying that JOIN etc clauses aren't useful - I'm saying it's a deliberate design decision of Django's ORM to *not* support that use case. 

If you're looking at a problem, and thinking "I need a LEFT OUTER JOIN", then the answer to your problem is raw SQL, not to introduce SQL concepts into Django's ORM. Alternatively, use an ORM that *does* give first-class representation of SQL concepts, like SQLAlchemy. 

Yours,
Russ Magee %-)

Matthieu Rigal

unread,
Mar 11, 2014, 12:42:50 PM3/11/14
to django...@googlegroups.com
Hi Russel,


On Tuesday, March 11, 2014 1:16:19 AM UTC+1, Russell Keith-Magee wrote:

On Tue, Mar 11, 2014 at 2:14 AM, Matthieu Rigal <matt....@googlemail.com> wrote:
Hi guys,

I wanted to talk about the ticket https://code.djangoproject.com/ticket/7231 in the django-dev forum, but my contribution was just deleted, so I'm posting it here to at least get a chance for a discussion.

This statement is a little concerning. We're not in the habit of deleting contributions to django-dev, and we certainly don't censor posts that we don't like (which is what it sounds like we're being accused of). What are you basing your assertion on? 

Is it just that you posted a message, and it didn't appear in the forum? If that's the case, I should let you know that the group is set to "moderate first post" - so your first post to django-dev may take a few hours to get posted to everyone else. This is done as a spam prevention measure.

Excuse my message and forget this sentence, I was introduced in error by Google Group. I know the forum is moderated and this is fine ! It is just than after some time doing a refresh of the conversation, I could read a message "This post was deleted"... Sorry for the wrong assertion
 
 
One argument of the core devs in the ticket is that extra is a bad thing to have, only there because it was there before raw() came. I think that this is wrong, for example when you want to have all the power of the filters and the sortings but still be able to make complex queries that the ORM can't handle in a decent time.

And I think that the ability to add join to extra would be a great plus. For example in the admin, you may want to display the number of foreign keys for each row, and eventually, two different counts. Two counts as annotate returns wrong results, where as the possibility to add something like the following resolves the problem and reduces dramatically the query time (by factor 120 in my case).

LEFT OUTER JOIN (SELECT COUNT(*) AS "devices_count", "accounts_venuedevice"."venue_id" FROM "accounts_venuedevice" WHERE "accounts_venuedevice"."enabled"=true GROUP BY "accounts_venuedevice"."venue_id" ) AS "enabled_devices" ON ( "accounts_venue"."id" = "enabled_devices"."venue_id") 

On the other hand, having to add the filters and the sorting by hand to the RawQuerySet is a pain in the ass.

Even if extra joins shouldn't be used for production code, it could be very helpful for some edge-cases like for the admin interface.

The short answer to this -- if you think this is easy, go ahead and produce a patch. There's a reason we haven't done this.

The longer answer:

The core team's position on extra() doesn't exist because we've got some sort of theoretical purity that we want to maintain. It's come as a result of *many* years wrestling with a concept that, at it's core, is badly formed. extra() has been the cause of more headaches and sleepless nights on my part than I care to count. If I could kill it with fire, I would.

As I indicated in a different thread about Marc Tamlyn's Kickstarter project -- Django's ORM is very deliberately *not* a SQL generation engine. When  we talk about aggregates, we don't talk about "GROUP BY" clauses - we are expressing the idea of building an average/sum/etc over a group of objects. When we talk about related objects, we talk about filtering related object, not the construction of JOIN clauses.

This is a deliberate decision. Django's ORM is designed to make 90% of common SQL queries easy to generate. It also had the original idea that it should be possible to express *non* SQL queries as well - although this hasn't actually manifested in practice. 

Django's ORM *does not* have "join", "group by", "having" or other SQL-specific API endpoints -- and it won't gain them without a *massive* shift occurring in the fundamental design of the ORM.

I'm not saying that JOIN etc clauses aren't useful - I'm saying it's a deliberate design decision of Django's ORM to *not* support that use case. 

If you're looking at a problem, and thinking "I need a LEFT OUTER JOIN", then the answer to your problem is raw SQL, not to introduce SQL concepts into Django's ORM. Alternatively, use an ORM that *does* give first-class representation of SQL concepts, like SQLAlchemy. 

Yours,
Russ Magee %-)


I understand and respect your point. I also know a little bit how many years you have spent on these subjects, I am following Django since a while ;-)

Nevertheless, I think that my usecase, of being willing to add foreign keys counts to querysets for example, is a valid and rather common usecase. And I would rather tend to think that it should then be integrated somehow in Django, without aiming to do a copy of SQLAlchemy. The possibilities are many, as in Josh answer on the dev group, I don't know which one is correct, I thought that adding join possibility to extra was a rather easy one...

Regards,
Matthieu

Russell Keith-Magee

unread,
Mar 11, 2014, 11:21:53 PM3/11/14
to Django Users
I'm glad you think that. :-) Like I said earlier - if you think this is easy, *you* write the patch. I'll see you back here in a couple of years. :-)

Take it from someone who has been there - it *isn't* simple. 

However, you've also changed your question. There's a big difference between saying "I want LEFT OUTER JOIN", and saying "I want to add foreign key counts to query sets". The former is an attempt to put SQL into the ORM. The latter is an abstract concept. The latter is the sort of idea that *would* get traction for the ORM's API, if you can give an elegant expression for what you're trying to encompass.

Yours,
Russ Magee %-)

Reply all
Reply to author
Forward
0 new messages