ordering m2m query based on through table

30 views
Skip to first unread message

KentH

unread,
May 22, 2012, 12:14:16 AM5/22/12
to django...@googlegroups.com
I am trying to configure a m2m relationship to be ordered based on data in the through table. I assumed an 'order' attribute in the through-table's Meta class would be honored, but it's not. Bug #11850 (closed: wontfix) relates to this problem, but with no pointer as to a work-around.

As custom ordering would seem a prime use case for a custom through class on m2m relations, I'm surprised how hard it is.

In my case, I'm trying to order digital assets (eg sheet music) into a number of folios (collections). Music can appear in several collections. Order is manually specified for each collection. An attribute in a custom through table is a natural solution, but it doesn't work.

Other examples which come to mind: Authors and books, or songs and composers. Since books are alphabetized by lead author, the order matters. And we wouldn't want Django to list "Lennon & McCartney" songs  as "McCartney & Lennon"

Anyway, how is the best way to crack this nut? After studying `db.models.related` I tried building a custom manager on the related class, and override the `get_query_set` method to look for the through attribute on the manager to update the ordered_by clause. No luck -- but with prefetch interaction, this seems pretty fragile in any case.

Thanks for any help.

Kent.

akaariai

unread,
May 22, 2012, 10:34:21 AM5/22/12
to Django users
So, you would want to do collection.songs.all() and have that ordered
by the sequence defined in the m2m table? I agree - this seems to be
somewhat hard currently. You would need to do [t.song for t in
Through.objects.filter(collection=collection).order_by('seq').select_related('song')]
or something like that... It might be nice if the related manager of
collection would have a special lookup 'through', so you could to
collection.songs.all().select_related('through'), or
collection.songs.all().order_by('through__seq'). But it would be
somewhat messy to do that currently. Implementing better custom
lookups could make this easier to implement.

- Anssi

bruno desthuilliers

unread,
May 23, 2012, 5:12:53 AM5/23/12
to Django users
On May 22, 4:34 pm, akaariai <akaar...@gmail.com> wrote:

> So, you would want to do collection.songs.all() and have that ordered
> by the sequence defined in the m2m table? I agree - this seems to be
> somewhat hard currently.


I currently have a similar scheme in one of our projects, works fine
as far as I can tell:


class Categorie(Model):
# ....


class Blook(Model):
# ....
categories = models.ManyToManyField(
Category,
through='Classification',
related_name="blooks",
)

@property
def sorted_categories(self):
return self.categories.order_by("classifications__position")


class Classification(Model):
category = models.ForeignKey(
Category,
related_name="classifications"
)

blook = models.ForeignKey(
Blook,
related_name="classifications"
)

position = models.PositiveIntegerField(
_(u"Position"),
)

class Meta:
unique_together = (
("blook", "category"),
("blook", "position"),
)




akaariai

unread,
May 23, 2012, 5:43:52 AM5/23/12
to Django users
On May 23, 12:12 pm, bruno desthuilliers
I would guess you would get duplicate objects returned. I think the
query works as follows:
- Fetch all Blook's categories
- For those categories you fetch all the classifications (not only
those which are related to Blook) and order by the position in there.
So, you could get the same category multiple times ordered by
positions not related to the blook <-> category connection.

Maybe the above code does work correctly. The interaction between
multiple references to same reverse foreign key relation in single
queryset is somewhat hard to remember. Could you post the generated
SQL?

- Anssi

bruno desthuilliers

unread,
May 23, 2012, 7:36:02 AM5/23/12
to Django users
On May 23, 11:43 am, akaariai <akaar...@gmail.com> wrote:

> I would guess you would get duplicate objects returned.

We programmers are usually very bad at guessing. So when in doubt,
check, don't guess ;)

> I think the
> query works as follows:
>   - Fetch all Blook's categories
>   - For those categories you fetch all the classifications (not only
> those which are related to Blook) and order by the position in there.

Not quite - cf the generated SQL below.

> So, you could get the same category multiple times ordered by
> positions not related to the blook <-> category connection.

That's not the case. Demonstration:

First, check that we do have categories shared by multiples blooks
(else it wouldn't mean much):

>>> for blook in blooks:
... print blook, blook.categories.order_by("id").values_list("id",
flat=True)
...

#139 [533L, 534L, 535L, 536L, 537L, 538L, 539L, 540L, 541L, 542L,
543L, 544L, 545L, 546L, 547L]
#129 [534L, 535L, 537L, 539L, 540L, 542L, 543L, 544L, 545L, 546L,
547L]
#128 [534L]
#126 [533L, 534L, 535L, 536L, 537L, 538L, 539L, 540L, 541L, 542L,
543L, 544L, 545L, 546L, 547L]
#127 [533L, 534L, 535L, 536L, 537L, 538L, 539L, 540L, 541L, 542L,
543L, 544L, 545L, 546L, 547L]
#118 [534L]

Now check that for each blook, we have the same categories whether
sorted or unsorted:

>>> for blook in blooks:
... cats = blook.categories.all()
... scats = blook.sorted_categories
... assert set(cats) == set(scats), "OOPS ??? %s" % blook
...

Well... No AssertionError raised, so it's correct.

> Maybe the above code does work correctly.

It does.

Custom ordering is possibly one of the most common use case for
"through" m2ms, so you can bet it works - else there would have been
quite some criticism about it ;)

> The interaction between
> multiple references to same reverse foreign key relation in single
> queryset is somewhat hard to remember. Could you post the generated
> SQL?

here you go - reformated for readability:

SELECT
`blookcore_category`.`id`,
`blookcore_category`.`label`
FROM
`blookcore_category`
LEFT OUTER JOIN `blookcore_classification`
ON (`blookcore_category`.`id` =
`blookcore_classification`.`category_id`)
WHERE
`blookcore_classification`.`blook_id` = 118
ORDER BY
`blookcore_classification`.`position` ASC


I don't quite get why it uses a left outer join instead of the inner
join used when not adding the order by clause, but writing a working,
usable ORM is not exactly a piece of cake so I won't complain about
this ;). Anyway: the where clause still makes sure we only get the
relevant categories.

HTH

akaariai

unread,
May 23, 2012, 8:36:26 AM5/23/12
to Django users
On May 23, 2:36 pm, bruno desthuilliers
<bruno.desthuilli...@gmail.com> wrote:
> > The interaction between
> > multiple references to same reverse foreign key relation in single
> > queryset is somewhat hard to remember. Could you post the generated
> > SQL?
>
> here you go - reformated for readability:
>
> SELECT
>   `blookcore_category`.`id`,
>   `blookcore_category`.`label`
> FROM
>   `blookcore_category`
>    LEFT OUTER JOIN `blookcore_classification`
>    ON (`blookcore_category`.`id` =
> `blookcore_classification`.`category_id`)
> WHERE
>    `blookcore_classification`.`blook_id` = 118
> ORDER BY
>    `blookcore_classification`.`position` ASC
>
> I don't quite get why it uses a left outer join instead of the inner
> join used when not adding the order by clause, but writing a working,
> usable ORM is not exactly a piece of cake so I won't complain about
> this ;). Anyway: the where clause still makes sure we only get the
> relevant categories.

Right you are about the query producing correct results. I should have
checked that myself.

I guess (again!) that the reason for the LEFT JOIN is this call in sql/
compiler.py:_setup_joins()

self.query.promote_alias_chain(joins,
self.query.alias_map[joins[0]].join_type == self.query.LOUTER)

This seems to promote the join to outer join too aggressively, the
promote_alias_chain call does not see that the join is constrained by
other conditions already. I don't know if this has any practical
effects, nor do I know how easy/hard this would be to fix. It could be
as easy as just starting from the first new join generated by the
_setup_joins() call, and leaving the pre-existing joins alone. If it
is that easy, it should be fixed...

- Anssi

akaariai

unread,
May 23, 2012, 4:33:26 PM5/23/12
to Django users
On May 23, 3:36 pm, akaariai <akaar...@gmail.com> wrote:
> I guess (again!) that the reason for the LEFT JOIN is this call in sql/
> compiler.py:_setup_joins()
>
> self.query.promote_alias_chain(joins,
>      self.query.alias_map[joins[0]].join_type == self.query.LOUTER)
>
> This seems to promote the join to outer join too aggressively, the
> promote_alias_chain call does not see that the join is constrained by
> other conditions already. I don't know if this has any practical
> effects, nor do I know how easy/hard this would be to fix. It could be
> as easy as just starting from the first new join generated by the
> _setup_joins() call, and leaving the pre-existing joins alone. If it
> is that easy, it should be fixed...

I have a fix for the issue mentioned above. I haven't figured out how
to create a query where promotion from INNER JOIN to LEFT OUTER JOIN
will cause wrong results. Fixing this is still worth it, as even if
there isn't a query where the current promotion logic breaks the
results, it is possible there one day will be a way to do that. There
might be performance impacts at least for some databases.

If somebody happens to figure out a query which produces wrong results
due to the INNER -> OUTER promotion it would be good to add a test
case for that.

See this pull request for details: https://github.com/django/django/pull/90

- Anssi

akaariai

unread,
May 24, 2012, 11:54:45 AM5/24/12
to Django users
Reply all
Reply to author
Forward
0 new messages