Getting at order_by column values easily in queryset-refactor?

9 views
Skip to first unread message

Matt Hoskins

unread,
Mar 3, 2008, 5:30:14 AM3/3/08
to Django users
In my application I have a component for displaying paged sorted
tables of objects. Sometimes it's relevant for a column to show
information from a related object. In some cases it's relevant for the
column to show information from related objects where the related
objects are related via a many to many. That's easy enough to do. It's
desirable to be able to sort on such columns. When not sorting on that
column the column will show the names of all the related objects for
the given row. When sorting on that column it's desirable to show the
name of the related object that sorts to that position.

So if you have Contract A and Contract B with Contract A linking to
Supplier 2 and Contract B linking to Supplier 1 and Supplier 3 the
desired results to display (when sorting by supplier) would be:
Contract B | Supplier 1
Contract A | Supplier 2
Contract B | Supplier 3

(when sorting on contract name you'd just see one row for Contract B

Now with queryset_refactor I can order by the supplier names no
trouble, and I get the correct ordering of contracts in the queryset,
but I don't think there's a clean way for me to get at the values from
the order_by when using a queryset returning objects without using
extras() to do the order_by and include it as a select extra (or use
extras() to just specify the select using the sql table name under the
assumption that it will be added in to fulfill the order_by).

Possible extra features in QuerySet that could provide me with a
tidier way of getting at it would be to allow the "select" part of
extras to specify supplier__name style references as well as just raw
SQL, or to just allow the "select" part of extras to specify a
supplier__name reference if it's in the order_by too then you get that
order_by back. An alternative is some way of saying "I want you to add
an attribute called order_bys to the objects in the returned set which
contain the order_by values". So then on each object from the queryset
I'd have .order_bys which would be, say, {'suppliers_name':'Supplier
1'}

Mat

unread,
Mar 3, 2008, 6:07:58 AM3/3/08
to django...@googlegroups.com
Hi

New to django, and I was wondering if there is anyway of adding custom
fields to M2M relationships in the model?

For example I have two tables source and destinations, and a M2M mapping
between them. I would like to add a order field (int) to show which order to
list the destinations in, and this should be able to vary on a source by
source basis.

Any thoughts?

Thanks in advance.

Mat

Russell Keith-Magee

unread,
Mar 3, 2008, 6:13:26 AM3/3/08
to django...@googlegroups.com

A search of the archives should reveal at least two significant alternatives:

http://www.djangoproject.com/documentation/models/m2m_intermediary/

http://code.djangoproject.com/ticket/6095

Yours,
Russ Magee %-)

Malcolm Tredinnick

unread,
Mar 3, 2008, 7:16:59 AM3/3/08
to django...@googlegroups.com

Personally, I think you're approaching this problem in the wrong
fashion: since you know the ordering when you specify it to the
queryset, just tack it on yourself as an attribute there.

However, if you really want to rummage around in the internals, you
still have access to the query. So queryset.query.order_by (I'm talking
about the queryset-refactor code here. There might be a similar way on
trunk) will give you the ordering that was specified to the query.

Malcolm

--
If it walks out of your refrigerator, LET IT GO!!
http://www.pointy-stick.com/blog/

Malcolm Tredinnick

unread,
Mar 3, 2008, 8:12:48 AM3/3/08
to django...@googlegroups.com

On Mon, 2008-03-03 at 02:30 -0800, Matt Hoskins wrote:
> In my application I have a component for displaying paged sorted
> tables of objects. Sometimes it's relevant for a column to show
> information from a related object. In some cases it's relevant for the
> column to show information from related objects where the related
> objects are related via a many to many. That's easy enough to do. It's
> desirable to be able to sort on such columns. When not sorting on that
> column the column will show the names of all the related objects for
> the given row. When sorting on that column it's desirable to show the
> name of the related object that sorts to that position.
>
> So if you have Contract A and Contract B with Contract A linking to
> Supplier 2 and Contract B linking to Supplier 1 and Supplier 3 the
> desired results to display (when sorting by supplier) would be:
> Contract B | Supplier 1
> Contract A | Supplier 2
> Contract B | Supplier 3
>
> (when sorting on contract name you'd just see one row for Contract B

Having read this again in light of your complaint in #6701, I should
point out that it's not going to work like this if you're always
filtering on Contract. A queryset is going to return one object for each
distinct object if you do the equivalent of Contract.objects.all(), not
on per many-to-many result.

So if you want to order by suppliers like this, you'll need to do a
queryset based on suppliers and pull back their related contract objects
and then you can order on suppliers however you like.

Malcolm

--
I just got lost in thought. It was unfamiliar territory.
http://www.pointy-stick.com/blog/

Matt Hoskins

unread,
Mar 3, 2008, 9:04:02 AM3/3/08
to Django users
> Having read this again in light of your complaint in #6701, I should
> point out that it's not going to work like this if you're always
> filtering on Contract.

Not really a complaint in #6701 - you didn't say at first that you
were going to remove support for m2m order_by, so I was concerned that
in the future people would do m2m order_bys and then hit the issue
that .count() wouldn't match len(queryset). My apologies for the drift
into discussing the meaning of m2m order_by though - I just replied
there 'cos that's where you said up that you didn't think it made
sense for django to do m2m order_by.

> A queryset is going to return one object for each
> distinct object if you do the equivalent of Contract.objects.all(), not
> on per many-to-many result.

So the .distinct() method is going to become redundant in qf-rf
because it will effectively always be the case?

> So if you want to order by suppliers like this, you'll need to do a
> queryset based on suppliers and pull back their related contract objects
> and then you can order on suppliers however you like.

So in the future with qs-rf I won't even be able to achieve this using
on a query of contracts by using .extras() to join in the suppliers
table for the purposes of expanding/sorting the results?

Contract objects are the focus of the results I want, and it's a
queryset of these that I pass to the paginator. Constructing something
to pass to the paginator from the starting point of supplier objects
just seems horribly inefficient and requires a chunk of special case
code for that specific sort rather than the alternative of a tweak to
expand the initial result set. If django isn't going to have the
notion of ordering on an m2m field and .extras() can't be used to work
around this then I think I'll just subclass queryset instead to make
it do what I want.

Matt

Malcolm Tredinnick

unread,
Mar 3, 2008, 9:47:32 AM3/3/08
to django...@googlegroups.com

On Mon, 2008-03-03 at 06:04 -0800, Matt Hoskins wrote:
> > Having read this again in light of your complaint in #6701, I should
> > point out that it's not going to work like this if you're always
> > filtering on Contract.
>
> Not really a complaint in #6701 - you didn't say at first that you
> were going to remove support for m2m order_by, so I was concerned that
> in the future people would do m2m order_bys and then hit the issue
> that .count() wouldn't match len(queryset). My apologies for the drift
> into discussing the meaning of m2m order_by though - I just replied
> there 'cos that's where you said up that you didn't think it made
> sense for django to do m2m order_by.

It's more than a preference and it's very important to realise that. It
is not possible for it to work unless you pick a behaviour arbitrarily.

Suppose you have the following objects:

A : related to x1, x2, x3
B : related to x1 and x3
C : related to nothing.

One possible "ordering" on the related m2m field here is C, B, A: using
a "result set size" ordering. Two others are "C, A, B" and "A, B, C",
using lexicographic ordering, after ordering the m2m set and with two
alternatives because there are two possibilities for how to order the
empty set. Another possibility is "A, B, C" based on doing it all in one
SQL query and picking the first row (so only the first many-to-many
result will have an effect) and where "A" happens to sort before "B" for
some reason. You happen to want A, B, A, A, B with C either first or
last (plus a couple of permutations depending upon how A and B order
themselves).

All of these orderings are possible, although your one is probably the
least logical on the grounds that it changes the result set. The problem
is that none of them are particularly canonically correct.

This isn't some personal preference issue where I think that we could
support it but it isn't worth it. There is no right answer to doing
this. It's simply not well-defined at the logical and relational levels.

> > A queryset is going to return one object for each
> > distinct object if you do the equivalent of Contract.objects.all(), not
> > on per many-to-many result.
>
> So the .distinct() method is going to become redundant in qf-rf
> because it will effectively always be the case?

Not at all. I explicitly said the all() query, which is essentially what
you are doing. It's impossible to automatically tell if distinct() will
be wanted/needed or not, so it cannot be removed for the general
filtering case. There are still going to be cases when a query returns
multiple results unless you use distinct(). However, adjusting the
order_by() fragment will not be any of those. It only orders the result
set (at the SQL level), it does not change the result set.

Queyrset-refactor doesn't introduce new magical powers to querysets.
They will still behave basically the way they did before, only with less
bugs. Your current attempt to order by a many-to-many happened to work
by accident, mostly because I didn't insert all the extra overhead to do
that checking (constructing a query is expensive enough as it is) and I
expected most people would work out that it isn't a well-defined
operation. At the time I punted on whether to enforce it or document it
(keep in mind that the branch isn't finished yet), since I wasn't sure
which way it would pan out. Recently I had to add support for another
code path that should allow us to error out correctly when somebody
tries to add a multi-valued field to the ordering, so we'll probably go
with the error path.

> > So if you want to order by suppliers like this, you'll need to do a
> > queryset based on suppliers and pull back their related contract objects
> > and then you can order on suppliers however you like.
>
> So in the future with qs-rf I won't even be able to achieve this using
> on a query of contracts by using .extras() to join in the suppliers
> table for the purposes of expanding/sorting the results?

The extras() call will work as it does now, except that it has its own
ordering clause, because mixing it with the queryset's ordering was too
fragile in a lot of cases. But using extra() is entirely different form
accessing a many-to-many relation. I suggested one approach to solving
this, which would seem to be the most logical for most situations: if
you're driving off the suppliers, use that as the queryset root, rather
than trying to get multiple results for the contracts. Using extra()
might be a different solution; it depends on your circumstances and
query performance characteristics.

*All* I am trying to point out in this sub-thread is that expecting the
objects that are returned in the result to change because you put in an
order_by() is mistaken. Using order_by() only orders existing results
and it only does so at the SQL level. The fact that it didn't raise an
error when you passed in a many-to-many doesn't mean that should work.
It just confirms that there are many logically undefined things you can
do in programming and raising an error for every single situation isn't
always going to happen. The solution is "don't do that".

Regards,
Malcolm

--
Works better when plugged in.
http://www.pointy-stick.com/blog/

Matt Hoskins

unread,
Mar 3, 2008, 3:24:46 PM3/3/08
to Django users
> It's more than a preference and it's very important to realise that. It
> is not possible for it to work unless you pick a behaviour arbitrarily.
> Suppose you have the following objects:

I don't think I've ever said anything about preferences (apologies if
I had). It is, IMHO, a design choice as to whether to support
something in an API or not. If the semantics of an operation could
have several interpretations no clear correct one then it's a design
choices as to whether to support that operation or not, and then if
deciding to support it which interpretation to go for (with many
external and internal factors feeding into that decision). I'm
certainly not trying to suggest there's anything wrong with your
design choices, I'm just trying to say where I'm coming from and find
out where you're coming from!

>
> A : related to x1, x2, x3
> B : related to x1 and x3
> C : related to nothing.
>
> One possible "ordering" on the related m2m field here is C, B, A: using
> a "result set size" ordering. Two others are "C, A, B" and "A, B, C",
> using lexicographic ordering, after ordering the m2m set and with two
> alternatives because there are two possibilities for how to order the
> empty set. Another possibility is "A, B, C" based on doing it all in one
> SQL query and picking the first row (so only the first many-to-many
> result will have an effect) and where "A" happens to sort before "B" for
> some reason. You happen to want A, B, A, A, B with C either first or
> last (plus a couple of permutations depending upon how A and B order
> themselves).
>
> All of these orderings are possible, although your one is probably the
> least logical on the grounds that it changes the result set. The problem
> is that none of them are particularly canonically correct.

I've never commented on how logical my one is - just that it would be
the most useful to me and I've given an example of why/what I'd use
the behaviour for. The example I've given of how the data would be
presented to the user when they request sorting a data table on such a
multi value column is a pattern I became familiar with many years ago
doing Lotus Notes development work and has been useful as a way of
presenting data to users in the applications I've developed over the
years in other platforms. I think of the ordering behaviour I'm after
as being like a SQL join. In your example above you'd apply all the
filtering to give your final set of primary objects and then because
sorting is wanted on an attribute of the other class of objects you
get the tuples (A,x1),(A,x2),(A,x3),(B,x1),(B,x2),(C,NULL) and you'd
then sort those tuples.

Regarding none of them being particularly canonically correct surely
just means that there's no clear choice, not that none of them could
ever be chosen (no clear choice unless other factors from an outside
context come into play, for example, a common design pattern emerged
in a large number of django applications that would find one of the
orderings more useful than the others - in that case it may be the
"arbitrary" choice is then to implement that and document that as the
behaviour so as to serve the needs of that user base unless there are
still compelling reasons not to).

>
> > > A queryset is going to return one object for each
> > > distinct object if you do the equivalent of Contract.objects.all(), not
> > > on per many-to-many result.
>
> > So the .distinct() method is going to become redundant in qf-rf
> > because it will effectively always be the case?
>
> Not at all. I explicitly said the all() query, which is essentially what
> you are doing. It's impossible to automatically tell if distinct() will
> be wanted/needed or not, so it cannot be removed for the general
> filtering case. There are still going to be cases when a query returns
> multiple results unless you use distinct(). However, adjusting the
> order_by() fragment will not be any of those. It only orders the result
> set (at the SQL level), it does not change the result set.

Probably my inexperience with django - but I wasn't sure if you meant
"any query that starts from .all()" by that which is why I asked.

> Queyrset-refactor doesn't introduce new magical powers to querysets.
> They will still behave basically the way they did before, only with less
> bugs. Your current attempt to order by a many-to-many happened to work
> by accident, mostly because I didn't insert all the extra overhead to do
> that checking (constructing a query is expensive enough as it is) and I
> expected most people would work out that it isn't a well-defined
> operation.

I do appreciate the effort you're putting into queryset-refactor - I'm
just trying to understand what may or may not be possible with it in
this area, and now I know, so that's good! I'm not asking for it to
support my behaviour, and I've certainly not suggested it should have
magical powers.

> accessing a many-to-many relation. I suggested one approach to solving
> this, which would seem to be the most logical for most situations: if
> you're driving off the suppliers, use that as the queryset root, rather
> than trying to get multiple results for the contracts. Using extra()
> might be a different solution; it depends on your circumstances and
> query performance characteristics.

For what I'm doing due to the circumstances, driving off suppliers is
not the way to go, but thanks for the suggestion.

> *All* I am trying to point out in this sub-thread is that expecting the
> objects that are returned in the result to change because you put in an
> order_by() is mistaken. Using order_by() only orders existing results
> and it only does so at the SQL level. The fact that it didn't raise an
> error when you passed in a many-to-many doesn't mean that should work.
> It just confirms that there are many logically undefined things you can
> do in programming and raising an error for every single situation isn't
> always going to happen. The solution is "don't do that".

You pointed out above various possible response to a request to sort
on an m2m, so it's entirely possible for an API user to assume that
one of the might happen if they specify an m2m field. If it then looks
like one of them is actually happening then they could equally infer
it might be supported behaviour. The django documentation isn't
complete so I'm finding I'm having to do some trial and error and also
reading of code to figure out what's supported when doing anything
other than the simple cases in a few areas. This isn't a complaint,
just an observation.

Anyway sounds like the bottom line is specifying an m2m field in an
order_by is not supported by django's queryset as there is no
particularly canonically correct behaviour, which is fair enough and
as I said in my previous reply I'll find another way of getting the
behaviour I want with what is supported (which may involve .extras()
or subclasses or something else).

I appreciate you taking the time to chat this over - we both have
different perspectives and are coming from different places - please
take what I'm saying as me trying to put my perspective rather than
anything else. I think we've probably talked this one to death now :)

Matt
Reply all
Reply to author
Forward
0 new messages