I've been working my way through a couple of aggregation tickets, and
I've hit a bit of an inconsistency that I think may need to be
addressed. The problem lies in the interaction of values() and
annotate(), especially as compared to the interaction of extra() and
values().
Ordinarily, values() directs the ORM to return the raw values, rather
than an object instance. values('x','y','z') directs the ORM to return
only the named columns x,y, and z.
If the query has an extra clause(), values('x','y','z') will only
return the nominated columns. If an extra select column isn't
mentioned specifically in the values clause, it isn't included in the
result set.
However, if the query contains an annotate() clause, the annotated
value is always returned in the values() set, regardless of whether it
is mentioned in the values clause or not.
Aside from this being inconsistent, it's a little inconvenient when it
interacts with the behaviour introduced in [9701] - the ability to use
querysets as rvalues for __in queries. If you have a query with an
annotate in it, you essentially can't use it as an rvalue, because it
will always have a minimum of 2 columns, which means you can't use a
query with an annotation in an __in clause (which require a single
column).
So - the initial suggestion here is to modify ValuesQuerySet so that
annotations aren't automatically included in the result set - you
would need to specifically mention the annotate column in order for it
to be returned.
There is a slight complication, though. The ordering of values() and
annotate() is significant - values() controls the grouping of
annotated values if it precedes the annotate(). However, you can't
mention the annotated column in a values() clause before it is
specified, so are left in a situation where you can't actually return
the annotated value. For example in the query:
>>> Book.values('title','isbn').annotate(n_authors=Count('authors'))
n_authors wouldn't be returned in the result set, and there isn't any
way to add it to the returned values.
So - some options:
1) Leave things as-is. Annotated columns always appear in the result
set. This is inconsistent with extra(), and means you can't use
annotated queries in __in clauses.
2) Modify things slightly - annotate() calls before a values() call
aren't automatically added to the results, but annotate() calls after
values() are included automatically. This would allow annotations to
always be returned, optionally under those circumstances that allow
(i.e., annotate() before values()).
3) Rely on multiple calls to values() to get annotate columns into the
result set. i.e., the previous example would need to become something
like:
>>> Book.values('title','isbn').annotate(n_authors=Count('authors')).values('title','isbn','n_authors')
This has the potential to be a bit fragile - we would need to put in
all sorts of checks for changes in column lists between successive
calls to values() - but it would allow all combinations of annotation.
Opinions? Any other options I may have missed?
Yours
Russ Magee %-)
So - some options:
1) Leave things as-is. Annotated columns always appear in the result
set. This is inconsistent with extra(), and means you can't use
annotated queries in __in clauses.
2) Modify things slightly - annotate() calls before a values() call
aren't automatically added to the results, but annotate() calls after
values() are included automatically. This would allow annotations to
always be returned, optionally under those circumstances that allow
(i.e., annotate() before values()).
3) Rely on multiple calls to values() to get annotate columns into the
result set. i.e., the previous example would need to become something
like:
>>> Book.values('title','isbn').annotate(n_authors=Count('authors')).values('title','isbn','n_authors')
This has the potential to be a bit fragile - we would need to put in
all sorts of checks for changes in column lists between successive
calls to values() - but it would allow all combinations of annotation.
Opinions? Any other options I may have missed?
Option 2 was my preferred option. The implementation borders on
trivial, and there's only 1 or two test cases from the test suite that
are affected. I just wanted a sanity check to see if I was missing an
obvious option from being to close to the code, or if anyone was
particularly attached to the compulsory inclusion behavior of
annotate().
Yours,
Russ Magee %-)
This is the nub of the problem, isn't it. We possibly messed up here by
overloading what values() does/means. :-(
Problem is, adding a new API to specify the grouping order has the same
drawbacks as having to write values() twice -- you end up needing to
name the fields once for values() and once for the order they group by
and then you think about maybe extracting the latter info from the
former and we're back where we started.
> However, you can't
> mention the annotated column in a values() clause before it is
> specified, so are left in a situation where you can't actually return
> the annotated value. For example in the query:
>
> >>> Book.values('title','isbn').annotate(n_authors=Count('authors'))
>
> n_authors wouldn't be returned in the result set, and there isn't any
> way to add it to the returned values.
>
> So - some options:
>
> 1) Leave things as-is. Annotated columns always appear in the result
> set. This is inconsistent with extra(), and means you can't use
> annotated queries in __in clauses.
Unintuitive and inconsistent. As much as I'm willing to declare extra()
the odd man out for most API consistency arguments and as an example of
solid behaviour, I like the current behaviour with
extra(...).values(...).
>
> 2) Modify things slightly - annotate() calls before a values() call
> aren't automatically added to the results, but annotate() calls after
> values() are included automatically. This would allow annotations to
> always be returned, optionally under those circumstances that allow
> (i.e., annotate() before values()).
This is probably the least intrusive and most supportable option.
>
> 3) Rely on multiple calls to values() to get annotate columns into the
> result set. i.e., the previous example would need to become something
> like:
>
> >>> Book.values('title','isbn').annotate(n_authors=Count('authors')).values('title','isbn','n_authors')
>
> This has the potential to be a bit fragile - we would need to put in
> all sorts of checks for changes in column lists between successive
> calls to values() - but it would allow all combinations of annotation.
No, really... you were just throwing this out to see how it sounded when
it the wall and broke, right? Not a good option.
Regards,
Malcolm
I'm not convinced we messed up - it's a messy situation in general.
The behaviour as implemented is mostly a requirement of the way
values() is interpreted. Once you start reducing the set of columns
that is returned by a queryset, you pretty much have to start
interpreting that as a grouping. The only other option I can think of
is to require that once a queryset contains an aggregate, you continue
to retrieve all the columns, but you only get back the set you
requested, thereby losing all the optimization benefits of values().
If you can propose a better interpretation for values() in these
circumstances, it probably isn't too late to do something about it,
but time is running out fast.
> Problem is, adding a new API to specify the grouping order has the same
> drawbacks as having to write values() twice -- you end up needing to
> name the fields once for values() and once for the order they group by
> and then you think about maybe extracting the latter info from the
> former and we're back where we started.
You also inherit all the validation problems of needing to ensure that
all the values() columns are also mentioned in the group_by(). It
could be munged into some interpretation of "working", but it wouldn't
be pretty as an end user API.
>> However, you can't
>> mention the annotated column in a values() clause before it is
>> specified, so are left in a situation where you can't actually return
>> the annotated value. For example in the query:
>>
>> >>> Book.values('title','isbn').annotate(n_authors=Count('authors'))
>>
>> n_authors wouldn't be returned in the result set, and there isn't any
>> way to add it to the returned values.
>>
>> So - some options:
>>
>> 1) Leave things as-is. Annotated columns always appear in the result
>> set. This is inconsistent with extra(), and means you can't use
>> annotated queries in __in clauses.
>
> Unintuitive and inconsistent. As much as I'm willing to declare extra()
> the odd man out for most API consistency arguments and as an example of
> solid behaviour, I like the current behaviour with
> extra(...).values(...).
Just to clarify - what exactly do you see as the current expected
behaviour of extra(select=).values(...)? I started fiddling with this
last night in the context of ticket #10132, but it doesn't appear (as
far as I can tell) to be specifically documented. When I started
messing about with code, I was somewhat surprised that I didn't get
flooded with test failures.
By way of example, this is what 1.0.X produces:
>>> Book.objects.extra(select={'a':'name','b':'price','c':'pages'}).values('name','pages','a')
[{'a': u'Book 1', 'c': 11, 'b': Decimal("11.11"), 'name': u'Book 1',
'pages': 11}, ...
>>> Book.objects.extra(select={'a':'name','b':'price','c':'pages'}).values('name','pages')
[{'name': u'Book 1', 'pages': 11}, ...
I couldn't find a test case that covers this usage - every use of
extra() that I found intersecting with named values() specifically
included all the extra selected columns in the values() clause. The
behaviour as implemented seems to walk a weird middle ground between
including all extra columns, and only including extra columns if
specifically requested.
To my mind, the second answer is right, but the first result should be:
[{'a': u'Book 1', 'name': u'Book 1', 'pages': 11}, ...
Have I missed something obvious here?
>> 2) Modify things slightly - annotate() calls before a values() call
>> aren't automatically added to the results, but annotate() calls after
>> values() are included automatically. This would allow annotations to
>> always be returned, optionally under those circumstances that allow
>> (i.e., annotate() before values()).
>
> This is probably the least intrusive and most supportable option.
This is pretty much the conclusion I had come to - I just didn't want
to sway anyone else's decision making process.
>> 3) Rely on multiple calls to values() to get annotate columns into the
>> result set. i.e., the previous example would need to become something
>> like:
>>
>> >>> Book.values('title','isbn').annotate(n_authors=Count('authors')).values('title','isbn','n_authors')
>>
>> This has the potential to be a bit fragile - we would need to put in
>> all sorts of checks for changes in column lists between successive
>> calls to values() - but it would allow all combinations of annotation.
>
> No, really... you were just throwing this out to see how it sounded when
> it the wall and broke, right?
Pretty much :-)
> Not a good option.
Completely agreed.
Russ %-)
I can't think of anything different. I tossed this around a bit when I
read your original mail, but improvements in one aspect led to
complications elsewhere. You're right that it's probably just a slightly
hairy area to try and smoothly abstract into an API.
That looks like A Bug(tm). I wouldn't have expected 'b' and 'c' to
appear there, since values() should describe the full set of values
returned (sans any extra behaviour of the annotate() portion).
If you don't have time or feel motivated to poke at that, assign a
ticket to me. I've had a brain failure somewhere along the lines by the
look of it, particularly in light of the next example.
>
> >>> Book.objects.extra(select={'a':'name','b':'price','c':'pages'}).values('name','pages')
> [{'name': u'Book 1', 'pages': 11}, ...
That looks correct. So it's only when you partially include the extra
values. That's cool (but b0rked).
>
> I couldn't find a test case that covers this usage - every use of
> extra() that I found intersecting with named values() specifically
> included all the extra selected columns in the values() clause. The
> behaviour as implemented seems to walk a weird middle ground between
> including all extra columns, and only including extra columns if
> specifically requested.
>
> To my mind, the second answer is right, but the first result should be:
>
> [{'a': u'Book 1', 'name': u'Book 1', 'pages': 11}, ...
>
> Have I missed something obvious here?
Assuming competence on the part of the guy who implemented it? Could be
an over-evaluation. Nice catch, though, thanks.
Regards,
Malcolm
I'm in the area, so I'm happy to look at it. I should be able to bash
out a solution this evening.
Thanks,
Russ %-)