Support Negative Indexing on QuerySets

587 views
Skip to first unread message

Mark Young

unread,
Jul 30, 2013, 4:32:41 PM7/30/13
to django-d...@googlegroups.com
In this support ticket (https://code.djangoproject.com/ticket/13089), the original closer said "This is a long-standing explicit design decision in the ORM, and so gets a wontfix...". What is the reasoning behind this? Why is negative indexing of querysets disallowed?

Thanks!

Florian Apolloner

unread,
Jul 30, 2013, 5:06:51 PM7/30/13
to django-d...@googlegroups.com
Hi Mark,

How do you think such support would look like? For negative indices you'd have to know the size of the resultset to be able to do "limit somthing offset length-your_negative_index" -- this doesn't seem to make any sense for an ORM. You can always do list(qs)]:-1] though…

Cheers,
Florian

Andre Terra

unread,
Jul 30, 2013, 5:34:18 PM7/30/13
to django-d...@googlegroups.com
On Tue, Jul 30, 2013 at 6:06 PM, Florian Apolloner <f.apo...@gmail.com> wrote:
You can always do list(qs)]:-1] though…

Although you really shouldn't [1].

As for the reasons for disallowing negative indexes, dcramer's comment in the ticket makes it clear: there is no way to infer what the last item in a query would be, except if you order it descendingly. For what is worth, production code should never rely on any kind of indexing that's not accompanied by an explicit order-by clause, as the default ordering is unrealiable -- at least in PostgreSQL[2], and I assume in other vendors as well[3].

In fact, one might even argue that it would make more sense to disallow any sort of indexing on querysets lacking an explicit .order_by(), in order to enforce best practices, although I would find that a little too strict. The specific one-off case of retrieving a single row for testing/developing/debugging would be best served by the use of .first() which orders by the primary_key by default[4], which therefore yields a different query from the one created by queryset indexing[5].

As it currently stands, the API mirrors the experience in writing a raw query insofar as it is *impossible* to retrieve the last row of a query without explicitly defining an order by clause. For this reason, I'm -1 on the proposed change.


Cheers,
AT




Florian Apolloner

unread,
Jul 30, 2013, 5:55:14 PM7/30/13
to django-d...@googlegroups.com
On Tuesday, July 30, 2013 11:34:18 PM UTC+2, Andre Terra wrote:
On Tue, Jul 30, 2013 at 6:06 PM, Florian Apolloner <f.apo...@gmail.com> wrote:
You can always do list(qs)]:-1] though…

Although you really shouldn't [1].

Right, it depends on your usecase; I was just trying to point out other alternatives aside from the ones mentioned on the ticket.

Wim Lewis

unread,
Jul 30, 2013, 6:04:42 PM7/30/13
to django-d...@googlegroups.com

On 30 Jul 2013, at 2:06 PM, Florian Apolloner wrote:
> How do you think such support would look like? For negative indices you'd have to know the size of the resultset to be able to do "limit somthing offset length-your_negative_index" -- this doesn't seem to make any sense for an ORM. You can always do list(qs)]:-1] though…

It seems like the first comment in the ticket answers that question. Django would reverse the sense of the query's ordering clause and use a simple LIMIT.

If there isn't an ordering clause in the query, then I agree it makes no sense to do any indexing other than [0:N].

ubernostrum's comment in the ticket makes it sound as if there is some explicit reasoning behind this, though, that's at least three years old. We can speculate, but does anyone know what that reasoning actually was?


Andre Terra

unread,
Jul 30, 2013, 7:03:31 PM7/30/13
to django-d...@googlegroups.com
On Tue, Jul 30, 2013 at 6:55 PM, Florian Apolloner <f.apo...@gmail.com> wrote:
Right, it depends on your usecase; I was just trying to point out other alternatives aside from the ones mentioned on the ticket.

I'm sorry if I seemed arrogant in my post. I most definitely did not intend it, as I was absolutely sure that you were aware of the implications of wrapping list() around a queryset.

My motivation was solely to highlight that caveat for potential lurkers and future readers visiting this thread that quite probably will not be as experienced as you are.


Best wishes,
AT

Florian Apolloner

unread,
Jul 31, 2013, 2:48:57 AM7/31/13
to django-d...@googlegroups.com


On Wednesday, July 31, 2013 1:03:31 AM UTC+2, Andre Terra wrote:
On Tue, Jul 30, 2013 at 6:55 PM, Florian Apolloner <f.apo...@gmail.com> wrote:
Right, it depends on your usecase; I was just trying to point out other alternatives aside from the ones mentioned on the ticket.

I'm sorry if I seemed arrogant in my post. I most definitely did not intend it, as I was absolutely sure that you were aware of the implications of wrapping list() around a queryset.

No worries, you weren't sry if my wording suggested that.

Florian Apolloner

unread,
Jul 31, 2013, 2:56:21 AM7/31/13
to django-d...@googlegroups.com
Hi Wim,


On Wednesday, July 31, 2013 12:04:42 AM UTC+2, Wim Lewis wrote:
On 30 Jul 2013, at 2:06 PM, Florian Apolloner wrote:
> How do you think such support would look like? For negative indices you'd have to know the size of the resultset to be able to do "limit somthing offset length-your_negative_index" -- this doesn't seem to make any sense for an ORM. You can always do list(qs)]:-1] though…

It seems like the first comment in the ticket answers that question. Django would reverse the sense of the query's ordering clause and use a simple LIMIT.

In my opinion it doesn't; eg imagine the following as query: MyModel.objects.order_by('whatever')[0:-50]; this isn't translateable into MyModel.objects.order_by('-whatever')[50:] (the issue here is that the end is now again undefined) since at least SQLite doesn't support an OFFSET clause without a LIMIT. Also I think it's more natural to rewrite the ordering of the query yourself to express that instead of using negative ranges.


If there isn't an ordering clause in the query, then I agree it makes no sense to do any indexing other than [0:N].

In that case it's even debatable if limiting makes any sense at all ;)

Cheers,
Florian

Daniele Procida

unread,
Jul 31, 2013, 5:54:02 AM7/31/13
to django-d...@googlegroups.com
On Tue, Jul 30, 2013, Andre Terra <andre...@gmail.com> wrote:

>As for the reasons for disallowing negative indexes, dcramer's comment in
>the ticket makes it clear: there is no way to infer what the last item in a
>query would be, except if you order it descendingly. For what is worth,
>production code should never rely on any kind of indexing that's not
>accompanied by an explicit order-by clause, as the default ordering is
>unrealiable -- at least in PostgreSQL[2], and I assume in other vendors as
>well[3].

By "an explicit order-by clause" I presume you include the "ordering" attribute on the model?

Daniele

Simon Riggs

unread,
Jul 31, 2013, 8:01:56 AM7/31/13
to django-developers
On 31 July 2013 07:56, Florian Apolloner <f.apo...@gmail.com> wrote:
> Hi Wim,
>
>
> On Wednesday, July 31, 2013 12:04:42 AM UTC+2, Wim Lewis wrote:
>>
>> On 30 Jul 2013, at 2:06 PM, Florian Apolloner wrote:
>> > How do you think such support would look like? For negative indices
>> > you'd have to know the size of the resultset to be able to do "limit
>> > somthing offset length-your_negative_index" -- this doesn't seem to make any
>> > sense for an ORM. You can always do list(qs)]:-1] though…
>>
>> It seems like the first comment in the ticket answers that question.
>> Django would reverse the sense of the query's ordering clause and use a
>> simple LIMIT.
>
>
> In my opinion it doesn't; eg imagine the following as query:
> MyModel.objects.order_by('whatever')[0:-50]; this isn't translateable into
> MyModel.objects.order_by('-whatever')[50:] (the issue here is that the end
> is now again undefined) since at least SQLite doesn't support an OFFSET
> clause without a LIMIT. Also I think it's more natural to rewrite the
> ordering of the query yourself to express that instead of using negative
> ranges.

At first glance, there's no important difference in offsets from one
end of an array or the other.

ORDER BY x ASC LIMIT 10 gives us the first 10 results in the ordered
list of result rows.
ORDER BY x DESC LIMIT 10 gives us the last 10 results in the ordered
list of result rows.

except that the overall ordering of output rows is different. So there
*is* a qualitative difference in negative indexing and I can see why
people think they want it. But there are a few problems since SQL
Standard does not provide a mechanism for this nor are any DBMS I'm
aware of tuned for that type of request. I understand why that is,
since "scroll all the way to end of result set" type logic isn't too
smart with arbitrary result set sizes that we see on databases,
whereas with Python array sizes are bounded much more significantly.

To implement this just in Django could be done in the following way.

If you want to see the data in one ordering (ASC), yet define the
LIMIT in another ordering you can either
1) retrieve via ASC: scroll all the way to the end of the result set,
then step back $LIMIT and read from there.
2) retrieve via DESC, apply LIMIT and then re-sort by ASC using two queries
3) retrieve via DESC, apply LIMIT and then re-sort by ASC using derived table
4) implement core logic for that into PostgreSQL

(1) would be very expensive with large result sets and should be avoided

(2) would optimise better but is hard to define generically.

(3) would require two sorts on the data like this

SELECT * FROM (rest-of-query ORDER BY ... DESC LIMIT n ) ORDER BY ... ASC

(4) though I'm not sure there'd be much interest in a not-very-useful
and non-standard SQL feature that would be non-trivial to implement


Django could support negative indexes, but it would require some
complex SQL that might not optimise well so it seems worth avoiding.

The order_by('-whatever') doesn't optimise well since this is a
function on the whatever column, which will defeat any attempt to
utilise an index to supply the ordering. i.e. it results in a
sequential scan and should be avoided.

I'd suggest an explicit option to change the ordering ASC | DESC,
which is matched by a SQL Standard feature.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andre Terra

unread,
Jul 31, 2013, 10:02:34 AM7/31/13
to django-d...@googlegroups.com
Indeed. I include in that definition anything that causes an ORDER BY clause to be inserted in the final SQL query.


Cheers,
AT

Tom Evans

unread,
Jul 31, 2013, 11:16:49 AM7/31/13
to django-d...@googlegroups.com
On Tue, Jul 30, 2013 at 11:04 PM, Wim Lewis <wi...@omnigroup.com> wrote:
>
> On 30 Jul 2013, at 2:06 PM, Florian Apolloner wrote:
>> How do you think such support would look like? For negative indices you'd have to know the size of the resultset to be able to do "limit somthing offset length-your_negative_index" -- this doesn't seem to make any sense for an ORM. You can always do list(qs)]:-1] though…
>
> It seems like the first comment in the ticket answers that question. Django would reverse the sense of the query's ordering clause and use a simple LIMIT.
>

What would it do if the query had already been evaluated? IE, how many
queries does this run?

qs = Model.objects.filter(…).order_by(…)
print qs[0]
print qs[-1]
print qs[0]

If it is as simple as reversing the order by and negating the index,
then the programmer themselves can do this, there is no need for
magic. Having code that looks like it is a simple operation, but is in
fact changing the logic of when queries are evaluated is evil in my
opinion.

-1

Cheers

Tom

Loic Bistuer

unread,
Jul 31, 2013, 2:39:15 PM7/31/13
to django-d...@googlegroups.com
In your example "print qs[0]" evaluates a *clone* of "qs", not "qs" itself.

Therefore "qs[0]; qs[-1]; qs[0]" triggers 3 queries, just like "qs[0]; qs[0]; qs[0]" would.

Now, if you really evaluate your "qs", for example by doing "list(qs)", then further slicing/indexing on "qs" would operate on the cached result, which internally is a plain Python list.

I've put together a quick and dirty proof of concept: https://github.com/loic/django/compare/ticket13089.

--
Loic

Richard Laager

unread,
Jul 31, 2013, 2:58:53 PM7/31/13
to django-d...@googlegroups.com
On Wed, 2013-07-31 at 13:01 +0100, Simon Riggs wrote:
> 3) retrieve via DESC, apply LIMIT and then re-sort by ASC using derived table
...
> (3) would require two sorts on the data like this
>
> SELECT * FROM (rest-of-query ORDER BY ... DESC LIMIT n ) ORDER BY ... ASC

I haven't followed this thread closely, but in case this is relevant...
The MSSQL plugin (at least for old versions of SQL Server) uses*
something this already. For example, in the following scenario
(extraneous columns omitted):

In [4]: User.objects.order_by('username')[20:30]
Out[4]: Executing SQL:
SELECT *
FROM (
SELECT TOP 10 *
FROM (
SELECT TOP 30 [users].[id], [users].[username] AS OrdAlias1 FROM [users]
ORDER BY OrdAlias1 ASC
) AS [users]
ORDER BY OrdAlias1 DESC
) AS [users]
ORDER BY OrdAlias1 ASC

* I didn't test the latest version of it or Django.

--
Richard
signature.asc

Ian Kelly

unread,
Jul 31, 2013, 3:09:29 PM7/31/13
to django-d...@googlegroups.com
On Tue, Jul 30, 2013 at 4:04 PM, Wim Lewis <wi...@omnigroup.com> wrote:
>
> On 30 Jul 2013, at 2:06 PM, Florian Apolloner wrote:
>> How do you think such support would look like? For negative indices you'd have to know the size of the resultset to be able to do "limit somthing offset length-your_negative_index" -- this doesn't seem to make any sense for an ORM. You can always do list(qs)]:-1] though…
>
> It seems like the first comment in the ticket answers that question. Django would reverse the sense of the query's ordering clause and use a simple LIMIT.

This would only work if the ordering specified were total. If it were
not a total ordering, then I believe there is no guarantee that the
DESC order will simply be the reverse of the ASC order.

Michael Manfre

unread,
Jul 31, 2013, 3:30:57 PM7/31/13
to django-d...@googlegroups.com
Is that from django-pyodbc or a very old version of django-mssql?

The recent versions of django-mssql use SELECT ROW_NUMBER() OVER (...) to avoid some of the issues with flipping the order multiple times. The biggest issue is that order is not guaranteed without explicitly specifying the order for all columns.

Regards,
Michael Manfre

Tom Evans

unread,
Aug 1, 2013, 5:05:07 AM8/1/13
to django-d...@googlegroups.com
On Wed, Jul 31, 2013 at 7:39 PM, Loic Bistuer <loic.b...@sixmedia.com> wrote:
> In your example "print qs[0]" evaluates a *clone* of "qs", not "qs" itself.
>
> Therefore "qs[0]; qs[-1]; qs[0]" triggers 3 queries, just like "qs[0]; qs[0]; qs[0]" would.
>

Fine, be pedantic:

qs = ...
print len(qs)
print qs[0]
print qs[0]

This is one query.

qs = ...
print len(qs)
print qs[0]
print qs[-1]
print qs[0]

How many queries for this?

Loic Bistuer

unread,
Aug 1, 2013, 5:44:48 AM8/1/13
to django-d...@googlegroups.com
On Aug 1, 2013, at 4:05 PM, Tom Evans <teva...@googlemail.com> wrote:

> qs = ...
> print len(qs)
> print qs[0]
> print qs[-1]
> print qs[0]
>
> How many queries for this?

Just one and "qs[-1]" will return the last element of the cached result.

I'm not trying to be pedantic, I'm just pointing out that a queryset becomes a different beast once it has been evaluated; it's basically a simple list of cached result.

--
Loic

Tom Evans

unread,
Aug 1, 2013, 7:59:11 AM8/1/13
to django-d...@googlegroups.com
Yes you are right, I was mistaken in thinking that indexing would
evaluate the queryset if not evaluated - I think it should tbh,
equating qs[10] to either qs[10:11].get() or object_cache[10] does not
seem right, but impossible to change existing accepted behaviour.

I do not like that the behaviour of qs[10] changes whether the qs is
evaluated or not, or that iterating through a queryset by index could
be woefully slow unless the developer explicitly evaluates the
queryset beforehand - and if you're a developer who thinks that it is
right to iterate through a queryset using an index, you probably would
not be aware of the need to evaluate it first.

Cheers

Tom
Reply all
Reply to author
Forward
0 new messages