Inconsistent pagination when sorting by non-unique columns (ticket 34251)

197 views
Skip to first unread message

tatari...@gmail.com

unread,
Jan 11, 2023, 7:55:25 AM1/11/23
to Django developers (Contributions to Django itself)
As described in https://code.djangoproject.com/ticket/34251, some database backends (incl. PostgreSQL) will produce non-deterministic ordering when sorting by columns with non-unique values.
This leads to inconsistent pagination, which I argue falls under the same category as paginating by unordered QuerySet. 

I suggest we add a warning similar to the UnorderedObjectListWarning when total deterministic ordering is not specified (PK or unique/unique_together not null fields).

The consequence of this change is that a lot of projects will receive a warning. This issue is easy to overlook since non-unique columns like "name" can have mostly unique values and you need to have duplicates at the page borders to get into trouble. While seemingly hard to reproduce, I believe it impacts a lot of production systems out there.

Correctly handling it requires some thought, especially when exposing sorting options to the end user. Whether 3rd-party tools like django-sortable-listview or DRF (OrderingFilter) should automatically enforce the correct ordering by adding "pk"? This brings problems with indexing like in this ticket. Another option with these tools is adding pk or another unique field to the query string when specifying desired sorting, which looks just wrong to me, pushing responsibility to the front end.

What do you folks think? I've been bitten by this issue several times, in a production environment after smoothly running for a while. The first time was a complete surprise to me, and I believe it's may surprise quite a lot of developers as well, so adding a warning will at least bring the attention. 

Barry Johnson

unread,
Jan 11, 2023, 9:59:41 AM1/11/23
to Django developers (Contributions to Django itself)

I'd be very much -1 if the framework modified my sorting parameters (by adding 'pk' at the end of the ordering).  As mentioned, it can certainly break queries that can be handled by a covering index...  but mostly because 'pk' isn't the field we would prefer to use in this case anyway.  For example, when sorting a list of products, the final parameter ought to be the product's stocking number; when looking at items by store, the final parameter needs to be both stocking number and store.  There wouldn't seem to be -any- meaningful "always right" answer.

We have something like 20,000 tests running today (and will probably hit 40,000 over time), many of which do pagination and sorting.   I'd really, really hate to have to go fix all of those tests -- it would be thousands of them.  If there was a way to disable this proposed warning, I could live with that for a while.

baj
-------------
Barry Johnson

tatari...@gmail.com

unread,
Jan 11, 2023, 10:35:24 AM1/11/23
to Django developers (Contributions to Django itself)
Thanks for the input, Barry!
To be clear - Django already enforces total ordering in the admin panel by adding "pk" (in this method), which makes me believe that's a viable approach.
Sure, there can be domain-specific unique/unique_together fields that ensure consistent ordering as well, that's why it first checks to see if those were used.
What is your solution? Does your frontend/UI provide the "stocking number"/"stocking number and store" as the final parameter(s) or do you somehow inject them in the VIew? 
P.S. I believe `warnings.filterwarnings` can be used to suppress specific warnings. 

Adrian Torres

unread,
Jan 12, 2023, 4:45:25 AM1/12/23
to Django developers (Contributions to Django itself)
I've been bitten by this once before as well but I think it's the kind of mistake you make once and never again, and if you happen to encounter it again you immediately know what it is (duplicated/missing items across pages? must be a non-unique ordering).

I don't think implicitly adding an extra field to order by is a good solution because the user might be totally unaware that such a thing is happening, maybe it could be a setting? e.g. (ENFORCE_TOTAL_ORDERING) and it should **only** be applied in the context of paginated querysets, but I would be more in favor of adding a warning to the pagination documentation.

I don't think warnings would work, because as soon as you get a "false-positive" users would simply filter/ignore the entire warning which would hide actual positives, but I guess it's better than nothing.

To sum up:
-1 on implicit pk ordering by default
+0 on implicit pk ordering via a setting
+0 on warnings
+1 on documentation update

Jörg Breitbart

unread,
Jan 12, 2023, 5:21:55 AM1/12/23
to django-d...@googlegroups.com
We had a similar issue in table views, where users can highly customize
the output (like switching on/off columns, complex filtering and custom
multilevel ordering).

For the ordering issue at hand the solution was to append the pk as n-th
order directive on the queryset, if all previous fields were not unique.

I'm not sure, if this edge case can or should be handled by django
builtin logic, at least for us the condition, whether to or not to
append the pk field was not solely based on uniqueness of fields.

Cheers,
Jerch
Reply all
Reply to author
Forward
0 new messages