|Feature request: Unicode collation algorithm in django||Ed Hagen||2/1/12 11:27 AM|
Let me preface this request by saying that when it comes to django,
I'm an advanced beginner (so this might be a dumb request).
The motivation for my request involved users of a django-based
database of international scholars who wanted their names sorted
"correctly." I explained that different languages sorted characters
differently, and therefore there was no single correct sort order, but
I promised to see if I could easily implement language-specific
orderings. What I found was that django seems to rely on the database
for this feature:
which (if I've understood things correctly) makes sense for
performance reasons, but makes it more difficult to change things on
the fly, e.g., to provide language-specific ordering.
Using suggestions on this page:
I fixed things well-enough for my present purposes, but I thought it
would be useful to abstract this capability away from the database,
with django itself providing some version of the Unicode collation
algorithm.
This might hook into django's internationalization and localization
features, and/or be accessible at a lower level, e.g., with a keyword
argument to, or variant of, "order_by".
Rationale: any django developer that needs to display sorted lists to
international users could probably benefit from this feature.
|Re: Feature request: Unicode collation algorithm in django||Łukasz Rekucki||2/1/12 1:56 PM|
On 1 February 2012 20:27, Ed Hagen <eha...@gmail.com> wrote:
Performance is the main concern here. Any query with ordering on a
Weirdly enough, I was looking at this thread lately, trying to explain
1) Use PyICU - this would solve a lot of problems (some which Django
2) Use the "locale" module: it will work... if you have all the
3) Use some other listed libraries: none of them looks like maintained
4) Write UCA ourselves from scratch. This involves including 1.6MB
All those solutions of course, still have the problem of needing all
> I fixed things well-enough for my present purposes, but I thought it
Could you describe your current solution in more detail?
|Re: Feature request: Unicode collation algorithm in django||Ed Hagen||2/1/12 4:15 PM|
Łukasz, Thanks for your comments.
Well, my use of the term "solution" was perhaps a bit generous. First,
I educated my users why this problem was hard to solve. Second, I
picked a single collation that everyone could live with. Third, I used
the locale module to sort on the application side, which, as you say,
was slow. But because the data don't change much, I was able to cache
the result to achieve acceptable performance.
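
As a rough illustration of the approach described above (application-side sorting with the locale module, plus a cache), a minimal Python sketch might look like the following. The function name `sorted_names`, the cache size, and the default locale are assumptions made for this example and are not details from the thread.

```python
import locale
from functools import lru_cache


@lru_cache(maxsize=32)  # cache size is an arbitrary choice for this sketch
def sorted_names(names, locale_name="C"):
    """Return `names` sorted by the collation rules of `locale_name`.

    `names` must be a tuple so that lru_cache can hash it.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the locale is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares in locale order.
        return tuple(sorted(names, key=locale.strxfrm))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

A call such as `sorted_names(tuple(names), "fi_FI.UTF-8")` would only work if that locale is installed on the host; the locale name is an assumption. Note also that `locale.setlocale` changes process-wide state, so this sketch is not safe to call from several threads at once.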
Even if application-side sorting is slow, nevertheless, in many cases
that will be preferable to users not finding what they are looking for
because the item doesn't show up where it "should." So it's a tradeoff
between two aspects of user experience (speed vs. expected sort
order). In many cases, that tradeoff would favor language-specific
ordering.
My two cents.
|Re: Feature request: Unicode collation algorithm in django||Anssi Kääriäinen||2/1/12 6:34 PM|
The problem is that application side sorting does not scale. So, if
you are depending on application side sorting, there will likely come
a day when you have so much data that it is simply impossible to do
the sorting Python-side. Databases are a lot faster at sorting,
especially with indexes.
Now, my proposed solution would be to have some way of doing:
SELECT name, ...
ORDER BY name collate 'fi';
That some way might be something like
and now collate would be in effect for filters (that is,
Making Django's ORM do the above isn't the most trivial thing. And not
all databases support collate clauses.
You can do the above with .extra() even now if your DB happens to
support collations. Default collation for your database might also be
an option for your particular problem. At least in PostgreSQL versions
prior to 9.1 you have one collation for the DB, which you can set only
at CREATE DB time.
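
To make the database-side idea concrete without assuming anything about Django's API, here is a minimal sketch using Python's standard sqlite3 module, which lets you register a named collation and then reference it in an ORDER BY ... COLLATE clause. The collation name `toy_fi`, the table `scholar`, and the simplified alphabet are all hypothetical; the comparison function is a toy, not an implementation of Finnish collation or of the Unicode collation algorithm.

```python
import sqlite3

# Toy alphabet: a-z followed by å, ä, ö. This is an assumption made for
# illustration only, not a complete set of Finnish collation rules.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def toy_key(text):
    # Characters outside the toy alphabet sort after it, by code point.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in text.casefold()]


def toy_fi_collation(a, b):
    # sqlite3 expects a negative, zero, or positive integer.
    ka, kb = toy_key(a), toy_key(b)
    return (ka > kb) - (ka < kb)


def names_in_order(names, collation=None):
    """Insert `names` into an in-memory table and read them back sorted."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("toy_fi", toy_fi_collation)
    conn.execute("CREATE TABLE scholar (name TEXT)")
    conn.executemany("INSERT INTO scholar (name) VALUES (?)",
                     [(n,) for n in names])
    sql = "SELECT name FROM scholar ORDER BY name"
    if collation is not None:
        # A collation name is an identifier and cannot be bound as a
        # parameter, so it must come from trusted code, never from user input.
        sql += " COLLATE " + collation
    rows = conn.execute(sql).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

With the default (binary) ordering, "Äijälä" comes before "Åberg" because of how the two letters are encoded; with the toy collation, "Åberg" comes first.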
Anyways, Django's design decision is to do sorting in the DB. And I
think that is a good decision.
|Re: Feature request: Unicode collation algorithm in django||Łukasz Rekucki||2/2/12 12:03 AM|
On 2 February 2012 03:34, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
The user should probably be able to specify the collation only for
After a quick research, I think they actually do now (at least in
: At least since 5.0:
|Re: Feature request: Unicode collation algorithm in django||Łukasz Rekucki||2/2/12 12:07 AM|
2012/2/2 Łukasz Rekucki <lrek...@gmail.com>:
Just came to my mind, that we could just mimic the DBs and have a
|Re: Feature request: Unicode collation algorithm in django||Anssi Kääriäinen||2/2/12 4:00 AM|
On Feb 2, 10:07 am, Łukasz Rekucki <lreku...@gmail.com> wrote:
I have been thinking a lot about allowing annotate and order_by to
accept basically anything that has an .as_sql() method. So that you
could do things like:
Of course, the above example is doable using the current Django ORM.
params=f('name')).order_by(RawSQL('%s collate "%%s" desc nulls last',
These would be much better than the .extra(), as aliases are relabeled
properly and you can "chain" the RawSQL clauses. In the ORM the code
that deals with .extra handling is kinda hacky, and this would
probably make that part of the ORM cleaner.
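
The general idea of an order_by that accepts "anything that has an .as_sql() method" can be sketched in plain Python as below. This is only an illustration of the pattern: the `Collate` class and the `order_by_clause` function are hypothetical names invented here, and nothing in this block is Django's actual API or internals.

```python
# Hypothetical sketch: none of these names are Django APIs.

class Collate:
    """An order-by term that renders as: "column" COLLATE "collation"."""

    def __init__(self, column, collation, descending=False):
        self.column = column
        self.collation = collation
        self.descending = descending

    def as_sql(self):
        # Returns (sql, params), the convention assumed for this sketch.
        # Identifiers are assumed to come from trusted code, not user input.
        sql = '"%s" COLLATE "%s"' % (self.column, self.collation)
        if self.descending:
            sql += " DESC"
        return sql, []


def order_by_clause(*terms):
    """Build an ORDER BY clause from column names or as_sql() objects."""
    parts, params = [], []
    for term in terms:
        if hasattr(term, "as_sql"):
            sql, term_params = term.as_sql()
        else:
            # Plain column name; a leading "-" means descending order.
            descending = term.startswith("-")
            sql = '"%s"' % term.lstrip("-")
            if descending:
                sql += " DESC"
            term_params = []
        parts.append(sql)
        params.extend(term_params)
    return "ORDER BY " + ", ".join(parts), params
```

For example, `order_by_clause(Collate("name", "fi"), "-id")` yields an ORDER BY clause with a COLLATE term followed by a plain descending column, which is the kind of chaining described above.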
I think those really are doable. Using objects in cols, order by etc.
inside the ORM instead of the current implementation would probably
make the ORM cleaner and faster. Of course, without patch this is easy
Problem is, I don't have time to do anything about this right now, and
it seems the ORM-knowing core developers have the same problem.