Performance is the main concern here. Any query with ordering on a
text field would have to fetch all results and sort them on the
application side. It's just terrible.
>
> Using suggestions on this page:
>
> http://stackoverflow.com/questions/1097908/how-do-i-sort-unicode-strings-alphabetically-in-python
>
Weirdly enough, I was looking at this thread recently while trying to
explain to a beginner why Python doesn't provide an easy way to do
this that actually works :(. Summary of options:
1) Use PyICU - this would solve a lot of problems (some of which Django
already solves by itself). But it's quite a big dependency on an
external package (written in C++, so I guess it won't run on PyPy,
Jython or App Engine). Django currently has no external dependencies
and that's good :)
2) Use the "locale" module: it will work... if you have all the
possible locales compiled on your OS... and you're not running on
Windows... or using threads. AFAIK, switching locale is also quite
slow.
3) Use one of the other listed libraries: none of them appear to be
maintained by their authors.
4) Write UCA ourselves from scratch. This involves including a 1.6MB
collation table in Django.
All of those solutions, of course, still have the problem of needing
all the data on the application side.
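To make option 2 concrete, here is a minimal sketch of sorting on the
application side with locale.strxfrm. The helper name sort_with_locale
is made up for illustration, and any locale name other than "C" (for
example "pl_PL.UTF-8") is an assumption that only works if that locale
is installed on the OS. Note that setlocale changes process-wide state,
which is exactly the threading problem mentioned above.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the collation rules of locale_name.

    Hypothetical helper: locale_name must be installed on the OS,
    otherwise setlocale raises locale.Error.
    """
    # Calling setlocale without a locale argument only queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Process-global switch; not safe if other threads sort concurrently.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares in locale order.
        return sorted(words, key=locale.strxfrm)
    finally:
        # Always restore the previous collation setting.
        locale.setlocale(locale.LC_COLLATE, previous)
```

With the always-available "C" locale this degrades to plain code point
order, so it only becomes useful once a real language locale is
present.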
> I fixed things well-enough for my present purposes, but I thought it
> would be useful to abstract this capability away from the database,
> with django itself providing some version of the Unicode collation
> algorithm:
>
> http://unicode.org/reports/tr10/
>
> This might hook into django's internationalization and localization
> features, and/or be accessible at a lower level, e.g., with a keyword
> argument to, or variant of, "order_by".
Could you describe your current solution in more detail?
--
Łukasz Rekucki
The user should probably be able to specify the collation only for
the fields he wants, as it most likely uses a different type of index
and is more expensive than a standard ordering, so I like the
.collate(name="fi") option (and a shortcut of .collate("fi") to apply
to all text fields).
>
> Making Django's ORM do the above isn't the most trivial thing. And not
> all databases support collate clauses.
After some quick research, I think they actually do now (at least in
ORDER BY). MySQL[1] and SQLite[2] both have COLLATE; in Oracle we
could use NLSSORT()[3], which is something like locale.strxfrm.
>
> You can do the above with .extra() even now if your DB happens to
> support collations. Default collation for your database might also be
> an option for your particular problem. At least in PostgreSQL versions
> prior to 9.1 you have one collation for the DB, which you can set only
> at CREATE DB time.
>
[1]: At least since 5.0:
http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html
[2]: SQLite doesn't support UCA by default, but lets you define any
collation: http://docs.python.org/library/sqlite3.html#sqlite3.Connection.create_collation
[3]: http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch9sql.htm#i1006311
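As an illustration of [2], this is roughly how a custom collation can
be registered through Python's sqlite3 module and then used in ORDER
BY. The collation name "naive", the table, and the comparison function
are invented for this sketch; stripping accents and casefolding is only
a crude stand-in and is not UCA.

```python
import sqlite3
import unicodedata


def _sort_key(text):
    # Crude stand-in for a real collation key: drop combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def naive_collation(a, b):
    # SQLite expects a negative, zero or positive number, like C's strcmp.
    ka, kb = _sort_key(a), _sort_key(b)
    return (ka > kb) - (ka < kb)


def sorted_names(names):
    conn = sqlite3.connect(":memory:")
    # Register the Python callable under a name usable in COLLATE clauses.
    conn.create_collation("naive", naive_collation)
    conn.execute("CREATE TABLE person (name TEXT)")
    conn.executemany("INSERT INTO person (name) VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT name FROM person ORDER BY name COLLATE naive"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

The sorting here happens inside the database engine, but each
comparison still calls back into Python, so it is unlikely to be fast
on large tables.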
--
Łukasz Rekucki
It just occurred to me that we could mimic the DBs and have a
Collate operator (like Q, F, Count, etc.) + maybe some defaults on the
model:
M.objects.order_by(Collate("name", "uca"))
M.objects.filter(name__gte=Collate('e', "fi"))
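A rough, Django-free sketch of what such an operator might boil down
to: an object that remembers a field and a collation name and renders
itself into an ORDER BY fragment. Everything here (the class, the
method as_order_by_sql, the helper order_by_clause, and the simplified
identifier quoting) is hypothetical and not an existing Django API.

```python
class Collate:
    """Hypothetical marker pairing a field name with a collation name."""

    def __init__(self, field, collation):
        self.field = field
        self.collation = collation

    def as_order_by_sql(self):
        # Quoting is simplified; real backends quote identifiers differently.
        return '"%s" COLLATE "%s"' % (self.field, self.collation)


def order_by_clause(*terms):
    """Build an ORDER BY clause from plain field names and Collate objects."""
    parts = []
    for term in terms:
        if isinstance(term, Collate):
            parts.append(term.as_order_by_sql())
        else:
            # Plain strings are treated as field names with default ordering.
            parts.append('"%s"' % term)
    return "ORDER BY " + ", ".join(parts)
```

For example, order_by_clause(Collate("name", "uca"), "id") would give
ORDER BY "name" COLLATE "uca", "id". A backend like Oracle would need
to render NLSSORT() instead, which is one reason to keep the rendering
per-backend.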
--
Łukasz Rekucki