pandas.sort_values() but locale specific

152 views
Skip to first unread message

zlju...@gmail.com

unread,
Feb 12, 2021, 2:59:01 AM2/12/21
to PyData
Hi,

what is the official way of sorting values but respecting locale specifics?
For example, in English you sort strings like:
a, b, c, d, e, f, g...

If want to sort the data in Croatian (hr_HR) specific manner, like this:
a, b, c, ć, č, d, dž, đ, e, f....

what are my options?

Regards

Josh Friedlander

unread,
Feb 12, 2021, 4:35:01 AM2/12/21
to pyd...@googlegroups.com
In modern Pandas (I'm using 1.2.2), sort_values() takes a key parameter, just like sorted() does for lists. The only difference (as explained in the docs) is that it must be vectorised. So we can adapt this example:

df = pd.DataFrame({'a':  ['d', 'č', 'dž', 'đ', 'a']})
letters=['a', 'b', 'c', 'ć', 'č', 'd', 'dž', 'đ', 'e', 'f']  # here you put your alphabet
df['a'].sort_values(key=lambda x: x.map({i:letters.index(i) for i in letters}))

>>
4     a
1     č
0     d
2    dž
3     đ
Name: a, dtype: object

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pydata/efdee2cc-012d-4acc-a4c4-eef87dea74ben%40googlegroups.com.

Zoran Ljubišić

unread,
Feb 13, 2021, 6:37:28 AM2/13/21
to pyd...@googlegroups.com
Thank you Josh, it works.
The only thing I would like to avoid is specifying alphabet manually.
It would be nice if I can get locale aware letters (numbers, interpunction...) from somewhere.
I am aiming to the ICU library. 
Anyway, if I don't find a solution for that, your suggestion will surely work.

Thanks, again. 👍

Zoran


Reply all
Reply to author
Forward
0 new messages