Re: Wrong sort order on swedish characters…?


Adam C

Jan 15, 2013, 8:03:46 AM
to mongod...@googlegroups.com
Hi Viktor,

There is an existing request for this (to vote on and watch):


For now, though, the sort order cannot be changed based on the language; it just defaults to the UTF-8 sort order. The only option right now is to handle this on the client/application side.
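
For the client-side route, here is a minimal sketch in JavaScript. It assumes the runtime ships Swedish collation data for Intl.Collator (recent Node.js builds and browsers generally do; without it the collator may fall back to another locale). The `rows` array is made up for illustration and stands in for documents already fetched from the collection.

```javascript
// Illustrative data only: stands in for documents returned by a query.
const rows = [{ foo: "äpple" }, { foo: "ål" }, { foo: "öl" }, { foo: "zebra" }];

// Locale-aware comparator for Swedish ("sv").
const swedish = new Intl.Collator("sv");

// Sort a copy in the application instead of relying on the server's order.
const sorted = [...rows].sort((a, b) => swedish.compare(a.foo, b.foo));

// With Swedish collation data, å, ä, ö sort after z, in that order.
console.log(sorted.map((r) => r.foo));
```

This only helps when the full result set is sorted in the application; it does not change the order the server uses for sort() combined with skip/limit.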

Adam.

On Tuesday, January 15, 2013 12:27:04 PM UTC, Viktor Hedefalk wrote:
Hi, I have problems with the sort order of Swedish characters.

> db.test.save([{foo: "äpple"}, {foo: "ål"}, {foo: "öl"}])
> db.test.find().sort( { foo : 1} )
{ "_id" : ObjectId("50f549e6a81f4afd95439ed0"), "foo" : "äpple" }
{ "_id" : ObjectId("50f549e6a81f4afd95439ed1"), "foo" : "ål" }
{ "_id" : ObjectId("50f54aa3a81f4afd95439ed2"), "foo" : "öl" }

This is definitely wrong. The order of å and ä is mixed up. It should be the other way around: å, ä, ö are the last letters of the Swedish alphabet.

Is there any locale setting for tuning the sort order? The thing is, I have never seen this before in any other DB. I don't believe that å comes after ä in any encoding I've seen.
 
Thanks,
Viktor


Derick Rethans

Jan 15, 2013, 8:23:16 AM
to mongod...@googlegroups.com
On Tue, 15 Jan 2013, Viktor Hedefalk wrote:

> Hi, i have problems with sort order of swedish characters.
>
> > db.test.save([{foo: "äpple"}, {foo: "ål"}, {foo: "öl"}])
> > db.test.find().sort( { foo : 1} )
> { "_id" : ObjectId("50f549e6a81f4afd95439ed0"), "foo" : "äpple" }
> { "_id" : ObjectId("50f549e6a81f4afd95439ed1"), "foo" : "ål" }
> { "_id" : ObjectId("50f54aa3a81f4afd95439ed2"), "foo" : "öl" }
>
> This is definitely wrong. The order of å and ä is mixed up. It's the other
> way around: å, ä, ö are the last letters of the swedish alphabet.
>
> Is there any locale-setting for tuning sort order? Thing is I never seen
> this before in any other db - I don't believe that å is after ä in any
> encoding I've seen?

MongoDB does not have locale-based sorting, and merely uses Unicode
code point ordering. In Unicode, the letters you mention are (in order):

ä U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
å U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
ö U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
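
The same ordering can be reproduced outside the database. A small sketch in plain JavaScript, assuming precomposed (NFC) strings: the default Array sort compares UTF-16 code units, which agrees with code point order for these three letters since they are all below U+FFFF.

```javascript
// Normalize to NFC so each letter is a single precomposed code point.
const words = ["ål", "öl", "äpple"].map((w) => w.normalize("NFC"));

// sort() without a comparator orders strings by UTF-16 code units.
const byCodePoint = [...words].sort();

// Print each word with the code point of its first letter.
for (const w of byCodePoint) {
  const cp = w.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`${w}  U+${cp}`);
}
// prints:
// äpple  U+00E4
// ål  U+00E5
// öl  U+00F6
```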

This issue is being tracked in
https://jira.mongodb.org/browse/SERVER-1920 - and there is a workaround
described in
https://jira.mongodb.org/browse/SERVER-1920?focusedCommentId=175927&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-175927

I just upvoted the server ticket; I suggest you do as well.

cheers,
Derick

--
{
website: [ "http://mongodb.org", "http://derickrethans.nl" ],
twitter: [ "@derickr", "@mongodb" ]
}

Viktor Hedefalk

Jan 15, 2013, 8:42:06 AM
to mongod...@googlegroups.com
Thanks,

I did not know that the Unicode order was that way. I have never seen it in any other storage system, so I guess they all have some kind of collation settings.

I'll probably try that workaround. I already have hooks on save, so it won't be such a big thing to add a sort field in my Java application. The problem is that I have tables with massive amounts of data that need to be sortable by different columns, so I'll have to add lots of these extra sort fields, one per sortable alphanumeric column. It feels a bit messy...
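
A rough sketch of what such an extra sort field could look like, written in JavaScript to match the shell examples in this thread rather than Java. Everything here is an assumption for illustration, not the approach from the ticket comment: `swedishSortKey`, `withSortField` and the `foo_sort` field name are hypothetical, and the key only handles a-z plus å, ä, ö (other characters are passed through unchanged).

```javascript
// Swedish alphabet in its alphabetical order; escapes keep the letters precomposed.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00E5\u00E4\u00F6";

// Hypothetical helper: builds an ASCII key whose plain code point order
// matches Swedish alphabetical order for the letters listed above.
function swedishSortKey(value) {
  let key = "";
  for (const ch of value.normalize("NFC").toLowerCase()) {
    const i = ALPHABET.indexOf(ch);
    // Letters map to consecutive ASCII characters starting at "a", so
    // å, ä, ö become "{", "|", "}" right after "z". Anything else is kept as is.
    key += i >= 0 ? String.fromCharCode(0x61 + i) : ch;
  }
  return key;
}

// Hypothetical save hook: adds the sort key next to the display value.
function withSortField(doc) {
  return { ...doc, foo_sort: swedishSortKey(doc.foo) };
}

const docs = ["äpple", "ål", "öl", "zebra"].map((foo) => withSortField({ foo }));

// Plain code-unit comparison on the key, like a binary sort on the extra field.
docs.sort((a, b) => (a.foo_sort < b.foo_sort ? -1 : a.foo_sort > b.foo_sort ? 1 : 0));
console.log(docs.map((d) => d.foo)); // zebra, ål, äpple, öl
```

With a field like this stored on each document, the query would sort on the key instead of the display value, for example db.test.find().sort( { foo_sort : 1 } ). The cost is what was described above: one extra field per sortable column.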

Anyway, I'm gonna follow that ticket closely.

Thanks again,
Viktor