French sort order

TonyNaden

May 13, 2021, 7:54:51 AM
to Shoebox/Toolbox Field Linguist's Toolbox
Has anyone got a French sort order that works, preferably known to be working on current versions of TOOLBOX and LexiquePro?

ToolBox Support

May 13, 2021, 12:32:19 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Hi, Tony.

I downloaded the Language Encodings zip from the Toolbox website. Among other things, it contains the attached FrenchU.lng. Looking at the underlying codes, I can tell you that it uses the composite (precomposed) forms of the accented letters rather than separate overstrikes.
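To illustrate the difference between composite forms and separate overstrikes, here is a minimal Python sketch. It assumes that "separate overstrikes" corresponds to Unicode combining marks, which the thread does not state explicitly. The helper name `to_composed` is made up for this example and has nothing to do with Toolbox itself.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # plain e followed by COMBINING ACUTE ACCENT

# The two strings look the same on screen but are not equal as code point sequences.
print(composed == decomposed)

def to_composed(text):
    """Hypothetical helper: convert text to precomposed (NFC) form."""
    return unicodedata.normalize("NFC", text)

# After normalization the decomposed spelling matches the composite one.
print(to_composed(decomposed) == composed)
```

If data typed with combining marks is sorted against a language encoding that lists only the composite forms, normalizing the data first (as above) may avoid mismatches.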

Let me know if you find any problems with it.

Karen
Toolbox Support


FrenchU.lng

Tony Naden

May 14, 2021, 4:55:15 AM
to shoeboxtoolbox-fiel...@googlegroups.com
Thanks, I'll try it.



--
Address: "Lost Marbles", 31, Reading Road,
Pangbourne, Berks., RG8 7HY  -

Tel.: 01189842368

Keep us, good Lord,
under the shadow of your mercy
in this time of uncertainty and distress.
Sustain and support the anxious and fearful,
and lift up all who are brought low;
that we may rejoice in your comfort
knowing that nothing can separate us from your love
in Christ Jesus our Lord.
Amen.


Tony Naden

May 14, 2021, 5:01:45 AM
to shoeboxtoolbox-fiel...@googlegroups.com
Looking good at the moment, wish I'd thought of asking when it first came up! 

David Rowe

Aug 17, 2021, 12:23:29 AM
to shoeboxtoolbox-fiel...@googlegroups.com

ToolBox Support

Aug 17, 2021, 12:04:12 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Thanks, Dave, for reporting this.

According to Wikipedia, the o-e ligature (œ) sorts where o followed by e would occur. Toolbox can do this. But I have two questions:
1) Is this correct, i.e., is this what is wanted?
2) Do you already have a properly functioning French sort order (and the rest of the Language Encoding)?

Thanks.
Karen
Toolbox Support

David Rowe

Aug 29, 2021, 6:34:57 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Hi Karen, Hope all is well with you folks.

1) I believe that is correct: "œ" (and "Œ") should sort like "oe" (and "OE").
2) Sorry, no.

Thanks,
Dave

ToolBox Support

Aug 30, 2021, 11:40:39 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Thanks for the reminder about this, David. 

I've attached a revised sort-order which includes a CC table. They would both go into the settings folder.

I'm hoping you or someone can check it out for me. I checked, but real data often shows that assumptions in faked data don't always hold up.

My faked data included -- and sorted in this order -- the following:
coda
cœur
coeur (separate o and e)
coffee
Not pure French. Let me know if there are problems.
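For readers who want to see the idea outside Toolbox, here is a rough Python sketch of building a sort key by expanding the ligatures before comparing, so that "œ" sorts like "oe". It is not the attached CC table and does not use Toolbox's file formats; the names `LIGATURES`, `expand` and `sort_key` are made up for this example, and treating the ae ligature the same way is an assumption.

```python
# Hypothetical ligature table; the ae entries are an assumption.
LIGATURES = {
    "\u0153": "oe",  # oe ligature, lower case
    "\u0152": "OE",  # OE ligature, upper case
    "\u00e6": "ae",  # ae ligature, lower case (assumed to behave like oe)
    "\u00c6": "AE",  # AE ligature, upper case (assumed to behave like OE)
}

def expand(word):
    """Replace each ligature with its two-letter spelling."""
    return "".join(LIGATURES.get(ch, ch) for ch in word)

def sort_key(word):
    """Sorting key: ligatures expanded, case ignored."""
    return expand(word).casefold()

words = ["coffee", "coeur", "coda", "c\u0153ur"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

With this key the ligature spelling and the two-letter spelling get identical keys, so both land between "coda" and "coffee". Python's sort is stable, so which of the two comes first depends only on the input order; a real sort order would need a secondary rule to decide that.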

I also added the ae ligature (æ), placing it after the a since I wasn't sure where it belongs. Wikipedia says it's rarely used and doesn't say anything about its sorting. Is it also to be treated like "a" followed by "e"? If it's not used, it won't hurt anything by being there.

They should be attached.

Karen
Toolbox Support

PS, anyone else is welcome to test these too. When I hear confirmation, I'll replace the less correct version in the set of Language Encodings.



FrenchU.lng
French oe.cct

Andrew Cunningham

Sep 2, 2021, 8:35:12 PM
to shoeboxtoolbox-fiel...@googlegroups.com
In CLDR, French collation has no additional rules, i.e., it uses the CLDR collation algorithm unmodified.

French Canadian collation contains an extra modification.

I am assuming Toolbox is either using byte/code point order for sorting or DUCET from the UCA. If it is using DUCET, an option to use the CLDR collation algorithm may be useful, since a number of languages (including French) will sort correctly with CLDR root collation rather than with DUCET.

Usually within my Python code, if I do not need a specific locale collation, I fall back to the root collation in CLDR rather than DUCET, since the two sort differently.
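As a small illustration of why plain code point order is not enough for French, the sketch below compares Python's default string sort with a simplified accent-insensitive key. The key is only a stand-in written for this example: it is neither DUCET nor the CLDR root collation, and the name `accent_insensitive_key` is hypothetical.

```python
import unicodedata

def accent_insensitive_key(word):
    """Hypothetical key: base letters first, original spelling as tie-break."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks so that o-circumflex compares like plain o.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

# cote, cote with e-acute, cote with o-circumflex, and czar
words = ["czar", "c\u00f4te", "cote", "cot\u00e9"]

by_code_point = sorted(words)                       # raw code point order
by_key = sorted(words, key=accent_insensitive_key)  # simplified collation-like order

print(by_code_point)
print(by_key)
```

In raw code point order the word with o-circumflex ends up after "czar", because accented letters have higher code points than "z". With the simplified key, all three spellings of "cote" stay together ahead of "czar".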

Andrew



--
Andrew Cunningham
lang.s...@gmail.com


ToolBox Support

Sep 3, 2021, 4:25:27 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Hi, Andrew. 

Thanks for your message.

You said: "I am assuming Toolbox is either using byte/code point order for sorting or DUCET from the UCA."

Actually, we aren't.

Toolbox predates CLDR and probably DUCET (didn't see a date) by some years. Sorting is user-specified, as explained in the Help file which is now the Reference manual. Beyond the basic primary and secondary orderings, it is possible to modify the order using a Consistent Changes table to build the sorting key. With that we have done some very complex syllable-based sorting. Another approach to sorting is to use the basic code points. That may be at least related to what you are referring to. If no sort order is specified, but punctuation is specified, then Toolbox will go with the code values. This works for Korean, but for English it would result in all the upper case words listed first, then the lower case.
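The difference between sorting by code values and sorting by a user-specified primary and secondary order can be sketched in Python. This is an illustration only, not Toolbox's implementation or file format: the `GROUPS` layout (one string per primary position, with the order inside the string acting as the secondary order) and the names `RANK` and `sort_key` are assumptions made for this example.

```python
import string

# Hypothetical sort-order spec: "aA", "bB", ... One string per primary position;
# characters within a string differ only at the secondary level.
GROUPS = [lower + lower.upper() for lower in string.ascii_lowercase]

# Map each character to its (primary, secondary) rank.
RANK = {ch: (p, s) for p, group in enumerate(GROUPS) for s, ch in enumerate(group)}

def sort_key(word):
    """Compare all primary ranks first, then use secondary ranks to break ties."""
    # Characters missing from the spec sort after everything else, by code value.
    ranks = [RANK.get(ch, (len(GROUPS), ord(ch))) for ch in word]
    primary = [p for p, _ in ranks]
    secondary = [s for _, s in ranks]
    return (primary, secondary)

words = ["cherry", "Banana", "apple", "Apple"]
code_value_order = sorted(words)            # upper case words come first
custom_order = sorted(words, key=sort_key)  # case only matters as a tie-break

print(code_value_order)
print(custom_order)
```

Sorting by code values lists "Apple" and "Banana" ahead of "apple", while the two-level key keeps "apple" and "Apple" next to each other and uses case only to order them relative to one another.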

Karen
Toolbox Support

