The :sort command does not appear to give the expected result

Chris Jones

Oct 27, 2020, 6:55:47 PM
to vim...@googlegroups.com
Here's a (test) file that contains a sample of single characters from
the French alphabet.

Column 1 contains a <tab> character (0x09) and column 2 contains the
actual letters.

A
E
O
À
È
É
Ô
Œ

If I use the sort command provided on Linux by the GNU coreutils package
to sort this file at the terminal, with the following locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

... without changing locales, the resulting output appears to be
sorted the way it should be:

A
À
E
É
È
O
Ô
Œ

But when I edit the file in Vim and run :sort / /, where the pattern
between the slashes is a tab character (0x09), nothing happens.

In other words... the fancy pants letters (À, È, É, Ô, Œ ) stay where
they are instead of being moved to the spot where they belong.
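A likely explanation, assuming :sort compares raw bytes the way C's strcmp() does: in UTF-8, byte order follows Unicode code-point order, and every accented capital in the test file has a higher code point than any unaccented one (À is U+00C0 and Œ is U+0152, while O is only U+004F). The short Python check below, which is not from the thread, sorts the letters by their UTF-8 bytes and gets back exactly the order the file already has, so nothing would appear to move:

```python
# The eight test letters, in the order they appear in the file.
letters = ["A", "E", "O", "À", "È", "É", "Ô", "Œ"]

# Sort by raw UTF-8 bytes, which is what a strcmp()-style comparison sees.
byte_sorted = sorted(letters, key=lambda s: s.encode("utf-8"))

# Print each letter with its code point to make the ordering visible.
for ch in byte_sorted:
    print(ch, hex(ord(ch)))

print(byte_sorted == letters)  # True: byte order equals the file's order
```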

So I tried launching vim like so:

$ LANG='fr_FR.UTF-8' vim

I noticed that Vim was now talking French to me, and when I ran the
':language' command I saw that Vim's locale-related variables were now
set to the 'fr_FR' locale:

Langue courante pour : ("Current language for:")

"LC_CTYPE=fr_FR.UTF-8;
LC_NUMERIC=C;
LC_TIME=fr_FR.UTF-8;
LC_COLLATE=fr_FR.UTF-8;
LC_MONETARY=fr_FR.UTF-8;
LC_MESSAGES=fr_FR.UTF-8;
LC_PAPER=fr_FR.UTF-8;
LC_NAME=fr_FR.UTF-8;
LC_ADDRESS=fr_FR.UTF-8;
LC_TELEPHONE=fr_FR.UTF-8;
LC_MEASUREMENT=fr_FR.UTF-8;
LC_IDENTIFICATION=fr_FR.UTF-8"

But when I ran the same ':sort / /' command it didn't make any difference.

Am I doing it wrong?

Thanks,

CJ

P.S. I'm using a bit of Vim trickery to translate the LaTeX '\index ...'
tags and the like into HTML tags, so as to have a basic index with links
to anchors in the HTML version of the document. Unfortunately the original
document happens to be in French... and naturally... correct sorting of
the 'TABLE ALPHABÉTIQUE' (the alphabetical index) is crucial (I do want
eggs/œufs to appear under letter 'O'... not relegated to the index's last page).

I've read the ':h :sort' doc something like a dozen times and find parts
of it a little cryptic. Especially when somewhere near the end it says:
'Vim does do a "stable" sort.' :-) What's up with that?

Tony Mechelynck

Oct 27, 2020, 10:27:12 PM
to vim_use
A "stable" sort is a sort which will keep lines with the same sort
keys in the order they were before the sort. (If you sort on whole
lines the difference is not visible, unless there exist different
lines which sort as equal, but if you sort on "pattern" or on "first
number" it may matter.)
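To make the definition concrete: Python's sorted() is documented as stable, so it can stand in for :sort here. The index lines in this sketch are made up for illustration. Sorting on the first character only (much as :sort with a pattern sorts on part of the line) keeps entries that share a key in their original relative order, whereas sorting on whole lines compares those entries with each other:

```python
# Hypothetical index lines, in their original order.
lines = ["oeuf 12", "ane 3", "orange 7", "abricot 5"]

# Key = first character only; lines with equal keys keep their input order.
by_first_letter = sorted(lines, key=lambda line: line[0])
print(by_first_letter)  # ['ane 3', 'abricot 5', 'oeuf 12', 'orange 7']

# Key = the whole line; now "abricot" and "ane" are ordered against each other.
by_whole_line = sorted(lines)
print(by_whole_line)  # ['abricot 5', 'ane 3', 'oeuf 12', 'orange 7']
```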

But there are a few more sentences near the end of the help for :sort
which may be relevant:

<quote>
The details about sorting depend on the library function used. There is no
guarantee that sorting obeys the current locale. You will have to try it out.
</quote>

$LC_COLLATE is the part of the locale which says how to sort. If
$LC_ALL is set, it overrides all the others; otherwise $LANG is used as
a fallback for any locale variable which is not set. ":lang" with no
arguments lists all settings after taking care of $LANG and/or
$LC_ALL if present.
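The lookup order Tony describes (it is the standard POSIX rule) can be sketched as a small function. `effective_locale` is a hypothetical helper, not a Vim or libc API. It treats an empty variable as unset, which is why the empty `LC_ALL=` in the first message has no effect:

```python
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Return the locale a program would use for `category` (e.g. "LC_COLLATE").

    Precedence: a non-empty LC_ALL wins, then the category's own variable,
    then LANG; if none of them is set, the "C" (POSIX) locale is used.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # empty values count as unset
            return value
    return "C"


# Roughly the situation after launching `LANG='fr_FR.UTF-8' vim`.
env = {"LANG": "fr_FR.UTF-8", "LC_ALL": ""}
print(effective_locale("LC_COLLATE", env))  # fr_FR.UTF-8
```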

Best regards,
Tony.

Chris Jones

Oct 31, 2020, 3:16:53 PM
to vim_use
Thanks for reminding me what 'stable' means in this context. What I am
driving at is that... stable or not... I just need sort to do the job
right, which in this particular use case appears not to be the case.

In other words when I use the vim :sort command the output should have
index entries starting with É under letter E... Œ under letter O, etc.
which as far as I know is the way things work with French language
indexes.

As quoted above the ':help :sort' documentation proceeds to inform me
that I will have to 'try it out'. Seriously?

As it happens my original post explained just that. I 'tried it out' and
the result is either of two things: I did not 'try it out' right (user
error) or vim does not sort correctly (bug).

Now you're telling me that it's not Vim's fault (couldn't care less
whose fault it is)... it's "the library's".

So what's the next step?

Should I determine what library is at fault so I can discuss it with the
developer... or should I ask vim to kindly switch to a library that
actually works such as GNU's (coreutils) which as tested (cf. original
post) does sort correctly?

Thank you,

CJ

Tekki

Nov 1, 2020, 2:45:28 AM
to vim_use
Chris Jones wrote on Saturday, 31 October 2020 at 20:16:53 UTC+1:

> So what's the next step?


You could create your own sort command in vimrc, for example

command -nargs=1 Sort :.,+<args>!sort -

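Then :Sort7, run with the cursor on the first letter (the range .,+7
covers the current line and the seven below it), will sort the above
list of letters correctly.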

Dominique Pellé

Nov 1, 2020, 3:49:41 AM
to Vim List
> Then :Sort7 will sort the above list of letters correctly.

Indeed, you could use %!sort to use the Unix sort command
instead of the Vim ex command (possibly replace % with
another range if you don't want to sort the entire file).
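This is roughly what :{range}!sort does: Vim writes the lines in the range to the command's standard input and replaces them with its standard output. A minimal Python sketch of the same filter, assuming a POSIX `sort` on the PATH; `filter_through_sort` is a hypothetical name, and the collation locale is forced through LC_ALL in the child's environment so it is explicit rather than inherited:

```python
import os
import subprocess
from typing import List


def filter_through_sort(lines: List[str], collate: str) -> List[str]:
    """Pipe `lines` through the external `sort` and return its output lines.

    `collate` is passed as LC_ALL, so it overrides LANG and LC_COLLATE.
    """
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",
        env=dict(os.environ, LC_ALL=collate),
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `filter_through_sort(lines, "fr_FR.UTF-8")` should reproduce the coreutils result from the first message, provided that locale is installed; what happens when it is missing depends on the sort implementation.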

That said, sorting using the locale ordering with Ex :sort
could be useful. I see that ex_sort() in ex_cmds.c calls
sort_compare() and that function calls STRCMP() or
STRICMP(). strcmp() does not use the locale, but strcoll() does.
We could consider adding a sorting option to honor the current
locale (e.g. :sort l) which would compare using strcoll() instead
of STRCMP() or STRICMP().
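The difference between the two comparisons can be sketched in Python, whose locale module wraps the C library's collation calls. A plain comparison orders by code point, like strcmp() on UTF-8, while functools.cmp_to_key(locale.strcoll) orders by the current LC_COLLATE, like strcoll(). `sort_lines` and its `use_locale` flag are hypothetical, meant only to mirror the proposed `:sort l`; this is not Vim's implementation:

```python
import functools
import locale
from typing import List


def sort_lines(lines: List[str], use_locale: bool = False) -> List[str]:
    """Sort by code point (strcmp-like) or by LC_COLLATE (strcoll-like)."""
    if use_locale:
        return sorted(lines, key=functools.cmp_to_key(locale.strcoll))
    return sorted(lines)


letters = ["A", "E", "O", "À", "È", "É", "Ô", "Œ"]
print(sort_lines(letters))  # code-point order: the list comes back unchanged

try:
    # Take the collation order from the environment (LC_ALL, LC_COLLATE, LANG).
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass  # the environment names a locale that is not installed; keep "C"
print(sort_lines(letters, use_locale=True))  # depends on the active locale
```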

Regards
Dominique

Dominique Pellé

Nov 1, 2020, 5:56:51 AM
to Vim List
I just created a git pull request to implement sorting using the
current locale. It adds an l option to the :sort Ex command.

See:

https://github.com/vim/vim/pull/7237

Regards
Dominique

Gary Johnson

Nov 1, 2020, 8:32:09 PM
to Vim List
On 2020-11-01, Dominique Pellé wrote:
> Dominique Pellé wrote:
Is this consistent with Christian's fix for #6229 in June?
I haven't looked at either patch closely--I just remembered that
this issue had been discussed before--but they seem to take
different approaches to setting the collation order. I don't
particularly care how it's done, but I do care that it's done
consistently.

Regards,
Gary

Dominique Pellé

Nov 2, 2020, 1:55:08 AM
to Vim List
Hi Gary

I did not know or remember about Christian's patch
(vim-8.2.0988), which:
- introduced the read-only v:collate variable;
- and added an option, via a dictionary parameter,
to use collation order with readdir() and readdirex().

Christian's patch did not affect :sort or sort().
My patch 8.2.1933 introduced options to :sort and sort()
to use collation order.

I don't think there are inconsistencies. We could not
use a dictionary option for either :sort or sort(), since
sort() already had a dict argument with different semantics.

That said, we could make minor clean-ups:
- the tests introduced by 8.2.1933 could check v:collate
instead of checking execute("language collate");
- and the docs of :sort and sort() could link to v:collate.

I wonder whether we really needed to introduce the
read-only v:collate given that the existing ":language collate"
was sufficient to check the collation order.

I also wonder whether there are other commands or
functions that could use collation order besides
readdir(), readdirex(), sort() and :sort.

Regards
Dominique

Christian Brabandt

Nov 2, 2020, 3:33:30 AM
to Vim List
I basically added the v:collate variable to be consistent with the other
v: variables, like v:lang, v:ctype and v:lc_time, and I think it is
easier to get the current locale from those variables than to have to
redirect the output of the :language [type] subcommand.

I remember that I intentionally left out the :sort command and the sort()
function, because I wanted to fix the immediate problem first and leave
the rest for later. It seems I forgot :( so thanks
for adding that.

> I also wonder whether there are other commands or
> functions that could use collation order besides
> readdir(), readdirex(), sort() and :sort.

Possible. I suppose if we need those, we can always enhance it further.

Best,
Christian
--
Gems from policyholders' letters to their insurers:
I am a welder (Schweißer) by trade. Your computer economized in the
wrong place and left the w out of my profession.

Chris Jones

Nov 5, 2020, 3:40:31 PM
to vim_use
Gives the intended result. Thanks for spelling it out.

Chris Jones

Nov 5, 2020, 5:27:23 PM
to vim_use
On Sun, Nov 01, 2020 at 02:45:28AM EST, Tekki wrote:
Now let's take this one step further.

What I am really trying to do is get rid of this file and do the
processing in-core using a vim dictionary.

In other words I have iterated over my source buffers and created a temp
file that contains key/value pairs that will serve as the basis for my
HTML index.

Now instead of this temp file that (as a reminder) contains something
like:

| Ï
| O
| E
| A
| É
| Æ
| I
| Œ

I create a vim dictionary that contains the same data:

| :let g:dict
| {'A': ' ', 'E': ' ', 'Æ': ' ', 'É': ' ', 'I': ' ', 'Œ': ' ', 'Ï': ' ', 'O': ' '}

I know I can loop over the contents of the dictionary like so:

| :for l in items(g:dict)
| : let l
| : endfor

handing me back the (unsorted) output:

| l ['A', ' ']
| l ['E', ' ']
| l ['Æ', ' ']
| l ['É', ' ']
| l ['I', ' ']
| l ['Œ', ' ']
| l ['Ï', ' ']
| l ['O', ' ']

Now I need this sorted alphabetically following French collating rules
... 'Æ' follows 'A'... 'É' follows 'E' etc.

So I have two problems:

1. sort the output by key (I couldn't think of a simple way
to do this off the bat¹)...

2. until the issue I described earlier in this thread is addressed
I need to invoke gnu/coreutils' sort utility...

The solution to problem #1 should be pretty straightforward.
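For problem #1, the usual trick is to sort the keys before looping (in Vim script, for instance, by iterating over sort(keys(g:dict)) rather than items(g:dict)), although a plain sort() there still uses code-point order. The Python sketch below does the same thing with a locale-aware key, locale.strxfrm(), which transforms a string according to the current LC_COLLATE. The helper name `sorted_by_key` is hypothetical; it temporarily switches LC_COLLATE to the requested locale and falls back to code-point order if that locale is not available:

```python
import locale
from typing import Dict, List, Tuple


def sorted_by_key(d: Dict[str, str], loc: str) -> List[Tuple[str, str]]:
    """Return d's items ordered by key under the collation rules of `loc`."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query, do not change
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        # Locale not installed: fall back to plain code-point order.
        return sorted(d.items())
    try:
        return sorted(d.items(), key=lambda kv: locale.strxfrm(kv[0]))
    finally:
        # Restore whatever collation setting was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)
```

With a French locale installed, `sorted_by_key(g_dict, "fr_FR.UTF-8")` should place É next to E; the exact order of ligatures such as Æ and Œ is up to the C library's locale data.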

As to the second problem I need to pass my dictionary's entries to
gnu/sort in a way that somehow will return them sorted alphabetically.

Is this feasible using an external sort program?
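It should be feasible. One approach, sketched here in Python rather than Vim script: write each key/value pair as a tab-separated line, have `sort` compare the first field only, and split the lines back up. The function name and the tab-separated layout are assumptions for illustration, and keys or values containing tabs or newlines would break it. In Vim itself, system() and systemlist() accept a List as their input argument, so something like systemlist('sort', keys(g:dict)) ought to work along the same lines (untested here); the child `sort` inherits Vim's environment, including LANG.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def sort_items_externally(d: Dict[str, str], collate: str) -> List[Tuple[str, str]]:
    """Return d's items ordered by key, using the external sort's collation.

    Each pair becomes one "key<TAB>value" line; `-t TAB -k1,1` makes sort
    compare only the key field.
    """
    text = "".join(f"{k}\t{v}\n" for k, v in d.items())
    result = subprocess.run(
        ["sort", "-t", "\t", "-k1,1"],
        input=text,
        capture_output=True,
        encoding="utf-8",
        env=dict(os.environ, LC_ALL=collate),
        check=True,
    )
    pairs = []
    for line in result.stdout.splitlines():
        # Split each output line back into its (key, value) pair.
        key, _, value = line.partition("\t")
        pairs.append((key, value))
    return pairs
```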

Thanks,

CJ

¹ In Python, with d standing for the dictionary:

| for k, v in sorted(d.items()):
|     print(k, v)

Chris Jones

Dec 5, 2020, 7:33:50 PM
to vim_use
On Thu, Nov 05, 2020 at 05:27:22PM EST, Chris Jones wrote:

> So I have two problems:
>
> 1. sort the output by key value (couldn't think of a simple way
> to do this off the bat¹...
>
> 2. until the issue I described earlier in this thread is addressed
> I need to invoke gnu/coreutils' sort utility...
>
> The solution to problem #1 should be pretty straightforward.
>
> As to the second problem I need to pass my dictionary's entries to
> gnu/sort in a way that somehow will return them sorted alphabetically.
>
> Is this feasible using an external sort program?
>
> Thanks,
>
> CJ
>
> ¹ In python:
>
> | for k, v in sorted(dict.items()):
> | print(k, v)

As it turns out the above is incorrect: python does not do it correctly
either... at least if you use its native sort() method.

It turns out that one possibility is to use IBM's 'ICU' library: on
Debian I only had to install the PyICU bindings via Python's pip/pip3,
and this appears to install the correct version of the library...

Here's a vim wrapper to work around this difficulty:

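| function! Lsort(list, locale)
| py3 << EOP
| import vim, icu
| # fetch the Vim arguments as Python objects
| words = vim.eval('a:list')
| loc = vim.eval('a:locale')
| collator = icu.Collator.createInstance(icu.Locale(loc))
| # sort by ICU collation key rather than by code point
| lsort_result = sorted(words, key=collator.getSortKey)
| EOP
| " hand the sorted list back to Vim directly, so that quotes or
| " backslashes in the entries cannot break a generated :let command
| return py3eval('lsort_result')
| endfunc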

e.g.

| :let sorted_list = Lsort(unsorted_list, 'fr_FR.UTF-8')

Should also work with other western languages or fancy English words
with diacritics (untested).

As to GNU/sort I have not been able to figure out how to invoke it from
a vim wrapper function.

BTW... How can I keep an eye on if/when/how this issue will be addressed
natively in vim?

Thanks,

CJ

Christian Brabandt

Dec 6, 2020, 3:44:18 AM
to vim_use

On Sat, 05 Dec 2020, Chris Jones wrote:

> BTW... How can I keep an eye on if/when/how this issue will be addressed
> natively in vim?

If I remember correctly what the issue was, then I think this has been
fixed as of 8.2.1933.

Best
Christian
--
If the farmer's barn is full, he pays a little more in tax.

BPJ

Dec 6, 2020, 4:38:14 AM
to vim_use
On Tue, 27 Oct 2020 at 23:56, Chris Jones <cjns...@gmail.com> wrote:

> If I use the sort command provided on linux by the GNU coreutils package
> so as to sort this file at the terminal

I use an external Perl script which uses one of several sort orders defined with the Perl modules Unicode::Collate::Locale or Sort::ArbBiLex. It doesn't have the feature to sort on what is or isn't matched by a regex, but I guess that could be added. (Darn, I think I just got an itch! :-)


Chris Jones

Dec 6, 2020, 4:41:00 PM
to vim_use
On Sun, Dec 06, 2020 at 03:44:03AM EST, Christian Brabandt wrote:
>
> On Sat, 05 Dec 2020, Chris Jones wrote:
>
> > BTW... How can I keep an eye on if/when/how this issue will be addressed
> > natively in vim?
>
> If I remember correctly what the issue was then I think this has been
> fixed as of 8.2.1933
>
> Best
> Christian

Thank you Christian... I had already found that... :)

I am not familiar with the vim development stuff and I was none the
wiser after reading through the issue.

No rush anyway... For the time being I am content with the workaround
I came up with that appears to work. I guess I can live with its
additional dependencies.

CJ

Chris Jones

Dec 6, 2020, 5:33:53 PM
to vim_use
One of the Python Stack Exchange questions I found that covered this issue
actually suggested using this Perl library! I was indeed quite
surprised Python didn't do it right out of the box, so that I had to use
what appear to be the PyICU bindings to the IBM-maintained ICU C++ library.

Obviously, since this is in an ebook/epub context, and collating order
across a variety of languages is a rather intricate issue that is normally
addressed by people who know what they are doing... all I wanted was to
grab the language hard-coded in the book's metadata and pass it
to a function that takes care of sorting the dictionary created by
scanning the markdown source and picking up the LaTeX index tags. If
I were crazy enough to think of maintaining my own locale stuff and having
it do the right thing for Hungarian or Portuguese, not to mention Greek
and various flavours of Cyrillic... I wouldn't know where to start.

Thanks,

CJ