[vim/vim] Cyrillic encoding issue with buffers menu (set enc=utf-8) (#8221)

Maxim Kim

May 18, 2021, 2:37:33 AM
to vim/vim, Subscribed

Describe the bug

Buffers menu has "strange" menu items on windows with russian locale

To Reproduce
Detailed steps to reproduce the behavior:
0. have windows with russian locale

  1. Run gvim -Nu NONE -c "set enc=utf-8"
  2. Buffers menu is unreadable

Expected behavior

Buffers menu should have proper ru text

Screenshots

image

image

Additional context

VIM - Vi IMproved 8.2 (2019 Dec 12, compiled May 16 2021 22:02:14)
MS-Windows 64-bit GUI version with OLE support
Included patches: 1-2860
Compiled by appveyor@APPVYR-WIN

PS, there is also issue with popup menu:

image

But I am not sure how to reproduce it with minimal steps.


Maxim Kim

May 18, 2021, 2:54:31 AM

The issue is with filetype on:

to reproduce:

  1. have windows with ru locale
  2. run gvim -Nu vimrc_menu --noplugin where vimrc_menu has 2 lines:
       filetype on
       set encoding=utf8
  3. all menus are corrupted

image

:scriptnames shows there are two localization files being loaded:

image

koi8-r and utf-8
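
The same check can be done in text form instead of a screenshot. The following is a minimal sketch, not part of the original report; it assumes a Vim recent enough to have the execute() function and simply lists the sourced scripts whose path contains "menu":

```vim
" Capture the output of :scriptnames and keep only the lines mentioning "menu".
let s:menu_scripts = filter(split(execute('scriptnames'), "\n"), 'v:val =~? "menu"')
" Print one script per line.
echo join(s:menu_scripts, "\n")
```

Save it to a file and :source it after startup. If both a koi8-r and a utf-8 menu translation show up, the translation was loaded more than once.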

Tony Mechelynck

May 18, 2021, 5:56:13 AM

On openSUSE Linux, if I start gvim with

LC_MESSAGES=ru_RU.utf-8 gvim -Nu NONE

all menus (including Буферы and its sub-menus) are in proper Cyrillic Russian. 'encoding' is utf-8 from the beginning.

When I run :scriptnames I get

  1: /usr/local/share/vim/vim82/menu.vim
  2: /usr/local/share/vim/vim82/lang/menu_ru_ru.koi8-r.vim
  3: /usr/local/share/vim/vim82/lang/menu_ru_ru.utf-8.vim
  4: /usr/local/share/vim/vim82/lang/menu_ru_ru.vim
  5: /usr/local/share/vim/vim82/autoload/paste.vim

Best regards,
Tony.

Maxim Kim

May 18, 2021, 10:22:51 AM

> On openSUSE Linux, if I start gvim with

Windows gvim by default has non-unicode encoding (for me it is cp1251) so I have to set encoding=utf-8.

It is not a big deal for me -- I don't use menus, but it would be nice to have them rendered properly if I ever need them.

And again, if I remove filetype on everything is ok.

Christian Brabandt

May 18, 2021, 10:50:04 AM

Maybe it's time to revisit the encoding for Windows and make it utf-8 by default, as on Linux.

matveyt

May 18, 2021, 3:43:38 PM

This particular issue is due to using -c instead of --cmd. You really should change encoding before processing your vimrc, not after that. Otherwise various side-effects are possible.

But there's a real problem behind this: $VIMRUNTIME/filetype.vim may also source $VIMRUNTIME/menu.vim. And hence there comes surprising difference between putting :filetype on before :set encoding=xxx or after. However, $VIMRUNTIME/menu.vim will be sourced later by startup routine anyway, so it's not clear to me why it also needed the :source here.

As a workaround I have the following on the very top of my config:

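if !exists("did_load_filetypes") && !has("nvim")
    " Pretend the Syntax menu is already installed, so that filetype.vim
    " does not source menu.vim before 'encoding' is set.
    let did_install_syntax_menu = 1
    filetype plugin indent on
    syntax on
    " Remove the guard again so the menus are set up normally at startup.
    unlet did_install_syntax_menu
endif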

This allows setting the encoding later without breaking anything. Though I'd probably like it more if :filetype on didn't have such a side effect at all.

Maxim Kim

May 18, 2021, 4:08:13 PM

> You really should change encoding before processing your vimrc, not after that. Otherwise various side-effects are possible.

I would like not to change it at all, but I have to and do it in vimrc. For ppl who rely on menus it might be an issue.

> Also note that you won't encounter such problem at all if set encoding precedes filetype on

Huh, I think I wanted to try it out but never did, thx!
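
The ordering discussed here can be sketched as a vimrc fragment. This is only an illustration, not anyone's actual configuration, and it assumes the vimrc file itself is saved as UTF-8:

```vim
" Set 'encoding' before anything that may define menus or mappings.
set encoding=utf-8
" Tell Vim how this script file itself is encoded (assumption: UTF-8).
scriptencoding utf-8

" Only now enable filetype detection and syntax highlighting.
filetype plugin indent on
syntax on
```

With this order, anything that :filetype on may source already sees the final value of 'encoding'.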

Bram Moolenaar

May 18, 2021, 6:00:25 PM

As mentioned, if you change 'encoding' you need to do this very early, because it invalidates mappings and menus already defined.

This is not really a bug, but defaulting 'encoding' to "utf-8" is most likely better for most users. This already happens on Linux systems, since the environment is usually set up for it. I'm not sure about MS-Windows. Is there any choice the user makes to decide about the encoding? At least in the console there is the active codepage, which we had better match to avoid various problems. For the GUI I'm not sure how to decide about the encoding. The main thing is that the value of 'encoding' also sets the default for the encoding of newly created files. Does using "utf-8" work for that? Quite a few MS-Windows utils default to using utf-16 (which is the worst choice in many ways).

Maxim Kim

May 19, 2021, 2:05:50 AM

> This is not really a bug, but defaulting 'encoding' to "utf-8" is most likely better for most users.

I can't say for the most users, but I use it on windows since many years ago.

> I'm not sure about MS-Windows. Is there any choice the user makes to decide about the encoding?

I don't think majority of win users do this choice if it exists -- windows is usually pre-installed. Ofc there are region settings, input "language" selection and whatnot, but I have never changed encoding for Windows system-wide.

> At least in the console there is the active codepage, that we better match with to avoid various problems.

I use console vim from time to time having set encoding=utf8 in vimrc. My console has cp1251 and vim has no issues with that fact.

> For the GUI I'm not sure how to decide about the encoding. The main thing is that the value of 'encoding' also set the default for the encoding of newly created files.
> Does using "utf-8" work for that?

Absolutely! I believe (we need fact checkers here :) ) most modern software in windows works with utf-8 files.

> Quite a few MS-Windows utils default to using utf-16 (which is the worst choice in many ways).

But having encoding==cp1251 in vim wouldn't help here anyway, right?

matveyt

May 19, 2021, 2:11:53 AM

> Is there any choice the user makes to decide about the encoding?

It's possible to check $LANG environment variable. By default Windows does not have or use it but some ported software may need it anyway, so there's a small chance that users intentionally set it even on Windows.
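
A check along these lines could look as follows in Vim script. This is a sketch for inspecting the situation from a running Vim, not a proposal for how Vim's startup code should decide; the messages are made up for illustration:

```vim
" Only relevant on MS-Windows builds of Vim.
if has('win32')
  if empty($LANG)
    " An unset environment variable evaluates to an empty string.
    echo "$LANG is not set, encoding=" . &encoding
  else
    echo "$LANG=" . $LANG . ", encoding=" . &encoding
  endif
endif
```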

matveyt

May 19, 2021, 2:16:28 AM

@habamax

> we need fact checkers here :)

Well, Neovim already does this for years and it seems no one complains.

Dominique Pellé

May 19, 2021, 2:40:04 AM

@matveyt wrote:

> Well, Neovim already does this for years and it seems no one complains

I see that in Neovim (I tried v0.5.0-dev+1318-g61aefaf29), :help vim-diff
shows that 'encoding' defaults to utf-8, whereas for vim, :help 'encoding'
says that default is "latin1" or value from $LANG.

Given that the +multi-byte feature is nowadays always enabled (since vim-8.1.0733)
in all versions of Vim, perhaps we should consider making utf-8 the default
'encoding' value. It's an incompatible change, but it would probably fix more
problems than it introduces. And users can always set 'enc' explicitly in any case.

matveyt

May 19, 2021, 3:14:00 AM

@dpelle In Neovim 'encoding' is even fixed to utf-8 and cannot be changed by user (unlike 'fileencoding').

K.Takata

May 19, 2021, 3:53:22 AM

I think that the release of Vim 9 is a very good chance to change the default encoding to UTF-8.
See also: #3907

Bram Moolenaar

May 20, 2021, 4:34:17 PM


> @habamax
>
> >we need fact checkers here :)
>
> Well, Neovim already does this for years and it seems no one complains.

That doesn't mean much. If someone has a problem with Neovim then they
switch back to Vim. We've had several things that Neovim users didn't
complain about, but did raise complaints once included with Vim.
Also, several encoding problems with libvterm were only uncovered once
it was included with Vim.

I'm not sure how to pick utf-8 for encoding on MS-Windows, since there
are so many versions out there (and people do use unsupported versions
too, for various reasons).

--
A real patriot is the fellow who gets a parking ticket and rejoices
that the system works.


/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Maxim Kim

May 21, 2021, 3:07:48 AM

> I'm not sure how to pick utf-8 for encoding on MS-Windows, since there are so many versions out there (and people do use unsupported versions too, for various reasons).

I use vim more than 15 years now -- mostly on various versions of windows. And I always set encoding=utf8 cause otherwise vim just doesn't work properly with russian text. Normal commands do smth unexpected like w goes to some random character instead of the word boundary.

Probably non-utf encodings works just fine for latin alphabet users but not for the others.

Again, I can live with what we have now (it was ok for me that many years anyway) but as @k-takata said -- vim9 is a good excuse to introduce default utf-8 for windows users (we deserve good things too).

Bram Moolenaar

May 21, 2021, 6:10:56 AM


> >I'm not sure how to pick utf-8 for encoding on MS-Windows, since
> >there are so many versions out there (and people do use unsupported
> >versions too, for various reasons).
>
>
> I use vim more than 15 years now -- mostly on various versions of
> windows. And I always `set encoding=utf8` cause otherwise vim just
> doesn't work properly with russian text. Normal commands do smth
> unexpected like `w` goes to some random character instead of the word
> boundary.
>
> Probably non-utf encodings works just fine for latin alphabet users
> but not for the others.
>
> Again, I can live with what we have now (it was ok for me that many
> years anyway) but as @k-takata said -- vim9 is a good excuse to
> introduce default utf-8 for windows users (we deserve good things
> too).

One possibility is to use utf-8 for the GUI. In the console we have to
work with the active codepage, some encoding issues might appear
otherwise. For the GUI we do not have such problems, and Unicode
support should always be there.

So how about defaulting 'encoding' to "utf-8" in the MS-Windows GUI?

Maxim Kim

May 21, 2021, 6:27:02 AM

> One possibility is to use utf-8 for the GUI. In the console we have to
> work with the active codepage, some encoding issues might appear
> otherwise.

Well, without enc=utf8 the same issues in cmd vim -- impossible to use with russian texts.

vim-enc

> For the GUI we do not have such problems, and Unicode support should always be there.
> So how about defaulting 'encoding' to "utf-8" in the MS-Windows GUI?

With gui only change windows users would still need to add set encoding=utf-8 to their vimrcs. Cause you never know when you will run cmd.exe and vim.

K.Takata

May 21, 2021, 7:45:03 AM

I think that using different defaults between the GUI and the console just introduces complexity.

matveyt

May 21, 2021, 9:21:57 AM

@brammool

> I'm not sure about MS-Windows. Is there any choice the user makes to decide about the encoding?

In Windows 10 user is finally allowed to pick utf-8 as default. Though the feature is still marked as "beta" but it looks all right (for both Vim and gVim).

Bram Moolenaar

May 21, 2021, 3:07:27 PM


> @brammool
>
> >I'm not sure about MS-Windows. Is there any choice the user makes to
> >decide about the encoding?
>
> In Windows 10 user is finally allowed to pick utf-8 as default. Though
> the feature is still marked as "beta" but it looks all right (for both
> Vim and gVim).

Is that when installing a new system? I know most people just keep
updating what they have, especially now that Windows 10 is here to stay.
Thus I'm not sure depending on a new user setting will help. I was
hoping there was an existing one.

I checked the options initialization code, and it appears that when
'encoding' is "utf-8", the 'termencoding' option should be set
correctly. And we can probably rely on conversion to always work (one
doesn't need to install iconv). I'm still a bit careful, just switching
to "utf-8" as the default encoding might upset users who are not
involved in development.

matveyt

May 21, 2021, 5:00:37 PM

@brammool It's tunable by Control Panel applet. The feature is still marked as "beta" and it's not (yet?) choosable during installation process.

For developers, Microsoft recommends to Use the UTF-8 code page per process.

matveyt

May 22, 2021, 9:57:13 AM

So, as it turns out, the following is enough to select utf-8 as ACP under W10 (1903 or later):

diff --git a/src/gvim.exe.mnf b/src/gvim.exe.mnf
index e5c250ff2..e1eeb1bb1 100644
--- a/src/gvim.exe.mnf
+++ b/src/gvim.exe.mnf
@@ -29,10 +29,11 @@
       </requestedPrivileges>
     </security>
   </trustInfo>
-  <!-- Vista High DPI aware -->
+  <!-- Vista High DPI aware, W10 active code page is UTF-8 -->
   <asmv3:application>
-    <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
-      <dpiAware>true</dpiAware>
+    <asmv3:windowsSettings>
+      <dpiAware xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">true</dpiAware>
+      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
     </asmv3:windowsSettings>
   </asmv3:application>
   <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">

Now vim --clean reports encoding=utf-8 and termencoding=cp866 as expected.
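
To verify the reported values on a given build, something like the following can be typed at the command line of a Vim started with vim --clean. It is a small sketch that only reads the options and uses nothing beyond standard :set syntax:

```vim
" Show the current values of both options.
set encoding? termencoding?
" Additionally show where 'encoding' was last set, if a script set it.
verbose set encoding?
```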

Bram Moolenaar

May 22, 2021, 2:30:45 PM


> So, as it turns out, the following is enough to select utf-8 as ACP under W10 (1903 or later):
>
> ```diff
> diff --git a/src/gvim.exe.mnf b/src/gvim.exe.mnf
> index e5c250ff2..e1eeb1bb1 100644
> --- a/src/gvim.exe.mnf
> +++ b/src/gvim.exe.mnf
> @@ -29,10 +29,11 @@
> </requestedPrivileges>
> </security>
> </trustInfo>
> - <!-- Vista High DPI aware -->
> + <!-- Vista High DPI aware, W10 active code page is UTF-8 -->
> <asmv3:application>
> - <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
> - <dpiAware>true</dpiAware>
> + <asmv3:windowsSettings>
> + <dpiAware xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">true</dpiAware>
> + <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
> </asmv3:windowsSettings>
> </asmv3:application>
> <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
> ```
>
> Now `vim --clean` reports `encoding=utf-8` and `termencoding=cp866` as
> expected.

But then it's only effective for the GUI version, right?

matveyt

May 22, 2021, 3:55:59 PM

@brammool Why? No, this is GetACP(), i.e. the string encoding for all WinAPI "A" calls except console I/O (and except "W" calls, which always use utf-16le). So it doesn't affect console I/O, but it's effective for any application.

Bram Moolenaar

May 22, 2021, 4:53:30 PM

The name gvim.exe.mnf suggests this only applies to gvim.exe.
But there is lots of confusion. Looking at the mingw makefile, it gets turned into vimrc.o.
Which has nothing to do with "vimrc", so that's a bad name.
And it looks like vimrc.o is added no matter whether the GUI is used or not.
So why is the file called gvim.exe.mnf?

To add to the confusion, gvim.exe.mnf doesn't have any comment explaining what the file is for, no copyright notice or anything.
And it uses "assembly" while it clearly is not assembly language. That deserves an explanation.

I would appreciate if someone can make a pull request to fix this first.

K.Takata

May 22, 2021, 9:32:50 PM

I don't like the idea of using gvim.exe.mnf to change the default code page to UTF-8.
That method is useful for a program that uses ANSI APIs, but Vim already uses Wide APIs.
No need to change the behavior of GetACP().
And, that method may break the 'makeencoding' option when it is set to "char".

gvim.exe.mnf was originally introduced for gvim.exe for Windows XP's new common controls.
But later, it came to be used for Vista security requirements, Vista High DPI awareness, and Windows 10's GetVersionEx() support.
So, now gvim.exe.mnf is also used in vim.exe. I think that "vim.manifest" is a better file name.

> And it uses "assembly" while it clearly is not assembly language. That deserves an explanation.

It's a Windows term. This "assembly" means "side-by-side assembly".
https://docs.microsoft.com/en-us/windows/win32/sbscs/side-by-side-assemblies-reference

matveyt

May 23, 2021, 2:58:32 AM

> That method is useful for a program that uses ANSI APIs, but Vim already uses Wide APIs.

Yes, but it costs an extra conversion, as Vim never uses utf16-le internally other than for WinAPI calls.

> And, that method may break the 'makeencoding' option when it is set to "char".

That's a pity, although makeencoding=char is never guaranteed to work.

K.Takata

May 23, 2021, 10:22:01 PM

> Yes, but it costs an extra conversion, as Vim never uses utf16-le internally other than for WinAPI calls.

Vim doesn't use ANSI APIs (except only a few parts), so it's not an "extra" conversion.

matveyt

May 23, 2021, 11:51:29 PM

@k-takata But it's possible to use ANSI API + utf-8 everywhere and to avoid WIDE API + utf16-le completely. Okay, I understand that it's not of much convenience as we must also support older systems. Just an empty observation.

K.Takata

May 24, 2021, 12:44:56 AM

Using UTF-8 on ANSI APIs has at least one big issue; the length of the file name.
Wide APIs can handle up to MAX_PATH code units, but ANSI APIs can handle only MAX_PATH bytes.
In the worst case, ANSI APIs can handle only 1/3 length of the Wide APIs.

Another thing we should consider is that Windows uses UTF-16 internally, which means that a UTF-8 string passed to an ANSI API will be converted to UTF-16 internally.

ANSI APIs + UTF-8 should be used only when someone cannot (easily) use Wide APIs.
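
The 1/3 worst case follows from UTF-8 needing three bytes for a character that takes a single UTF-16 code unit. A small sketch to see this inside Vim, assuming 'encoding' is utf-8; the file name is an arbitrary example whose characters are all in the Basic Multilingual Plane, so one character corresponds to one UTF-16 code unit:

```vim
scriptencoding utf-8
" Hypothetical file name made of four katakana characters.
let name = 'ファイル'
" Length in bytes, which is what an ANSI API with a UTF-8 code page counts.
echo strlen(name)
" Length in characters; for BMP characters this equals the UTF-16 code units.
echo strchars(name)
```

With 'encoding' set to utf-8 this prints 12 and then 4, so such a name uses up a byte-based limit three times faster than a limit counted in UTF-16 code units.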

matveyt

May 24, 2021, 3:47:15 AM

Speaking of long paths, support for longPathAware needs to be implemented at some point...

Bram Moolenaar

May 24, 2021, 9:46:35 AM


Ken Takata wrote:

> I don't like the idea that using gvim.exe.mnf for changing the default
> code page to UTF-8. That method is useful for a program that uses
> ANSI APIs, but Vim already uses Wide APIs.
> No need to change the behavior of GetACP().
> And, that method may break the 'makeencoding' option when it is set to
> "char".

Defaulting the value of 'encoding' to "utf-8" is what we want, if
changing the manifest has side effects we better not do it that way.


> gvim.exe.mnf was originally introduced for gvim.exe for Windows XP's
> new common controls.
> But later, it is used for Vista security requirements, Vista High DPI
> awareness, and Windows 10's GetVersionEx() support.
> So, now gvim.exe.mnf is also used in vim.exe. I think that
> "vim.manifest" is a better file name.

I see that matveyt has made a PR for this.


> > And it uses "assembly" while it clearly is not assembly language.
> > That deserves an explanation.
>
> It's a Windows term. This "assembly" means "side-by-side assembly".
> https://docs.microsoft.com/en-us/windows/win32/sbscs/side-by-side-assemblies-reference

In the XML tag names they leave out "side-by-side", which makes it
confusing. I suppose a <!-- --> comment can be added near the top.
But I'm not sure it can go before the "<?xml" line, probably not.

matveyt

May 24, 2021, 10:07:24 AM

> But I'm not sure it can go before the "<?xml" line, probably not.

No, the "xml" declaration is required to be at the very top. Even if Windows could somehow survive this, we should follow the standard. Therefore I put the comment beneath the "xml" declaration. Something worth noting: the manifest will be included in the Vim executable verbatim.

Maxim Kim

May 31, 2021, 3:08:02 AM

Closed #8221.

Maxim Kim

May 31, 2021, 3:08:03 AM

Default encoding on windows now is utf-8. Closing this.
