Describe the bug
The Buffers menu has "strange" menu items on Windows with a Russian locale.
To Reproduce
Detailed steps to reproduce the behavior:
0. Have Windows with a Russian locale.
1. Run gvim -Nu NONE -c "set enc=utf-8"
Expected behavior
The Buffers menu should have proper Russian text.
Screenshots
Environment (please complete the following information):
Additional context
VIM - Vi IMproved 8.2 (2019 Dec 12, compiled May 16 2021 22:02:14)
MS-Windows 64-bit GUI version with OLE support
Included patches: 1-2860
Compiled by appveyor@APPVYR-WIN
PS: there is also an issue with the popup menu, but I am not sure how to reproduce it with minimal steps.
On openSUSE Linux, if I start gvim with
LC_MESSAGES=ru_RU.utf-8 gvim -Nu NONE
all menus (including Буферы and its sub-menus) are in proper Cyrillic Russian. 'encoding' is utf-8 from the beginning.
When I run :scriptnames I get
1: /usr/local/share/vim/vim82/menu.vim
2: /usr/local/share/vim/vim82/lang/menu_ru_ru.koi8-r.vim
3: /usr/local/share/vim/vim82/lang/menu_ru_ru.utf-8.vim
4: /usr/local/share/vim/vim82/lang/menu_ru_ru.vim
5: /usr/local/share/vim/vim82/autoload/paste.vim
Best regards,
Tony.
Windows gvim by default has non-unicode encoding (for me it is cp1251) so I have to set encoding=utf-8.
It is not a big deal for me -- I don't use menus, but it would be nice to have them rendered properly if I ever need them.
And again, if I remove filetype on, everything is OK.
Maybe it's time to revisit the encoding for Windows and make it utf-8 by default, as on Linux.
This particular issue is due to using -c instead of --cmd. You really should change encoding before processing your vimrc, not after that. Otherwise various side-effects are possible.
But there's a real problem behind this: $VIMRUNTIME/filetype.vim may also source $VIMRUNTIME/menu.vim. And hence there comes surprising difference between putting :filetype on before :set encoding=xxx or after. However, $VIMRUNTIME/menu.vim will be sourced later by startup routine anyway, so it's not clear to me why it also needed the :source here.
As a workaround I have the following on the very top of my config:
    if !exists("did_load_filetypes") && !has("nvim")
        let did_install_syntax_menu = 1
        filetype plugin indent on
        syntax on
        unlet did_install_syntax_menu
    endif
This allows 'encoding' to be set later without breaking anything. Though I'd probably have liked it more if :filetype on had no such side effect at all.
You really should change encoding before processing your vimrc, not after that. Otherwise various side-effects are possible.
I would like not to change it at all, but I have to and do it in vimrc. For ppl who rely on menus it might be an issue.
Also note that you won't encounter such problem at all if set encoding precedes filetype on
Huh, I think I wanted to try it out but never did, thx!
As mentioned, if you change 'encoding' you need to do this very early, because it invalidates mappings and menus already defined.
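A minimal vimrc sketch of that ordering, assuming the goal is UTF-8 menus and mappings. 'encoding' is set first, and only then are filetype detection and syntax highlighting enabled, so that anything sourcing $VIMRUNTIME/menu.vim runs under the final encoding. Placing this at the very top of the vimrc is an assumption of the sketch, not a requirement stated by Vim itself.

```vim
" Put this at the very top of the vimrc, before any mappings or menus.
set encoding=utf-8
" Tell Vim that the rest of this script is written in UTF-8.
scriptencoding utf-8

" Only now enable features that may source $VIMRUNTIME/menu.vim.
filetype plugin indent on
syntax on
```

When testing from the command line, passing the setting with --cmd rather than -c has the same effect of running it before the vimrc is processed.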
This is not really a bug, but defaulting 'encoding' to "utf-8" is most likely better for most users. This already happens on Linux systems, since the environment is usually set up for it. I'm not sure about MS-Windows. Is there any choice the user makes to decide about the encoding? At least in the console there is the active codepage, which we had better match to avoid various problems. For the GUI I'm not sure how to decide about the encoding. The main thing is that the value of 'encoding' also sets the default for the encoding of newly created files. Does using "utf-8" work for that? Quite a few MS-Windows utils default to using utf-16 (which is the worst choice in many ways).
This is not really a bug, but defaulting 'encoding' to "utf-8" is most likely better for most users.
I can't speak for most users, but I have used it on Windows for many years.
I'm not sure about MS-Windows. Is there any choice the user makes to decide about the encoding?
I don't think the majority of Windows users make this choice, if it exists at all: Windows is usually pre-installed. Of course there are region settings, input "language" selection and whatnot, but I have never changed the encoding for Windows system-wide.
At least in the console there is the active codepage, which we had better match to avoid various problems.
I use console Vim from time to time with set encoding=utf8 in my vimrc. My console uses cp1251 and Vim has no issues with that.
For the GUI I'm not sure how to decide about the encoding. The main thing is that the value of 'encoding' also sets the default for the encoding of newly created files.
Does using "utf-8" work for that?
Absolutely! I believe (we need fact checkers here :) ) most modern software on Windows works with utf-8 files.
Quite a few MS-Windows utils default to using utf-16 (which is the worst choice in many ways).
But having encoding==cp1251 in vim wouldn't help here anyway, right?
Is there any choice the user makes to decide about the encoding?
It's possible to check the $LANG environment variable. By default Windows neither sets nor uses it, but some ported software may need it anyway, so there's a small chance that users set it intentionally even on Windows.
we need fact checkers here :)
Well, Neovim has been doing this for years and it seems no one complains.
@matveyt wrote:
Well, Neovim has been doing this for years and it seems no one complains
I see that in Neovim (I tried v0.5.0-dev+1318-g61aefaf29), :help vim-diff
shows that 'encoding' defaults to utf-8, whereas for Vim, :help 'encoding'
says that the default is "latin1" or the value from $LANG.
Given that the +multi_byte feature is nowadays always enabled (since vim-8.1.0733)
in all versions of Vim, perhaps we should consider making utf-8 the default
'encoding' value. It's an incompatible change, but it would probably fix more
problems than it introduces. And the user can always set 'enc' explicitly in any case.
@dpelle In Neovim 'encoding' is even fixed to utf-8 and cannot be changed by user (unlike 'fileencoding').
I think that the release of Vim 9 is a very good chance to change the default encoding to UTF-8.
See also: #3907
I'm not sure how to pick utf-8 for encoding on MS-Windows, since there are so many versions out there (and people do use unsupported versions too, for various reasons).
I have used Vim for more than 15 years now, mostly on various versions of Windows. And I always set encoding=utf8 because otherwise Vim just doesn't work properly with Russian text. Normal-mode commands do something unexpected, e.g. w goes to some random character instead of the word boundary.
Non-UTF encodings probably work just fine for users of the Latin alphabet, but not for the others.
Again, I can live with what we have now (it has been OK for me all these years anyway), but as @k-takata said, Vim 9 is a good excuse to introduce a default of utf-8 for Windows users (we deserve good things too).
One possibility is to use utf-8 for the GUI. In the console we have to
work with the active codepage, some encoding issues might appear
otherwise.
Well, without enc=utf8 console Vim in cmd has the same issues: it is impossible to use with Russian texts.
For the GUI we do not have such problems, and Unicode support should always be there.
So how about defaulting 'encoding' to "utf-8" in the MS-Windows GUI?
With a GUI-only change, Windows users would still need to add set encoding=utf-8 to their vimrc, because you never know when you will run Vim from cmd.exe.
I think that using different defaults between the GUI and the console just introduces complexity.
I'm not sure about MS-Windows. Is there any choice the user makes to decide about the encoding?
In Windows 10 the user is finally allowed to pick utf-8 as the default. The feature is still marked as "beta", but it looks all right (for both Vim and gVim).
@brammool It's tunable via a Control Panel applet. The feature is still marked as "beta" and cannot (yet?) be chosen during installation.
For developers, Microsoft's recommendation is to use the UTF-8 code page per process.
So, as it turns out, the following is enough to select utf-8 as ACP under W10 (1903 or later):
diff --git a/src/gvim.exe.mnf b/src/gvim.exe.mnf
index e5c250ff2..e1eeb1bb1 100644
--- a/src/gvim.exe.mnf
+++ b/src/gvim.exe.mnf
@@ -29,10 +29,11 @@
       </requestedPrivileges>
     </security>
   </trustInfo>
-  <!-- Vista High DPI aware -->
+  <!-- Vista High DPI aware, W10 active code page is UTF-8 -->
   <asmv3:application>
-    <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
-      <dpiAware>true</dpiAware>
+    <asmv3:windowsSettings>
+      <dpiAware xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">true</dpiAware>
+      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
     </asmv3:windowsSettings>
   </asmv3:application>
   <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
Now vim --clean reports encoding=utf-8 and termencoding=cp866 as expected.
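One way to check such a result, as a small sketch: the commands below only print the current option values and change nothing. They can be typed at the : prompt after starting vim --clean, or sourced from a scratch file.

```vim
" Show the internal encoding and the terminal encoding side by side.
echo 'encoding=' . &encoding . ' termencoding=' . &termencoding

" Show where 'encoding' was last set, if it was set by a script.
verbose set encoding?
```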
@brammool Why? No, this is GetACP(), i.e. the string encoding for all WinAPI "A" calls except console I/O (and except "W" calls, which always use utf-16le). So it doesn't affect console I/O, but it is effective for any application.
The name gvim.exe.mnf suggests this only applies to gvim.exe.
But there is a lot of confusion. Looking at the MinGW makefile, it gets turned into vimrc.o.
Which has nothing to do with "vimrc", so that's a bad name.
And it looks like vimrc.o is added no matter whether the GUI is used or not.
So why is the file called gvim.exe.mnf?
To add to the confusion. gvim.exe.mnf doesn't have any comment explaining what the file is for, no copyright notice or anything.
And it uses "assembly" while it clearly is not assembly language. That deserves an explanation.
I would appreciate if someone can make a pull request to fix this first.
I don't like the idea of using gvim.exe.mnf to change the default code page to UTF-8.
That method is useful for a program that uses ANSI APIs, but Vim already uses Wide APIs.
No need to change the behavior of GetACP().
And, that method may break the 'makeencoding' option when it is set to "char".
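For context, a sketch of the setting in question, assuming a vimrc that uses UTF-8 internally while external tools print in the system code page. With 'makeencoding' set to "char", Vim converts the output of :make and :grep from the system locale encoding, so a change in what the system reports as the ANSI code page could change that conversion.

```vim
set encoding=utf-8
if exists('+makeencoding')
  " "char" means the system locale encoding (the ANSI code page on MS-Windows).
  set makeencoding=char
endif
```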
gvim.exe.mnf was originally introduced for gvim.exe to enable Windows XP's new common controls.
But later it also came to be used for Vista security requirements, Vista High DPI awareness, and Windows 10's GetVersionEx() support.
So now gvim.exe.mnf is also used in vim.exe. I think that "vim.manifest" is a better file name.
And it uses "assembly" while it clearly is not assembly language. That deserves an explanation.
It's a Windows term. This "assembly" means "side-by-side assembly".
https://docs.microsoft.com/en-us/windows/win32/sbscs/side-by-side-assemblies-reference
That method is useful for a program that uses ANSI APIs, but Vim already uses Wide APIs.
Yes, but it costs an extra conversion, as Vim never uses utf16-le internally other than for WinAPI calls.
And, that method may break the 'makeencoding' option when it is set to "char".
That's a pity, although makeencoding=char is never guaranteed to work.
Yes, but it costs an extra conversion, as Vim never uses utf16-le internally other than for WinAPI calls.
Vim doesn't use ANSI APIs (except only a few parts), so it's not an "extra" conversion.
@k-takata But it's possible to use ANSI API + utf-8 everywhere and to avoid WIDE API + utf16-le completely. Okay, I understand that it's not of much convenience as we must also support older systems. Just an empty observation.
Using UTF-8 on ANSI APIs has at least one big issue; the length of the file name.
Wide APIs can handle up to MAX_PATH code units, but ANSI APIs can handle only MAX_PATH bytes.
In the worst case, ANSI APIs can handle only 1/3 length of the Wide APIs.
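The 1/3 figure can be illustrated with a small sketch, assuming 'encoding' is already utf-8. The file name is hypothetical. Each of its three characters is a single UTF-16 code unit but takes three bytes in UTF-8, so the byte count is what an ANSI API would measure against MAX_PATH.

```vim
scriptencoding utf-8
" Hypothetical file name made of three CJK characters.
let s:name = '日本語'
" strchars() counts characters; strlen() counts bytes in 'encoding'.
echo 'characters: ' . strchars(s:name)
echo 'UTF-8 bytes: ' . strlen(s:name)
```

With 'encoding' set to utf-8 this should report 3 characters and 9 bytes.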
Another thing we should consider is that Windows uses UTF-16 internally, which means that a UTF-8 string passed to an ANSI API will be converted to UTF-16 internally.
ANSI APIs + UTF-8 should be used only when someone cannot (easily) use Wide APIs.
Speaking of long paths, support for longPathAware needs to be implemented at some point...
But I'm not sure it can go before the "<?xml" line, probably not.
No, the "xml" declaration is required to be at the very top. Even if Windows could somehow survive this, we should follow the standard. Therefore I put the comment beneath the "xml" declaration. Something worth noting: the manifest will be included in the Vim executable verbatim.
Closed #8221.
The default encoding on Windows is now utf-8. Closing this.