Several keymap files with _utf-8 in their filename are actually encoded in latin1 or cp1255, not UTF-8. This causes errors when tools attempt to read these files as UTF-8:
esperanto_utf-8.vim: 'utf-8' codec can't decode byte 0xf9 in position 336
greek_utf-8.vim: 'utf-8' codec can't decode byte 0xb4 in position 24957
hebrewp_utf-8.vim: 'utf-8' codec can't decode byte 0xf9 in position 262

| File | Original Encoding | Issue |
|---|---|---|
| esperanto_utf-8.vim | latin1 | Contains Ù/ù (U WITH GRAVE) as keymaps; has scriptencoding latin1 directive |
| greek_utf-8.vim | latin1 | Contains ´/ª characters in keymap entries |
| hebrewp_utf-8.vim | cp1255 (Windows Hebrew) | Contains Hebrew characters in comments |
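To illustrate why these bytes are rejected, here is a small sketch that decodes the two byte values from the error messages with Python's built-in codecs. The single-byte samples are stand-ins chosen for illustration, not excerpts from the actual files, and `is_valid_utf8` is a hypothetical helper name.

```python
# Byte values from the error messages, paired with the encoding each
# file was found to use. The samples are illustrative, not file excerpts.
samples = [
    (b"\xf9", "latin1"),  # esperanto_utf-8.vim
    (b"\xb4", "latin1"),  # greek_utf-8.vim
    (b"\xf9", "cp1255"),  # hebrewp_utf-8.vim
]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for raw, codec in samples:
    char = raw.decode(codec)
    # The lone byte is not valid UTF-8, but the character it stands for
    # becomes a valid multi-byte sequence once re-encoded.
    print(
        f"{raw!r} as {codec}: U+{ord(char):04X}, "
        f"raw byte valid UTF-8: {is_valid_utf8(raw)}, "
        f"re-encoded valid UTF-8: {is_valid_utf8(char.encode('utf-8'))}"
    )
```

The same byte 0xf9 maps to different characters under latin1 and cp1255, which is why the source encoding has to be known per file before converting.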
Updated the scriptencoding directive in esperanto_utf-8.vim from latin1 to utf-8.

All three files now pass UTF-8 validation:
>>> for f in files:
...     open(f, encoding='utf-8').read()  # No errors
The keymap functionality remains unchanged - only the encoding was modified.
closes #16390
https://github.com/vim/vim/pull/19240
(3 files)
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
The CI failure in linux (huge, gcc, no_x11_wl) appears to be a flaky test unrelated to this change, as this PR only modifies runtime keymap files and doesn't touch any C code or tests.
Could a maintainer please re-run the failed job? Thank you!
—
Thanks. Just curious, how did you notice?
—
> Thanks. Just curious, how did you notice?

I first tried to fix it in Neovim, and they told me to submit the fix to Vim first, after which it would be ported to Neovim.
—
@ThanhNguyxn I think Christian is asking how you noticed that the files (in Neovim) had the wrong encoding. What were you doing with them?
—
@clason I was working on a tool that processes Vim runtime files and needs to read them as UTF-8. When iterating through keymap files, Python's open(file, encoding='utf-8') raised UnicodeDecodeError on these three files.
Since the filenames contain _utf-8, I expected them to be UTF-8 encoded, but they were actually encoded in latin1/cp1255. The existing issue #16390 also reported this same problem.
So I converted them to proper UTF-8 encoding to match their filenames.
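A conversion like this can be sketched in a few lines of Python. This is a minimal sketch, not necessarily how the files in this PR were converted: `convert_to_utf8` is a hypothetical helper, and the source encoding has to be supplied by the caller because it cannot be detected reliably from the bytes alone.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str) -> bool:
    """Re-encode `path` as UTF-8 in place.

    Returns True if the file was rewritten, False if it was already
    valid UTF-8 and therefore left untouched.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Work on bytes so that line endings are preserved exactly.
    text = raw.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True
```

For the three files discussed here, the call would look like `convert_to_utf8(Path("runtime/keymap/hebrewp_utf-8.vim"), "cp1255")`, assuming the script runs from the repository root. Any scriptencoding directive inside a file still has to be updated separately, since this helper only changes the byte encoding.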
—
Apologies for the confusion in my earlier reply and the incorrect issue reference in the PR description.
Context: I discovered this encoding issue while working on a tool that processes Vim/Neovim runtime files. Python's open(file, encoding='utf-8') raised UnicodeDecodeError on these three keymap files when iterating through them.
The original issue was reported in Neovim: neovim/neovim#31688. I initially submitted a PR there (neovim/neovim#37499), but @clason advised that since these files are synced from Vim, the fix should be made here first, then ported to Neovim.
Note: The closes #16390 in my PR description was a mistake - that PR is unrelated (it's about v:stacktrace). I'll update the description to remove that reference.
—
Thanks. I did a quick test:
find runtime -type f -name "*utf-8*.vim" -exec sh -c 'iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1 || echo "non utf-8 encoding detected in $1"' find-sh {} \; |grep "non utf-8 encoding"
This found a few more files that claim to be UTF-8 but are actually latin9-encoded.
I think it makes sense to stick something into our CI like this:
find . -type f -name "*utf-8*.vim" -exec sh -c 'iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1 || echo "non utf-8 encoding detected in $1"' find-sh {} \; |grep "non utf-8 encoding" && exit 3
Will prepare such a change for CI later and fix the remaining files when I merge this PR here.
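For comparison, a rough Python equivalent of the find/iconv check above, in case iconv is not available where the check runs. This is only a sketch under that assumption, not the planned CI change; `find_non_utf8` and `main` are hypothetical names, and the exit code 3 simply mirrors the shell snippet.

```python
from pathlib import Path


def find_non_utf8(root: Path, pattern: str = "*utf-8*.vim") -> list[Path]:
    """Return files under `root` matching `pattern` that are not valid UTF-8."""
    offenders = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            offenders.append(path)
    return offenders


def main(argv: list[str]) -> int:
    """Print offending files and return 3 if any were found, else 0."""
    root = Path(argv[0]) if argv else Path(".")
    offenders = find_non_utf8(root)
    for path in offenders:
        print(f"non utf-8 encoding detected in {path}")
    return 3 if offenders else 0


# To use as a script, end the file with:
#     import sys; sys.exit(main(sys.argv[1:]))
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the check easy to call from a test.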