vim cannot decode 'cp936' properly

Yubin Ruan

May 31, 2016, 10:37:48 AM5/31/16
to Vimdev
I currently have to work on Linux with some files that came from
Windows. The files are encoded in 'cp936', a Chinese encoding used on
Windows. Vim on Windows decodes them properly, and there I see the
encoding name 'cp936'.

However, on Linux, Vim didn't decode those files properly. At first,
Vim automatically decoded them as 'latin-1', which of course didn't
work. Then I manually changed the encoding to 'cp936' with the command
'set encoding=cp936', but that didn't work either. Then I tried 'set
fileencoding=cp936', with the same result.

Can anyone tell me why Vim treats the same file differently when I set
the same encoding? On Windows, Vim decodes the file properly as
'cp936', but on Linux it just doesn't work. Isn't that weird? Or is
there anything that I did wrong?

Also, I have tried decoding the file with Python, and it works as
expected:
# Read the raw bytes, then decode them explicitly as cp936.
with open("file_name", "rb") as f:
    text = f.read()
print(text.decode("cp936"))

I have attached a sample file, if anyone may be interested.
c06_mbr.asm

Tony Mechelynck

May 31, 2016, 11:11:33 AM5/31/16
to vim_dev, Vimdev
'encoding' should not be changed except at the start of your vimrc,
before loading any file, because otherwise it can have far-reaching
effects everywhere in your Vim session, potentially making
already-loaded text unreadable or corrupt.

I recommend setting 'encoding' to UTF-8 once and for all, see
http://vim.wikia.com/wiki/Working_with_Unicode

Not every file-encoding can be automagically identified by Vim; in
particular, East-Asian encodings aren't easy to recognize
automatically. Now and then you'll meet one which Vim cannot detect,
and in that case it will usually fall back to Latin1.
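
To illustrate why a Latin1 fallback "succeeds" yet shows garbage, here is a
small Python 3 sketch (not Vim's actual detection logic; the sample string
is an assumption chosen for illustration). Latin1 assigns a character to
every byte value, so decoding can never fail, whereas a strict UTF-8 decode
rejects these cp936 bytes:

```python
# Bytes for the two characters "中文" as they would be stored in a cp936 file.
raw = "中文".encode("cp936")

# Latin-1 maps every byte 0x00-0xFF to some character, so this never raises;
# the result is simply the wrong text.
as_latin1 = raw.decode("latin-1")

# A strict UTF-8 decode, by contrast, rejects these bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1), utf8_ok)
```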

But if you know what encoding the file uses, you can tell Vim, see
":help ++opt". For instance, if you know that file "filename.ext" is
in cp936, you can open it in a new window with

:new ++enc=cp936 filename.ext

Of course, if you _don't_ know which one of a possible limited set of
encodings your file is in, you might find it out by trial and error,
by using the various encodings (starting with the most likely one) as
the argument of ++enc=
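
The same trial-and-error idea can be sketched outside Vim in Python 3. The
helper name `guess_encoding` and the candidate list are hypothetical, chosen
only for illustration. Note that a strict decode that raises no error does
not prove the guess is right, so the order of candidates matters and the
result should still be checked by eye:

```python
def guess_encoding(data, candidates=("utf-8", "cp936", "big5")):
    """Return the first candidate that decodes `data` without error, else None.

    Hypothetical helper: candidates are tried in order, most likely first.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("中文".encode("cp936")))
```

Whatever name comes back can then be tried as the argument of ++enc= in Vim.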

This may require +iconv, which is usually compiled-in with multibyte
versions of Vim but may require an additional external library if it
wasn't statically included in your Vim binary. (If ":echo
has('iconv')", without the double quotes but with the single ones,
answers 1, then you've got it and any necessary library was found. If
it answers 0, then either iconv wasn't compiled-in, or, if dynamically
included, the library wasn't found by Vim at runtime.)

Or, since cp936 is a Windows encoding (see
https://en.wikipedia.org/wiki/Code_page_1386 ), if your Unix/Linux
system does not know about cp936 you might have more luck with
GB18030, which is the new official PRC standard and supersedes it.
IIUC it is a superset of cp936, see
https://en.wikipedia.org/wiki/GB_18030
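
As a rough check of that compatibility, the following Python 3 sketch
encodes a sample string (an assumption, picked only for illustration) as
cp936 and decodes it as GB18030. It uses Python's own codecs, not iconv, so
it only suggests what to expect for ordinary Chinese text; a few rarely
used code points may still differ between the two encodings:

```python
# Sample text: common simplified Chinese characters plus ASCII.
sample = "汇编语言 MBR"

# Encode as cp936, then decode the same bytes as GB18030.
raw = sample.encode("cp936")
decoded = raw.decode("gb18030")

print(decoded == sample)
```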


Best regards,
Tony.

Yubin Ruan

May 31, 2016, 8:27:24 PM5/31/16
to antoine.m...@gmail.com, Vimdev, vim...@googlegroups.com
Yes, thanks, that "++enc=cp936" really works.

Regards,
Ruan.
