'fencs' trial terminates unexpectedly

Taro MURAOKA

Apr 23, 2013, 11:19:57 AM

to vim...@googlegroups.com

Hi list.

When 'enc' is "utf-8", 'fencs' includes "ucs-2", and I open a file
that is not "ucs-2" encoded, the 'fencs' trial stops at "ucs-2"
unexpectedly.

For example:

:set enc=utf-8
:set fencs=ucs-2
:e abc.txt

Opening the attached "abc.txt" fails.

I have attached a patch to fix this.
Please review it.

Best.

abc.txt

fix_fencs_trial_termination.diff

Ben Fritz

Apr 23, 2013, 12:14:56 PM

to vim...@googlegroups.com

OK, I wasn't sure what the problem actually was from your description, so I downloaded your abc.txt file and tried it myself.

On Windows 7, gvim 7.3.822, with:

gvim -N -u NONE -i NONE

:set enc=utf-8
:set fencs=ucs-2
:e abc.txt

I would expect, from :help 'fileencodings', that Vim would set 'fenc' to an empty string and try to read the file in the utf-8 encoding (falling back to the 'encoding' option).

Instead, I get a CONVERSION ERROR message and fenc is set to ucs-2.

If I use :e ++enc=utf-8 abc.txt, then the file loads correctly.

So 'fileencodings' is not working as documented when no encodings are valid for the file.

I tried again, with 'fencs' set to "ucs-2,utf-8,latin1" which should definitely succeed. It should first try ucs-2, then try utf-8 and succeed. If utf-8 had not succeeded, it should fall back to latin1. Instead, I see the same result: a CONVERSION ERROR message and fenc is now ucs-2.
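The documented behavior described above can be pictured as a simple loop: try each 'fileencodings' entry in order, move on to the next entry when conversion fails, and only when every entry fails leave 'fenc' empty and read the file as 'encoding'. What follows is a minimal Python sketch under stated assumptions, not Vim's C code: `try_fileencodings`, `decode_ucs2_be` and the `DECODERS` table are hypothetical names, "ucs-2" is modeled as strict big-endian UCS-2, and a conversion error is modeled as a Python `ValueError`.

```python
from typing import Callable, Dict, List, Tuple


def decode_ucs2_be(data: bytes) -> str:
    """Strict big-endian UCS-2: exactly 2 bytes per character, no surrogates."""
    if len(data) % 2:
        raise ValueError("odd number of bytes is not valid UCS-2")
    chars = []
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        if 0xD800 <= unit <= 0xDFFF:
            raise ValueError(f"surrogate U+{unit:04X} is invalid in UCS-2")
        chars.append(chr(unit))
    return "".join(chars)


# Hypothetical decoder table; the keys mimic 'fileencodings' entries.
DECODERS: Dict[str, Callable[[bytes], str]] = {
    "ucs-2": decode_ucs2_be,
    "utf-8": lambda b: b.decode("utf-8"),
    "latin1": lambda b: b.decode("latin-1"),  # never fails: any byte is valid
}


def try_fileencodings(data: bytes, fencs: List[str],
                      enc: str = "utf-8") -> Tuple[str, str]:
    """Return (fenc, text); fenc is "" when every candidate failed."""
    for name in fencs:
        try:
            return name, DECODERS[name](data)
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            continue        # conversion failed: try the next entry, don't stop
    # No entry worked: leave 'fenc' empty and read the file as 'encoding'.
    # errors="replace" is this sketch's stand-in for Vim's illegal-byte handling.
    return "", data.decode(enc, errors="replace")
```

For example, `try_fileencodings(b"abc", ["ucs-2", "utf-8", "latin1"])` returns `("utf-8", "abc")` because three bytes cannot be UCS-2, and `try_fileencodings(b"abc", ["ucs-2"])` returns `("", "abc")`. The behavior reported in this thread resembles a loop that stops at the "ucs-2" entry instead of reaching `continue`. Note also that an even-length ASCII file such as `b"hello\n"` passes this strict UCS-2 check, since every 16-bit unit happens to be a valid non-surrogate code point.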

I did not try your patch, but I agree this is a bug and can readily reproduce it on my system.

Bram Moolenaar

Apr 23, 2013, 12:19:37 PM

to Taro MURAOKA, vim...@googlegroups.com

Thanks. I can easily reproduce the problem, I'll check out the patch
soon.

--
hundred-and-one symptoms of being an internet addict:
212. Your Internet group window has more icons than your Accessories window.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Tony Mechelynck

Apr 23, 2013, 4:33:47 PM

to vim...@googlegroups.com

- Especially when 'encoding' is utf-8, it is recommended to start
'fileencodings' with ucs-bom.
- It is always recommended to end 'fileencodings' with some 8-bit
encoding, which will serve as the default.
- It is useless to put more than one 8-bit encoding in 'fileencodings':
nothing after the first 8-bit encoding will ever be tried.
- ucs-2 is obsolete, utf-16 should be used instead. (UTF-16 can
represent codepoints up to U+10FFFF, using surrogate pairs for anything
above U+FFFF. UCS-2 cannot go further up than U+FFFF and surrogates are
invalid when using it.)
- For ucs-something and utf-something other than utf-8 (and utf-7 which
is also obsolete), big-endian is assumed unless you explicitly specify
little-endian, even when running on a little-endian machine. So, for
Vim, utf-16 is the same as utf-16be, not utf-16le, even on Intel x86
processors.
- It is very hard to detect utf-16 (and the obsolete ucs-2) correctly
unless there is a BOM (in which case ucs-bom will handle it).
- In recent versions of Vim (including all patchlevels of 7.3),
++enc=something completely bypasses the 'fileencodings' heuristics,
forcing the charset you specify. You may get � or hollow-box placeholder
glyphs if the file contents are invalid for that encoding.
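Three of the points above (surrogate pairs, the big-endian assumption, and BOM detection) can be checked in a few lines of Python. This is a sketch, not Vim's implementation: the helper names `utf16_units`, `fits_ucs2` and `sniff_bom` are hypothetical, and the BOM table uses Python codec names rather than the names Vim stores in 'fenc'.

```python
from typing import List, Optional


def utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode code point."""
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a valid Unicode scalar value")
    if cp <= 0xFFFF:
        return [cp]                    # BMP: one 16-bit unit, same as UCS-2
    v = cp - 0x10000                   # 20 bits left to split
    return [0xD800 + (v >> 10),        # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]      # low surrogate carries the bottom 10 bits


def fits_ucs2(cp: int) -> bool:
    """UCS-2 has one 16-bit unit per character and no surrogate pairs."""
    return cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


# Checked in this order so that the 4-byte UTF-32LE BOM (FF FE 00 00)
# wins over its 2-byte UTF-16LE prefix (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # U+1F600 needs a surrogate pair in UTF-16 and does not fit in UCS-2.
    print([hex(u) for u in utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
    print(fits_ucs2(0x1F600), fits_ucs2(0x20AC))   # False True
    # Without a BOM, the same two bytes mean different characters
    # depending on the assumed byte order.
    raw = b"\x00A"
    print(repr(raw.decode("utf-16-be")), repr(raw.decode("utf-16-le")))
    print(sniff_bom(b"\xff\xfeA\x00"), sniff_bom(b"abc"))  # utf-16-le None
```

The two bytes `00 41` read as "A" under the big-endian assumption and as U+4100 under little-endian, which is why a BOM-less UTF-16 file is ambiguous.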

For "Western" locales, I recommend

:set fencs=ucs-bom,utf-8,latin1

For East-Asian locales there is a script somewhere that improves on the
'fileencodings' heuristic (trying to discriminate as best as possible
between the common encodings used for the various CJK languages) but I
don't know the details.

Best regards,
Tony.
--
This message contains 78% recycled characters.
