enc, fenc and tenc

Felix von Leitner

Nov 27, 2008, 8:05:14 AM

to v...@vim.org

I am editing a UTF-8 text file. Sometimes I am editing it from a latin1
environment, sometimes from a UTF-8 environment.

In a UTF-8 environment, I get this:

fileencoding=utf-8
termencoding=
encoding=utf-8

In a latin1 environment, I get this:

fileencoding=
termencoding=
encoding=iso-8859-15

and the utf-8 characters are misinterpreted as latin1.

Now, I thought the obvious way to remedy this (the file is called
"journal") is an autocommand:

au BufReadPost journal set fenc=utf-8

But this does not help at all. Why is termencoding not set?
I was expecting it to be set to iso-8859-15 in the latin1 environment
and to utf-8 in the utf-8 environment. Am I supposed to set this
manually? And I was expecting encoding to be set to utf-8
universally. So I also tried setting encoding to utf-8 universally, but
that does not help either. vim then fails to convert utf-8 characters
to the latin1 charset the terminal understands. My locale is set
correctly and inquiring about the charset works fine:

strcmp(nl_langinfo(CODESET), "UTF-8") == 0

What is the problem here? vim is compiled with multibyte support,
obviously. Am I supposed to set tenc manually in a shell script wrapper
around vim?

Felix

Tony Mechelynck

Nov 27, 2008, 1:40:54 PM

to vim...@googlegroups.com, v...@vim.org

No shell script wrapper is needed, and indeed none would work; the place
to do it is the vimrc (see below).

'encoding' (global) is the charset used by Vim to represent data in
memory. By default, it is set to your locale (your "national setting")
but it is possible to change it in your vimrc. You shouldn't change it
after starting to edit because it might make the data in memory invalid.

'termencoding' (global) is the charset used to read from the keyboard
and, in Console mode only, to communicate with the display interface.
The default value is empty, which means "use 'encoding'". If you change
'encoding', you should first save its old value in 'termencoding' to
avoid misunderstandings with the keyboard driver (and, in Console mode,
the display interface too).
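To make the Console-mode role of 'termencoding' concrete: when 'encoding' is utf-8 and 'termencoding' is iso-8859-15, text held in memory has to be converted before it reaches the terminal. The C sketch below performs that kind of conversion with POSIX iconv(3). It only illustrates the idea and is not Vim's implementation; the helper name `to_tenc` is made up. Which charset names iconv_open() accepts, and whether a character with no equivalent in the target charset is reported as an error or silently replaced, depends on the C library.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper, not Vim code: convert a NUL-terminated UTF-8
   string to the charset named by tenc, as a console would need it.
   Returns the number of bytes written to out (which is NUL-terminated),
   or -1 if the charset is unknown, out is too small, or the C library
   reports a conversion error. */
static long to_tenc(const char *tenc, const char *utf8,
                    char *out, size_t outsz) {
  if (outsz == 0) return -1;
  iconv_t cd = iconv_open(tenc, "UTF-8");
  if (cd == (iconv_t)-1) return -1;
  char *inp = (char *)utf8;       /* iconv() takes char **, not const */
  size_t inleft = strlen(utf8);
  char *outp = out;
  size_t outleft = outsz - 1;     /* keep one byte for the NUL */
  size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
  iconv_close(cd);
  if (r == (size_t)-1) return -1;
  *outp = '\0';
  return (long)(outp - out);
}
```

For example, the Euro sign (UTF-8 bytes E2 82 AC) comes out as the single byte 0xA4 in ISO-8859-15, while plain ASCII passes through unchanged. On some systems iconv lives in a separate library and the program may need to be linked with -liconv.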

'fileencoding' (singular) (buffer-local) is the charset used to
represent a file on disk. When reading, it is usually set by means of
the 'fileencodings' (plural) heuristic (see below), except if you
override it by ++enc (see ":help ++opt").

'fileencodings' (plural) (global) is the heuristic used to determine the
'fileencoding' of a file being opened. It is a comma-separated list of
possible charsets which are tried left-to-right:
- ucs-bom (if present) should be first. It means that a BOM will be
recognised when present at the start of a Unicode file to set the
appropriate 'fileencoding' and the 'bomb' option.
- Multi-byte encodings will be tried left-to-right; if invalid bytes are
found for an encoding being tried, the next encoding in sequence will
then be tried.
- An 8-bit encoding, if present, should be last, because 8-bit encodings
cannot give a "fail" signal.
- If all listed encodings give failure signals, Vim will fall back on
Latin1; I prefer to list it explicitly though.
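The left-to-right heuristic described in the list above can be sketched in C as follows. This is a simplified model under stated assumptions, not Vim's code: `guess_fenc` and `valid_utf8` are hypothetical names, "ucs-bom" only recognises the UTF-8 BOM here (Vim also knows UTF-16 and UCS-4 BOMs), and every name other than "ucs-bom" and "utf-8" is treated as an 8-bit charset that accepts any bytes. When nothing matches, the sketch returns NULL and leaves the fallback to the caller.

```c
#include <stddef.h>
#include <string.h>

/* 1 if s[0..n) is well-formed UTF-8 (no overlong forms, surrogates or
   values above U+10FFFF), else 0. */
static int valid_utf8(const unsigned char *s, size_t n) {
  size_t i = 0;
  while (i < n) {
    unsigned char c = s[i];
    size_t len, k;
    unsigned long cp;
    if (c < 0x80) { i++; continue; }
    if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else return 0;                   /* stray continuation or bad lead */
    if (n - i < len) return 0;       /* truncated sequence */
    for (k = 1; k < len; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return 0;
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
      return 0;
    i += len;
  }
  return 1;
}

/* Walk a 'fileencodings'-style list left to right and return the first
   entry that accepts buf, copied into out; *bomb is set when a BOM was
   found. Returns NULL if no entry matched. */
static const char *guess_fenc(const char *fencs, const unsigned char *buf,
                              size_t n, char *out, size_t outsz, int *bomb) {
  const char *p = fencs;
  *bomb = 0;
  while (*p) {
    size_t len = strcspn(p, ",");
    if (len > 0 && len < outsz) {
      memcpy(out, p, len);
      out[len] = '\0';
      if (strcmp(out, "ucs-bom") == 0) {
        if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
          *bomb = 1;
          strcpy(out, "utf-8");
          return out;
        }
      } else if (strcmp(out, "utf-8") == 0) {
        if (valid_utf8(buf, n)) return out;
      } else {
        return out;  /* 8-bit charset: cannot give a "fail" signal */
      }
    }
    p += len;
    if (*p == ',') p++;
  }
  return NULL;
}
```

With the list ucs-bom,utf-8,latin1, a UTF-8 file is reported as utf-8, while a file containing a lone Latin1 byte such as 0xE9 fails the UTF-8 check and falls through to latin1.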

Note: ISO-8859-15 is not Latin1 (which is ISO-8859-1) but Latin9; it is
also known as "Western with Euro sign". A few of its characters in the
range 0xA0 to 0xBF differ from the corresponding ones of Latin1 (Latin9
Euro sign, French upper- and lower-case OE digraph, capital Y with
diaeresis, etc., which have no representation in Latin1).
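The differences between the two charsets fit in a small table: exactly eight byte positions in the 0xA0 to 0xBF range differ. The C sketch below (the function name `latin9_from_ucs` is hypothetical) maps a Unicode code point to its Latin9 byte and returns -1 for anything Latin9 cannot represent, which includes the eight Latin1 characters that were replaced as well as Greek, Cyrillic or CJK text.

```c
#include <stddef.h>

/* Map a Unicode code point to its ISO-8859-15 (Latin9) byte value,
   or return -1 if Latin9 has no representation for it. */
static int latin9_from_ucs(unsigned long cp) {
  /* The eight positions where Latin9 differs from Latin1. */
  static const struct { unsigned char byte; unsigned long cp; } diff[] = {
    {0xA4, 0x20AC},  /* EURO SIGN */
    {0xA6, 0x0160},  /* LATIN CAPITAL LETTER S WITH CARON */
    {0xA8, 0x0161},  /* LATIN SMALL LETTER S WITH CARON */
    {0xB4, 0x017D},  /* LATIN CAPITAL LETTER Z WITH CARON */
    {0xB8, 0x017E},  /* LATIN SMALL LETTER Z WITH CARON */
    {0xBC, 0x0152},  /* LATIN CAPITAL LIGATURE OE */
    {0xBD, 0x0153},  /* LATIN SMALL LIGATURE OE */
    {0xBE, 0x0178},  /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
  };
  size_t i;
  for (i = 0; i < sizeof diff / sizeof diff[0]; i++)
    if (diff[i].cp == cp) return diff[i].byte;
  if (cp < 0x100) {
    /* The Latin1 characters at the replaced positions are not in Latin9. */
    for (i = 0; i < sizeof diff / sizeof diff[0]; i++)
      if (diff[i].byte == cp) return -1;
    return (int)cp;  /* everywhere else Latin9 matches Latin1 */
  }
  return -1;
}
```

So the Euro sign maps to 0xA4, "é" keeps its Latin1 value 0xE9, and both the Latin1 currency sign U+00A4 and a Cyrillic letter such as U+0416 are unrepresentable.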

Here's what I use in my vimrc:

" set Unicode if possible
" First, check that the necessary capabilities are compiled-in
if has("multi_byte")
  " (optional) remember the locale set by the OS
  let g:locale_encoding = &encoding
  " if already Unicode, no need to change it
  " we assume that an encoding name is a Unicode one
  " iff its name starts with u or U
  if &encoding !~? '^u'
    " avoid clobbering the keyboard's charset
    if &termencoding == ""
      let &termencoding = &encoding
    endif
    " now we're ready to set UTF-8
    set encoding=utf-8
  endif
  " heuristics for use at file-open
  set fileencodings=ucs-bom,utf-8,latin1
  " optional: defaults for new files
  setglobal bomb fileencoding=utf-8
endif
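The option logic in the vimrc above can be modelled in a few lines of C to show why the order matters: 'termencoding' has to be filled from the old 'encoding' before 'encoding' is switched to utf-8, otherwise the keyboard's charset is lost. The struct and function names are hypothetical; this is a model of the Vim script, not anything Vim itself exposes.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical model of the two global options the vimrc touches. */
struct enc_opts {
  char enc[32];   /* 'encoding' */
  char tenc[32];  /* 'termencoding' ("" means "use 'encoding'") */
};

static void prefer_utf8(struct enc_opts *o) {
  if (tolower((unsigned char)o->enc[0]) != 'u') {  /* &encoding !~? '^u' */
    if (o->tenc[0] == '\0')                        /* &termencoding == "" */
      strcpy(o->tenc, o->enc);                     /* let &tenc = &enc */
    strcpy(o->enc, "utf-8");                       /* set encoding=utf-8 */
  }
}
```

Starting from the latin1 environment in the question (encoding=iso-8859-15, termencoding empty), this leaves encoding=utf-8 and termencoding=iso-8859-15; in a UTF-8 environment nothing changes, and an explicitly set termencoding is kept.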

Best regards,
Tony.
--
Man is the only animal that can remain on friendly terms with the
victims he intends to eat until he eats them.
-- Samuel Butler

Tony Mechelynck

Nov 27, 2008, 1:49:17 PM

to vim...@googlegroups.com, v...@vim.org

On 27/11/08 14:05, Felix von Leitner wrote:

[...]

> Now, I thought the obvious way to remedy this (the file is called
> "journal") is an autocommand:
>
> au BufReadPost journal set fenc=utf-8
>
> But this does not help at all. Why is termencoding not set?

The reason this doesn't work is that by the time the BufReadPost event
is triggered, the file has already been read. Setting 'fenc' there
_tells_ Vim how to translate from memory to disc; but the damage has
been done, it's too late. Anyway, if 'encoding' is set to iso-8859-15
and 'tenc' to empty you cannot read files in Russian, Greek, Chinese, or
indeed any file using one or more glyphs not representable in Latin9.
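Following this reasoning, a small C sketch shows what a late 'fenc' does to the data. With 'encoding' at iso-8859-15 and 'fileencoding' empty, the UTF-8 bytes of "é" (C3 A9) are stored unconverted and shown as two characters, Ã and ©. If 'fenc' is then set to utf-8, writing converts each of those in-memory characters to UTF-8, so the file ends up double-encoded. The sketch treats the buffer as Latin1 for simplicity (Latin9 differs only at eight positions, neither of which occurs here); the function name is hypothetical and this is not Vim's conversion code.

```c
#include <stddef.h>

/* Convert Latin1 bytes to UTF-8: every byte is the code point
   U+0000..U+00FF with the same value. out needs room for 2*n bytes.
   Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out) {
  size_t o = 0;
  for (size_t i = 0; i < n; i++) {
    unsigned char c = in[i];
    if (c < 0x80) {
      out[o++] = c;                                  /* ASCII: 1 byte */
    } else {
      out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
      out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* 10xxxxxx */
    }
  }
  return o;
}
```

Fed the two bytes C3 A9, it produces C3 83 C2 A9, the UTF-8 encoding of "Ã©" rather than of "é".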

Best regards,
Tony.
--
Romeo wasn't bilked in a day.
-- Walt Kelly, "Ten Ever-Lovin' Blue-Eyed Years With Pogo"

Felix von Leitner

Nov 27, 2008, 9:00:58 PM

to v...@vim.org

> > Now, I thought the obvious way to remedy this (the file is called
> > "journal") is an autocommand:
> >
> > au BufReadPost journal set fenc=utf-8
> >
> > But this does not help at all. Why is termencoding not set?
> The reason this doesn't work is that by the time the BufReadPost event
> is triggered, the file has already been read. Setting 'fenc' there
> _tells_ Vim how to translate from memory to disc; but the damage has
> been done, it's too late. Anyway, if 'encoding' is set to iso-8859-15
> and 'tenc' to empty you cannot read files in Russian, Greek, Chinese, or
> indeed any file using one or more glyphs not representable in Latin9.

For the archives: here's how I worked around the problem -- with the
following small C wrapper:

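#include <alloca.h>   /* alloca() is not ISO C; glibc declares it here */
#include <stdio.h>    /* perror() */
#include <stdlib.h>   /* getenv() */
#include <string.h>   /* memcpy(), strstr() */
#include <unistd.h>   /* execve() */

int main(int argc, char* argv[], char* envp[]) {
  char* c = getenv("LC_CTYPE");
  /* room for "vim", the +set argument, argv[1..argc-1] and the NULL */
  char** x = alloca((argc + 3) * sizeof(char*));
  /* copies argv[1] .. argv[argc]; argv[argc] is the terminating NULL */
  memcpy(x + 2, argv + 1, argc * sizeof(char*));
  x[0] = "vim";
  /* pick 'termencoding' from the locale name */
  if (c && strstr(c, "UTF-8"))
    x[1] = "+set tenc=utf-8";
  else
    x[1] = "+set tenc=iso-8859-15";
  execve("/usr/bin/vim", x, envp);
  perror("execve");  /* only reached if execve() failed */
  return 1;
}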

This is not a general solution, but it works for me because my latin1 locale is called
de_DE and my UTF-8 locale is called de_DE.UTF-8.

The more general way would be to do

/* needs #include <locale.h> and <langinfo.h> */
setlocale(LC_CTYPE, "");
if (!strcmp(nl_langinfo(CODESET), "UTF-8"))
  x[1] = "+set tenc=utf-8";
else
  x[1] = "+set tenc=iso-8859-15";

I'm not doing that because setlocale does way more than query the locale
and I want the wrapper to be as small and fast as possible so I don't
notice it.
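The "more general way" above can be packaged as a self-contained C sketch. Splitting the pure decision (`tenc_for_codeset`) from the locale query (`tenc_from_locale`) makes the decision easy to test; both names are hypothetical. Like the wrapper, it only distinguishes UTF-8 from iso-8859-15, so any other 8-bit locale would be mislabelled.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* The wrapper's decision: a UTF-8 codeset gives "utf-8", anything else
   is assumed to be the Latin9 environment. */
static const char *tenc_for_codeset(const char *codeset) {
  return (codeset && strcmp(codeset, "UTF-8") == 0) ? "utf-8"
                                                    : "iso-8859-15";
}

/* Adopt the LC_CTYPE locale from the environment (LC_ALL, LC_CTYPE,
   LANG) and ask the C library for its codeset. If setlocale() fails,
   the previous locale (normally "C") stays in effect. */
static const char *tenc_from_locale(void) {
  setlocale(LC_CTYPE, "");
  return tenc_for_codeset(nl_langinfo(CODESET));
}
```

A wrapper built on this would pass the result in a "+set tenc=..." argument just as the code above does.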

In my .vimrc I have this line:

au BufReadPost journal set fenc=utf-8 enc=utf-8

I am mentioning this here because a) it solves my problem, b) setting
fenc has the opposite behavior of what you predicted, c) using a wrapper
works although you predicted it wouldn't, and d) someone else might run
into the same problem.

I still maintain vim should always set termencoding to the encoding of the
terminal. In my case either utf-8 or iso-8859-15. Having to set this
manually is just bogus. I don't even think there should be an option to
set this. That's what the locale functions in the libc are for. OTOH I
hear the MacOS locale implementation is sufficiently broken that it
might be needed.

Please note that the documentation also disagrees with your assessment
of whether you can set fenc in a BufReadPost; :help fencs actually
gives an example of how to do it.

Also note that setting fileencodings as you recommended also fails.
vim recognizes the file as utf-8 then, and sets enc and fenc to utf-8,
but since it does not set tenc to latin1 if I run vim in a latin1
terminal, the contents are displayed incorrectly.

As I said, it now works for me. Consider this a bug report in case you
do want to fix the issue for other people who need to edit utf-8 files
in an 8-bit environment.

Felix

Matt Wozniski

Nov 27, 2008, 9:29:34 PM

to vim...@googlegroups.com

On Thu, Nov 27, 2008 at 9:00 PM, Felix von Leitner wrote:
>
>> > Now, I thought the obvious way to remedy this (the file is called
>> > "journal") is an autocommand:
>> >
>> > au BufReadPost journal set fenc=utf-8
>> >
>> > But this does not help at all. Why is termencoding not set?
>> The reason this doesn't work is that by the time the BufReadPost event
>> is triggered, the file has already been read. Setting 'fenc' there
>> _tells_ Vim how to translate from memory to disc; but the damage has
>> been done, it's too late. Anyway, if 'encoding' is set to iso-8859-15
>> and 'tenc' to empty you cannot read files in Russian, Greek, Chinese, or
>> indeed any file using one or more glyphs not representable in Latin9.
>
> For the archives: here's how I worked around the problem -- with the
> following small C wrapper:

Is there a reason you found this better than just putting the normal
configuration in your ~/.vimrc?

> As I said, it now works for me. Consider this a bug report in case you
> do want to fix the issue for other people who need to edit utf-8 files
> in an 8-bit environment.

I don't understand what bug you're describing... If you
want to use utf-8 files in an 8-bit environment, you just need to set
'enc' to utf-8 and set 'tenc' to your 8-bit encoding, i.e.:

if has('multi_byte')
  if empty(&tenc)
    let &tenc = &enc
  endif
  set enc=utf-8
endif

~Matt

Yongwei Wu

Nov 29, 2008, 4:51:05 AM

to vim...@googlegroups.com

2008/11/28 Felix von Leitner <feli...@fefe.de>:

>
>> > Now, I thought the obvious way to remedy this (the file is called
>> > "journal") is an autocommand:
>> >
>> > au BufReadPost journal set fenc=utf-8

This should work for your purpose:

function! SetFileEncodings(encodings)
  let b:my_fileencodings_bak=&fileencodings
  let &fileencodings=a:encodings
endfunction

function! RestoreFileEncodings()
  let &fileencodings=b:my_fileencodings_bak
  unlet b:my_fileencodings_bak
endfunction

au BufReadPre journal call SetFileEncodings('utf-8')
au BufReadPost journal call RestoreFileEncodings()

> In my .vimrc I have this line:
>
> au BufReadPost journal set fenc=utf-8 enc=utf-8

This breaks your opened buffers, since enc is global.

> I am mentioning this here because a) it solves my problem, b) setting
> fenc has the opposite behavior of what you predicted, c) using a wrapper
> works although you predicted it wouldn't, and d) someone else might run
> into the same problem.

If it solved your problem, that was probably the combined
side effect of your actions.

> Please note that the documentation also disagrees with your assessment
> of whether you can set fenc in a BufReadPost, :help fencs actually
> gives an example how to do it.

I believe you still misunderstand. Setting fenc in BufReadPost is
basically the same as setting it after you have loaded the file, i.e.
it changes the file encoding used the next time the file is saved.

> Also note that setting fileencodings as you recommended also fails.
> vim recognizes the file as utf-8 then, and sets enc and fenc to utf-8,

Vim would not set enc automatically as you described. enc is set only
by your configuration or the environment, not by the file being read.

> but since it does not set tenc to latin1 if I run vim in a latin1
> terminal, the contents is displayed incorrectly.

Probably because there is no safe way for vim to detect the terminal
encoding.

> As I said, it now works for me. Consider this a bug report in case you
> do want to fix the issue for other people who need to edit utf-8 files
> in an 8-bit environment.

You did not prove that this is a vim bug, or that there is a sure way
to fix the problem on the side of vim.

Best regards,

Yongwei

--
Wu Yongwei
URL: http://wyw.dcweb.cn/
