Changing encoding of an already loaded buffer

A. Wik

Dec 7, 2020, 11:40:28 AM

to vim...@googlegroups.com

Hi all,

I sometimes need to change the encoding used for a file. I have the
default set to latin1 except for files with a ucs-bom. However, when
I load a file encoded in UTF-8 or CP-437 the default is wrong. What I
normally do then is ":set fencs=utf8" and ":vi" to reload the file.

However, what can I do about a file that cannot be reloaded? Eg:

$ man llseek | gvim -f -

To work around it, I have to do this:

$ man llseek > llseek.man
$ gvim llseek.man

Is there another way?

Regards,
Albert.

Gabriele F

Dec 7, 2020, 3:49:13 PM

to vim...@googlegroups.com

The actual "correct" way to "change" the encoding of a buffer is, I
believe, with the "++enc" option, added either to :e (e.g. `:e
++enc=utf8`) or several similar commands such as indeed :vi (`:vi
++enc=utf8`).

However I couldn't find a way to make it work with a file-less buffer,
such as your pipe example:

If I use `:e! ++enc=utf8` I'm given an «E32: No file name» error.

I thought of passing "%" or "#n" as the filename for :e (`:e ++enc=utf8
%`), but it doesn't work, I'm given a «E499: Empty file name for '%' or
'#', only works with ":p:h"» error (and indeed the `:h _%` stuff is
described as standing for "file names", not for the actual buffers).

Then I tried adding a filename, with `:file whatever`, but once that's
done :e! loads a new empty buffer named "whatever"...

So there doesn't seem to be a way to really reload (possibly with
different encoding options) the current buffer, only to reload the file
from which the current buffer was loaded, and so for file-less buffers
no way at all.

However under Linux and other systems there may well be a way to access
the buffer's file's descriptor (/dev/fd/0 ?), so it might work by
passing that as the filename.

And there's probably some other way by copying the text around.

By the way, apparently this also means that you can't even set the
encoding of a pipe that you haven't yet created, from the shell, since
to the best of my knowledge the only way to set the encoding of a file
from the shell, before opening it, is `vim +":e ++enc=<encoding>
<filename>"` (which actually means to open it from inside vim). But
maybe you can with some more intricate command.

I'm far from being a Vim expert, however; I might well be missing something
(or a lot).

And encoding stuff is in general quite a mess in Vim, I'll grumble about
it one time or another... :/

Cheers

Gabriele F

Dec 7, 2020, 3:55:11 PM

to vim...@googlegroups.com

Ah yes, I had also tried passing "-" as a filename for the reload
attempts, nope, it was interpreted as an actual "-" file name...

Tony Mechelynck

Dec 7, 2020, 8:45:27 PM

to awi...@gmail.com, vim_use

If you find out after loading the stdin that it was opened in the
wrong encoding, then it's too late; but if you know the file's
encoding in advance, there should be a way, especially if your
'encoding' (the charset used internally by Vim) is UTF-8 and if your
Vim is compiled with +iconv.

To be able to detect Latin1 and UTF-8 (and UTF-16 with BOM) automagically, add
set fileencodings=ucs-bom,utf-8,latin1
somewhere in your vimrc (the s at the end of fileencodings is
important); but this isn't enough for files in cp437, especially if
Vim gets them on stdin. For those, load them with (untested)
someprogram | view ++enc=cp437 -
(the minus sign at the end is important) which means that you have to
know the file's encoding before starting Vim if it is other than UTF-8
or Latin1. Using "view" instead of "vim" on the command-line avoids
problems with the 'modified' flag; for ++enc see ":help ++enc".

The above will detect files in 7-bit us-ascii encoding as utf-8 rather
than Latin1. This is not a bug, because the 128 characters which are
valid in us-ascii are represented identically in all three encodings:
us-ascii, Latin1 and UTF-8.

Best regards,
Tony.
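
The claim above, that 7-bit ASCII text is byte-for-byte identical in
us-ascii, Latin1 and UTF-8, can be checked from the shell. This is only a
minimal sketch, assuming an iconv that knows the names ASCII, ISO-8859-1
and UTF-8; the function name same_bytes_in is made up for illustration.

```shell
# Succeeds if converting TEXT from the encoding SRC to UTF-8 leaves
# its bytes unchanged (the two hex dumps are compared as strings).
same_bytes_in() {
    src=$1
    text=$2
    before=$(printf '%s' "$text" | od -An -v -tx1 | tr -d ' \n')
    after=$(printf '%s' "$text" | iconv -f "$src" -t UTF-8 | od -An -v -tx1 | tr -d ' \n')
    [ "$before" = "$after" ]
}

same_bytes_in ISO-8859-1 'plain ASCII text' && echo "identical"
```

As soon as the text contains a byte above 0x7F the check fails, which is
why the order of the entries in 'fileencodings' only matters for
non-ASCII files.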

A. Wik

Dec 8, 2020, 4:48:08 AM

to Tony Mechelynck, vim_use

Hi all,

I tried a few things:

(1) gvim -f ++enc=utf8 -
result: "E492: Not an editor command: +enc=utf8"
(2) gvim -f +enc=utf8 -
result: see (1)
(3) gvim -f +"set fenc=utf8" -
result: no error message; sets fenc to "utf-8", but file is loaded as
if with latin1.
(4) gvim -f -c "set fenc=utf8" -
result: see (3)
(5) gvim -f --cmd "set fenc=utf8" -
result: no error message; fenc remains "latin1"

A different approach:
(6) (man llseek ; echo 'vim:fenc=utf8:') | gvim -f -
result: no error message; fenc gets set to "utf-8"; file is loaded as
if with latin1

See also below:

On Tue, 8 Dec 2020 at 01:45, Tony Mechelynck
<antoine.m...@gmail.com> wrote:
>
> If you find out after loading the stdin that it was opened in the
> wrong encoding, then it's too late; but if you know the file's
> encoding in advance, there should be a way, especially if your
> 'encoding' (the charset used internally by Vim) is UTF-8 and if your
> Vim is compiled with +iconv.

Both conditions hold true.

> To be able to detect Latin1 and UTF-8 (and UTF-16 with BOM) automagically, add
> set fileencodings=ucs-bom,utf-8,latin1

I tried that months ago. The result was that new files were assumed
to have fenc=utf-8, for reasons you mention below. This is not
acceptable, so I use "fileencodings=ucs-bom,latin1,cp437" (yes, I know
the trailing ",cp437" is pointless).

> somewhere in your vimrc (the s at the end of fileencodings is
> important); but this isn't enough for files in cp437, especially if
> Vim gets them on stdin. For those, load them with (untested)
> someprogram | view ++enc=cp437 -

I tested it; see top of message.

> The above will detect files in 7-bit us-ascii encoding as utf-8 rather
> than Latin1. This is not a bug, because the 128 characters which are
> valid in us-ascii are represented identically in all three encodings:
> us-ascii, Latin1 and UTF-8.

Right!

Cheers,
Albert.

A. Wik

Dec 8, 2020, 5:00:39 AM

to vim_use

On Mon, 7 Dec 2020 at 20:49, Gabriele F <gb...@tiscali.it> wrote:
>
> The actual "correct" way to "change" the encoding of a buffer is, I
> believe, with the "++enc" option, added either to :e (e.g. `:e
> ++enc=utf8`) or several similar commands such as indeed :vi (`:vi
> ++enc=utf8`).

Thanks, I didn't know about that. It's more convenient than changing
the "fileencodings".

> However I couldn't find a way to make it work with a file-less buffer,
> such as your pipe example:

Right. The only way I've found is to use a temporary file.
Incidentally, the zsh shell makes that easy:
% gvim -f =(man llseek)

Regards,
Albert.
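
For shells without zsh's =(...) substitution, the temporary-file
workaround can be wrapped in a small function. This is a rough sketch
only: the function name edit_output_of and the EDITOR_CMD variable are
made up for illustration, and mktemp is assumed to be available.

```shell
# Run a command, save its output in a temporary file, and open that
# file in an editor, so the buffer has a real file behind it and can
# be reloaded with ":e ++enc=...".
# EDITOR_CMD is a hypothetical variable for this sketch; it defaults
# to "gvim -f", the command used in the thread.
edit_output_of() {
    tmp=$(mktemp) || return 1
    "$@" > "$tmp"
    # Deliberately unquoted so that "gvim -f" splits into command + flag.
    ${EDITOR_CMD:-gvim -f} "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage: edit_output_of man llseek
```

The -f flag matters here: without it gvim returns immediately and the
temporary file would be removed while it is still being edited.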

Bram Moolenaar

Dec 8, 2020, 7:55:45 AM

to vim...@googlegroups.com, A. Wik

Assuming that loading the text as latin1 didn't mess it up (since it's
an 8 bit encoding it should be OK), then you can convert it to utf-8
with:
:set fencs=utf-8,latin1
:%!iconv -f latin1 -t utf-8

Vim might recognize the utf-8 encoding; if not, set 'fenc':
:set fenc=utf8

Hopefully that works.

--
You can be stopped by the police for biking over 65 miles per hour.
You are not allowed to walk across a street on your hands.
[real standing laws in Connecticut, United States of America]

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

A. Wik

Dec 8, 2020, 8:59:04 AM

to Bram Moolenaar, vim_use

On Tue, 8 Dec 2020 at 12:55, Bram Moolenaar <Br...@moolenaar.net> wrote:

>
>
> Albert Wik wrote:
> >
> > Right. The only way I've found is to use a temporary file.
> > Incidentally, the zsh shell makes that easy:
> > % gvim -f =(man llseek)
>
> Assuming that loading the text as latin1 didn't mess it up (since it's
> an 8 bit encoding it should be OK), then you can convert it to utf-8
> with:
> :set fencs=utf-8,latin1
> :%!iconv -f latin1 -t utf-8
>
> Vim might recognize the utf-8 encoding; if not, set 'fenc':
> :set fenc=utf8
>
> Hopefully that works.

Thanks a lot for the "%!"-idea! That's what I needed.

This works:
:set fencs=utf8
:%!cat
although "fenc" remains "latin1".

It is not appropriate to use "iconv -f latin1 -t utf8" (that does in
fact corrupt the data!) because the data is already in UTF-8, and that
is why it is not displayed properly in Vim (because Vim thinks it is
in Latin-1); in particular, the short dash character is shown as
"â<80><90>". When it is displayed properly, a "‐" is shown; putting
the cursor at it and doing "ga" reports that this is character number
0x2010.

Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
not going to change the "fenc" accordingly?

Cheers,
Albert.
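
The corruption described above can be reproduced outside Vim. This is a
minimal sketch, assuming iconv and od are available; the helper name hex
is made up for illustration. It feeds the three UTF-8 bytes of U+2010
through the same Latin-1 to UTF-8 conversion.

```shell
# Print stdin as space-separated hex bytes on a single line.
hex() {
    echo $(od -An -v -tx1)
}

# U+2010 (HYPHEN) encoded as UTF-8 is the byte sequence E2 80 90,
# written in octal here so that any POSIX printf accepts it.
printf '\342\200\220' | hex
# prints: e2 80 90

# Treating those bytes as Latin-1 and converting them to UTF-8
# re-encodes each byte separately, doubling the sequence.
printf '\342\200\220' | iconv -f ISO-8859-1 -t UTF-8 | hex
# prints: c3 a2 c2 80 c2 90
```

The first byte, E2, is "â" in Latin-1, which matches the "â<80><90>"
that Vim displays when it reads the UTF-8 data as Latin-1.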

Bram Moolenaar

Dec 8, 2020, 11:47:46 AM

to vim...@googlegroups.com, A. Wik

Albert Wik wrote:

> > > Right. The only way I've found is to use a temporary file.
> > > Incidentally, the zsh shell makes that easy:
> > > % gvim -f =(man llseek)
> >
> > Assuming that loading the text as latin1 didn't mess it up (since it's
> > an 8 bit encoding it should be OK), then you can convert it to utf-8
> > with:
> > :set fencs=utf-8,latin1
> > :%!iconv -f latin1 -t utf-8
> >
> > Vim might recognize the utf-8 encoding; if not, set 'fenc':
> > :set fenc=utf8
> >
> > Hopefully that works.
>
> Thanks a lot for the "%!"-idea! That's what I needed.
>
> This works:
> :set fencs=utf8
> :%!cat
> although "fenc" remains "latin1".

Yeah, for an existing buffer and filtering the first entry in 'fencs' is
used to read the filter output, but 'fenc' isn't set. That's a bit
strange, but I'm not sure what would break if we change this. It might
actually be good to fix this, since if you write that file it might get
messed up.

> It is not appropriate to use "iconv -f latin1 -t utf8" (that does in
> fact corrupt the data!) because the data is already in UTF-8, and that
> is why it is not displayed properly in Vim (because Vim thinks it is
> in Latin-1); in particular, the short dash character is shown as
> "â<80><90>". When it is displayed properly, a "‐" is shown; putting
> the cursor at it and doing "ga" reports that this is character number
> 0x2010.
>
> Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
> not going to change the "fenc" accordingly?

When reading a file (or filter output) the values in 'fencs' are tried
one by one. Normally when something fails then the next one is tried,
but since reading filter output from a pipe doesn't allow for a retry,
it will always use the first one.

The real problem is that 'fencs' was set to "latin1" at first, thus Vim
didn't even try to use another encoding. Perhaps it also works if you
do that on the command line:
somecommand | vim - -c 'set fencs=utf8,latin1'

Didn't try it. Should at least work if you set 'fencs' in your .vimrc.

--
If an elephant is left tied to a parking meter, the parking fee has to be paid
just as it would for a vehicle.
[real standing law in Florida, United States of America]
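
The try-in-order behaviour of 'fencs' described in the message above can
be approximated with iconv. This is an illustration of the idea only, not
Vim's actual implementation; the function name first_valid_encoding is
made up, and iconv is assumed to reject invalid input with a non-zero
exit status.

```shell
# Print the first encoding from the list in which FILE decodes without
# error, trying the candidates in order (loosely like 'fileencodings').
first_valid_encoding() {
    file=$1
    shift
    for enc in "$@"; do
        if iconv -f "$enc" -t UTF-8 "$file" > /dev/null 2>&1; then
            printf '%s\n' "$enc"
            return 0
        fi
    done
    return 1
}

sample=$(mktemp)
printf 'caf\303\251\n' > "$sample"    # "café" encoded as UTF-8

first_valid_encoding "$sample" UTF-8 ISO-8859-1    # prints UTF-8
first_valid_encoding "$sample" ISO-8859-1 UTF-8    # prints ISO-8859-1
rm -f "$sample"
```

The second call shows why putting latin1 first hides the problem: every
byte sequence is valid in an 8-bit encoding, so the later entries are
never reached.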

A. Wik

Dec 9, 2020, 12:47:45 PM

to Bram Moolenaar, vim_use

On Tue, 8 Dec 2020 at 16:47, Bram Moolenaar <Br...@moolenaar.net> wrote:
>
>
> Albert Wik wrote:
> >

> > Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
> > not going to change the "fenc" accordingly?
>
> When reading a file (or filter output) the values in 'fencs' are tried
> one by one. Normally when something fails then the next one is tried,
> but since reading filter output from a pipe doesn't allow for a retry,
> it will always use the first one.

Thanks, that is useful to know.

> The real problem is that 'fencs' was set to "latin1" at first, thus Vim
> didn't even try to use another encoding. Perhaps it also works if you
> do that on the command line:
> somecommand | vim - -c 'set fencs=utf8,latin1'

No, because (according to --help) the command is run after loading the
first file. Meanwhile, "--cmd <command>" does not work because it
runs the command before sourcing any vimrc file, and so, the new fencs
setting gets overwritten by the vimrc. It would be useful to have an
option to run a command just *before* loading the first file but after
any rc-files.

I don't include utf8 in my default fencs setting because that has the
side effect of using utf8 for any newly created files.

-aw

Gabriele F

Dec 9, 2020, 2:35:54 PM

to vim...@googlegroups.com

On 08/12/2020 14.58, A. Wik wrote:
> Thanks a lot for the "%!"-idea! That's what I needed.
>
> This works:
> :set fencs=utf8
> :%!cat

That :%!cat is indeed a neat (if hacky) idea!

Gabriele F

Dec 9, 2020, 2:35:54 PM

to vim...@googlegroups.com

On 08/12/2020 10.47, A. Wik wrote:
> Hi all,
>
> I tried a few things:
>
> (1) gvim -f ++enc=utf8 -
> result: "E492: Not an editor command: +enc=utf8"
> (2) gvim -f +enc=utf8 -
> result: see (1)
> (3) gvim -f +"set fenc=utf8" -
> result: no error message; sets fenc to "utf-8", but file is loaded as
> if with latin1.
> (4) gvim -f -c "set fenc=utf8" -
> result: see (3)
> (5) gvim -f --cmd "set fenc=utf8" -
> result: no error message; fenc remains "latin1"

Yes, I tried stuff like that while perusing the manual a hundred times,
it can't work and that's also kind of declared in some points of the
documentation; :h fenc is a jungle, and I seem to remember that it's
also not completely correct. Basically 'fenc' is only looked at when
writing a file, and who knows what the output of that write will be.

So essentially, besides 'fencs', the ++enc "opt" (which **has nothing to
do with the 'enc' option!!!**) is the only thing that can have an effect
when reading a file, and after it's read you'd better forget about fixing
its encoding.

The only way forward in my opinion would be to deprecate 'enc', 'fenc',
++enc and probably 'fencs', giving warnings when they do get used, and
introduce completely different options and commands.

Gabriele F

Dec 9, 2020, 2:35:56 PM

to vim...@googlegroups.com

On 08/12/2020 17.47, Bram Moolenaar wrote:
>> This works:
>> :set fencs=utf8
>> :%!cat
>> although "fenc" remains "latin1".
> Yeah, for an existing buffer and filtering the first entry in 'fencs' is
> used to read the filter output, but 'fenc' isn't set. That's a bit
> strange, but I'm not sure what would break if we change this. It might
> actually be good to fix this, since if you write that file it might get
> messed up.

I performed a couple of tests trying to write the result to a file after
doing the above (using a correct UTF-8 file as source):
- if you leave fenc to latin1 the new file will be in latin1 (with all
the characters correctly encoded)
- if you set fenc to utf8 *after* the %!cat (but of course before
writing the file) the new file will be in UTF-8 with all the characters
correctly encoded
- if you set fenc to utf8 *before* the %!cat (and of course before
writing the file) the new file will be... a mess: by all appearances Vim
thinks that the individual bytes of the UTF-8 file are individual latin1
characters, and it then converts them to UTF-8; so you'll get a UTF-8
encoded file with the wrong characters, e.g. a "C3 B2" sequence in the
original file, which stands for a UTF-8 encoded "ò", (Unicode code point
F2) will become a "C3 83 C2 B2" sequence in the written file: "C3" is a
"Ã" in latin1 (and yes, in Unicode too), and "Ã" is encoded as "C3 83"
in UTF-8, "B2" is a "²" in latin1 (and Unicode) and "²" is encoded as
"C2 B2" in UTF-8 (in case someone noticed it, don't let yourself get
confused by the fact that C3 and B2 occur both in the source and the
translated sequence, that's largely just an unfortunate coincidence of
my example).

Given that Unicode is identical to latin1 in the first 256 characters,
to better confirm what happened I also tried using another charset
(cp850) instead of latin1 in the above tests (fencs=cp850 in my vimrc
and setting fenc=cp850 in the second and third tests), still using a
correct UTF-8 file as a source; the results are analogous, with a
correct cp850 file in the first test, a correct UTF-8 one in the second
and a UTF-8 one with the original file's bytes interpreted as cp850 and
then converted to UTF-8 in the third (the original "ò", "C3 B2", becomes
a "E2 94 9C E2 96 93" sequence, given that "C3" is a "├" symbol in
cp850, Unicode code point 251C -> "E2 94 9C" UTF-8, and "B2" is a "▓",
Unicode code point 2593 -> "E2 96 93" UTF-8).

Yes, I... ahem, had a lot of fun this afternoon :D

Cheers
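
The byte sequences reported in the third test of each series can be
reproduced without Vim. This is a minimal sketch, assuming an iconv that
knows the names ISO-8859-1 and CP850; the helper name hex is made up for
illustration.

```shell
# Print stdin as space-separated hex bytes on a single line.
hex() {
    echo $(od -An -v -tx1)
}

# "ò" (U+00F2) encoded as UTF-8 is C3 B2, written in octal here.
# Misreading those two bytes as latin1 and converting to UTF-8:
printf '\303\262' | iconv -f ISO-8859-1 -t UTF-8 | hex
# prints: c3 83 c2 b2

# Misreading the same two bytes as cp850 and converting to UTF-8:
printf '\303\262' | iconv -f CP850 -t UTF-8 | hex
# prints: e2 94 9c e2 96 93
```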

Gabriele F

Dec 9, 2020, 3:20:30 PM

to vim...@googlegroups.com

On 09/12/2020 18.47, A. Wik wrote:
> I don't include utf8 in my default fencs setting because that has the
> side effect of using utf8 for any newly created files.

Completely off-topic, if you don't have particular needs I'd advise you
to use UTF-8 with BOMs for all your new files ('set bomb', 'set
encoding=utf-8' and 'fenc' left to the default in your vimrc), it will
prevent any future encoding problem for at least them.

I've been doing so for more than a decade and pretty much never had
problems, and sigh with relief every time I see I'm working with one of them.

I heard many protest the BOMs in UTF-8, but they are the first thing
ever to allow a reliable encoding detection and they solve a lot more
problems than they can cause (if they cause problems they usually do so
immediately and noticeably, much better than discovering years later
that you irremediably botched the encoding of some file). So I find it
absurd to disparage them, and delusive to think that we'll ever get to a
point when non-utf8 files will be rare enough that we won't need to
handle them.
I imagine most of the critics are from countries that never needed more
than ASCII.

Tony Mechelynck

Dec 9, 2020, 11:58:41 PM

to vim_use

IIUC the critics are people who do a lot of programming, either
in C (where sources are supposed to be in Latin1; they may be in UTF-8
if characters above U+007F are used only in alphanumeric literals, but
they cannot start with a BOM) or in Perl, Python, Unix shell script
language, etc. (where the first two bytes of a source file must be #!
in that order):

The problem with ":setg fenc=utf8 bomb" is that *every* new text file
will start with 0xEF 0xBB 0xBF unless you explicitly turn it off for
that file by means of ":setl nobomb" or ":setl fenc=latin1" or similar
before writing it. For C sources this will confuse the compiler
(generating an error and preventing successful compilation) and for
anything starting with a shebang (shell scripts, perl sources, etc.)
it will prevent the #! shebang leader from being recognized. OTOH for
"well-behaved" filetypes like Vim scripts (if not run by means of a
shebang), HTML pages, CSS style sheets, etc., there is no problem. So
whether or not to set it should depend on what types of files you
write most often. I use it because most of the files I write are HTML
or CSS, followed by Vim scripts; but then when I write a shell script
I have to remember to turn the 'bomb' setting off for that file.

Best regards,
Tony.
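
Whether a given file starts with the BOM, and therefore no longer starts
with "#!", can be checked from the shell. This is only a sketch: the
function names has_utf8_bom and strip_utf8_bom are made up for
illustration, and head -c is assumed to be supported.

```shell
# Succeeds if FILE starts with the UTF-8 byte order mark EF BB BF.
has_utf8_bom() {
    [ "$(head -c 3 "$1" | od -An -v -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Write FILE to stdout without its leading BOM, if it has one.
strip_utf8_bom() {
    if has_utf8_bom "$1"; then
        tail -c +4 "$1"
    else
        cat "$1"
    fi
}

script=$(mktemp)
printf '\357\273\277#!/bin/sh\necho hi\n' > "$script"
has_utf8_bom "$script" && echo "BOM present: the file does not start with #!"
strip_utf8_bom "$script" | head -n 1    # prints #!/bin/sh
rm -f "$script"
```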

A. Wik

Dec 10, 2020, 8:04:27 AM

to vim_use

On Wed, 9 Dec 2020 at 20:20, Gabriele F <gb...@tiscali.it> wrote:
>
> On 09/12/2020 18.47, A. Wik wrote:
> > I don't include utf8 in my default fencs setting because that has the
> > side effect of using utf8 for any newly created files.
>

> Completely off-topic, if you don't have particular needs ...

I just like to keep things "8-bit clean". As long as all tools used
to process the files are also 8-bit clean, nothing gets corrupted.
Alas, it does mean files are sometimes displayed incorrectly. But in
my experience, it gets messy when I introduce UTF-8.

> I imagine most of the critics are from countries that never needed more
> than ASCII

There is something to it. People who use only ASCII seem to like
UTF-8 better than those who frequently use non-English characters.
I've seen claims that UTF-8 is "compact" but compared to strictly
8-bit character sets like Latin-1 it is not.

-aw

Tony Mechelynck

Dec 10, 2020, 8:23:39 AM

to vim_use

- For pure 7-bit ASCII, all three of us-ascii, Latin1 and UTF-8 are
equivalent, they represent the data identically.
- For "Western Latin" (French, Spanish, etc.) Latin1 is slightly more
economical than UTF-8. How much more depends on the percent abundance
of accented letters not found in ASCII.
- When mixing several scripts (at least two of Latin, Greek, Cyrillic,
Hebrew, Arabic, CJK ideographic, etc.) within a single document, I
know no better encoding than UTF-8. In an 8-bit charset like Latin1
you have only (at most) 256 different valid character values, and that
is much too few as soon as you start mixing scripts: be it for a
juxtalinear edition of the Bible (with the original Hebrew, Aramaic or
Greek text next to a translation and/or commentary) or for a
Greek-Russian or Russian-Finnish dictionary. And of course even for a
single CJK script, no 8-bit script can do the job.

Best regards,
Tony.
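
The size difference for "Western Latin" text is easy to measure. This is
a small sketch, assuming iconv is available; the helper name byte_count
is made up for illustration.

```shell
# Print the number of bytes read from stdin.
byte_count() {
    wc -c | tr -d ' '
}

# "é" is the single byte E9 in Latin1 (octal 351) ...
printf '\351' | byte_count                                  # prints 1
# ... and the two bytes C3 A9 once converted to UTF-8.
printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | byte_count   # prints 2
# Pure ASCII text has the same length in both encodings.
printf 'abc' | iconv -f ISO-8859-1 -t UTF-8 | byte_count    # prints 3
```

So the overhead of UTF-8 over Latin1 is one extra byte per accented
letter, and nothing for the ASCII letters around it.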

Gabriele F

Dec 10, 2020, 12:09:00 PM

to vim...@googlegroups.com

I should add that those tests were all made with 'encoding' set in my
vimrc to utf-8, I haven't tried with the default latin1 or other values.
I don't know if this influenced anything.

That's the setting that A. Wik said to have as well, anyway.

Gabriele F

Dec 10, 2020, 12:09:18 PM

to vim...@googlegroups.com

On 09/12/2020 20.35, Gabriele F wrote:
> That :%!cat is indeed a neat (if hacky) idea!

It should be noted that it works only as long as the 'shelltemp' option
is on though, which is the default.

'shelltemp' makes Vim use a temporary file for the filtering instead of
a pipe, which is evidently the (probably accidental) cause of the
effects on the encoding.

Boyko Bantchev

Dec 10, 2020, 12:18:45 PM

to vim...@googlegroups.com

On Thu, 10 Dec 2020 at 15:04, A. Wik <awi...@gmail.com> wrote:
>
> On Wed, 9 Dec 2020 at 20:20, Gabriele F <gb...@tiscali.it> wrote:

> ..............

> > I imagine most of the critics are from countries that never needed more
> > than ASCII
>
> There is something to it. People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.

To people who use only ASCII the distinction between ASCII and
UTF-8 is totally irrelevant, because in their case UTF-8 is precisely ASCII
by definition.

But people like me, who regularly use scripts other than Latin, and who
also like to indulge themselves with mathematical and other ‘special’
characters in plain text – they are those who really appreciate and
praise the advent of Unicode and UTF-8.

Gabriele F

Dec 10, 2020, 12:22:38 PM

to vim...@googlegroups.com

On 09/12/2020 21.19, Gabriele F wrote:
> Completely off-topic, if you don't have particular needs I'd advise
> you to use UTF-8 with BOMs for all your new files ('set bomb', 'set
> encoding=utf-8' and 'fenc' left to the default in your vimrc), it will
> prevent any future encoding problem for at least them.
>
> I've been doing so for more than a decade and pretty much never had
> problems, and sigh a relief every time I see I'm working with one of
> them.

I should have specified that in that time I used mostly other text
editors, and on Windows; I've been using Vim only for a few years and I
still use other editors more frequently.
Although I do have a "set bomb" in my vimrc, I have less experience with
it in Vim, and still am on Windows most of the time.

Gabriele F

Dec 10, 2020, 12:35:11 PM

to vim...@googlegroups.com

On 10/12/2020 5.58, Tony Mechelynck wrote:
> The problem with ":setg fenc=utf8 bomb" is that *every* new text file
> will start with 0xEF 0xBB 0xBF unless you explicitly turn it off for
> that file by means of ":setl nobomb" or ":setl fenc=latin1" or similar
> before writing it.

That's the point, indeed

> For C sources this will confuse the compiler
> (generating an error and preventing successful compilation) and for
> anything starting with a shebang (shell scripts, perl sources, etc.)
> it will prevent the #! shebang leader from being recognized. OTOH for

It's true, it depends on what you most do in the editor, if you need to
frequently create files that cannot have a BOM in them, it's most likely
inconvenient. Maybe use more than one editor, or aliases with different
configurations...?

I indeed personally use text editors mostly for normal textual or web
files, use mostly IDEs for programming, rarely edit shell scripts, and
it actually may well be that I usually left bomb disabled when using
unices...

Anyway, for textual files or filetypes that do support the BOM, I
believe it's more beneficial to include it, and that it should not be
discouraged.

Gabriele F

Dec 10, 2020, 3:00:48 PM

to vim...@googlegroups.com

On 10/12/2020 14.04, A. Wik wrote:
> I just like to keep things "8-bit clean". As long as all tools used
> to process the files are also 8-bit clean, nothing gets corrupted.
> Alas, it does mean files are sometimes displayed incorrectly. But in
> my experience, it gets messy when I introduce UTF-8.

Ok, my experience instead is that a lot of tools do mess up the
encodings and it's hard to promptly recognize those mess-ups when not
using a UTF encoding. I guess it comes down to one's usual tools, needs
and habits.

> There is something to it. People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.

Maybe that was in the first years of UTF-8; by now several tests have shown
that UTF-8 is fairly efficient even for Asian languages, so I think it's
generally well accepted and the controversy is just about the BOM.
Anyway I don't think anyone who needs non-English characters has ever
favoured any old non-Unicode encoding; Unicode is bliss precisely for
them.
