Setting file encoding in a modeline


A. Wik

Dec 10, 2020, 8:37:27 AM
to vim_use
It seems to me that a modeline would be a convenient place to set the
encoding used for a file. However, while it does set 'fenc'
accordingly, the file is not loaded and displayed according to this
setting.

Bram said reading is tried with each encoding in fencs until one
succeeds. Why not reload the file with the correct encoding once a
modeline with a fenc setting has been read?

-Albert.
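
The "try each encoding until one succeeds" behaviour mentioned above can be modelled outside Vim. What follows is a minimal Python sketch, not Vim's implementation: `decode_with_fallbacks` and its candidate list are made-up names, and BOM detection (the "ucs-bom" entry) is left out.

```python
def decode_with_fallbacks(raw: bytes, candidates=("utf-8", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes raw cleanly."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate rejected the bytes; try the next one
    raise ValueError("no candidate encoding accepted the data")


# In cp437 the letter "é" is the single byte 0x82. That byte is not valid
# UTF-8 here, so the strict first candidate fails and latin-1 is used.
raw = "caf\u00e9".encode("cp437")
encoding, text = decode_with_fallbacks(raw)
```

Because latin-1 maps every byte to some character, it never fails. The cp437 file is therefore accepted as latin-1, which shows that the first "success" is not necessarily the right encoding.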

Christian Brabandt

Dec 10, 2020, 8:43:14 AM
to vim_use
The problem with this is that it is a catch-22. To be able to handle the
modeline, Vim must already have read the buffer correctly, using the
correct encoding.

So if it was wrong, the modeline could not be read correctly and
therefore this doesn't help.

Best,
Christian
--
In jail the wicked stew, while the good get to doze outside.

Tony Mechelynck

Dec 10, 2020, 8:57:47 AM
to vim_use
You are up against a chicken-and-egg problem here: the modeline can only
be interpreted after the file has been read into a buffer, and that is
too late for setting the 'fileencoding'. For a disk file at least,
there are the following ways to circumvent that problem:
(a) By setting the 'fileencodings' (plural) heuristic to something
starting with "ucs-bom,utf-8" without the quotes and in that order,
then any Unicode file with BOM, and any UTF-8 file with or without a
BOM, will be correctly detected. (A side-effect is that files in 7-bit
ASCII will be detected as UTF-8 but this is not a bug: indeed, UTF-8
and US-ASCII represent anything between U+0000 and U+007F
identically.) The default setting of "ucs-bom,utf-8,default,latin1"
is an example of this method.
(b) For other files whose charset you know in advance, see :help ++enc
(c) If a file is read by mistake in the wrong encoding, reread it
immediately in another (guessed) better charset. You may need to try
several times until it looks right. For example, for a Japanese file
mistakenly read as Latin1:
:e! ++enc=sjis
The exclamation mark discards any possible changes and rereads the file "as-is".

Best regards,
Tony.
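
Points (a) and (c) above can be checked on raw bytes. This is a minimal Python sketch, independent of Vim. The sample strings are arbitrary, and Python's `shift_jis` codec is assumed to correspond to what Vim calls `sjis`.

```python
# (a) Pure 7-bit ASCII bytes decode to the same text as ASCII and as UTF-8.
ascii_bytes = b"set nocompatible"
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# (c) The same Shift JIS bytes, read with the wrong and the right charset.
original = "\u65e5\u672c\u8a9e"        # the three kanji for "Japanese language"
raw = original.encode("shift_jis")
as_latin1 = raw.decode("latin-1")      # wrong guess: garbage, but no error
as_sjis = raw.decode("shift_jis")      # right guess: the original text
```

The wrong guess raises no error. It simply yields different characters, which is why rereading with another charset is a matter of trial and inspection.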

A. Wik

Dec 10, 2020, 9:04:39 AM
to vim_use
On Thu, 10 Dec 2020 at 13:43, Christian Brabandt <cbl...@256bit.org> wrote:
>
>
> On Thu, 10 Dec 2020, A. Wik wrote:
>
> > It seems to me that a modeline would be a convenient place to set the
> > encoding used for a file. However, while it does set 'fenc'
> > accordingly, the file is not loaded and displayed according to this
> > setting.
> >
> > Bram said reading is tried with each encoding in fencs until one
> > succeeds. Why not reload the file with the correct encoding once a
> > modeline with a fenc setting has been read?
>
> The problem with this is that it is a catch-22. To be able to handle the
> modeline, Vim must already have read the buffer correctly, using the
> correct encoding.
>
> So if it was wrong, the modeline could not be read correctly and
> therefore this doesn't help.

I see, but let's assume there is a "latin1" in 'fencs', and Vim reads
the file as if it is in this encoding, and successfully decodes a
modeline that says "fenc=cp437", and sets the 'fenc' accordingly, then
why not also do a reload or just a "re-display" based on this setting?

-aw
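
The reload proposed here amounts to decoding the same bytes twice. Below is a minimal Python sketch of that idea, not of anything Vim does. `redecode_from_modeline` is a hypothetical helper, and the regular expression is a deliberately simplified stand-in for Vim's real modeline parsing.

```python
import re

# Simplified pattern: "vim:" followed later on the same line by fenc=NAME
# or fileencoding=NAME. Vim's actual modeline rules are stricter.
MODELINE_FENC = re.compile(r"\bvim:.*?\b(?:fenc|fileencoding)=([\w-]+)")


def redecode_from_modeline(raw: bytes, first_guess: str = "latin-1") -> str:
    """Decode raw with first_guess, then redo it if a modeline names an encoding."""
    text = raw.decode(first_guess)       # latin-1 accepts any byte sequence
    match = MODELINE_FENC.search(text)
    if match is None:
        return text                      # no declaration: keep the first guess
    return raw.decode(match.group(1))    # second pass with the declared encoding


raw = "caf\u00e9\n# vim: set fenc=cp437 :\n".encode("cp437")
text = redecode_from_modeline(raw)
```

This only works because the modeline itself is plain ASCII and the first guess is ASCII-compatible. For an encoding such as UTF-16 the first pass would not reveal the modeline at all, which is the catch-22 described earlier in the thread.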

Bram Moolenaar

Dec 10, 2020, 11:04:11 AM
to vim...@googlegroups.com, A. Wik

Albert Wik wrote:

> > > It seems to me that a modeline would be a convenient place to set the
> > > encoding used for a file. However, while it does set 'fenc'
> > > accordingly, the file is not loaded and displayed according to this
> > > setting.
> > >
> > > Bram said reading is tried with each encoding in fencs until one
> > > succeeds. Why not reload the file with the correct encoding once a
> > > modeline with a fenc setting has been read?
> >
> > The problem with this is that it is a catch-22. To be able to handle the
> > modeline, Vim must already have read the buffer correctly, using the
> > correct encoding.
> >
> > So if it was wrong, the modeline could not be read correctly and
> > therefore this doesn't help.
>
> I see, but let's assume there is a "latin1" in 'fencs', and Vim reads
> the file as if it is in this encoding, and successfully decodes a
> modeline that says "fenc=cp437", and sets the 'fenc' accordingly, then
> why not also do a reload or just a "re-display" based on this setting?

These days using utf-8 is the standard way. If you still have a file
lying around in another encoding and you are OK editing it, it's easier
to convert it to utf-8 than to add a modeline.

It's also easy to get wrong: If 'fenc' is set in the modeline, one can
write the file in another encoding and mess it up.

If you really want to, you can make a BufReadPost autocommand that does
it for you. You don't even need a modeline then, anything you can
recognize the file by would do (e.g. path prefix or some text found in
the file).

--
A law to reduce crime states: "It is mandatory for a motorist with criminal
intentions to stop at the city limits and telephone the chief of police as he
is entering the town."
[real standing law in Washington, United States of America]

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Gabriele F

Dec 10, 2020, 2:48:32 PM
to vim...@googlegroups.com
On 10/12/2020 17.03, Bram Moolenaar wrote:
> These days using utf-8 is the standard way. If you still have a file
> lying around in another encoding and you are OK editing it, it's easier
> to convert it to utf-8 than to add a modeline.

Unfortunately in some cases it's still necessary to keep using a
specific encoding, on files still being occasionally edited and for
which a modeline would work.
So, it would still be nice if it could be done.
The prospect of being able to declare the encodings in that way was
actually one of the main reasons for me to learn Vim, of course until I
found out it's the one thing that Vim's modelines absolutely cannot do...



> It's also easy to get wrong: If 'fenc' is set in the modeline, one can
> write the file in another encoding and mess it up.

I'm not sure what you mean (wouldn't that happen only if the user
changed 'fenc' manually?), but if this feature were to be implemented it
would absolutely make sense to do so through some new syntax instead of
changing how 'fenc' in the modeline is interpreted, because:
- changing that would easily cause hard-to-notice encoding problems if
one were to use an older Vim version on files with these 'fenc'
modelines (and it's not unusual to have to use older Vims from time to time)
- it's very hard to predict the effect of 'fenc' and the other current
encoding options

If it were implemented anyway it would seem reasonable to go check the
modeline before writing and ask for confirmation if the encoding
declared there didn't match the one about to be used.

---

I don't know if it would make more sense to introduce a
modeline-pseudo-option, for example "vim: expected-encoding=cp437", or a
completely new modeline type such as "encoding: cp437"; of course that
could be used in addition to a normal modeline line.

For the pseudo-option alternative, this would not correspond in any way
to an actual Vim option, it would only be something recognized and
handled by the modeline interpreter.
This approach would have the effect of producing an "Unknown option"
warning on older Vim versions, which could be argued to be either a good
or a bad thing:
- good because you'd have a very visible warning of the encoding caveats
even on these older versions
- bad because it could be a nuisance to someone.
I would lean towards the "good" view.

For the new modeline alternative, it would have the pro/con of not
giving rise to warnings, it might be a little less confusing, and might
have the advantage of being easier to pick up and support by other
editors too.

Indeed it would be nice to have something agreed with other popular editors.

About that... couldn't we just support the Emacs thing?
-*- coding: cp437  -*-
works there
(https://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html).
Of course just for the encoding; I'm not arguing for attempting to
support Emacs' other file variables, and not necessarily all of its
coding: values either.
Is this something completely inconceivable?

By the way, this Emacs thing also gives you an example of how the
feature can be implemented.
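
As an illustration of how such a declaration could be picked up, here is a minimal Python sketch. It is not Emacs' or Vim's code. `declared_coding` is a hypothetical helper, the pattern covers only the `-*- coding: NAME -*-` form shown above, and searching just the first two lines is an assumption of the sketch.

```python
import re

# Matches "-*- ... coding: NAME ... -*-" on a single line of raw bytes.
CODING_COOKIE = re.compile(rb"-\*-.*?\bcoding:\s*([-\w.]+).*?-\*-")


def declared_coding(raw: bytes, max_lines: int = 2):
    """Return the coding name declared in the first lines of raw, or None."""
    for line in raw.splitlines()[:max_lines]:
        match = CODING_COOKIE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


raw = b"# -*- coding: cp437  -*-\n" + "caf\u00e9\n".encode("cp437")
name = declared_coding(raw)
text = raw.decode(name) if name else raw.decode("latin-1")
```

The search runs on the undecoded bytes, so for ASCII-compatible encodings the declaration can be found before any encoding has been chosen.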

But maybe a completely new syntax for all editors and other software
would be fairer and easier to accept for everyone (though probably hard
to agree upon).



> If you really want to, you can make a BufReadPost autocommand that does
> it for you. You don't even need a modeline then, anything you can
> recognize the file by would do (e.g. path prefix or some text found in
> the file).

Hmm that's an option to keep in mind, but:
- it's quite a bit harder to use
- in the path prefix case, the encoding of a file is a property of that
file; "detaching" it and putting it in Vim's configuration would in many
cases not be very clean and would be more prone to lead to encoding mistakes.

I think that even in this era of convergence towards UTF-8 the ability
to declare the encoding would be useful to enough people to warrant an
easy-to-use feature, especially if it were something that could be
adopted by other programs too.


Gabriele F

Dec 10, 2020, 3:15:32 PM
to vim...@googlegroups.com
On 10/12/2020 14.57, Tony Mechelynck wrote:
> (c) If a file is read by mistake in the wrong encoding, reread it
> immediately in another (guessed) better charset. You may need to try
> several times until it looks right.

The problem is that in languages such as Italian, which have only
occasional (but still frequent) non-ASCII characters, encoding problems
very easily go unnoticed: most files look right at first glance, and
sometimes even when you look specifically and carefully for encoding
problems.

It's not unlikely that you will discover the mistakes only much later,
after you have already irreparably botched the text.
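
How little of such a text changes can be measured directly. This is a minimal Python sketch with an arbitrary Italian sample sentence; Latin-1 and cp437 are chosen only as two single-byte charsets that disagree on accented letters.

```python
sentence = "Perch\u00e9 la citt\u00e0 \u00e8 cos\u00ec bella?"
raw = sentence.encode("latin-1")

# Read the Latin-1 bytes as cp437, an equally "valid" single-byte charset.
misread = raw.decode("cp437")

# Both charsets use one byte per character, so positions line up exactly.
wrong = [(a, b) for a, b in zip(sentence, misread) if a != b]
ratio = len(wrong) / len(sentence)
```

Only the four accented letters out of 29 characters come out differently, and no decoding error is raised, so nothing flags the mistake.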
