How can gVim handle UTF files like Notepad?


Erhy

Oct 10, 2016, 11:33:26 AM
to vim_use
Hello,
I would like to use gVim for administering Windows.
However, some of the files are UTF-encoded.
When I open such files, gVim decodes them and saves them without the BOMB.
Are there settings that give the same behavior as Windows Notepad?
Thank you for any tips.
Erhy

Ben Fritz

Oct 10, 2016, 11:42:38 AM
to vim_use

I guess you're asking how to save the file with a BOM? For that, ":set bomb" before saving will do the trick.

To detect this automatically, be sure that you have "set encoding=utf-8", "setglobal bomb", and "set fileencodings=ucs-bom,utf-8,latin1" or similar in your .vimrc.

http://vim.wikia.com/wiki/Working_with_Unicode
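
As a minimal sketch, the three settings named above could sit in a vimrc like this. The comments are an interpretation of what each option does, not part of the original reply, and whether the file is called .vimrc or _vimrc depends on the system.

```vim
" Use UTF-8 internally so any Unicode text can be represented
set encoding=utf-8
" Make 'bomb' the default for buffers created from now on
setglobal bomb
" Encodings tried in order when reading a file.
" The special entry is spelled ucs-bom; it checks for a BOM first.
set fileencodings=ucs-bom,utf-8,latin1
```

After opening a file, `:setlocal fileencoding? bomb?` shows what Vim chose for that buffer.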

Erhy

Oct 11, 2016, 3:54:47 AM
to vim_use
I'm confused by the message
[converted]
that appears when I open a file.

Erhy

Gabriele

Oct 11, 2016, 6:17:33 AM
to vim...@googlegroups.com
On 10/10/2016 17.42, Ben Fritz wrote:
> To detect this automatically, be sure that you have "set
> encoding=utf-8", "setglobal bomb", and "set
> fileencodings=ucs-bom,utf-8,latin1" or similar in your .vimrc.
> http://vim.wikia.com/wiki/Working_with_Unicode

Can you tell me if you intentionally used "setglobal" for "bomb", or did you
just copy what's in that wiki?
On my system the global bomb setting is not used if I don't also add
"setlocal bomb<" or "set bomb<".

It is likely that on that wiki "setglobal" was used just by chance,
because that's what was used for fileencoding, see
http://vim.wikia.com/wiki/Working_with_Unicode?diff=29876&oldid=29794 .

Also, it shouldn't be necessary to set 'fileencodings': when 'encoding' is
set to utf-8, it defaults to a sensible
"ucs-bom,utf-8,default,latin1".

Gabriele

Message has been deleted
Message has been deleted

Erhy

Oct 11, 2016, 10:14:29 AM
to vim_use, gb...@tiscali.it
I have now inserted the lines below into my _vimrc.
It works well, although the message
[converted]
still appears.
I also noticed that Vim always appends
0D 0A
at the end of the file if it was missing.
I solved this with
set nofixendofline

Erhy

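" Unicode settings, applied only when Vim has multi-byte support
if has("multi_byte")
  echomsg "has MULTIBYTE"
  " Remember the current encoding for the terminal before changing 'encoding'
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  setglobal encoding=utf-8
  set encoding<
  " 'fileencoding' and 'bomb' are buffer-local: set the global default,
  " then apply it to the current buffer as well
  setglobal fileencoding=utf-8
  set fileencoding<
  setglobal bomb
  set bomb<
  " Detection order when reading files; ucs-bom checks for a BOM first
  setglobal fileencodings=ucs-bom,utf-8,latin1
  set fileencodings<
endif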

Gabriele Fava

Oct 11, 2016, 10:31:46 AM
to vim...@googlegroups.com
You posted the same settings in the last 3 mails.

Anyway, the only problem you are still having is the [converted] message?
I don't get it, are you sure that the files you're opening are really in
UTF-8?
Note that Windows Notepad saves in UTF-16 if you select "Unicode"
as the encoding.
Actually, I noticed that you never mentioned UTF-8 in your mails. If you
want UTF-16 files to be left in that encoding, you'll have to add it
to 'fileencodings' as well (before latin1). Maybe both utf-16
and utf-16le; I'm not sure what 'utf-16' exactly stands for in Vim.


Gabriele

Erhy

Oct 11, 2016, 10:47:42 AM
to vim_use, gb...@tiscali.it

I will accept the [converted] message, because
when I save the file the message appears again
and the resulting BOMB is the same.

This is all for tasks exported from Windows Task Scheduler.

Erhy

aro...@vex.net

Oct 11, 2016, 11:36:47 AM
to vim...@googlegroups.com
> On my system the global bomb setting is not used if I don't also add
> "setlocal bomb<" or "set bomb<".
>

Don't even think about discussing this anywhere near an airport! :-)*

Erhy

Oct 11, 2016, 4:35:38 PM
to vim_use, aro...@vex.net
Thanks, all, for the discussion and the joke.

The repeated mails Gabriele noticed
came about because I wanted to correct them (I deleted them and answered again).

Erhy

Gabriele

Oct 12, 2016, 6:59:08 AM
to vim...@googlegroups.com
On 11/10/2016 16.30, Gabriele Fava wrote:
> If you want UTF-16 files to be left in that encoding, you'll have
> to add it to 'fileencodings' as well (before latin1).
I was wrong: if you have ucs-bom at the start of 'fileencodings', Vim will
detect UTF-16 encodings as well, set 'fileencoding' appropriately,
and thus save the file back in the same encoding.
So you might need to add utf-16le and utf-16be to 'fileencodings'
only if you have files without a BOM. Even then, the 'fileencodings'
detection seems quite crude, and it is likely that you'll still need to specify
the correct encoding manually when opening the file.
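
Specifying the encoding manually could look like the sketch below. The file name task.xml is hypothetical, and utf-16le is only an assumption about what the file actually contains; `++enc` makes Vim skip the 'fileencodings' detection for this one read.

```vim
" Reopen a BOM-less file, forcing the encoding instead of detecting it
:edit ++enc=utf-16le task.xml
" Check which encoding and BOM setting this buffer ended up with
:setlocal fileencoding? bomb?
" Optionally ask for a BOM to be written on the next save
:setlocal bomb
:write
```

If the text looks garbled after the `:edit`, the forced encoding was probably the wrong guess, and the command can be repeated with a different `++enc` value.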


> Maybe both utf-16 and utf-16le; I'm not sure what 'utf-16' exactly
> stands for in Vim.
I have now seen that it stands for UTF-16BE.


Gabriele

Oct 12, 2016, 7:00:30 AM
to vim...@googlegroups.com
On 11/10/2016 16.47, Erhy wrote:
> I will accept the [converted] message, because
> when I save the file the message appears again
> and the resulting BOMB is the same.

I found out that [converted] only means "conversion from 'fileencoding'
to 'encoding' done" (insert.txt help), so it doesn't really mean
anything if you have 'encoding' set to a Unicode format. I don't think
there's a chance that it could result in information loss.

It's called BOM, not BOMB, by the way.


> This is all for tasks exported from Windows Task Scheduler.
I can't check for myself right now, but I read that they are in UTF-16LE.
As I wrote in the other reply, if you have ucs-bom at the start of
'fileencodings' (and 'encoding' set to a Unicode format), they will be
handled correctly.


Gabriele

Gabriele

Oct 12, 2016, 8:24:41 AM
to vim...@googlegroups.com
Yeah, better add "set nobomb" to your notebook's vimrc :-)

Erhy

Oct 12, 2016, 2:01:49 PM
to vim_use, gb...@tiscali.it
See my corrected additional lines for Wordpad compatibility at the end of this posting.

The results of my tests are:
Existing files edited with Vim are written back in the same UTF/Unicode/ANSI encoding as the original.

Creating a new file is different.
Notepad accepts all characters, e.g. when text is pasted from a page in IE,
and warns when saving the file in ANSI format if it contains Unicode characters.

With Vim, pasted Unicode characters are not shown as expected.
A remedy is

set encoding=utf-8

Because I prefer to have the same result as with Notepad,
I additionally put

set bomb


" additional lines in my _vimrc
"
setglobal nofixendofline
set fixendofline<
"
if has("multi_byte")
  if &termencoding == ""
    let &termencoding = &encoding
  endif

  setglobal fileencodings=ucs-bom,utf-8,latin1
  set fileencodings<
endif
"

Ben Fritz

Oct 29, 2016, 5:35:48 PM
to vim_use, gb...@tiscali.it
On Tuesday, October 11, 2016 at 5:17:33 AM UTC-5, Gabriele wrote:
> On 10/10/2016 17.42, Ben Fritz wrote:
> > To detect this automatically, be sure that you have "set
> > encoding=utf-8", "setglobal bomb", and "set
> > fileencodings=ucs-bom,utf-8,latin1" or similar in your .vimrc.
> > http://vim.wikia.com/wiki/Working_with_Unicode
>
> Can you tell me if you intentionally used "setglobal" for "bomb", or did you
> just copy what's in that wiki?
> On my system the global bomb setting is not used if I don't also add
> "setlocal bomb<" or "set bomb<".
>
> It is likely that on that wiki "setglobal" was used just by chance,
> because that's what was used for fileencoding, see
> http://vim.wikia.com/wiki/Working_with_Unicode?diff=29876&oldid=29794 .
>

I'm sure I intentionally used "setglobal bomb". When I experiment with "gvim -N -u NONE -i NONE" and then ":set encoding=utf-8" and ":setglobal bomb", any new buffer I create *after* this via ":new" will automatically get 'bomb' set. If I omit the ":setglobal bomb" then new buffers do NOT get 'bomb' set by default.

Note that the initial buffer created on Vim startup will not have 'bomb' set from the setglobal command. If you need that first buffer to also have 'bomb' set then yes, you will need a setlocal or set command as well.
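
The experiment described above can be replayed roughly as follows, assuming Vim was started with "gvim -N -u NONE -i NONE". The results noted in the comments are the ones reported in this post, not independently captured output.

```vim
" Same two settings as in the experiment
:set encoding=utf-8
:setglobal bomb
" Initial buffer: the post reports that 'bomb' is NOT set here
:setlocal bomb?
" Create a new buffer after the setglobal
:new
" New buffer: the post reports that 'bomb' IS set here
:setlocal bomb?
" To cover the initial buffer too, also set the local value
:setlocal bomb
```

In a vimrc, the last step corresponds to adding a "setlocal bomb" or "set bomb" next to the "setglobal bomb".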

Tony Mechelynck

Oct 29, 2016, 10:45:45 PM
to vim...@googlegroups.com, gb...@tiscali.it
I also use "setglobal bomb", intentionally, in my vimrc; but beware
that not all programs, and in particular (on Unix/Linux) not the
script loader (y'know, whatever it is that recognises the #! shebang
at the start of a script) will recognize (and discard) 0xEF 0xBB 0xBF
(i.e., a UTF-8 BOM) at the start of a supposedly ASCII script — so
shell scripts (probably among others) need "setlocal nobomb" done
either manually or in a filetype-plugin (in an after-plugin since the
default $VIMRUNTIME/ftplugin/sh.vim doesn't set it).

Best regards,
Tony.
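
A filetype after-plugin along the lines suggested above might look like this sketch. The paths are the usual per-user locations and are an assumption about the setup; the single setting is the one named in the post.

```vim
" File: ~/.vim/after/ftplugin/sh.vim
" (on Windows: $HOME/vimfiles/after/ftplugin/sh.vim)
" Runs after the default sh filetype plugin for every shell-script buffer.
" Never write a BOM for shell scripts, whatever the global default is.
setlocal nobomb
```

This relies on filetype plugins being enabled, e.g. with "filetype plugin on" in the vimrc.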