coding file ANSI<>UNICODE and reverse

Paolo Baruffa

Nov 5, 2009, 8:02:36 AM

to vim...@googlegroups.com

Hi!
I need to convert files between ANSI and UNICODE in my GVim.
I read many docs on the web about it and I set these:

"------------------------------------------
" .vimrc includes
"----------------------
:set ffs=dos,unix,mac " autosense Dos,Unix,Mac
:set fileencodings=ucs-bom,utf-8,latin1 " autosense coding
" (no fileencoding is set in .vimrc)

"------------------------------------------
" my menu includes
"----------------------

:set fileencoding=latin1<CR><Esc>:set ff=dos<CR>:w!<CR> " ANSI Dos
:set fileencoding=utf-8<CR>:w!<CR><Esc> " UNICODE

"-------------------------------------------

but they don't work correctly (I check the files with
another editor)...

1) I open an ANSI file with GVim, I ask ":set fileencoding"
and the file is reported as "utf-8"

2) conversion from ANSI to UNICODE and back again looks
right ("converted" in the bottom status line), but after refreshing
the file in the other editor I see the same coding...

Can anybody help, please?

winterTTr

Nov 5, 2009, 7:58:27 PM

to vim...@googlegroups.com

On Thu, Nov 5, 2009 at 9:02 PM, Paolo Baruffa <win...@people.it> wrote:

> Hi!
> I need to perform ANSI/UNICODE commands in my GVIM.
> I read many docs on the web about and I did set these:
>
> "------------------------------------------
> " .vimrc includes
> "----------------------
> :set ffs=dos,unix,mac " autosense Dos,Unix,Mac
> :set fileencodings=ucs-bom,utf-8,latin1 " autosense coding
> " (no fileencoding is set in .vimrc)
>
> "------------------------------------------
> " my menu includes
> "----------------------
>
> :set fileencoding=latin1<CR><Esc>:set ff=dos<CR>:w!<CR> " ANSI Dos
> :set fileencoding=utf-8<CR>:w!<CR><Esc> " UNICODE
>
> "-------------------------------------------
>
> but they don't work correctly (I check the files with an
> other editor)...
>
> 1) I open an ANSI file with GVim, I ask ":set fileencoding"
> and the file appears as "utf-8"

What is the value of the option "encoding"?
If "encoding" is set to utf-8 and you don't set "fenc", the file
is opened with the same encoding as "enc", so "fenc" should
become utf-8.

> 2) conversion between ANSI <> UNICODE and then reverse likes
> right ("converted" in bottom status line), but refreshing
> the file on the other editor I see the same coding...

What do you mean by "the same coding"?
Is the coding always ANSI, or always UNICODE?

Before that, there is something that should be mentioned.
UTF-8 is a variable-length character encoding for UNICODE. If your
original file is ANSI-encoded and contains only US-ASCII characters, then
converting it to utf-8 (without BOM) leaves the file byte-for-byte the same,
so it looks as if nothing was converted. That is because utf-8 uses a single
octet only for the 128 US-ASCII characters, and those octets are the same as
in the ANSI encoding.
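
The point about US-ASCII bytes can be checked outside Vim. The following is only a small Python sketch, and it assumes that "ANSI" means Windows code page 1252 (the thread does not say which code page is in use); `same_bytes` is a hypothetical helper name.

```python
def same_bytes(text: str, ansi: str = "cp1252") -> bool:
    """Return True if text has identical bytes in the ANSI code page and in UTF-8.

    Assumption: "ANSI" is taken to be Windows-1252 (cp1252).
    """
    return text.encode(ansi) == text.encode("utf-8")


# Pure US-ASCII text: one byte per character in both encodings.
print(same_bytes("plain ASCII text"))  # True

# One accented letter is a single byte in cp1252 but two bytes in UTF-8.
print(same_bytes("città"))             # False
```

So a file with no characters above 0x7F will look "unconverted" in a byte-level comparison, while any accented letter makes the two encodings differ.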

bill lam

Nov 5, 2009, 8:28:43 PM

to vim...@googlegroups.com

On Thu, 05 Nov 2009, Paolo Baruffa wrote:
> "------------------------------------------
> " my menu includes
> "----------------------
>
> :set fileencoding=latin1<CR><Esc>:set ff=dos<CR>:w!<CR> " ANSI Dos
> :set fileencoding=utf-8<CR>:w!<CR><Esc> " UNICODE
>
> "-------------------------------------------

Does it work if you replace fileencoding with encoding? E.g.

:set encoding=latin1<CR><Esc>:set ff=dos<CR>:w!<CR> " ANSI Dos
:set encoding=utf-8<CR>:w!<CR><Esc> " UNICODE

--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3

John Beckett

Nov 5, 2009, 9:22:46 PM

to vim...@googlegroups.com

Paolo wrote:
> I need to perform ANSI/UNICODE commands in my GVIM.

The procedure is pretty baffling. Generally, by the time you
have read the file, it is too late. I used the following code to
convert several files a year ago.

I have the following in my vimrc, but I _think_ that this does
not matter given the following procedure:
set encoding=utf-8

You would put the following in a file, say convert.vim, and
edit it for the names of the files you want. You need to also
specify the encoding for reading, and for writing.

" Convert specified files from cp1251 to utf-8.
let files = 'first.txt second.txt third.txt'
for f in split(files)
  " Read the file as cp1251, then write it back over itself as utf-8.
  exec 'edit ++enc=cp1251 '.f
  exec 'write ++enc=utf-8 '.f
endfor

After saving the above file, open it in Vim (do NOT open any
other file), and enter the following to execute the code:

:so %

That "sources" the current file. You should have a COPY of the
files you want to convert in the current directory. They will be
overwritten.

John
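
For comparison, the "read with one encoding, write with another" step that the script above performs inside Vim can be sketched outside Vim as well. This is a rough Python equivalent, not part of John's procedure; `convert_file` is a hypothetical helper, and like the Vim script it overwrites the file, so it should only be run on copies.

```python
from pathlib import Path


def convert_file(path, src_enc: str = "cp1251", dst_enc: str = "utf-8") -> None:
    """Re-encode one file in place (hypothetical helper, overwrites the file)."""
    p = Path(path)
    # Decode the raw bytes with the source encoding; this raises
    # UnicodeDecodeError if the bytes are not valid in src_enc.
    text = p.read_bytes().decode(src_enc)
    # Encode with the target encoding and write the bytes back unchanged
    # otherwise (no newline translation, because we work on bytes).
    p.write_bytes(text.encode(dst_enc))


if __name__ == "__main__":
    # Same file names as in the Vim script; files that do not exist are skipped.
    for name in "first.txt second.txt third.txt".split():
        if Path(name).exists():
            convert_file(name)
```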

Paolo Baruffa

Nov 6, 2009, 3:31:32 AM

to vim...@googlegroups.com

to winterTTr:

> What's the value for the option "encoding" ?

Querying ":set", encoding doesn't appear, maybe because I didn't set it in .vimrc...

Querying ":set encoding", the answer is "latin1" (which is correct).

>> refreshing the file on the other editor
>> I see the same coding

> What do you mean for the "The same coding "?

Please forget this point:
GVim is latin1, the other editor is ANSI.
GVim converts to utf-8: refreshing the other editor shows "utf-8 w/signature".
GVim converts to latin1: refreshing the other editor, it DID show utf-8 again...
I don't know why... now it works...

Paolo Baruffa

Nov 6, 2009, 3:36:52 AM

to vim...@googlegroups.com

to bill lam:

> does it work if you replace fileencoding with encoding? eg.

> :set encoding=utf-8<CR>:w!<CR><Esc> " UNICODE

No! It stays ANSI again... and the Euro symbol is shown as hex 80...

This way doesn't work.

Paolo Baruffa

Nov 6, 2009, 3:49:41 AM

to vim...@googlegroups.com

To everybody: I get the right result using these settings:

"-------------------------------------------
" .vimrc
"-------------------------------------------
" autosense Dos,Unix,Mac
:set ffs=dos,unix,mac
" none settings for encoding...
":let &termencoding = &encoding
":set encoding=utf-8
":set encoding=latin1
":set encoding=utf-8
" autosense Unicode-ISO 8859-1 and 15
:set fileencodings=ucs-bom,utf-8,latin1,latin9

"-------------------------------------------
" in menu
"-------------------------------------------

:set fenc=latin1<CR>:set ff=dos<CR> "for ANSI DOS
:set fenc=utf-8 bomb<CR> "for UNICODE Utf-8

" also :set fenc=ucs-2le<CR> " for Windows Unicode

(Note that this converts to "UTF-8 with signature"...
Also note that :set fenc=latin9 followed by :w! doesn't convert...
But anyway, I have a sequence which works to Unicode and back to ANSI!)
Thanks to everybody ;)
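
What the other editor reports as "UTF-8 with signature" is UTF-8 preceded by a byte order mark (BOM), which is what the bomb option asks Vim to write. The bytes involved can be illustrated with a short Python sketch. It is only an illustration: Python has no codec named ucs-2le, so utf-16-le is used here on the assumption that the two agree for characters in the Basic Multilingual Plane such as the Euro sign.

```python
import codecs

EURO = "\u20ac"  # Euro sign, used as a sample non-ASCII character

# "utf-8-sig" is Python's name for UTF-8 with a leading BOM (EF BB BF).
with_bom = EURO.encode("utf-8-sig")
without_bom = EURO.encode("utf-8")

print(with_bom.hex())     # efbbbfe282ac
print(without_bom.hex())  # e282ac
print(with_bom.startswith(codecs.BOM_UTF8))  # True

# Little-endian 16-bit encoding (stand-in for ucs-2le): low byte first.
print(EURO.encode("utf-16-le").hex())  # ac20
```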

winterTTr

Nov 6, 2009, 3:52:49 AM

to vim...@googlegroups.com

On Fri, Nov 6, 2009 at 4:31 PM, Paolo Baruffa <win...@people.it> wrote:

> to winterTTr:
>
>> What's the value for the option "encoding" ?
>
> querying ":set", encoding doesn't appear - maybe because I don't setted it in .vimrc...
>
> querying "set encoding", answer is "latin1" (which is correct)
>
>>> refreshing the file on the other editor
>>> I see the same coding

Maybe I know the reason: it is the order of the values you set in "fileencodings".

When a file is read, Vim tries each encoding listed in "fileencodings" in turn
until it finds one with which the file can be read in without errors. Vim then
sets "fileencoding" to the one it found.

You put "utf-8" before "latin1" in fileencodings, as below:

:set fileencodings=ucs-bom,utf-8,latin1 " autosense coding

So when Vim reads a file whose bytes are also valid UTF-8 (a plain-ASCII file,
for example), it tries utf-8 first and succeeds. For such a file the
fileencoding is always set to utf-8, and latin1 never has the chance to be used.
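
The "try each encoding in order" behaviour can be illustrated with a small Python sketch. This is a simplified model, not Vim's actual implementation: `guess_encoding` is a hypothetical function, and the "ucs-bom" step here only looks for a UTF-8 or UTF-16 byte order mark.

```python
import codecs


def guess_encoding(data: bytes, candidates=("ucs-bom", "utf-8", "latin1")) -> str:
    """Return the first candidate that reads data without errors (simplified model)."""
    for name in candidates:
        if name == "ucs-bom":
            # Simplification: only UTF-8 and UTF-16 BOMs are recognised here.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8 (with BOM)"
            if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
                return "utf-16 (with BOM)"
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return "unknown"


print(guess_encoding(b"plain ASCII only"))          # utf-8
print(guess_encoding("città".encode("latin1")))     # latin1
print(guess_encoding(codecs.BOM_UTF8 + b"text"))    # utf-8 (with BOM)
```

In this model a plain-ASCII "ANSI" file is reported as utf-8, which matches symptom 1) in the original question. The latin1 fallback is reached only when the bytes are not valid UTF-8, as with the accented letter in the second call.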

Tony Mechelynck

Nov 6, 2009, 4:56:41 AM

to vim...@googlegroups.com

If the Euro sign is 0x80 then the file is NOT in Latin1 aka ISO-8859-1
(there is no Euro sign in Latin1), and also not in Latin9 aka
ISO-8859-15 (where the Euro sign is 0xA4), but it could be Windows-1252
(sometimes known as cp1252), where 0x80 is indeed the Euro sign. In
Unicode the Euro sign is assigned to codepoint U+20AC, represented on
disk in UTF-8 as 0xE2 0x82 0xAC.

IIUC, Windows-1252 and Latin1 are identical except for 0x80 to 0x9F,
which are (nonprinting) control characters in Latin1 and mostly printable
characters in Windows-1252. Many Windows systems abusively call their
cp1252 charset "Latin1".
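
These byte values can be double-checked with Python's built-in codecs. The sketch below is only a cross-check and uses Python's codec names (cp1252, iso-8859-15, latin-1), which are not necessarily the names Vim accepts.

```python
EURO = "\u20ac"  # Euro sign, Unicode code point U+20AC

# Hex bytes of the Euro sign in each encoding that can represent it.
euro_bytes = {name: EURO.encode(name).hex()
              for name in ("cp1252", "iso-8859-15", "utf-8")}
print(euro_bytes)

# Latin1 (ISO-8859-1) has no Euro sign, so encoding must fail.
try:
    EURO.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print("Euro sign in latin-1:", latin1_has_euro)


def differs(byte: int) -> bool:
    """True if this byte means something different in cp1252 than in latin-1."""
    raw = bytes([byte])
    try:
        return raw.decode("cp1252") != raw.decode("latin-1")
    except UnicodeDecodeError:
        return True  # byte is unassigned in cp1252


different = [b for b in range(0x100) if differs(b)]
print(hex(different[0]), hex(different[-1]))
```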

To see the available characters in any 8-bit non-EBCDIC encoding, use
(in a gvim with 'encoding' set to UTF-8)

:view ++enc=<encoding> alphabet.txt

on the attached file, replacing <encoding> by the charset's name. You'll
see characters 0x20 to 0xFF arranged in order, in 14 lines of 16. Vim
may say [Conversion error at line <number>], with the line number of the
first line where it couldn't convert, but that just means there are
bytes which should never happen in a file coded in the encoding you
chose. A "?" (question mark) placeholder appears instead of these
characters.

Best regards,
Tony.
--
"Based on what you know about him in history books, what do you think
Abraham Lincoln would be doing if he were alive today?

(1) Writing his memoirs of the Civil War.
(2) Advising the President.
(3) Desperately clawing at the inside of his coffin."
-- David Letterman

alphabet.txt
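
The attachment itself is not reproduced in this archive. A file with the layout described above (bytes 0x20 to 0xFF, 14 lines of 16) could be regenerated with a sketch like the following. The exact layout of Tony's original file is an assumption: here each line holds 16 raw bytes with no separators, and `alphabet_bytes` is a hypothetical helper name.

```python
def alphabet_bytes() -> bytes:
    """Build 14 lines of 16 raw bytes each, covering 0x20 through 0xFF."""
    rows = []
    for start in range(0x20, 0x100, 0x10):
        # One row: 16 consecutive byte values, e.g. 0x20..0x2F.
        rows.append(bytes(range(start, start + 0x10)))
    # Lines are separated (and the file terminated) by a plain LF.
    return b"\n".join(rows) + b"\n"


data = alphabet_bytes()
print(len(data.split(b"\n")) - 1, "lines")
```

Writing `data` to disk in binary mode, for example with `pathlib.Path("alphabet.txt").write_bytes(data)`, gives a file that can then be opened with the `:view ++enc=<encoding>` command shown earlier.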
