How to save a UTF-8 file on Windows using a non-ASCII name


Fan Decheng

Jan 20, 2008, 2:03:50 AM
to vim...@googlegroups.com
Here I mean on the Windows platform, using Vim 6.4 or 7.1.

I've encountered this problem several times, but don't know whether
there is a solution:

1. Use gvim to open a file with Chinese characters in its name. For
example: 测试.txt .
2. Type ":set enc=utf-8" (without quotes).
3. Type ":e" to make the file content displayed using utf-8.
4. Type ":wq" to save the file.

After these steps, the file is saved under the name ²âÊÔ.txt rather than the
original name. Another thing that went wrong is that 测试.txt.swp is left
undeleted.

I looked for file name encoding options in Vim but failed to find any.
Any ideas?

--
Fan Decheng
(Robbie Mosaic)
dts...@citiz.net


Linxiao

Jan 20, 2008, 10:27:31 PM
to vim...@googlegroups.com
This is the vim-dev mailing list; it would be better to send your question
to v...@vim.org .

P.S. If you want to save a file with UTF-8 content, just use :set fenc=utf-8,
not enc=utf-8.

Good Luck!


--
leal @ www.leal.cn

Tony Mechelynck

Jan 20, 2008, 10:41:46 PM
to vim...@googlegroups.com
Linxiao wrote:
> This is vim-dev maillist, better sending your question to v...@vim.org .
>
> PS. If you wanna save a UTF-8 content file, just :set fenc=utf-8 but
> not enc=utf-8.
>
> Good Luck!

Tt, tt, tt... If 'encoding' is other than UTF-8 (or GB18030), Vim cannot
represent all Unicode codepoints in memory; therefore, if you try to edit a
UTF-8 file you run the risk of losing part of the data. (If you set 'enc' to
UTF-16, UCS-2 or UCS-4 aka UTF-32, with any endianness, what Vim will use is
actually UTF-8.)

To edit UTF-8 data you should have both 'encoding' (= memory representation of
the data) and 'fileencoding' (= disk representation of the data) set to UTF-8.
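
As a rough illustration of the point about in-memory representation, here is a
small Python sketch. Python's codecs only stand in for Vim's 'encoding' here;
this is an analogy, not how Vim is implemented. The helper name
`representable` and the sample strings are made up for the example. It checks
which encodings can hold a given string without loss:

```python
def representable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative samples: Chinese text, and a Thai letter (U+0E01) outside GBK.
samples = ["测试", "ก"]

for enc in ("gbk", "gb18030", "utf-8"):
    print(enc, [representable(s, enc) for s in samples])
```

Under these assumptions, only the GBK row should report a character it cannot
hold, which is the kind of data that would be at risk with a cp936 'encoding'.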


Best regards,
Tony.
--
During a grouse hunt in North Carolina two intrepid sportsmen
were blasting away at a clump of trees near a stone wall. Suddenly a
red-faced country squire popped his head over the wall and shouted,
"Hey, you almost hit my wife."
"Did I?" cried the hunter, aghast. "Terribly sorry. Have a
shot at mine, over there."

Edward L. Fox

Jan 20, 2008, 10:43:17 PM
to vim...@googlegroups.com
Hi Fan,

On Jan 20, 2008 3:03 PM, Fan Decheng <dts...@citiz.net> wrote:
>
> Here I mean on the Windows platform, using Vim 6.4 or 7.1.
>
> I've encountered this problem several times, but don't know whether
> there is a solution:
>
> 1. Use gvim to open a file with Chinese characters in its name. For
> example: 测试.txt .
> 2. Type ":set enc=utf-8" (without quotes).

Here is a snippet from Vim's reference manual:

NOTE: Changing this option will not change the encoding of the
existing text in Vim. It may cause non-ASCII text to become invalid.
It should normally be kept at its default value, or set when Vim
starts up. See |multibyte|. To reload the menus see |:menutrans|.

Personally I think this should be considered a bug in Vim. However, since
the behaviour is already well documented, I think you should follow the
documented recommendation.

> 3. Type ":e" to make the file content displayed using utf-8.
> 4. Type ":wq" to save the file.
>
> After these steps, the file is saved in the name ²âÊÔ.txt rather than the
> original name. Another thing that went wrong is 测试.txt.swp is left
> undeleted.
>
> I looked for any file name encoding options in vim but failed to find
> anything.

Please read the documentation for the following options carefully:

fencs
fenc
encoding
termencoding

> Any ideas?
>
> --
> Fan Decheng
> (Robbie Mosaic)
> dts...@citiz.net


Regards,


L. F.

Edward L. Fox

Jan 20, 2008, 11:00:49 PM
to vim...@googlegroups.com
Hi Tony,

On Jan 21, 2008 11:41 AM, Tony Mechelynck <antoine.m...@gmail.com> wrote:
>
> Linxiao wrote:
> [...]


>
> Tt, tt, tt... If 'encoding' is other than UTF-8 (or GB18030), Vim cannot
> represent all Unicode codepoints in memory; therefore, if you try to edit a
> UTF-8 file you run the risk of losing part of the data. (If you set 'enc' to
> UTF-16, UCS-2 or UCS-4 aka UTF-32, with any endianness, what Vim will use is
> actually UTF-8.)

I'm familiar with the various forms that garbled characters take. In fact,
the original poster's problem was not caused by lost code points.
"²âÊÔ" was generated by the following steps:

1. At first, the file name "测试" is represented in GBK encoding.

2. Then he resets 'encoding' to UTF-8, so the information about how the
filename is encoded gets lost. Vim reinterprets the filename bytes as Latin-1.

3. Vim converts the Latin-1 string to UTF-8.

4. Vim saves the file to disk under the new name. Windows converts the
UTF-8 string to UCS, of course. Now the new filename is exactly "²âÊÔ".

Here is the illustration (my system charset is UTF-8):

[edward@argos ~]$ echo 测试 | iconv -f utf-8 -t gbk | iconv -f latin1 -t utf-8
²âÊÔ
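
For anyone without iconv at hand, the same round trip can be sketched in
Python. This is only an illustration of the byte-level reinterpretation
described in the steps above, not a trace of what Vim does internally; the
variable names are made up for the example.

```python
name = "测试"

# Step 1: the file name as bytes under a GBK (cp936) 'encoding'.
gbk_bytes = name.encode("gbk")

# Step 2: the same bytes misread as Latin-1, one character per byte.
mojibake = gbk_bytes.decode("latin-1")

# Step 3: the misread string converted to UTF-8.
utf8_bytes = mojibake.encode("utf-8")

print(gbk_bytes.hex())             # b2e2cad4
print(mojibake)                    # ²âÊÔ
print(utf8_bytes.decode("utf-8"))  # ²âÊÔ
```

The bytes b2 e2 ca d4 are the same ones that appear as <b2><e2><ca><d4> when a
UTF-8 Vim cannot decode the name.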

> To edit UTF-8 data you should have both 'encoding' (= memory representation of
> the data) and 'fileencoding' (= disk representation of the data) set to UTF-8.
>

> [...]




Regards,


L. F.

Fan Decheng

Jan 21, 2008, 12:30:50 AM
to vim_dev
On Jan 21, 11:41 am, Tony Mechelynck <antoine.mechely...@gmail.com>
wrote:
> Linxiao wrote:
> > This is vim-dev maillist, better sending your question to v...@vim.org .
>
> > PS. If you wanna save a UTF-8 content file, just :set fenc=utf-8 but
> > not enc=utf-8.
>
> > Good Luck!
>
> Tt, tt, tt... If 'encoding' is other than UTF-8 (or GB18030), Vim cannot
> represent all Unicode codepoints in memory; therefore, if you try to edit a
> UTF-8 file you run the risk of losing part of the data. (If you set 'enc' to
> UTF-16, UCS-2 or UCS-4 aka UTF-32, with any endianness, what Vim will use is
> actually UTF-8.)
>
> To edit UTF-8 data you should have both 'encoding' (= memory representation of
> the data) and 'fileencoding' (= disk representation of the data) set to UTF-8.
>
>
> Best regards,
> Tony.

Thanks! I tried these:

D:\test>gvim
:set fenc=utf-8
:set enc=utf-8
:e 测试.txt
E37: No write since last change (add ! to override)
:e! 测试.txt
:set fenc?
fileencoding=utf-8
:set enc?
encoding=utf-8

This works fine. Note that in the above test I opened gvim with an
empty buffer first. Another test shows some problems that I should
watch out for:

D:\test>gvim 测试.txt
:set fenc?
fileencoding=
:set enc?
encoding=cp936

Of course, in this situation the file contents are displayed in the wrong
encoding. Then I tried these:

:set enc=utf-8

After this, the window title has changed to:
<b2><e2><ca><d4>.txt (D:\test) - GVIM
:set fenc?
fileencoding=
:set enc?
encoding=utf-8

I proceeded with:
:e

The window title became:
<b2><e2><ca><d4>.txt = (D:\test) - GVIM
:set fenc?
fileencoding=utf-8
:set enc?
encoding=utf-8

However as I observed, the file name of the .swp file is still
.测试.txt.swp.

To make the window title correct and make the file non-read-only
(writable), I typed:

:e 测试.txt

A new buffer is opened and the window title changed back to the
original:
测试.txt = (D:\test) - GVIM

Now every write to the file is OK. However, after exiting gvim, the swap
file is still there.

Sorry for writing at such length; this is just for reference. I've read the
help for `fencs', but I didn't find it helpful in this situation.

Fan Decheng

Jan 21, 2008, 12:46:37 AM
to vim_dev
On Jan 21, 11:43 am, "Edward L. Fox" <edy...@gmail.com> wrote:
> Hi Fan,
>
> On Jan 20, 2008 3:03 PM, Fan Decheng <dts...@citiz.net> wrote:
>
>
>
> > Here I mean on the Windows platform, using Vim 6.4 or 7.1.
>
> > I've encountered this problem several times, but don't know whether
> > there is a solution:
>
> > 1. Use gvim to open a file with Chinese characters in its name. For
> > example: 测试.txt .
> > 2. Type ":set enc=utf-8" (without quotes).
>
> Here is a snippet from the Vim's reference:
>
> NOTE: Changing this option will not change the encoding of the
> existing text in Vim. It may cause non-ASCII text to become invalid.
> It should normally be kept at its default value, or set when Vim
> starts up. See |multibyte|. To reload the menus see |:menutrans|.

Thanks. It seems that setting `encoding' before opening the file works.

Edward L. Fox

Jan 21, 2008, 3:48:22 AM
to vim...@googlegroups.com
Hi Fan,

2008/1/21 Fan Decheng <fande...@gmail.com>:
>
> [...]


> >
> > Here is a snippet from the Vim's reference:
> >
> > NOTE: Changing this option will not change the encoding of the
> > existing text in Vim. It may cause non-ASCII text to become invalid.
> > It should normally be kept at its default value, or set when Vim
> > starts up. See |multibyte|. To reload the menus see |:menutrans|.
>
> Thanks. It seems that setting `encoding' before opening the file works.

Yes, it works fine for this case. But that doesn't necessarily mean that
it's also OK for all the functions in Vim. So as a practical
suggestion, never modify "encoding" once you've launched Vim.
Only set this option *ONCE*, in your .vimrc.



L. F.

Edward L. Fox

Jan 21, 2008, 4:00:04 AM
to vim...@googlegroups.com
Hi Fan,

On Jan 21, 2008 1:30 PM, Fan Decheng <fande...@gmail.com> wrote:
>
> [...]


>
> Now every write to the file is OK. However after exiting gvim, the
> swap
> file is still there.

Although your solution seems to solve part of your problem, I still
want to stress that modifying "encoding" at runtime is really harmful
to your health. So never do that unless you are an expert. If you want
to change the value of "encoding", please modify your ".vimrc" and
restart your (g)vim. You should never modify the encoding while (g)vim
is already running.

Anyway, if all the characters in your file can be covered by GBK,
it's also OK for you to use GBK as your internal encoding. To create
a UTF-8 file:

:set fenc=utf-8
:w filename

To load a UTF-8 file:

:e ++enc=utf-8 filename

But personally I still prefer using UTF-8 in Vim.
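
The distinction between the encoding on disk and the internal one can be
sketched in Python. This is only an analogy to ":set fenc=utf-8" / ":w" and
":e ++enc=utf-8", not a description of Vim's internals; the scratch file name
"sample.txt" and the variable names are hypothetical.

```python
import os
import tempfile

text = "测试"  # every character here is covered by GBK

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Analogue of ":set fenc=utf-8" followed by ":w": the bytes on disk are UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "rb") as f:
    on_disk = f.read()

# Analogue of ":e ++enc=utf-8": decode the bytes on disk as UTF-8.
with open(path, encoding="utf-8") as f:
    loaded = f.read()

# The text survives a trip through GBK, so a GBK internal encoding loses nothing.
round_trip = loaded.encode("gbk").decode("gbk")

print(on_disk.hex(), loaded == text, round_trip == text)
```

If the text contained a character outside GBK, the `encode("gbk")` step would
raise an error instead, which corresponds to the case where a GBK internal
encoding is not enough.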

> Sorry for writing this long, just for some reference. I've read the
> help for `fencs', but I didn't find it helpful to this situation.

"fencs" has nothing to do with your original question. Anyway, I
pointed you to it only because I think you'll need it if you want
your UTF-8 files to be recognized automatically rather than setting
"++enc" manually every time. You may also want to try the plugin
named "fencview", developed by Ming Bai and Wuyong Wei.


Regards,


L. F.

Fan Decheng

Jan 21, 2008, 4:11:33 AM
to vim...@googlegroups.com
2008/1/21, Edward L. Fox <edy...@gmail.com>:

Thanks for your very detailed explanation! I'll try what you've said.
Sorry for cluttering the vim_dev list. I didn't know that it wasn't
a bug.

Best regards,
Fan Decheng

Ben Schmidt

Jan 21, 2008, 8:55:30 AM
to vim...@googlegroups.com
> Here is a snippet from the Vim's reference:
>
> NOTE: Changing this option will not change the encoding of the
> existing text in Vim. It may cause non-ASCII text to become invalid.
> It should normally be kept at its default value, or set when Vim
> starts up. See |multibyte|. To reload the menus see |:menutrans|.
>
> Personally I think this should be a bug of Vim. However, as it had
> already been well-documented, I think you should follow the
> principles.

I personally think that's perfectly reasonable and not a bug.

But something I really do think is worth changing because it's really confusing,
is ++enc. Why do we call this ++enc not ++fenc which would make a huge amount more
sense, and be more consistent with ++ff and ++bin which both set their namesake
options? We see evidence of people getting 'enc' and 'fenc' confused on a regular
basis, and this feature naming really doesn't help matters. What would you think
of changing this, Bram? Perhaps making it officially ++fenc but accepting ++enc
for compatibility with old scripts (and old users!)?

Also, I wonder whether it might be worth adding a 'best practices' section to
mbyte.txt and referring to it in such places as 'enc' (probably mostly there)
which explains the basics in a few short paragraphs: set 'enc' in your .vimrc
(recommend utf-8), 'tenc' if your terminal/locale is different (unneeded in GUI),
use ++fenc if a file is read with wrong encoding detected, 'fenc' to change what
encoding to write a file with for future writes. This sort of material is repeated
frequently on the mailing lists which suggests users aren't finding it easily in
the help (though it is all there, it is somewhat spread around, etc.). Do others
think this, or something similar, might be a good idea?

Cheers,

Ben.


Bram Moolenaar

Jan 21, 2008, 3:05:28 PM
to Ben Schmidt, vim...@googlegroups.com

Ben Schmidt wrote:

> > Here is a snippet from the Vim's reference:
> >
> > NOTE: Changing this option will not change the encoding of the
> > existing text in Vim. It may cause non-ASCII text to become invalid.
> > It should normally be kept at its default value, or set when Vim
> > starts up. See |multibyte|. To reload the menus see |:menutrans|.
> >
> > Personally I think this should be a bug of Vim. However, as it had
> > already been well-documented, I think you should follow the
> > principles.
>
> I personally think that's perfectly reasonable and not a bug.
>
> But something I really do think is worth changing because it's really
> confusing, is ++enc. Why do we call this ++enc not ++fenc which would
> make a huge amount more sense, and be more consistent with ++ff and
> ++bin which both set their namesake options? We see evidence of people
> getting 'enc' and 'fenc' confused on a regular basis, and this feature
> naming really doesn't help matters. What would you think of changing
> this, Bram? Perhaps making it officially ++fenc but accepting ++enc
> for compatibility with old scripts (and old users!)?

These are not really option names, although I can see that they are so
similar that people might think that. They are options for the command.
Offering more alternatives isn't going to make it simpler. We also
don't have 'fbin' for reading a file in binary mode.

> Also, I wonder whether it might be worth adding a 'best practices'
> section to mbyte.txt and referring to it in such places as 'enc'
> (probably mostly there) which explains the basics in a few short
> paragraphs: set 'enc' in your .vimrc (recommend utf-8), 'tenc' if your
> terminal/locale is different (unneeded in GUI), use ++fenc if a file
> is read with wrong encoding detected, 'fenc' to change what encoding
> to write a file with for future writes. This sort of material is
> repeated frequently on the mailing lists which suggests users aren't
> finding it easily in the help (though it is all there, it is somewhat
> spread around, etc.). Do others think this might or something similar
> might be a good idea?

It will certainly help to update the documentation to explain common
pitfalls. Writing this in a nice way, without becoming too verbose and
putting it in the right place is not easy. If you or someone else can
suggest a patch that would be great.

--
Wi n0t trei a h0liday in Sweden thi yer?
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Ben Schmidt

Jan 21, 2008, 11:27:51 PM
to vim...@googlegroups.com
>> But something I really do think is worth changing because it's really
>> confusing, is ++enc. Why do we call this ++enc not ++fenc which would
>> make a huge amount more sense, and be more consistent with ++ff and
>> ++bin which both set their namesake options? We see evidence of people
>> getting 'enc' and 'fenc' confused on a regular basis, and this feature
>> naming really doesn't help matters. What would you think of changing
>> this, Bram? Perhaps making it officially ++fenc but accepting ++enc
>> for compatibility with old scripts (and old users!)?
>
> These are not really option names, although I can see that they are so
> similar that people might think that. They are options for the command.
> Offering more alternatives isn't going to make it simpler. We also
> don't have 'fbin' for reading a file in binary mode.

Yes, I realise they're not option names, but they undeniably do look like it, and
time and time again we see people getting 'fenc' and 'enc' confused, and ++ff sets
'ff' when reading, ++bin sets 'bin', but ++enc sets 'fenc'. This is only confusing
because there *is* an option called 'enc'. If there weren't, as there isn't a
'fbin' option, nor a 'format' option, it would be no problem.

But, hey, it's not too important. I just thought I'd throw it out there, as I
think it has potential to help a lot of users avoid confusion. Maybe I'm wrong. It
would be interesting to hear what others think.

>> Also, I wonder whether it might be worth adding a 'best practices'
>> section to mbyte.txt and referring to it in such places as 'enc'
>> (probably mostly there) which explains the basics in a few short
>> paragraphs: set 'enc' in your .vimrc (recommend utf-8), 'tenc' if your
>> terminal/locale is different (unneeded in GUI), use ++fenc if a file
>> is read with wrong encoding detected, 'fenc' to change what encoding
>> to write a file with for future writes. This sort of material is
>> repeated frequently on the mailing lists which suggests users aren't
>> finding it easily in the help (though it is all there, it is somewhat
>> spread around, etc.). Do others think this might or something similar
>> might be a good idea?
>
> It will certainly help to update the documentation to explain common
> pitfalls. Writing this in a nice way, without becoming too verbose and
> putting it in the right place is not easy. If you or someone else can
> suggest a patch that would be great.

I agree. Terseness is definitely not my strength, though, so I might leave this to
someone else to have a try if anyone is willing. Volunteers?
