Windows Registry Editor text files (.REG) in Unicode (?) encoding - displayed as garbage

Frantisek Rysanek

Dec 2, 2009, 3:54:56 PM
to v...@vim.org
Dear everybody,

following up on my Unix-originated habit, I'm using Vim as my
favourite text editor for anything that resembles plain text in
Windows. The Windows context menu item "edit with Vim" is very useful
indeed :-)

And that's where I have a problem with .REG files.
If the file is encoded as plain ASCII, that's no problem.
The problem is that Regedit in XP exports Unicode (or what), maybe
utf-16 (or some other multibyte charset). If I try "edit with Vim" on
such files, I get a screenful of garbage.

I'm using the basic Windows build of Vim (gvim.exe) that gets
installed by the "allaround Windows installer" of VIM 7.2, available
at www.vim.org.

Is there an easy way out? A few dark curses in the command line?
":help multibyte" was not much help... I don't even know what
encoding I should try.
Or is the default Windows build missing some important compile-time
features? Some parts of "multibyte" support?
:version only mentions +multi_byte_ime/dyn, not +multi_byte.

Any ideas are welcome :-)

Frank Rysanek

Christian Brabandt

Dec 2, 2009, 4:25:19 PM
to v...@vim.org
Hi Frantisek!
I believe Windows uses a UTF-16 encoding, so try reloading the file with
:e! ++enc=utf-16le

(Or whatever the windows encoding is)

regards,
Christian
--
:wq!

Kelly

Dec 2, 2009, 4:52:54 PM
to v...@vim.org

You can export windows registry files in
plain text as well. Explore the "save
as" options.



Dennis German

Dec 2, 2009, 6:16:32 PM
to vim...@googlegroups.com
Windows registry will not export correctly as text if there are
non-ASCII characters in the registry.

John Beckett

Dec 2, 2009, 6:28:18 PM
to vim...@googlegroups.com
Please bottom post on this list. Quote a small (relevant) part
of the message you are replying to, and put your text underneath.

See the list guidelines:
http://groups.google.com/group/vim_use/web/vim-information

Linda W

Dec 2, 2009, 8:58:13 PM
to vim...@googlegroups.com
Frantisek Rysanek wrote:
> And that's where I have a problem with .REG files.
> If the file is encoded as plain ASCII, that's no problem.
> The problem is that Regedit in XP exports Unicode (or what), maybe
> utf-16 (or some other multibyte charset). If I try "edit with Vim" on
> such files, I get a screenful of garbage.
> Any ideas are welcome :-)
---
Are you using 'gvim' for win32? How is your vim compiled?

You need to make sure you have a version with the multi-byte option
compiled in. Then it should autodetect the correct format for these
files.

When I use vim (win32 or win64), I use 'gvim' -- which is fully
UTF-8 compliant. A localized, 8-bit, ASCII-only version of vim
won't work.

The reg files are "Little-endian UTF-16 Unicode with CRLF line terminators",
with vim showing their encoding as "utf-16le".

If you do a 'set' command, you should see fileencodings set to:

fileencodings=ucs-bom,utf-8,default,latin1

It's important that the "ucs-bom" is before the utf-8 or vim won't
detect bom files correctly.
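
To check what a given Vim instance is actually doing, the standard
option queries below can be typed at the command line after opening a
.REG file. This is only a minimal sketch; the comments describe what
each query reports, and the values shown will depend on your setup.

```vim
" The detection list Vim walks through, in order, when reading a file.
:set fileencodings?
" The encoding Vim settled on for the current buffer (e.g. utf-16le).
:set fileencoding?
" Whether the current buffer will be written with a byte order mark.
:set bomb?
" The encoding Vim uses internally for buffer text.
:set encoding?
```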

Good luck!
linda

Frantisek Rysanek

Dec 3, 2009, 1:22:48 AM
to vim...@googlegroups.com
On 2 Dec 2009 at 22:25, Christian Brabandt wrote:
>
> On Mi, 02 Dez 2009, Frantisek Rysanek wrote:
>
> > The problem is that Regedit in XP exports Unicode (or what), maybe
> > utf-16 (or some other multibyte charset). If I try "edit with Vim" on
> > such files, I get a screenful of garbage.
>
> I believe Windows uses a UTF-16 encoding, so try reloading the file with
> :e! ++enc=utf-16le
>
Thanks for that tip :-)
When I did just that, VIM responded something like "couldn't
convert".
But once I added "set enc=utf-16le" to my _vimrc, suddenly it started
to work :-) I'll probably add that to syntax/registry.vim or
someplace like that (oh wait - chicken & egg problem looming?)

On 3 Dec 2009 at 2:58, Linda W wrote:
>
> The reg files are "Little-endian UTF-16 Unicode with CRLF line
> terminators", with vim showing their encoding as "utf-16le".
>
> If you do a 'set' command, you should see fileencodings set to:
>
> fileencodings=ucs-bom,utf-8,default,latin1
>
> It's important that the "ucs-bom" is before the utf-8 or vim won't
> detect bom files correctly.
>
All right, in my case, I'm more likely to see encodings such as ISO
latin2 or CP1250, so I've added this line into my systemwide _vimrc:
set fileencodings=ucs-bom,utf-8,default,latin2,cp1250

But that line doesn't seem to have any effect.
The enc=utf-16le alone does have the desired effect though :-)

The correct loading of the REG file in UTF-16le seems to have an
interesting side effect. My install of Gvim seems to talk to me in
Czech, probably based on some Windows locale. And, with enc=utf-16le,
the result messages from VIM get the nationalized characters garbled
(printed as hexa codes), while the buffer being edited is displayed
just right (even if the buffer contains 8bit cp1250 text). I tried
forcing "helplang" from cs to en (in _vimrc), but that didn't help
:-)
It's not very bad, but if you had a further pointer on how to set
that straight, please let me know :-)

Yes, I'm using GVIM, from the default Windows build downloaded off
gvim.org as a binary installer package. I've written some snippets of
code in Mingw before, but I'm not up to rolling my own Windows build
of VIM :-) I just don't bother. Come to think of it, the only thing
I changed in the default install is the _vimrc (deleted the original
and replaced with something left over from the past).
Maybe that's where I should've started :-)

Thank you both, Chris and Linda...

Frank Rysanek

Tony Mechelynck

Jan 3, 2010, 3:58:34 AM
to vim...@googlegroups.com, Frantisek Rysanek

I'm coming to this thread about a month late, which shows how far behind
I am in handling email.

Reading this thread shows that a solution "which seems to work" has been
found by trial and error, but, it seems, not with much understanding of
what was going on. So here goes:

1) IIUC, Windows registry files (*.REG) are encoded in UTF-16le with BOM.

2) To display (and possibly edit) these files in Vim, you need a Vim
version which is "multi-byte capable" (i.e. compiled with +multi_byte
or +multi_byte_ime/dyn). You also need to exercise those capabilities
by setting 'encoding' (the internal representation Vim uses for files
in memory) to something that can represent all characters in all files
you'll edit, and in this case this means UTF-8.
Note: Setting 'encoding' to any of utf-16, utf-16be, utf-16le, ucs-2,
ucs-2be, ucs-2le, ucs-4, ucs-4be, ucs-4le, utf-32, utf-32be, utf-32le
(some of which are synonymous) will result in UTF-8 being used
internally. UTF-8 allows lossless conversion to and from these
encodings but, unlike them, it never uses the byte 0x00 for anything
other than the NUL control character, which C also uses to terminate
strings.

3) The 'fileencodings' option defines the heuristics used to determine
the encoding used on disk within a file. The first value is tried first,
then the next in case of failure, and so on. Since 8-bit encodings
cannot give a "fail" signal, there should be at most one of them, and at
the end. OTOH, ucs-bom (if used) should be first, and in any case before
any other Unicode encoding. IOW:

Bad: set fencs=utf-8,ucs-bom,latin1,cp1252
- The BOM in a UTF-8 file will never be detected and you'll see <FEFF>
at the start of line 1 of any file in UTF-8 with BOM;
- Code page 1252 will never be used because latin1 (which cannot give a
"fail" signal) will be tried before. (Note that ending with
,cp1252,latin1 would be just as bad: in this case it's latin1 which
would never be used.)

Good: set fencs=ucs-bom,utf-8,cp1252
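
Points 2 and 3 could be combined in a _vimrc along these lines. This is
only a sketch: the choice of cp1250 as the single 8-bit fallback is an
assumption based on the Central European files mentioned earlier in the
thread, and cp1252 or latin1 would fill the same slot on a Western
European system.

```vim
" Only touch these options if this Vim was built with multi-byte support.
if has('multi_byte')
  " Internal representation: UTF-8 can hold every character that a
  " UTF-16 .REG file may contain.
  set encoding=utf-8
  " Detection order when reading a file: BOM check first, then UTF-8,
  " then exactly one 8-bit encoding last (assumed here: cp1250).
  set fileencodings=ucs-bom,utf-8,cp1250
endif
```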

Any encoding not in the 'fencs' can still be used, see ":help ++enc".
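
As a rough illustration of ++enc, the commands below force an encoding
for a single read or write without changing 'fencs'. The file name
notes.txt is hypothetical, and the sketch assumes 'encoding' is already
set to utf-8 so that the conversions can succeed.

```vim
" Reload the current file, reading it as little-endian UTF-16.
:edit! ++enc=utf-16le
" Open another file (hypothetical name), reading it as cp1250.
:edit ++enc=cp1250 notes.txt
" Write the current buffer to disk, converting it to little-endian UTF-16.
:write ++enc=utf-16le
```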

For more details, see http://vim.wikia.com/Working_with_Unicode


Best regards,
Tony.
--
Republicans raise dahlias, Dalmatians and eyebrows.
Democrats raise Airedales, kids and taxes.

Democrats eat the fish they catch.
Republicans hang them on the wall.

Republican boys date Democratic girls. They plan to marry Republican
girls, but feel they're entitled to a little fun first.

Democrats make up plans and then do something else.
Republicans follow the plans their grandfathers made.

Republicans consume three-fourths of the rutabaga produced in the USA.
The remainder is thrown out.

Republicans sleep in twin beds -- some even in separate rooms.
That is why there are more Democrats.
-- The Official Rules, as compiled by Paul Dickson
