Unable to use xxd to convert utf-16 file back from hex

Gabriel Barta

Jun 30, 2016, 9:45:14 AM
to vim...@vim.org
I was using xxd to look at the hex of a utf-16le file with a byte-order mark.  I noticed that converting from the hex form back to text just left my buffer empty.

It seems that this is because the fileencoding causes the filter   %!xxd -r  to pass utf-16le text to xxd.  However, xxd quite reasonably seems to expect the hex dump to be in ASCII (or EBCDIC!).
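
The mismatch can be illustrated outside Vim with a minimal Python sketch. It assumes that Vim writes the BOM and UTF-16LE text to the filter when 'bomb' is set; the sample dump line and the regular expression `looks_like_hex_dump` are hypothetical stand-ins, not xxd's actual parser.

```python
import re

# One line of hex-dump text, roughly in the layout xxd prints
# (the dumped bytes are a UTF-16LE BOM followed by "Te").
dump_line = "00000000: fffe 5400 6500\n"

# Hypothetical, simplified check for "offset, colon, hex digits".
# Real xxd parsing differs in detail; this only illustrates the idea.
looks_like_hex_dump = re.compile(rb"^[0-9a-f]+: ?[0-9a-f]{2}")

# What an ASCII-compatible 'fileencoding' would send to the filter.
as_ascii = dump_line.encode("ascii")

# What fileencoding=utf-16le with 'bomb' would send (assumption: BOM included).
as_utf16le = b"\xff\xfe" + dump_line.encode("utf-16le")

print(as_ascii[:12])
print(as_utf16le[:12])
print(bool(looks_like_hex_dump.match(as_ascii)))
print(bool(looks_like_hex_dump.match(as_utf16le)))
```

In the UTF-16LE form every hex digit is followed by a NUL byte, so a reader that expects ASCII finds no recognizable dump line, which is consistent with the empty buffer described above.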

Thinking about what vim ought to do in general for handling encoding of shell filters makes my head hurt!  However, at least the requirements for xxd are a known quantity.

Although I don't know how often I will run into this problem, I have currently solved it for myself with the following in my vimrc:

fun! s:unhexify()
    let l:fenc = &fileencoding
    setlocal fileencoding=latin1
    :%!xxd -r
    exe 'setlocal fileencoding='.l:fenc
endfun
com! Xxd :%!xxd
com! XxdR :call s:unhexify()


Is this a solid approach to the problem, and is it worth working something like this into the vim runtime files?

- Gabriel

Tony Mechelynck

Jun 30, 2016, 10:13:54 PM
to vim_dev, vim-dev
Which OS?

On a Unix-like OS, you might try (assuming a bash-like shell)

LC_CTYPE=en_US.UTF-8 LC_ALL= xxd

i.e. set the $LC_CTYPE environment variable explicitly to UTF-8 (where
codepoints U+0000 to U+007F are represented by their single-byte
US-ASCII representation) or even to Latin1 (by omitting the .UTF-8
part).

Best regards,
Tony.

Gabriel Barta

Jul 1, 2016, 9:11:13 AM
to vim...@googlegroups.com, vim-dev
<...>
> Which OS?
It is the same on Linux, Mac and Windows.

To see it happen, try:
new|exe 'norm iTest'|set fenc=utf-16le|set bomb|exe '%!xxd'|%!xxd -r

You will get a --No lines in buffer-- message from vim, and an empty buffer
where it should say Test. Removing the last command shows that it was
fine up until trying to convert from hex back to text.

>
> On a Unix-like OS, you might try (assuming a bash-like shell)
>
> LC_CTYPE=en_US.UTF-8 LC_ALL= xxd

I don't think changing the locale for xxd can help - xxd converts between
7-bit ASCII and untranslated binary.
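
As a rough sketch of that round trip, the Python snippet below uses `bytes.hex()` and `bytes.fromhex()` as stand-ins for `xxd` and `xxd -r`. They do not produce xxd's dump layout; the point is only that the hex side is plain ASCII and the binary side is passed through untouched.

```python
# Bytes of a UTF-16LE file with a byte-order mark, containing "Test".
original = b"\xff\xfe" + "Test".encode("utf-16le")

# Forward direction: raw bytes -> hex digits. The result is plain ASCII text.
hex_text = original.hex()

# Reverse direction: ASCII hex digits -> exactly the same raw bytes.
restored = bytes.fromhex(hex_text)

print(hex_text)
print(hex_text.isascii())
print(restored == original)
# Only now is the encoding relevant: decode the restored bytes as UTF-16.
print(restored.decode("utf-16"))
```

No locale or text encoding is involved in either direction, so the reverse step only works if the hex digits reach it as ASCII.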

My previous workaround with using latin1 only happened to work because
the file I was playing with didn't contain any Unicode codepoints beyond
ASCII. Just a case of Windows using utf-16 for the joy of it.

I might have something that works for unicode buffers with BOM (see below),
but I think the real answer might be that it is reasonable to view text files in
hex mode, but it is a bit silly to want to be able to edit them.

fun! s:unhexify()
    let l:unibomb = !&binary && (&fileencoding=~#'^u') && &bomb
    if (l:unibomb)
        let l:fenc = &fileencoding
        let l:enc = &encoding
        let l:tenc = &termencoding
        setlocal fenc=utf-8
        setlocal enc=utf-8
        setlocal tenc=
    endif
    silent %!xxd -r
    if l:unibomb
        exe 'setlocal fenc='.l:fenc
        exe 'setlocal enc='.l:enc
        exe 'setlocal tenc='.l:tenc
    endif
endfun

Nikolay Aleksandrovich Pavlov

Jul 1, 2016, 3:55:58 PM
to vim_dev, vim-dev
You are not supposed to alter &encoding after startup, ever: changing
&encoding automatically corrupts all strings. Also note that `setlocal`
is misleading here: `&encoding` is a purely global option. Consider
your code running on a system with EBCDIC support and &encoding set to
ebcdic (though I am not sure that encoding=utf8 will work there).
AFAIK this is the only case in which &encoding is not ASCII-compatible
(i.e. the only case where it may need to be set *at all*, assuming
xxd on such systems does not expect EBCDIC):

0. Before the function runs, all strings are in EBCDIC. &encoding is
EBCDIC as well.
1. You set &fileencoding to UTF-8 from whatever it was (e.g.
UTF-16LE). This changes nothing so far, because it only affects which
encoding the buffer is converted to before writing, filtering, etc.
2. You set &encoding to UTF-8 from EBCDIC. This makes Vim think that
all internal strings are UTF-8 *without reencoding those strings*.
(This may even stop the function from executing: Vim does not keep an
AST but stores function lines as-is and reparses them on each run,
including e.g. each iteration of a loop. But assume it did not.)
3. You set &termencoding to nothing. This is an absolutely useless
action that can only result in a corrupted display, if it has any
effect at all.
4. Now you run `silent %!xxd -r`. &fileencoding is UTF-8 and
&encoding is UTF-8, but the actual text is still EBCDIC. Vim will pass
the EBCDIC text to `xxd` unchanged, since UTF-8 (&encoding) does not
need to be converted to UTF-8 (&fileencoding).

You may see the same thing with the following code:

LANG=C vim -u NONE -i NONE -N --cmd 'source /tmp/test.vim' --cmd cq 2>&1 | iconv -f latin1

" /tmp/test.vim
scriptencoding utf-8
function Corrupt()
    setlocal termencoding=latin1
    setlocal encoding=latin1
    setlocal fileencoding=latin1
    call setline('.', ["«»"])
    echomsg string(getline(1))
    %!hexdump -C
    echomsg string(getline(1))
    %delete _
    call setline('.', ["«»"])
    echomsg string(getline(1))
    setlocal termencoding=utf-8
    setlocal encoding=utf-8
    setlocal fileencoding=utf-8
    echomsg string(getline(1))
    %!hexdump -C
    echomsg string(getline(1))
    %delete _
    call setline('.', ["«»"])
endfunction
call Corrupt()

The output will be

'«»'
'00000000 ab bb 0a |...|'
'«»'
'<ab><bb>'
'00000000 ab bb 0a |...|'

Note that although you changed &encoding from latin1 to utf-8, what
hexdump received did not change at all. All you got was a corrupted
view of `«»`.
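
The difference between relabeling bytes and reencoding them can also be sketched in Python, independently of Vim. The helper `view_as_utf8` is hypothetical and only imitates the `<ab><bb>` style of display seen above; it is not how Vim renders text internally.

```python
text = "«»"

# The bytes a Latin-1 buffer holds for this text.
raw = text.encode("latin-1")

def view_as_utf8(data: bytes) -> str:
    """Relabel: keep the bytes and merely read them as UTF-8.

    If the data is not valid UTF-8, show every byte as <xx>,
    similar to the display in the output above.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return "".join(f"<{b:02x}>" for b in data)

# Reencode: convert the text itself, which does change the bytes.
reencoded = raw.decode("latin-1").encode("utf-8")

print(raw.hex())            # bytes any filter would receive after relabeling
print(view_as_utf8(raw))    # corrupted view, same bytes
print(reencoded.hex())      # different bytes after a real conversion
```

Relabeling leaves `ab bb` in place and only changes how it is shown, while a real conversion yields `c2 ab c2 bb`.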

Basically, your function needs to alter *only* &fileencoding. It *must
not* alter &encoding. It is *useless* to alter &termencoding. The only
reason it works is that, unless you compiled Vim with EBCDIC support
on an EBCDIC system, Vim only allows ASCII-compatible &encoding values.
&fileencoding has no such restriction, so the function is still useful
for your purposes.
