bad display of output utf-8 chars


Ni Va

Dec 5, 2017, 11:31:05 AM
to vim_use
Hi,

I use gvim 8.0.136x under Windows 10, and the output of the robocopy tool is displayed incorrectly.

How can I fix it?
Thank you
Nicholas

[Attachment: Capture.PNG]

Tony Mechelynck

Dec 5, 2017, 7:39:33 PM
to vim...@googlegroups.com
It seems that Vim doesn't recognise the charset in which Robocopy
wrote its output. For instance, it seems that the letter é (small
latin letter e with acute) is represented by 0x82 while in Latin1 it
would be 0xE9 (and therefore in Unicode it would be codepoint U+00E9,
represented in UTF-16le as E9 00 or in UTF-8 as C3 A9).
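
A quick way to test such a guess from inside Vim is to convert the suspicious byte with iconv(). This is only a minimal sketch: it assumes that this Vim can convert from cp850 at all, and that cp850 is the right guess. If it is, the result should display as é.

```vim
" Minimal check, assuming Vim is able to convert from cp850.
" "\x82" is the raw byte 0x82, as it appears in the tool's output.
echo iconv("\x82", 'cp850', 'utf-8')
```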

Can you find out which code page is set in your locale (or "country
parameters" or whatever Windows calls it)? For instance, if it is
(let's say) code page 850, you can maybe open the file in Vim by means
of the command

:view ++enc=cp850 filename.txt " readonly
or
:e ++enc=cp850 filename.txt " read-write

replacing of course "filename.txt" by the actual name (and path, if
necessary) of the file. Similarly with another value after =cp if it
is another code page.

See ":help ++opt" for details.


Best regards,
Tony.

Ni Va

Dec 6, 2017, 9:29:51 AM
to vim_use
Hi Tony,

In fact, I launch it through job_start('robocopy "srcpath" "destpath" *.ico', out_cb), so I get robocopy's output in my own out_cb job callback.

So how can I set the right encoding for the output of a job_start() call?

Note: even from Microsoft's documentation, it is difficult to know which charset robocopy uses. I've read that it can depend on the charset of the cmd executable, or of the system.



Tony Mechelynck

Dec 6, 2017, 10:45:15 AM
to vim...@googlegroups.com
Well, the "system" charset should be set in your "country settings",
or something, but it's been years and years since I left Windows (XP
SP4) for Linux, and I can't check it here.

On a Unix-like system you would set the LC_CTYPE environment variable
to the desired locale (such as fr_FR.UTF-8) before starting the
program, and that would be that; maybe setting ":language ctype
fr_FR.UTF-8" would be enough; but Windows is decidedly non-Unix-like
and doesn't work the same way. Assuming that you have set 'encoding'
to UTF-8 in your vimrc, you might try something like ":language ctype
French_France.10646", or something, before starting the job, but I'm
not sure of the settings to use for Windows (maybe ":language ctype"
with no further argument might help you), and I'm not sure of which
Windows "code page" means UTF-8, or at least some Unicode encoding
with BOM. You might fall back on 1252 for the codepage (the value
after the dot) when starting Robocopy (but come back to UTF-8
immediately afterwards): that codepage corresponds to the charset
known internationally as "Windows-1252", it is an 8-bit encoding very
similar to Latin1 — the Windows salespeople claim that it is what
Latin1 ought to be, or sometimes they even gloss over the difference,
which concerns only characters 0x80 to 0x9F: these are rarely-used
controls in "true" Latin1 but in Windows-1252 they are additional
printable characters, including among others the Euro sign and IIRC
the œ and Œ digraphs.

Best regards,
Tony.

Lucien Gentis

Dec 6, 2017, 11:01:12 AM
to vim...@googlegroups.com, Ni Va
Hello Ni Va,

As Tony said, "letter é (small latin letter e with acute) is represented
by 0x82" , so it is code page 850

I don't use Windows, but by googling, I found that you can open a command
terminal and use 'chcp' command to know which code page is set.

Then, you can try 'chcp 850', or 'chcp 1252', then use a command like
'gvim <your file>' if you can redirect your robocopy output to <your file>

Tony Mechelynck

Dec 6, 2017, 12:59:07 PM
to vim...@googlegroups.com, Ni Va
On Wed, Dec 6, 2017 at 5:01 PM, Lucien Gentis <lucien...@waika9.com> wrote:
> Hello Ni Va,
>
> As Tony said, "letter é (small latin letter e with acute) is represented by
> 0x82" , so it is code page 850
>
> I dont use Windows, but by googling, I found that you can open a command
> terminal and use 'chcp' command to know which code page is set.
>
> Then, you can try 'chcp 850', or 'chcp 1252', then use a command like 'gvim
> <your file>' if you can redirect your robocopy output to <your file>

OK, so for code page 850, if you can redirect robocopy output to a
file, Vim can read it, provided it was built with +iconv, or with
+iconv/dyn and the iconv or libiconv library where Vim can find it. I
tried to find a relevant help tag, but ":helpgrep iconv" delivered so
many useful items (including, no doubt, some that you won't need today)
that I thought it would be better for you to browse them: first, map
:cnext<CR> to some F key if you haven't yet done it, then run
":helpgrep iconv", and then repeatedly press that key to see, one after
another, all the places in the help where that word is mentioned.
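
The browsing procedure described above could be sketched as follows. The choice of <F4> is arbitrary and only an assumption; any free key will do.

```vim
" Report whether this Vim can convert between charsets via iconv.
echo has('iconv') ? 'iconv available' : 'iconv not available'

" <F4> is an arbitrary choice of key for stepping through the matches.
nnoremap <F4> :cnext<CR>

" Fill the quickfix list with every help line that mentions iconv.
helpgrep iconv
```

After that, each press of <F4> jumps to the next match.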

...er, I was saying, to open a file in code page 850, use ":view
++enc=cp850 filename.ext" or ":e ++enc=cp850 filename.ext" as I
originally said (at the time, I was _guessing_ that it could be code
page 850, which happens to be the International code page for MS/DOS).


Best regards,
Tony.

Ni Va

Dec 6, 2017, 6:18:17 PM
to vim_use
Yes, thank you Tony and Mr Gentis.

Entering chcp in the Windows console returns 850, so you're right, Tony.
So I can try redirecting robocopy's output, even though that is far from my original idea (getting the output data live at runtime).

Tony Mechelynck

Dec 6, 2017, 6:40:16 PM
to vim...@googlegroups.com
You could also try setting "chcp 1252" or (if that's what it is for
Unicode) "chcp 10646" either before starting Vim, or, if you use gvim,
in a batfile before invoking robocopy, maybe as follows:

-- ROBOCOPY.BAT ----------
chcp 1252
robocopy.exe %1 %2 %3 %4 %5 %6 %7 %8 %9
-- ROBOCOPY.BAT -- END -----

(assuming that your robocopy program is a binary executable, not a bat
or perl or python etc. script, and that you start it with no more than
9 command-line arguments). You would then invoke robocopy.bat
explicitly _with_ the .bat extension in your Vim job-start command,
and that .bat file would set code page 1252 (which Vim ought to be
able to read with no trouble) then it would pass its command-line
arguments (if any) to the .exe program.


Best regards,
Tony.

Dan Wierenga

Dec 6, 2017, 6:59:53 PM
to vim...@googlegroups.com

On Wed, Dec 6, 2017 at 3:18 PM, Ni Va <niva...@gmail.com> wrote:
> Yes thank you Tony and Mr Gentis,
>
> Enter chcp under windows console return 850 so you're right Tony.
> So i can try to redirect robocopy's output even if it is so far of my idea ( getting output data live at runtime)

robocopy.exe has an explicit /UNICODE flag to control the output encoding. You may want to try that before re-architecting your process to deal with redirected output.

Ni Va

Dec 7, 2017, 9:44:09 AM
to vim_use
For the moment it works, with my showmessages output function fixed as follows:

function! sequencerutil#showmessages() abort " {{{
  " Dump the :messages history to a temp file.
  let debug_file=tempname()
  exe 'redir! > ' . debug_file .'|silent messages|redir END'
  " Load the dumped lines into the quickfix list and show them.
  call setqflist(readfile(debug_file))
  set encoding=cp850
  copen
endfunc "}}}

Note: the /UNILOG option causes errors when combined with /MIR, which I need.

Tony Mechelynck

Dec 7, 2017, 11:02:09 AM
to vim...@googlegroups.com
Changing 'encoding' anywhere other than your vimrc (which is sourced
before loading the first editfile) can have disastrous results,
because it changes how the contents of all text in Vim memory is
interpreted but it doesn't reload any files already in memory. If at
that moment you already have another file loaded (a help file, maybe:
some of them, including options.txt for the 'langmap' example, are in
"true" UTF-8) it could become hopelessly garbled.

Alas, ":copen" doesn't accept a ++enc modifier.

Best regards,
Tony.

Ni Va

Dec 7, 2017, 11:37:53 AM
to vim_use
I understand; I saw disastrous Chinese-looking results yesterday :) Fortunately I had a 7z archive of my Vim distribution :)

So, if :copen does not accept a ++enc modifier, how can I change the encoding for only the temp file shown by :copen?

Tony Mechelynck

Dec 7, 2017, 11:45:19 AM
to vim...@googlegroups.com
On Thu, Dec 7, 2017 at 5:37 PM, Ni Va <niva...@gmail.com> wrote:
[...]
> I understand and saw disastrous chinese results yesterday.. :) hopefully I had a 7z of my Vi distribution :)
>
> So, if copen does not accept ++enc modifier which way can I take to modify only copened tempfile ?

:-( I don't know. Have you tried to have robocopy create it in Windows-1252?


Best regards,
Tony.

Ni Va

Dec 7, 2017, 12:19:15 PM
to vim_use
No, I am just reading this for the moment:
https://cloud.google.com/storage/docs/gsutil/addlhelp/Filenameencodingandinteroperabilityproblems

But the problem is a generic one: being able to change the encoding for a single buffer only.

At the end of my job-launching mechanism, all messages are put in a temp file. Robocopy is one use case, but the same applies to many other tools (Siemens 90' for example).

Tony Mechelynck

Dec 7, 2017, 1:28:03 PM
to vim...@googlegroups.com
The charset used by Vim to represent data in memory ('encoding') is
global. For each edit buffer, there is in addition the 'fileencoding'
which Vim uses to remember which charset is used by the file on disk.
That is buffer-local and is set either explicitly by reading the file
with ++enc= or else by means of the 'fileencodings' (plural) option
which defines the heuristic to be used.
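
As a minimal sketch of the buffer-local approach: the log path below is a made-up placeholder, and cp850 is assumed to be the charset of the file on disk. Only this one buffer is decoded as cp850; 'encoding' is not touched.

```vim
" Hypothetical path of a log file that the tool wrote in cp850.
let g:log_path = 'C:\temp\robocopy.log'

" Open it read-only in a split window, decoding the file as cp850.
execute 'sview ++enc=cp850 ' . fnameescape(g:log_path)

" 'fileencoding' is local to this buffer; 'encoding' stays as it was.
echo &fileencoding
```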

A recommended 'encoding' value is utf-8 because that can be translated
losslessly to and from all other charsets: Latin1, UTF-8, UTF-16 (le or
be) and UTF-32 (aka UCS-4, le or be) are handled internally by Vim;
the rest uses a library such as iconv (iconv.dll or libiconv.dll on
Windows with +iconv/dyn, or the iconv library can be linked statically
on any platform when Vim was compiled with +iconv without /dyn).

'fileencodings' (the heuristic) is a comma-separated list. Each
charset is tried in turn, until there is one which gives no error.
There should be at most one 8-bit charset and it should come last,
because 8-bit charsets can give no "failure" signal. A recommended
value (and the Vim default if 'encoding' is set to some Unicode value)
is ucs-bom,utf-8,default,latin1 where the "default" encoding (the
system default), which can be for instance some national Far-East
encoding, will be tried if no Unicode BOM is found (that's "ucs-bom")
and if the file is not in UTF-8 (which has very strict rules for what
a valid byte sequence is). You might want to add utf-16le before
"default" if you often use files in UTF-16le without BOM. A result of
this particular heuristic is that files in 7-bit US-ASCII will be
recognized as UTF-8 but that is not an error because the two are
(intentionally) byte-for-byte compatible in the ASCII range which is
0..0x7F.
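
In a vimrc, the settings described above might look like this (a sketch only; adapt the list to the files you actually use):

```vim
" Set once, early in the vimrc, before any file is loaded.
set encoding=utf-8

" Heuristic for detecting the charset of files on disk: BOM first, then
" strict UTF-8, then the system default, and a single 8-bit charset last.
set fileencodings=ucs-bom,utf-8,default,latin1
```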

For details, see http://vim.wikia.com/wiki/Working_with_Unicode most
definitely including the "References" section at the end, which gives
a number of "places of interest" in the Vim online help.


Best regards,
Tony.

Ni Va

Dec 7, 2017, 2:06:43 PM
to vim_use
Ok Thank you Tony it helps a lot.

Ken Takata

Dec 8, 2017, 5:46:20 AM
to vim_use
Hi,
Sorry, I don't follow the discussion, but if you use `:grep`, `:make` or
some other commands, you can use the 'makeencoding' option to set the
encoding for those commands. (If you use recent versions of Vim.)
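
For a log that has already been written to a file, one possible use of 'makeencoding' is sketched below. The function name is hypothetical, cp850 is assumed to be the charset of the file, and the sketch relies on :cgetfile being among the commands that honor 'makeencoding', so please check ":help 'makeencoding'" for your version.

```vim
" Hypothetical helper: load a cp850-encoded log file into the quickfix list.
function! LoadCp850Log(path) abort
  if !exists('+makeencoding')
    echomsg "'makeencoding' is not supported by this Vim"
    return
  endif
  let l:saved = &makeencoding
  try
    " Convert from cp850 to 'encoding' while the file is being read.
    set makeencoding=cp850
    execute 'cgetfile ' . fnameescape(a:path)
  finally
    " Restore the previous value, even if reading failed.
    let &makeencoding = l:saved
  endtry
endfunction
```

It could then be called as ":call LoadCp850Log(debug_file)", followed by ":copen".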

If you use setqflist() to set qflist, you may want to use iconv() to
convert the encoding.
E.g.

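" iconv() converts one string at a time, so apply it to each line;
" setqflist() expects a list of dictionaries.
call setqflist(map(readfile(debug_file), '{"text": iconv(v:val, "cp1252", &encoding)}'))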

Maybe you can also try &termencoding or 'char' instead of 'cp1252'.

Regards,
Ken Takata

Ni Va

May 7, 2018, 5:52:51 AM
to vim_use
Hi,

Sorry for the late response.

I confirm that robocopy /tee launched under Windows from job_start() uses encoding cp850, and that its output can be converted like this:


function! s:out_cb(channel, message) dict "{{{
" 1.
call writefile([a:message],s:debug_file,"a")
call setqflist(split(iconv(string(readfile(s:debug_file)), 'cp850', &encoding)))

" 2.
call sequencerutil#echomsg(s:start, iconv(a:message, 'cp850', &encoding))
....

1. when the messages have been written to a file
2. direct message retrieved from the out_cb callback
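
Putting the pieces of this thread together, a self-contained callback might look like the sketch below. It is not the code used above: the function name and the paths are placeholders, cp850 is assumed from the chcp result, and each converted line is appended straight to the quickfix list instead of going through a temp file.

```vim
" Hypothetical callback: convert one line of job output from cp850 to
" Vim's internal 'encoding' and append it to the quickfix list.
function! s:OutCb(channel, message) abort
  " Drop a trailing carriage return, if the tool emitted one.
  let l:raw = substitute(a:message, '\r$', '', '')
  let l:text = iconv(l:raw, 'cp850', &encoding)
  " Fall back to the unconverted bytes if the conversion failed.
  if empty(l:text)
    let l:text = l:raw
  endif
  call setqflist([{'text': l:text}], 'a')
endfunction

" The source and destination paths are placeholders.
let s:job = job_start(['robocopy', 'C:\src', 'C:\dest', '*.ico', '/tee'],
      \ {'out_cb': function('s:OutCb')})
```

This keeps 'encoding' untouched, so other buffers are not affected, and the output still arrives live while the job runs.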