If you look at the log of, say, the Git project from Git Bash, you'll
find lines like
Nguy<E1><BB><85>n Th<C3><A1>i Ng<E1><BB><8D>c Duy <pcl...@gmail.com>
where I assume sequences like <E1><BB><85> are UTF-8 bytes that
the pager (GNU less) did not decode at all.
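One way to check the assumption that the bracketed values are UTF-8 is this minimal Python sketch. It is only an illustration. It assumes that each <XX> in the pager output stands for one raw byte written in hex, and that everything else on the line is plain ASCII. The helper name decode_less_escapes is hypothetical.

```python
import re

# Matches one less-style escape such as "<E1>": two hex digits in angle brackets.
# Assumption: a literal "<ab>" in the original text would be misread as a byte.
ESCAPE = re.compile(r"<([0-9A-Fa-f]{2})>")


def decode_less_escapes(line: str) -> str:
    """Hypothetical helper: rebuild the raw bytes, then decode them as UTF-8."""
    raw = bytearray()
    pos = 0
    for m in ESCAPE.finditer(line):
        raw += line[pos:m.start()].encode("ascii")  # unescaped text, assumed ASCII
        raw.append(int(m.group(1), 16))             # the escaped byte itself
        pos = m.end()
    raw += line[pos:].encode("ascii")
    return raw.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not UTF-8


line = "Nguy<E1><BB><85>n Th<C3><A1>i Ng<E1><BB><8D>c Duy <pcl...@gmail.com>"
decoded = decode_less_escapes(line)
# decoded == "Nguyễn Thái Ngọc Duy <pcl...@gmail.com>"
```

If the assumption holds, the three escaped sequences decode cleanly to ễ (U+1EC5), á (U+00E1) and ọ (U+1ECD). That suggests the bytes in the log are valid UTF-8 and that only the display step is going wrong.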
Is this a known issue, or is there a solution?
-Alex
In Git Bash:
$ xxd -g1 ~/a.txt
0000000: c3 a9 // file contains 2 bytes: 0xC3 0xA9
$ cat ~/a.txt
Ac
The file contains the UTF-8 sequence for U+00E9 LATIN SMALL LETTER E WITH ACUTE.
But it is displayed as two characters, A followed by c, so it was not
decoded correctly as a single character.
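The following short Python sketch, offered only as an illustration, shows why those two bytes make up one character. UTF-8 stores code points U+0080 through U+07FF as a lead byte 110xxxxx followed by a continuation byte 10xxxxxx, which together carry 11 bits of the code point. The helper name utf8_two_byte is hypothetical.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point does not use the two-byte UTF-8 form")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx: top 5 of the 11 bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])


encoded = utf8_two_byte(0xE9)               # U+00E9 LATIN SMALL LETTER E WITH ACUTE
assert encoded == b"\xc3\xa9"               # the two bytes xxd reported
assert encoded == "\u00e9".encode("utf-8")  # agrees with Python's own encoder
assert len(encoded.decode("utf-8")) == 1    # a UTF-8 decoder yields one character
```

For 0xE9, the top bits give 0xC0 | 0x03 = 0xC3 and the low six bits give 0x80 | 0x29 = 0xA9. A display that shows two separate characters is therefore treating each byte on its own instead of as UTF-8.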
I'm using a TrueType font (Lucida Console) in the console window.
If I use cmd.exe, it also mis-decodes the file by default (since the
default code page is the OEM code page, 437), but I can easily switch it
to UTF-8 by running chcp 65001. After that, 'type a.txt' decodes and
prints the correct character, é.
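The difference between the two cmd.exe settings can be modelled, very roughly, as decoding the same two bytes with two different tables. This Python sketch assumes that Python's cp437 codec matches the console's OEM 437 table and treats code page 65001 as plain UTF-8. It does not model fonts or how the console itself renders glyphs.

```python
raw = bytes([0xC3, 0xA9])  # contents of a.txt, as dumped by xxd

# Default cmd.exe console: OEM code page 437, one byte -> one glyph.
as_cp437 = raw.decode("cp437")
# After `chcp 65001`: code page 65001 is UTF-8, so both bytes form one character.
as_utf8 = raw.decode("utf-8")

assert as_cp437 == "\u251c\u2310"  # box-drawing '├' followed by '⌐'
assert len(as_cp437) == 2          # two characters, one per byte
assert as_utf8 == "\u00e9"         # 'é', a single character
```

Under that assumption, code page 437 maps each byte to its own glyph. That is the same one-byte-per-character failure seen in Git Bash. UTF-8 combines the two bytes into é.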
I've seen issue 358, but I don't believe anything there fixes this situation.
-Alex