Unicode support for Windows console


hippy...@gmail.com

Nov 6, 2007, 1:12:54 AM
to vim_dev
Long time Vim user, first time hacker.

I've recently noticed that although the Windows console has supported
Unicode since Windows 2000, Vim for the Windows console doesn't yet
support Unicode.

There are two ways to write Unicode to the Windows console: native
Windows UCS-2/UTF-16, accessed via the Windows ...W APIs, and UTF-8,
accessed via the Windows ...A APIs after setting the console's code page
to UTF-8. I'm assuming the latter method will be easier for Vim.
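
A minimal sketch of those two routes, not taken from the Vim source. The
function names (utf8_to_utf16, console_write_wide, console_write_utf8) and
the fixed 512-unit buffer are assumptions made for illustration. The
converter is written by hand so that it is portable; on Windows,
MultiByteToWideChar with CP_UTF8 would be the usual choice. It rejects
malformed sequences but does not check for overlong forms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16 code units.
 * Returns the number of units written, or 0 on malformed input or if dst
 * is too small (an empty string also yields 0). */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return 0;                /* stray continuation or invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)  /* truncated sequence (also catches NUL) */
                return 0;
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp > 0x10FFFF)
            return 0;

        if (cp >= 0x10000) {          /* outside the BMP: emit a surrogate pair */
            if (n + 2 > cap) return 0;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > cap) return 0;
            dst[n++] = (uint16_t)cp;
        }
    }
    return n;
}

#ifdef _WIN32
#include <windows.h>

/* Route 1: UTF-16 through the wide-character ("W") API. */
int console_write_wide(const char *utf8)
{
    uint16_t buf[512];
    DWORD written = 0;
    size_t n = utf8_to_utf16(utf8, buf, 512);
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);

    return WriteConsoleW(out, buf, (DWORD)n, &written, NULL) != 0;
}

/* Route 2: raw UTF-8 bytes through the "A" API after switching the
 * console output code page to UTF-8 (CP_UTF8 is 65001). */
int console_write_utf8(const char *utf8)
{
    DWORD written = 0;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);

    if (!SetConsoleOutputCP(CP_UTF8))
        return 0;
    return WriteConsoleA(out, utf8, (DWORD)strlen(utf8), &written, NULL) != 0;
}
#endif
```

Route 1 leaves the console code page alone, whereas route 2 changes a
setting that other programs sharing the same console will also see.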

To see how Vim currently behaves with UTF-8, follow these steps:
1. From the Start menu choose "Run".
2. Enter "cmd /u" (this makes the console's file handles, pipes, etc. work
   with Unicode).
3. Type "chcp 65001" into the console (this sets the code page to UTF-8).
4. Change the terminal font to a Unicode font: right-click the title bar,
   then Properties, Font, Lucida Console.
5. Now run Vim.

Even at startup the display is corrupt. My first guess is a
discrepancy between the number of bytes and the number of characters in
displayed strings. Note that :set termenc=utf8 and :set enc=utf8 have
no effect.
Note also that attributes (colours) and input seem to work fine.
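
To illustrate the kind of byte/character discrepancy suspected here, a
small sketch follows. The helper name utf8_char_count is an assumption
for illustration and is not a function from the Vim source. It counts
code points only, which is still not the same as screen cells, since
combining and double-width characters exist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
 * string by skipping continuation bytes (those of the form 10xxxxxx). */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}
```

For the two Greek letters alpha and beta encoded as "\xCE\xB1\xCE\xB2",
strlen() reports 4 bytes while utf8_char_count() reports 2 characters.
Code that advances the cursor by the byte count would end up two columns
off for that string.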

I'm just feeling my way around the Vim code so far, but write_chars() in
os_win32.c has caught my eye.
Perhaps somebody on this list can direct me where else to look in the
code.

Andrew Dunbar.

Tony Mechelynck

Nov 6, 2007, 9:45:24 AM
to vim...@googlegroups.com

I guess there is a misunderstanding between Vim and its terminal.

Try

set LC_CTYPE=en_US.UTF-8

just before starting Vim.

Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
118. You are on a first-name basis with your ISP's staff.

Bram Moolenaar

Nov 6, 2007, 4:19:39 PM
to hippy...@gmail.com, vim_dev

Andrew Dunbar wrote:

As far as I know, nearly nobody uses cp65001. Vim doesn't support it.
I don't know if it would work even if Vim did support it. The MS-Windows
console is known to be flaky.

Input through Unicode (UTF-16) should work, but I don't know of people
actually using it. Most non-Asian character sets are 8-bit. Asians use
DBCS encodings. So when would UTF-16 be used?

--
From "know your smileys":
:q vi user saying, "How do I get out of this damn emacs editor?"

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Fan Decheng

Nov 6, 2007, 8:42:01 AM
to vim...@googlegroups.com
Strange. I tried chcp 65001, and then typed

dir | more

and saw:

C:\Documents and Settings\Fan Decheng>dir |more
Not enough memory.

I don't know what happened. Also, many other programs don't work
either. After chcp 936 (switching back), everything works all right.

--
Fan Decheng
dts...@citiz.net


Andrew Dunbar

Nov 7, 2007, 8:33:29 AM
to vim...@googlegroups.com

Yes, code page 65001 support seems quite flaky, both at the API level
and at the level of support from Microsoft's own basic command line tools.
It's clear that I shouldn't pursue a UTF-8 Windows console.

Unicode console support in general is a little better though. If you
create some files with Russian or Greek names, the filenames will be
displayed by dir, but piping to more will mung them.

I'm still interested in getting Vim out of the "more" category and into
the "dir" category (-:

Andrew Dunbar.



--
http://wiktionarydev.leuksman.com http://linguaphile.sf.net
