[PATCH v14 00/26] Issue 80: Unicode support on Windows

41 views
Skip to first unread message

karste...@dcon.de

unread,
Jan 16, 2012, 7:27:08 PM1/16/12
to msy...@googlegroups.com, bl...@dcon.de, Johannes....@gmx.de, kusm...@gmail.com, ki...@mns.spb.ru, mic...@wheelycreek.net, robin.r...@gmail.com, at...@chejz.com, pa...@paulbetts.org
Hi again,

here is v14 of the unicode patch series:

git (moved to github): https://github.com/kblees/git/tree/kb/unicode-v14
less-444 (unchanged):
https://github.com/kblees/msysgit/commits/kb/msys/less
Unicode msys.dll (unchanged):
https://github.com/kblees/msysgit/commits/kb/msys/unicode
Recodetree script (new):
https://github.com/kblees/msysgit/commits/kb/recodetree

A git installer that bundles the Unicode msys.dll, less-444, recodetree
script and git v1.7.9-rc0 with unicode-v14 patches can be found here:
https://docs.google.com/uc?id=0BxXUoUg2r8ftMjI2MWE5ODMtMGFhYi00ZThjLWE3NzUtNTU1MzM2ODgwNGNh&export=download

And TortoiseGit V1.7.5.0 built with CP_ACP replaced by CP_UTF8
(unchanged):
32 bit:
https://docs.google.com/uc?id=0BxXUoUg2r8ftYjFmMzg0MmItYjZhMy00MjM4LWFkYjktN2RiOTUxNDdiMzdk&export=download

64 bit:
https://docs.google.com/uc?id=0BxXUoUg2r8ftNDQxMWEyMTgtM2FjMy00YzA5LTgxNGEtNTc2MjhkYWZjNDIy&export=download


As my stupid mail client rewraps all the patches anyway, and most changes
have already been discussed on the mailing list, I think I'll skip the 26
followup mails this time. If you desparately crave for those mails, please
speak up, otherwise feel free to comment on the interdiff below.

Bye,
Karsten


Changes since kb/unicode-v13:
* Revert case-sensitive-before-case-insensitive environment patches
* Win32: Thread-safe windows console output:
- improve thread exit mechanism (reversed pipe + DisonnectNamedPipe)
- use ARRAY_SIZE instead of hardcoded buffer size
* Win32: Unicode file name support (except dirent)
- reduce conversion buffer for f[re]open's otype parameter from 10 to 4
- use ARRAY_SIZE instead of hardcoded buffer sizes
* Win32: Unicode file name support (dirent)
- use xutftowcs_path with a buffer[MAX_PATH + 2] (as in 'is_dir_empty')
* Unicode file name support (gitk and git-gui)
- fix git-gui startup if absolute worktree path contains non-ascii chars
* Win32: Unicode environment (outgoing)
- don't change environment iteration to array semantics (yet)
- add comment to explain +2 in lengh calculation
- rename envblkpos and envblksz variables for consistency
* Win32: unify environment case-sensitivity
- revert to case-insensitive-only lookup
* Win32: move environment block creation to a helper method
- renamed to "Win32: factor out environment block creation"
* Win32: don't copy the environment twice when spawning child processes
- prevent empty statement in for-loop
* Win32: keep the environment sorted
- revert to case-insensitive-only binary search
- fix endless loop when searching for a value smaller than the first
entry
- fix potential overflow with > 2G environment entries


Commit logs ('+': new/rewritten in v14, '-': removed, '!': modified)
---
- [--/26] MSVC: link dynamically to the CRT (already merged)
[01/26] git-gui: fix encoding in git-gui file browser
[02/26] gitk: fix file name encoding in diff hunk headers
[03/26] Revert "Disable test on MinGW that challenges its bash quoting"
+ [04/26] Revert "Windows: teach getenv to do a case-sensitive search"
+ [05/26] Revert "mingw.c: move definition of mingw_getenv down"
! [06/26] Win32: Thread-safe windows console output
[07/26] Win32: add Unicode conversion functions
! [08/26] Win32: Unicode file name support (except dirent)
! [09/26] Win32: Unicode file name support (dirent)
! [10/26] Unicode file name support (gitk and git-gui)
[11/26] Win32: Unicode arguments (outgoing)
[12/26] Win32: Unicode arguments (incoming)
[13/26] Win32: sync Unicode console output and file system
! [14/26] Win32: Unicode environment (outgoing)
[15/26] Win32: Unicode environment (incoming)
[16/26] MinGW: disable legacy encoding tests
[17/26] Win32: fix environment memory leaks
! [18/26] Win32: unify environment case-sensitivity
[19/26] Win32: simplify internal mingw_spawn* APIs
[20/26] Win32: move environment functions
[21/26] Win32: unify environment function names
[22/26] Win32: factor out environment block creation
! [23/26] Win32: don't copy the environment twice when spawning child
processes
[24/26] Win32: reduce environment array reallocations
! [25/26] Win32: keep the environment sorted
[26/26] Win32: patch Windows environment on startup

Makefile | 2 -
compat/mingw.c | 680
+++++++++++++++++++++++++++++++----------------
compat/mingw.h | 134 +++++++++-
compat/win32/dirent.c | 30 +--
compat/win32/dirent.h | 2 +-
compat/winansi.c | 397 ++++++++++++++++++----------
git-gui/git-gui.sh | 8 +-
git-gui/lib/browser.tcl | 2 +-
git-gui/lib/index.tcl | 6 +-
gitk-git/gitk | 15 +-
run-command.c | 10 +-
t/t3901-i18n-patch.sh | 19 +-
t/t4201-shortlog.sh | 6 +-
t/t5505-remote.sh | 5 +-
t/t8005-blame-i18n.sh | 8 +-
15 files changed, 880 insertions(+), 444 deletions(-)
--


Diff between v12 and v13...
---
$ git diff -p --stat kb/unicode-v13-578cb11c..kb/unicode-v14
compat/mingw.c | 57
++++++++++++++----------------------------------
compat/win32/dirent.c | 10 ++------
compat/winansi.c | 44 +++++++++++++++++++++----------------
git-gui/git-gui.sh | 2 +-
4 files changed, 46 insertions(+), 67 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index af42e3c..183a2a4 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -379,14 +379,14 @@ FILE *mingw_fopen (const char *filename, const char
*otype)
{
int hide = 0;
FILE *file;
- wchar_t wfilename[MAX_PATH], wotype[10];
+ wchar_t wfilename[MAX_PATH], wotype[4];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
if (xutftowcs_path(wfilename, filename) < 0 ||
- xutftowcs(wotype, otype, 10) < 0)
+ xutftowcs(wotype, otype, ARRAY_SIZE(wotype)) < 0)
return NULL;
file = _wfopen(wfilename, wotype);
if (file && hide && make_hidden(wfilename))
@@ -398,14 +398,14 @@ FILE *mingw_freopen (const char *filename, const
char *otype, FILE *stream)
{
int hide = 0;
FILE *file;
- wchar_t wfilename[MAX_PATH], wotype[10];
+ wchar_t wfilename[MAX_PATH], wotype[4];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
if (xutftowcs_path(wfilename, filename) < 0 ||
- xutftowcs(wotype, otype, 10) < 0)
+ xutftowcs(wotype, otype, ARRAY_SIZE(wotype)) < 0)
return NULL;
file = _wfreopen(wfilename, wotype, stream);
if (file && hide && make_hidden(wfilename))
@@ -727,7 +727,7 @@ char *mingw_getcwd(char *pointer, int len)
{
int i;
wchar_t wpointer[MAX_PATH];
- if (!_wgetcwd(wpointer, MAX_PATH))
+ if (!_wgetcwd(wpointer, ARRAY_SIZE(wpointer)))
return NULL;
if (xwcstoutf(pointer, wpointer, len) < 0)
return NULL;
@@ -761,41 +761,18 @@ static int compareenv(const void *v1, const void
*v2)

static int bsearchenv(char **env, const char *name, size_t size)
{
- unsigned low = 0, high = size - 1, mid, i, j, nmln;
- int cmp;
- while (low <= high) {
- mid = (low + high) >> 1;
- cmp = compareenv(&env[mid], &name);
- if (cmp < 0)
+ unsigned low = 0, high = size;
+ while (low < high) {
+ unsigned mid = low + ((high - low) >> 1);
+ int cmp = compareenv(&env[mid], &name);
+ if (cmp < 0)
low = mid + 1;
else if (cmp > 0)
- high = mid - 1;
+ high = mid;
else
- goto found;
+ return mid;
}
return ~low; /* not found, return 1's complement of insert
position */
-
-found:
- /* look for duplicate entries before and after the matching entry
(mid) */
- i = j = mid;
- while (i > low && !compareenv(&env[i - 1], &name))
- i--;
- while (j < high && !compareenv(&env[j + 1], &name))
- j++;
- if (i == j)
- return mid; /* no duplicates */
-
- /*
- * try case-sensitive match first, this is necessary for
sh-i18n--envsubst
- * to work with environment variables that differ only in case
(e.g. $PATH
- * and $path)
- */
- nmln = strchrnul(name, '=') - name;
- for (; i <= j; i++)
- if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
- /* matches */
- return i;
- return mid;
}

/*
@@ -1039,7 +1016,7 @@ static wchar_t *make_environment_block(char
**deltaenv)
{
wchar_t *wenvblk = NULL;
char **tmpenv;
- int i = 0, size = environ_size, envblksz = 0, envblkpos = 0;
+ int i = 0, size = environ_size, wenvsz = 0, wenvpos = 0;

while (deltaenv && deltaenv[i])
i++;
@@ -1054,12 +1031,12 @@ static wchar_t *make_environment_block(char
**deltaenv)

/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {
- size = 2 * strlen(tmpenv[i]) + 2;
- ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t),
envblksz);
- envblkpos += xutftowcs(&wenvblk[envblkpos], tmpenv[i],
size) + 1;
+ size = 2 * strlen(tmpenv[i]) + 2; /* +2 for final \0 */
+ ALLOC_GROW(wenvblk, (wenvpos + size) * sizeof(wchar_t),
wenvsz);
+ wenvpos += xutftowcs(&wenvblk[wenvpos], tmpenv[i], size) +
1;
}
/* add final \0 terminator */
- wenvblk[envblkpos] = 0;
+ wenvblk[wenvpos] = 0;
free(tmpenv);
return wenvblk;
}
diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index c69a689..52420ec 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -20,19 +20,15 @@ static inline void finddata2dirent(struct dirent *ent,
WIN32_FIND_DATAW *fdata)

DIR *opendir(const char *name)
{
- wchar_t pattern[MAX_PATH];
+ wchar_t pattern[MAX_PATH + 2]; /* + 2 for '/' '*' */
WIN32_FIND_DATAW fdata;
HANDLE h;
int len;
DIR *dir;

- /* convert name to UTF-16, check length (-2 for '/' '*') */
- len = xutftowcs(pattern, name, MAX_PATH - 2);
- if (len < 0) {
- if (errno == ERANGE)
- errno = ENAMETOOLONG;
+ /* convert name to UTF-16 and check length < MAX_PATH */
+ if ((len = xutftowcs_path(pattern, name)) < 0)
return NULL;
- }

/* append optional '/' and wildcard '*' */
if (len && !is_dir_sep(pattern[len - 1]))
diff --git a/compat/winansi.c b/compat/winansi.c
index 18f2cdf..9f95954 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -120,7 +120,7 @@ static void write_console(unsigned char *str, size_t
len)
static wchar_t wbuf[2 * BUFFER_SIZE + 1];

/* convert utf-8 to utf-16 */
- int wlen = xutftowcsn(wbuf, (char*) str, 2 * BUFFER_SIZE + 1,
len);
+ int wlen = xutftowcsn(wbuf, (char*) str, ARRAY_SIZE(wbuf), len);

/* write directly to console */
WriteConsoleW(console, wbuf, wlen, NULL, NULL);
@@ -315,7 +315,7 @@ static void set_attr(char func, const int *params, int
paramlen)
}

enum {
- TEXT = 0, ESCAPE = 033, BRACKET = '[', EXIT = -1
+ TEXT = 0, ESCAPE = 033, BRACKET = '['
};

static DWORD WINAPI console_thread(LPVOID unused)
@@ -325,12 +325,13 @@ static DWORD WINAPI console_thread(LPVOID unused)
int start, end = 0, c, parampos = 0, state = TEXT;
int params[MAX_PARAMS];

- while (state != EXIT) {
+ while (1) {
/* read next chunk of bytes from the pipe */
if (!ReadFile(hread, buffer + end, BUFFER_SIZE - end,
&bytes,
NULL)) {
- /* exit if pipe has been closed */
- if (GetLastError() == ERROR_BROKEN_PIPE)
+ /* exit if pipe has been closed or disconnected */
+ if (GetLastError() == ERROR_PIPE_NOT_CONNECTED ||
+ GetLastError() ==
ERROR_BROKEN_PIPE)
break;
/* ignore other errors */
continue;
@@ -375,9 +376,6 @@ static DWORD WINAPI console_thread(LPVOID unused)
parampos++;
if (parampos >= MAX_PARAMS)
state = TEXT;
- } else if (c == 'q') {
- /* "\033[q": terminate the thread
*/
- state = EXIT;
} else {
/*
* end of escape sequence, change
@@ -428,22 +426,22 @@ static DWORD WINAPI console_thread(LPVOID unused)

static void winansi_exit(void)
{
- DWORD dummy;
/* flush all streams */
_flushall();

- /* close the write ends of the pipes */
+ /* signal console thread to exit */
+ FlushFileBuffers(hwrite);
+ DisconnectNamedPipe(hwrite);
+
+ /* wait for console thread to copy remaining data */
+ WaitForSingleObject(hthread, INFINITE);
+
+ /* cleanup handles... */
if (hwrite1 != INVALID_HANDLE_VALUE)
CloseHandle(hwrite1);
if (hwrite2 != INVALID_HANDLE_VALUE)
CloseHandle(hwrite2);
-
- /* send termination sequence to the thread */
- WriteFile(hwrite, "\033[q", 3, &dummy, NULL);
CloseHandle(hwrite);
-
- /* allow the thread to copy remaining data to the console */
- WaitForSingleObject(hthread, 1000);
CloseHandle(hthread);
}

@@ -488,6 +486,7 @@ static HANDLE redirect_console(FILE *stream, HANDLE
*phcon, int new_fd)
void winansi_init(void)
{
int con1, con2, hwrite_fd;
+ char name[32];

/* check if either stdout or stderr is a console output screen
buffer */
con1 = is_console(1);
@@ -495,9 +494,16 @@ void winansi_init(void)
if (!con1 && !con2)
return;

- /* create an anonymous pipe */
- if (!CreatePipe(&hread, &hwrite, NULL, BUFFER_SIZE))
- die_lasterr("CreatePipe failed");
+ /* create a named pipe to communicate with the console thread */
+ sprintf(name, "\\\\.\\pipe\\winansi%lu", GetCurrentProcessId());
+ hwrite = CreateNamedPipe(name, PIPE_ACCESS_OUTBOUND,
+ PIPE_TYPE_BYTE | PIPE_WAIT, 1, BUFFER_SIZE, 0, 0, NULL);
+ if (hwrite == INVALID_HANDLE_VALUE)
+ die_lasterr("CreateNamedPipe failed");
+
+ hread = CreateFile(name, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0,
NULL);
+ if (hread == INVALID_HANDLE_VALUE)
+ die_lasterr("CreateFile for named pipe failed");

/* start console spool thread on the pipe's read end */
hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
index e5178cf..9988746 100755
--- a/git-gui/git-gui.sh
+++ b/git-gui/git-gui.sh
@@ -1273,7 +1273,7 @@ apply_config

# v1.7.0 introduced --show-toplevel to return the canonical work-tree
if {[package vsatisfies $_git_version 1.7.0]} {
- set _gitworktree [git rev-parse --show-toplevel]
+ set _gitworktree [encoding convertfrom utf-8 [git rev-parse
--show-toplevel]]
} else {
# try to set work tree from environment, core.worktree or use
# cdup to obtain a relative path to the top of the worktree. If

Vitaly

unread,
Jan 17, 2012, 10:11:04 AM1/17/12
to msy...@googlegroups.com

>  - fix git-gui startup if absolute worktree path contains non-ascii chars

It still didn't work, all other things works well.

Karsten Blees

unread,
Jan 17, 2012, 3:00:55 PM1/17/12
to msysGit, vital...@gmail.com, bl...@dcon.de, Johannes....@gmx.de, kusm...@gmail.com, ki...@mns.spb.ru, mic...@wheelycreek.net, robin.r...@gmail.com, at...@chejz.com, pa...@paulbetts.org
[re-adding CC list]

On Jan 17, 4:11 pm, Vitaly <vitalys...@gmail.com> wrote:
> >  - fix git-gui startup if absolute worktree path contains non-ascii chars
>
> It still didn't work, all other things works well.

That fixed it for me when the repo was in a non-ASCII directory.

Could you add some debug output to %GIT_INSTALL_DIR%/libexec/git-core/
git-gui.tcl to see what actually happens on your system? The repo
initialization code starts somewhere around line 1235. I used
'tk_messageBox -message "..."' and moved it around until I found the
culprit.

Hth,
Karsten

Vitaly

unread,
Jan 18, 2012, 12:45:15 AM1/18/12
to msy...@googlegroups.com, vital...@gmail.com, bl...@dcon.de, Johannes....@gmx.de, kusm...@gmail.com, ki...@mns.spb.ru, mic...@wheelycreek.net, robin.r...@gmail.com, at...@chejz.com, pa...@paulbetts.org
Problem in this line:


    set _gitworktree [encoding convertfrom utf-8 [git rev-parse --show-toplevel]]

I replace it with:

   encoding system utf-8

   set _gitworktree [git rev-parse --show-toplevel]


and got right encoding.

Also the next problem in line 2935: (exec cygpath --mixed ...) - cygpath.exe bundled with git doesn't have such option.
I have replace it with "cygpath --windows" and git gui show me their window.

Konstantin Khomoutov

unread,
Jan 18, 2012, 2:53:14 AM1/18/12
to msy...@googlegroups.com, vital...@gmail.com, bl...@dcon.de, Johannes....@gmx.de, kusm...@gmail.com, ki...@mns.spb.ru, mic...@wheelycreek.net, robin.r...@gmail.com, at...@chejz.com, pa...@paulbetts.org
On Tue, Jan 17, 2012 at 09:45:15PM -0800, Vitaly wrote:

> Problem in this line:
>
> set _gitworktree [encoding convertfrom utf-8 [git rev-parse
> --show-toplevel]]
>
> I replace it with:
>
> encoding system utf-8
> set _gitworktree [git rev-parse --show-toplevel]
>
>
> and got right encoding.

Please don't do that.
Messing with "system encoding" is generally frowned upon in the Tcl
community. The right solution should be detecting the system encoding
and converting paths between it and utf-8 (which, I assume, is what git
plumbing tools expect to see).

I'd like to see at this issue myself (as I'm both Tcl-literate and
experiencing the problem being discussed) but as usually there are free
time constraints.

karste...@dcon.de

unread,
Jan 18, 2012, 5:55:06 AM1/18/12
to Konstantin Khomoutov, at...@chejz.com, bl...@dcon.de, Johannes....@gmx.de, ki...@mns.spb.ru, kusm...@gmail.com, mic...@wheelycreek.net, msy...@googlegroups.com, pa...@paulbetts.org, robin.r...@gmail.com, vital...@gmail.com
I think I have an idea what's happening. Can you check if this works for you?

set _gitworktree [encoding convertfrom utf-8 [encoding convertto [git rev-parse --show-toplevel]]]

The other 'git output parsers' use proc git_read and explicitly fconfigure the resulting file descriptor to -encoding binary, so 'encoding convertfrom utf-8' is sufficient. proc git doesn't do that, so git's output (returned as string) would already be converted from system encoding, which needs to be reversed to binary before converting from utf-8. It probably worked for me because my system encoding is cp1252, which is more or less identical to unicode in range 00-ff.

It would be cleaner to teach proc git not to convert git's output (or to always convertfrom utf-8). But that may entail a lot more [encoding convertfrom utf-8 ...] changes.

Bye,
Karsten

Vitaly

unread,
Jan 18, 2012, 8:37:29 AM1/18/12
to msy...@googlegroups.com, Konstantin Khomoutov, at...@chejz.com, bl...@dcon.de, Johannes....@gmx.de, ki...@mns.spb.ru, kusm...@gmail.com, mic...@wheelycreek.net, pa...@paulbetts.org, robin.r...@gmail.com, vital...@gmail.com
> set _gitworktree [encoding convertfrom utf-8 [encoding convertto [git rev-parse --show-toplevel]]]

Yes, it works.

karste...@dcon.de

unread,
Feb 5, 2012, 2:03:28 PM2/5/12
to msy...@googlegroups.com, at...@chejz.com, bl...@dcon.de, Johannes....@gmx.de, ki...@mns.spb.ru, kusm...@gmail.com, mic...@wheelycreek.net, msy...@googlegroups.com, pa...@paulbetts.org, robin.r...@gmail.com, vital...@gmail.com
Vitaly <vital...@gmail.com> wrote on 18.01.2012 06:45:15:
> Also the next problem in line 2935: (exec cygpath --mixed ...) -
> cygpath.exe bundled with git doesn't have such option.
> I have replace it with "cygpath --windows" and git gui show me their
window.

I think this may be due to an outdated cygwin installation on your path.
Git for Windows does not install a cygpath binary. Also, git-gui may fail
in unexpected ways if it 'thinks' it uses cygwin git when actually using
msysgit git...

Reply all
Reply to author
Forward
0 new messages