[PATCH v10 00/28] Issue 80: Unicode support on Windows

290 views
Skip to first unread message

Karsten Blees

unread,
Oct 6, 2011, 1:40:52 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com

Hi there,

I just pushed v10 of the Unicode patches for msysgit to http://repo.or.cz/w/git/mingw/4msysgit/kblees.git.

A git installer that bundles the Unicode msys.dll, less-444 and git v1.7.6 with unicode-v10 patches is available here: https://docs.google.com/uc?id=0BxXUoUg2r8ftMjZlZDBjYjktNTE2Yi00NjExLTk0MzctMTMwZjQzNDkyNmZl&export=download

So far I haven't encountered any problems with the Unicode-aware msys.dll (http://groups.google.com/group/msysgit/browse_thread/thread/981da260b4fcb1df), and there was no feedback from the mailing list either (this could be good or bad :-).

Ciao,
Karsten


Branches of interest:

kb/unicode-v9-full-2cbf7794
(http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v9-full-2cbf7794)
This is v9 rebased to current devel, with all conflicts resolved in favor of v9 (for reference).

kb/unicode-v10-minimal
(http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v10-minimal)
Unicode support with minimal environment changes.
Changes since kb/unicode-v9-minimal:
- removed patches that have already been merged
- utftowcsn: fixed some advanced checks by using unsigned char * for the UTF-8 string
- reenabled t5505
- changed tests to work with Unicode-aware msys.dll / bash

kb/unicode-v10
(http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v10)
Additional environment features. The first two of these patches I consider important fixes (memory leaks and case-sensitivity), the rest is nice to have.
Changes since kb/unicode-v9-full:
- reordered the environment patches to cleanup first, then add features
- removed unnecessary char **env parameter in some spawn functions
- bsearchenv: added case-sensitive matching introduced in "Windows: teach getenv to do a case-sensitive search" (df599e96)


Karsten Blees (28) ('+': new/rewritten in v10, '!': modified):
MSVC: disable const qualifier compiler warnings
MSVC: don't use uninitialized variables
MSVC: include <io.h> for mktemp
MSVC: fix winansi.c compile errors
MSVC: link dynamically to the CRT
git-gui: fix encoding in git-gui file browser
gitk: fix file name encoding in diff hunk headers
+ Revert "Disable test on MinGW that challenges its bash quoting"
! Win32: add Unicode conversion functions
Win32: Unicode file name support (except dirent)
Win32: Unicode file name support (dirent)
Unicode file name support (gitk and git-gui)
Win32: Unicode arguments (outgoing)
Win32: Unicode arguments (incoming)
Win32: Thread-safe windows console output
Win32: Unicode environment (outgoing)
Win32: Unicode environment (incoming)
+ (kb/unicode-v10-minimal) MinGW: disable legacy encoding tests
+ Win32: fix environment memory leaks
+ Win32: unify environment case-sensitivity
+ Win32: simplify internal mingw_spawn* APIs
Win32: move environment functions
! Win32: unify environment function names
! Win32: move environment block creation to a helper method
! Win32: don't copy the environment twice when spawning child processes
Win32: reduce environment array reallocations
! Win32: keep the environment sorted
(kb/unicode-v10) Win32: patch Windows environment on startup

Makefile | 8 +-
builtin/cat-file.c | 2 +-
builtin/fast-export.c | 2 +-
builtin/rev-list.c | 2 +-
compat/mingw.c | 706 ++++++++++++++++++++++++++-------------
compat/mingw.h | 107 ++++++-
compat/vcbuild/include/unistd.h | 5 +
compat/win32/dirent.c | 31 +--
compat/win32/dirent.h | 2 +-
compat/winansi.c | 366 +++++++++++++--------
fast-import.c | 8 +-
git-gui/git-gui.sh | 6 +-
git-gui/lib/browser.tcl | 2 +-
git-gui/lib/index.tcl | 6 +-
gitk-git/gitk | 15 +-
match-trees.c | 12 +-
merge-recursive.c | 2 +-
run-command.c | 12 +-
submodule.c | 2 +-
t/t3901-i18n-patch.sh | 19 +-
t/t4201-shortlog.sh | 6 +-
t/t5505-remote.sh | 5 +-
t/t8005-blame-i18n.sh | 8 +-
transport.c | 2 +-
wt-status.c | 2 +-
25 files changed, 872 insertions(+), 466 deletions(-)
---

And the diff between v9-full and v10...
---
$ git diff -p --stat kb/unicode-v9-full-2cbf7794..kb/unicode-v10
compat/mingw.c | 83 +++++++++++++++++++++++++++++--------------------
t/t3901-i18n-patch.sh | 19 ++++++-----
t/t4201-shortlog.sh | 6 ++--
t/t5505-remote.sh | 5 +--
t/t8005-blame-i18n.sh | 8 ++--
5 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 51cde10..77ea5d9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -181,7 +181,7 @@ static int ask_yes_no_if_possible(const char *format, ...)
vsnprintf(question, sizeof(question), format, args);
va_end(args);

- if ((retry_hook[0] = mingw_getenv("GIT_ASK_YESNO"))) {
+ if ((retry_hook[0] = getenv("GIT_ASK_YESNO"))) {
retry_hook[1] = question;
return !run_command_v_opt(retry_hook, 0);
}
@@ -761,18 +761,41 @@ static int compareenv(const void *v1, const void *v2)

static int bsearchenv(char **env, const char *name, size_t size)
{
- unsigned low = 0, high = size - 1;
+ unsigned low = 0, high = size - 1, mid, i, j, nmln;
+ int cmp;
while (low <= high) {
- unsigned mid = (low + high) >> 1;
- int cmp = compareenv(&env[mid], &name);
+ mid = (low + high) >> 1;
+ cmp = compareenv(&env[mid], &name);
if (cmp < 0)
low = mid + 1;
else if (cmp > 0)
high = mid - 1;
else
- return mid; /* found */
+ goto found;
}
return ~low; /* not found, return 1's complement of insert position */
+
+found:
+ /* look for duplicate entries before and after the matching entry (mid) */
+ i = j = mid;
+ while (i > low && !compareenv(&env[i - 1], &name))
+ i--;
+ while (j < high && !compareenv(&env[j + 1], &name))
+ j++;
+ if (i == j)
+ return mid; /* no duplicates */
+
+ /*
+ * try case-sensitive match first, this is necessary for sh-i18n--envsubst
+ * to work with environment variables that differ only in case (e.g. $PATH
+ * and $path)
+ */
+ nmln = strlen(name);
+ for (; i <= j; i++)
+ if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
+ /* matches */
+ return i;
+ return mid;
}

/*
@@ -925,7 +948,7 @@ static const char *parse_interpreter(const char *cmd)
*/
static char **get_path_split(void)
{
- char *p, **path, *envpath = mingw_getenv("PATH");
+ char *p, **path, *envpath = getenv("PATH");
int i, n = 0;

if (!envpath || !*envpath)
@@ -1012,13 +1035,13 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
* Create environment block suitable for CreateProcess. Merges current
* process environment and the supplied environment changes.
*/
-static wchar_t *make_environment_block(char **env)
+static wchar_t *make_environment_block(char **deltaenv)
{
wchar_t *wenvblk = NULL;
char **tmpenv;
int i = 0, size = environ_size, envblksz = 0, envblkpos = 0;

- while (env && env[i])
+ while (deltaenv && deltaenv[i])
i++;

/* copy the environment, leaving space for changes */
@@ -1026,8 +1049,8 @@ static wchar_t *make_environment_block(char **env)
memcpy(tmpenv, environ, size * sizeof(char*));

/* merge supplied environment changes into the temporary environment */
- for (i = 0; env && env[i]; i++)
- size = do_putenv(tmpenv, env[i], size, 0);
+ for (i = 0; deltaenv && deltaenv[i]; i++)
+ size = do_putenv(tmpenv, deltaenv[i], size, 0);

/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {
@@ -1049,7 +1072,7 @@ struct pinfo_t {
struct pinfo_t *pinfo = NULL;
CRITICAL_SECTION pinfo_cs;

-static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
+static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaenv,
const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{
@@ -1117,10 +1140,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
utftowcs(wargs, args.buf, 2 * args.len + 1);
strbuf_release(&args);

- if (env == environ)
- env = NULL;
-
- wenvblk = make_environment_block(env);
+ wenvblk = make_environment_block(deltaenv);

memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
@@ -1156,13 +1176,12 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
return (pid_t)pi.dwProcessId;
}

-static pid_t mingw_spawnve(const char *cmd, const char **argv, char **env,
- int prepend_cmd)
+static pid_t mingw_spawnv(const char *cmd, const char **argv, int prepend_cmd)
{
- return mingw_spawnve_fd(cmd, argv, env, NULL, prepend_cmd, 0, 1, 2);
+ return mingw_spawnve_fd(cmd, argv, NULL, NULL, prepend_cmd, 0, 1, 2);
}

-pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
+pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv,
const char *dir,
int fhin, int fhout, int fherr)
{
@@ -1186,14 +1205,14 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
pid = -1;
}
else {
- pid = mingw_spawnve_fd(iprog, argv, env, dir, 1,
+ pid = mingw_spawnve_fd(iprog, argv, deltaenv, dir, 1,
fhin, fhout, fherr);
free(iprog);
}
argv[0] = argv0;
}
else
- pid = mingw_spawnve_fd(prog, argv, env, dir, 0,
+ pid = mingw_spawnve_fd(prog, argv, deltaenv, dir, 0,
fhin, fhout, fherr);
free(prog);
}
@@ -1201,7 +1220,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
return pid;
}

-static int try_shell_exec(const char *cmd, char *const *argv, char **env)
+static int try_shell_exec(const char *cmd, char *const *argv)
{
const char *interpr = parse_interpreter(cmd);
char **path;
@@ -1219,7 +1238,7 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)
argv2 = xmalloc(sizeof(*argv) * (argc+1));
argv2[0] = (char *)cmd; /* full path to the script file */
memcpy(&argv2[1], &argv[1], sizeof(*argv) * argc);
- pid = mingw_spawnve(prog, argv2, env, 1);
+ pid = mingw_spawnv(prog, argv2, 1);
if (pid >= 0) {
int status;
if (waitpid(pid, &status, 0) < 0)
@@ -1234,13 +1253,13 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)
return pid;
}

-static void mingw_execve(const char *cmd, char *const *argv, char *const *env)
+void mingw_execv(const char *cmd, char *const *argv)
{
/* check if git_command is a shell script */
- if (!try_shell_exec(cmd, argv, (char **)env)) {
+ if (!try_shell_exec(cmd, argv)) {
int pid, status;

- pid = mingw_spawnve(cmd, (const char **)argv, (char **)env, 0);
+ pid = mingw_spawnv(cmd, (const char **)argv, 0);
if (pid < 0)
return;
if (waitpid(pid, &status, 0) < 0)
@@ -1255,7 +1274,7 @@ void mingw_execvp(const char *cmd, char *const *argv)
char *prog = path_lookup(cmd, path, 0);

if (prog) {
- mingw_execve(prog, argv, environ);
+ mingw_execv(prog, argv);
free(prog);
} else
errno = ENOENT;
@@ -1263,11 +1282,6 @@ void mingw_execvp(const char *cmd, char *const *argv)
free_path_split(path);
}

-void mingw_execv(const char *cmd, char *const *argv)
-{
- mingw_execve(cmd, argv, environ);
-}
-
int mingw_kill(pid_t pid, int sig)
{
if (pid > 0 && sig == SIGTERM) {
@@ -1952,9 +1966,10 @@ int mingw_offset_1st_component(const char *path)
return offset + is_dir_sep(path[offset]);
}

-int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen)
+int utftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen)
{
int upos = 0, wpos = 0;
+ const unsigned char *utf = (const unsigned char*) utfs;
if (!utf || !wcs || wcslen < 1) {
errno = EINVAL;
return -1;
@@ -1965,7 +1980,7 @@ int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen)
utflen = INT_MAX;

while (upos < utflen) {
- int c = ((int) utf[upos++]) & 0xff;
+ int c = utf[upos++] & 0xff;
if (utflen == INT_MAX && c == 0)
break;

diff --git a/t/t3901-i18n-patch.sh b/t/t3901-i18n-patch.sh
index 31a5770..55c8a2f 100755
--- a/t/t3901-i18n-patch.sh
+++ b/t/t3901-i18n-patch.sh
@@ -54,10 +54,13 @@ test_expect_success setup '
git add yours &&
git commit -s -m "Second on side" &&

- # the second one on the side branch is ISO-8859-1
- git config i18n.commitencoding ISO8859-1 &&
- # use author and committer name in ISO-8859-1 to match it.
- . "$TEST_DIRECTORY"/t3901-8859-1.txt &&
+ if test_have_prereq NOT_MINGW
+ then
+ # the second one on the side branch is ISO-8859-1
+ git config i18n.commitencoding ISO8859-1 &&
+ # use author and committer name in ISO-8859-1 to match it.
+ . "$TEST_DIRECTORY"/t3901-8859-1.txt
+ fi &&
test_tick &&
echo Yet another >theirs &&
git add theirs &&
@@ -119,7 +122,7 @@ test_expect_success 'rebase (U/L)' '
check_encoding 2
'

-test_expect_success 'rebase (L/L)' '
+test_expect_success NOT_MINGW 'rebase (L/L)' '
# In this test we want ISO-8859-1 encoded commits as the result
git config i18n.commitencoding ISO8859-1 &&
git config i18n.logoutputencoding ISO8859-1 &&
@@ -131,7 +134,7 @@ test_expect_success 'rebase (L/L)' '
check_encoding 2 8859
'

-test_expect_success 'rebase (L/U)' '
+test_expect_success NOT_MINGW 'rebase (L/U)' '
# This is pathological -- use UTF-8 as intermediate form
# to get ISO-8859-1 results.
git config i18n.commitencoding ISO8859-1 &&
@@ -159,7 +162,7 @@ test_expect_success 'cherry-pick(U/U)' '
check_encoding 3
'

-test_expect_success 'cherry-pick(L/L)' '
+test_expect_success NOT_MINGW 'cherry-pick(L/L)' '
# Both the commitencoding and logoutputencoding is set to ISO-8859-1

git config i18n.commitencoding ISO8859-1 &&
@@ -189,7 +192,7 @@ test_expect_success 'cherry-pick(U/L)' '
check_encoding 3
'

-test_expect_success 'cherry-pick(L/U)' '
+test_expect_success NOT_MINGW 'cherry-pick(L/U)' '
# Again, the commitencoding is set to ISO-8859-1 but
# logoutputencoding is set to UTF-8.

diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..4896381 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -93,7 +93,7 @@ test_expect_success 'output from user-defined format is re-wrapped' '
test_cmp expect log.predictable
'

-test_expect_success 'shortlog wrapping' '
+test_expect_success NOT_MINGW 'shortlog wrapping' '
cat >expect <<\EOF &&
A U Thor (5):
Test
@@ -114,7 +114,7 @@ EOF
test_cmp expect out
'

-test_expect_success 'shortlog from non-git directory' '
+test_expect_success NOT_MINGW 'shortlog from non-git directory' '
git log HEAD >log &&
GIT_DIR=non-existing git shortlog -w <log >out &&
test_cmp expect out
@@ -135,7 +135,7 @@ $DSCHO (2):

EOF

-test_expect_success 'shortlog encoding' '
+test_expect_success NOT_MINGW 'shortlog encoding' '
git reset --hard "$commit" &&
git config --unset i18n.commitencoding &&
echo 2 > a1 &&
diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index fae1200..0d0222e 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -915,10 +915,7 @@ test_expect_success 'remote set-url --add bbb' '
'

test_expect_success 'remote set-url --delete .*' '
- if test_have_prereq NOT_MINGW
- then
- test_must_fail git remote set-url --delete someremote .\*
- fi &&
+ test_must_fail git remote set-url --delete someremote .\* &&
echo "YYY" >expect &&
echo baz >>expect &&
echo bbb >>expect &&
diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index cb39055..a6e73d0 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -33,7 +33,7 @@ author $SJIS_NAME
summary $SJIS_MSG
EOF

-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects i18n.commitencoding' '
git blame --incremental file | \
egrep "^(author|summary) " > actual &&
@@ -49,7 +49,7 @@ author $EUC_JAPAN_NAME
summary $EUC_JAPAN_MSG
EOF

-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects i18n.logoutputencoding' '
git config i18n.logoutputencoding eucJP &&
git blame --incremental file | \
@@ -66,7 +66,7 @@ author $UTF8_NAME
summary $UTF8_MSG
EOF

-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects --encoding=UTF-8' '
git blame --incremental --encoding=UTF-8 file | \
egrep "^(author|summary) " > actual &&
@@ -82,7 +82,7 @@ author $UTF8_NAME
summary $UTF8_MSG
EOF

-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects --encoding=none' '
git blame --incremental --encoding=none file | \
egrep "^(author|summary) " > actual &&
---

Karsten Blees

unread,
Oct 6, 2011, 1:40:54 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Git code sometimes uses uninitialized variables to initialize themselves
(i.e. 'type varname = varname;'). According to some commit messages, this
is supposedly to suppress the 'using uninitialized variable' warning
(which is quite paradox). GCC currently doesn't seem to catch this, but
MSVC does. Initialize with some arbitrary value instead to silence the
compilers.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---


builtin/cat-file.c | 2 +-
builtin/fast-export.c | 2 +-
builtin/rev-list.c | 2 +-

fast-import.c | 8 ++++----
match-trees.c | 12 ++++++------
merge-recursive.c | 2 +-
run-command.c | 2 +-
submodule.c | 2 +-


transport.c | 2 +-
wt-status.c | 2 +-

10 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 07bd984..8a6bb24 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -168,7 +168,7 @@ static int batch_one_object(const char *obj_name, int print_contents)
unsigned char sha1[20];
enum object_type type = 0;
unsigned long size;
- void *contents = contents;
+ void *contents = NULL;

if (!obj_name)
return 1;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9836e6b..c7ce01c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -481,7 +481,7 @@ static void get_tags_and_duplicates(struct object_array *pending,
for (i = 0; i < pending->nr; i++) {
struct object_array_entry *e = pending->objects + i;
unsigned char sha1[20];
- struct commit *commit = commit;
+ struct commit *commit = NULL;
char *full_name;

if (dwim_ref(e->name, strlen(e->name), sha1, &full_name) != 1)
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 56727e8..9a6073f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -394,7 +394,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
mark_edges_uninteresting(revs.commits, &revs, show_edge);

if (bisect_list) {
- int reaches = reaches, all = all;
+ int reaches = 0, all = 0;

revs.commits = find_bisection(revs.commits, &reaches, &all,
bisect_find_all);
diff --git a/fast-import.c b/fast-import.c
index 7cc2262..4406f1f 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2183,7 +2183,7 @@ static void file_change_m(struct branch *b)
const char *p = command_buf.buf + 2;
static struct strbuf uq = STRBUF_INIT;
const char *endp;
- struct object_entry *oe = oe;
+ struct object_entry *oe = NULL;
unsigned char sha1[20];
uint16_t mode, inline_data = 0;

@@ -2352,7 +2352,7 @@ static void note_change_n(struct branch *b, unsigned char old_fanout)
{
const char *p = command_buf.buf + 2;
static struct strbuf uq = STRBUF_INIT;
- struct object_entry *oe = oe;
+ struct object_entry *oe = NULL;
struct branch *s;
unsigned char sha1[20], commit_sha1[20];
char path[60];
@@ -2512,7 +2512,7 @@ static int parse_from(struct branch *b)

static struct hash_list *parse_merge(unsigned int *count)
{
- struct hash_list *list = NULL, *n, *e = e;
+ struct hash_list *list = NULL, *n, *e = NULL;
const char *from;
struct branch *s;

@@ -2806,7 +2806,7 @@ static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
static void parse_cat_blob(void)
{
const char *p;
- struct object_entry *oe = oe;
+ struct object_entry *oe = NULL;
unsigned char sha1[20];

/* cat-blob SP <object> LF */
diff --git a/match-trees.c b/match-trees.c
index 26f7ed1..de050d0 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -72,12 +72,12 @@ static int score_trees(const unsigned char *hash1, const unsigned char *hash2)
die("%s is not a tree", sha1_to_hex(hash2));
init_tree_desc(&two, two_buf, size);
while (one.size | two.size) {
- const unsigned char *elem1 = elem1;
- const unsigned char *elem2 = elem2;
- const char *path1 = path1;
- const char *path2 = path2;
- unsigned mode1 = mode1;
- unsigned mode2 = mode2;
+ const unsigned char *elem1 = NULL;
+ const unsigned char *elem2 = NULL;
+ const char *path1 = NULL;
+ const char *path2 = NULL;
+ unsigned mode1 = 0;
+ unsigned mode2 = 0;
int cmp;

if (one.size)
diff --git a/merge-recursive.c b/merge-recursive.c
index 0cc1e6f..542f61d 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1579,7 +1579,7 @@ int merge_recursive(struct merge_options *o,
{
struct commit_list *iter;
struct commit *merged_common_ancestors;
- struct tree *mrtree = mrtree;
+ struct tree *mrtree = NULL;
int clean;

if (show(o, 4)) {
diff --git a/run-command.c b/run-command.c
index a2796c4..a8be1b1 100644
--- a/run-command.c
+++ b/run-command.c
@@ -137,7 +137,7 @@ int start_command(struct child_process *cmd)
{
int need_in, need_out, need_err;
int fdin[2], fdout[2], fderr[2];
- int failed_errno = failed_errno;
+ int failed_errno = 0;

/*
* In case of errors we must keep the promise to close FDs
diff --git a/submodule.c b/submodule.c
index 1ba9646..71a0508 100644
--- a/submodule.c
+++ b/submodule.c
@@ -253,7 +253,7 @@ void show_submodule_summary(FILE *f, const char *path,
const char *del, const char *add, const char *reset)
{
struct rev_info rev;
- struct commit *left = left, *right = right;
+ struct commit *left = NULL, *right = NULL;
const char *message = NULL;
struct strbuf sb = STRBUF_INIT;
int fast_forward = 0, fast_backward = 0;
diff --git a/transport.c b/transport.c
index 98c5778..be1538c 100644
--- a/transport.c
+++ b/transport.c
@@ -104,7 +104,7 @@ static void insert_packed_refs(const char *packed_refs, struct ref **list)
return;

for (;;) {
- int cmp = cmp, len;
+ int cmp = 0, len;

if (!fgets(buffer, sizeof(buffer), f)) {
fclose(f);
diff --git a/wt-status.c b/wt-status.c
index 0237772..eba4d31 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -227,7 +227,7 @@ static void wt_status_print_change_data(struct wt_status *s,
{
struct wt_status_change_data *d = it->util;
const char *c = color(change_type, s);
- int status = status;
+ int status = 0;
char *one_name;
char *two_name;
const char *one, *two;
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:40:55 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Include io.h from unistd.h, as in MinGW, so that mktemp is available.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/vcbuild/include/unistd.h | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/compat/vcbuild/include/unistd.h b/compat/vcbuild/include/unistd.h
index bde4d05..3ca5c57 100644
--- a/compat/vcbuild/include/unistd.h
+++ b/compat/vcbuild/include/unistd.h
@@ -3,6 +3,8 @@

/* Win32 define for porting git*/

+#include <io.h>
+
/* disable const** -> void* warning (~130 warnings for realloc etc.) */
#pragma warning(disable : 4090)

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:40:53 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
MSVC warns about implicit casts from const** to void* (which should be OK,
as void* indicates the content is treated as opaque). E.g.

const char *x;
realloc(x, ...); // warning C4090 due to const** -> void*

As there's not that much MSVC-only code, the 'real' const to non-const
casts that we want to know about will be caught by GCC builds anyway, so
disable this warning.

MSVC link.exe doesn't recognize the /wd option in CFLAGS in the Makefile,
so use #pragma warning in central, MSVC-specific unistd.h.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/vcbuild/include/unistd.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/compat/vcbuild/include/unistd.h b/compat/vcbuild/include/unistd.h
index b14fcf9..bde4d05 100644
--- a/compat/vcbuild/include/unistd.h
+++ b/compat/vcbuild/include/unistd.h
@@ -3,6 +3,9 @@



/* Win32 define for porting git*/

+/* disable const** -> void* warning (~130 warnings for realloc etc.) */
+#pragma warning(disable : 4090)
+
#ifndef _MODE_T_
#define _MODE_T_
typedef unsigned short _mode_t;
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:40:56 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Some constants (such as LF_FACESIZE) are undefined with -DNOGDI (set in the
Makefile), and CONSOLE_FONT_INFOEX is available in MSVC, but not in MinGW.

Cast FARPROC to PGETCURRENTCONSOLEFONTEX to suppress MSVC compiler warning.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/winansi.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/compat/winansi.c b/compat/winansi.c
index a5ca2d9..bf514f9 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -2,6 +2,7 @@
* Copyright 2008 Peter Harris <g...@peter.is-a-geek.org>
*/

+#undef NOGDI
#include "../git-compat-util.h"
#include <malloc.h>
#include <wingdi.h>
@@ -30,6 +31,7 @@ static int negative;
static FILE *last_stream = NULL;
static int non_ascii_used = 0;

+#ifdef __MINGW32__
typedef struct _CONSOLE_FONT_INFOEX {
ULONG cbSize;
DWORD nFont;
@@ -38,6 +40,7 @@ typedef struct _CONSOLE_FONT_INFOEX {
UINT FontWeight;
WCHAR FaceName[LF_FACESIZE];
} CONSOLE_FONT_INFOEX, *PCONSOLE_FONT_INFOEX;
+#endif

typedef BOOL (WINAPI *PGETCURRENTCONSOLEFONTEX)(HANDLE, BOOL,
PCONSOLE_FONT_INFOEX);
@@ -52,8 +55,8 @@ static void warn_if_raster_font(void)
return;

/* GetCurrentConsoleFontEx is available since Vista */
- pGetCurrentConsoleFontEx = GetProcAddress(GetModuleHandle("kernel32.dll"),
- "GetCurrentConsoleFontEx");
+ pGetCurrentConsoleFontEx = (PGETCURRENTCONSOLEFONTEX) GetProcAddress(
+ GetModuleHandle("kernel32.dll"), "GetCurrentConsoleFontEx");
if (pGetCurrentConsoleFontEx) {
CONSOLE_FONT_INFOEX cfi;
cfi.cbSize = sizeof(cfi);
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:01 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Add Unicode conversion functions to convert between Windows native UTF-16LE
encoding to UTF-8 and back.

To support repositories with legacy-encoded file names, the UTF-8 to UTF-16
conversion function tries to create valid, unique file names even for
invalid UTF-8 byte sequences, so that these repositories can be checked out
without error.

The current implementation leaves invalid UTF-8 bytes in range 0xa0 - 0xff
as is (producing printable Unicode chars \u00a0 - \u00ff, equivalent to
ISO-8859-1), and converts 0x80 - 0x9f to hex-code (\u0080 - \u009f are
control chars).

The Windows MultiByteToWideChar API was not used as it either drops invalid
UTF-8 sequences (on Win2k/XP; producing non-unique or even empty file
names) or converts them to the replacement char \ufffd (Vista/7; causing
ERROR_INVALID_NAME in subsequent calls to file system APIs).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
compat/mingw.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 162 insertions(+), 0 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 57d41dd..ba7ac27 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1869,6 +1869,91 @@ int mingw_offset_1st_component(const char *path)


return offset + is_dir_sep(path[offset]);
}

+int utftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen)

+{
+ int upos = 0, wpos = 0;


+ const unsigned char *utf = (const unsigned char*) utfs;

+ if (!utf || !wcs || wcslen < 1) {
+ errno = EINVAL;
+ return -1;
+ }
+ /* reserve space for \0 */
+ wcslen--;
+ if (utflen < 0)
+ utflen = INT_MAX;
+
+ while (upos < utflen) {


+ int c = utf[upos++] & 0xff;

+ if (utflen == INT_MAX && c == 0)
+ break;
+
+ if (wpos >= wcslen) {
+ wcs[wpos] = 0;
+ errno = ENAMETOOLONG;
+ return -1;
+ }
+
+ if (c < 0x80) {
+ /* ASCII */
+ wcs[wpos++] = c;
+ } else if (c >= 0xc2 && c < 0xe0 && upos < utflen &&
+ (utf[upos] & 0xc0) == 0x80) {
+ /* 2-byte utf-8 */
+ c = ((c & 0x1f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ wcs[wpos++] = c;
+ } else if (c >= 0xe0 && c < 0xf0 && upos + 1 < utflen &&
+ !(c == 0xe0 && utf[upos] < 0xa0) && /* over-long encoding */
+ (utf[upos] & 0xc0) == 0x80 &&
+ (utf[upos + 1] & 0xc0) == 0x80) {
+ /* 3-byte utf-8 */
+ c = ((c & 0x0f) << 12);
+ c |= ((utf[upos++] & 0x3f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ wcs[wpos++] = c;
+ } else if (c >= 0xf0 && c < 0xf5 && upos + 2 < utflen &&
+ wpos + 1 < wcslen &&
+ !(c == 0xf0 && utf[upos] < 0x90) && /* over-long encoding */
+ !(c == 0xf4 && utf[upos] >= 0x90) && /* > \u10ffff */
+ (utf[upos] & 0xc0) == 0x80 &&
+ (utf[upos + 1] & 0xc0) == 0x80 &&
+ (utf[upos + 2] & 0xc0) == 0x80) {
+ /* 4-byte utf-8: convert to \ud8xx \udcxx surrogate pair */
+ c = ((c & 0x07) << 18);
+ c |= ((utf[upos++] & 0x3f) << 12);
+ c |= ((utf[upos++] & 0x3f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ c -= 0x10000;
+ wcs[wpos++] = 0xd800 | (c >> 10);
+ wcs[wpos++] = 0xdc00 | (c & 0x3ff);
+ } else if (c >= 0xa0) {
+ /* invalid utf-8 byte, printable unicode char: convert 1:1 */
+ wcs[wpos++] = c;
+ } else {
+ /* invalid utf-8 byte, non-printable unicode: convert to hex */
+ static const char *hex = "0123456789abcdef";
+ wcs[wpos++] = hex[c >> 4];
+ if (wpos < wcslen)
+ wcs[wpos++] = hex[c & 0x0f];
+ }
+ }
+ wcs[wpos] = 0;
+ return wpos;
+}
+
+int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen)
+{
+ if (!wcs || !utf || utflen < 1) {
+ errno = EINVAL;
+ return -1;
+ }
+ utflen = WideCharToMultiByte(CP_UTF8, 0, wcs, -1, utf, utflen, NULL, NULL);
+ if (utflen)
+ return utflen - 1;
+ errno = ENAMETOOLONG;
+ return -1;
+}
+
/*
* Disable MSVCRT command line wildcard expansion (__getmainargs called from
* mingw startup code, see init.c in mingw runtime).
diff --git a/compat/mingw.h b/compat/mingw.h
index 3aa80bb..b4ff6cd 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -326,6 +326,83 @@ void mingw_mark_as_git_dir(const char *dir);
char **make_augmented_environ(const char *const *vars);
void free_environ(char **env);

+/**
+ * Converts UTF-8 encoded string to UTF-16LE.
+ *
+ * To support repositories with legacy-encoded file names, invalid UTF-8 bytes
+ * 0xa0 - 0xff are converted to corresponding printable Unicode chars \u00a0 -
+ * \u00ff, and invalid UTF-8 bytes 0x80 - 0x9f (which would make non-printable
+ * Unicode) are converted to hex-code.
+ *
+ * Lead-bytes not followed by an appropriate number of trail-bytes, over-long
+ * encodings and 4-byte encodings > \u10ffff are detected as invalid UTF-8.
+ *
+ * Maximum space requirement for the target buffer is two wide chars per UTF-8
+ * char (((strlen(utf) * 2) + 1) [* sizeof(wchar_t)]).
+ *
+ * The maximum space is needed only if the entire input string consists of
+ * invalid UTF-8 bytes in range 0x80-0x9f, as per the following table:
+ *
+ * | | UTF-8 | UTF-16 |
+ * Code point | UTF-8 sequence | bytes | words | ratio
+ * --------------+-------------------+-------+--------+-------
+ * 000000-00007f | 0-7f | 1 | 1 | 1
+ * 000080-0007ff | c2-df + 80-bf | 2 | 1 | 0.5
+ * 000800-00ffff | e0-ef + 2 * 80-bf | 3 | 1 | 0.33
+ * 010000-10ffff | f0-f4 + 3 * 80-bf | 4 | 2 (a) | 0.5
+ * invalid | 80-9f | 1 | 2 (b) | 2
+ * invalid | a0-ff | 1 | 1 | 1
+ *
+ * (a) encoded as UTF-16 surrogate pair
+ * (b) encoded as two hex digits
+ *
+ * Note that, while the UTF-8 encoding scheme can be extended to 5-byte, 6-byte
+ * or even indefinite-byte sequences, the largest valid code point \u10ffff
+ * encodes as only 4 UTF-8 bytes.
+ *
+ * @param wcs wide char target buffer
+ * @param utf string to convert
+ * @param wcslen size of target buffer (in wchar_t's)
+ * @param utflen size of string to convert, or -1 if 0-terminated
+ * @return length of converted string (_wcslen(wcs)), or -1 on failure (errno
+ * is set to EINVAL or ENAMETOOLONG)
+ * @see mbstowcs, MultiByteToWideChar
+ */
+int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen);
+static inline int utftowcs(wchar_t *wcs, const char *utf, size_t wcslen)
+{
+ return utftowcsn(wcs, utf, wcslen, -1);
+}
+
+/**
+ * Converts UTF-16LE encoded string to UTF-8.
+ *
+ * Maximum space requirement for the target buffer is three UTF-8 chars per
+ * wide char ((_wcslen(wcs) * 3) + 1).
+ *
+ * The maximum space is needed only if the entire input string consists of
+ * UTF-16 words in range 0x0800-0xd7ff or 0xe000-0xffff (i.e. \u0800-\uffff
+ * modulo surrogate pairs), as per the following table:
+ *
+ * | | UTF-16 | UTF-8 |
+ * Code point | UTF-16 sequence | words | bytes | ratio
+ * --------------+-----------------------+--------+-------+-------
+ * 000000-00007f | 0000-007f | 1 | 1 | 1
+ * 000080-0007ff | 0080-07ff | 1 | 2 | 2
+ * 000800-00ffff | 0800-d7ff / e000-ffff | 1 | 3 | 3
+ * 010000-10ffff | d800-dbff + dc00-dfff | 2 | 4 | 2
+ *
+ * Note that invalid code points > 10ffff cannot be represented in UTF-16.
+ *
+ * @param utf target buffer
+ * @param wcs wide string to convert
+ * @param utflen size of target buffer
+ * @return length of converted string, or -1 on failure (errno is set to EINVAL
+ * or ENAMETOOLONG)
+ * @see wcstombs, WideCharToMultiByte
+ */
+int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen);
+
/*
* A replacement of main() that adds win32 specific initialization.
*/
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:00 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
This reverts commit 8117ed6a2205c17823910749cf0391ad811bdfa9.

With "MinGW: disable CRT command line globbing" (a05e9a86), wildcard
expansion works as expected (handled by git, not the CRT).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

t/t5505-remote.sh | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index fae1200..0d0222e 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -915,10 +915,7 @@ test_expect_success 'remote set-url --add bbb' '
'

test_expect_success 'remote set-url --delete .*' '
- if test_have_prereq NOT_MINGW
- then
- test_must_fail git remote set-url --delete someremote .\*
- fi &&
+ test_must_fail git remote set-url --delete someremote .\* &&
echo "YYY" >expect &&
echo baz >>expect &&
echo bbb >>expect &&

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:40:57 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Dynamic linking is generally preferred over static linking, and MSVCRT.dll
has been integral part of Windows for a long time.

This also fixes linker warnings for _malloc and _free in zlib.lib, which
seems to be compiled for MSVCRT.dll already.

The DLL version also exports some of the CRT initialization functions,
which are hidden in the static libcmt.lib (e.g. __wgetmainargs, required by
subsequent Unicode patches).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

Makefile | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 234d0af..4594d81 100644
--- a/Makefile
+++ b/Makefile
@@ -1133,16 +1133,16 @@ ifeq ($(uname_S),Windows)
compat/win32/pthread.o compat/win32/syslog.o \
compat/win32/sys/poll.o compat/win32/dirent.o
COMPAT_CFLAGS = -D__USE_MINGW_ACCESS -DNOGDI -DHAVE_STRING_H -DHAVE_ALLOCA_H -Icompat -Icompat/regex -Icompat/win32 -DSTRIP_EXTENSION=\".exe\"
- BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE -NODEFAULTLIB:MSVCRT.lib
+ BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE
EXTLIBS = user32.lib advapi32.lib shell32.lib wininet.lib ws2_32.lib
PTHREAD_LIBS =
lib =
ifndef DEBUG
- BASIC_CFLAGS += -GL -Os -MT
+ BASIC_CFLAGS += -GL -Os -MD
BASIC_LDFLAGS += -LTCG
AR += -LTCG
else
- BASIC_CFLAGS += -Zi -MTd
+ BASIC_CFLAGS += -Zi -MDd
endif
X = .exe
endif
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:40:59 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Decode file names from system encoding in all diff hunk header lines, not
just the first (i.e. print nice file names in 'rename from' / 'rename to' /
'Binary files' lines, too).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

gitk-git/gitk | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/gitk-git/gitk b/gitk-git/gitk
index 2a92e20..935ed5c 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -7778,6 +7778,7 @@ proc getblobdiffline {bdf ids} {
set diffinhdr 0
continue
}
+ set line [encoding convertfrom $line]
$ctext insert end "$line\n" filesep

} else {
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:40:58 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Assume git tree objects (i.e. output of git-ls-tree) are encoded in system
encoding, for display in the git-gui file browser.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

git-gui/lib/browser.tcl | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-gui/lib/browser.tcl b/git-gui/lib/browser.tcl
index a8c6223..d68b1f9 100644
--- a/git-gui/lib/browser.tcl
+++ b/git-gui/lib/browser.tcl
@@ -191,7 +191,7 @@ method _ls {tree_id {name {}}} {
$w conf -state disabled

set fd [git_read ls-tree -z $tree_id]
- fconfigure $fd -blocking 0 -translation binary -encoding binary
+ fconfigure $fd -blocking 0 -translation binary
fileevent $fd readable [cb _read $fd]
}

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:03 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Changes opendir/readdir to use Windows Unicode APIs and convert between
UTF-8/UTF-16.

Removes parameter checks that are already covered by utftowcs.

Increases the size of dirent.d_name to accommodate the full
WIN32_FIND_DATA.cFileName converted to UTF-8 (UTF-16 to UTF-8 conversion
may grow by factor three in the worst case).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/win32/dirent.c | 31 +++++++++++--------------------
compat/win32/dirent.h | 2 +-
2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index 82a515c..37f56b7 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -6,10 +6,10 @@ struct DIR {
int dd_stat; /* 0-based index */
};

-static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)
+static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW *fdata)
{
- /* copy file name from WIN32_FIND_DATA to dirent */
- memcpy(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
+ /* convert UTF-16 name to UTF-8 */
+ wcstoutf(ent->d_name, fdata->cFileName, sizeof(ent->d_name));

/* Set file type, based on WIN32_FIND_DATA */
if (fdata->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
@@ -20,25 +20,16 @@ static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)

DIR *opendir(const char *name)
{
- char pattern[MAX_PATH];
- WIN32_FIND_DATAA fdata;
+ wchar_t pattern[MAX_PATH];
+ WIN32_FIND_DATAW fdata;
HANDLE h;
int len;
DIR *dir;

- /* check that name is not NULL */
- if (!name) {
- errno = EINVAL;
+ /* convert name to UTF-16, check length (-2 for '/' '*') */
+ len = utftowcs(pattern, name, MAX_PATH - 2);
+ if (len < 0)
return NULL;
- }
- /* check that the pattern won't be too long for FindFirstFileA */
- len = strlen(name);
- if (len + 2 >= MAX_PATH) {
- errno = ENAMETOOLONG;
- return NULL;
- }
- /* copy name to temp buffer */
- memcpy(pattern, name, len + 1);

/* append optional '/' and wildcard '*' */
if (len && !is_dir_sep(pattern[len - 1]))
@@ -47,7 +38,7 @@ DIR *opendir(const char *name)
pattern[len] = 0;

/* open find handle */
- h = FindFirstFileA(pattern, &fdata);
+ h = FindFirstFileW(pattern, &fdata);
if (h == INVALID_HANDLE_VALUE) {
DWORD err = GetLastError();
errno = (err == ERROR_DIRECTORY) ? ENOTDIR : err_win_to_posix(err);
@@ -72,8 +63,8 @@ struct dirent *readdir(DIR *dir)
/* if first entry, dirent has already been set up by opendir */
if (dir->dd_stat) {
/* get next entry and convert from WIN32_FIND_DATA to dirent */
- WIN32_FIND_DATAA fdata;
- if (FindNextFileA(dir->dd_handle, &fdata)) {
+ WIN32_FIND_DATAW fdata;
+ if (FindNextFileW(dir->dd_handle, &fdata)) {
finddata2dirent(&dir->dd_dir, &fdata);
} else {
DWORD lasterr = GetLastError();
diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
index 8838cd6..058207e 100644
--- a/compat/win32/dirent.h
+++ b/compat/win32/dirent.h
@@ -10,7 +10,7 @@ typedef struct DIR DIR;

struct dirent {
unsigned char d_type; /* file type to prevent lstat after readdir */
- char d_name[MAX_PATH]; /* file name */
+ char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
};

DIR *opendir(const char *dirname);
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:02 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Replaces Windows "ANSI" APIs dealing with file- or path names with their
Unicode equivalent, adding UTF-8/UTF-16LE conversion as necessary.

The dirent API (opendir/readdir/closedir) is updated in a separate commit.

Adds trivial wrappers for access, chmod and chdir.

Adds wrapper for mktemp (needed for both mkstemp and mkdtemp).

The simplest way to convert a repository with legacy-encoded (e.g. Cp1252)
file names to UTF-8 ist to checkout with an old msysgit version and
"git add --all & git commit" with the new version.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 213 +++++++++++++++++++++++++++++++++++++++-----------------
compat/mingw.h | 13 ++++
2 files changed, 161 insertions(+), 65 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index ba7ac27..fe759c0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1,6 +1,7 @@
#include "../git-compat-util.h"
#include "win32.h"
#include <conio.h>
+#include <wchar.h>
#include "../strbuf.h"
#include "../run-command.h"
#include "../cache.h"
@@ -200,14 +201,16 @@ static int ask_yes_no_if_possible(const char *format, ...)
}
}

-#undef unlink
int mingw_unlink(const char *pathname)
{
int ret, tries = 0;
+ wchar_t wpathname[MAX_PATH];
+ if (utftowcs(wpathname, pathname, MAX_PATH) < 0)
+ return -1;

/* read-only files cannot be removed */
- chmod(pathname, 0666);
- while ((ret = unlink(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ _wchmod(wpathname, 0666);
+ while ((ret = _wunlink(wpathname)) == -1 && tries < ARRAY_SIZE(delay)) {
if (!is_file_in_use_error(GetLastError()))
break;
/*
@@ -223,43 +226,40 @@ int mingw_unlink(const char *pathname)
while (ret == -1 && is_file_in_use_error(GetLastError()) &&
ask_yes_no_if_possible("Unlink of file '%s' failed. "
"Should I try again?", pathname))
- ret = unlink(pathname);
+ ret = _wunlink(wpathname);
return ret;
}

-static int is_dir_empty(const char *path)
+static int is_dir_empty(const wchar_t *wpath)
{
- struct strbuf buf = STRBUF_INIT;
- WIN32_FIND_DATAA findbuf;
+ WIN32_FIND_DATAW findbuf;
HANDLE handle;
-
- strbuf_addf(&buf, "%s\\*", path);
- handle = FindFirstFileA(buf.buf, &findbuf);
- if (handle == INVALID_HANDLE_VALUE) {
- strbuf_release(&buf);
+ wchar_t wbuf[MAX_PATH + 2];
+ wcscpy(wbuf, wpath);
+ wcscat(wbuf, L"\\*");
+ handle = FindFirstFileW(wbuf, &findbuf);
+ if (handle == INVALID_HANDLE_VALUE)
return GetLastError() == ERROR_NO_MORE_FILES;
- }

- while (!strcmp(findbuf.cFileName, ".") ||
- !strcmp(findbuf.cFileName, ".."))
- if (!FindNextFile(handle, &findbuf)) {
- strbuf_release(&buf);
+ while (!wcscmp(findbuf.cFileName, L".") ||
+ !wcscmp(findbuf.cFileName, L".."))
+ if (!FindNextFileW(handle, &findbuf))
return GetLastError() == ERROR_NO_MORE_FILES;
- }
FindClose(handle);
- strbuf_release(&buf);
return 0;
}

-#undef rmdir
int mingw_rmdir(const char *pathname)
{
int ret, tries = 0;
+ wchar_t wpathname[MAX_PATH];
+ if (utftowcs(wpathname, pathname, MAX_PATH) < 0)
+ return -1;

- while ((ret = rmdir(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ while ((ret = _wrmdir(wpathname)) == -1 && tries < ARRAY_SIZE(delay)) {
if (!is_file_in_use_error(GetLastError()))
break;
- if (!is_dir_empty(pathname)) {
+ if (!is_dir_empty(wpathname)) {
errno = ENOTEMPTY;
break;
}
@@ -276,14 +276,14 @@ int mingw_rmdir(const char *pathname)
while (ret == -1 && is_file_in_use_error(GetLastError()) &&
ask_yes_no_if_possible("Deletion of directory '%s' failed. "
"Should I try again?", pathname))
- ret = rmdir(pathname);
+ ret = _wrmdir(wpathname);
return ret;
}

-static int make_hidden(const char *path)
+static int make_hidden(const wchar_t *path)
{
- DWORD attribs = GetFileAttributes(path);
- if (SetFileAttributes(path, FILE_ATTRIBUTE_HIDDEN | attribs))
+ DWORD attribs = GetFileAttributesW(path);
+ if (SetFileAttributesW(path, FILE_ATTRIBUTE_HIDDEN | attribs))
return 0;
errno = err_win_to_posix(GetLastError());
return -1;
@@ -291,19 +291,23 @@ static int make_hidden(const char *path)

void mingw_mark_as_git_dir(const char *dir)
{
- if (hide_dotfiles != HIDE_DOTFILES_FALSE && !is_bare_repository() &&
- make_hidden(dir))
- warning("Failed to make '%s' hidden", dir);
+ wchar_t wdir[MAX_PATH];
+ if (hide_dotfiles != HIDE_DOTFILES_FALSE && !is_bare_repository())
+ if (utftowcs(wdir, dir, MAX_PATH) < 0 || make_hidden(wdir))
+ warning("Failed to make '%s' hidden", dir);
git_config_set("core.hideDotFiles",
hide_dotfiles == HIDE_DOTFILES_FALSE ? "false" :
(hide_dotfiles == HIDE_DOTFILES_DOTGITONLY ?
"dotGitOnly" : "true"));
}

-#undef mkdir
int mingw_mkdir(const char *path, int mode)
{
- int ret = mkdir(path);
+ int ret;
+ wchar_t wpath[MAX_PATH];
+ if (utftowcs(wpath, path, MAX_PATH) < 0)
+ return -1;
+ ret = _wmkdir(wpath);
if (!ret && hide_dotfiles == HIDE_DOTFILES_TRUE) {
/*
* In Windows a file or dir starting with a dot is not
@@ -312,17 +316,17 @@ int mingw_mkdir(const char *path, int mode)
*/
const char *start = basename((char*)path);
if (*start == '.')
- return make_hidden(path);
+ return make_hidden(wpath);
}
return ret;
}

-#undef open
int mingw_open (const char *filename, int oflags, ...)
{
va_list args;
unsigned mode;
int fd;
+ wchar_t wfilename[MAX_PATH];

va_start(args, oflags);
mode = va_arg(args, int);
@@ -331,10 +335,12 @@ int mingw_open (const char *filename, int oflags, ...)
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";

- fd = open(filename, oflags, mode);
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ fd = _wopen(wfilename, oflags, mode);

if (fd < 0 && (oflags & O_CREAT) && errno == EACCES) {
- DWORD attrs = GetFileAttributes(filename);
+ DWORD attrs = GetFileAttributesW(wfilename);
if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY))
errno = EISDIR;
}
@@ -346,7 +352,7 @@ int mingw_open (const char *filename, int oflags, ...)
* such a file is created.
*/
const char *start = basename((char*)filename);
- if (*start == '.' && make_hidden(filename))
+ if (*start == '.' && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
}
return fd;
@@ -369,38 +375,69 @@ ssize_t mingw_write(int fd, const void *buf, size_t count)
return write(fd, buf, min(count, 31 * 1024 * 1024));
}

-#undef fopen
FILE *mingw_fopen (const char *filename, const char *otype)
{
int hide = 0;
FILE *file;
+ wchar_t wfilename[MAX_PATH], wotype[10];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- file = fopen(filename, otype);
- if (file && hide && make_hidden(filename))
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0 ||
+ utftowcs(wotype, otype, 10) < 0)
+ return NULL;
+ file = _wfopen(wfilename, wotype);
+ if (file && hide && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
return file;
}

-#undef freopen
FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream)
{
int hide = 0;
FILE *file;
+ wchar_t wfilename[MAX_PATH], wotype[10];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- file = freopen(filename, otype, stream);
- if (file && hide && make_hidden(filename))
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0 ||
+ utftowcs(wotype, otype, 10) < 0)
+ return NULL;
+ file = _wfreopen(wfilename, wotype, stream);
+ if (file && hide && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
return file;
}

+int mingw_access(const char *filename, int mode)
+{
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ /* X_OK is not supported by the MSVCRT version */
+ return _waccess(wfilename, mode & ~X_OK);
+}
+
+int mingw_chdir(const char *dirname)
+{
+ wchar_t wdirname[MAX_PATH];
+ if (utftowcs(wdirname, dirname, MAX_PATH) < 0)
+ return -1;
+ return _wchdir(wdirname);
+}
+
+int mingw_chmod(const char *filename, int mode)
+{
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ return _wchmod(wfilename, mode);
+}
+
/*
* The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
* Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
@@ -426,10 +463,12 @@ static inline time_t filetime_to_time_t(const FILETIME *ft)
*/
static int do_lstat(int follow, const char *file_name, struct stat *buf)
{
- int err;
WIN32_FILE_ATTRIBUTE_DATA fdata;
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
+ return -1;

- if (!(err = get_file_attr(file_name, &fdata))) {
+ if (GetFileAttributesExW(wfilename, GetFileExInfoStandard, &fdata)) {
buf->st_ino = 0;
buf->st_gid = 0;
buf->st_uid = 0;
@@ -442,8 +481,8 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
if (fdata.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
- WIN32_FIND_DATAA findbuf;
- HANDLE handle = FindFirstFileA(file_name, &findbuf);
+ WIN32_FIND_DATAW findbuf;
+ HANDLE handle = FindFirstFileW(wfilename, &findbuf);
if (handle != INVALID_HANDLE_VALUE) {
if ((findbuf.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) &&
(findbuf.dwReserved0 == IO_REPARSE_TAG_SYMLINK)) {
@@ -462,7 +501,23 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
}
return 0;
}
- errno = err;
+ switch (GetLastError()) {
+ case ERROR_ACCESS_DENIED:
+ case ERROR_SHARING_VIOLATION:
+ case ERROR_LOCK_VIOLATION:
+ case ERROR_SHARING_BUFFER_EXCEEDED:
+ errno = EACCES;
+ break;
+ case ERROR_BUFFER_OVERFLOW:
+ errno = ENAMETOOLONG;
+ break;
+ case ERROR_NOT_ENOUGH_MEMORY:
+ errno = ENOMEM;
+ break;
+ default:
+ errno = ENOENT;
+ break;
+ }
return -1;
}

@@ -551,16 +606,20 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
{
FILETIME mft, aft;
int fh, rc;
+ DWORD attrs;
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
+ return -1;

/* must have write permission */
- DWORD attrs = GetFileAttributes(file_name);
+ attrs = GetFileAttributesW(wfilename);
if (attrs != INVALID_FILE_ATTRIBUTES &&
(attrs & FILE_ATTRIBUTE_READONLY)) {
/* ignore errors here; open() will report them */
- SetFileAttributes(file_name, attrs & ~FILE_ATTRIBUTE_READONLY);
+ SetFileAttributesW(wfilename, attrs & ~FILE_ATTRIBUTE_READONLY);
}

- if ((fh = open(file_name, O_RDWR | O_BINARY)) < 0) {
+ if ((fh = _wopen(wfilename, O_RDWR | O_BINARY)) < 0) {
rc = -1;
goto revert_attrs;
}
@@ -583,7 +642,7 @@ revert_attrs:
if (attrs != INVALID_FILE_ATTRIBUTES &&
(attrs & FILE_ATTRIBUTE_READONLY)) {
/* ignore errors again */
- SetFileAttributes(file_name, attrs);
+ SetFileAttributesW(wfilename, attrs);
}
return rc;
}
@@ -594,6 +653,18 @@ unsigned int sleep (unsigned int seconds)
return 0;
}

+char *mingw_mktemp(char *template)
+{
+ wchar_t wtemplate[MAX_PATH];
+ if (utftowcs(wtemplate, template, MAX_PATH) < 0)
+ return NULL;
+ if (!_wmktemp(wtemplate))
+ return NULL;
+ if (wcstoutf(template, wtemplate, strlen(template) + 1) < 0)
+ return NULL;
+ return template;
+}
+
int mkstemp(char *template)
{
char *filename = mktemp(template);
@@ -652,17 +723,18 @@ struct tm *localtime_r(const time_t *timep, struct tm *result)
return result;
}

-#undef getcwd
char *mingw_getcwd(char *pointer, int len)
{
int i;
- char *ret = getcwd(pointer, len);
- if (!ret)
- return ret;
+ wchar_t wpointer[MAX_PATH];
+ if (!_wgetcwd(wpointer, MAX_PATH))
+ return NULL;
+ if (wcstoutf(pointer, wpointer, len) < 0)
+ return NULL;
for (i = 0; pointer[i]; i++)
if (pointer[i] == '\\')
pointer[i] = '/';
- return ret;
+ return pointer;
}

/*
@@ -1505,33 +1577,38 @@ int mingw_rename(const char *pold, const char *pnew)
{
DWORD attrs, gle;
int tries = 0;
+ wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
+ if (utftowcs(wpold, pold, MAX_PATH) < 0)
+ return -1;
+ if (utftowcs(wpnew, pnew, MAX_PATH) < 0)
+ return -1;

/*
* Try native rename() first to get errno right.
* It is based on MoveFile(), which cannot overwrite existing files.
*/
- if (!rename(pold, pnew))
+ if (!_wrename(wpold, wpnew))
return 0;
if (errno != EEXIST)
return -1;
repeat:
- if (MoveFileEx(pold, pnew, MOVEFILE_REPLACE_EXISTING))
+ if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
return 0;
/* TODO: translate more errors */
gle = GetLastError();
if (gle == ERROR_ACCESS_DENIED &&
- (attrs = GetFileAttributes(pnew)) != INVALID_FILE_ATTRIBUTES) {
+ (attrs = GetFileAttributesW(wpnew)) != INVALID_FILE_ATTRIBUTES) {
if (attrs & FILE_ATTRIBUTE_DIRECTORY) {
errno = EISDIR;
return -1;
}
if ((attrs & FILE_ATTRIBUTE_READONLY) &&
- SetFileAttributes(pnew, attrs & ~FILE_ATTRIBUTE_READONLY)) {
- if (MoveFileEx(pold, pnew, MOVEFILE_REPLACE_EXISTING))
+ SetFileAttributesW(wpnew, attrs & ~FILE_ATTRIBUTE_READONLY)) {
+ if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
return 0;
gle = GetLastError();
/* revert file attributes on failure */
- SetFileAttributes(pnew, attrs);
+ SetFileAttributesW(wpnew, attrs);
}
}
if (tries < ARRAY_SIZE(delay) && gle == ERROR_ACCESS_DENIED) {
@@ -1741,11 +1818,17 @@ void mingw_open_html(const char *unixpath)

int link(const char *oldpath, const char *newpath)
{
- typedef BOOL (WINAPI *T)(const char*, const char*, LPSECURITY_ATTRIBUTES);
+ typedef BOOL (WINAPI *T)(LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
static T create_hard_link = NULL;
+ wchar_t woldpath[MAX_PATH], wnewpath[MAX_PATH];
+ if (utftowcs(woldpath, oldpath, MAX_PATH) < 0)
+ return -1;
+ if (utftowcs(wnewpath, newpath, MAX_PATH) < 0)
+ return -1;
+
if (!create_hard_link) {
create_hard_link = (T) GetProcAddress(
- GetModuleHandle("kernel32.dll"), "CreateHardLinkA");
+ GetModuleHandle("kernel32.dll"), "CreateHardLinkW");
if (!create_hard_link)
create_hard_link = (T)-1;
}
@@ -1753,7 +1836,7 @@ int link(const char *oldpath, const char *newpath)
errno = ENOSYS;
return -1;
}
- if (!create_hard_link(newpath, oldpath, NULL)) {
+ if (!create_hard_link(wnewpath, woldpath, NULL)) {
errno = err_win_to_posix(GetLastError());
return -1;
}
diff --git a/compat/mingw.h b/compat/mingw.h
index b4ff6cd..4454978 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -181,6 +181,19 @@ FILE *mingw_fopen (const char *filename, const char *otype);
FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream);
#define freopen mingw_freopen

+int mingw_access(const char *filename, int mode);
+#undef access
+#define access mingw_access
+
+int mingw_chdir(const char *dirname);
+#define chdir mingw_chdir
+
+int mingw_chmod(const char *filename, int mode);
+#define chmod mingw_chmod
+
+char *mingw_mktemp(char *template);
+#define mktemp mingw_mktemp
+
char *mingw_getcwd(char *pointer, int len);
#define getcwd mingw_getcwd

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:05 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Convert command line arguments from UTF-8 to UTF-16 when creating other
processes.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 18 ++++++++++++++----
1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index fe759c0..818a7f0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -938,9 +938,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,


const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{

- STARTUPINFO si;
+ STARTUPINFOW si;
PROCESS_INFORMATION pi;
struct strbuf envblk, args;
+ wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs;
unsigned flags;
BOOL ret;

@@ -976,6 +977,11 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
si.hStdOutput = (HANDLE) _get_osfhandle(fhout);
si.hStdError = (HANDLE) _get_osfhandle(fherr);

+ if (utftowcs(wcmd, cmd, MAX_PATH) < 0)
+ return -1;
+ if (dir && utftowcs(wdir, dir, MAX_PATH) < 0)
+ return -1;
+
/* concatenate argv, quoting args as we go */
strbuf_init(&args, 0);
if (prepend_cmd) {
@@ -993,6 +999,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
free(quoted);
}

+ wargs = xmalloc((2 * args.len + 1) * sizeof(wchar_t));
+ utftowcs(wargs, args.buf, 2 * args.len + 1);
+ strbuf_release(&args);
+
if (env) {
int count = 0;
char **e, **sorted_env;
@@ -1014,12 +1024,12 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
}

memset(&pi, 0, sizeof(pi));
- ret = CreateProcess(cmd, args.buf, NULL, NULL, TRUE, flags,
- env ? envblk.buf : NULL, dir, &si, &pi);
+ ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
+ env ? envblk.buf : NULL, dir ? wdir : NULL, &si, &pi);

if (env)
strbuf_release(&envblk);
- strbuf_release(&args);
+ free(wargs);

if (!ret) {
errno = ENOENT;
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:04 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Assumes file names in git tree objects are UTF-8 encoded.

On most unix systems, the system encoding (and thus the TCL system
encoding) will be UTF-8, so file names will be displayed correctly.

On Windows, it is impossible to set the system encoding to UTF-8. Changing
the TCL system encoding (via 'encoding system ...', e.g. in the startup
code) is explicitly discouraged by the TCL docs.

Change gitk and git-gui functions dealing with file names to always convert
from and to UTF-8.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

git-gui/git-gui.sh | 6 +++---
git-gui/lib/browser.tcl | 2 +-
git-gui/lib/index.tcl | 6 +++---
gitk-git/gitk | 16 ++++++++--------
4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
index 0d5bd34..74f10bf 100755
--- a/git-gui/git-gui.sh
+++ b/git-gui/git-gui.sh
@@ -1605,7 +1605,7 @@ proc read_diff_index {fd after} {
set i [split [string range $buf_rdi $c [expr {$z1 - 2}]] { }]
set p [string range $buf_rdi $z1 [expr {$z2 - 1}]]
merge_state \
- [encoding convertfrom $p] \
+ [encoding convertfrom utf-8 $p] \
[lindex $i 4]? \
[list [lindex $i 0] [lindex $i 2]] \
[list]
@@ -1638,7 +1638,7 @@ proc read_diff_files {fd after} {
set i [split [string range $buf_rdf $c [expr {$z1 - 2}]] { }]
set p [string range $buf_rdf $z1 [expr {$z2 - 1}]]
merge_state \
- [encoding convertfrom $p] \
+ [encoding convertfrom utf-8 $p] \
?[lindex $i 4] \
[list] \
[list [lindex $i 0] [lindex $i 2]]
@@ -1661,7 +1661,7 @@ proc read_ls_others {fd after} {
set pck [split $buf_rlo "\0"]
set buf_rlo [lindex $pck end]
foreach p [lrange $pck 0 end-1] {
- set p [encoding convertfrom $p]
+ set p [encoding convertfrom utf-8 $p]
if {[string index $p end] eq {/}} {
set p [string range $p 0 end-1]
}
diff --git a/git-gui/lib/browser.tcl b/git-gui/lib/browser.tcl
index d68b1f9..f4ee01d 100644


--- a/git-gui/lib/browser.tcl
+++ b/git-gui/lib/browser.tcl
@@ -191,7 +191,7 @@ method _ls {tree_id {name {}}} {
$w conf -state disabled

set fd [git_read ls-tree -z $tree_id]
- fconfigure $fd -blocking 0 -translation binary

+ fconfigure $fd -blocking 0 -translation binary -encoding utf-8


fileevent $fd readable [cb _read $fd]
}

diff --git a/git-gui/lib/index.tcl b/git-gui/lib/index.tcl
index 5d7bbf2..3d1d973 100644
--- a/git-gui/lib/index.tcl
+++ b/git-gui/lib/index.tcl
@@ -115,7 +115,7 @@ proc write_update_indexinfo {fd pathList totalCnt batch after} {
set info [lindex $s 2]
if {$info eq {}} continue

- puts -nonewline $fd "$info\t[encoding convertto $path]\0"
+ puts -nonewline $fd "$info\t[encoding convertto utf-8 $path]\0"
display_file $path $new
}

@@ -186,7 +186,7 @@ proc write_update_index {fd pathList totalCnt batch after} {
?M {set new M_}
?? {continue}
}
- puts -nonewline $fd "[encoding convertto $path]\0"
+ puts -nonewline $fd "[encoding convertto utf-8 $path]\0"
display_file $path $new
}

@@ -247,7 +247,7 @@ proc write_checkout_index {fd pathList totalCnt batch after} {
?M -
?T -
?D {
- puts -nonewline $fd "[encoding convertto $path]\0"
+ puts -nonewline $fd "[encoding convertto utf-8 $path]\0"
display_file $path ?_
}
}
diff --git a/gitk-git/gitk b/gitk-git/gitk
index 935ed5c..9facdbf 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -7251,7 +7251,7 @@ proc gettreeline {gtf id} {
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
}
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
lappend treefilelist($id) $fname
}
if {![eof $gtf]} {
@@ -7474,7 +7474,7 @@ proc gettreediffline {gdtf ids} {
if {[string index $file 0] eq "\""} {
set file [lindex $file 0]
}
- set file [encoding convertfrom $file]
+ set file [encoding convertfrom utf-8 $file]
if {$file ne [lindex $treediff end]} {
lappend treediff $file
lappend sublist $file
@@ -7623,7 +7623,7 @@ proc makediffhdr {fname ids} {
global ctext curdiffstart treediffs diffencoding
global ctext_file_names jump_to_here targetline diffline

- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
set diffencoding [get_path_encoding $fname]
set i [lsearch -exact $treediffs($ids) $fname]
if {$i >= 0} {
@@ -7657,7 +7657,7 @@ proc getblobdiffline {bdf ids} {
}
if {![string compare -length 5 "diff " $line]} {
if {![regexp {^diff (--cc|--git) } $line m type]} {
- set line [encoding convertfrom $line]
+ set line [encoding convertfrom utf-8 $line]
$ctext insert end "$line\n" hunksep
continue
}
@@ -7704,7 +7704,7 @@ proc getblobdiffline {bdf ids} {
makediffhdr $fname $ids

} elseif {![string compare -length 16 "* Unmerged path " $line]} {
- set fname [encoding convertfrom [string range $line 16 end]]
+ set fname [encoding convertfrom utf-8 [string range $line 16 end]]
$ctext insert end "\n"
set curdiffstart [$ctext index "end - 1c"]
lappend ctext_file_names $fname
@@ -7759,7 +7759,7 @@ proc getblobdiffline {bdf ids} {
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
}
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
set i [lsearch -exact $treediffs($ids) $fname]
if {$i >= 0} {
setinlist difffilestart $i $curdiffstart
@@ -7778,7 +7778,7 @@ proc getblobdiffline {bdf ids} {
set diffinhdr 0
continue
}
- set line [encoding convertfrom $line]
+ set line [encoding convertfrom utf-8 $line]


$ctext insert end "$line\n" filesep

} else {

@@ -11361,7 +11361,7 @@ proc cache_gitattr {attr pathlist} {
foreach row [split $rlist "\n"] {
if {[regexp "(.*): $attr: (.*)" $row m path value]} {
if {[string index $path 0] eq "\""} {
- set path [encoding convertfrom [lindex $path 0]]
+ set path [encoding convertfrom utf-8 [lindex $path 0]]
}
set path_attr_cache($attr,$path) $value
}
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:06 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Convert command line arguments from UTF-16 to UTF-8 on startup.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 35 +++++++++++++++++++++++++++++++++--
1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 818a7f0..e3cd129 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2053,10 +2053,41 @@ int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen)
*/
int _CRT_glob = 0;

+typedef struct {
+ int newmode;
+} _startupinfo;
+
+extern int __wgetmainargs(int *argc, wchar_t ***argv, wchar_t ***env, int glob,
+ _startupinfo *si);
+
void mingw_startup()
{
- /* copy executable name to argv[0] */
- __argv[0] = xstrdup(_pgmptr);
+ int i, len, maxlen, argc;
+ char *buffer;
+ wchar_t **wenv, **wargv;
+ _startupinfo si;
+
+ /* get wide char arguments and environment */
+ si.newmode = 0;
+ __wgetmainargs(&argc, &wargv, &wenv, _CRT_glob, &si);
+
+ /* determine size of argv and environ conversion buffer */
+ maxlen = wcslen(_wpgmptr);
+ for (i = 1; i < argc; i++)
+ maxlen = max(maxlen, wcslen(wargv[i]));
+
+ /* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */
+ maxlen = 3 * maxlen + 1;
+ buffer = xmalloc(maxlen);
+
+ /* convert command line arguments and environment to UTF-8 */
+ len = wcstoutf(buffer, _wpgmptr, maxlen);
+ __argv[0] = xmemdupz(buffer, len);
+ for (i = 1; i < argc; i++) {
+ len = wcstoutf(buffer, wargv[i], maxlen);
+ __argv[i] = xmemdupz(buffer, len);
+ }
+ free(buffer);

/* initialize critical section for waitpid pinfo_t list */
InitializeCriticalSection(&pinfo_cs);
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:10 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
On Windows, all native APIs are Unicode-based. It is impossible to pass
legacy encoded byte arrays to a process via command line or environment
variables. Disable the tests that try to do so.

In t3901, most tests still work if we don't mess up the repository encoding
in setup, so don't switch to ISO-8859-1 on MinGW.

Note that i18n tests that do their encoding tricks via encoded files (such
as t3900) are not affected by this.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

t/t3901-i18n-patch.sh | 19 +++++++++++--------
t/t4201-shortlog.sh | 6 +++---
t/t8005-blame-i18n.sh | 8 ++++----
3 files changed, 18 insertions(+), 15 deletions(-)

1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:12 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
The environment on Windows is case-insensitive. Some environment functions
(such as unsetenv and make_augmented_environment) have always used case-
sensitive comparisons instead, while others (getenv, putenv) were case-
insensitive. Recently, getenv has been changed to do a case-sensitive
before case-insensitive lookup to fix a problem in git's envsubst
implementation.

Prevent potential inconsistencies by using the same algorithm (case-
sensitive before case-insensitive) for all environment functions. This can
be implemented in the now central lookup_env function.

The original MSVCRT getenv (and thus, #undef getenv) is no longer needed,
direct references to mingw_getenv are no longer necessary.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 37 ++++++++++++++++++-------------------
1 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 93e7a58..7c8a1bc 100644


--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -181,7 +181,7 @@ static int ask_yes_no_if_possible(const char *format, ...)
vsnprintf(question, sizeof(question), format, args);
va_end(args);

- if ((retry_hook[0] = mingw_getenv("GIT_ASK_YESNO"))) {
+ if ((retry_hook[0] = getenv("GIT_ASK_YESNO"))) {
retry_hook[1] = question;
return !run_command_v_opt(retry_hook, 0);
}

@@ -836,7 +836,7 @@ static const char *parse_interpreter(const char *cmd)


*/
static char **get_path_split(void)
{
- char *p, **path, *envpath = mingw_getenv("PATH");
+ char *p, **path, *envpath = getenv("PATH");
int i, n = 0;

if (!envpath || !*envpath)

@@ -1215,13 +1215,21 @@ void free_environ(char **env)
static int lookup_env(char **env, const char *name, size_t nmln)
{
int i;
-


+ /*
+ * try case-sensitive match first, this is necessary for sh-i18n--envsubst
+ * to work with environment variables that differ only in case (e.g. $PATH
+ * and $path)
+ */

for (i = 0; env[i]; i++) {
- if (0 == strncmp(env[i], name, nmln)
- && '=' == env[i][nmln])


+ if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])

/* matches */
return i;
}
+ /* if there's no case-sensitive match, try case-insensitive instead */
+ for (i = 0; env[i]; i++) {
+ if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
+ return i;
+ }
return -1;
}

@@ -1267,32 +1275,23 @@ char **make_augmented_environ(const char *const *vars)
return env;
}

-#undef getenv
-
-/*
- * The system's getenv looks up the name in a case-insensitive manner.
- * This version tries a case-sensitive lookup and falls back to
- * case-insensitive if nothing was found. This is necessary because,
- * as a prominent example, CMD sets 'Path', but not 'PATH'.
- * Warning: not thread-safe.
- */
-static char *getenv_cs(const char *name)
+static char *do_getenv(const char *name)
{
size_t len = strlen(name);
int i = lookup_env(environ, name, len);
if (i >= 0)
return environ[i] + len + 1; /* skip past name and '=' */
- return getenv(name);
+ return NULL;
}

char *mingw_getenv(const char *name)
{
- char *result = getenv_cs(name);
+ char *result = do_getenv(name);
if (!result && !strcmp(name, "TMPDIR")) {
/* on Windows it is TMP and TEMP */
- result = getenv_cs("TMP");
+ result = do_getenv("TMP");
if (!result)
- result = getenv_cs("TEMP");
+ result = do_getenv("TEMP");
}
else if (!result && !strcmp(name, "TERM")) {
/* simulate TERM to enable auto-color (see color.c) */
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:09 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Convert environment from UTF-16 to UTF-8 on startup.

No changes to getenv() are necessary, as the MSVCRT version is implemented
on top of char **environ.

However, putenv / _wputenv from MSVCRT no longer work, for two reasons:
1. they try to keep environ, _wenviron and the Win32 process environment
in sync, using the default system encoding instead of UTF-8 to convert
between charsets
2. msysgit and MSVCRT use different allocators, memory allocated in git
cannot be freed by the CRT and vice versa

Implement mingw_putenv using the env_setenv helper function from the
environment merge code.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 15 +++++++++++++++
compat/mingw.h | 2 ++
2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index b38a7b0..3702b06 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1299,6 +1299,12 @@ char *mingw_getenv(const char *name)
return result;
}

+int mingw_putenv(const char *namevalue)
+{
+ environ = env_setenv(environ, namevalue);
+ return 0;
+}
+
/*
* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it is null. It
@@ -2077,6 +2083,11 @@ void mingw_startup()
maxlen = wcslen(_wpgmptr);


for (i = 1; i < argc; i++)

maxlen = max(maxlen, wcslen(wargv[i]));

+ for (i = 0; wenv[i]; i++)
+ maxlen = max(maxlen, wcslen(wenv[i]));
+
+ /* nedmalloc can't free CRT memory, allocate resizable environment list */
+ environ = xcalloc(i + 1, sizeof(char*));



/* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */

maxlen = 3 * maxlen + 1;

@@ -2089,6 +2100,10 @@ void mingw_startup()


len = wcstoutf(buffer, wargv[i], maxlen);

__argv[i] = xmemdupz(buffer, len);
}

+ for (i = 0; wenv[i]; i++) {
+ len = wcstoutf(buffer, wenv[i], maxlen);
+ environ[i] = xmemdupz(buffer, len);
+ }


free(buffer);

/* initialize critical section for waitpid pinfo_t list */

diff --git a/compat/mingw.h b/compat/mingw.h
index fac4634..29cbef7 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -199,6 +199,8 @@ char *mingw_getcwd(char *pointer, int len);

char *mingw_getenv(const char *name);
#define getenv mingw_getenv
+int mingw_putenv(const char *namevalue);
+#define putenv mingw_putenv

struct hostent *mingw_gethostbyname(const char *host);
#define gethostbyname mingw_gethostbyname
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:07 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Replace stdout and stderr with a pipe if they point to the console. A
background thread reads from the pipe, handles ANSI escape sequences and
UTF-8 to UTF-16 conversion, then writes to the console.

Global variables are either initialized on startup (single threaded) or
exclusively modified by the background thread. Threads communicate through
the pipe, no further synchronization is necessary.

Due to the byte-oriented pipe, ANSI escape sequences can no longer be
expected to arrive in one piece. Replace the string-based ansi_emulate()
with a simple stateful parser (this also fixes colored diff hunk headers,
which were broken as of commit 2efcc977).

Override isatty to return true for the pipes redirecting to the console.

Exec/spawn obtain the original console handle to pass to the next process
via winansi_get_osfhandle().

All other overrides are gone, the default stdio implementations work as
expected with the piped stdout/stderr descriptors.

Limitations: doesn't track reopened or duped file descriptors, i.e.:
- fdopen(1/2) returns fully buffered streams
- dup(1/2), dup2(1/2) returns normal pipe descriptors (i.e. isatty() =
false, winansi_get_osfhandle won't return the original console handle)

Currently, only the git-format-patch command uses xfdopen(xdup(1)) (see
"realstdout" in builtin/log.c), but works well with these limitations.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 9 +-
compat/mingw.h | 12 +--
compat/winansi.c | 359 +++++++++++++++++++++++++++++++++---------------------
3 files changed, 232 insertions(+), 148 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index e3cd129..2f346dc 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -973,9 +973,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
memset(&si, 0, sizeof(si));
si.cb = sizeof(si);
si.dwFlags = STARTF_USESTDHANDLES;
- si.hStdInput = (HANDLE) _get_osfhandle(fhin);
- si.hStdOutput = (HANDLE) _get_osfhandle(fhout);
- si.hStdError = (HANDLE) _get_osfhandle(fherr);
+ si.hStdInput = winansi_get_osfhandle(fhin);
+ si.hStdOutput = winansi_get_osfhandle(fhout);
+ si.hStdError = winansi_get_osfhandle(fherr);



if (utftowcs(wcmd, cmd, MAX_PATH) < 0)

return -1;
@@ -2097,4 +2097,7 @@ void mingw_startup()
_setmode(_fileno(stdin), _O_BINARY);
_setmode(_fileno(stdout), _O_BINARY);
_setmode(_fileno(stderr), _O_BINARY);
+
+ /* initialize Unicode console */
+ winansi_init();
}
diff --git a/compat/mingw.h b/compat/mingw.h
index 4454978..fac4634 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -297,14 +297,10 @@ sig_handler_t mingw_signal(int sig, sig_handler_t handler);
* ANSI emulation wrappers
*/

-int winansi_fputs(const char *str, FILE *stream);
-int winansi_printf(const char *format, ...) __attribute__((format (printf, 1, 2)));
-int winansi_fprintf(FILE *stream, const char *format, ...) __attribute__((format (printf, 2, 3)));
-int winansi_vfprintf(FILE *stream, const char *format, va_list list);
-#define fputs winansi_fputs
-#define printf(...) winansi_printf(__VA_ARGS__)
-#define fprintf(...) winansi_fprintf(__VA_ARGS__)
-#define vfprintf winansi_vfprintf
+void winansi_init(void);
+int winansi_isatty(int fd);
+HANDLE winansi_get_osfhandle(int fd);
+#define isatty winansi_isatty

/*
* git specific compatibility
diff --git a/compat/winansi.c b/compat/winansi.c
index bf514f9..d11b532 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -4,18 +4,13 @@

#undef NOGDI
#include "../git-compat-util.h"
-#include <malloc.h>
#include <wingdi.h>
#include <winreg.h>

/*
Functions to be wrapped:
*/
-#undef printf
-#undef fprintf
-#undef fputs
-#undef vfprintf
-/* TODO: write */
+#undef isatty

/*
ANSI codes used by git: m, K
@@ -28,8 +23,10 @@ static HANDLE console;
static WORD plain_attr;
static WORD attr;
static int negative;
-static FILE *last_stream = NULL;


static int non_ascii_used = 0;

+static HANDLE hthread, hread, hwrite;
+static HANDLE hwrite1 = INVALID_HANDLE_VALUE, hwrite2 = INVALID_HANDLE_VALUE;
+static HANDLE hconsole1, hconsole2;

#ifdef __MINGW32__
typedef struct _CONSOLE_FONT_INFOEX {
@@ -74,64 +71,39 @@ static void warn_if_raster_font(void)
}
}

- if (!(fontFamily & TMPF_TRUETYPE))
- warning("Your console font probably doesn\'t support "
- "Unicode. If you experience strange characters in the output, "
- "consider switching to a TrueType font such as Lucida Console!");
+ if (!(fontFamily & TMPF_TRUETYPE)) {
+ const wchar_t *msg = L"\nWarning: Your console font probably doesn\'t "
+ L"support Unicode. If you experience strange characters in the "
+ L"output, consider switching to a TrueType font such as Lucida "
+ L"Console!\n";
+ WriteConsoleW(console, msg, wcslen(msg), NULL, NULL);
+ }
}

-static int is_console(FILE *stream)
+static int is_console(int fd)
{
CONSOLE_SCREEN_BUFFER_INFO sbi;
HANDLE hcon;

- static int initialized = 0;
-
- /* use cached value if stream hasn't changed */
- if (stream == last_stream)
- return console != NULL;
-
- last_stream = stream;
- console = NULL;
-
- /* get OS handle of the stream */
- hcon = (HANDLE) _get_osfhandle(_fileno(stream));
+ /* get OS handle of the file descriptor */
+ hcon = (HANDLE) _get_osfhandle(fd);
if (hcon == INVALID_HANDLE_VALUE)
return 0;

+ /* check if its a device (i.e. console, printer, serial port) */
+ if (GetFileType(hcon) != FILE_TYPE_CHAR)
+ return 0;
+
/* check if its a handle to a console output screen buffer */
if (!GetConsoleScreenBufferInfo(hcon, &sbi))
return 0;

- if (!initialized) {
- attr = plain_attr = sbi.wAttributes;
- negative = 0;
- initialized = 1;
- /* check console font on exit */
- atexit(warn_if_raster_font);
- }
-
- console = hcon;
+ /* initialize attributes */
+ attr = plain_attr = sbi.wAttributes;
+ negative = 0;
return 1;
}

-static int write_console(const char *str, size_t len)
-{
- /* convert utf-8 to utf-16, write directly to console */
- int wlen = MultiByteToWideChar(CP_UTF8, 0, str, len, NULL, 0);
- wchar_t *wbuf = (wchar_t *) alloca(wlen * sizeof(wchar_t));
- MultiByteToWideChar(CP_UTF8, 0, str, len, wbuf, wlen);
-
- WriteConsoleW(console, wbuf, wlen, NULL, NULL);
-
- /* remember if non-ascii characters are printed */
- if (wlen != len)
- non_ascii_used = 1;
-
- /* return original (utf-8 encoded) length */
- return len;
-}
-
#define FOREGROUND_ALL (FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)
#define BACKGROUND_ALL (BACKGROUND_RED | BACKGROUND_GREEN | BACKGROUND_BLUE)

@@ -175,18 +147,13 @@ static void erase_in_line(void)
&dummy);
}

-
-static const char *set_attr(const char *str)
+static void set_attr(char func, const int *params, int paramlen)
{
- const char *func;
- size_t len = strspn(str, "0123456789;");
- func = str + len;
-
- switch (*func) {
+ int i;
+ switch (func) {
case 'm':
- do {
- long val = strtol(str, (char **)&str, 10);
- switch (val) {
+ for (i = 0; i < paramlen; i++) {
+ switch (params[i]) {
case 0: /* reset */
attr = plain_attr;
negative = 0;
@@ -309,9 +276,7 @@ static const char *set_attr(const char *str)
/* Unsupported code */
break;
}
- str++;
- } while (*(str-1) == ';');
-
+ }
set_console_attr();
break;
case 'K':
@@ -321,112 +286,232 @@ static const char *set_attr(const char *str)
/* Unsupported code */
break;
}
-
- return func + 1;
}

-static int ansi_emulate(const char *str, FILE *stream)
+#define BUFFER_SIZE 4096
+#define MAX_PARAMS 16
+
+static void write_console(char *str, size_t len)
{
- int rv = 0;
- const char *pos = str;
+ /* only called from console_thread, so a static buffer will do */
+ static wchar_t wbuf[2 * BUFFER_SIZE + 1];

- fflush(stream);
+ /* convert utf-8 to utf-16 */
+ int wlen = utftowcsn(wbuf, str, 2 * BUFFER_SIZE + 1, len);

- while (*pos) {
- pos = strstr(str, "\033[");
- if (pos) {
- size_t len = pos - str;
+ /* write directly to console */
+ WriteConsoleW(console, wbuf, wlen, NULL, NULL);

- if (len) {
- size_t out_len = write_console(str, len);
- rv += out_len;
- if (out_len < len)
- return rv;
- }
+ /* remember if non-ascii characters are printed */
+ if (wlen != len)
+ non_ascii_used = 1;
+}

- str = pos + 2;
- rv += 2;
+enum {
+ TEXT = 0, ESCAPE = 033, BRACKET = '[', EXIT = -1
+};

- pos = set_attr(str);
- rv += pos - str;
- str = pos;
- } else {
- size_t len = strlen(str);
- rv += write_console(str, len);
- return rv;
+static DWORD WINAPI console_thread(LPVOID unused)
+{
+ char buffer[BUFFER_SIZE];
+ DWORD bytes;
+ int start, end, c, parampos = 0, state = TEXT;
+ int params[MAX_PARAMS];
+
+ while (state != EXIT) {
+ /* read next chunk of bytes from the pipe */
+ if (!ReadFile(hread, buffer, BUFFER_SIZE, &bytes, NULL)) {
+ /* exit if pipe has been closed */
+ if (GetLastError() == ERROR_BROKEN_PIPE)
+ break;
+ /* ignore other errors */
+ continue;
}
+
+ /* scan the bytes and handle ANSI control codes */
+ start = end = 0;
+ while (end < bytes) {
+ c = buffer[end++];
+ switch (state) {
+ case TEXT:
+ if (c == ESCAPE) {
+ /* print text seen so far */
+ if (end - 1 > start)
+ write_console(buffer + start, end - 1 - start);
+
+ /* then start parsing escape sequence */
+ start = end - 1;
+ memset(params, 0, sizeof(params));
+ parampos = 0;
+ state = ESCAPE;
+ }
+ break;
+
+ case ESCAPE:
+ /* continue if "\033[", otherwise bail out */
+ state = (c == BRACKET) ? BRACKET : TEXT;
+ break;
+
+ case BRACKET:
+ /* parse [0-9;]* into array of parameters */
+ if (c >= '0' && c <= '9') {
+ params[parampos] *= 10;
+ params[parampos] += c - '0';
+ } else if (c == ';') {
+ /* next parameter, bail out if out of bounds */
+ parampos++;
+ if (parampos >= MAX_PARAMS)
+ state = TEXT;
+ } else if (c == 'q') {
+ /* "\033[q": terminate the thread */
+ state = EXIT;
+ } else {
+ /* end of escape sequence, change console attributes */
+ set_attr(c, params, parampos + 1);
+ start = end;
+ state = TEXT;
+ }
+ break;
+ }
+ }
+
+ /* print remaining text unless we're parsing an escape sequence */
+ if (state == TEXT && end > start)
+ write_console(buffer + start, end - start);
}
- return rv;
+
+ /* check if the console font supports unicode */
+ warn_if_raster_font();
+
+ CloseHandle(hread);
+ return 0;
}

-int winansi_fputs(const char *str, FILE *stream)
+static void winansi_exit(void)
{
- int rv;
-
- if (!is_console(stream))
- return fputs(str, stream);
+ DWORD dummy;
+ /* flush all streams */
+ _flushall();
+
+ /* close the write ends of the pipes */
+ if (hwrite1 != INVALID_HANDLE_VALUE)
+ CloseHandle(hwrite1);
+ if (hwrite2 != INVALID_HANDLE_VALUE)
+ CloseHandle(hwrite2);
+
+ /* send termination sequence to the thread */
+ WriteFile(hwrite, "\033[q", 3, &dummy, NULL);
+ CloseHandle(hwrite);
+
+ /* allow the thread to copy remaining data to the console */
+ WaitForSingleObject(hthread, 1000);
+ CloseHandle(hthread);
+}

- rv = ansi_emulate(str, stream);
+static void die_lasterr(const char *fmt, ...)
+{
+ va_list params;
+ va_start(params, fmt);
+ errno = err_win_to_posix(GetLastError());
+ die_errno(fmt, params);
+ va_end(params);
+}

- if (rv >= 0)
- return 0;
- else
- return EOF;
+static HANDLE duplicate_handle(HANDLE hnd)
+{
+ HANDLE hresult, hproc = GetCurrentProcess();
+ if (!DuplicateHandle(hproc, hnd, hproc, &hresult, 0, TRUE,
+ DUPLICATE_SAME_ACCESS))
+ die_lasterr("DuplicateHandle(%li) failed", (long) hnd);
+ return hresult;
}

-int winansi_vfprintf(FILE *stream, const char *format, va_list list)
+static HANDLE redirect_console(FILE *stream, HANDLE *phcon, int new_fd)
{
- int len, rv;
- char small_buf[256];
- char *buf = small_buf;
- va_list cp;
+ /* get original console handle */
+ int fd = _fileno(stream);
+ HANDLE hcon = (HANDLE) _get_osfhandle(fd);
+ if (hcon == INVALID_HANDLE_VALUE)
+ die_errno("_get_osfhandle(%i) failed", fd);

- if (!is_console(stream))
- goto abort;
+ /* save a copy to phcon and console (used by the background thread) */
+ console = *phcon = duplicate_handle(hcon);

- va_copy(cp, list);
- len = vsnprintf(small_buf, sizeof(small_buf), format, cp);
- va_end(cp);
+ /* duplicate new_fd over fd (closes fd and associated handle (hcon)) */
+ if (_dup2(new_fd, fd))
+ die_errno("_dup2(%i, %i) failed", new_fd, fd);

- if (len > sizeof(small_buf) - 1) {
- buf = malloc(len + 1);
- if (!buf)
- goto abort;
+ /* no buffering, or stdout / stderr will be out of sync */
+ setbuf(stream, NULL);
+ return (HANDLE) _get_osfhandle(fd);
+}

- len = vsnprintf(buf, len + 1, format, list);
- }
+void winansi_init(void)
+{
+ int con1, con2, hwrite_fd;

- rv = ansi_emulate(buf, stream);
+ /* check if either stdout or stderr is a console output screen buffer */
+ con1 = is_console(1);
+ con2 = is_console(2);
+ if (!con1 && !con2)
+ return;

- if (buf != small_buf)
- free(buf);
- return rv;
+ /* create an anonymous pipe */
+ if (!CreatePipe(&hread, &hwrite, NULL, BUFFER_SIZE))
+ die_lasterr("CreatePipe failed");

-abort:
- rv = vfprintf(stream, format, list);
- return rv;
-}
+ /* start console spool thread on the pipe's read end */
+ hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
+ if (hthread == INVALID_HANDLE_VALUE)
+ die_lasterr("CreateThread(console_thread) failed");

-int winansi_fprintf(FILE *stream, const char *format, ...)
-{
- va_list list;
- int rv;
+ /* schedule cleanup routine */
+ if (atexit(winansi_exit))
+ die_errno("atexit(winansi_exit) failed");

- va_start(list, format);
- rv = winansi_vfprintf(stream, format, list);
- va_end(list);
+ /* create a file descriptor for the write end of the pipe */
+ hwrite_fd = _open_osfhandle((long) duplicate_handle(hwrite), _O_BINARY);
+ if (hwrite_fd == -1)
+ die_errno("_open_osfhandle(%li) failed", (long) hwrite);

- return rv;
+ /* redirect stdout / stderr to the pipe */
+ if (con1)
+ hwrite1 = redirect_console(stdout, &hconsole1, hwrite_fd);
+ if (con2)
+ hwrite2 = redirect_console(stderr, &hconsole2, hwrite_fd);
+
+ /* close pipe file descriptor (also closes the duped hwrite) */
+ close(hwrite_fd);
}

-int winansi_printf(const char *format, ...)
+static int is_same_handle(HANDLE hnd, int fd)
{
- va_list list;
- int rv;
+ return hnd != INVALID_HANDLE_VALUE && hnd == (HANDLE) _get_osfhandle(fd);
+}

- va_start(list, format);
- rv = winansi_vfprintf(stdout, format, list);
- va_end(list);
+/*
+ * Return true if stdout / stderr is a pipe redirecting to the console.
+ */
+int winansi_isatty(int fd)
+{
+ if (fd == 1 && is_same_handle(hwrite1, 1))
+ return 1;
+ else if (fd == 2 && is_same_handle(hwrite2, 2))
+ return 1;
+ else
+ return isatty(fd);
+}

- return rv;
+/*
+ * Returns the real console handle if stdout / stderr is a pipe redirecting
+ * to the console. Allows spawn / exec to pass the console to the next process.
+ */
+HANDLE winansi_get_osfhandle(int fd)
+{
+ if (fd == 1 && is_same_handle(hwrite1, 1))
+ return hconsole1;
+ else if (fd == 2 && is_same_handle(hwrite2, 2))
+ return hconsole2;
+ else
+ return (HANDLE) _get_osfhandle(fd);
}
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:11 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
All functions that modify the environment have memory leaks.

Disable gitunsetenv in the Makefile and use env_setenv (via mingw_putenv)
instead (this frees removed environment entries).

Move xstrdup from env_setenv to make_augmented_environ, so that
mingw_putenv no longer copies the environment entries (according to POSIX
[1], "the string [...] shall become part of the environment"). This also
fixes the memory leak in gitsetenv, which expects a POSIX compliant putenv.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/putenv.html

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

Makefile | 2 --
compat/mingw.c | 10 ++++++----
compat/mingw.h | 1 +
3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 4594d81..71bc9a7 100644
--- a/Makefile
+++ b/Makefile
@@ -1095,7 +1095,6 @@ ifeq ($(uname_S),Windows)
NO_SYMLINK_HEAD = YesPlease
NO_IPV6 = YesPlease
NO_SETENV = YesPlease
- NO_UNSETENV = YesPlease
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
@@ -1188,7 +1187,6 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_LIBGEN_H = YesPlease
NO_SYMLINK_HEAD = YesPlease
NO_SETENV = YesPlease
- NO_UNSETENV = YesPlease
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
diff --git a/compat/mingw.c b/compat/mingw.c
index 3702b06..93e7a58 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1238,14 +1238,14 @@ static char **env_setenv(char **env, const char *name)
for (i = 0; env[i]; i++)
;
env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = xstrdup(name);
+ env[i] = (char*) name;
env[i+1] = NULL;
}
}
else {
free(env[i]);
if (*eq)
- env[i] = xstrdup(name);
+ env[i] = (char*) name;
else
for (; env[i]; i++)
env[i] = env[i+1];
@@ -1260,8 +1260,10 @@ char **make_augmented_environ(const char *const *vars)
{
char **env = copy_environ();

- while (*vars)
- env = env_setenv(env, *vars++);
+ while (*vars) {
+ const char *v = *vars++;
+ env = env_setenv(env, strchr(v, '=') ? xstrdup(v) : v);
+ }
return env;
}

diff --git a/compat/mingw.h b/compat/mingw.h
index 29cbef7..131d2ec 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -201,6 +201,7 @@ char *mingw_getenv(const char *name);
#define getenv mingw_getenv
int mingw_putenv(const char *namevalue);
#define putenv mingw_putenv
+#define unsetenv mingw_putenv

Karsten Blees

unread,
Oct 6, 2011, 1:41:08 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Convert environment from UTF-8 to UTF-16 when creating other processes.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 26 ++++++++++++++------------
1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 2f346dc..b38a7b0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -940,9 +940,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
{
STARTUPINFOW si;
PROCESS_INFORMATION pi;
- struct strbuf envblk, args;
- wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs;
- unsigned flags;
+ struct strbuf args;
+ wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs, *wenvblk = NULL;
+ unsigned flags = CREATE_UNICODE_ENVIRONMENT;
BOOL ret;

/* Determine whether or not we are associated to a console */
@@ -959,7 +959,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
* instead of CREATE_NO_WINDOW to make ssh
* recognize that it has no console.
*/
- flags = DETACHED_PROCESS;
+ flags |= DETACHED_PROCESS;
} else {
/* There is already a console. If we specified
* DETACHED_PROCESS here, too, Windows would
@@ -967,7 +967,6 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
* The same is true for CREATE_NO_WINDOW.
* Go figure!
*/
- flags = 0;
CloseHandle(cons);
}
memset(&si, 0, sizeof(si));
@@ -1006,6 +1005,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,


if (env) {
int count = 0;
char **e, **sorted_env;

+ int i = 0, size = 0, envblksz = 0, envblkpos = 0;

for (e = env; *e; e++)
count++;
@@ -1015,20 +1015,22 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
qsort(sorted_env, count, sizeof(*sorted_env), env_compare);

- strbuf_init(&envblk, 0);
- for (e = sorted_env; *e; e++) {
- strbuf_addstr(&envblk, *e);
- strbuf_addch(&envblk, '\0');
+ /* create environment block from temporary environment */
+ for (i = 0; sorted_env[i]; i++) {
+ size = 2 * strlen(sorted_env[i]) + 2;
+ ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t), envblksz);
+ envblkpos += utftowcs(&wenvblk[envblkpos], sorted_env[i], size) + 1;
}
+ /* add final \0 terminator */
+ wenvblk[envblkpos] = 0;
free(sorted_env);


}

memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,

- env ? envblk.buf : NULL, dir ? wdir : NULL, &si, &pi);
+ wenvblk, dir ? wdir : NULL, &si, &pi);

- if (env)
- strbuf_release(&envblk);
+ free(wenvblk);
free(wargs);

if (!ret) {
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:18 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Move environment array reallocation from do_putenv to the respective
callers. Keep track of the environment size in a global variable. Use
ALLOC_GROW in mingw_putenv to reduce reallocations. Allocate a
sufficiently sized environment array in make_environment_block to prevent
reallocations.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 50 +++++++++++++++++++++++++++++---------------------
1 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index aeae266..0f85f12 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -767,19 +767,19 @@ static int lookupenv(char **env, const char *name, size_t nmln)

/*
* If name contains '=', then sets the variable, otherwise it unsets it
+ * Size includes the terminating NULL. Env must have room for size + 1 entries
+ * (in case of insert). Returns the new size. Optionally frees removed entries.
*/
-static char **do_putenv(char **env, const char *name, int free_old)
+static int do_putenv(char **env, const char *name, int size, int free_old)
{
char *eq = strchrnul(name, '=');
int i = lookupenv(env, name, eq-name);

if (i < 0) {
if (*eq) {
- for (i = 0; env[i]; i++)
- ;
- env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = (char*) name;
- env[i+1] = NULL;
+ env[size - 1] = (char*) name;
+ env[size] = NULL;
+ size++;
}
}
else {
@@ -787,13 +787,20 @@ static char **do_putenv(char **env, const char *name, int free_old)


free(env[i]);
if (*eq)

env[i] = (char*) name;

- else
+ else {


for (; env[i]; i++)
env[i] = env[i+1];

+ size--;
+ }
}
- return env;
+ return size;
}

+/* used number of elements of environ array, including terminating NULL */
+static int environ_size = 0;
+/* allocated size of environ array, in bytes */
+static int environ_alloc = 0;
+


static char *do_getenv(const char *name)
{
size_t len = strlen(name);

@@ -821,7 +828,8 @@ char *mingw_getenv(const char *name)

int mingw_putenv(const char *namevalue)
{
- environ = do_putenv(environ, namevalue, 1);
+ ALLOC_GROW(environ, (environ_size + 1) * sizeof(char*), environ_alloc);
+ environ_size = do_putenv(environ, namevalue, environ_size, 1);
return 0;
}

@@ -1014,25 +1022,22 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)


static wchar_t *make_environment_block(char **deltaenv)
{
wchar_t *wenvblk = NULL;

- int count = 0;
char **tmpenv;
- int i = 0, size = 0, envblksz = 0, envblkpos = 0;
+ int i = 0, size = environ_size, envblksz = 0, envblkpos = 0;

- while (environ[count])
- count++;


+ while (deltaenv && deltaenv[i])

+ i++;

- /* copy the environment */
- tmpenv = xmalloc(sizeof(*tmpenv) * (count + 1));
- memcpy(tmpenv, environ, sizeof(*tmpenv) * (count + 1));
+ /* copy the environment, leaving space for changes */
+ tmpenv = xmalloc((size + i) * sizeof(char*));
+ memcpy(tmpenv, environ, size * sizeof(char*));



/* merge supplied environment changes into the temporary environment */

for (i = 0; deltaenv && deltaenv[i]; i++)

- tmpenv = do_putenv(tmpenv, deltaenv[i], 0);


+ size = do_putenv(tmpenv, deltaenv[i], size, 0);

/* environment must be sorted */
- for (count = 0; tmpenv[count]; count++)
- ;
- qsort(tmpenv, count, sizeof(*tmpenv), compareenv);
+ qsort(tmpenv, size - 1, sizeof(char*), compareenv);



/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {

@@ -2065,7 +2070,9 @@ void mingw_startup()
maxlen = max(maxlen, wcslen(wenv[i]));



/* nedmalloc can't free CRT memory, allocate resizable environment list */

- environ = xcalloc(i + 1, sizeof(char*));
+ environ = NULL;
+ environ_size = i + 1;
+ ALLOC_GROW(environ, environ_size * sizeof(char*), environ_alloc);



/* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */
maxlen = 3 * maxlen + 1;

@@ -2082,6 +2089,7 @@ void mingw_startup()
len = wcstoutf(buffer, wenv[i], maxlen);


environ[i] = xmemdupz(buffer, len);
}

+ environ[i] = NULL;


free(buffer);

/* initialize critical section for waitpid pinfo_t list */

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:14 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
...so that they can be reused by mingw_spawnve_fd.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 174 ++++++++++++++++++++++++++++----------------------------
1 files changed, 87 insertions(+), 87 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index dadabeb..425fac8 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -737,6 +737,93 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}

+static int env_compare(const void *a, const void *b)
+{
+ char *const *ea = a;
+ char *const *eb = b;
+ return strcasecmp(*ea, *eb);
+}
+
+static int lookup_env(char **env, const char *name, size_t nmln)
+{
+ int i;


+ /*
+ * try case-sensitive match first, this is necessary for sh-i18n--envsubst
+ * to work with environment variables that differ only in case (e.g. $PATH
+ * and $path)
+ */

+ for (i = 0; env[i]; i++) {


+ if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
+ /* matches */
+ return i;
+ }

+ /* if there's no case-sensitive match, try case-insensitive instead */
+ for (i = 0; env[i]; i++) {

+ if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
+ return i;
+ }


+ return -1;
+}
+

+/*
+ * If name contains '=', then sets the variable, otherwise it unsets it
+ */
+static char **env_setenv(char **env, const char *name)
+{
+ char *eq = strchrnul(name, '=');
+ int i = lookup_env(env, name, eq-name);
+
+ if (i < 0) {
+ if (*eq) {
+ for (i = 0; env[i]; i++)
+ ;
+ env = xrealloc(env, (i+2)*sizeof(*env));
+ env[i] = (char*) name;
+ env[i+1] = NULL;
+ }
+ }
+ else {
+ free(env[i]);
+ if (*eq)
+ env[i] = (char*) name;
+ else
+ for (; env[i]; i++)
+ env[i] = env[i+1];
+ }
+ return env;
+}
+


+static char *do_getenv(const char *name)

+{
+ size_t len = strlen(name);
+ int i = lookup_env(environ, name, len);
+ if (i >= 0)
+ return environ[i] + len + 1; /* skip past name and '=' */
+ return NULL;
+}
+
+char *mingw_getenv(const char *name)
+{


+ char *result = do_getenv(name);

+ if (!result && !strcmp(name, "TMPDIR")) {
+ /* on Windows it is TMP and TEMP */


+ result = do_getenv("TMP");

+ if (!result)


+ result = do_getenv("TEMP");

+ }
+ else if (!result && !strcmp(name, "TERM")) {
+ /* simulate TERM to enable auto-color (see color.c) */
+ result = "winansi";
+ }
+ return result;
+}
+


+int mingw_putenv(const char *namevalue)
+{
+ environ = env_setenv(environ, namevalue);
+ return 0;
+}
+
/*

* See http://msdn2.microsoft.com/en-us/library/17w5ykft(vs.71).aspx
* (Parsing C++ Command-Line Arguments)
@@ -919,13 +1006,6 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
return prog;
}

-static int env_compare(const void *a, const void *b)
-{
- char *const *ea = a;
- char *const *eb = b;
- return strcasecmp(*ea, *eb);
-}
-
struct pinfo_t {
struct pinfo_t *next;
pid_t pid;
@@ -1206,55 +1286,6 @@ void free_environ(char **env)
free(env);
}

-static int lookup_env(char **env, const char *name, size_t nmln)
-{
- int i;
- /*
- * try case-sensitive match first, this is necessary for sh-i18n--envsubst
- * to work with environment variables that differ only in case (e.g. $PATH
- * and $path)
- */
- for (i = 0; env[i]; i++) {
- if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
- /* matches */
- return i;
- }
- /* if there's no case-sensitive match, try case-insensitive instead */
- for (i = 0; env[i]; i++) {
- if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
- return i;
- }
- return -1;
-}
-
-/*
- * If name contains '=', then sets the variable, otherwise it unsets it
- */
-static char **env_setenv(char **env, const char *name)
-{
- char *eq = strchrnul(name, '=');
- int i = lookup_env(env, name, eq-name);
-
- if (i < 0) {
- if (*eq) {


- for (i = 0; env[i]; i++)
- ;
- env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = (char*) name;
- env[i+1] = NULL;

- }
- }
- else {
- free(env[i]);
- if (*eq)


- env[i] = (char*) name;

- else
- for (; env[i]; i++)
- env[i] = env[i+1];
- }
- return env;
-}
-
/*
* Copies global environ and adjusts variables as specified by vars.
*/
@@ -1269,37 +1300,6 @@ char **make_augmented_environ(const char *const *vars)
return env;
}

-static char *do_getenv(const char *name)
-{
- size_t len = strlen(name);
- int i = lookup_env(environ, name, len);
- if (i >= 0)
- return environ[i] + len + 1; /* skip past name and '=' */
- return NULL;
-}
-
-char *mingw_getenv(const char *name)
-{
- char *result = do_getenv(name);
- if (!result && !strcmp(name, "TMPDIR")) {
- /* on Windows it is TMP and TEMP */
- result = do_getenv("TMP");
- if (!result)
- result = do_getenv("TEMP");
- }
- else if (!result && !strcmp(name, "TERM")) {
- /* simulate TERM to enable auto-color (see color.c) */
- result = "winansi";
- }
- return result;
-}
-
-int mingw_putenv(const char *namevalue)
-{
- environ = env_setenv(environ, namevalue);
- return 0;
-}
-


/*
* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it is null. It

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:15 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Environment helper functions use random naming ('env' prefix or suffix or
both, with or without '_'). Change to POSIX naming scheme ('env' suffix,
no '_').

Env_setenv has more in common with putenv than setenv. Change to do_putenv.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 425fac8..8ea5fe0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -737,14 +737,14 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}


-static int env_compare(const void *a, const void *b)

+static int compareenv(const void *a, const void *b)
{


char *const *ea = a;

char *const *eb = b;

return strcasecmp(*ea, *eb);
}

-static int lookup_env(char **env, const char *name, size_t nmln)
+static int lookupenv(char **env, const char *name, size_t nmln)
{
int i;
/*
@@ -768,10 +768,10 @@ static int lookup_env(char **env, const char *name, size_t nmln)
/*


* If name contains '=', then sets the variable, otherwise it unsets it

*/
-static char **env_setenv(char **env, const char *name)

+static char **do_putenv(char **env, const char *name)
{


char *eq = strchrnul(name, '=');
- int i = lookup_env(env, name, eq-name);

+ int i = lookupenv(env, name, eq-name);



if (i < 0) {
if (*eq) {

@@ -796,7 +796,7 @@ static char **env_setenv(char **env, const char *name)


static char *do_getenv(const char *name)
{

size_t len = strlen(name);
- int i = lookup_env(environ, name, len);

+ int i = lookupenv(environ, name, len);
if (i >= 0)


return environ[i] + len + 1; /* skip past name and '=' */

return NULL;
@@ -820,7 +820,7 @@ char *mingw_getenv(const char *name)

int mingw_putenv(const char *namevalue)
{


- environ = env_setenv(environ, namevalue);

+ environ = do_putenv(environ, namevalue);
return 0;
}

@@ -1093,7 +1093,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,


/* environment must be sorted */

sorted_env = xmalloc(sizeof(*sorted_env) * (count + 1));
memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
- qsort(sorted_env, count, sizeof(*sorted_env), env_compare);
+ qsort(sorted_env, count, sizeof(*sorted_env), compareenv);



/* create environment block from temporary environment */

for (i = 0; sorted_env[i]; i++) {
@@ -1295,7 +1295,7 @@ char **make_augmented_environ(const char *const *vars)

while (*vars) {


const char *v = *vars++;

- env = env_setenv(env, strchr(v, '=') ? xstrdup(v) : v);
+ env = do_putenv(env, strchr(v, '=') ? xstrdup(v) : v);
}
return env;
}
--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:16 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 55 ++++++++++++++++++++++++++++++++-----------------------
1 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 8ea5fe0..303b661 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1006,6 +1006,36 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
return prog;
}

+/*
+ * Create environment block suitable for CreateProcess.
+ */
+static wchar_t *make_environment_block(char **env)
+{
+ wchar_t *wenvblk = NULL;
+ int count = 0;
+ char **e, **tmpenv;
+ int i = 0, size = 0, envblksz = 0, envblkpos = 0;
+
+ for (e = env; *e; e++)
+ count++;
+
+ /* environment must be sorted */
+ tmpenv = xmalloc(sizeof(*tmpenv) * (count + 1));
+ memcpy(tmpenv, env, sizeof(*tmpenv) * (count + 1));
+ qsort(tmpenv, count, sizeof(*tmpenv), compareenv);
+
+ /* create environment block from temporary environment */
+ for (i = 0; tmpenv[i]; i++) {
+ size = 2 * strlen(tmpenv[i]) + 2;


+ ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t), envblksz);

+ envblkpos += utftowcs(&wenvblk[envblkpos], tmpenv[i], size) + 1;


+ }
+ /* add final \0 terminator */
+ wenvblk[envblkpos] = 0;

+ free(tmpenv);
+ return wenvblk;
+}
+


struct pinfo_t {
struct pinfo_t *next;
pid_t pid;

@@ -1082,29 +1112,8 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,


utftowcs(wargs, args.buf, 2 * args.len + 1);
strbuf_release(&args);

- if (env) {


- int count = 0;

- char **e, **sorted_env;
- int i = 0, size = 0, envblksz = 0, envblkpos = 0;
-
- for (e = env; *e; e++)
- count++;
-
- /* environment must be sorted */
- sorted_env = xmalloc(sizeof(*sorted_env) * (count + 1));
- memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
- qsort(sorted_env, count, sizeof(*sorted_env), compareenv);
-
- /* create environment block from temporary environment */
- for (i = 0; sorted_env[i]; i++) {
- size = 2 * strlen(sorted_env[i]) + 2;
- ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t), envblksz);
- envblkpos += utftowcs(&wenvblk[envblkpos], sorted_env[i], size) + 1;
- }
- /* add final \0 terminator */
- wenvblk[envblkpos] = 0;
- free(sorted_env);
- }
+ if (env)
+ wenvblk = make_environment_block(env);



memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:13 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
The only public spawn function that needs to tweak the environment is
mingw_spawnvpe (called from start_command). Nevertheless, all internal
spawn* functions take an env parameter and needlessly pass the global
char **environ around. Remove the env parameter where it's not needed.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 22 ++++++++--------------
1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 7c8a1bc..dadabeb 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1060,10 +1060,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,


return (pid_t)pi.dwProcessId;
}

-static pid_t mingw_spawnve(const char *cmd, const char **argv, char **env,
- int prepend_cmd)
+static pid_t mingw_spawnv(const char *cmd, const char **argv, int prepend_cmd)
{
- return mingw_spawnve_fd(cmd, argv, env, NULL, prepend_cmd, 0, 1, 2);

+ return mingw_spawnve_fd(cmd, argv, environ, NULL, prepend_cmd, 0, 1, 2);


}

pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,

@@ -1105,7 +1104,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,


return pid;
}

-static int try_shell_exec(const char *cmd, char *const *argv, char **env)
+static int try_shell_exec(const char *cmd, char *const *argv)
{
const char *interpr = parse_interpreter(cmd);
char **path;

@@ -1123,7 +1122,7 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)


argv2 = xmalloc(sizeof(*argv) * (argc+1));
argv2[0] = (char *)cmd; /* full path to the script file */
memcpy(&argv2[1], &argv[1], sizeof(*argv) * argc);
- pid = mingw_spawnve(prog, argv2, env, 1);
+ pid = mingw_spawnv(prog, argv2, 1);
if (pid >= 0) {
int status;
if (waitpid(pid, &status, 0) < 0)

@@ -1138,13 +1137,13 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)


return pid;
}

-static void mingw_execve(const char *cmd, char *const *argv, char *const *env)
+void mingw_execv(const char *cmd, char *const *argv)
{
/* check if git_command is a shell script */
- if (!try_shell_exec(cmd, argv, (char **)env)) {
+ if (!try_shell_exec(cmd, argv)) {
int pid, status;

- pid = mingw_spawnve(cmd, (const char **)argv, (char **)env, 0);
+ pid = mingw_spawnv(cmd, (const char **)argv, 0);
if (pid < 0)
return;
if (waitpid(pid, &status, 0) < 0)

@@ -1159,7 +1158,7 @@ void mingw_execvp(const char *cmd, char *const *argv)


char *prog = path_lookup(cmd, path, 0);

if (prog) {
- mingw_execve(prog, argv, environ);
+ mingw_execv(prog, argv);
free(prog);
} else
errno = ENOENT;

@@ -1167,11 +1166,6 @@ void mingw_execvp(const char *cmd, char *const *argv)


free_path_split(path);
}

-void mingw_execv(const char *cmd, char *const *argv)
-{
- mingw_execve(cmd, argv, environ);
-}
-
int mingw_kill(pid_t pid, int sig)
{
if (pid > 0 && sig == SIGTERM) {

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:17 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
When spawning child processes via start_command(), the environment and all
environment entries are copied twice. First by make_augmented_environ /
copy_environ to merge with child_process.env. Then a second time by
make_environment_block to create a sorted environment block string as
required by CreateProcess.

Move the merge logic to make_environment_block so that we only need to copy
the environment once. This changes semantics of the env parameter: it now
expects a delta (such as child_process.env) rather than a full environment.
This is not a problem as the parameter is only used by start_command()
(all other callers previously passed char **environ, and now pass NULL).

The merge logic no longer xstrdup()s the environment strings, so do_putenv
must not free them. Add a parameter to distinguish this from normal putenv.

Remove the now unused make_augmented_environ / free_environ API.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 76 ++++++++++++++++++-------------------------------------
compat/mingw.h | 6 ----
run-command.c | 10 +------
3 files changed, 27 insertions(+), 65 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 303b661..aeae266 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -768,7 +768,7 @@ static int lookupenv(char **env, const char *name, size_t nmln)


/*
* If name contains '=', then sets the variable, otherwise it unsets it
*/

-static char **do_putenv(char **env, const char *name)
+static char **do_putenv(char **env, const char *name, int free_old)


{
char *eq = strchrnul(name, '=');

int i = lookupenv(env, name, eq-name);

@@ -783,7 +783,8 @@ static char **do_putenv(char **env, const char *name)


}
}
else {
- free(env[i]);

+ if (free_old)
+ free(env[i]);
if (*eq)


env[i] = (char*) name;

else
@@ -820,7 +821,7 @@ char *mingw_getenv(const char *name)

int mingw_putenv(const char *namevalue)
{
- environ = do_putenv(environ, namevalue);
+ environ = do_putenv(environ, namevalue, 1);
return 0;
}

@@ -1007,21 +1008,30 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
}

/*
- * Create environment block suitable for CreateProcess.
+ * Create environment block suitable for CreateProcess. Merges current
+ * process environment and the supplied environment changes.
*/
-static wchar_t *make_environment_block(char **env)
+static wchar_t *make_environment_block(char **deltaenv)
{
wchar_t *wenvblk = NULL;
int count = 0;
- char **e, **tmpenv;
+ char **tmpenv;


int i = 0, size = 0, envblksz = 0, envblkpos = 0;

- for (e = env; *e; e++)
+ while (environ[count])
count++;



- /* environment must be sorted */

+ /* copy the environment */


tmpenv = xmalloc(sizeof(*tmpenv) * (count + 1));

- memcpy(tmpenv, env, sizeof(*tmpenv) * (count + 1));
+ memcpy(tmpenv, environ, sizeof(*tmpenv) * (count + 1));
+
+ /* merge supplied environment changes into the temporary environment */
+ for (i = 0; deltaenv && deltaenv[i]; i++)
+ tmpenv = do_putenv(tmpenv, deltaenv[i], 0);


+
+ /* environment must be sorted */

+ for (count = 0; tmpenv[count]; count++)


+ ;
qsort(tmpenv, count, sizeof(*tmpenv), compareenv);

/* create environment block from temporary environment */

@@ -1044,7 +1054,7 @@ struct pinfo_t {
struct pinfo_t *pinfo = NULL;
CRITICAL_SECTION pinfo_cs;

-static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
+static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaenv,


const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{

@@ -1112,8 +1122,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,


utftowcs(wargs, args.buf, 2 * args.len + 1);
strbuf_release(&args);

- if (env)

- wenvblk = make_environment_block(env);
+ wenvblk = make_environment_block(deltaenv);



memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,

@@ -1151,10 +1160,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,



static pid_t mingw_spawnv(const char *cmd, const char **argv, int prepend_cmd)
{

- return mingw_spawnve_fd(cmd, argv, environ, NULL, prepend_cmd, 0, 1, 2);
+ return mingw_spawnve_fd(cmd, argv, NULL, NULL, prepend_cmd, 0, 1, 2);
}

-pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
+pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv,
const char *dir,


int fhin, int fhout, int fherr)
{

@@ -1178,14 +1187,14 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
pid = -1;
}
else {
- pid = mingw_spawnve_fd(iprog, argv, env, dir, 1,
+ pid = mingw_spawnve_fd(iprog, argv, deltaenv, dir, 1,
fhin, fhout, fherr);
free(iprog);
}
argv[0] = argv0;
}
else
- pid = mingw_spawnve_fd(prog, argv, env, dir, 0,
+ pid = mingw_spawnve_fd(prog, argv, deltaenv, dir, 0,
fhin, fhout, fherr);
free(prog);
}
@@ -1274,41 +1283,6 @@ int mingw_kill(pid_t pid, int sig)
return -1;
}

-static char **copy_environ(void)
-{
- char **env;
- int i = 0;
- while (environ[i])
- i++;
- env = xmalloc((i+1)*sizeof(*env));
- for (i = 0; environ[i]; i++)
- env[i] = xstrdup(environ[i]);
- env[i] = NULL;


- return env;
-}
-

-void free_environ(char **env)
-{
- int i;
- for (i = 0; env[i]; i++)
- free(env[i]);
- free(env);
-}
-
-/*
- * Copies global environ and adjusts variables as specified by vars.
- */
-char **make_augmented_environ(const char *const *vars)
-{
- char **env = copy_environ();
-
- while (*vars) {
- const char *v = *vars++;
- env = do_putenv(env, strchr(v, '=') ? xstrdup(v) : v);


- }
- return env;
-}
-
/*

* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it is null. It

diff --git a/compat/mingw.h b/compat/mingw.h
index 131d2ec..3913b34 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -331,12 +331,6 @@ void mingw_open_html(const char *path);
void mingw_mark_as_git_dir(const char *dir);
#define mark_as_git_dir mingw_mark_as_git_dir

-/*
- * helpers
- */
-
-char **make_augmented_environ(const char *const *vars);
-void free_environ(char **env);

/**


* Converts UTF-8 encoded string to UTF-16LE.

diff --git a/run-command.c b/run-command.c
index a8be1b1..112489e 100644
--- a/run-command.c
+++ b/run-command.c
@@ -316,7 +316,6 @@ fail_pipe:
{
int fhin = 0, fhout = 1, fherr = 2;
const char **sargv = cmd->argv;
- char **env = environ;

if (cmd->no_stdin)
fhin = open("/dev/null", O_RDWR);
@@ -341,23 +340,18 @@ fail_pipe:
else if (cmd->out > 1)
fhout = dup(cmd->out);

- if (cmd->env)
- env = make_augmented_environ(cmd->env);
-
if (cmd->git_cmd) {
cmd->argv = prepare_git_cmd(cmd->argv);
} else if (cmd->use_shell) {
cmd->argv = prepare_shell_cmd(cmd->argv);
}

- cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, env, cmd->dir,
- fhin, fhout, fherr);
+ cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, (char**) cmd->env,
+ cmd->dir, fhin, fhout, fherr);
failed_errno = errno;
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
error("cannot spawn %s: %s", cmd->argv[0], strerror(errno));

- if (cmd->env)
- free_environ(env);
if (cmd->git_cmd)
free(cmd->argv);

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:19 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
The Windows environment is sorted, keep it that way for O(log n)
environment access.

Change compareenv to compare only the keys, so that it can be used to
find an entry irrespective of the value.

Change lookupenv to binary seach for an entry. Return one's complement of
the insert position if not found (libc's bsearch return's NULL). If a
case-insensitive match is found, try to find a better case-sensitive match
around that entry.

Change do_putenv to insert new entries at the correct position. Simplify
the function by swapping if conditions and using memmove instead of for
loops.

Move qsort from make_environment_block to mingw_startup. We still need to
sort on startup to make sure that the environment is sorted according to
our compareenv function (while Win32 / CreateProcess requires the
environment block to be sorted case-insensitively, CreateProcess currently
doesn't enforce this, and some applications such as bash just don't care).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 110 ++++++++++++++++++++++++++++++++++++--------------------
1 files changed, 71 insertions(+), 39 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 0f85f12..dc905bb 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -737,32 +737,65 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}

-static int compareenv(const void *a, const void *b)
+/*
+ * Compare environment entries by key (i.e. stopping at '=' or '\0').
+ */
+static int compareenv(const void *v1, const void *v2)
{
- char *const *ea = a;
- char *const *eb = b;
- return strcasecmp(*ea, *eb);
+ const char *e1 = *(const char**)v1;
+ const char *e2 = *(const char**)v2;
+
+ for (;;) {
+ int c1 = *e1++;
+ int c2 = *e2++;
+ c1 = (c1 == '=') ? 0 : tolower(c1);
+ c2 = (c2 == '=') ? 0 : tolower(c2);
+ if (c1 > c2)
+ return 1;
+ if (c1 < c2)
+ return -1;
+ if (c1 == 0)
+ return 0;
+ }
}

-static int lookupenv(char **env, const char *name, size_t nmln)
+static int bsearchenv(char **env, const char *name, size_t size)
{
- int i;
+ unsigned low = 0, high = size - 1, mid, i, j, nmln;
+ int cmp;
+ while (low <= high) {
+ mid = (low + high) >> 1;
+ cmp = compareenv(&env[mid], &name);
+ if (cmp < 0)
+ low = mid + 1;
+ else if (cmp > 0)
+ high = mid - 1;
+ else
+ goto found;
+ }
+ return ~low; /* not found, return 1's complement of insert position */
+
+found:
+ /* look for duplicate entries before and after the matching entry (mid) */
+ i = j = mid;
+ while (i > low && !compareenv(&env[i - 1], &name))
+ i--;
+ while (j < high && !compareenv(&env[j + 1], &name))
+ j++;
+ if (i == j)
+ return mid; /* no duplicates */
+
/*


* try case-sensitive match first, this is necessary for sh-i18n--envsubst

* to work with environment variables that differ only in case (e.g. $PATH

* and $path)
*/
- for (i = 0; env[i]; i++) {
+ nmln = strlen(name);
+ for (; i <= j; i++)


if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])

/* matches */


return i;
- }
- /* if there's no case-sensitive match, try case-insensitive instead */

- for (i = 0; env[i]; i++) {

- if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
- return i;
- }
- return -1;

+ return mid;
}

/*
@@ -772,26 +805,24 @@ static int lookupenv(char **env, const char *name, size_t nmln)
*/
static int do_putenv(char **env, const char *name, int size, int free_old)
{
- char *eq = strchrnul(name, '=');
- int i = lookupenv(env, name, eq-name);
+ int i = bsearchenv(env, name, size - 1);
+
+ /* optionally free removed / replaced entry */
+ if (i >= 0 && free_old)
+ free(env[i]);



- if (i < 0) {
- if (*eq) {

- env[size - 1] = (char*) name;
- env[size] = NULL;
+ if (strchr(name, '=')) {
+ /* if new value ('key=value') is specified, insert or replace entry */
+ if (i < 0) {
+ i = ~i;
+ memmove(&env[i + 1], &env[i], (size - i) * sizeof(char*));
size++;
}
- }
- else {
- if (free_old)
- free(env[i]);


- if (*eq)
- env[i] = (char*) name;
- else {

- for (; env[i]; i++)
- env[i] = env[i+1];
- size--;
- }
+ env[i] = (char*) name;
+ } else if (i >= 0) {
+ /* otherwise ('key') remove existing entry */
+ size--;
+ memmove(&env[i], &env[i + 1], (size - i) * sizeof(char*));
}
return size;
}
@@ -803,11 +834,12 @@ static int environ_alloc = 0;



static char *do_getenv(const char *name)
{

- size_t len = strlen(name);
- int i = lookupenv(environ, name, len);


- if (i >= 0)

- return environ[i] + len + 1; /* skip past name and '=' */
- return NULL;
+ char *value;
+ int pos = bsearchenv(environ, name, environ_size - 1);
+ if (pos < 0)
+ return NULL;
+ value = strchr(environ[pos], '=');
+ return value ? &value[1] : NULL;
}

char *mingw_getenv(const char *name)
@@ -1036,9 +1068,6 @@ static wchar_t *make_environment_block(char **deltaenv)


for (i = 0; deltaenv && deltaenv[i]; i++)

size = do_putenv(tmpenv, deltaenv[i], size, 0);



- /* environment must be sorted */

- qsort(tmpenv, size - 1, sizeof(char*), compareenv);
-


/* create environment block from temporary environment */

for (i = 0; tmpenv[i]; i++) {

size = 2 * strlen(tmpenv[i]) + 2;

@@ -2092,6 +2121,9 @@ void mingw_startup()


environ[i] = NULL;
free(buffer);

+ /* sort environment for O(log n) getenv / putenv */
+ qsort(environ, i, sizeof(char*), compareenv);
+


/* initialize critical section for waitpid pinfo_t list */

InitializeCriticalSection(&pinfo_cs);

--
1.7.6.msysgit.1

Karsten Blees

unread,
Oct 6, 2011, 1:41:20 PM10/6/11
to bl...@dcon.de, msy...@googlegroups.com
...rather than checking for special values on every getenv call.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 33 ++++++++++++++++-----------------
1 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index dc905bb..77ea5d9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -832,7 +832,7 @@ static int environ_size = 0;


/* allocated size of environ array, in bytes */

static int environ_alloc = 0;

-static char *do_getenv(const char *name)

+char *mingw_getenv(const char *name)
{
char *value;


int pos = bsearchenv(environ, name, environ_size - 1);

@@ -842,22 +842,6 @@ static char *do_getenv(const char *name)


return value ? &value[1] : NULL;
}

-char *mingw_getenv(const char *name)
-{
- char *result = do_getenv(name);
- if (!result && !strcmp(name, "TMPDIR")) {
- /* on Windows it is TMP and TEMP */
- result = do_getenv("TMP");
- if (!result)
- result = do_getenv("TEMP");
- }
- else if (!result && !strcmp(name, "TERM")) {
- /* simulate TERM to enable auto-color (see color.c) */
- result = "winansi";
- }
- return result;
-}
-

int mingw_putenv(const char *namevalue)
{


ALLOC_GROW(environ, (environ_size + 1) * sizeof(char*), environ_alloc);

@@ -2124,6 +2108,21 @@ void mingw_startup()


/* sort environment for O(log n) getenv / putenv */

qsort(environ, i, sizeof(char*), compareenv);

+ /* fix Windows specific environment settings */
+


+ /* on Windows it is TMP and TEMP */

+ if (!getenv("TMPDIR")) {
+ const char *tmp = getenv("TMP");
+ if (!tmp)
+ tmp = getenv("TEMP");
+ if (tmp)
+ setenv("TMPDIR", tmp, 1);
+ }
+


+ /* simulate TERM to enable auto-color (see color.c) */

+ if (!getenv("TERM"))
+ setenv("TERM", "winansi", 1);

Kirill

unread,
Oct 6, 2011, 11:51:10 PM10/6/11
to Karsten Blees, msy...@googlegroups.com
Karsten,

Frankly, I only briefly looked at your series, but...
NO MATTER what others say, this is EXTREMELY IMPRESSIVE! Although, I'm not
affected by international home dir names and all-languages-mixed file names
in my source trees, I GREATLY appreciate your effort.

Wish you BEST OF LUCK with the series!

--
Kirill.


Jörg Rosenkranz

unread,
Oct 7, 2011, 2:42:21 AM10/7/11
to Karsten Blees, msy...@googlegroups.com
Sorry, this should have gone to the list too:

2011/10/6 Karsten Blees <bl...@dcon.de>:
> So far I haven't encountered any problems with the Unicode-aware msys.dll (http://groups.google.com/group/msysgit/browse_thread/thread/981da260b4fcb1df), and there was no feedback from the mailing list either (this could be good or bad :-).

To give you some positive feedback: I'm using your last Unicode
installer (Git-1.7.5-unicode-20110720) for some weeks now and haven't
encountered any problems yet.

Thanks for the effort!
Joerg.

Johannes Schindelin

unread,
Oct 7, 2011, 4:35:08 PM10/7/11
to Karsten Blees, msy...@googlegroups.com
Hi Karsten,

On Thu, 6 Oct 2011, Karsten Blees wrote:

> I just pushed v10 of the Unicode patches for msysgit to
> http://repo.or.cz/w/git/mingw/4msysgit/kblees.git.

Thanks for being so persistent with this issue.

For starters, however, I would like to integrate your work on "less"
first. Could you make that a branch in your repository, too?

Thanks!
Dscho

karste...@dcon.de

unread,
Oct 7, 2011, 9:02:15 PM10/7/11
to Johannes Schindelin, Karsten Blees, msy...@googlegroups.com
http://repo.or.cz/w/git/mingw/4msysgit/kblees.git.

Not exactly, as it's a git fork...just gave birth to kblees/msysgit on github instead:

less-444:
https://github.com/kblees/msysgit/commits/kb/msys/less
Unicode-aware msys.dll: https://github.com/kblees/msysgit/commits/kb/msys/unicode

I haven't tested this in isolation, though. I believe less and the msys.dll console handler need some common understanding of the current cursor position for scrolling to work correctly, and teaching UTF-8 to the msys.dll console handler is a major part of that patch...

However, updating the msys.dll without applying the git unicode patches as well will break some tests, starting with basics such as t0050-filesystem.

You could probably try the new less without the LESSCHARSET=utf-8 setting.

Ciao,
Karsten


OT: I'm still a bit confused about branches in the msysgit repo...I think "devel" is the development environment for git and other mingw-based stuff, while "msys" is for customized msys tools, with binaries cherry-picked to devel, is that basically correct? It gave me quite a headache that some msys stuff (such as src/rt) is also in the devel branch, but won't build there at all...

Konstantin Khomoutov

unread,
Oct 9, 2011, 11:32:36 AM10/9/11
to msy...@googlegroups.com, Karsten Blees
Hi!

I have just tested your 1.7.6 installer with integrated Unicode patches
on Russian Windows XP Pro SP3 and would like to report on my experience.

What's presented below, were done in a regular cmd.exe window.

I first created a test repo with a simple all-ASCII pathname
(C:\tmp\foo) and created a file with simple Cyrillic name "тест.txt"
("тест" stands for "test" in Russian).
`git add .` went OK, but display presented with `git status` and
`git ls-files` showed \xxx escapes for Cyrillic letters in the file
listed (I think that were escapes for UTF-8 bytes comprising the file
name in fact). So I googled the core.quotepath configuration option and
set it to false, now the filename was shown in some garbled encoding but
Git hinted me to set a Unicode font on the console, so I set Lucida.
That finally fixed displaying but for some reason did not prevented Git
from nagging me about my console supposedly not supporting Unicode
(setting Lucida "for all windows of the same type", that is, globally,
did not fix the problem).

Committing the added file went OK, but `git show` afterwards barked at me:

C:\tmp\foo>git show
WARNING: terminal is not fully functional
- (press RETURN)

Hitting return presented the diff where the filename appeared OK.
I have a feeling this might be a problem with less, not Git.

git-gui and gitk showed no problems displaying the added file name.
I then played with the repository a bit, renaming the file to another
Cyrillic name etc, verifying each stage with git-gui and gitk--all went
OK except for that non-unicode console and not fully-functional
terminal nagging.

I then logged in with a user with cyrillic username ("Администратор")
and created another repo--this time with the repo directory itself named
using Cyrillic letters.
Git had no apparent problems setting the user.email and user.name
configuration variables globally (via `git config --add --global ...`)
which are stored somewhere under %USERPROFILE% as I understand this.

Then I created a file in a subdirectory of the repo, with both the
subdirectory and the file named using Cyrillic letters.
Again, no problems beyond those described above with the git itself.
But then I tried to fire git-gui from inside the repo directory and it
failed to start with this error message:

---------------------------
Отсутствует рабочий каталог C:/tmp/директория:

couldn't change working directory to "C:/tmp/директория": no such file or directory
---------------------------

The first error, translated from Russian, means
"Missing working directory C:/tmp/...:"

The name of the directory looks rather typical to me: it's an UTF-8
string interpreted as containing characters in Windows-1251.
I did not look at the code yet but supposedly git-gui somehow asks git
about its working directory and it returns an UTF-8 encoded pathname.
If this pathname is read using a pipe from git's stdout, it may well be
interpreted as being in Windows-1251 as this is the standard Tcl's
behaviour on Windows--to assume the standard handles of spawned
processes operate with the currently active codepage.

gitk worked without any apparent problems.

The only remaining issue which manifested itself in all my experiments
is that dreaded commit encoding problem. I have gvim.exe (version 7.3)
as my $EDITOR and it assumes all files it opens on Windows are encoded
using the currently active codepage (cp1251, again).
Hence two problems:
* Filenames in the commit message file are displayed botched.
* If nothing is done to fix whis, like forced setting of the output
file's encoding to be UTF-8, commit message will be composed in the
system's codepage and will appear unreadable in both git's output and
its GUI tools.

I failed to verify the behaviour of notepad.exe with regard to commit
encodings because the commit message is generated with sole LFs as
line ending markers and so notepad shows it as one long string.


Some sort of summary follows.

So, here are slight technical glitches and some conceptual issues.

Techincal glitches are:
* Reporting about non-Unicode console while it has Unicode font set.
* Reporting something about broken terminal (less problem?).
* git-gui failing to run from within a directory with non-ASCII
characters in its name.
I beleive these can be fixed.

Now onto conceptual issues.
I take Subversion as being the gold standard in the field of supporting
system's native language on Windows among free VCS tools--with it,
filenames and commit messages "just work" in the eyes of a typical
inexperienced developer, both in untweaked console window and "in the GUI"
(everyone uses TortoiseSVN). Please note that's not too subjective:
every Windows developer I discussed the issue of supporting native
language in [D]VCS tools on Windows with agrees on Subversion as being
the only one doing it right (well, may be it's actually the only thing
it does right).

Compared to Subversion, Git for Windows with Unicode patch has these
issues:
* Console window has to be tweaked.
* core.quotepath config variable has to be set to false.
* The user has to somehow force their text editor to interpret and save
the commit message as being composed in UTF-8 or resort to using
i18n.*encoding variables.

I don't know if these issues can be solved at all to match the
experience developers get with Subversion but I thought they are
worth summarising here anyway.

All in all, this patch series considerably improves the current
situation, thanks Karsten and everyone involved!

(And I'll try to do more extensive testing between Windows and Linux
machines.)

karste...@dcon.de

unread,
Oct 9, 2011, 2:26:37 PM10/9/11
to Konstantin Khomoutov, Karsten Blees, msy...@googlegroups.com

Konstantin Khomoutov <flat...@users.sourceforge.net> wrote on 09.10.2011 17:32:36:

> Hi!
>
> I have just tested your 1.7.6 installer with integrated Unicode patches
> on Russian Windows XP Pro SP3 and would like to report on my experience.
>


Thanks for testing and the detailed feedback!

> C:\tmp\foo>git show
> WARNING: terminal is not fully functional
> -  (press RETURN)
>

That's a less warning if the terminal is lacking some capability, set TERM=cygwin or msys in ControlPanel. I had the same problem after installing StrawberryPerl, this added a global TERM=dumb setting.

> But then I tried to fire git-gui from inside the repo directory and it
> failed to start with this error message:
>
> ---------------------------
> Отсутствует рабочий каталог C:/tmp/директория:
>
> couldn't change working directory to "C:/tmp/директория":
> no such file or directory
> ---------------------------
>


I cannot reproduce that, git-gui works fine for me with non-ASCII user names. Perhaps that user's TEMP setting is broken?

> The only remaining issue which manifested itself in all my experiments
> is that dreaded commit encoding problem.  I have gvim.exe (version 7.3)
> as my $EDITOR and it assumes all files it opens on Windows are encoded
> using the currently active codepage (cp1251, again).
> Hence two problems:
> * Filenames in the commit message file are displayed botched.
> * If nothing is done to fix whis, like forced setting of the output
>   file's encoding to be UTF-8, commit message will be composed in the
>   system's codepage and will appear unreadable in both git's output and
>   its GUI tools.
>


I don't use [g]vim if I can help it, but the vim wiki suggests ":e ++enc=utf-8"...there's probably a .[g]vimrc to set the default encoding, I don't know.

> I failed to verify the behaviour of notepad.exe with regard to commit
> encodings because the commit message is generated with sole LFs as
> line ending markers and so notepad shows it as one long string.
>


Notepad.exe saves UTF-8 with BOM (and of course CRLF), so it's not really an option. I'm using Notepad++ as general purpose editor, that supports UTF-8 without BOM and LF-only line breaks (never tried it with git though).

> Techincal glitches are:
> * Reporting about non-Unicode console while it has Unicode font set.


On Windows XP, you need to set the console font of the default console window to a TrueType font (not the Git Bash window). That's because on XP there is no API to check the font, we have to rely on the registry (HKEY_CURRENT_USER/Console/FontFamily).

> Compared to Subversion, Git for Windows with Unicode patch has these
> issues:
> * Console window has to be tweaked.


I bet SVN has the exact same problems with fonts, as it's inherent to the Windows console. This may not be apparent as long as you stay within your OEM charset, but trying to list e.g. a russian filename on my german Windows box will definitely not work with a raster font.

> * core.quotepath config variable has to be set to false.


Core.quotepath=on is the default even on Unix, probably for good reason. So maybe this is an issue for the git mailing list?

> * The user has to somehow force their text editor to interpret and save
>   the commit message as being composed in UTF-8 or resort to using
>   i18n.*encoding variables.
>


Setting i18n.commitencoding will prevent specifying commit messages on the command line, and probably has other issues (e.g. mixing utf-8 user name with cp125x commit message). This should be fixed on the editor side I think.

karste...@dcon.de

unread,
Oct 9, 2011, 3:04:43 PM10/9/11
to Karsten Blees, bl...@dcon.de, msy...@googlegroups.com

I found a bug in the new case-sensitive bsearchenv code: 'name' can be a "key=value" pair from putenv, but we only want to compare the key, so it should be:
@@ -790,7 +790,7 @@ found:
         * to work with environment variables that differ only in case (e.g. $PATH
         * and $path)
         */
-       nmln = strlen(name);
+       nmln = strchrnul(name, '=') - name;
        for (; i <= j; i++)
                if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
                        /* matches */

The Unicode patches with this fix, rebased to tentative (i.e. v1.7.7) are here:
http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v11

And the git installer for this version (Unicode msys.dll, less-444, git v1.7.7 with unicode-v11 patches):
https://docs.google.com/uc?id=0BxXUoUg2r8ftYWQ2NGY2MTItY2E2ZC00NTk5LThiNTEtZDMzOTA1YmFmN2Fl&export=download

$ /share/msysGit/run-tests.sh
These tests failed:
t1020-subdirectory
t1402-check-ref-format
t5510-fetch
t5701-clone-local
t5704-bundle
t9001-send-email
t9300-fast-import
t9901-git-web--browse
Unfinished tests:
t9200-git-cvsexportcommit



I've also put together a small Git for Windows Unicode Howto. As the Wiki is still down, I'll post it here...

= Git for Windows Unicode Howto =

== Migrating old Git for Windows repositories ==

This is only relevant if you used non-ASCII file names with previous Git for Windows versions (V1.7.6 and earlier).

Previous Git for Windows versions stored file names in the default encoding of the originating Windows system, making these repositories incompatible with other Windows language-versions and other Git versions (including Cygwin-git and JGit / EGit on Windows).

The Unicode-enabled Git for Windows stores file names UTF-8 encoded.

=== Migration with previous Git for Windows version available ===

The simplest way to convert old repositories is by keeping an old Git for Windows version around:

# With the old Git for Windows version: check out a completely clean state of the working copy (so <tt>git status</tt> reports nothing, not even untracked files). This can be achieved by manually deleting everything except the .git directory, and then doing a <tt>git reset --hard</tt>.
# With the new Git for Windows version: <tt>git status</tt> with the new version should now report non-ASCII file names as deleted (with mangled file name) as well as untracked (with correct file name). Add the changes to the repository with <tt>git add --all & git commit -m "UTF-8 conversion"</tt>.
# Repeat for every branch of interest

=== Migration without previous Git for Windows available ===

This requires renaming all non-ASCII file names manually.

# <tt>git status</tt> will report non-ASCII file names as deleted (with mangled names) as well as untracked (also with mangled names). Fix the file names in the working copy.
# Add the changes to the repository, e.g. with <tt>git add --all & git commit -m "UTF-8 conversion"</tt>.
# Repeat for every branch of interest

=== Convert config files ===

Git config files with non-ASCII content need to be converted to UTF-8, for example your name in %HOME%/.gitconfig, or .gitattributes / .gitignore / .gitmodules files pointing to non-ASCII paths.

== Settings ==

=== Windows settings ===

==== Console font (per user) ====

The default console font does not support Unicode. Change the console font to a TrueType font such as Lucida Console or Consolas. The setup program does this automatically, but only for the installing user.

=== Git settings ===

These can be set per user (with the --global option) or per repository, the repository settings take precedence.

==== Disable quoted file names ====

By default, git will print non-ASCII file names in quoted octal notation, i.e. "\nnn\nnn...". This can be disabled with

 git config [--global] core.quotepath off

==== Disable commit message transcoding ====

Previous Git for Windows required to set the <tt>i18n.logoutputencoding</tt> to your Windows system's default OEM encoding for proper console output of non-ASCII commit messages. This is no longer necessary. Remove this or set it to 'utf-8':

 git config [--global] i18n.logoutputencoding utf-8

The i18n.commitencoding setting should also be removed or set to 'utf-8' to support commit messages on the command line (<tt>git commit -m "..."</tt> from cmd.exe, MSYS bash won't let you enter non-ASCII characters):

 git config [--global] i18n.commitencoding utf-8

If you want to use a custom text editor to enter commit messages, find one that supports Unix line breaks (LF only) and can save UTF-8 without BOM (so Windows notepad.exe is a bad choice). The default MSYS vim won't let you enter non-ASCII characters.

Erik Faye-Lund

unread,
Oct 10, 2011, 10:10:21 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:40 PM, Karsten Blees <bl...@dcon.de> wrote:
> MSVC warns about implicit casts from const** to void* (which should be OK,
> as void* indicates the content is treated as opaque). E.g.
>
> const char *x;
> realloc(x, ...); // warning C4090 due to const** -> void*
>
> As there's not that much MSVC-only code, the 'real' const to non-const
> casts that we want to know about will be caught by GCC builds anyway, so
> disable this warning.
>
> MSVC link.exe doesn't recognize the /wd option in CFLAGS in the Makefile,
> so use #pragma warning in central, MSVC-specific unistd.h.
>

I think compat/msvc.h is the correct place to put this, not in our
placement unistd.h.

Erik Faye-Lund

unread,
Oct 10, 2011, 10:15:52 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:40 PM, Karsten Blees <bl...@dcon.de> wrote:
> Git code sometimes uses uninitialized variables to initialize themselves
> (i.e. 'type varname = varname;'). According to some commit messages, this
> is supposedly to suppress the 'using uninitialized variable' warning
> (which is quite paradox). GCC currently doesn't seem to catch this, but
> MSVC does. Initialize with some arbitrary value instead to silence the
> compilers.

While I'm in favor of doing these kind of changes, patches like this
have caused some controversy upstream earlier. I'm not entirely sure
what the preferred solution upstream is anymore, though.

The problem with doing changes like this is that it increase the delta
against the upstream without much gain. Can we do something like what
the previous patch did instead, i.e "#pragma warning(disable : XXXX)"?

Erik Faye-Lund

unread,
Oct 10, 2011, 10:20:52 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:40 PM, Karsten Blees <bl...@dcon.de> wrote:
> Include io.h from unistd.h, as in MinGW, so that mktemp is available.

mktemp should be defined by stdlib.h, not unistd.h:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/mktemp.html

But perhaps we don't support overriding headers that exist in the
Windows SDK? If that is the case, I think this would be better to
place in compat/msvc.h...

Erik Faye-Lund

unread,
Oct 10, 2011, 10:26:51 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:40 PM, Karsten Blees <bl...@dcon.de> wrote:
> Dynamic linking is generally preferred over static linking, and MSVCRT.dll
> has been integral part of Windows for a long time.
>
> This also fixes linker warnings for _malloc and _free in zlib.lib, which
> seems to be compiled for MSVCRT.dll already.
>
> The DLL version also exports some of the CRT initialization functions,
> which are hidden in the static libcmt.lib (e.g. __wgetmainargs, required by
> subsequent Unicode patches).

>
> Signed-off-by: Karsten Blees <bl...@dcon.de>
> ---
>  Makefile |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 234d0af..4594d81 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1133,16 +1133,16 @@ ifeq ($(uname_S),Windows)
>                compat/win32/pthread.o compat/win32/syslog.o \
>                compat/win32/sys/poll.o compat/win32/dirent.o
>        COMPAT_CFLAGS = -D__USE_MINGW_ACCESS -DNOGDI -DHAVE_STRING_H -DHAVE_ALLOCA_H -Icompat -Icompat/regex -Icompat/win32 -DSTRIP_EXTENSION=\".exe\"
> -       BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE -NODEFAULTLIB:MSVCRT.lib
> +       BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE
>        EXTLIBS = user32.lib advapi32.lib shell32.lib wininet.lib ws2_32.lib
>        PTHREAD_LIBS =
>        lib =
>  ifndef DEBUG
> -       BASIC_CFLAGS += -GL -Os -MT
> +       BASIC_CFLAGS += -GL -Os -MD
>        BASIC_LDFLAGS += -LTCG
>        AR += -LTCG
>  else
> -       BASIC_CFLAGS += -Zi -MTd
> +       BASIC_CFLAGS += -Zi -MDd
>  endif
>        X = .exe
>  endif
> --
> 1.7.6.msysgit.1
>
>

Looks good to me. But if we're splitting out an MSVC topic, then this
should probably be included in that ;)

Erik Faye-Lund

unread,
Oct 10, 2011, 10:31:49 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> Replaces Windows "ANSI" APIs dealing with file- or path names with their
> Unicode equivalent, adding UTF-8/UTF-16LE conversion as necessary.
>
> The dirent API (opendir/readdir/closedir) is updated in a separate commit.
>

Does splitting this step in two produce a Git for Windows that works?
If not, it'll be unfortunate for bisection later on...

Johannes Schindelin

unread,
Oct 10, 2011, 10:46:54 AM10/10/11
to Erik Faye-Lund, Karsten Blees, msy...@googlegroups.com
Hi,

I'd prefer the #pragma approach, too. It is my hope that at least someone
regularly uses the valgrind mode I worked so hard on, so we should be
reasonably sure that we're more clever than the compiler regarding those
"uninitialized" values.

And I guess that the reasoning is that initializing values to rogue values
does not buy us anything except maybe a performance hit.

Ciao,
Dscho

Erik Faye-Lund

unread,
Oct 10, 2011, 10:49:43 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> Add Unicode conversion functions to convert between Windows native UTF-16LE
> encoding to UTF-8 and back.
>
> To support repositories with legacy-encoded file names, the UTF-8 to UTF-16
> conversion function tries to create valid, unique file names even for
> invalid UTF-8 byte sequences, so that these repositories can be checked out
> without error.
>
> The current implementation leaves invalid UTF-8 bytes in range 0xa0 - 0xff
> as is (producing printable Unicode chars \u00a0 - \u00ff, equivalent to
> ISO-8859-1), and converts 0x80 - 0x9f to hex-code (\u0080 - \u009f are
> control chars).
>
> The Windows MultiByteToWideChar API was not used as it either drops invalid
> UTF-8 sequences (on Win2k/XP; producing non-unique or even empty file
> names) or converts them to the replacement char \ufffd (Vista/7; causing
> ERROR_INVALID_NAME in subsequent calls to file system APIs).

>
> Signed-off-by: Karsten Blees <bl...@dcon.de>
> ---
>  compat/mingw.c |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  compat/mingw.h |   77 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 162 insertions(+), 0 deletions(-)
>
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 57d41dd..ba7ac27 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -1869,6 +1869,91 @@ int mingw_offset_1st_component(const char *path)
>        return offset + is_dir_sep(path[offset]);
>  }
>
> +int utftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen)
> +{
> +       int upos = 0, wpos = 0;
> +       const unsigned char *utf = (const unsigned char*) utfs;
> +       if (!utf || !wcs || wcslen < 1) {
> +               errno = EINVAL;
> +               return -1;
> +       }
> +       /* reserve space for \0 */
> +       wcslen--;
> +       if (utflen < 0)
> +               utflen = INT_MAX;
> +
> +       while (upos < utflen) {
> +               int c = utf[upos++] & 0xff;
> +               if (utflen == INT_MAX && c == 0)
> +                       break;
> +
> +               if (wpos >= wcslen) {
> +                       wcs[wpos] = 0;
> +                       errno = ENAMETOOLONG;
> +                       return -1;

You have earlier[1] said that this function has the semantics of
wcstombs. But this part isn't. wcstombs does not apply filename
semantics to buffer sizes. In fact[2] it does not even error if the
string is too long (but it does return the same value as passed in as
'n', which can be used to tell us that we overflowed the buffer). So I
think it's better to modify this so that it behaves like wcstombs.

[1]: http://markmail.org/message/f7ckzoopqafywj3k
[2]: http://pubs.opengroup.org/onlinepubs/009604599/functions/wcstombs.html

Erik Faye-Lund

unread,
Oct 10, 2011, 10:55:17 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> Assumes file names in git tree objects are UTF-8 encoded.
>
> On most unix systems, the system encoding (and thus the TCL system
> encoding) will be UTF-8, so file names will be displayed correctly.
>
> On Windows, it is impossible to set the system encoding to UTF-8. Changing
> the TCL system encoding (via 'encoding system ...', e.g. in the startup
> code) is explicitly discouraged by the TCL docs.
>
> Change gitk and git-gui functions dealing with file names to always convert
> from and to UTF-8.
>
> Signed-off-by: Karsten Blees <bl...@dcon.de>

This reminds me of "[PATCH v10 07/28] gitk: fix file name encoding in
diff hunk headers"... is there a reason why this is split in two
patches (or not coming directly after each other, at least)?

Erik Faye-Lund

unread,
Oct 10, 2011, 11:02:14 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> Convert command line arguments from UTF-16 to UTF-8 on startup.

When doing input and output in two different patches, don't we end up
breaking locale-dependent repos for a little while (i.e where one git
tool gets a UTF-8 filename when it expects a Windows 1252 filename
from some other git tool)? It's probably not a big deal, but it'd be
nice if people who have issues with Windows 1252 repos could safely
bisect their issues past these commits also...

Erik Faye-Lund

unread,
Oct 10, 2011, 11:05:54 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> Replace stdout and stderr with a pipe if they point to the console. A
> background thread reads from the pipe, handles ANSI escape sequences and
> UTF-8 to UTF-16 conversion, then writes to the console.
>
> Global variables are either initialized on startup (single threaded) or
> exclusively modified by the background thread. Threads communicate through
> the pipe, no further synchronization is necessary.
>
> Due to the byte-oriented pipe, ANSI escape sequences can no longer be
> expected to arrive in one piece. Replace the string-based ansi_emulate()
> with a simple stateful parser (this also fixes colored diff hunk headers,
> which were broken as of commit 2efcc977).
>
> Override isatty to return true for the pipes redirecting to the console.
>
> Exec/spawn obtain the original console handle to pass to the next process
> via winansi_get_osfhandle().
>
> All other overrides are gone, the default stdio implementations work as
> expected with the piped stdout/stderr descriptors.
>
> Limitations: doesn't track reopened or duped file descriptors, i.e.:
> - fdopen(1/2) returns fully buffered streams
> - dup(1/2), dup2(1/2) returns normal pipe descriptors (i.e. isatty() =
>  false, winansi_get_osfhandle won't return the original console handle)
>
> Currently, only the git-format-patch command uses xfdopen(xdup(1)) (see
> "realstdout" in builtin/log.c), but works well with these limitations.
>
> Signed-off-by: Karsten Blees <bl...@dcon.de>

This commit message doesn't quite work for me; it describes WHAT the
patch does, but not WHY. The what-part is already covered pretty well
by the code; can you please describe the problem it solves instead of
how it solves some problem I don't know what is? :)

Erik Faye-Lund

unread,
Oct 10, 2011, 11:09:57 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> Convert environment from UTF-16 to UTF-8 on startup.
>
> No changes to getenv() are necessary, as the MSVCRT version is implemented
> on top of char **environ.
>
> However, putenv / _wputenv from MSVCRT no longer work, for two reasons:
> 1. they try to keep environ, _wenviron and the Win32 process environment
> in sync, using the default system encoding instead of UTF-8 to convert
> between charsets
> 2. msysgit and MSVCRT use different allocators, memory allocated in git
> cannot be freed by the CRT and vice versa
>
> Implement mingw_putenv using the env_setenv helper function from the
> environment merge code.
>
> Signed-off-by: Karsten Blees <bl...@dcon.de>

Again, I suspect that we might end up breaking Windows 1252 encoded
repos here for a single revision. Squash the incoming/outgoing
patches, perhaps?

Erik Faye-Lund

unread,
Oct 10, 2011, 11:26:16 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> On Windows, all native APIs are Unicode-based. It is impossible to pass
> legacy encoded byte arrays to a process via command line or environment
> variables. Disable the tests that try to do so.

So, the earlier patches didn't update the test suite as they went
along and broke stuff that used to work? If so, that's not how we do
it: this makes it impossible to bisect breakages in the test-suite.

But I don't get why this is needed. Surely it was possible to pass the
environment variables as 8859-1 earlier, since these tests used to
work. What changed? Is this change needed after MSYS was patched to do
Unicode? In that case, I doubt the MSYS patch is correct; if people on
unix can pass 8859-1 encoded environment variables from a
shell-script, we should be able to do so too, no?

If I understand the tests correctly it's about respecting
i18n.commitencoding, which is something I expected this series to help
us do BETTER, not worse...

Erik Faye-Lund

unread,
Oct 10, 2011, 11:30:59 AM10/10/11
to Karsten Blees, msy...@googlegroups.com
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> All functions that modify the environment have memory leaks.
>
> Disable gitunsetenv in the Makefile and use env_setenv (via mingw_putenv)
> instead (this frees removed environment entries).
>
> Move xstrdup from env_setenv to make_augmented_environ, so that
> mingw_putenv no longer copies the environment entries (according to POSIX
> [1], "the string [...] shall become part of the environment"). This also
> fixes the memory leak in gitsetenv, which expects a POSIX compliant putenv.
>
> [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/putenv.html
>
> Signed-off-by: Karsten Blees <bl...@dcon.de>

This is a bugfix for 'devel', and not strictly speaking Unicode
related, no? If so, I would have expected it to be earlier in the
series...