I just pushed v10 of the Unicode patches for msysgit to http://repo.or.cz/w/git/mingw/4msysgit/kblees.git.
A git installer that bundles the Unicode msys.dll, less-444 and git v1.7.6 with unicode-v10 patches is available here: https://docs.google.com/uc?id=0BxXUoUg2r8ftMjZlZDBjYjktNTE2Yi00NjExLTk0MzctMTMwZjQzNDkyNmZl&export=download
So far I haven't encountered any problems with the Unicode-aware msys.dll (http://groups.google.com/group/msysgit/browse_thread/thread/981da260b4fcb1df), and there was no feedback from the mailing list either (this could be good or bad :-).
Ciao,
Karsten
Branches of interest:
kb/unicode-v9-full-2cbf7794
(http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v9-full-2cbf7794)
This is v9 rebased to current devel, with all conflicts resolved in favor of v9 (for reference).
kb/unicode-v10-minimal
(http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v10-minimal)
Unicode support with minimal environment changes.
Changes since kb/unicode-v9-minimal:
- removed patches that have already been merged
- utftowcsn: fixed some advanced checks by using unsigned char * for the UTF-8 string
- reenabled t5505
- changed tests to work with Unicode-aware msys.dll / bash
kb/unicode-v10
(http://repo.or.cz/w/git/mingw/4msysgit/kblees.git/kb/unicode-v10)
Additional environment features. The first two of these patches I consider important fixes (memory leaks and case-sensitivity), the rest is nice to have.
Changes since kb/unicode-v9-full:
- reordered the environment patches to cleanup first, then add features
- removed unnecessary char **env parameter in some spawn functions
- bsearchenv: added case-sensitive matching introduced in "Windows: teach getenv to do a case-sensitive search" (df599e96)
Karsten Blees (28) ('+': new/rewritten in v10, '!': modified):
MSVC: disable const qualifier compiler warnings
MSVC: don't use uninitialized variables
MSVC: include <io.h> for mktemp
MSVC: fix winansi.c compile errors
MSVC: link dynamically to the CRT
git-gui: fix encoding in git-gui file browser
gitk: fix file name encoding in diff hunk headers
+ Revert "Disable test on MinGW that challenges its bash quoting"
! Win32: add Unicode conversion functions
Win32: Unicode file name support (except dirent)
Win32: Unicode file name support (dirent)
Unicode file name support (gitk and git-gui)
Win32: Unicode arguments (outgoing)
Win32: Unicode arguments (incoming)
Win32: Thread-safe windows console output
Win32: Unicode environment (outgoing)
Win32: Unicode environment (incoming)
+ (kb/unicode-v10-minimal) MinGW: disable legacy encoding tests
+ Win32: fix environment memory leaks
+ Win32: unify environment case-sensitivity
+ Win32: simplify internal mingw_spawn* APIs
Win32: move environment functions
! Win32: unify environment function names
! Win32: move environment block creation to a helper method
! Win32: don't copy the environment twice when spawning child processes
Win32: reduce environment array reallocations
! Win32: keep the environment sorted
(kb/unicode-v10) Win32: patch Windows environment on startup
Makefile | 8 +-
builtin/cat-file.c | 2 +-
builtin/fast-export.c | 2 +-
builtin/rev-list.c | 2 +-
compat/mingw.c | 706 ++++++++++++++++++++++++++-------------
compat/mingw.h | 107 ++++++-
compat/vcbuild/include/unistd.h | 5 +
compat/win32/dirent.c | 31 +--
compat/win32/dirent.h | 2 +-
compat/winansi.c | 366 +++++++++++++--------
fast-import.c | 8 +-
git-gui/git-gui.sh | 6 +-
git-gui/lib/browser.tcl | 2 +-
git-gui/lib/index.tcl | 6 +-
gitk-git/gitk | 15 +-
match-trees.c | 12 +-
merge-recursive.c | 2 +-
run-command.c | 12 +-
submodule.c | 2 +-
t/t3901-i18n-patch.sh | 19 +-
t/t4201-shortlog.sh | 6 +-
t/t5505-remote.sh | 5 +-
t/t8005-blame-i18n.sh | 8 +-
transport.c | 2 +-
wt-status.c | 2 +-
25 files changed, 872 insertions(+), 466 deletions(-)
---
And the diff between v9-full and v10...
---
$ git diff -p --stat kb/unicode-v9-full-2cbf7794..kb/unicode-v10
compat/mingw.c | 83 +++++++++++++++++++++++++++++--------------------
t/t3901-i18n-patch.sh | 19 ++++++-----
t/t4201-shortlog.sh | 6 ++--
t/t5505-remote.sh | 5 +--
t/t8005-blame-i18n.sh | 8 ++--
5 files changed, 68 insertions(+), 53 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 51cde10..77ea5d9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -181,7 +181,7 @@ static int ask_yes_no_if_possible(const char *format, ...)
vsnprintf(question, sizeof(question), format, args);
va_end(args);
- if ((retry_hook[0] = mingw_getenv("GIT_ASK_YESNO"))) {
+ if ((retry_hook[0] = getenv("GIT_ASK_YESNO"))) {
retry_hook[1] = question;
return !run_command_v_opt(retry_hook, 0);
}
@@ -761,18 +761,41 @@ static int compareenv(const void *v1, const void *v2)
static int bsearchenv(char **env, const char *name, size_t size)
{
- unsigned low = 0, high = size - 1;
+ unsigned low = 0, high = size - 1, mid, i, j, nmln;
+ int cmp;
while (low <= high) {
- unsigned mid = (low + high) >> 1;
- int cmp = compareenv(&env[mid], &name);
+ mid = (low + high) >> 1;
+ cmp = compareenv(&env[mid], &name);
if (cmp < 0)
low = mid + 1;
else if (cmp > 0)
high = mid - 1;
else
- return mid; /* found */
+ goto found;
}
return ~low; /* not found, return 1's complement of insert position */
+
+found:
+ /* look for duplicate entries before and after the matching entry (mid) */
+ i = j = mid;
+ while (i > low && !compareenv(&env[i - 1], &name))
+ i--;
+ while (j < high && !compareenv(&env[j + 1], &name))
+ j++;
+ if (i == j)
+ return mid; /* no duplicates */
+
+ /*
+ * try case-sensitive match first, this is necessary for sh-i18n--envsubst
+ * to work with environment variables that differ only in case (e.g. $PATH
+ * and $path)
+ */
+ nmln = strlen(name);
+ for (; i <= j; i++)
+ if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
+ /* matches */
+ return i;
+ return mid;
}
/*
@@ -925,7 +948,7 @@ static const char *parse_interpreter(const char *cmd)
*/
static char **get_path_split(void)
{
- char *p, **path, *envpath = mingw_getenv("PATH");
+ char *p, **path, *envpath = getenv("PATH");
int i, n = 0;
if (!envpath || !*envpath)
@@ -1012,13 +1035,13 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
* Create environment block suitable for CreateProcess. Merges current
* process environment and the supplied environment changes.
*/
-static wchar_t *make_environment_block(char **env)
+static wchar_t *make_environment_block(char **deltaenv)
{
wchar_t *wenvblk = NULL;
char **tmpenv;
int i = 0, size = environ_size, envblksz = 0, envblkpos = 0;
- while (env && env[i])
+ while (deltaenv && deltaenv[i])
i++;
/* copy the environment, leaving space for changes */
@@ -1026,8 +1049,8 @@ static wchar_t *make_environment_block(char **env)
memcpy(tmpenv, environ, size * sizeof(char*));
/* merge supplied environment changes into the temporary environment */
- for (i = 0; env && env[i]; i++)
- size = do_putenv(tmpenv, env[i], size, 0);
+ for (i = 0; deltaenv && deltaenv[i]; i++)
+ size = do_putenv(tmpenv, deltaenv[i], size, 0);
/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {
@@ -1049,7 +1072,7 @@ struct pinfo_t {
struct pinfo_t *pinfo = NULL;
CRITICAL_SECTION pinfo_cs;
-static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
+static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaenv,
const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{
@@ -1117,10 +1140,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
utftowcs(wargs, args.buf, 2 * args.len + 1);
strbuf_release(&args);
- if (env == environ)
- env = NULL;
-
- wenvblk = make_environment_block(env);
+ wenvblk = make_environment_block(deltaenv);
memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
@@ -1156,13 +1176,12 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
return (pid_t)pi.dwProcessId;
}
-static pid_t mingw_spawnve(const char *cmd, const char **argv, char **env,
- int prepend_cmd)
+static pid_t mingw_spawnv(const char *cmd, const char **argv, int prepend_cmd)
{
- return mingw_spawnve_fd(cmd, argv, env, NULL, prepend_cmd, 0, 1, 2);
+ return mingw_spawnve_fd(cmd, argv, NULL, NULL, prepend_cmd, 0, 1, 2);
}
-pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
+pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv,
const char *dir,
int fhin, int fhout, int fherr)
{
@@ -1186,14 +1205,14 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
pid = -1;
}
else {
- pid = mingw_spawnve_fd(iprog, argv, env, dir, 1,
+ pid = mingw_spawnve_fd(iprog, argv, deltaenv, dir, 1,
fhin, fhout, fherr);
free(iprog);
}
argv[0] = argv0;
}
else
- pid = mingw_spawnve_fd(prog, argv, env, dir, 0,
+ pid = mingw_spawnve_fd(prog, argv, deltaenv, dir, 0,
fhin, fhout, fherr);
free(prog);
}
@@ -1201,7 +1220,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
return pid;
}
-static int try_shell_exec(const char *cmd, char *const *argv, char **env)
+static int try_shell_exec(const char *cmd, char *const *argv)
{
const char *interpr = parse_interpreter(cmd);
char **path;
@@ -1219,7 +1238,7 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)
argv2 = xmalloc(sizeof(*argv) * (argc+1));
argv2[0] = (char *)cmd; /* full path to the script file */
memcpy(&argv2[1], &argv[1], sizeof(*argv) * argc);
- pid = mingw_spawnve(prog, argv2, env, 1);
+ pid = mingw_spawnv(prog, argv2, 1);
if (pid >= 0) {
int status;
if (waitpid(pid, &status, 0) < 0)
@@ -1234,13 +1253,13 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)
return pid;
}
-static void mingw_execve(const char *cmd, char *const *argv, char *const *env)
+void mingw_execv(const char *cmd, char *const *argv)
{
/* check if git_command is a shell script */
- if (!try_shell_exec(cmd, argv, (char **)env)) {
+ if (!try_shell_exec(cmd, argv)) {
int pid, status;
- pid = mingw_spawnve(cmd, (const char **)argv, (char **)env, 0);
+ pid = mingw_spawnv(cmd, (const char **)argv, 0);
if (pid < 0)
return;
if (waitpid(pid, &status, 0) < 0)
@@ -1255,7 +1274,7 @@ void mingw_execvp(const char *cmd, char *const *argv)
char *prog = path_lookup(cmd, path, 0);
if (prog) {
- mingw_execve(prog, argv, environ);
+ mingw_execv(prog, argv);
free(prog);
} else
errno = ENOENT;
@@ -1263,11 +1282,6 @@ void mingw_execvp(const char *cmd, char *const *argv)
free_path_split(path);
}
-void mingw_execv(const char *cmd, char *const *argv)
-{
- mingw_execve(cmd, argv, environ);
-}
-
int mingw_kill(pid_t pid, int sig)
{
if (pid > 0 && sig == SIGTERM) {
@@ -1952,9 +1966,10 @@ int mingw_offset_1st_component(const char *path)
return offset + is_dir_sep(path[offset]);
}
-int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen)
+int utftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen)
{
int upos = 0, wpos = 0;
+ const unsigned char *utf = (const unsigned char*) utfs;
if (!utf || !wcs || wcslen < 1) {
errno = EINVAL;
return -1;
@@ -1965,7 +1980,7 @@ int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen)
utflen = INT_MAX;
while (upos < utflen) {
- int c = ((int) utf[upos++]) & 0xff;
+ int c = utf[upos++] & 0xff;
if (utflen == INT_MAX && c == 0)
break;
diff --git a/t/t3901-i18n-patch.sh b/t/t3901-i18n-patch.sh
index 31a5770..55c8a2f 100755
--- a/t/t3901-i18n-patch.sh
+++ b/t/t3901-i18n-patch.sh
@@ -54,10 +54,13 @@ test_expect_success setup '
git add yours &&
git commit -s -m "Second on side" &&
- # the second one on the side branch is ISO-8859-1
- git config i18n.commitencoding ISO8859-1 &&
- # use author and committer name in ISO-8859-1 to match it.
- . "$TEST_DIRECTORY"/t3901-8859-1.txt &&
+ if test_have_prereq NOT_MINGW
+ then
+ # the second one on the side branch is ISO-8859-1
+ git config i18n.commitencoding ISO8859-1 &&
+ # use author and committer name in ISO-8859-1 to match it.
+ . "$TEST_DIRECTORY"/t3901-8859-1.txt
+ fi &&
test_tick &&
echo Yet another >theirs &&
git add theirs &&
@@ -119,7 +122,7 @@ test_expect_success 'rebase (U/L)' '
check_encoding 2
'
-test_expect_success 'rebase (L/L)' '
+test_expect_success NOT_MINGW 'rebase (L/L)' '
# In this test we want ISO-8859-1 encoded commits as the result
git config i18n.commitencoding ISO8859-1 &&
git config i18n.logoutputencoding ISO8859-1 &&
@@ -131,7 +134,7 @@ test_expect_success 'rebase (L/L)' '
check_encoding 2 8859
'
-test_expect_success 'rebase (L/U)' '
+test_expect_success NOT_MINGW 'rebase (L/U)' '
# This is pathological -- use UTF-8 as intermediate form
# to get ISO-8859-1 results.
git config i18n.commitencoding ISO8859-1 &&
@@ -159,7 +162,7 @@ test_expect_success 'cherry-pick(U/U)' '
check_encoding 3
'
-test_expect_success 'cherry-pick(L/L)' '
+test_expect_success NOT_MINGW 'cherry-pick(L/L)' '
# Both the commitencoding and logoutputencoding is set to ISO-8859-1
git config i18n.commitencoding ISO8859-1 &&
@@ -189,7 +192,7 @@ test_expect_success 'cherry-pick(U/L)' '
check_encoding 3
'
-test_expect_success 'cherry-pick(L/U)' '
+test_expect_success NOT_MINGW 'cherry-pick(L/U)' '
# Again, the commitencoding is set to ISO-8859-1 but
# logoutputencoding is set to UTF-8.
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..4896381 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -93,7 +93,7 @@ test_expect_success 'output from user-defined format is re-wrapped' '
test_cmp expect log.predictable
'
-test_expect_success 'shortlog wrapping' '
+test_expect_success NOT_MINGW 'shortlog wrapping' '
cat >expect <<\EOF &&
A U Thor (5):
Test
@@ -114,7 +114,7 @@ EOF
test_cmp expect out
'
-test_expect_success 'shortlog from non-git directory' '
+test_expect_success NOT_MINGW 'shortlog from non-git directory' '
git log HEAD >log &&
GIT_DIR=non-existing git shortlog -w <log >out &&
test_cmp expect out
@@ -135,7 +135,7 @@ $DSCHO (2):
EOF
-test_expect_success 'shortlog encoding' '
+test_expect_success NOT_MINGW 'shortlog encoding' '
git reset --hard "$commit" &&
git config --unset i18n.commitencoding &&
echo 2 > a1 &&
diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index fae1200..0d0222e 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -915,10 +915,7 @@ test_expect_success 'remote set-url --add bbb' '
'
test_expect_success 'remote set-url --delete .*' '
- if test_have_prereq NOT_MINGW
- then
- test_must_fail git remote set-url --delete someremote .\*
- fi &&
+ test_must_fail git remote set-url --delete someremote .\* &&
echo "YYY" >expect &&
echo baz >>expect &&
echo bbb >>expect &&
diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index cb39055..a6e73d0 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -33,7 +33,7 @@ author $SJIS_NAME
summary $SJIS_MSG
EOF
-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects i18n.commitencoding' '
git blame --incremental file | \
egrep "^(author|summary) " > actual &&
@@ -49,7 +49,7 @@ author $EUC_JAPAN_NAME
summary $EUC_JAPAN_MSG
EOF
-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects i18n.logoutputencoding' '
git config i18n.logoutputencoding eucJP &&
git blame --incremental file | \
@@ -66,7 +66,7 @@ author $UTF8_NAME
summary $UTF8_MSG
EOF
-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects --encoding=UTF-8' '
git blame --incremental --encoding=UTF-8 file | \
egrep "^(author|summary) " > actual &&
@@ -82,7 +82,7 @@ author $UTF8_NAME
summary $UTF8_MSG
EOF
-test_expect_success \
+test_expect_success NOT_MINGW \
'blame respects --encoding=none' '
git blame --incremental --encoding=none file | \
egrep "^(author|summary) " > actual &&
---
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
builtin/cat-file.c | 2 +-
builtin/fast-export.c | 2 +-
builtin/rev-list.c | 2 +-
fast-import.c | 8 ++++----
match-trees.c | 12 ++++++------
merge-recursive.c | 2 +-
run-command.c | 2 +-
submodule.c | 2 +-
transport.c | 2 +-
wt-status.c | 2 +-
10 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 07bd984..8a6bb24 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -168,7 +168,7 @@ static int batch_one_object(const char *obj_name, int print_contents)
unsigned char sha1[20];
enum object_type type = 0;
unsigned long size;
- void *contents = contents;
+ void *contents = NULL;
if (!obj_name)
return 1;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9836e6b..c7ce01c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -481,7 +481,7 @@ static void get_tags_and_duplicates(struct object_array *pending,
for (i = 0; i < pending->nr; i++) {
struct object_array_entry *e = pending->objects + i;
unsigned char sha1[20];
- struct commit *commit = commit;
+ struct commit *commit = NULL;
char *full_name;
if (dwim_ref(e->name, strlen(e->name), sha1, &full_name) != 1)
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 56727e8..9a6073f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -394,7 +394,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
mark_edges_uninteresting(revs.commits, &revs, show_edge);
if (bisect_list) {
- int reaches = reaches, all = all;
+ int reaches = 0, all = 0;
revs.commits = find_bisection(revs.commits, &reaches, &all,
bisect_find_all);
diff --git a/fast-import.c b/fast-import.c
index 7cc2262..4406f1f 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2183,7 +2183,7 @@ static void file_change_m(struct branch *b)
const char *p = command_buf.buf + 2;
static struct strbuf uq = STRBUF_INIT;
const char *endp;
- struct object_entry *oe = oe;
+ struct object_entry *oe = NULL;
unsigned char sha1[20];
uint16_t mode, inline_data = 0;
@@ -2352,7 +2352,7 @@ static void note_change_n(struct branch *b, unsigned char old_fanout)
{
const char *p = command_buf.buf + 2;
static struct strbuf uq = STRBUF_INIT;
- struct object_entry *oe = oe;
+ struct object_entry *oe = NULL;
struct branch *s;
unsigned char sha1[20], commit_sha1[20];
char path[60];
@@ -2512,7 +2512,7 @@ static int parse_from(struct branch *b)
static struct hash_list *parse_merge(unsigned int *count)
{
- struct hash_list *list = NULL, *n, *e = e;
+ struct hash_list *list = NULL, *n, *e = NULL;
const char *from;
struct branch *s;
@@ -2806,7 +2806,7 @@ static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
static void parse_cat_blob(void)
{
const char *p;
- struct object_entry *oe = oe;
+ struct object_entry *oe = NULL;
unsigned char sha1[20];
/* cat-blob SP <object> LF */
diff --git a/match-trees.c b/match-trees.c
index 26f7ed1..de050d0 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -72,12 +72,12 @@ static int score_trees(const unsigned char *hash1, const unsigned char *hash2)
die("%s is not a tree", sha1_to_hex(hash2));
init_tree_desc(&two, two_buf, size);
while (one.size | two.size) {
- const unsigned char *elem1 = elem1;
- const unsigned char *elem2 = elem2;
- const char *path1 = path1;
- const char *path2 = path2;
- unsigned mode1 = mode1;
- unsigned mode2 = mode2;
+ const unsigned char *elem1 = NULL;
+ const unsigned char *elem2 = NULL;
+ const char *path1 = NULL;
+ const char *path2 = NULL;
+ unsigned mode1 = 0;
+ unsigned mode2 = 0;
int cmp;
if (one.size)
diff --git a/merge-recursive.c b/merge-recursive.c
index 0cc1e6f..542f61d 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1579,7 +1579,7 @@ int merge_recursive(struct merge_options *o,
{
struct commit_list *iter;
struct commit *merged_common_ancestors;
- struct tree *mrtree = mrtree;
+ struct tree *mrtree = NULL;
int clean;
if (show(o, 4)) {
diff --git a/run-command.c b/run-command.c
index a2796c4..a8be1b1 100644
--- a/run-command.c
+++ b/run-command.c
@@ -137,7 +137,7 @@ int start_command(struct child_process *cmd)
{
int need_in, need_out, need_err;
int fdin[2], fdout[2], fderr[2];
- int failed_errno = failed_errno;
+ int failed_errno = 0;
/*
* In case of errors we must keep the promise to close FDs
diff --git a/submodule.c b/submodule.c
index 1ba9646..71a0508 100644
--- a/submodule.c
+++ b/submodule.c
@@ -253,7 +253,7 @@ void show_submodule_summary(FILE *f, const char *path,
const char *del, const char *add, const char *reset)
{
struct rev_info rev;
- struct commit *left = left, *right = right;
+ struct commit *left = NULL, *right = NULL;
const char *message = NULL;
struct strbuf sb = STRBUF_INIT;
int fast_forward = 0, fast_backward = 0;
diff --git a/transport.c b/transport.c
index 98c5778..be1538c 100644
--- a/transport.c
+++ b/transport.c
@@ -104,7 +104,7 @@ static void insert_packed_refs(const char *packed_refs, struct ref **list)
return;
for (;;) {
- int cmp = cmp, len;
+ int cmp = 0, len;
if (!fgets(buffer, sizeof(buffer), f)) {
fclose(f);
diff --git a/wt-status.c b/wt-status.c
index 0237772..eba4d31 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -227,7 +227,7 @@ static void wt_status_print_change_data(struct wt_status *s,
{
struct wt_status_change_data *d = it->util;
const char *c = color(change_type, s);
- int status = status;
+ int status = 0;
char *one_name;
char *two_name;
const char *one, *two;
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/vcbuild/include/unistd.h | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/compat/vcbuild/include/unistd.h b/compat/vcbuild/include/unistd.h
index bde4d05..3ca5c57 100644
--- a/compat/vcbuild/include/unistd.h
+++ b/compat/vcbuild/include/unistd.h
@@ -3,6 +3,8 @@
/* Win32 define for porting git*/
+#include <io.h>
+
/* disable const** -> void* warning (~130 warnings for realloc etc.) */
#pragma warning(disable : 4090)
--
1.7.6.msysgit.1
const char *x;
realloc(x, ...); // warning C4090 due to const** -> void*
As there's not that much MSVC-only code, the 'real' const to non-const
casts that we want to know about will be caught by GCC builds anyway, so
disable this warning.
MSVC link.exe doesn't recognize the /wd option in CFLAGS in the Makefile,
so use #pragma warning in central, MSVC-specific unistd.h.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/vcbuild/include/unistd.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/compat/vcbuild/include/unistd.h b/compat/vcbuild/include/unistd.h
index b14fcf9..bde4d05 100644
--- a/compat/vcbuild/include/unistd.h
+++ b/compat/vcbuild/include/unistd.h
@@ -3,6 +3,9 @@
/* Win32 define for porting git*/
+/* disable const** -> void* warning (~130 warnings for realloc etc.) */
+#pragma warning(disable : 4090)
+
#ifndef _MODE_T_
#define _MODE_T_
typedef unsigned short _mode_t;
--
1.7.6.msysgit.1
Cast FARPROC to PGETCURRENTCONSOLEFONTEX to suppress MSVC compiler warning.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/winansi.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/compat/winansi.c b/compat/winansi.c
index a5ca2d9..bf514f9 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -2,6 +2,7 @@
* Copyright 2008 Peter Harris <g...@peter.is-a-geek.org>
*/
+#undef NOGDI
#include "../git-compat-util.h"
#include <malloc.h>
#include <wingdi.h>
@@ -30,6 +31,7 @@ static int negative;
static FILE *last_stream = NULL;
static int non_ascii_used = 0;
+#ifdef __MINGW32__
typedef struct _CONSOLE_FONT_INFOEX {
ULONG cbSize;
DWORD nFont;
@@ -38,6 +40,7 @@ typedef struct _CONSOLE_FONT_INFOEX {
UINT FontWeight;
WCHAR FaceName[LF_FACESIZE];
} CONSOLE_FONT_INFOEX, *PCONSOLE_FONT_INFOEX;
+#endif
typedef BOOL (WINAPI *PGETCURRENTCONSOLEFONTEX)(HANDLE, BOOL,
PCONSOLE_FONT_INFOEX);
@@ -52,8 +55,8 @@ static void warn_if_raster_font(void)
return;
/* GetCurrentConsoleFontEx is available since Vista */
- pGetCurrentConsoleFontEx = GetProcAddress(GetModuleHandle("kernel32.dll"),
- "GetCurrentConsoleFontEx");
+ pGetCurrentConsoleFontEx = (PGETCURRENTCONSOLEFONTEX) GetProcAddress(
+ GetModuleHandle("kernel32.dll"), "GetCurrentConsoleFontEx");
if (pGetCurrentConsoleFontEx) {
CONSOLE_FONT_INFOEX cfi;
cfi.cbSize = sizeof(cfi);
--
1.7.6.msysgit.1
To support repositories with legacy-encoded file names, the UTF-8 to UTF-16
conversion function tries to create valid, unique file names even for
invalid UTF-8 byte sequences, so that these repositories can be checked out
without error.
The current implementation leaves invalid UTF-8 bytes in range 0xa0 - 0xff
as is (producing printable Unicode chars \u00a0 - \u00ff, equivalent to
ISO-8859-1), and converts 0x80 - 0x9f to hex-code (\u0080 - \u009f are
control chars).
The Windows MultiByteToWideChar API was not used as it either drops invalid
UTF-8 sequences (on Win2k/XP; producing non-unique or even empty file
names) or converts them to the replacement char \ufffd (Vista/7; causing
ERROR_INVALID_NAME in subsequent calls to file system APIs).
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
compat/mingw.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 162 insertions(+), 0 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 57d41dd..ba7ac27 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1869,6 +1869,91 @@ int mingw_offset_1st_component(const char *path)
return offset + is_dir_sep(path[offset]);
}
+int utftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen)
+{
+ int upos = 0, wpos = 0;
+ const unsigned char *utf = (const unsigned char*) utfs;
+ if (!utf || !wcs || wcslen < 1) {
+ errno = EINVAL;
+ return -1;
+ }
+ /* reserve space for \0 */
+ wcslen--;
+ if (utflen < 0)
+ utflen = INT_MAX;
+
+ while (upos < utflen) {
+ int c = utf[upos++] & 0xff;
+ if (utflen == INT_MAX && c == 0)
+ break;
+
+ if (wpos >= wcslen) {
+ wcs[wpos] = 0;
+ errno = ENAMETOOLONG;
+ return -1;
+ }
+
+ if (c < 0x80) {
+ /* ASCII */
+ wcs[wpos++] = c;
+ } else if (c >= 0xc2 && c < 0xe0 && upos < utflen &&
+ (utf[upos] & 0xc0) == 0x80) {
+ /* 2-byte utf-8 */
+ c = ((c & 0x1f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ wcs[wpos++] = c;
+ } else if (c >= 0xe0 && c < 0xf0 && upos + 1 < utflen &&
+ !(c == 0xe0 && utf[upos] < 0xa0) && /* over-long encoding */
+ (utf[upos] & 0xc0) == 0x80 &&
+ (utf[upos + 1] & 0xc0) == 0x80) {
+ /* 3-byte utf-8 */
+ c = ((c & 0x0f) << 12);
+ c |= ((utf[upos++] & 0x3f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ wcs[wpos++] = c;
+ } else if (c >= 0xf0 && c < 0xf5 && upos + 2 < utflen &&
+ wpos + 1 < wcslen &&
+ !(c == 0xf0 && utf[upos] < 0x90) && /* over-long encoding */
+ !(c == 0xf4 && utf[upos] >= 0x90) && /* > \u10ffff */
+ (utf[upos] & 0xc0) == 0x80 &&
+ (utf[upos + 1] & 0xc0) == 0x80 &&
+ (utf[upos + 2] & 0xc0) == 0x80) {
+ /* 4-byte utf-8: convert to \ud8xx \udcxx surrogate pair */
+ c = ((c & 0x07) << 18);
+ c |= ((utf[upos++] & 0x3f) << 12);
+ c |= ((utf[upos++] & 0x3f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ c -= 0x10000;
+ wcs[wpos++] = 0xd800 | (c >> 10);
+ wcs[wpos++] = 0xdc00 | (c & 0x3ff);
+ } else if (c >= 0xa0) {
+ /* invalid utf-8 byte, printable unicode char: convert 1:1 */
+ wcs[wpos++] = c;
+ } else {
+ /* invalid utf-8 byte, non-printable unicode: convert to hex */
+ static const char *hex = "0123456789abcdef";
+ wcs[wpos++] = hex[c >> 4];
+ if (wpos < wcslen)
+ wcs[wpos++] = hex[c & 0x0f];
+ }
+ }
+ wcs[wpos] = 0;
+ return wpos;
+}
+
+int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen)
+{
+ if (!wcs || !utf || utflen < 1) {
+ errno = EINVAL;
+ return -1;
+ }
+ utflen = WideCharToMultiByte(CP_UTF8, 0, wcs, -1, utf, utflen, NULL, NULL);
+ if (utflen)
+ return utflen - 1;
+ errno = ENAMETOOLONG;
+ return -1;
+}
+
/*
* Disable MSVCRT command line wildcard expansion (__getmainargs called from
* mingw startup code, see init.c in mingw runtime).
diff --git a/compat/mingw.h b/compat/mingw.h
index 3aa80bb..b4ff6cd 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -326,6 +326,83 @@ void mingw_mark_as_git_dir(const char *dir);
char **make_augmented_environ(const char *const *vars);
void free_environ(char **env);
+/**
+ * Converts UTF-8 encoded string to UTF-16LE.
+ *
+ * To support repositories with legacy-encoded file names, invalid UTF-8 bytes
+ * 0xa0 - 0xff are converted to corresponding printable Unicode chars \u00a0 -
+ * \u00ff, and invalid UTF-8 bytes 0x80 - 0x9f (which would make non-printable
+ * Unicode) are converted to hex-code.
+ *
+ * Lead-bytes not followed by an appropriate number of trail-bytes, over-long
+ * encodings and 4-byte encodings > \u10ffff are detected as invalid UTF-8.
+ *
+ * Maximum space requirement for the target buffer is two wide chars per UTF-8
+ * char (((strlen(utf) * 2) + 1) [* sizeof(wchar_t)]).
+ *
+ * The maximum space is needed only if the entire input string consists of
+ * invalid UTF-8 bytes in range 0x80-0x9f, as per the following table:
+ *
+ * | | UTF-8 | UTF-16 |
+ * Code point | UTF-8 sequence | bytes | words | ratio
+ * --------------+-------------------+-------+--------+-------
+ * 000000-00007f | 0-7f | 1 | 1 | 1
+ * 000080-0007ff | c2-df + 80-bf | 2 | 1 | 0.5
+ * 000800-00ffff | e0-ef + 2 * 80-bf | 3 | 1 | 0.33
+ * 010000-10ffff | f0-f4 + 3 * 80-bf | 4 | 2 (a) | 0.5
+ * invalid | 80-9f | 1 | 2 (b) | 2
+ * invalid | a0-ff | 1 | 1 | 1
+ *
+ * (a) encoded as UTF-16 surrogate pair
+ * (b) encoded as two hex digits
+ *
+ * Note that, while the UTF-8 encoding scheme can be extended to 5-byte, 6-byte
+ * or even indefinite-byte sequences, the largest valid code point \u10ffff
+ * encodes as only 4 UTF-8 bytes.
+ *
+ * @param wcs wide char target buffer
+ * @param utf string to convert
+ * @param wcslen size of target buffer (in wchar_t's)
+ * @param utflen size of string to convert, or -1 if 0-terminated
+ * @return length of converted string (_wcslen(wcs)), or -1 on failure (errno
+ * is set to EINVAL or ENAMETOOLONG)
+ * @see mbstowcs, MultiByteToWideChar
+ */
+int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen);
+static inline int utftowcs(wchar_t *wcs, const char *utf, size_t wcslen)
+{
+ return utftowcsn(wcs, utf, wcslen, -1);
+}
+
+/**
+ * Converts UTF-16LE encoded string to UTF-8.
+ *
+ * Maximum space requirement for the target buffer is three UTF-8 chars per
+ * wide char ((_wcslen(wcs) * 3) + 1).
+ *
+ * The maximum space is needed only if the entire input string consists of
+ * UTF-16 words in range 0x0800-0xd7ff or 0xe000-0xffff (i.e. \u0800-\uffff
+ * modulo surrogate pairs), as per the following table:
+ *
+ * | | UTF-16 | UTF-8 |
+ * Code point | UTF-16 sequence | words | bytes | ratio
+ * --------------+-----------------------+--------+-------+-------
+ * 000000-00007f | 0000-007f | 1 | 1 | 1
+ * 000080-0007ff | 0080-07ff | 1 | 2 | 2
+ * 000800-00ffff | 0800-d7ff / e000-ffff | 1 | 3 | 3
+ * 010000-10ffff | d800-dbff + dc00-dfff | 2 | 4 | 2
+ *
+ * Note that invalid code points > 10ffff cannot be represented in UTF-16.
+ *
+ * @param utf target buffer
+ * @param wcs wide string to convert
+ * @param utflen size of target buffer
+ * @return length of converted string, or -1 on failure (errno is set to EINVAL
+ * or ENAMETOOLONG)
+ * @see wcstombs, WideCharToMultiByte
+ */
+int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen);
+
/*
* A replacement of main() that adds win32 specific initialization.
*/
--
1.7.6.msysgit.1
With "MinGW: disable CRT command line globbing" (a05e9a86), wildcard
expansion works as expected (handled by git, not the CRT).
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
t/t5505-remote.sh | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index fae1200..0d0222e 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -915,10 +915,7 @@ test_expect_success 'remote set-url --add bbb' '
'
test_expect_success 'remote set-url --delete .*' '
- if test_have_prereq NOT_MINGW
- then
- test_must_fail git remote set-url --delete someremote .\*
- fi &&
+ test_must_fail git remote set-url --delete someremote .\* &&
echo "YYY" >expect &&
echo baz >>expect &&
echo bbb >>expect &&
--
1.7.6.msysgit.1
This also fixes linker warnings for _malloc and _free in zlib.lib, which
seems to be compiled for MSVCRT.dll already.
The DLL version also exports some of the CRT initialization functions,
which are hidden in the static libcmt.lib (e.g. __wgetmainargs, required by
subsequent Unicode patches).
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
Makefile | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Makefile b/Makefile
index 234d0af..4594d81 100644
--- a/Makefile
+++ b/Makefile
@@ -1133,16 +1133,16 @@ ifeq ($(uname_S),Windows)
compat/win32/pthread.o compat/win32/syslog.o \
compat/win32/sys/poll.o compat/win32/dirent.o
COMPAT_CFLAGS = -D__USE_MINGW_ACCESS -DNOGDI -DHAVE_STRING_H -DHAVE_ALLOCA_H -Icompat -Icompat/regex -Icompat/win32 -DSTRIP_EXTENSION=\".exe\"
- BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE -NODEFAULTLIB:MSVCRT.lib
+ BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE
EXTLIBS = user32.lib advapi32.lib shell32.lib wininet.lib ws2_32.lib
PTHREAD_LIBS =
lib =
ifndef DEBUG
- BASIC_CFLAGS += -GL -Os -MT
+ BASIC_CFLAGS += -GL -Os -MD
BASIC_LDFLAGS += -LTCG
AR += -LTCG
else
- BASIC_CFLAGS += -Zi -MTd
+ BASIC_CFLAGS += -Zi -MDd
endif
X = .exe
endif
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
gitk-git/gitk | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/gitk-git/gitk b/gitk-git/gitk
index 2a92e20..935ed5c 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -7778,6 +7778,7 @@ proc getblobdiffline {bdf ids} {
set diffinhdr 0
continue
}
+ set line [encoding convertfrom $line]
$ctext insert end "$line\n" filesep
} else {
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
git-gui/lib/browser.tcl | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/git-gui/lib/browser.tcl b/git-gui/lib/browser.tcl
index a8c6223..d68b1f9 100644
--- a/git-gui/lib/browser.tcl
+++ b/git-gui/lib/browser.tcl
@@ -191,7 +191,7 @@ method _ls {tree_id {name {}}} {
$w conf -state disabled
set fd [git_read ls-tree -z $tree_id]
- fconfigure $fd -blocking 0 -translation binary -encoding binary
+ fconfigure $fd -blocking 0 -translation binary
fileevent $fd readable [cb _read $fd]
}
--
1.7.6.msysgit.1
Removes parameter checks that are already covered by utftowcs.
Increases the size of dirent.d_name to accommodate the full
WIN32_FIND_DATA.cFileName converted to UTF-8 (UTF-16 to UTF-8 conversion
may grow by factor three in the worst case).
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/win32/dirent.c | 31 +++++++++++--------------------
compat/win32/dirent.h | 2 +-
2 files changed, 12 insertions(+), 21 deletions(-)
diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index 82a515c..37f56b7 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -6,10 +6,10 @@ struct DIR {
int dd_stat; /* 0-based index */
};
-static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)
+static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW *fdata)
{
- /* copy file name from WIN32_FIND_DATA to dirent */
- memcpy(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
+ /* convert UTF-16 name to UTF-8 */
+ wcstoutf(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
/* Set file type, based on WIN32_FIND_DATA */
if (fdata->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
@@ -20,25 +20,16 @@ static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)
DIR *opendir(const char *name)
{
- char pattern[MAX_PATH];
- WIN32_FIND_DATAA fdata;
+ wchar_t pattern[MAX_PATH];
+ WIN32_FIND_DATAW fdata;
HANDLE h;
int len;
DIR *dir;
- /* check that name is not NULL */
- if (!name) {
- errno = EINVAL;
+ /* convert name to UTF-16, check length (-2 for '/' '*') */
+ len = utftowcs(pattern, name, MAX_PATH - 2);
+ if (len < 0)
return NULL;
- }
- /* check that the pattern won't be too long for FindFirstFileA */
- len = strlen(name);
- if (len + 2 >= MAX_PATH) {
- errno = ENAMETOOLONG;
- return NULL;
- }
- /* copy name to temp buffer */
- memcpy(pattern, name, len + 1);
/* append optional '/' and wildcard '*' */
if (len && !is_dir_sep(pattern[len - 1]))
@@ -47,7 +38,7 @@ DIR *opendir(const char *name)
pattern[len] = 0;
/* open find handle */
- h = FindFirstFileA(pattern, &fdata);
+ h = FindFirstFileW(pattern, &fdata);
if (h == INVALID_HANDLE_VALUE) {
DWORD err = GetLastError();
errno = (err == ERROR_DIRECTORY) ? ENOTDIR : err_win_to_posix(err);
@@ -72,8 +63,8 @@ struct dirent *readdir(DIR *dir)
/* if first entry, dirent has already been set up by opendir */
if (dir->dd_stat) {
/* get next entry and convert from WIN32_FIND_DATA to dirent */
- WIN32_FIND_DATAA fdata;
- if (FindNextFileA(dir->dd_handle, &fdata)) {
+ WIN32_FIND_DATAW fdata;
+ if (FindNextFileW(dir->dd_handle, &fdata)) {
finddata2dirent(&dir->dd_dir, &fdata);
} else {
DWORD lasterr = GetLastError();
diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
index 8838cd6..058207e 100644
--- a/compat/win32/dirent.h
+++ b/compat/win32/dirent.h
@@ -10,7 +10,7 @@ typedef struct DIR DIR;
struct dirent {
unsigned char d_type; /* file type to prevent lstat after readdir */
- char d_name[MAX_PATH]; /* file name */
+ char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
};
DIR *opendir(const char *dirname);
--
1.7.6.msysgit.1
The dirent API (opendir/readdir/closedir) is updated in a separate commit.
Adds trivial wrappers for access, chmod and chdir.
Adds wrapper for mktemp (needed for both mkstemp and mkdtemp).
The simplest way to convert a repository with legacy-encoded (e.g. Cp1252)
file names to UTF-8 ist to checkout with an old msysgit version and
"git add --all & git commit" with the new version.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 213 +++++++++++++++++++++++++++++++++++++++-----------------
compat/mingw.h | 13 ++++
2 files changed, 161 insertions(+), 65 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index ba7ac27..fe759c0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1,6 +1,7 @@
#include "../git-compat-util.h"
#include "win32.h"
#include <conio.h>
+#include <wchar.h>
#include "../strbuf.h"
#include "../run-command.h"
#include "../cache.h"
@@ -200,14 +201,16 @@ static int ask_yes_no_if_possible(const char *format, ...)
}
}
-#undef unlink
int mingw_unlink(const char *pathname)
{
int ret, tries = 0;
+ wchar_t wpathname[MAX_PATH];
+ if (utftowcs(wpathname, pathname, MAX_PATH) < 0)
+ return -1;
/* read-only files cannot be removed */
- chmod(pathname, 0666);
- while ((ret = unlink(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ _wchmod(wpathname, 0666);
+ while ((ret = _wunlink(wpathname)) == -1 && tries < ARRAY_SIZE(delay)) {
if (!is_file_in_use_error(GetLastError()))
break;
/*
@@ -223,43 +226,40 @@ int mingw_unlink(const char *pathname)
while (ret == -1 && is_file_in_use_error(GetLastError()) &&
ask_yes_no_if_possible("Unlink of file '%s' failed. "
"Should I try again?", pathname))
- ret = unlink(pathname);
+ ret = _wunlink(wpathname);
return ret;
}
-static int is_dir_empty(const char *path)
+static int is_dir_empty(const wchar_t *wpath)
{
- struct strbuf buf = STRBUF_INIT;
- WIN32_FIND_DATAA findbuf;
+ WIN32_FIND_DATAW findbuf;
HANDLE handle;
-
- strbuf_addf(&buf, "%s\\*", path);
- handle = FindFirstFileA(buf.buf, &findbuf);
- if (handle == INVALID_HANDLE_VALUE) {
- strbuf_release(&buf);
+ wchar_t wbuf[MAX_PATH + 2];
+ wcscpy(wbuf, wpath);
+ wcscat(wbuf, L"\\*");
+ handle = FindFirstFileW(wbuf, &findbuf);
+ if (handle == INVALID_HANDLE_VALUE)
return GetLastError() == ERROR_NO_MORE_FILES;
- }
- while (!strcmp(findbuf.cFileName, ".") ||
- !strcmp(findbuf.cFileName, ".."))
- if (!FindNextFile(handle, &findbuf)) {
- strbuf_release(&buf);
+ while (!wcscmp(findbuf.cFileName, L".") ||
+ !wcscmp(findbuf.cFileName, L".."))
+ if (!FindNextFileW(handle, &findbuf))
return GetLastError() == ERROR_NO_MORE_FILES;
- }
FindClose(handle);
- strbuf_release(&buf);
return 0;
}
-#undef rmdir
int mingw_rmdir(const char *pathname)
{
int ret, tries = 0;
+ wchar_t wpathname[MAX_PATH];
+ if (utftowcs(wpathname, pathname, MAX_PATH) < 0)
+ return -1;
- while ((ret = rmdir(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ while ((ret = _wrmdir(wpathname)) == -1 && tries < ARRAY_SIZE(delay)) {
if (!is_file_in_use_error(GetLastError()))
break;
- if (!is_dir_empty(pathname)) {
+ if (!is_dir_empty(wpathname)) {
errno = ENOTEMPTY;
break;
}
@@ -276,14 +276,14 @@ int mingw_rmdir(const char *pathname)
while (ret == -1 && is_file_in_use_error(GetLastError()) &&
ask_yes_no_if_possible("Deletion of directory '%s' failed. "
"Should I try again?", pathname))
- ret = rmdir(pathname);
+ ret = _wrmdir(wpathname);
return ret;
}
-static int make_hidden(const char *path)
+static int make_hidden(const wchar_t *path)
{
- DWORD attribs = GetFileAttributes(path);
- if (SetFileAttributes(path, FILE_ATTRIBUTE_HIDDEN | attribs))
+ DWORD attribs = GetFileAttributesW(path);
+ if (SetFileAttributesW(path, FILE_ATTRIBUTE_HIDDEN | attribs))
return 0;
errno = err_win_to_posix(GetLastError());
return -1;
@@ -291,19 +291,23 @@ static int make_hidden(const char *path)
void mingw_mark_as_git_dir(const char *dir)
{
- if (hide_dotfiles != HIDE_DOTFILES_FALSE && !is_bare_repository() &&
- make_hidden(dir))
- warning("Failed to make '%s' hidden", dir);
+ wchar_t wdir[MAX_PATH];
+ if (hide_dotfiles != HIDE_DOTFILES_FALSE && !is_bare_repository())
+ if (utftowcs(wdir, dir, MAX_PATH) < 0 || make_hidden(wdir))
+ warning("Failed to make '%s' hidden", dir);
git_config_set("core.hideDotFiles",
hide_dotfiles == HIDE_DOTFILES_FALSE ? "false" :
(hide_dotfiles == HIDE_DOTFILES_DOTGITONLY ?
"dotGitOnly" : "true"));
}
-#undef mkdir
int mingw_mkdir(const char *path, int mode)
{
- int ret = mkdir(path);
+ int ret;
+ wchar_t wpath[MAX_PATH];
+ if (utftowcs(wpath, path, MAX_PATH) < 0)
+ return -1;
+ ret = _wmkdir(wpath);
if (!ret && hide_dotfiles == HIDE_DOTFILES_TRUE) {
/*
* In Windows a file or dir starting with a dot is not
@@ -312,17 +316,17 @@ int mingw_mkdir(const char *path, int mode)
*/
const char *start = basename((char*)path);
if (*start == '.')
- return make_hidden(path);
+ return make_hidden(wpath);
}
return ret;
}
-#undef open
int mingw_open (const char *filename, int oflags, ...)
{
va_list args;
unsigned mode;
int fd;
+ wchar_t wfilename[MAX_PATH];
va_start(args, oflags);
mode = va_arg(args, int);
@@ -331,10 +335,12 @@ int mingw_open (const char *filename, int oflags, ...)
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- fd = open(filename, oflags, mode);
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ fd = _wopen(wfilename, oflags, mode);
if (fd < 0 && (oflags & O_CREAT) && errno == EACCES) {
- DWORD attrs = GetFileAttributes(filename);
+ DWORD attrs = GetFileAttributesW(wfilename);
if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY))
errno = EISDIR;
}
@@ -346,7 +352,7 @@ int mingw_open (const char *filename, int oflags, ...)
* such a file is created.
*/
const char *start = basename((char*)filename);
- if (*start == '.' && make_hidden(filename))
+ if (*start == '.' && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
}
return fd;
@@ -369,38 +375,69 @@ ssize_t mingw_write(int fd, const void *buf, size_t count)
return write(fd, buf, min(count, 31 * 1024 * 1024));
}
-#undef fopen
FILE *mingw_fopen (const char *filename, const char *otype)
{
int hide = 0;
FILE *file;
+ wchar_t wfilename[MAX_PATH], wotype[10];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- file = fopen(filename, otype);
- if (file && hide && make_hidden(filename))
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0 ||
+ utftowcs(wotype, otype, 10) < 0)
+ return NULL;
+ file = _wfopen(wfilename, wotype);
+ if (file && hide && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
return file;
}
-#undef freopen
FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream)
{
int hide = 0;
FILE *file;
+ wchar_t wfilename[MAX_PATH], wotype[10];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- file = freopen(filename, otype, stream);
- if (file && hide && make_hidden(filename))
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0 ||
+ utftowcs(wotype, otype, 10) < 0)
+ return NULL;
+ file = _wfreopen(wfilename, wotype, stream);
+ if (file && hide && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
return file;
}
+int mingw_access(const char *filename, int mode)
+{
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ /* X_OK is not supported by the MSVCRT version */
+ return _waccess(wfilename, mode & ~X_OK);
+}
+
+int mingw_chdir(const char *dirname)
+{
+ wchar_t wdirname[MAX_PATH];
+ if (utftowcs(wdirname, dirname, MAX_PATH) < 0)
+ return -1;
+ return _wchdir(wdirname);
+}
+
+int mingw_chmod(const char *filename, int mode)
+{
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ return _wchmod(wfilename, mode);
+}
+
/*
* The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
* Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
@@ -426,10 +463,12 @@ static inline time_t filetime_to_time_t(const FILETIME *ft)
*/
static int do_lstat(int follow, const char *file_name, struct stat *buf)
{
- int err;
WIN32_FILE_ATTRIBUTE_DATA fdata;
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
+ return -1;
- if (!(err = get_file_attr(file_name, &fdata))) {
+ if (GetFileAttributesExW(wfilename, GetFileExInfoStandard, &fdata)) {
buf->st_ino = 0;
buf->st_gid = 0;
buf->st_uid = 0;
@@ -442,8 +481,8 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
if (fdata.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
- WIN32_FIND_DATAA findbuf;
- HANDLE handle = FindFirstFileA(file_name, &findbuf);
+ WIN32_FIND_DATAW findbuf;
+ HANDLE handle = FindFirstFileW(wfilename, &findbuf);
if (handle != INVALID_HANDLE_VALUE) {
if ((findbuf.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) &&
(findbuf.dwReserved0 == IO_REPARSE_TAG_SYMLINK)) {
@@ -462,7 +501,23 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
}
return 0;
}
- errno = err;
+ switch (GetLastError()) {
+ case ERROR_ACCESS_DENIED:
+ case ERROR_SHARING_VIOLATION:
+ case ERROR_LOCK_VIOLATION:
+ case ERROR_SHARING_BUFFER_EXCEEDED:
+ errno = EACCES;
+ break;
+ case ERROR_BUFFER_OVERFLOW:
+ errno = ENAMETOOLONG;
+ break;
+ case ERROR_NOT_ENOUGH_MEMORY:
+ errno = ENOMEM;
+ break;
+ default:
+ errno = ENOENT;
+ break;
+ }
return -1;
}
@@ -551,16 +606,20 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
{
FILETIME mft, aft;
int fh, rc;
+ DWORD attrs;
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
+ return -1;
/* must have write permission */
- DWORD attrs = GetFileAttributes(file_name);
+ attrs = GetFileAttributesW(wfilename);
if (attrs != INVALID_FILE_ATTRIBUTES &&
(attrs & FILE_ATTRIBUTE_READONLY)) {
/* ignore errors here; open() will report them */
- SetFileAttributes(file_name, attrs & ~FILE_ATTRIBUTE_READONLY);
+ SetFileAttributesW(wfilename, attrs & ~FILE_ATTRIBUTE_READONLY);
}
- if ((fh = open(file_name, O_RDWR | O_BINARY)) < 0) {
+ if ((fh = _wopen(wfilename, O_RDWR | O_BINARY)) < 0) {
rc = -1;
goto revert_attrs;
}
@@ -583,7 +642,7 @@ revert_attrs:
if (attrs != INVALID_FILE_ATTRIBUTES &&
(attrs & FILE_ATTRIBUTE_READONLY)) {
/* ignore errors again */
- SetFileAttributes(file_name, attrs);
+ SetFileAttributesW(wfilename, attrs);
}
return rc;
}
@@ -594,6 +653,18 @@ unsigned int sleep (unsigned int seconds)
return 0;
}
+char *mingw_mktemp(char *template)
+{
+ wchar_t wtemplate[MAX_PATH];
+ if (utftowcs(wtemplate, template, MAX_PATH) < 0)
+ return NULL;
+ if (!_wmktemp(wtemplate))
+ return NULL;
+ if (wcstoutf(template, wtemplate, strlen(template) + 1) < 0)
+ return NULL;
+ return template;
+}
+
int mkstemp(char *template)
{
char *filename = mktemp(template);
@@ -652,17 +723,18 @@ struct tm *localtime_r(const time_t *timep, struct tm *result)
return result;
}
-#undef getcwd
char *mingw_getcwd(char *pointer, int len)
{
int i;
- char *ret = getcwd(pointer, len);
- if (!ret)
- return ret;
+ wchar_t wpointer[MAX_PATH];
+ if (!_wgetcwd(wpointer, MAX_PATH))
+ return NULL;
+ if (wcstoutf(pointer, wpointer, len) < 0)
+ return NULL;
for (i = 0; pointer[i]; i++)
if (pointer[i] == '\\')
pointer[i] = '/';
- return ret;
+ return pointer;
}
/*
@@ -1505,33 +1577,38 @@ int mingw_rename(const char *pold, const char *pnew)
{
DWORD attrs, gle;
int tries = 0;
+ wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
+ if (utftowcs(wpold, pold, MAX_PATH) < 0)
+ return -1;
+ if (utftowcs(wpnew, pnew, MAX_PATH) < 0)
+ return -1;
/*
* Try native rename() first to get errno right.
* It is based on MoveFile(), which cannot overwrite existing files.
*/
- if (!rename(pold, pnew))
+ if (!_wrename(wpold, wpnew))
return 0;
if (errno != EEXIST)
return -1;
repeat:
- if (MoveFileEx(pold, pnew, MOVEFILE_REPLACE_EXISTING))
+ if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
return 0;
/* TODO: translate more errors */
gle = GetLastError();
if (gle == ERROR_ACCESS_DENIED &&
- (attrs = GetFileAttributes(pnew)) != INVALID_FILE_ATTRIBUTES) {
+ (attrs = GetFileAttributesW(wpnew)) != INVALID_FILE_ATTRIBUTES) {
if (attrs & FILE_ATTRIBUTE_DIRECTORY) {
errno = EISDIR;
return -1;
}
if ((attrs & FILE_ATTRIBUTE_READONLY) &&
- SetFileAttributes(pnew, attrs & ~FILE_ATTRIBUTE_READONLY)) {
- if (MoveFileEx(pold, pnew, MOVEFILE_REPLACE_EXISTING))
+ SetFileAttributesW(wpnew, attrs & ~FILE_ATTRIBUTE_READONLY)) {
+ if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
return 0;
gle = GetLastError();
/* revert file attributes on failure */
- SetFileAttributes(pnew, attrs);
+ SetFileAttributesW(wpnew, attrs);
}
}
if (tries < ARRAY_SIZE(delay) && gle == ERROR_ACCESS_DENIED) {
@@ -1741,11 +1818,17 @@ void mingw_open_html(const char *unixpath)
int link(const char *oldpath, const char *newpath)
{
- typedef BOOL (WINAPI *T)(const char*, const char*, LPSECURITY_ATTRIBUTES);
+ typedef BOOL (WINAPI *T)(LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
static T create_hard_link = NULL;
+ wchar_t woldpath[MAX_PATH], wnewpath[MAX_PATH];
+ if (utftowcs(woldpath, oldpath, MAX_PATH) < 0)
+ return -1;
+ if (utftowcs(wnewpath, newpath, MAX_PATH) < 0)
+ return -1;
+
if (!create_hard_link) {
create_hard_link = (T) GetProcAddress(
- GetModuleHandle("kernel32.dll"), "CreateHardLinkA");
+ GetModuleHandle("kernel32.dll"), "CreateHardLinkW");
if (!create_hard_link)
create_hard_link = (T)-1;
}
@@ -1753,7 +1836,7 @@ int link(const char *oldpath, const char *newpath)
errno = ENOSYS;
return -1;
}
- if (!create_hard_link(newpath, oldpath, NULL)) {
+ if (!create_hard_link(wnewpath, woldpath, NULL)) {
errno = err_win_to_posix(GetLastError());
return -1;
}
diff --git a/compat/mingw.h b/compat/mingw.h
index b4ff6cd..4454978 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -181,6 +181,19 @@ FILE *mingw_fopen (const char *filename, const char *otype);
FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream);
#define freopen mingw_freopen
+int mingw_access(const char *filename, int mode);
+#undef access
+#define access mingw_access
+
+int mingw_chdir(const char *dirname);
+#define chdir mingw_chdir
+
+int mingw_chmod(const char *filename, int mode);
+#define chmod mingw_chmod
+
+char *mingw_mktemp(char *template);
+#define mktemp mingw_mktemp
+
char *mingw_getcwd(char *pointer, int len);
#define getcwd mingw_getcwd
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 18 ++++++++++++++----
1 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index fe759c0..818a7f0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -938,9 +938,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{
- STARTUPINFO si;
+ STARTUPINFOW si;
PROCESS_INFORMATION pi;
struct strbuf envblk, args;
+ wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs;
unsigned flags;
BOOL ret;
@@ -976,6 +977,11 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
si.hStdOutput = (HANDLE) _get_osfhandle(fhout);
si.hStdError = (HANDLE) _get_osfhandle(fherr);
+ if (utftowcs(wcmd, cmd, MAX_PATH) < 0)
+ return -1;
+ if (dir && utftowcs(wdir, dir, MAX_PATH) < 0)
+ return -1;
+
/* concatenate argv, quoting args as we go */
strbuf_init(&args, 0);
if (prepend_cmd) {
@@ -993,6 +999,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
free(quoted);
}
+ wargs = xmalloc((2 * args.len + 1) * sizeof(wchar_t));
+ utftowcs(wargs, args.buf, 2 * args.len + 1);
+ strbuf_release(&args);
+
if (env) {
int count = 0;
char **e, **sorted_env;
@@ -1014,12 +1024,12 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
}
memset(&pi, 0, sizeof(pi));
- ret = CreateProcess(cmd, args.buf, NULL, NULL, TRUE, flags,
- env ? envblk.buf : NULL, dir, &si, &pi);
+ ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
+ env ? envblk.buf : NULL, dir ? wdir : NULL, &si, &pi);
if (env)
strbuf_release(&envblk);
- strbuf_release(&args);
+ free(wargs);
if (!ret) {
errno = ENOENT;
--
1.7.6.msysgit.1
On most unix systems, the system encoding (and thus the TCL system
encoding) will be UTF-8, so file names will be displayed correctly.
On Windows, it is impossible to set the system encoding to UTF-8. Changing
the TCL system encoding (via 'encoding system ...', e.g. in the startup
code) is explicitly discouraged by the TCL docs.
Change gitk and git-gui functions dealing with file names to always convert
from and to UTF-8.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
git-gui/git-gui.sh | 6 +++---
git-gui/lib/browser.tcl | 2 +-
git-gui/lib/index.tcl | 6 +++---
gitk-git/gitk | 16 ++++++++--------
4 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
index 0d5bd34..74f10bf 100755
--- a/git-gui/git-gui.sh
+++ b/git-gui/git-gui.sh
@@ -1605,7 +1605,7 @@ proc read_diff_index {fd after} {
set i [split [string range $buf_rdi $c [expr {$z1 - 2}]] { }]
set p [string range $buf_rdi $z1 [expr {$z2 - 1}]]
merge_state \
- [encoding convertfrom $p] \
+ [encoding convertfrom utf-8 $p] \
[lindex $i 4]? \
[list [lindex $i 0] [lindex $i 2]] \
[list]
@@ -1638,7 +1638,7 @@ proc read_diff_files {fd after} {
set i [split [string range $buf_rdf $c [expr {$z1 - 2}]] { }]
set p [string range $buf_rdf $z1 [expr {$z2 - 1}]]
merge_state \
- [encoding convertfrom $p] \
+ [encoding convertfrom utf-8 $p] \
?[lindex $i 4] \
[list] \
[list [lindex $i 0] [lindex $i 2]]
@@ -1661,7 +1661,7 @@ proc read_ls_others {fd after} {
set pck [split $buf_rlo "\0"]
set buf_rlo [lindex $pck end]
foreach p [lrange $pck 0 end-1] {
- set p [encoding convertfrom $p]
+ set p [encoding convertfrom utf-8 $p]
if {[string index $p end] eq {/}} {
set p [string range $p 0 end-1]
}
diff --git a/git-gui/lib/browser.tcl b/git-gui/lib/browser.tcl
index d68b1f9..f4ee01d 100644
--- a/git-gui/lib/browser.tcl
+++ b/git-gui/lib/browser.tcl
@@ -191,7 +191,7 @@ method _ls {tree_id {name {}}} {
$w conf -state disabled
set fd [git_read ls-tree -z $tree_id]
- fconfigure $fd -blocking 0 -translation binary
+ fconfigure $fd -blocking 0 -translation binary -encoding utf-8
fileevent $fd readable [cb _read $fd]
}
diff --git a/git-gui/lib/index.tcl b/git-gui/lib/index.tcl
index 5d7bbf2..3d1d973 100644
--- a/git-gui/lib/index.tcl
+++ b/git-gui/lib/index.tcl
@@ -115,7 +115,7 @@ proc write_update_indexinfo {fd pathList totalCnt batch after} {
set info [lindex $s 2]
if {$info eq {}} continue
- puts -nonewline $fd "$info\t[encoding convertto $path]\0"
+ puts -nonewline $fd "$info\t[encoding convertto utf-8 $path]\0"
display_file $path $new
}
@@ -186,7 +186,7 @@ proc write_update_index {fd pathList totalCnt batch after} {
?M {set new M_}
?? {continue}
}
- puts -nonewline $fd "[encoding convertto $path]\0"
+ puts -nonewline $fd "[encoding convertto utf-8 $path]\0"
display_file $path $new
}
@@ -247,7 +247,7 @@ proc write_checkout_index {fd pathList totalCnt batch after} {
?M -
?T -
?D {
- puts -nonewline $fd "[encoding convertto $path]\0"
+ puts -nonewline $fd "[encoding convertto utf-8 $path]\0"
display_file $path ?_
}
}
diff --git a/gitk-git/gitk b/gitk-git/gitk
index 935ed5c..9facdbf 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -7251,7 +7251,7 @@ proc gettreeline {gtf id} {
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
}
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
lappend treefilelist($id) $fname
}
if {![eof $gtf]} {
@@ -7474,7 +7474,7 @@ proc gettreediffline {gdtf ids} {
if {[string index $file 0] eq "\""} {
set file [lindex $file 0]
}
- set file [encoding convertfrom $file]
+ set file [encoding convertfrom utf-8 $file]
if {$file ne [lindex $treediff end]} {
lappend treediff $file
lappend sublist $file
@@ -7623,7 +7623,7 @@ proc makediffhdr {fname ids} {
global ctext curdiffstart treediffs diffencoding
global ctext_file_names jump_to_here targetline diffline
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
set diffencoding [get_path_encoding $fname]
set i [lsearch -exact $treediffs($ids) $fname]
if {$i >= 0} {
@@ -7657,7 +7657,7 @@ proc getblobdiffline {bdf ids} {
}
if {![string compare -length 5 "diff " $line]} {
if {![regexp {^diff (--cc|--git) } $line m type]} {
- set line [encoding convertfrom $line]
+ set line [encoding convertfrom utf-8 $line]
$ctext insert end "$line\n" hunksep
continue
}
@@ -7704,7 +7704,7 @@ proc getblobdiffline {bdf ids} {
makediffhdr $fname $ids
} elseif {![string compare -length 16 "* Unmerged path " $line]} {
- set fname [encoding convertfrom [string range $line 16 end]]
+ set fname [encoding convertfrom utf-8 [string range $line 16 end]]
$ctext insert end "\n"
set curdiffstart [$ctext index "end - 1c"]
lappend ctext_file_names $fname
@@ -7759,7 +7759,7 @@ proc getblobdiffline {bdf ids} {
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
}
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
set i [lsearch -exact $treediffs($ids) $fname]
if {$i >= 0} {
setinlist difffilestart $i $curdiffstart
@@ -7778,7 +7778,7 @@ proc getblobdiffline {bdf ids} {
set diffinhdr 0
continue
}
- set line [encoding convertfrom $line]
+ set line [encoding convertfrom utf-8 $line]
$ctext insert end "$line\n" filesep
} else {
@@ -11361,7 +11361,7 @@ proc cache_gitattr {attr pathlist} {
foreach row [split $rlist "\n"] {
if {[regexp "(.*): $attr: (.*)" $row m path value]} {
if {[string index $path 0] eq "\""} {
- set path [encoding convertfrom [lindex $path 0]]
+ set path [encoding convertfrom utf-8 [lindex $path 0]]
}
set path_attr_cache($attr,$path) $value
}
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 35 +++++++++++++++++++++++++++++++++--
1 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 818a7f0..e3cd129 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2053,10 +2053,41 @@ int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen)
*/
int _CRT_glob = 0;
+typedef struct {
+ int newmode;
+} _startupinfo;
+
+extern int __wgetmainargs(int *argc, wchar_t ***argv, wchar_t ***env, int glob,
+ _startupinfo *si);
+
void mingw_startup()
{
- /* copy executable name to argv[0] */
- __argv[0] = xstrdup(_pgmptr);
+ int i, len, maxlen, argc;
+ char *buffer;
+ wchar_t **wenv, **wargv;
+ _startupinfo si;
+
+ /* get wide char arguments and environment */
+ si.newmode = 0;
+ __wgetmainargs(&argc, &wargv, &wenv, _CRT_glob, &si);
+
+ /* determine size of argv and environ conversion buffer */
+ maxlen = wcslen(_wpgmptr);
+ for (i = 1; i < argc; i++)
+ maxlen = max(maxlen, wcslen(wargv[i]));
+
+ /* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */
+ maxlen = 3 * maxlen + 1;
+ buffer = xmalloc(maxlen);
+
+ /* convert command line arguments and environment to UTF-8 */
+ len = wcstoutf(buffer, _wpgmptr, maxlen);
+ __argv[0] = xmemdupz(buffer, len);
+ for (i = 1; i < argc; i++) {
+ len = wcstoutf(buffer, wargv[i], maxlen);
+ __argv[i] = xmemdupz(buffer, len);
+ }
+ free(buffer);
/* initialize critical section for waitpid pinfo_t list */
InitializeCriticalSection(&pinfo_cs);
--
1.7.6.msysgit.1
In t3901, most tests still work if we don't mess up the repository encoding
in setup, so don't switch to ISO-8859-1 on MinGW.
Note that i18n tests that do their encoding tricks via encoded files (such
as t3900) are not affected by this.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
t/t3901-i18n-patch.sh | 19 +++++++++++--------
t/t4201-shortlog.sh | 6 +++---
t/t8005-blame-i18n.sh | 8 ++++----
3 files changed, 18 insertions(+), 15 deletions(-)
1.7.6.msysgit.1
Prevent potential inconsistencies by using the same algorithm (case-
sensitive before case-insensitive) for all environment functions. This can
be implemented in the now central lookup_env function.
The original MSVCRT getenv (and thus, #undef getenv) is no longer needed,
direct references to mingw_getenv are no longer necessary.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 37 ++++++++++++++++++-------------------
1 files changed, 18 insertions(+), 19 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 93e7a58..7c8a1bc 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -181,7 +181,7 @@ static int ask_yes_no_if_possible(const char *format, ...)
vsnprintf(question, sizeof(question), format, args);
va_end(args);
- if ((retry_hook[0] = mingw_getenv("GIT_ASK_YESNO"))) {
+ if ((retry_hook[0] = getenv("GIT_ASK_YESNO"))) {
retry_hook[1] = question;
return !run_command_v_opt(retry_hook, 0);
}
@@ -836,7 +836,7 @@ static const char *parse_interpreter(const char *cmd)
*/
static char **get_path_split(void)
{
- char *p, **path, *envpath = mingw_getenv("PATH");
+ char *p, **path, *envpath = getenv("PATH");
int i, n = 0;
if (!envpath || !*envpath)
@@ -1215,13 +1215,21 @@ void free_environ(char **env)
static int lookup_env(char **env, const char *name, size_t nmln)
{
int i;
-
+ /*
+ * try case-sensitive match first, this is necessary for sh-i18n--envsubst
+ * to work with environment variables that differ only in case (e.g. $PATH
+ * and $path)
+ */
for (i = 0; env[i]; i++) {
- if (0 == strncmp(env[i], name, nmln)
- && '=' == env[i][nmln])
+ if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
/* matches */
return i;
}
+ /* if there's no case-sensitive match, try case-insensitive instead */
+ for (i = 0; env[i]; i++) {
+ if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
+ return i;
+ }
return -1;
}
@@ -1267,32 +1275,23 @@ char **make_augmented_environ(const char *const *vars)
return env;
}
-#undef getenv
-
-/*
- * The system's getenv looks up the name in a case-insensitive manner.
- * This version tries a case-sensitive lookup and falls back to
- * case-insensitive if nothing was found. This is necessary because,
- * as a prominent example, CMD sets 'Path', but not 'PATH'.
- * Warning: not thread-safe.
- */
-static char *getenv_cs(const char *name)
+static char *do_getenv(const char *name)
{
size_t len = strlen(name);
int i = lookup_env(environ, name, len);
if (i >= 0)
return environ[i] + len + 1; /* skip past name and '=' */
- return getenv(name);
+ return NULL;
}
char *mingw_getenv(const char *name)
{
- char *result = getenv_cs(name);
+ char *result = do_getenv(name);
if (!result && !strcmp(name, "TMPDIR")) {
/* on Windows it is TMP and TEMP */
- result = getenv_cs("TMP");
+ result = do_getenv("TMP");
if (!result)
- result = getenv_cs("TEMP");
+ result = do_getenv("TEMP");
}
else if (!result && !strcmp(name, "TERM")) {
/* simulate TERM to enable auto-color (see color.c) */
--
1.7.6.msysgit.1
No changes to getenv() are necessary, as the MSVCRT version is implemented
on top of char **environ.
However, putenv / _wputenv from MSVCRT no longer work, for two reasons:
1. they try to keep environ, _wenviron and the Win32 process environment
in sync, using the default system encoding instead of UTF-8 to convert
between charsets
2. msysgit and MSVCRT use different allocators, memory allocated in git
cannot be freed by the CRT and vice versa
Implement mingw_putenv using the env_setenv helper function from the
environment merge code.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 15 +++++++++++++++
compat/mingw.h | 2 ++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index b38a7b0..3702b06 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1299,6 +1299,12 @@ char *mingw_getenv(const char *name)
return result;
}
+int mingw_putenv(const char *namevalue)
+{
+ environ = env_setenv(environ, namevalue);
+ return 0;
+}
+
/*
* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it is null. It
@@ -2077,6 +2083,11 @@ void mingw_startup()
maxlen = wcslen(_wpgmptr);
for (i = 1; i < argc; i++)
maxlen = max(maxlen, wcslen(wargv[i]));
+ for (i = 0; wenv[i]; i++)
+ maxlen = max(maxlen, wcslen(wenv[i]));
+
+ /* nedmalloc can't free CRT memory, allocate resizable environment list */
+ environ = xcalloc(i + 1, sizeof(char*));
/* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */
maxlen = 3 * maxlen + 1;
@@ -2089,6 +2100,10 @@ void mingw_startup()
len = wcstoutf(buffer, wargv[i], maxlen);
__argv[i] = xmemdupz(buffer, len);
}
+ for (i = 0; wenv[i]; i++) {
+ len = wcstoutf(buffer, wenv[i], maxlen);
+ environ[i] = xmemdupz(buffer, len);
+ }
free(buffer);
/* initialize critical section for waitpid pinfo_t list */
diff --git a/compat/mingw.h b/compat/mingw.h
index fac4634..29cbef7 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -199,6 +199,8 @@ char *mingw_getcwd(char *pointer, int len);
char *mingw_getenv(const char *name);
#define getenv mingw_getenv
+int mingw_putenv(const char *namevalue);
+#define putenv mingw_putenv
struct hostent *mingw_gethostbyname(const char *host);
#define gethostbyname mingw_gethostbyname
--
1.7.6.msysgit.1
Global variables are either initialized on startup (single threaded) or
exclusively modified by the background thread. Threads communicate through
the pipe, no further synchronization is necessary.
Due to the byte-oriented pipe, ANSI escape sequences can no longer be
expected to arrive in one piece. Replace the string-based ansi_emulate()
with a simple stateful parser (this also fixes colored diff hunk headers,
which were broken as of commit 2efcc977).
Override isatty to return true for the pipes redirecting to the console.
Exec/spawn obtain the original console handle to pass to the next process
via winansi_get_osfhandle().
All other overrides are gone, the default stdio implementations work as
expected with the piped stdout/stderr descriptors.
Limitations: doesn't track reopened or duped file descriptors, i.e.:
- fdopen(1/2) returns fully buffered streams
- dup(1/2), dup2(1/2) returns normal pipe descriptors (i.e. isatty() =
false, winansi_get_osfhandle won't return the original console handle)
Currently, only the git-format-patch command uses xfdopen(xdup(1)) (see
"realstdout" in builtin/log.c), but works well with these limitations.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 9 +-
compat/mingw.h | 12 +--
compat/winansi.c | 359 +++++++++++++++++++++++++++++++++---------------------
3 files changed, 232 insertions(+), 148 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index e3cd129..2f346dc 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -973,9 +973,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
memset(&si, 0, sizeof(si));
si.cb = sizeof(si);
si.dwFlags = STARTF_USESTDHANDLES;
- si.hStdInput = (HANDLE) _get_osfhandle(fhin);
- si.hStdOutput = (HANDLE) _get_osfhandle(fhout);
- si.hStdError = (HANDLE) _get_osfhandle(fherr);
+ si.hStdInput = winansi_get_osfhandle(fhin);
+ si.hStdOutput = winansi_get_osfhandle(fhout);
+ si.hStdError = winansi_get_osfhandle(fherr);
if (utftowcs(wcmd, cmd, MAX_PATH) < 0)
return -1;
@@ -2097,4 +2097,7 @@ void mingw_startup()
_setmode(_fileno(stdin), _O_BINARY);
_setmode(_fileno(stdout), _O_BINARY);
_setmode(_fileno(stderr), _O_BINARY);
+
+ /* initialize Unicode console */
+ winansi_init();
}
diff --git a/compat/mingw.h b/compat/mingw.h
index 4454978..fac4634 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -297,14 +297,10 @@ sig_handler_t mingw_signal(int sig, sig_handler_t handler);
* ANSI emulation wrappers
*/
-int winansi_fputs(const char *str, FILE *stream);
-int winansi_printf(const char *format, ...) __attribute__((format (printf, 1, 2)));
-int winansi_fprintf(FILE *stream, const char *format, ...) __attribute__((format (printf, 2, 3)));
-int winansi_vfprintf(FILE *stream, const char *format, va_list list);
-#define fputs winansi_fputs
-#define printf(...) winansi_printf(__VA_ARGS__)
-#define fprintf(...) winansi_fprintf(__VA_ARGS__)
-#define vfprintf winansi_vfprintf
+void winansi_init(void);
+int winansi_isatty(int fd);
+HANDLE winansi_get_osfhandle(int fd);
+#define isatty winansi_isatty
/*
* git specific compatibility
diff --git a/compat/winansi.c b/compat/winansi.c
index bf514f9..d11b532 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -4,18 +4,13 @@
#undef NOGDI
#include "../git-compat-util.h"
-#include <malloc.h>
#include <wingdi.h>
#include <winreg.h>
/*
Functions to be wrapped:
*/
-#undef printf
-#undef fprintf
-#undef fputs
-#undef vfprintf
-/* TODO: write */
+#undef isatty
/*
ANSI codes used by git: m, K
@@ -28,8 +23,10 @@ static HANDLE console;
static WORD plain_attr;
static WORD attr;
static int negative;
-static FILE *last_stream = NULL;
static int non_ascii_used = 0;
+static HANDLE hthread, hread, hwrite;
+static HANDLE hwrite1 = INVALID_HANDLE_VALUE, hwrite2 = INVALID_HANDLE_VALUE;
+static HANDLE hconsole1, hconsole2;
#ifdef __MINGW32__
typedef struct _CONSOLE_FONT_INFOEX {
@@ -74,64 +71,39 @@ static void warn_if_raster_font(void)
}
}
- if (!(fontFamily & TMPF_TRUETYPE))
- warning("Your console font probably doesn\'t support "
- "Unicode. If you experience strange characters in the output, "
- "consider switching to a TrueType font such as Lucida Console!");
+ if (!(fontFamily & TMPF_TRUETYPE)) {
+ const wchar_t *msg = L"\nWarning: Your console font probably doesn\'t "
+ L"support Unicode. If you experience strange characters in the "
+ L"output, consider switching to a TrueType font such as Lucida "
+ L"Console!\n";
+ WriteConsoleW(console, msg, wcslen(msg), NULL, NULL);
+ }
}
-static int is_console(FILE *stream)
+static int is_console(int fd)
{
CONSOLE_SCREEN_BUFFER_INFO sbi;
HANDLE hcon;
- static int initialized = 0;
-
- /* use cached value if stream hasn't changed */
- if (stream == last_stream)
- return console != NULL;
-
- last_stream = stream;
- console = NULL;
-
- /* get OS handle of the stream */
- hcon = (HANDLE) _get_osfhandle(_fileno(stream));
+ /* get OS handle of the file descriptor */
+ hcon = (HANDLE) _get_osfhandle(fd);
if (hcon == INVALID_HANDLE_VALUE)
return 0;
+ /* check if its a device (i.e. console, printer, serial port) */
+ if (GetFileType(hcon) != FILE_TYPE_CHAR)
+ return 0;
+
/* check if its a handle to a console output screen buffer */
if (!GetConsoleScreenBufferInfo(hcon, &sbi))
return 0;
- if (!initialized) {
- attr = plain_attr = sbi.wAttributes;
- negative = 0;
- initialized = 1;
- /* check console font on exit */
- atexit(warn_if_raster_font);
- }
-
- console = hcon;
+ /* initialize attributes */
+ attr = plain_attr = sbi.wAttributes;
+ negative = 0;
return 1;
}
-static int write_console(const char *str, size_t len)
-{
- /* convert utf-8 to utf-16, write directly to console */
- int wlen = MultiByteToWideChar(CP_UTF8, 0, str, len, NULL, 0);
- wchar_t *wbuf = (wchar_t *) alloca(wlen * sizeof(wchar_t));
- MultiByteToWideChar(CP_UTF8, 0, str, len, wbuf, wlen);
-
- WriteConsoleW(console, wbuf, wlen, NULL, NULL);
-
- /* remember if non-ascii characters are printed */
- if (wlen != len)
- non_ascii_used = 1;
-
- /* return original (utf-8 encoded) length */
- return len;
-}
-
#define FOREGROUND_ALL (FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)
#define BACKGROUND_ALL (BACKGROUND_RED | BACKGROUND_GREEN | BACKGROUND_BLUE)
@@ -175,18 +147,13 @@ static void erase_in_line(void)
&dummy);
}
-
-static const char *set_attr(const char *str)
+static void set_attr(char func, const int *params, int paramlen)
{
- const char *func;
- size_t len = strspn(str, "0123456789;");
- func = str + len;
-
- switch (*func) {
+ int i;
+ switch (func) {
case 'm':
- do {
- long val = strtol(str, (char **)&str, 10);
- switch (val) {
+ for (i = 0; i < paramlen; i++) {
+ switch (params[i]) {
case 0: /* reset */
attr = plain_attr;
negative = 0;
@@ -309,9 +276,7 @@ static const char *set_attr(const char *str)
/* Unsupported code */
break;
}
- str++;
- } while (*(str-1) == ';');
-
+ }
set_console_attr();
break;
case 'K':
@@ -321,112 +286,232 @@ static const char *set_attr(const char *str)
/* Unsupported code */
break;
}
-
- return func + 1;
}
-static int ansi_emulate(const char *str, FILE *stream)
+#define BUFFER_SIZE 4096
+#define MAX_PARAMS 16
+
+static void write_console(char *str, size_t len)
{
- int rv = 0;
- const char *pos = str;
+ /* only called from console_thread, so a static buffer will do */
+ static wchar_t wbuf[2 * BUFFER_SIZE + 1];
- fflush(stream);
+ /* convert utf-8 to utf-16 */
+ int wlen = utftowcsn(wbuf, str, 2 * BUFFER_SIZE + 1, len);
- while (*pos) {
- pos = strstr(str, "\033[");
- if (pos) {
- size_t len = pos - str;
+ /* write directly to console */
+ WriteConsoleW(console, wbuf, wlen, NULL, NULL);
- if (len) {
- size_t out_len = write_console(str, len);
- rv += out_len;
- if (out_len < len)
- return rv;
- }
+ /* remember if non-ascii characters are printed */
+ if (wlen != len)
+ non_ascii_used = 1;
+}
- str = pos + 2;
- rv += 2;
+enum {
+ TEXT = 0, ESCAPE = 033, BRACKET = '[', EXIT = -1
+};
- pos = set_attr(str);
- rv += pos - str;
- str = pos;
- } else {
- size_t len = strlen(str);
- rv += write_console(str, len);
- return rv;
+static DWORD WINAPI console_thread(LPVOID unused)
+{
+ char buffer[BUFFER_SIZE];
+ DWORD bytes;
+ int start, end, c, parampos = 0, state = TEXT;
+ int params[MAX_PARAMS];
+
+ while (state != EXIT) {
+ /* read next chunk of bytes from the pipe */
+ if (!ReadFile(hread, buffer, BUFFER_SIZE, &bytes, NULL)) {
+ /* exit if pipe has been closed */
+ if (GetLastError() == ERROR_BROKEN_PIPE)
+ break;
+ /* ignore other errors */
+ continue;
}
+
+ /* scan the bytes and handle ANSI control codes */
+ start = end = 0;
+ while (end < bytes) {
+ c = buffer[end++];
+ switch (state) {
+ case TEXT:
+ if (c == ESCAPE) {
+ /* print text seen so far */
+ if (end - 1 > start)
+ write_console(buffer + start, end - 1 - start);
+
+ /* then start parsing escape sequence */
+ start = end - 1;
+ memset(params, 0, sizeof(params));
+ parampos = 0;
+ state = ESCAPE;
+ }
+ break;
+
+ case ESCAPE:
+ /* continue if "\033[", otherwise bail out */
+ state = (c == BRACKET) ? BRACKET : TEXT;
+ break;
+
+ case BRACKET:
+ /* parse [0-9;]* into array of parameters */
+ if (c >= '0' && c <= '9') {
+ params[parampos] *= 10;
+ params[parampos] += c - '0';
+ } else if (c == ';') {
+ /* next parameter, bail out if out of bounds */
+ parampos++;
+ if (parampos >= MAX_PARAMS)
+ state = TEXT;
+ } else if (c == 'q') {
+ /* "\033[q": terminate the thread */
+ state = EXIT;
+ } else {
+ /* end of escape sequence, change console attributes */
+ set_attr(c, params, parampos + 1);
+ start = end;
+ state = TEXT;
+ }
+ break;
+ }
+ }
+
+ /* print remaining text unless we're parsing an escape sequence */
+ if (state == TEXT && end > start)
+ write_console(buffer + start, end - start);
}
- return rv;
+
+ /* check if the console font supports unicode */
+ warn_if_raster_font();
+
+ CloseHandle(hread);
+ return 0;
}
-int winansi_fputs(const char *str, FILE *stream)
+static void winansi_exit(void)
{
- int rv;
-
- if (!is_console(stream))
- return fputs(str, stream);
+ DWORD dummy;
+ /* flush all streams */
+ _flushall();
+
+ /* close the write ends of the pipes */
+ if (hwrite1 != INVALID_HANDLE_VALUE)
+ CloseHandle(hwrite1);
+ if (hwrite2 != INVALID_HANDLE_VALUE)
+ CloseHandle(hwrite2);
+
+ /* send termination sequence to the thread */
+ WriteFile(hwrite, "\033[q", 3, &dummy, NULL);
+ CloseHandle(hwrite);
+
+ /* allow the thread to copy remaining data to the console */
+ WaitForSingleObject(hthread, 1000);
+ CloseHandle(hthread);
+}
- rv = ansi_emulate(str, stream);
+static void die_lasterr(const char *fmt, ...)
+{
+ va_list params;
+ va_start(params, fmt);
+ errno = err_win_to_posix(GetLastError());
+ die_errno(fmt, params);
+ va_end(params);
+}
- if (rv >= 0)
- return 0;
- else
- return EOF;
+static HANDLE duplicate_handle(HANDLE hnd)
+{
+ HANDLE hresult, hproc = GetCurrentProcess();
+ if (!DuplicateHandle(hproc, hnd, hproc, &hresult, 0, TRUE,
+ DUPLICATE_SAME_ACCESS))
+ die_lasterr("DuplicateHandle(%li) failed", (long) hnd);
+ return hresult;
}
-int winansi_vfprintf(FILE *stream, const char *format, va_list list)
+static HANDLE redirect_console(FILE *stream, HANDLE *phcon, int new_fd)
{
- int len, rv;
- char small_buf[256];
- char *buf = small_buf;
- va_list cp;
+ /* get original console handle */
+ int fd = _fileno(stream);
+ HANDLE hcon = (HANDLE) _get_osfhandle(fd);
+ if (hcon == INVALID_HANDLE_VALUE)
+ die_errno("_get_osfhandle(%i) failed", fd);
- if (!is_console(stream))
- goto abort;
+ /* save a copy to phcon and console (used by the background thread) */
+ console = *phcon = duplicate_handle(hcon);
- va_copy(cp, list);
- len = vsnprintf(small_buf, sizeof(small_buf), format, cp);
- va_end(cp);
+ /* duplicate new_fd over fd (closes fd and associated handle (hcon)) */
+ if (_dup2(new_fd, fd))
+ die_errno("_dup2(%i, %i) failed", new_fd, fd);
- if (len > sizeof(small_buf) - 1) {
- buf = malloc(len + 1);
- if (!buf)
- goto abort;
+ /* no buffering, or stdout / stderr will be out of sync */
+ setbuf(stream, NULL);
+ return (HANDLE) _get_osfhandle(fd);
+}
- len = vsnprintf(buf, len + 1, format, list);
- }
+void winansi_init(void)
+{
+ int con1, con2, hwrite_fd;
- rv = ansi_emulate(buf, stream);
+ /* check if either stdout or stderr is a console output screen buffer */
+ con1 = is_console(1);
+ con2 = is_console(2);
+ if (!con1 && !con2)
+ return;
- if (buf != small_buf)
- free(buf);
- return rv;
+ /* create an anonymous pipe */
+ if (!CreatePipe(&hread, &hwrite, NULL, BUFFER_SIZE))
+ die_lasterr("CreatePipe failed");
-abort:
- rv = vfprintf(stream, format, list);
- return rv;
-}
+ /* start console spool thread on the pipe's read end */
+ hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
+ if (hthread == INVALID_HANDLE_VALUE)
+ die_lasterr("CreateThread(console_thread) failed");
-int winansi_fprintf(FILE *stream, const char *format, ...)
-{
- va_list list;
- int rv;
+ /* schedule cleanup routine */
+ if (atexit(winansi_exit))
+ die_errno("atexit(winansi_exit) failed");
- va_start(list, format);
- rv = winansi_vfprintf(stream, format, list);
- va_end(list);
+ /* create a file descriptor for the write end of the pipe */
+ hwrite_fd = _open_osfhandle((long) duplicate_handle(hwrite), _O_BINARY);
+ if (hwrite_fd == -1)
+ die_errno("_open_osfhandle(%li) failed", (long) hwrite);
- return rv;
+ /* redirect stdout / stderr to the pipe */
+ if (con1)
+ hwrite1 = redirect_console(stdout, &hconsole1, hwrite_fd);
+ if (con2)
+ hwrite2 = redirect_console(stderr, &hconsole2, hwrite_fd);
+
+ /* close pipe file descriptor (also closes the duped hwrite) */
+ close(hwrite_fd);
}
-int winansi_printf(const char *format, ...)
+static int is_same_handle(HANDLE hnd, int fd)
{
- va_list list;
- int rv;
+ return hnd != INVALID_HANDLE_VALUE && hnd == (HANDLE) _get_osfhandle(fd);
+}
- va_start(list, format);
- rv = winansi_vfprintf(stdout, format, list);
- va_end(list);
+/*
+ * Return true if stdout / stderr is a pipe redirecting to the console.
+ */
+int winansi_isatty(int fd)
+{
+ if (fd == 1 && is_same_handle(hwrite1, 1))
+ return 1;
+ else if (fd == 2 && is_same_handle(hwrite2, 2))
+ return 1;
+ else
+ return isatty(fd);
+}
- return rv;
+/*
+ * Returns the real console handle if stdout / stderr is a pipe redirecting
+ * to the console. Allows spawn / exec to pass the console to the next process.
+ */
+HANDLE winansi_get_osfhandle(int fd)
+{
+ if (fd == 1 && is_same_handle(hwrite1, 1))
+ return hconsole1;
+ else if (fd == 2 && is_same_handle(hwrite2, 2))
+ return hconsole2;
+ else
+ return (HANDLE) _get_osfhandle(fd);
}
--
1.7.6.msysgit.1
Disable gitunsetenv in the Makefile and use env_setenv (via mingw_putenv)
instead (this frees removed environment entries).
Move xstrdup from env_setenv to make_augmented_environ, so that
mingw_putenv no longer copies the environment entries (according to POSIX
[1], "the string [...] shall become part of the environment"). This also
fixes the memory leak in gitsetenv, which expects a POSIX compliant putenv.
[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/putenv.html
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
Makefile | 2 --
compat/mingw.c | 10 ++++++----
compat/mingw.h | 1 +
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/Makefile b/Makefile
index 4594d81..71bc9a7 100644
--- a/Makefile
+++ b/Makefile
@@ -1095,7 +1095,6 @@ ifeq ($(uname_S),Windows)
NO_SYMLINK_HEAD = YesPlease
NO_IPV6 = YesPlease
NO_SETENV = YesPlease
- NO_UNSETENV = YesPlease
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
@@ -1188,7 +1187,6 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_LIBGEN_H = YesPlease
NO_SYMLINK_HEAD = YesPlease
NO_SETENV = YesPlease
- NO_UNSETENV = YesPlease
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
diff --git a/compat/mingw.c b/compat/mingw.c
index 3702b06..93e7a58 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1238,14 +1238,14 @@ static char **env_setenv(char **env, const char *name)
for (i = 0; env[i]; i++)
;
env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = xstrdup(name);
+ env[i] = (char*) name;
env[i+1] = NULL;
}
}
else {
free(env[i]);
if (*eq)
- env[i] = xstrdup(name);
+ env[i] = (char*) name;
else
for (; env[i]; i++)
env[i] = env[i+1];
@@ -1260,8 +1260,10 @@ char **make_augmented_environ(const char *const *vars)
{
char **env = copy_environ();
- while (*vars)
- env = env_setenv(env, *vars++);
+ while (*vars) {
+ const char *v = *vars++;
+ env = env_setenv(env, strchr(v, '=') ? xstrdup(v) : v);
+ }
return env;
}
diff --git a/compat/mingw.h b/compat/mingw.h
index 29cbef7..131d2ec 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -201,6 +201,7 @@ char *mingw_getenv(const char *name);
#define getenv mingw_getenv
int mingw_putenv(const char *namevalue);
#define putenv mingw_putenv
+#define unsetenv mingw_putenv
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 26 ++++++++++++++------------
1 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 2f346dc..b38a7b0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -940,9 +940,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
{
STARTUPINFOW si;
PROCESS_INFORMATION pi;
- struct strbuf envblk, args;
- wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs;
- unsigned flags;
+ struct strbuf args;
+ wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs, *wenvblk = NULL;
+ unsigned flags = CREATE_UNICODE_ENVIRONMENT;
BOOL ret;
/* Determine whether or not we are associated to a console */
@@ -959,7 +959,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
* instead of CREATE_NO_WINDOW to make ssh
* recognize that it has no console.
*/
- flags = DETACHED_PROCESS;
+ flags |= DETACHED_PROCESS;
} else {
/* There is already a console. If we specified
* DETACHED_PROCESS here, too, Windows would
@@ -967,7 +967,6 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
* The same is true for CREATE_NO_WINDOW.
* Go figure!
*/
- flags = 0;
CloseHandle(cons);
}
memset(&si, 0, sizeof(si));
@@ -1006,6 +1005,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
if (env) {
int count = 0;
char **e, **sorted_env;
+ int i = 0, size = 0, envblksz = 0, envblkpos = 0;
for (e = env; *e; e++)
count++;
@@ -1015,20 +1015,22 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
qsort(sorted_env, count, sizeof(*sorted_env), env_compare);
- strbuf_init(&envblk, 0);
- for (e = sorted_env; *e; e++) {
- strbuf_addstr(&envblk, *e);
- strbuf_addch(&envblk, '\0');
+ /* create environment block from temporary environment */
+ for (i = 0; sorted_env[i]; i++) {
+ size = 2 * strlen(sorted_env[i]) + 2;
+ ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t), envblksz);
+ envblkpos += utftowcs(&wenvblk[envblkpos], sorted_env[i], size) + 1;
}
+ /* add final \0 terminator */
+ wenvblk[envblkpos] = 0;
free(sorted_env);
}
memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
- env ? envblk.buf : NULL, dir ? wdir : NULL, &si, &pi);
+ wenvblk, dir ? wdir : NULL, &si, &pi);
- if (env)
- strbuf_release(&envblk);
+ free(wenvblk);
free(wargs);
if (!ret) {
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 50 +++++++++++++++++++++++++++++---------------------
1 files changed, 29 insertions(+), 21 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index aeae266..0f85f12 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -767,19 +767,19 @@ static int lookupenv(char **env, const char *name, size_t nmln)
/*
* If name contains '=', then sets the variable, otherwise it unsets it
+ * Size includes the terminating NULL. Env must have room for size + 1 entries
+ * (in case of insert). Returns the new size. Optionally frees removed entries.
*/
-static char **do_putenv(char **env, const char *name, int free_old)
+static int do_putenv(char **env, const char *name, int size, int free_old)
{
char *eq = strchrnul(name, '=');
int i = lookupenv(env, name, eq-name);
if (i < 0) {
if (*eq) {
- for (i = 0; env[i]; i++)
- ;
- env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = (char*) name;
- env[i+1] = NULL;
+ env[size - 1] = (char*) name;
+ env[size] = NULL;
+ size++;
}
}
else {
@@ -787,13 +787,20 @@ static char **do_putenv(char **env, const char *name, int free_old)
free(env[i]);
if (*eq)
env[i] = (char*) name;
- else
+ else {
for (; env[i]; i++)
env[i] = env[i+1];
+ size--;
+ }
}
- return env;
+ return size;
}
+/* used number of elements of environ array, including terminating NULL */
+static int environ_size = 0;
+/* allocated size of environ array, in bytes */
+static int environ_alloc = 0;
+
static char *do_getenv(const char *name)
{
size_t len = strlen(name);
@@ -821,7 +828,8 @@ char *mingw_getenv(const char *name)
int mingw_putenv(const char *namevalue)
{
- environ = do_putenv(environ, namevalue, 1);
+ ALLOC_GROW(environ, (environ_size + 1) * sizeof(char*), environ_alloc);
+ environ_size = do_putenv(environ, namevalue, environ_size, 1);
return 0;
}
@@ -1014,25 +1022,22 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
static wchar_t *make_environment_block(char **deltaenv)
{
wchar_t *wenvblk = NULL;
- int count = 0;
char **tmpenv;
- int i = 0, size = 0, envblksz = 0, envblkpos = 0;
+ int i = 0, size = environ_size, envblksz = 0, envblkpos = 0;
- while (environ[count])
- count++;
+ while (deltaenv && deltaenv[i])
+ i++;
- /* copy the environment */
- tmpenv = xmalloc(sizeof(*tmpenv) * (count + 1));
- memcpy(tmpenv, environ, sizeof(*tmpenv) * (count + 1));
+ /* copy the environment, leaving space for changes */
+ tmpenv = xmalloc((size + i) * sizeof(char*));
+ memcpy(tmpenv, environ, size * sizeof(char*));
/* merge supplied environment changes into the temporary environment */
for (i = 0; deltaenv && deltaenv[i]; i++)
- tmpenv = do_putenv(tmpenv, deltaenv[i], 0);
+ size = do_putenv(tmpenv, deltaenv[i], size, 0);
/* environment must be sorted */
- for (count = 0; tmpenv[count]; count++)
- ;
- qsort(tmpenv, count, sizeof(*tmpenv), compareenv);
+ qsort(tmpenv, size - 1, sizeof(char*), compareenv);
/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {
@@ -2065,7 +2070,9 @@ void mingw_startup()
maxlen = max(maxlen, wcslen(wenv[i]));
/* nedmalloc can't free CRT memory, allocate resizable environment list */
- environ = xcalloc(i + 1, sizeof(char*));
+ environ = NULL;
+ environ_size = i + 1;
+ ALLOC_GROW(environ, environ_size * sizeof(char*), environ_alloc);
/* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */
maxlen = 3 * maxlen + 1;
@@ -2082,6 +2089,7 @@ void mingw_startup()
len = wcstoutf(buffer, wenv[i], maxlen);
environ[i] = xmemdupz(buffer, len);
}
+ environ[i] = NULL;
free(buffer);
/* initialize critical section for waitpid pinfo_t list */
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 174 ++++++++++++++++++++++++++++----------------------------
1 files changed, 87 insertions(+), 87 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index dadabeb..425fac8 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -737,6 +737,93 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}
+static int env_compare(const void *a, const void *b)
+{
+ char *const *ea = a;
+ char *const *eb = b;
+ return strcasecmp(*ea, *eb);
+}
+
+static int lookup_env(char **env, const char *name, size_t nmln)
+{
+ int i;
+ /*
+ * try case-sensitive match first, this is necessary for sh-i18n--envsubst
+ * to work with environment variables that differ only in case (e.g. $PATH
+ * and $path)
+ */
+ for (i = 0; env[i]; i++) {
+ if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
+ /* matches */
+ return i;
+ }
+ /* if there's no case-sensitive match, try case-insensitive instead */
+ for (i = 0; env[i]; i++) {
+ if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
+ return i;
+ }
+ return -1;
+}
+
+/*
+ * If name contains '=', then sets the variable, otherwise it unsets it
+ */
+static char **env_setenv(char **env, const char *name)
+{
+ char *eq = strchrnul(name, '=');
+ int i = lookup_env(env, name, eq-name);
+
+ if (i < 0) {
+ if (*eq) {
+ for (i = 0; env[i]; i++)
+ ;
+ env = xrealloc(env, (i+2)*sizeof(*env));
+ env[i] = (char*) name;
+ env[i+1] = NULL;
+ }
+ }
+ else {
+ free(env[i]);
+ if (*eq)
+ env[i] = (char*) name;
+ else
+ for (; env[i]; i++)
+ env[i] = env[i+1];
+ }
+ return env;
+}
+
+static char *do_getenv(const char *name)
+{
+ size_t len = strlen(name);
+ int i = lookup_env(environ, name, len);
+ if (i >= 0)
+ return environ[i] + len + 1; /* skip past name and '=' */
+ return NULL;
+}
+
+char *mingw_getenv(const char *name)
+{
+ char *result = do_getenv(name);
+ if (!result && !strcmp(name, "TMPDIR")) {
+ /* on Windows it is TMP and TEMP */
+ result = do_getenv("TMP");
+ if (!result)
+ result = do_getenv("TEMP");
+ }
+ else if (!result && !strcmp(name, "TERM")) {
+ /* simulate TERM to enable auto-color (see color.c) */
+ result = "winansi";
+ }
+ return result;
+}
+
+int mingw_putenv(const char *namevalue)
+{
+ environ = env_setenv(environ, namevalue);
+ return 0;
+}
+
/*
* See http://msdn2.microsoft.com/en-us/library/17w5ykft(vs.71).aspx
* (Parsing C++ Command-Line Arguments)
@@ -919,13 +1006,6 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
return prog;
}
-static int env_compare(const void *a, const void *b)
-{
- char *const *ea = a;
- char *const *eb = b;
- return strcasecmp(*ea, *eb);
-}
-
struct pinfo_t {
struct pinfo_t *next;
pid_t pid;
@@ -1206,55 +1286,6 @@ void free_environ(char **env)
free(env);
}
-static int lookup_env(char **env, const char *name, size_t nmln)
-{
- int i;
- /*
- * try case-sensitive match first, this is necessary for sh-i18n--envsubst
- * to work with environment variables that differ only in case (e.g. $PATH
- * and $path)
- */
- for (i = 0; env[i]; i++) {
- if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
- /* matches */
- return i;
- }
- /* if there's no case-sensitive match, try case-insensitive instead */
- for (i = 0; env[i]; i++) {
- if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
- return i;
- }
- return -1;
-}
-
-/*
- * If name contains '=', then sets the variable, otherwise it unsets it
- */
-static char **env_setenv(char **env, const char *name)
-{
- char *eq = strchrnul(name, '=');
- int i = lookup_env(env, name, eq-name);
-
- if (i < 0) {
- if (*eq) {
- for (i = 0; env[i]; i++)
- ;
- env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = (char*) name;
- env[i+1] = NULL;
- }
- }
- else {
- free(env[i]);
- if (*eq)
- env[i] = (char*) name;
- else
- for (; env[i]; i++)
- env[i] = env[i+1];
- }
- return env;
-}
-
/*
* Copies global environ and adjusts variables as specified by vars.
*/
@@ -1269,37 +1300,6 @@ char **make_augmented_environ(const char *const *vars)
return env;
}
-static char *do_getenv(const char *name)
-{
- size_t len = strlen(name);
- int i = lookup_env(environ, name, len);
- if (i >= 0)
- return environ[i] + len + 1; /* skip past name and '=' */
- return NULL;
-}
-
-char *mingw_getenv(const char *name)
-{
- char *result = do_getenv(name);
- if (!result && !strcmp(name, "TMPDIR")) {
- /* on Windows it is TMP and TEMP */
- result = do_getenv("TMP");
- if (!result)
- result = do_getenv("TEMP");
- }
- else if (!result && !strcmp(name, "TERM")) {
- /* simulate TERM to enable auto-color (see color.c) */
- result = "winansi";
- }
- return result;
-}
-
-int mingw_putenv(const char *namevalue)
-{
- environ = env_setenv(environ, namevalue);
- return 0;
-}
-
/*
* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it is null. It
--
1.7.6.msysgit.1
Env_setenv has more in common with putenv than setenv. Change to do_putenv.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 425fac8..8ea5fe0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -737,14 +737,14 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}
-static int env_compare(const void *a, const void *b)
+static int compareenv(const void *a, const void *b)
{
char *const *ea = a;
char *const *eb = b;
return strcasecmp(*ea, *eb);
}
-static int lookup_env(char **env, const char *name, size_t nmln)
+static int lookupenv(char **env, const char *name, size_t nmln)
{
int i;
/*
@@ -768,10 +768,10 @@ static int lookup_env(char **env, const char *name, size_t nmln)
/*
* If name contains '=', then sets the variable, otherwise it unsets it
*/
-static char **env_setenv(char **env, const char *name)
+static char **do_putenv(char **env, const char *name)
{
char *eq = strchrnul(name, '=');
- int i = lookup_env(env, name, eq-name);
+ int i = lookupenv(env, name, eq-name);
if (i < 0) {
if (*eq) {
@@ -796,7 +796,7 @@ static char **env_setenv(char **env, const char *name)
static char *do_getenv(const char *name)
{
size_t len = strlen(name);
- int i = lookup_env(environ, name, len);
+ int i = lookupenv(environ, name, len);
if (i >= 0)
return environ[i] + len + 1; /* skip past name and '=' */
return NULL;
@@ -820,7 +820,7 @@ char *mingw_getenv(const char *name)
int mingw_putenv(const char *namevalue)
{
- environ = env_setenv(environ, namevalue);
+ environ = do_putenv(environ, namevalue);
return 0;
}
@@ -1093,7 +1093,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
/* environment must be sorted */
sorted_env = xmalloc(sizeof(*sorted_env) * (count + 1));
memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
- qsort(sorted_env, count, sizeof(*sorted_env), env_compare);
+ qsort(sorted_env, count, sizeof(*sorted_env), compareenv);
/* create environment block from temporary environment */
for (i = 0; sorted_env[i]; i++) {
@@ -1295,7 +1295,7 @@ char **make_augmented_environ(const char *const *vars)
while (*vars) {
const char *v = *vars++;
- env = env_setenv(env, strchr(v, '=') ? xstrdup(v) : v);
+ env = do_putenv(env, strchr(v, '=') ? xstrdup(v) : v);
}
return env;
}
--
1.7.6.msysgit.1
diff --git a/compat/mingw.c b/compat/mingw.c
index 8ea5fe0..303b661 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1006,6 +1006,36 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
return prog;
}
+/*
+ * Create environment block suitable for CreateProcess.
+ */
+static wchar_t *make_environment_block(char **env)
+{
+ wchar_t *wenvblk = NULL;
+ int count = 0;
+ char **e, **tmpenv;
+ int i = 0, size = 0, envblksz = 0, envblkpos = 0;
+
+ for (e = env; *e; e++)
+ count++;
+
+ /* environment must be sorted */
+ tmpenv = xmalloc(sizeof(*tmpenv) * (count + 1));
+ memcpy(tmpenv, env, sizeof(*tmpenv) * (count + 1));
+ qsort(tmpenv, count, sizeof(*tmpenv), compareenv);
+
+ /* create environment block from temporary environment */
+ for (i = 0; tmpenv[i]; i++) {
+ size = 2 * strlen(tmpenv[i]) + 2;
+ ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t), envblksz);
+ envblkpos += utftowcs(&wenvblk[envblkpos], tmpenv[i], size) + 1;
+ }
+ /* add final \0 terminator */
+ wenvblk[envblkpos] = 0;
+ free(tmpenv);
+ return wenvblk;
+}
+
struct pinfo_t {
struct pinfo_t *next;
pid_t pid;
@@ -1082,29 +1112,8 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
utftowcs(wargs, args.buf, 2 * args.len + 1);
strbuf_release(&args);
- if (env) {
- int count = 0;
- char **e, **sorted_env;
- int i = 0, size = 0, envblksz = 0, envblkpos = 0;
-
- for (e = env; *e; e++)
- count++;
-
- /* environment must be sorted */
- sorted_env = xmalloc(sizeof(*sorted_env) * (count + 1));
- memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
- qsort(sorted_env, count, sizeof(*sorted_env), compareenv);
-
- /* create environment block from temporary environment */
- for (i = 0; sorted_env[i]; i++) {
- size = 2 * strlen(sorted_env[i]) + 2;
- ALLOC_GROW(wenvblk, (envblkpos + size) * sizeof(wchar_t), envblksz);
- envblkpos += utftowcs(&wenvblk[envblkpos], sorted_env[i], size) + 1;
- }
- /* add final \0 terminator */
- wenvblk[envblkpos] = 0;
- free(sorted_env);
- }
+ if (env)
+ wenvblk = make_environment_block(env);
memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 22 ++++++++--------------
1 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 7c8a1bc..dadabeb 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1060,10 +1060,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
return (pid_t)pi.dwProcessId;
}
-static pid_t mingw_spawnve(const char *cmd, const char **argv, char **env,
- int prepend_cmd)
+static pid_t mingw_spawnv(const char *cmd, const char **argv, int prepend_cmd)
{
- return mingw_spawnve_fd(cmd, argv, env, NULL, prepend_cmd, 0, 1, 2);
+ return mingw_spawnve_fd(cmd, argv, environ, NULL, prepend_cmd, 0, 1, 2);
}
pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
@@ -1105,7 +1104,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
return pid;
}
-static int try_shell_exec(const char *cmd, char *const *argv, char **env)
+static int try_shell_exec(const char *cmd, char *const *argv)
{
const char *interpr = parse_interpreter(cmd);
char **path;
@@ -1123,7 +1122,7 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)
argv2 = xmalloc(sizeof(*argv) * (argc+1));
argv2[0] = (char *)cmd; /* full path to the script file */
memcpy(&argv2[1], &argv[1], sizeof(*argv) * argc);
- pid = mingw_spawnve(prog, argv2, env, 1);
+ pid = mingw_spawnv(prog, argv2, 1);
if (pid >= 0) {
int status;
if (waitpid(pid, &status, 0) < 0)
@@ -1138,13 +1137,13 @@ static int try_shell_exec(const char *cmd, char *const *argv, char **env)
return pid;
}
-static void mingw_execve(const char *cmd, char *const *argv, char *const *env)
+void mingw_execv(const char *cmd, char *const *argv)
{
/* check if git_command is a shell script */
- if (!try_shell_exec(cmd, argv, (char **)env)) {
+ if (!try_shell_exec(cmd, argv)) {
int pid, status;
- pid = mingw_spawnve(cmd, (const char **)argv, (char **)env, 0);
+ pid = mingw_spawnv(cmd, (const char **)argv, 0);
if (pid < 0)
return;
if (waitpid(pid, &status, 0) < 0)
@@ -1159,7 +1158,7 @@ void mingw_execvp(const char *cmd, char *const *argv)
char *prog = path_lookup(cmd, path, 0);
if (prog) {
- mingw_execve(prog, argv, environ);
+ mingw_execv(prog, argv);
free(prog);
} else
errno = ENOENT;
@@ -1167,11 +1166,6 @@ void mingw_execvp(const char *cmd, char *const *argv)
free_path_split(path);
}
-void mingw_execv(const char *cmd, char *const *argv)
-{
- mingw_execve(cmd, argv, environ);
-}
-
int mingw_kill(pid_t pid, int sig)
{
if (pid > 0 && sig == SIGTERM) {
--
1.7.6.msysgit.1
Move the merge logic to make_environment_block so that we only need to copy
the environment once. This changes semantics of the env parameter: it now
expects a delta (such as child_process.env) rather than a full environment.
This is not a problem as the parameter is only used by start_command()
(all other callers previously passed char **environ, and now pass NULL).
The merge logic no longer xstrdup()s the environment strings, so do_putenv
must not free them. Add a parameter to distinguish this from normal putenv.
Remove the now unused make_augmented_environ / free_environ API.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 76 ++++++++++++++++++-------------------------------------
compat/mingw.h | 6 ----
run-command.c | 10 +------
3 files changed, 27 insertions(+), 65 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 303b661..aeae266 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -768,7 +768,7 @@ static int lookupenv(char **env, const char *name, size_t nmln)
/*
* If name contains '=', then sets the variable, otherwise it unsets it
*/
-static char **do_putenv(char **env, const char *name)
+static char **do_putenv(char **env, const char *name, int free_old)
{
char *eq = strchrnul(name, '=');
int i = lookupenv(env, name, eq-name);
@@ -783,7 +783,8 @@ static char **do_putenv(char **env, const char *name)
}
}
else {
- free(env[i]);
+ if (free_old)
+ free(env[i]);
if (*eq)
env[i] = (char*) name;
else
@@ -820,7 +821,7 @@ char *mingw_getenv(const char *name)
int mingw_putenv(const char *namevalue)
{
- environ = do_putenv(environ, namevalue);
+ environ = do_putenv(environ, namevalue, 1);
return 0;
}
@@ -1007,21 +1008,30 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
}
/*
- * Create environment block suitable for CreateProcess.
+ * Create environment block suitable for CreateProcess. Merges current
+ * process environment and the supplied environment changes.
*/
-static wchar_t *make_environment_block(char **env)
+static wchar_t *make_environment_block(char **deltaenv)
{
wchar_t *wenvblk = NULL;
int count = 0;
- char **e, **tmpenv;
+ char **tmpenv;
int i = 0, size = 0, envblksz = 0, envblkpos = 0;
- for (e = env; *e; e++)
+ while (environ[count])
count++;
- /* environment must be sorted */
+ /* copy the environment */
tmpenv = xmalloc(sizeof(*tmpenv) * (count + 1));
- memcpy(tmpenv, env, sizeof(*tmpenv) * (count + 1));
+ memcpy(tmpenv, environ, sizeof(*tmpenv) * (count + 1));
+
+ /* merge supplied environment changes into the temporary environment */
+ for (i = 0; deltaenv && deltaenv[i]; i++)
+ tmpenv = do_putenv(tmpenv, deltaenv[i], 0);
+
+ /* environment must be sorted */
+ for (count = 0; tmpenv[count]; count++)
+ ;
qsort(tmpenv, count, sizeof(*tmpenv), compareenv);
/* create environment block from temporary environment */
@@ -1044,7 +1054,7 @@ struct pinfo_t {
struct pinfo_t *pinfo = NULL;
CRITICAL_SECTION pinfo_cs;
-static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
+static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaenv,
const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{
@@ -1112,8 +1122,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
utftowcs(wargs, args.buf, 2 * args.len + 1);
strbuf_release(&args);
- if (env)
- wenvblk = make_environment_block(env);
+ wenvblk = make_environment_block(deltaenv);
memset(&pi, 0, sizeof(pi));
ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags,
@@ -1151,10 +1160,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
static pid_t mingw_spawnv(const char *cmd, const char **argv, int prepend_cmd)
{
- return mingw_spawnve_fd(cmd, argv, environ, NULL, prepend_cmd, 0, 1, 2);
+ return mingw_spawnve_fd(cmd, argv, NULL, NULL, prepend_cmd, 0, 1, 2);
}
-pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
+pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv,
const char *dir,
int fhin, int fhout, int fherr)
{
@@ -1178,14 +1187,14 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **env,
pid = -1;
}
else {
- pid = mingw_spawnve_fd(iprog, argv, env, dir, 1,
+ pid = mingw_spawnve_fd(iprog, argv, deltaenv, dir, 1,
fhin, fhout, fherr);
free(iprog);
}
argv[0] = argv0;
}
else
- pid = mingw_spawnve_fd(prog, argv, env, dir, 0,
+ pid = mingw_spawnve_fd(prog, argv, deltaenv, dir, 0,
fhin, fhout, fherr);
free(prog);
}
@@ -1274,41 +1283,6 @@ int mingw_kill(pid_t pid, int sig)
return -1;
}
-static char **copy_environ(void)
-{
- char **env;
- int i = 0;
- while (environ[i])
- i++;
- env = xmalloc((i+1)*sizeof(*env));
- for (i = 0; environ[i]; i++)
- env[i] = xstrdup(environ[i]);
- env[i] = NULL;
- return env;
-}
-
-void free_environ(char **env)
-{
- int i;
- for (i = 0; env[i]; i++)
- free(env[i]);
- free(env);
-}
-
-/*
- * Copies global environ and adjusts variables as specified by vars.
- */
-char **make_augmented_environ(const char *const *vars)
-{
- char **env = copy_environ();
-
- while (*vars) {
- const char *v = *vars++;
- env = do_putenv(env, strchr(v, '=') ? xstrdup(v) : v);
- }
- return env;
-}
-
/*
* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it is null. It
diff --git a/compat/mingw.h b/compat/mingw.h
index 131d2ec..3913b34 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -331,12 +331,6 @@ void mingw_open_html(const char *path);
void mingw_mark_as_git_dir(const char *dir);
#define mark_as_git_dir mingw_mark_as_git_dir
-/*
- * helpers
- */
-
-char **make_augmented_environ(const char *const *vars);
-void free_environ(char **env);
/**
* Converts UTF-8 encoded string to UTF-16LE.
diff --git a/run-command.c b/run-command.c
index a8be1b1..112489e 100644
--- a/run-command.c
+++ b/run-command.c
@@ -316,7 +316,6 @@ fail_pipe:
{
int fhin = 0, fhout = 1, fherr = 2;
const char **sargv = cmd->argv;
- char **env = environ;
if (cmd->no_stdin)
fhin = open("/dev/null", O_RDWR);
@@ -341,23 +340,18 @@ fail_pipe:
else if (cmd->out > 1)
fhout = dup(cmd->out);
- if (cmd->env)
- env = make_augmented_environ(cmd->env);
-
if (cmd->git_cmd) {
cmd->argv = prepare_git_cmd(cmd->argv);
} else if (cmd->use_shell) {
cmd->argv = prepare_shell_cmd(cmd->argv);
}
- cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, env, cmd->dir,
- fhin, fhout, fherr);
+ cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, (char**) cmd->env,
+ cmd->dir, fhin, fhout, fherr);
failed_errno = errno;
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
error("cannot spawn %s: %s", cmd->argv[0], strerror(errno));
- if (cmd->env)
- free_environ(env);
if (cmd->git_cmd)
free(cmd->argv);
--
1.7.6.msysgit.1
Change compareenv to compare only the keys, so that it can be used to
find an entry irrespective of the value.
Change lookupenv to binary seach for an entry. Return one's complement of
the insert position if not found (libc's bsearch return's NULL). If a
case-insensitive match is found, try to find a better case-sensitive match
around that entry.
Change do_putenv to insert new entries at the correct position. Simplify
the function by swapping if conditions and using memmove instead of for
loops.
Move qsort from make_environment_block to mingw_startup. We still need to
sort on startup to make sure that the environment is sorted according to
our compareenv function (while Win32 / CreateProcess requires the
environment block to be sorted case-insensitively, CreateProcess currently
doesn't enforce this, and some applications such as bash just don't care).
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 110 ++++++++++++++++++++++++++++++++++++--------------------
1 files changed, 71 insertions(+), 39 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 0f85f12..dc905bb 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -737,32 +737,65 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}
-static int compareenv(const void *a, const void *b)
+/*
+ * Compare environment entries by key (i.e. stopping at '=' or '\0').
+ */
+static int compareenv(const void *v1, const void *v2)
{
- char *const *ea = a;
- char *const *eb = b;
- return strcasecmp(*ea, *eb);
+ const char *e1 = *(const char**)v1;
+ const char *e2 = *(const char**)v2;
+
+ for (;;) {
+ int c1 = *e1++;
+ int c2 = *e2++;
+ c1 = (c1 == '=') ? 0 : tolower(c1);
+ c2 = (c2 == '=') ? 0 : tolower(c2);
+ if (c1 > c2)
+ return 1;
+ if (c1 < c2)
+ return -1;
+ if (c1 == 0)
+ return 0;
+ }
}
-static int lookupenv(char **env, const char *name, size_t nmln)
+static int bsearchenv(char **env, const char *name, size_t size)
{
- int i;
+ unsigned low = 0, high = size - 1, mid, i, j, nmln;
+ int cmp;
+ while (low <= high) {
+ mid = (low + high) >> 1;
+ cmp = compareenv(&env[mid], &name);
+ if (cmp < 0)
+ low = mid + 1;
+ else if (cmp > 0)
+ high = mid - 1;
+ else
+ goto found;
+ }
+ return ~low; /* not found, return 1's complement of insert position */
+
+found:
+ /* look for duplicate entries before and after the matching entry (mid) */
+ i = j = mid;
+ while (i > low && !compareenv(&env[i - 1], &name))
+ i--;
+ while (j < high && !compareenv(&env[j + 1], &name))
+ j++;
+ if (i == j)
+ return mid; /* no duplicates */
+
/*
* try case-sensitive match first, this is necessary for sh-i18n--envsubst
* to work with environment variables that differ only in case (e.g. $PATH
* and $path)
*/
- for (i = 0; env[i]; i++) {
+ nmln = strlen(name);
+ for (; i <= j; i++)
if (!strncmp(env[i], name, nmln) && '=' == env[i][nmln])
/* matches */
return i;
- }
- /* if there's no case-sensitive match, try case-insensitive instead */
- for (i = 0; env[i]; i++) {
- if (!strncasecmp(env[i], name, nmln) && '=' == env[i][nmln])
- return i;
- }
- return -1;
+ return mid;
}
/*
@@ -772,26 +805,24 @@ static int lookupenv(char **env, const char *name, size_t nmln)
*/
static int do_putenv(char **env, const char *name, int size, int free_old)
{
- char *eq = strchrnul(name, '=');
- int i = lookupenv(env, name, eq-name);
+ int i = bsearchenv(env, name, size - 1);
+
+ /* optionally free removed / replaced entry */
+ if (i >= 0 && free_old)
+ free(env[i]);
- if (i < 0) {
- if (*eq) {
- env[size - 1] = (char*) name;
- env[size] = NULL;
+ if (strchr(name, '=')) {
+ /* if new value ('key=value') is specified, insert or replace entry */
+ if (i < 0) {
+ i = ~i;
+ memmove(&env[i + 1], &env[i], (size - i) * sizeof(char*));
size++;
}
- }
- else {
- if (free_old)
- free(env[i]);
- if (*eq)
- env[i] = (char*) name;
- else {
- for (; env[i]; i++)
- env[i] = env[i+1];
- size--;
- }
+ env[i] = (char*) name;
+ } else if (i >= 0) {
+ /* otherwise ('key') remove existing entry */
+ size--;
+ memmove(&env[i], &env[i + 1], (size - i) * sizeof(char*));
}
return size;
}
@@ -803,11 +834,12 @@ static int environ_alloc = 0;
static char *do_getenv(const char *name)
{
- size_t len = strlen(name);
- int i = lookupenv(environ, name, len);
- if (i >= 0)
- return environ[i] + len + 1; /* skip past name and '=' */
- return NULL;
+ char *value;
+ int pos = bsearchenv(environ, name, environ_size - 1);
+ if (pos < 0)
+ return NULL;
+ value = strchr(environ[pos], '=');
+ return value ? &value[1] : NULL;
}
char *mingw_getenv(const char *name)
@@ -1036,9 +1068,6 @@ static wchar_t *make_environment_block(char **deltaenv)
for (i = 0; deltaenv && deltaenv[i]; i++)
size = do_putenv(tmpenv, deltaenv[i], size, 0);
- /* environment must be sorted */
- qsort(tmpenv, size - 1, sizeof(char*), compareenv);
-
/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {
size = 2 * strlen(tmpenv[i]) + 2;
@@ -2092,6 +2121,9 @@ void mingw_startup()
environ[i] = NULL;
free(buffer);
+ /* sort environment for O(log n) getenv / putenv */
+ qsort(environ, i, sizeof(char*), compareenv);
+
/* initialize critical section for waitpid pinfo_t list */
InitializeCriticalSection(&pinfo_cs);
--
1.7.6.msysgit.1
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 33 ++++++++++++++++-----------------
1 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index dc905bb..77ea5d9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -832,7 +832,7 @@ static int environ_size = 0;
/* allocated size of environ array, in bytes */
static int environ_alloc = 0;
-static char *do_getenv(const char *name)
+char *mingw_getenv(const char *name)
{
char *value;
int pos = bsearchenv(environ, name, environ_size - 1);
@@ -842,22 +842,6 @@ static char *do_getenv(const char *name)
return value ? &value[1] : NULL;
}
-char *mingw_getenv(const char *name)
-{
- char *result = do_getenv(name);
- if (!result && !strcmp(name, "TMPDIR")) {
- /* on Windows it is TMP and TEMP */
- result = do_getenv("TMP");
- if (!result)
- result = do_getenv("TEMP");
- }
- else if (!result && !strcmp(name, "TERM")) {
- /* simulate TERM to enable auto-color (see color.c) */
- result = "winansi";
- }
- return result;
-}
-
int mingw_putenv(const char *namevalue)
{
ALLOC_GROW(environ, (environ_size + 1) * sizeof(char*), environ_alloc);
@@ -2124,6 +2108,21 @@ void mingw_startup()
/* sort environment for O(log n) getenv / putenv */
qsort(environ, i, sizeof(char*), compareenv);
+ /* fix Windows specific environment settings */
+
+ /* on Windows it is TMP and TEMP */
+ if (!getenv("TMPDIR")) {
+ const char *tmp = getenv("TMP");
+ if (!tmp)
+ tmp = getenv("TEMP");
+ if (tmp)
+ setenv("TMPDIR", tmp, 1);
+ }
+
+ /* simulate TERM to enable auto-color (see color.c) */
+ if (!getenv("TERM"))
+ setenv("TERM", "winansi", 1);
Frankly, I only briefly looked at your series, but...
NO MATTER what others say, this is EXTREMELY IMPRESSIVE! Although, I'm not
affected by international home dir names and all-languages-mixed file names
in my source trees, I GREATLY appreciate your effort.
Wish you BEST OF LUCK with the series!
--
Kirill.
2011/10/6 Karsten Blees <bl...@dcon.de>:
> So far I haven't encountered any problems with the Unicode-aware msys.dll (http://groups.google.com/group/msysgit/browse_thread/thread/981da260b4fcb1df), and there was no feedback from the mailing list either (this could be good or bad :-).
To give you some positive feedback: I'm using your last Unicode
installer (Git-1.7.5-unicode-20110720) for some weeks now and haven't
encountered any problems yet.
Thanks for the effort!
Joerg.
On Thu, 6 Oct 2011, Karsten Blees wrote:
> I just pushed v10 of the Unicode patches for msysgit to
> http://repo.or.cz/w/git/mingw/4msysgit/kblees.git.
Thanks for being so persistent with this issue.
For starters, however, I would like to integrate your work on "less"
first. Could you make that a branch in your repository, too?
Thanks!
Dscho
I have just tested your 1.7.6 installer with integrated Unicode patches
on Russian Windows XP Pro SP3 and would like to report on my experience.
What's presented below, were done in a regular cmd.exe window.
I first created a test repo with a simple all-ASCII pathname
(C:\tmp\foo) and created a file with simple Cyrillic name "тест.txt"
("тест" stands for "test" in Russian).
`git add .` went OK, but display presented with `git status` and
`git ls-files` showed \xxx escapes for Cyrillic letters in the file
listed (I think that were escapes for UTF-8 bytes comprising the file
name in fact). So I googled the core.quotepath configuration option and
set it to false, now the filename was shown in some garbled encoding but
Git hinted me to set a Unicode font on the console, so I set Lucida.
That finally fixed displaying but for some reason did not prevented Git
from nagging me about my console supposedly not supporting Unicode
(setting Lucida "for all windows of the same type", that is, globally,
did not fix the problem).
Committing the added file went OK, but `git show` afterwards barked at me:
C:\tmp\foo>git show
WARNING: terminal is not fully functional
- (press RETURN)
Hitting return presented the diff where the filename appeared OK.
I have a feeling this might be a problem with less, not Git.
git-gui and gitk showed no problems displaying the added file name.
I then played with the repository a bit, renaming the file to another
Cyrillic name etc, verifying each stage with git-gui and gitk--all went
OK except for that non-unicode console and not fully-functional
terminal nagging.
I then logged in with a user with cyrillic username ("Администратор")
and created another repo--this time with the repo directory itself named
using Cyrillic letters.
Git had no apparent problems setting the user.email and user.name
configuration variables globally (via `git config --add --global ...`)
which are stored somewhere under %USERPROFILE% as I understand this.
Then I created a file in a subdirectory of the repo, with both the
subdirectory and the file named using Cyrillic letters.
Again, no problems beyond those described above with the git itself.
But then I tried to fire git-gui from inside the repo directory and it
failed to start with this error message:
---------------------------
Отсутствует рабочий каталог C:/tmp/директория:
couldn't change working directory to "C:/tmp/директория": no such file or directory
---------------------------
The first error, translated from Russian, means
"Missing working directory C:/tmp/...:"
The name of the directory looks rather typical to me: it's an UTF-8
string interpreted as containing characters in Windows-1251.
I did not look at the code yet but supposedly git-gui somehow asks git
about its working directory and it returns an UTF-8 encoded pathname.
If this pathname is read using a pipe from git's stdout, it may well be
interpreted as being in Windows-1251 as this is the standard Tcl's
behaviour on Windows--to assume the standard handles of spawned
processes operate with the currently active codepage.
gitk worked without any apparent problems.
The only remaining issue which manifested itself in all my experiments
is that dreaded commit encoding problem. I have gvim.exe (version 7.3)
as my $EDITOR and it assumes all files it opens on Windows are encoded
using the currently active codepage (cp1251, again).
Hence two problems:
* Filenames in the commit message file are displayed botched.
* If nothing is done to fix whis, like forced setting of the output
file's encoding to be UTF-8, commit message will be composed in the
system's codepage and will appear unreadable in both git's output and
its GUI tools.
I failed to verify the behaviour of notepad.exe with regard to commit
encodings because the commit message is generated with sole LFs as
line ending markers and so notepad shows it as one long string.
Some sort of summary follows.
So, here are slight technical glitches and some conceptual issues.
Techincal glitches are:
* Reporting about non-Unicode console while it has Unicode font set.
* Reporting something about broken terminal (less problem?).
* git-gui failing to run from within a directory with non-ASCII
characters in its name.
I beleive these can be fixed.
Now onto conceptual issues.
I take Subversion as being the gold standard in the field of supporting
system's native language on Windows among free VCS tools--with it,
filenames and commit messages "just work" in the eyes of a typical
inexperienced developer, both in untweaked console window and "in the GUI"
(everyone uses TortoiseSVN). Please note that's not too subjective:
every Windows developer I discussed the issue of supporting native
language in [D]VCS tools on Windows with agrees on Subversion as being
the only one doing it right (well, may be it's actually the only thing
it does right).
Compared to Subversion, Git for Windows with Unicode patch has these
issues:
* Console window has to be tweaked.
* core.quotepath config variable has to be set to false.
* The user has to somehow force their text editor to interpret and save
the commit message as being composed in UTF-8 or resort to using
i18n.*encoding variables.
I don't know if these issues can be solved at all to match the
experience developers get with Subversion but I thought they are
worth summarising here anyway.
All in all, this patch series considerably improves the current
situation, thanks Karsten and everyone involved!
(And I'll try to do more extensive testing between Windows and Linux
machines.)
I think compat/msvc.h is the correct place to put this, not in our
placement unistd.h.
While I'm in favor of doing these kind of changes, patches like this
have caused some controversy upstream earlier. I'm not entirely sure
what the preferred solution upstream is anymore, though.
The problem with doing changes like this is that it increase the delta
against the upstream without much gain. Can we do something like what
the previous patch did instead, i.e "#pragma warning(disable : XXXX)"?
mktemp should be defined by stdlib.h, not unistd.h:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/mktemp.html
But perhaps we don't support overriding headers that exist in the
Windows SDK? If that is the case, I think this would be better to
place in compat/msvc.h...
Looks good to me. But if we're splitting out an MSVC topic, then this
should probably be included in that ;)
Does splitting this step in two produce a Git for Windows that works?
If not, it'll be unfortunate for bisection later on...
I'd prefer the #pragma approach, too. It is my hope that at least someone
regularly uses the valgrind mode I worked so hard on, so we should be
reasonably sure that we're more clever than the compiler regarding those
"uninitialized" values.
And I guess that the reasoning is that initializing values to rogue values
does not buy us anything except maybe a performance hit.
Ciao,
Dscho
You have earlier[1] said that this function has the semantics of
wcstombs. But this part isn't. wcstombs does not apply filename
semantics to buffer sizes. In fact[2] it does not even error if the
string is too long (but it does return the same value as passed in as
'n', which can be used to tell us that we overflowed the buffer). So I
think it's better to modify this so that it behaves like wcstombs.
[1]: http://markmail.org/message/f7ckzoopqafywj3k
[2]: http://pubs.opengroup.org/onlinepubs/009604599/functions/wcstombs.html
This reminds me of "[PATCH v10 07/28] gitk: fix file name encoding in
diff hunk headers"... is there a reason why this is split in two
patches (or not coming directly after each other, at least)?
When doing input and output in two different patches, don't we end up
breaking locale-dependent repos for a little while (i.e where one git
tool gets a UTF-8 filename when it expects a Windows 1252 filename
from some other git tool)? It's probably not a big deal, but it'd be
nice if people who have issues with Windows 1252 repos could safely
bisect their issues past these commits also...
This commit message doesn't quite work for me; it describes WHAT the
patch does, but not WHY. The what-part is already covered pretty well
by the code; can you please describe the problem it solves instead of
how it solves some problem I don't know what is? :)
Again, I suspect that we might end up breaking Windows 1252 encoded
repos here for a single revision. Squash the incoming/outgoing
patches, perhaps?
So, the earlier patches didn't update the test suite as they went
along and broke stuff that used to work? If so, that's not how we do
it: this makes it impossible to bisect breakages in the test-suite.
But I don't get why this is needed. Surely it was possible to pass the
environment variables as 8859-1 earlier, since these tests used to
work. What changed? Is this change needed after MSYS was patched to do
Unicode? In that case, I doubt the MSYS patch is correct; if people on
unix can pass 8859-1 encoded environment variables from a
shell-script, we should be able to do so too, no?
If I understand the tests correctly it's about respecting
i18n.commitencoding, which is something I expected this series to help
us do BETTER, not worse...
This is a bugfix for 'devel', and not strictly speaking Unicode
related, no? If so, I would have expected it to be earlier in the
series...
I think the commit message is missing the important(?) detail that you
are removing our unused (and unexposed) execve-emulation in favor of
execv (which is used by shell.c). I'm not saying that's a bad thing;
just that perhaps it should be pointed out.
Let me reiterate what others have said: impressive :)
I'll quickly summarize my thoughts here, in case someone wants a simple rundown:
> Karsten Blees (28) ('+': new/rewritten in v10, '!': modified):
> MSVC: disable const qualifier compiler warnings
> MSVC: don't use uninitialized variables
> MSVC: include <io.h> for mktemp
> MSVC: fix winansi.c compile errors
> MSVC: link dynamically to the CRT
As I commented on these before, I believe all these are not strictly
speaking related, and should probably be fast-tracked to 'devel'
(after the discussion settles, which I'm sure it will very soon).
> Win32: Thread-safe windows console output
I don't quite understand what this has to do with Unicode.. Isn't it
just a bug-fix for the existing code?
> + (kb/unicode-v10-minimal) MinGW: disable legacy encoding tests
This one worries me a bit. Why can't we respect i18n.commitencoding
after going unicode?
> + Win32: fix environment memory leaks
> + Win32: unify environment case-sensitivity
> + Win32: simplify internal mingw_spawn* APIs
> Win32: move environment functions
> ! Win32: unify environment function names
> ! Win32: move environment block creation to a helper method
> ! Win32: don't copy the environment twice when spawning child processes
> Win32: reduce environment array reallocations
> ! Win32: keep the environment sorted
Some of these looks like improvements even without the Unicode-stuff.
So they should probably be moved earlier in the series, no?
> (kb/unicode-v10) Win32: patch Windows environment on startup
I think I should say this again: very impressive work. Thanks for
hanging in there so far, and I hope you're willing to stay in a little
bit longer so we can finally merge this very nice improvement! :)
On Sat, 8 Oct 2011, karste...@dcon.de wrote:
> Johannes Schindelin <Johannes....@gmx.de> wrote on 07.10.2011
> 22:35:08:
>
> > On Thu, 6 Oct 2011, Karsten Blees wrote:
> >
> > > I just pushed v10 of the Unicode patches for msysgit to
> > > http://repo.or.cz/w/git/mingw/4msysgit/kblees.git.
> >
> > Thanks for being so persistent with this issue.
> >
> > For starters, however, I would like to integrate your work on "less"
> > first. Could you make that a branch in your repository, too?
>
> Not exactly, as it's a git fork...just gave birth to kblees/msysgit on
> github instead:
>
> less-444: https://github.com/kblees/msysgit/commits/kb/msys/less
> Unicode-aware msys.dll:
> https://github.com/kblees/msysgit/commits/kb/msys/unicode
Cool, thanks! I will try to shoehorn that into my schedule sometime this
week.
> OT: I'm still a bit confused about branches in the msysgit repo...I
> think "devel" is the development environment for git and other
> mingw-based stuff, while "msys" is for customized msys tools, with
> binaries cherry-picked to devel, is that basically correct?
Yep.
> It gave me quite a headache that some msys stuff (such as src/rt) is
> also in the devel branch, but won't build there at all...
The /src/rt/ in 'devel' looks different from the one in 'msys', and
we can call that experiment "failed", I guess. Unless someone objects
loudly, will remove it from 'devel'.
It would be fantastic if we could go with Sebastian's approach of making
Git for Windows an update site for MinGW at some stage, especially since
the MSys people want to make an MSys-based Git (which would be as slow as
Cygwin Git, I guess).
But I simply lack the time, and unfortunately it seems no other people are
interested.
It would need a little bit of committment, though, since we do have a few
changes in msys.dll which would most likely need to be refined a number of
times before upstream takes them.
Ciao,
Johannes
On Mon, 10 Oct 2011, Erik Faye-Lund wrote:
If it looks like a bug fix to you, please apply directly to 'devel'.
Thanks!
Dscho
Just to clarify: The unix domain sockets was only in Junio's 'next',
and has already been kicked out of it already. It's usage also broke
the MinGW build, btw.
> There's obviously no interest in keeping this platform up to date (neither
> in the git nor the msysgit community), and as there is practically no
> supporting infrastructure (not even a working test suite), I don't see much
> value in holding on to this. So probably we should just move these patches
> to the attic and forget about them...
>
The MSVC port is by no means top priority, and leaving fixing it up to
whoever use MSVC is fine by me. In other words; I agree, we can leave
them out.
> As I understand it, msvc.[ch] is to adapt the MSVC build to the existing
> mingw.[ch] code, while compat/vcbuild/include is meant to fix general
> "non-posix"-ness of MSVC. For example, unistd.h also has all the S_*
> definitions that are missing from MSVC's <sys/stat.h>.
>
Exactly, and warning-disabling fits in the first category, not the
second one. This patch was about disabling a compiler-specific
warning, not providing POSIX functionality...
That discussion seems to conclude that disabling the warning IS the
right approach, because there's little MSVC only specific code and GCC
will still pick up the errors.
> I see your point (delta against upstream), however...according to C99,
> - the value of an uninitialized local variable is "indeterminate" (6.7.8),
> - i.e. it can be a "trap representation" (3.17.2),
> - i.e. reading that value results in "undefined behavior" (6.2.6.1),
> - i.e. the compiler may generate code that crashes the program (3.4.3).
>
> So, writing
>
> int foo = foo;
>
> is most definitely a programming error and should be fixed. On the other
> hand,
>
> int foo = 0;
>
> is perfectly fine. A smart optimizing compiler might even detect that the
> initial value is never read before beeing reassigned and remove the
> initialization entirely.
>
I agree with all of this, but the people who makes the calls upstream
doesn't seem to. And for a reason; changes like this degrades
performance a bit. I tend to value code-confidence slightly higher
than performance, though.
But as you said yourself, the point of compat/vcbuild/include is to
provide lacking POSIX interfaces, not to emulate what MinGW-runtime
does.
>> But perhaps we don't support overriding headers that exist in the
>> Windows SDK? If that is the case, I think this would be better to
>> place in compat/msvc.h...
>
> I think copying and modifying the MSVC stdlib.h would be a copyright
> infringement, and writing those 46k from scratch (in a way that fits nicely
> into the rest of MSVC "not-strictly-posix" headers) is not a viable option
> IMO.
That's not at all what I meant. I meant to add something like this as
compat/vcbuild/include/stdlib.h:
#include <some-way-of-including-msvcs-copy-of-stdlib.h>
#include <io.h>
But it might not be possible to do this without hard-coding the path
to the PlatformSDK, so I think the second best thing is to put this in
compat/msvc.h which is where we usually make minor corrections for
MSVCRT oddities.
With the normal wcstombs/mbstowcs operation you can do the same with:
wchar_t wname[MAX_PATH];
if (utftowcs(wname, name, MAX_PATH) == MAX_PATH)
return -1;
In the cases where errno should actually be set to ENAMETOOLONG, it's a simple:
if (utftowcs(wname, name, MAX_PATH) == MAX_PATH) {
errno = ENAMETOOLONG;
return -1;
}
So I don't think mixing in filename errno-semantics into a function
that is also used to convert non-filename data makes any sense.
>
> If the helper functions worked exactly like wcstombs/mbstowcs, a lot more,
> largely redundant code would be required.
>
No, just the snippets above. And we'd get the nice bonus of not
risking polluting errno (which POSIX is very strict about when is
allowed to be set).
That's basically what I feared.
> Squashing one or two patches of the series might make sense (e.g. the
> diren/non-dirent file name patches and outgoing/incoming pairs), but unless
> you squash all of them into a single large patch, you'll have bisection
> problems anyway (with the few i18n tests, that is).
Well, that's a problem, then. Documentation/SubmittingPatches clearly
states "make sure that the test suite passes after your commit".
Bisection is such a useful tool, and it performed all the time to find
the reason for a breakage, that I doubt breaking the test-suite in
itself will make these patches indigestible.
> Seems that digestible patch sizes and bisection are contrary goals for a
> patch series this large, and I'd prefer to stick with the former.
In my experience, it's usually possible to come up with a way of
ordering things so you never end up breaking the test-suite. But it
takes some careful planning.
Having small patches isn't the only goal; there's lots of large
patches in git.git. Having logical patches that solve one problem at
the time is really the goal here.
So, I think some squashing will be needed.
Thanks for clearing this up. Perhaps #7/28 should be sent directly
upstream, then?
The commit message doesn't say that the old code wasn't thread safe,
and under what circumstances it can lead to incorrect operation as-is.
I can make some educated guesses, but the commit message isn't as much
for the people who already know the details, as it is for all the
other people who ends up seeing the patch.
Anyway, if this isn't strictly speaking unicode-related (which the
commit message gives me no reason to think), I think it's candidate
for fast-tracking to 'devel'. But I'd like an improved commit message
first.
It might be the simplest for you, but it's certainly a can of worms
for future users. And this special casing of the routine renders your
earlier argument that this had the same semantics as wcstombs/mbstowcs
kind of moot.
Setting errno to EINVAL on NULL-input is a responsibility of a given
set of library functions, and to the best of my knowledge we already
do this where it's needed. If you want to refactor this part to reduce
code-reuse, you are free to do that. But please do not do it as some
cryptic side-effect in a conversion function named to be similar of
some other function that does not do this. If the function was called
something like convert_filename_or_set_errno (could be implemented as
a wrapper around this), I would not have been so worried.
I can't say I see your point about it being the most flexible, though.
>> > If the helper functions worked exactly like wcstombs/mbstowcs, a lot
>> > more,
>> > largely redundant code would be required.
>> >
>>
>> No, just the snippets above. And we'd get the nice bonus of not
>> risking polluting errno (which POSIX is very strict about when is
>> allowed to be set).
>
> The situations in which these functions will fail and "pollute" errno are
> well documented and very easy to avoid by passing a large enough buffer, so
> it's not really an issue.
Documentation isn't an excuse for a misleading interface. Besides, who
reads the documentation anyway? ;)
Speaking of documentation, javadoc/doxygen-style tags is not the way
we document in Git. We tend to write API-documentation in
Documentation/technical (or simply as normal comments).
On Tue, 11 Oct 2011, Erik Faye-Lund wrote:
> On Tue, Oct 11, 2011 at 1:01 PM, <karste...@dcon.de> wrote:
>
> > There's obviously no interest in keeping this platform up to date
> > (neither in the git nor the msysgit community), and as there is
> > practically no supporting infrastructure (not even a working test
> > suite), I don't see much value in holding on to this. So probably we
> > should just move these patches to the attic and forget about them...
>
> The MSVC port is by no means top priority, and leaving fixing it up to
> whoever use MSVC is fine by me. In other words; I agree, we can leave
> them out.
I actually like the idea of having it in our repository, but you're right,
it takes somebody who wants to keep the support up-to-date. And quite
honestly, I will take any patches anybody claims fix MSVC unless they
break other setups.
So can you please make an MSVC-only branch? You might want to rebase it
onto 'tentative' which I will merge into 'devel' as soon as I found out
what is causing the bundle-related breakage. As kusma said, the
unix-sockets stuff has been backed out from current 'next' which I
rebase-merged onto.
Ciao,
Dscho
On Tue, 11 Oct 2011, Erik Faye-Lund wrote:
> Speaking of documentation, javadoc/doxygen-style tags is not the way we
> document in Git. We tend to write API-documentation in
> Documentation/technical (or simply as normal comments).
Ach, to have Javadoc-style documentation... Then we could use doxygen...
Oh well.
Seriously again, if I understand the discussion correctly, then the
semantics of the function to convert strings from one encoding to another
is adjusted in a manner that is targeted to filenames?
Ciao,
Dscho
On Tue, 11 Oct 2011, Erik Faye-Lund wrote:
> In my experience, it's usually possible to come up with a way of
> ordering things so you never end up breaking the test-suite. But it
> takes some careful planning.
>
> Having small patches isn't the only goal; there's lots of large patches
> in git.git. Having logical patches that solve one problem at the time is
> really the goal here.
>
> So, I think some squashing will be needed.
Wait. The way I do such changes usually goes like this: I add all the
logically-separate functions in single commits, then switch over by
including the header/source files in one commit (very bisectable, but of
course, nobody helps you to identify the issue in the really big change)
and then another commit to remove all now-obsolete code.
Would this be an option here?
Ciao,
Dscho
>> So, writing
>>
>> int foo = foo;
>>
>> is most definitely a programming error and should be fixed. On the other
>> hand,
>>
>> int foo = 0;
>>
>> is perfectly fine.
[...]
> I agree with all of this, but the people who makes the calls upstream
> doesn't seem to. And for a reason; changes like this degrades
> performance a bit. I tend to value code-confidence slightly higher
> than performance, though.
I don't make the calls upstream, but I think a much better reason to
dislike use of "int foo = 0;" in place of "int foo = foo;" or "int
foo;" is that it prevents valgrind from doing its job.
What ever happened to the following series?
http://thread.gmane.org/gmane.comp.version-control.git/169136
Thanks for clarifying this :)
No, they are not. There might be existing repos out there with other
commit-encodings, and we need to continue to support them. And to
allow people to collaborate with people who use older Git for Windows
versions (without these patches), we might want to support creating
them also. But then we will probably have to start recoding stuff when
we write the object, because it looks to me like Git makes no such
attempts now.
I'm not suggesting you change this, I'm just thinking out loud here.
Anyway, what doesn't make sense to allow at all, is setting
i18n.logOutputEncoding to something other than "utf-8" when stdout
goes to console. But it might make sense when writing to a file (and
might be required for format-patch to work correctly).
The problem that our environment-strings from t3901-8859-1.txt gets
reencoded to UTF-16 when loaded into the environment in the
'setup'-test can probably be worked around as-is with some plumbing.
But I'm not sure it's worth the hassle, so perhaps your disabling of
these are the best approach. I'm on the fence here. Rewriting the
tests to use plumbing for generating the commit can be done at a later
stage (if at all), I guess.
So, if I understand it correctly:
- The reason these tests fail, is that the patched msys.dll
interprets all environment variables as UTF-8, and re-encodes them to
UTF-16.
- After these patches (and patching msys.dll), we still support
reading existing repos with commits with non-UTF-8 commit-encoding.
- After these patches (and patching msys.dll), we cannot reliably
generate commits with non-UTF-8 commit-encodings from porcelain
commands.
Am I missing something?
The GMail-interface barfs pretty badly on these, as it won't let me
poen the "- Show quoted text -"-part of the e-mail without opening the
link instead. So I'm forced to read these e-mails in raw-mode.
Would you mind trying to find out why this happens, and if possible
try to avoid it in the future?
Thanks, that makes perfect sense to me. :)
Yes. I would prefer to see that done as a filename-specific wrapper
over the string-conversion routine instead.
I wasn't aware of the upstream discussion, thanks for the pointer.
IMO "int foo;" is the cleanest solution, however, some "int foo;" have
explicitly been changed to "int foo = foo;" to silence compiler warnings:
10e8d688 Correct compiler warnings in fast-import.
75b9a8a6 submodule.c: Squelch a "use before assignment" warning
With -Werror -Wuninitialized some compiler versions will barf, and
disabling -Wuninitialized will affect the entire codebase. That's why I
went for "int foo = 0;".
Perhaps this is just a stupid idea, but we could use macros to suppress
the warnings on a case-by-case basis, something like this:
/* replace with compiler version check */
#if COMPILER_SUPPORTS_PROPER_UNINITIALIZED_WARNINGS
# define UNINITIALIZED(default)
#else
# define UNINITIALIZED(default) = (default)
#endif
Then we could write:
int foo UNINITIALIZED(0);
> On Wed, Oct 12, 2011 at 3:09 AM, <karste...@dcon.de> wrote:
> >
> > Erik Faye-Lund <kusm...@gmail.com> wrote on 10.10.2011 17:26:16:
> >
[snip]
> >> If I understand the tests correctly it's about respecting
> >> i18n.commitencoding, which is something I expected this series to
help
> >> us do BETTER, not worse...
> >
> > No, this patch series doesn't improve the LEGACY encoding
support,quite the
> > contrary: it enables the use of the default value "utf-8". As Unicode
is a
> > universal international character set that supersedes all
language-specific
> > legacy encodings, the i18n options are basically obsolete with this.
>
> No, they are not. There might be existing repos out there with other
> commit-encodings, and we need to continue to support them.
This is not a problem, as the commit encoding is stored in each commit
object and git transcodes commits to i18n.logoutputencoding automatically.
> And to
> allow people to collaborate with people who use older Git for Windows
> versions (without these patches), we might want to support creating
> them also.
Not necessarily, as older Git for Windows versions also support
transcoding commit objects. If i18n.logoutputencoding and the console code
page are set to the Windows system encoding, displaying UTF-8 encoded
commit objects will work fine...except of course for UTF-8 characters that
are missing in the system encoding.
The only reason to support legacy i18n.commitencoding is probably for
editors that don't support UTF-8.
> But then we will probably have to start recoding stuff when
> we write the object, because it looks to me like Git makes no such
> attempts now.
>
The problem with recoding is that author/committer-name/email and commit
message may be specified in many different ways and the encoding depends
on the source:
- command line (-m, --author...): always UTF-8
- environment (GIT_AUTHOR_NAME...): always UTF-8
- ~/.gitconfig (user.name...): probably UTF-8?
- .git/config (user.name...): probably UTF-8, but probably different from
~/.gitconfig
- other commit (-C): as specified in other commit
- file (-F): anything
When I looked at the code that composes commit messages a while back, I
quickly decided that there were more important things to do :-)
> So, if I understand it correctly:
> - The reason these tests fail, is that the patched msys.dll
> interprets all environment variables as UTF-8, and re-encodes them to
> UTF-16.
Precisely.
> - After these patches (and patching msys.dll), we still support
> reading existing repos with commits with non-UTF-8 commit-encoding.
Yes.
> - After these patches (and patching msys.dll), we cannot reliably
> generate commits with non-UTF-8 commit-encodings from porcelain
> commands.
This is not necessary, as older Git versions support transcoding UTF-8
commits.
It is still possible to set i18n.commitencoding and use 'git-commit -F',
as t3900 demonstrates. However, if user.name is non-ASCII, it gets tricky:
you'd have to edit .git/config manually (not via git-config) and make sure
its stored in the target encoding.
I've not followed through this thread, but I am a gvim/vim user, and can be
found lurking on #vim.
There are three settings that are important for this kind of stuff vim:
The first is 'encoding' which is the encoding of the terminal (or the way gvim
deals with it). This should be set to utf-8. I'm not sure what it currently
does under msys; I will investigate once I get back to work on Monday.
The second is 'fileencoding' which is the encoding that the current file is set
to - this is usually detected automatically, set explicitly ( ++enc=utf-8 ),
or falls to the default. Changing this and writing, will convert it from one
encoding to another.. whereas using ++enc will read it using that encoding.
The third is 'fileencodings', which is a list of encodings to search through to
detect a file's encoding.
A lot can be worked out from typing
:set fenc? enc? fencs?
in vim.
Let me know if I can help work this out. It might be that some different
defaults in the vimrc are warrented. I will be back on my work machine ina
couple of days.
//.ichael G.
[...]
> > I don't use [g]vim if I can help it, but the vim wiki suggests ":e
> > ++enc=utf-8"...there's probably a .[g]vimrc to set the default
> > encoding, I don't know.
> I've not followed through this thread, but I am a gvim/vim user, and
> can be found lurking on #vim.
>
> There are three settings that are important for this kind of stuff
> vim:
>
> The first is 'encoding' which is the encoding of the terminal (or
> the way gvim deals with it). This should be set to utf-8. I'm not
> sure what it currently does under msys; I will investigate once I get
> back to work on Monday.
This is a bit more involved.
First, 'encoding' controls the encoding of Vim's internal buffers.
Second, terminal encoding is set using the 'termencoding' (abbreviated
'tenc') option. But the Vim manual on this option states that for GUI
Vim tenc does only control what's produced by the keyboard, and
displaying is anyway controlled via the 'encoding' option.
To demonstrate, if you run native vim.exe (that one from the vim.sf.net
installer) in a console window, it'll have enc set to the system "ANSI"
code page and tenc set to the system "OEM" codepage. In Windows XP,
setting Unicode font on a console window does not alter this behaviour
in any way. You can set enc=utf-8 in it, and that won't change the
termencoding from its default value. I hence suspect that the console
version of Vim for Windows is also not really Unicode-enabled, at least
not on Windows XP.
> The second is 'fileencoding' which is the encoding that the current
> file is set to - this is usually detected automatically, set
> explicitly ( ++enc=utf-8 ), or falls to the default. Changing this
> and writing, will convert it from one encoding to another.. whereas
> using ++enc will read it using that encoding.
>
> The third is 'fileencodings', which is a list of encodings to search
> through to detect a file's encoding.
>
> A lot can be worked out from typing
> :set fenc? enc? fencs?
> in vim.
>
> Let me know if I can help work this out. It might be that some
> different defaults in the vimrc are warrented. I will be back on my
> work machine ina couple of days.
I think that currently the only way to reliably get UTF-8 in commit
messages when using Vim is to set core.editor (or the GIT_EDITOR
environment variable) to
vim.exe -c "set enc=utf-8"
or
vim.exe -c "set fenc=utf-8"
(I think the former is slightly more logical these days).
Messing with defaults is dangerous because on Windows you usually
expect test editors to write out files in ANSI code page.
This does not really contradict my point, though. Being allowed
doesn't make it a bad idea. I prefer having clear and understandable
code over having a few lines less of code.
But my concern isn't with EINVAL so much as it is with ENAMETOOLONG,
which is applying filename semantics to any buffer overflow. And the C
way of handling general buffer overflows tends to be capping on write
and returning the bytes written.
> The same is true for most POSIX wrappers in mingw.c, either by
> wrapping the MSVC version of the function or by using err_win_to_posix
> (which btw. also maps the generic ERROR_BUFFER_OVERFLOW to ENAMETOOLONG).
err_win_to_posix converting ERROR_BUFFER_OVERFLOW to ENAMETOOLONG
looks like a bug to me. It should probably only have converted
ERROR_FILENAME_EXCED_RANGE to that.
The problem is that there isn't an ERROR_BUFFER_OVERFLOW-equivalent
errno-value in POSIX, so we can't really convert it. But very few
Win32-functions can generate ERROR_BUFFER_OVERFLOW:
GetAdaptersAddresses, GetAdaptersInfo, GetNetworkParams,
GetComputerName, SendARP, DsUnquoteRdnValue, DsQuoteRdnValue,
PrepareLogArchive, GetPerAdapterInfo, SLSetGenuineInformation and
MprConfigGetGuidName. We don't call any of these functions, so we can
safely remove it from the conversion table.
>> and to the best of my knowledge we already
>> do this where it's needed. If you want to refactor this part to reduce
>> code-reuse, you are free to do that. But please do not do it as some
>> cryptic side-effect in a conversion function named to be similar of
>> some other function that does not do this. If the function was called
>> something like convert_filename_or_set_errno (could be implemented as
>> a wrapper around this), I would not have been so worried.
>>
>
> The "or_set_errno" part is understood for C functions that can fail.
Yes, but it helps the reader of the code understand why it's not being
set elsewhere in the function call. Basically, when looking over the
code for a function like opendir (where POSIX require errno to be set
to ENAMETOOLONG if the passed path is above PATH_MAX), it is quite
obvious where errno is set if it calls a function called
"convert_path_or_set_errno". A comment can be just as clear thought,
so I have no problem just going with "convert_filename" plus a
comment. The name was just suggestion.
> And
> just because the conversion functions are largely used by file API wrappers,
> or that the error codes happen to be OK for these APIs, doesn't make those
> functions specific to filenames. In fact, the same functions are also used
> for command-line, environment and console conversions.
>
Exactly, this is a good reason why we do NOT want a filename specific
errno-assignment in these non-path-related code-paths. Even though
these code-paths might take care not to trigger it, it's just not very
easy for people reading the code to to understand. And the code can
change at some point, where a cryptic error message might start
appearing instead of one that makes sense.
>> Speaking of documentation, javadoc/doxygen-style tags is not the way
>> we document in Git. We tend to write API-documentation in
>> Documentation/technical (or simply as normal comments).
>
> Yeah, I noticed that... manually documenting APIs apart from the source is
> tedious work and the results tend to be out of date and incompl
:D
>
> Talk about advantages of API documentation tools, things like
> * complete, up-to-date method signatures
> * cross referenced and indexed
> * IDE integration
>
> (KB, last updated 1997 while installing doxygen...)
I'm not objecting to Doxygen being used in git, I'm objecting to
starting using Doxygen documentation in a commit that is not about
just that. Starting to use a new tool in an unrelated commit isn't the
way we usually do it :)
And just for the sake of the argument: Personally, I'm not a big
Doxygen-fan (because programmers tend to have the crazy idea that
listings of functions and structures is useful documentation, and be
lazy with proper documentation as a result), but I'm not in violent
disagree with it either. Besides, I don't think Doxygen does a very
good job on C-code.
This has the side-effect that these environment variables now gets
exported to child-processes, right? I'm not saying it's a bad thing,
but perhaps it's worth pointing out in the commit message?
I actually think exporting TMPDIR to child-processes fixes a small
issue in "git archimport" (i.e when called through git.exe)... :)
Yes, it would. Thanks! :)
How does this make the code any more thread-safe? I'm assuming the
previous problem was that one thread could set the console attributes,
and then another thread could come in and change it again, while the
other one is depending on it. Is that so?
If it is, won't pipes be just as problematic? Since pipes are of
finite length, writes can end up being interleaved, leading to the
same problem. But now it's even worse; writes can be split in the
middle of UTF-8 sequences, leading to not only errors in the console
attributes, but also in the text emitted!
Or is there something I'm missing?
Wouldn't it be better to put a critical section around ansi_emulate,
keep the console attribute in a TLS, and re-set it right after taking
the mutex? That way, every thread that prints to the console gets it's
own console state...
But I'm really wondering, why is this needed? What part of the git
code prints to the console from multiple threads? I can understand if
one thread writes to stdout while one writes to stderr, as stdout is
expected to be buffered while stderr is not... is this what cause
problems?
Absolutely. I appreciate especially the renaming to *_path() -- even if it
might sound trivial to you -- as it makes it absolutely clear what this
function is doing.
Ciao,
Dscho
If is_console can be called from multiple threads, then "initialized"
is also problematic. And you fix that in this patch, which is good.
> I think the git send/receive-pack
> routines work that way, although they use write() for the file part, which
> isn't overridden in winansi.c yet (the TODO is already there, though).
>
> The only command that I'm aware of that prints to the console from multiple
> threads is git-grep (but this also uses write(), so console attributes
> currently don't work at all).
Are you saying that the patch is purely theoretical? I'm not against
protecting ourselves from being stupid in the future, but I think it
should be reflected in the commit message (if my understanding is
correct)...
> But you're correct that multiple console
> threads _would_ overwrite each other's console attributes with the current,
> unsynchronized ansi_emulate function.
>
> The new version doesn't have this problem because the Win32 and MSVCRT IO
> APIs are already thread safe, so a single call to an IO function is
> guaranteed to be uninterrupted by control codes from other threads. And if
> multi threaded console threads take care to reset their attributes with each
> call, everything is fine.
>
>> If it is, won't pipes be just as problematic? Since pipes are of
>> finite length, writes can end up being interleaved, leading to the
>> same problem. But now it's even worse; writes can be split in the
>> middle of UTF-8 sequences, leading to not only errors in the console
>> attributes, but also in the text emitted!
>>
>
> You're correct that UTF-8 sequences may be split (could happen every 1,024
> characters at the worst with a 4k buffer). I've already tackled this problem
> in the msys console handler, and I'll try to look into it here when I find
> the time.
>
I'm looking forward to have a look at it :)
>> Or is there something I'm missing?
>>
>> Wouldn't it be better to put a critical section around ansi_emulate,
>> keep the console attribute in a TLS, and re-set it right after taking
>> the mutex? That way, every thread that prints to the console gets it's
>> own console state...
>>
>
> Multi threaded console apps will have to take care to print in reasonable
> chunks anyway (at least complete lines, git-grep prints one file at a time),
> and to reset their attributes in every single chunk. I don't think there's a
> per-thread console state on Linux either, so that's the only way it could
> possibly work.
>
In this case it doesn't help to take such care if your file can be
larger than the 4k-pipe you've created. Multiple files might be
interleaved in 4k blocks.
>> But I'm really wondering, why is this needed? What part of the git
>> code prints to the console from multiple threads? I can understand if
>> one thread writes to stdout while one writes to stderr, as stdout is
>> expected to be buffered while stderr is not... is this what cause
>> problems?
I'd really love an answer to this question. Is this a theoretical fix,
or does it fix observable misbehavior?
After another (quick) review of the code, I have another nit on the
implementation:
On Thu, Oct 6, 2011 at 7:41 PM, Karsten Blees <bl...@dcon.de> wrote:
> @@ -28,8 +23,10 @@ static HANDLE console;
> static WORD plain_attr;
> static WORD attr;
> static int negative;
> -static FILE *last_stream = NULL;
> static int non_ascii_used = 0;
> +static HANDLE hthread, hread, hwrite;
> +static HANDLE hwrite1 = INVALID_HANDLE_VALUE, hwrite2 = INVALID_HANDLE_VALUE;
> +static HANDLE hconsole1, hconsole2;
>
'attr', 'plain_attr' and 'negative' are global variables, and...
> -static int is_console(FILE *stream)
> +static int is_console(int fd)
> {
> CONSOLE_SCREEN_BUFFER_INFO sbi;
> HANDLE hcon;
>
> - static int initialized = 0;
<snip>
> - if (!initialized) {
> - attr = plain_attr = sbi.wAttributes;
> - negative = 0;
> - initialized = 1;
> - /* check console font on exit */
> - atexit(warn_if_raster_font);
> - }
> -
> - console = hcon;
> + /* initialize attributes */
> + attr = plain_attr = sbi.wAttributes;
> + negative = 0;
> return 1;
> }
>
...is_console() initialize these from it's input fd's console attributes. But...
> +void winansi_init(void)
> +{
> + int con1, con2, hwrite_fd;
>
> - rv = ansi_emulate(buf, stream);
> + /* check if either stdout or stderr is a console output screen buffer */
> + con1 = is_console(1);
> + con2 = is_console(2);
> + if (!con1 && !con2)
> + return;
>
you call is_console(1) followed by is_console(2). This means that
"attr" and "negative" gets initialized twice. I guess your reasoning
is that there's only one console, so they'll be initialized to the
same value. I think this deserves a comment. Something like:
+ con1 = is_console(1);
+ con2 = is_console(2); /* may re-initialize state */
...should be sufficient. But perhaps it's even better to simply keep
the old initialized-logic, since it's already there?
Thanks a lot for the explanation. Based on this, I agree that this is
worth fixing.
> My local commit message currently starts like so:
> "Winansi.c has many static variables that are accessed and modified from
> the [v][f]printf / fputs functions overridden in the file. This may cause
> multi threaded git commands that print to the console to produce corrupted
> output or even crash.
>
> Additionally, winansi.c doesn't override all functions that can be used to
> print to the console (e.g. fwrite, write, fputc are missing), so that ANSI
> escapes don't work properly for some git commands (e.g. git-grep).
>
> Instead of doing ANSI emulation in just a few wrapped functions on top of
> the IO API, let's plug into the IO system and take advantage of the thread
> safety inherent to the IO system.
> ..."
>
Hmm, these last two paragraphs seems to suggest you are are changing
write, fwrite and fputc to go through the ansi-emulation also... But I
cant spot it in the code. Am I blind, or just misunderstanding the
text?
Apart from that, I like it.
>> > You're correct that UTF-8 sequences may be split (could happen every
>> > 1,024
>> > characters at the worst with a 4k buffer). I've already tackled this
>> > problem
>> > in the msys console handler, and I'll try to look into it here when I
>> > find
>> > the time.
>> >
>>
>> I'm looking forward to have a look at it :)
>>
>
> See the patch below, I'll integrate this in the next version.
>
After a quick glance at the code: looks good to me :)
>> In this case it doesn't help to take such care if your file can be
>> larger than the 4k-pipe you've created. Multiple files might be
>> interleaved in 4k blocks.
>>
>
> Only if the calling thread splits the files into 4k blocks and calls
> WriteFile/write/fwrite for each block. If you print the whole file in a
> single call, that call is uninterruptible by other threads (this is what
> git-grep does). The buffer size of the pipe doesn't affect the tread safety
> of stdio and Win32 APIs at all.
>
Not quite, but yes ;)
POSIX allows write to do an incomplete operation(in the case of more
than {PIPE_BUF} bytes, which is 4096 in our case). This is part of the
reason we have write_in_full in wrapper.c...
http://pubs.opengroup.org/onlinepubs/007904975/functions/write.html
"A write request for more than {PIPE_BUF} bytes shall cause one of the
following:
When at least one byte can be written, transfer what it can and return
the number of bytes written. When all data previously written to the
pipe is read, it shall transfer at least {PIPE_BUF} bytes.
When no data can be written, transfer no data, and return -1 with
errno set to [EAGAIN]."
So let's see what MSDN has to say:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365141(v=vs.85).aspx
"To write to the pipe, use the pipe's write handle in a call to the
WriteFile function. The WriteFile call does not return until it has
written the specified number of bytes to the pipe or an error occurs.
If the pipe buffer is full and there are more bytes to be written,
WriteFile does not return until another process reads from the pipe,
making more buffer space available."
Aha, so Windows is doing the sane thing. Good to know, and you are
indeed right. We're in the clear :)
A bit of a tangent, just to end of this part; git tries to protect
from outputting in the wrong order by locking a mutex around writing
it's output. So at this level it doesn't matter much anyway, I'm being
paranoid.
>> you call is_console(1) followed by is_console(2). This means that
>> "attr" and "negative" gets initialized twice. I guess your reasoning
>> is that there's only one console, so they'll be initialized to the
>> same value. I think this deserves a comment. Something like:
>>
>> + con1 = is_console(1);
>> + con2 = is_console(2); /* may re-initialize state */
>>
>> ...should be sufficient. But perhaps it's even better to simply keep
>> the old initialized-logic, since it's already there?
>
> Thanks, you're right, I didn't account for stderr / stdout going to
> different consoles :-)
> is_console is called only twice during initialization, so spending a static
> 'initialized' variable to prevent reinitialization seems overkill. I guess
> I'll go for the comment.
Works for me. But surely, your reasoning about "overkill" is
backwards: It's REMOVING the initialization-check that would be
overkill (as in code-churn), no?
> @@ -376,8 +377,29 @@ static DWORD WINAPI console_thread(LPVOID unused)
> }
>
> /* print remaining text unless we're parsing an escape
> sequence */
> - if (state == TEXT && end > start)
> - write_console(buffer + start, end - start);
> + if (state == TEXT && end > start) {
> + /* check for incomplete UTF-8 sequences and fix end
> accordingly */
> + if (buffer[end - 1] >= 0x80) {
> + if (buffer[end -1] >= 0xc0)
> + end--;
> + else if (end - 1 > start && buffer[end - 2]
>>= 0xe0)
> + end -= 2;
> + else if (end - 2 > start && buffer[end - 3]
>>= 0xf0)
> + end -= 3;
> + }
Minor nit: As you can see from gmail wrapping my lines here, you are
above 80 lines here and should wrap some of these lines.
Good read, I think we are VERY close to having a version of this
series that we can merge. Great work!
(Heh, Gmail scared me a bit with that ">>= 0xe0)"-line. But that's not
your fault, of course)
On Thu, 1 Dec 2011, karste...@dcon.de wrote:
> Instead of overriding these functions (in fact I'm removing overrides
> for [v]fprintf and fputs), you could say I'm "underriding" them by
> replacing the stdout / stderr file descriptors with the pipe to the
> console thread.
Clever.
> Depends on whether you prefer minimal changes or minimal code. Future
> developers will look at the current state of the code, not the changes
> that led there, and every line less to read and understand saves them
> time, right?
At the moment the problem is getting past the maintainer(s). And I
seriously like minimal changes to well-exercized code much more than
maximal changes for a future benefit that will not benefit me ;-)
So I'd really appreciate minimal changes. Really.
Thanks!
Dscho
I think the former would also require a "set tenc=ANSI" (where ANSI is
the specific ANSI codepage (ACP), as per Microsoft's definition, of the
system).
The discussion here looks at the encoding of messages in new
commits, but we should also look at what happens when vim reloads
previous commit messages (e.g. for `rebase -i' and `commit --amend').
In order to correctly handle those situations as well, the options will
probably become:
vim --cmd "set enc=utf-8 tenc=ANSI"
or
vim --cmd "set fencs=utf-8"
(i.e. `--cmd' rather than `-c' and `fencs' instead of `fenc')
<aside>
The brief explanation is: (so that other readers aren't forced to find a
manual)
* `--cmd' runs before vim loads the file, while `-c' runs after.
* the longer forms of `enc', `tenc', `fenc', `fencs' are `encoding',
`termencoding', `fileencoding' and `fileencodings'.
* `fencs' is a list encodings vim will chooses from when loading
a file. `fenc' will become the encoding vim has chosen. Changing `fenc'
changes the encoding vim uses when saving.
* Setting blanks (either explicitly, or by having no defaults) for `tenc',
`fenc' and `fencs' results in their respective values being the same as
`enc'.
</aside>
So it's a toss-up between these defaults:
set enc=utf-8 fencs= tenc=ANSI
or
set enc=ANSI fencs=utf-8 tenc=
Maybe the former is conceptually better, but practically I don't think
the latter is less so.
The former presents a challenge in that I don't know how to set "ANSI"
(the system dependent value, which is different from what vim means by
"ansi", and is "cp932" here). `enc' starts off with the ANSI setting
and I think it's done internally by vim.
Another consideration is that the former doesn't work properly here (I
can't input native characters with that setting because vim goes out of
INSERT mode when I try). A short discussion at [1] says that vim is
flaky in unicode mode so maybe it's related to that.
[1] http://groups.google.com/group/vim_dev/browse_frm/thread/a09c482b063f818b/3dd77f3014a60e43
Anyways, I propose the latter approach and will put together a patch as
soon as I've overcome some technical and time constraint hurdles. Maybe
I'll post an RFC of what I'm thinking of doing before that because I'm
on an network where I can't checkout external repositories.
> Messing with defaults is dangerous because on Windows you usually
> expect test editors to write out files in ANSI code page.
It's could be best if it can be limited to COMMIT_EDITMSG, but since
that's a technical challenge in itself, I'm thinking it might be okay if
the defaults were changed via "C:\Program Files\Git\share\vim\vimrc"
because that's under "Git for Windows"s install path.
--
Atsushi Nakagawa
<at...@chejz.com>
Changes are made when there is inconvenience.