Credits go to Thorvald Natvig (tn/utf8), Nico Rieck (work/utf8) and Takuya Murakami (work/tm/utf8) for their previous work on the subject, from which I've shamelessly stolen ideas as seemed fit. Notable differences from their patches include:
* The UTF-8 to UTF-16 conversion function can handle invalid UTF-8 (similar to TCL), so that old git repositories with legacy-encoded file names can be checked out without error.
* The conversion functions don't use static buffers in order to be fully thread safe. This makes each usage a bit more complicated, but is necessary for the multi-threaded parts of git to work (e.g. git-grep, git-pack-objects, core.preloadindex).
I've included a revised version of the thread-safe console patch that I've posted a while back. This version eliminates the call to TerminateThread and uses the fault-tolerant UTF-8 conversion function from the file name patch (so that file system and console output are in sync).
Patch 3 breaks some of the i18n tests (t3901, t4201 and t8005), which I really don't know how to fix properly. MSys bash is completely broken when it comes to Unicode, as it reads test scripts in Windows system encoding (instead of UTF-8). Changing the encoding of the test scripts (e.g. to Cp1252) is not a solution as this will fail on Windows systems with different encodings. Currently, I've just changed the tests to expect failure if bash cannot pass unicode script content to other processes correctly.
$ grep "failed [^0]" t/test-results/*
t/test-results/t5407-post-rewrite-hook-6400.counts:failed 9
t/test-results/t7400-submodule-basic-7004.counts:failed 1
t/test-results/t7508-status-5936.counts:failed 18
t/test-results/t9001-send-email-6652.counts:failed 4
t/test-results/t9500-gitweb-standalone-no-errors-3620.counts:failed 83
t/test-results/t9501-gitweb-standalone-http-status-5620.counts:failed 11
t/test-results/t9502-gitweb-standalone-parse-output-3384.counts:failed 9
Karsten Blees (5):
Unicode file names on Windows
Unicode file names for gitk and git-gui on Windows
Unicode arguments and environment on Windows
Thread-safe windows console output
Mark unicode-related tests broken on msys
Makefile | 2 -
compat/mingw.c | 757 ++++++++++++++++++++++++++++++++++---------------
compat/mingw.h | 103 +++++--
compat/winansi.c | 363 +++++++++++++++---------
git-gui/git-gui.sh | 6 +-
git-gui/lib/index.tcl | 6 +-
gitk-git/gitk | 15 +-
run-command.c | 10 +-
t/t3901-i18n-patch.sh | 28 +-
t/t4201-shortlog.sh | 6 +-
t/t8005-blame-i18n.sh | 9 +-
t/test-lib.sh | 13 +
12 files changed, 871 insertions(+), 447 deletions(-)
--
1.7.3.1.msysgit.0.8.g31c0ec
Provides replacements for getenv/putenv/unsetenv that operate directly on
the UTF-8-encoded "environ" array to save UTF-8/16 conversions on each
call (and save the trouble to keep track of conversion memory). As sorting
the environment is required for CreateProcess anyway, keep "environ"
sorted and use binary search for faster lookups.
Initialization code has been moved from the main macro to the new
mingw_startup function. This function also converts the command line and
environment to UTF-8, and fixes Windows specific environment settings
(TMPDIR and TERM, formerly handled in mingw_getenv).
Disables MSVCRT wildcard expansion of the command line to be able to pass
*/? to git grep and friends (fixes t5505 and t7810).
Altering the environment when creating new processes is now handled in
spawnv* internally, replacing make_augmented_environ and related APIs.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
Makefile | 2 -
compat/mingw.c | 354 +++++++++++++++++++++++++++++++++++++-------------------
compat/mingw.h | 23 ++---
run-command.c | 10 +--
4 files changed, 246 insertions(+), 143 deletions(-)
diff --git a/Makefile b/Makefile
index d3dcfb1..dba3182 100644
--- a/Makefile
+++ b/Makefile
@@ -1045,7 +1045,6 @@ ifeq ($(uname_S),Windows)
NO_SYMLINK_HEAD = YesPlease
NO_IPV6 = YesPlease
NO_SETENV = YesPlease
- NO_UNSETENV = YesPlease
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
@@ -1100,7 +1099,6 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_LIBGEN_H = YesPlease
NO_SYMLINK_HEAD = YesPlease
NO_SETENV = YesPlease
- NO_UNSETENV = YesPlease
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
diff --git a/compat/mingw.c b/compat/mingw.c
index 8f20b4b..1285d27 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -761,22 +761,110 @@ char *mingw_getcwd(char *pointer, int len)
return pointer;
}
-#undef getenv
+/*
+ * Compare environment entries by key (i.e. stopping at '=').
+ */
+static int compareenv(const void *v1, const void *v2)
+{
+ const char *e1 = *(const char**)v1;
+ const char *e2 = *(const char**)v2;
+
+ /* sort NULL entries at the end */
+ if (!e1 || !e2)
+ return (e1 == e2) ? 0 : e1 ? -1 : 1;
+
+ for (;;) {
+ int c1 = *e1++;
+ int c2 = *e2++;
+ c1 = (c1 == '=') ? 0 : tolower(c1);
+ c2 = (c2 == '=') ? 0 : tolower(c2);
+ if (c1 > c2)
+ return 1;
+ if (c1 < c2)
+ return -1;
+ if (c1 == 0)
+ return 0;
+ }
+}
+
+#define MIN_ENV_SIZE 16
+
+/*
+ * Resize and sort environment for O(log n) getenv / putenv
+ */
+static void initenv(char **env)
+{
+ int i, envsz = MIN_ENV_SIZE;
+ char **e;
+ for (e = env, i = 1; *e; e++, i++)
+ if (i == envsz)
+ envsz <<= 1;
+ environ = xcalloc(envsz, sizeof(char*));
+ memcpy(environ, env, i * sizeof(char*));
+ qsort(environ, i, sizeof(char*), compareenv);
+}
+
+/*
+ * Allocated space for environ is always a power of 2 (with unused entries
+ * filled with NULL), so we can determine the size more efficiently.
+ */
+static int getenvsize()
+{
+ int envsz = MIN_ENV_SIZE;
+ while (environ[envsz - 1])
+ envsz <<= 1;
+ return envsz;
+}
+
char *mingw_getenv(const char *name)
{
- char *result = getenv(name);
- if (!result) {
- if (!strcmp(name, "TMPDIR")) {
- /* on Windows it is TMP and TEMP */
- result = getenv("TMP");
- if (!result)
- result = getenv("TEMP");
- } else if (!strcmp(name, "TERM")) {
- /* simulate TERM to enable auto-color (see color.c) */
- result = "winansi";
+ char *value, **env;
+ env = bsearch(&name, environ, getenvsize(), sizeof(char*), compareenv);
+ if (!env)
+ return NULL;
+ value = strchr(*env, '=');
+ return value ? &value[1] : NULL;
+}
+
+int mingw_unsetenv(const char *name)
+{
+ int envsz;
+ char **env;
+ envsz = getenvsize();
+ env = bsearch(&name, environ, envsz, sizeof(char*), compareenv);
+ if (!env)
+ return 0;
+ free(*env);
+ memmove(env, env + 1, (envsz - 1 - (env - environ)) * sizeof(char*));
+ return 0;
+}
+
+int mingw_putenv(char *namevalue)
+{
+ int envsz;
+ char **env;
+ if (!namevalue || !strchr(namevalue, '=')) {
+ errno = EINVAL;
+ return -1;
+ }
+ envsz = getenvsize();
+ env = bsearch(&namevalue, environ, envsz, sizeof(char*), compareenv);
+
+ if (env) {
+ /* replace existing entry */
+ free(*env);
+ *env = namevalue;
+ } else {
+ /* add at the end (last slot is guaranteed to be free), then sort */
+ environ[envsz - 1] = namevalue;
+ qsort(environ, envsz, sizeof(char*), compareenv);
+ /* expand array if last slot is used */
+ if (environ[envsz - 1]) {
+ environ = xrealloc(environ, 2 * envsz * sizeof(char*));
+ memset(environ + envsz, 0, envsz * sizeof(char*));
}
}
- return result;
+ return 0;
}
/*
@@ -961,21 +1049,63 @@ static char *path_lookup(const char *cmd, char **path, int exe_only)
return prog;
}
-static int env_compare(const void *a, const void *b)
+/*
+ * Create environment block suitable for CreateProcess. Merges current working
+ * directories, current environment and the supplied environment changes.
+ */
+static wchar_t *make_environment_block(char **env)
{
- char *const *ea = a;
- char *const *eb = b;
- return strcasecmp(*ea, *eb);
+ char **e, **t, **tmpenv;
+ wchar_t *envblk = NULL;
+ int len, envsz, envblksz = 0, envblkpos = 0;
+
+ /* get current environment size + supplied changes */
+ envsz = getenvsize();
+ for (e = env; e && *e; e++)
+ envsz++;
+
+ /* allocate temporary environment array and copy current environment */
+ tmpenv = xcalloc(envsz, sizeof(char*));
+ for (e = environ, t = tmpenv; *e; e++, t++)
+ *t = *e;
+
+ /* merge supplied environment changes into the temporary environment */
+ if (env) {
+ int tmpsz = t - tmpenv;
+ for (e = env; *e; e++) {
+ /* replace / remove existing entries or append at the end */
+ char **f = bsearch(e, tmpenv, tmpsz, sizeof(char*), compareenv);
+ if (f)
+ *f = strchr(*e, '=') ? *e : NULL;
+ else
+ *t++ = strchr(*e, '=') ? *e : NULL;
+ }
+ /* finally sort the temporary environment */
+ qsort(tmpenv, envsz, sizeof(char*), compareenv);
+ }
+
+ /* add temporary environment to the environment block */
+ for (e = tmpenv; *e; e++) {
+ len = strlen(*e) + 1;
+ ALLOC_GROW(envblk, (envblkpos + len) * sizeof(wchar_t), envblksz);
+ len = utftowcs(&envblk[envblkpos], *e, len);
+ envblkpos += len + 1;
+ }
+ /* add final \0 terminator */
+ ALLOC_GROW(envblk, (envblkpos + 1) * sizeof(wchar_t), envblksz);
+ envblk[envblkpos] = 0;
+ return envblk;
}
static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
const char *dir,
int prepend_cmd, int fhin, int fhout, int fherr)
{
- STARTUPINFO si;
+ STARTUPINFOW si;
PROCESS_INFORMATION pi;
- struct strbuf envblk, args;
- unsigned flags;
+ struct strbuf args;
+ wchar_t *envblk, wcmd[MAX_PATH], wdir[MAX_PATH], *wargs;
+ unsigned flags = CREATE_UNICODE_ENVIRONMENT;
BOOL ret;
/* Determine whether or not we are associated to a console */
@@ -992,7 +1122,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
* instead of CREATE_NO_WINDOW to make ssh
* recognize that it has no console.
*/
- flags = DETACHED_PROCESS;
+ flags |= DETACHED_PROCESS;
} else {
/* There is already a console. If we specified
* DETACHED_PROCESS here, too, Windows would
@@ -1000,7 +1130,6 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
* The same is true for CREATE_NO_WINDOW.
* Go figure!
*/
- flags = 0;
CloseHandle(cons);
}
memset(&si, 0, sizeof(si));
@@ -1010,6 +1139,11 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
si.hStdOutput = (HANDLE) _get_osfhandle(fhout);
si.hStdError = (HANDLE) _get_osfhandle(fherr);
+ if (utftowcs(wcmd, cmd, MAX_PATH) < 0)
+ return -1;
+ if (dir && utftowcs(wdir, dir, MAX_PATH) < 0)
+ return -1;
+
/* concatenate argv, quoting args as we go */
strbuf_init(&args, 0);
if (prepend_cmd) {
@@ -1027,33 +1161,18 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
free(quoted);
}
- if (env) {
- int count = 0;
- char **e, **sorted_env;
-
- for (e = env; *e; e++)
- count++;
+ wargs = xmalloc((2 * args.len + 1) * sizeof(wchar_t));
+ utftowcs(wargs, args.buf, 2 * args.len + 1);
+ strbuf_release(&args);
- /* environment must be sorted */
- sorted_env = xmalloc(sizeof(*sorted_env) * (count + 1));
- memcpy(sorted_env, env, sizeof(*sorted_env) * (count + 1));
- qsort(sorted_env, count, sizeof(*sorted_env), env_compare);
-
- strbuf_init(&envblk, 0);
- for (e = sorted_env; *e; e++) {
- strbuf_addstr(&envblk, *e);
- strbuf_addch(&envblk, '\0');
- }
- free(sorted_env);
- }
+ envblk = make_environment_block(env);
memset(&pi, 0, sizeof(pi));
- ret = CreateProcess(cmd, args.buf, NULL, NULL, TRUE, flags,
- env ? envblk.buf : NULL, dir, &si, &pi);
+ ret = CreateProcessW(wcmd, wargs, NULL, NULL, TRUE, flags, envblk,
+ dir ? wdir : NULL, &si, &pi);
- if (env)
- strbuf_release(&envblk);
- strbuf_release(&args);
+ free(envblk);
+ free(wargs);
if (!ret) {
errno = ENOENT;
@@ -1175,80 +1294,6 @@ void mingw_execv(const char *cmd, char *const *argv)
mingw_execve(cmd, argv, environ);
}
-static char **copy_environ(void)
-{
- char **env;
- int i = 0;
- while (environ[i])
- i++;
- env = xmalloc((i+1)*sizeof(*env));
- for (i = 0; environ[i]; i++)
- env[i] = xstrdup(environ[i]);
- env[i] = NULL;
- return env;
-}
-
-void free_environ(char **env)
-{
- int i;
- for (i = 0; env[i]; i++)
- free(env[i]);
- free(env);
-}
-
-static int lookup_env(char **env, const char *name, size_t nmln)
-{
- int i;
-
- for (i = 0; env[i]; i++) {
- if (0 == strncmp(env[i], name, nmln)
- && '=' == env[i][nmln])
- /* matches */
- return i;
- }
- return -1;
-}
-
-/*
- * If name contains '=', then sets the variable, otherwise it unsets it
- */
-static char **env_setenv(char **env, const char *name)
-{
- char *eq = strchrnul(name, '=');
- int i = lookup_env(env, name, eq-name);
-
- if (i < 0) {
- if (*eq) {
- for (i = 0; env[i]; i++)
- ;
- env = xrealloc(env, (i+2)*sizeof(*env));
- env[i] = xstrdup(name);
- env[i+1] = NULL;
- }
- }
- else {
- free(env[i]);
- if (*eq)
- env[i] = xstrdup(name);
- else
- for (; env[i]; i++)
- env[i] = env[i+1];
- }
- return env;
-}
-
-/*
- * Copies global environ and adjusts variables as specified by vars.
- */
-char **make_augmented_environ(const char *const *vars)
-{
- char **env = copy_environ();
-
- while (*vars)
- env = env_setenv(env, *vars++);
- return env;
-}
-
/*
* Note, this isn't a complete replacement for getaddrinfo. It assumes
* that service contains a numerical port, or that it it is null. It
@@ -1947,3 +1992,76 @@ int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen)
errno = ENAMETOOLONG;
return -1;
}
+
+/*
+ * Disable MSVCRT command line wildcard expansion (__getmainargs called from
+ * mingw startup code, see init.c in mingw runtime).
+ */
+int _CRT_glob = 0;
+
+typedef struct {
+ int newmode;
+} _startupinfo;
+
+extern int __wgetmainargs(int *argc, wchar_t ***argv, wchar_t ***env, int glob,
+ _startupinfo *si);
+
+void mingw_startup()
+{
+ int i, maxlen, argc;
+ char *buffer;
+ wchar_t **e, **wenv, **wargv;
+ _startupinfo si;
+
+ /* get wide char arguments and environment */
+ __wgetmainargs(&argc, &wargv, &wenv, _CRT_glob, &si);
+
+ /* determine size of argv and environ conversion buffer */
+ maxlen = wcslen(_wpgmptr);
+ for (i = 1; i < argc; i++)
+ maxlen = max(maxlen, wcslen(wargv[i]));
+ for (e = wenv; *e; e++)
+ maxlen = max(maxlen, wcslen(*e));
+
+ /* allocate buffer (wchar_t encodes to max 3 UTF-8 bytes) */
+ maxlen = 3 * maxlen + 1;
+ buffer = xmalloc(maxlen);
+
+ /* convert command line arguments and environment to UTF-8 */
+ wcstoutf(buffer, _wpgmptr, maxlen);
+ __argv[0] = xstrdup(buffer);
+ for (i = 1; i < argc; i++) {
+ wcstoutf(buffer, wargv[i], maxlen);
+ __argv[i] = xstrdup(buffer);
+ }
+ for (e = wenv, i = 0; *e; e++, i++) {
+ wcstoutf(buffer, *e, maxlen);
+ environ[i] = xstrdup(buffer);
+ }
+ free(buffer);
+
+ /* initialize mingw replacement of environment functions */
+ initenv(environ);
+
+ /* fix Windows specific environment settings */
+
+ /* on Windows it is TMP and TEMP */
+ if (!getenv("TMPDIR")) {
+ const char *tmp = getenv("TMP");
+ if (!tmp)
+ tmp = getenv("TEMP");
+ if (tmp)
+ setenv("TMPDIR", tmp, 1);
+ }
+
+ /* simulate TERM to enable auto-color (see color.c) */
+ if (!getenv("TERM"))
+ setenv("TERM", "winansi", 1);
+
+ /* set up default file mode and file modes for stdin/out/err */
+ _fmode = _O_BINARY;
+ _setmode(_fileno(stdin), _O_BINARY);
+ _setmode(_fileno(stdout), _O_BINARY);
+ _setmode(_fileno(stderr), _O_BINARY);
+}
+
diff --git a/compat/mingw.h b/compat/mingw.h
index 41b0ad4..27f7510 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -208,6 +208,10 @@ char *mingw_getcwd(char *pointer, int len);
char *mingw_getenv(const char *name);
#define getenv mingw_getenv
+int mingw_putenv(char *namevalue);
+#define putenv mingw_putenv
+int mingw_unsetenv(const char *name);
+#define unsetenv mingw_unsetenv
struct hostent *mingw_gethostbyname(const char *host);
#define gethostbyname mingw_gethostbyname
@@ -301,13 +305,6 @@ void mingw_open_html(const char *path);
void mingw_mark_as_git_dir(const char *dir);
#define mark_as_git_dir mingw_mark_as_git_dir
-/*
- * helpers
- */
-
-char **make_augmented_environ(const char *const *vars);
-void free_environ(char **env);
-
/**
* Converts UTF-8 encoded string to UTF-16LE. To support legacy-encoded
* repositories, invalid UTF-8 bytes 0xa0 - 0xff are converted to corresponding
@@ -345,20 +342,16 @@ static inline int utftowcs(wchar_t *wcs, const char *utf, size_t wcslen)
int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen);
/*
- * A replacement of main() that ensures that argv[0] has a path
- * and that default fmode and std(in|out|err) are in binary mode
+ * A replacement of main() that adds mingw specific initialization.
*/
+void mingw_startup();
#define main(c,v) dummy_decl_mingw_main(); \
static int mingw_main(); \
int main(int argc, const char **argv) \
{ \
- _fmode = _O_BINARY; \
- _setmode(_fileno(stdin), _O_BINARY); \
- _setmode(_fileno(stdout), _O_BINARY); \
- _setmode(_fileno(stderr), _O_BINARY); \
- argv[0] = xstrdup(_pgmptr); \
- return mingw_main(argc, argv); \
+ mingw_startup(); \
+ return mingw_main(__argc, __argv); \
} \
static int mingw_main(c,v)
diff --git a/run-command.c b/run-command.c
index 2a1041e..01d740b 100644
--- a/run-command.c
+++ b/run-command.c
@@ -317,7 +317,6 @@ fail_pipe:
{
int fhin = 0, fhout = 1, fherr = 2;
const char **sargv = cmd->argv;
- char **env = environ;
if (cmd->no_stdin)
fhin = open("/dev/null", O_RDWR);
@@ -342,23 +341,18 @@ fail_pipe:
else if (cmd->out > 1)
fhout = dup(cmd->out);
- if (cmd->env)
- env = make_augmented_environ(cmd->env);
-
if (cmd->git_cmd) {
cmd->argv = prepare_git_cmd(cmd->argv);
} else if (cmd->use_shell) {
cmd->argv = prepare_shell_cmd(cmd->argv);
}
- cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, env, cmd->dir,
- fhin, fhout, fherr);
+ cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, (char**) cmd->env,
+ cmd->dir, fhin, fhout, fherr);
failed_errno = errno;
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
error("cannot spawn %s: %s", cmd->argv[0], strerror(errno));
- if (cmd->env)
- free_environ(env);
if (cmd->git_cmd)
free(cmd->argv);
--
1.7.3.1.msysgit.0.8.g31c0ec
This patch adds a new test function test_expect_success_unicode that tests
whether the shell is capable of passing unicode strings to another process.
If that works, the test is expected to succeed, otherwise it's expected to
fail.
Signed-off-by: Karsten Blees <bl...@dcon.de>
---
t/t3901-i18n-patch.sh | 28 ++++++++++++++--------------
t/t4201-shortlog.sh | 6 +++---
t/t8005-blame-i18n.sh | 9 +++++----
t/test-lib.sh | 13 +++++++++++++
4 files changed, 35 insertions(+), 21 deletions(-)
diff --git a/t/t3901-i18n-patch.sh b/t/t3901-i18n-patch.sh
index 31a5770..d19255d 100755
--- a/t/t3901-i18n-patch.sh
+++ b/t/t3901-i18n-patch.sh
@@ -67,7 +67,7 @@ test_expect_success setup '
git config i18n.commitencoding UTF-8
'
-test_expect_success 'format-patch output (ISO-8859-1)' '
+test_expect_success_unicode 'format-patch output (ISO-8859-1)' '
git config i18n.logoutputencoding ISO8859-1 &&
git format-patch --stdout master..HEAD^ >out-l1 &&
@@ -78,7 +78,7 @@ test_expect_success 'format-patch output (ISO-8859-1)' '
grep "^From: =?ISO8859-1?q?=C1=E9=ED=20=F3=FA?=" out-l2
'
-test_expect_success 'format-patch output (UTF-8)' '
+test_expect_success_unicode 'format-patch output (UTF-8)' '
git config i18n.logoutputencoding UTF-8 &&
git format-patch --stdout master..HEAD^ >out-u1 &&
@@ -89,7 +89,7 @@ test_expect_success 'format-patch output (UTF-8)' '
grep "^From: =?UTF-8?q?=C3=81=C3=A9=C3=AD=20=C3=B3=C3=BA?=" out-u2
'
-test_expect_success 'rebase (U/U)' '
+test_expect_success_unicode 'rebase (U/U)' '
# We want the result of rebase in UTF-8
git config i18n.commitencoding UTF-8 &&
@@ -108,7 +108,7 @@ test_expect_success 'rebase (U/U)' '
check_encoding 2
'
-test_expect_success 'rebase (U/L)' '
+test_expect_success_unicode 'rebase (U/L)' '
git config i18n.commitencoding UTF-8 &&
git config i18n.logoutputencoding ISO8859-1 &&
. "$TEST_DIRECTORY"/t3901-utf8.txt &&
@@ -119,7 +119,7 @@ test_expect_success 'rebase (U/L)' '
check_encoding 2
'
-test_expect_success 'rebase (L/L)' '
+test_expect_success_unicode 'rebase (L/L)' '
# In this test we want ISO-8859-1 encoded commits as the result
git config i18n.commitencoding ISO8859-1 &&
git config i18n.logoutputencoding ISO8859-1 &&
@@ -131,7 +131,7 @@ test_expect_success 'rebase (L/L)' '
check_encoding 2 8859
'
-test_expect_success 'rebase (L/U)' '
+test_expect_success_unicode 'rebase (L/U)' '
# This is pathological -- use UTF-8 as intermediate form
# to get ISO-8859-1 results.
git config i18n.commitencoding ISO8859-1 &&
@@ -144,7 +144,7 @@ test_expect_success 'rebase (L/U)' '
check_encoding 2 8859
'
-test_expect_success 'cherry-pick(U/U)' '
+test_expect_success_unicode 'cherry-pick(U/U)' '
# Both the commitencoding and logoutputencoding is set to UTF-8.
git config i18n.commitencoding UTF-8 &&
@@ -159,7 +159,7 @@ test_expect_success 'cherry-pick(U/U)' '
check_encoding 3
'
-test_expect_success 'cherry-pick(L/L)' '
+test_expect_success_unicode 'cherry-pick(L/L)' '
# Both the commitencoding and logoutputencoding is set to ISO-8859-1
git config i18n.commitencoding ISO8859-1 &&
@@ -174,7 +174,7 @@ test_expect_success 'cherry-pick(L/L)' '
check_encoding 3 8859
'
-test_expect_success 'cherry-pick(U/L)' '
+test_expect_success_unicode 'cherry-pick(U/L)' '
# Commitencoding is set to UTF-8 but logoutputencoding is ISO-8859-1
git config i18n.commitencoding UTF-8 &&
@@ -189,7 +189,7 @@ test_expect_success 'cherry-pick(U/L)' '
check_encoding 3
'
-test_expect_success 'cherry-pick(L/U)' '
+test_expect_success_unicode 'cherry-pick(L/U)' '
# Again, the commitencoding is set to ISO-8859-1 but
# logoutputencoding is set to UTF-8.
@@ -205,7 +205,7 @@ test_expect_success 'cherry-pick(L/U)' '
check_encoding 3 8859
'
-test_expect_success 'rebase --merge (U/U)' '
+test_expect_success_unicode 'rebase --merge (U/U)' '
git config i18n.commitencoding UTF-8 &&
git config i18n.logoutputencoding UTF-8 &&
. "$TEST_DIRECTORY"/t3901-utf8.txt &&
@@ -216,7 +216,7 @@ test_expect_success 'rebase --merge (U/U)' '
check_encoding 2
'
-test_expect_success 'rebase --merge (U/L)' '
+test_expect_success_unicode 'rebase --merge (U/L)' '
git config i18n.commitencoding UTF-8 &&
git config i18n.logoutputencoding ISO8859-1 &&
. "$TEST_DIRECTORY"/t3901-utf8.txt &&
@@ -227,7 +227,7 @@ test_expect_success 'rebase --merge (U/L)' '
check_encoding 2
'
-test_expect_success 'rebase --merge (L/L)' '
+test_expect_success_unicode 'rebase --merge (L/L)' '
# In this test we want ISO-8859-1 encoded commits as the result
git config i18n.commitencoding ISO8859-1 &&
git config i18n.logoutputencoding ISO8859-1 &&
@@ -239,7 +239,7 @@ test_expect_success 'rebase --merge (L/L)' '
check_encoding 2 8859
'
-test_expect_success 'rebase --merge (L/U)' '
+test_expect_success_unicode 'rebase --merge (L/U)' '
# This is pathological -- use UTF-8 as intermediate form
# to get ISO-8859-1 results.
git config i18n.commitencoding ISO8859-1 &&
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index cdb70b4..b672b5c 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -93,7 +93,7 @@ test_expect_success 'output from user-defined format is re-wrapped' '
test_cmp expect log.predictable
'
-test_expect_success 'shortlog wrapping' '
+test_expect_success_unicode 'shortlog wrapping' '
cat >expect <<\EOF &&
A U Thor (5):
Test
@@ -114,7 +114,7 @@ EOF
test_cmp expect out
'
-test_expect_success 'shortlog from non-git directory' '
+test_expect_success_unicode 'shortlog from non-git directory' '
git log HEAD >log &&
GIT_DIR=non-existing git shortlog -w <log >out &&
test_cmp expect out
@@ -135,7 +135,7 @@ $DSCHO (2):
EOF
-test_expect_success 'shortlog encoding' '
+test_expect_success_unicode 'shortlog encoding' '
git reset --hard "$commit" &&
git config --unset i18n.commitencoding &&
echo 2 > a1 &&
diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index cb39055..6affb78 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -1,6 +1,7 @@
#!/bin/sh
test_description='git blame encoding conversion'
+
. ./test-lib.sh
. "$TEST_DIRECTORY"/t8005/utf8.txt
@@ -33,7 +34,7 @@ author $SJIS_NAME
summary $SJIS_MSG
EOF
-test_expect_success \
+test_expect_success_unicode \
'blame respects i18n.commitencoding' '
git blame --incremental file | \
egrep "^(author|summary) " > actual &&
@@ -49,7 +50,7 @@ author $EUC_JAPAN_NAME
summary $EUC_JAPAN_MSG
EOF
-test_expect_success \
+test_expect_success_unicode \
'blame respects i18n.logoutputencoding' '
git config i18n.logoutputencoding eucJP &&
git blame --incremental file | \
@@ -66,7 +67,7 @@ author $UTF8_NAME
summary $UTF8_MSG
EOF
-test_expect_success \
+test_expect_success_unicode \
'blame respects --encoding=UTF-8' '
git blame --incremental --encoding=UTF-8 file | \
egrep "^(author|summary) " > actual &&
@@ -82,7 +83,7 @@ author $UTF8_NAME
summary $UTF8_MSG
EOF
-test_expect_success \
+test_expect_success_unicode \
'blame respects --encoding=none' '
git blame --incremental --encoding=none file | \
egrep "^(author|summary) " > actual &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index b50d723..06edcbd 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -477,6 +477,19 @@ test_expect_success () {
echo >&3 ""
}
+# test if shell supports unicode, expect success if it does, otherwise failure
+test_expect_success_unicode () {
+tclsh -ÀÁÂ <<EOF
+ exit [expr {\$argv ne [lindex "-\u00c0\u00c1\u00c2" 0]}]
+EOF
+ if [ $? == 0 ]
+ then
+ test_expect_success "$@"
+ else
+ test_expect_failure "$@"
+ fi
+}
+
test_expect_code () {
test "$#" = 4 && { prereq=$1; shift; } || prereq=
test "$#" = 3 ||
--
1.7.3.1.msysgit.0.8.g31c0ec
Karsten, All,
On Sun, Oct 24, 2010 at 10:00:18PM +0000, Karsten Blees wrote:
> This patch series tries to solve the Unicode problems of Git for Windows, at least for git, gitk and git-gui.
>
> Credits go to Thorvald Natvig (tn/utf8), Nico Rieck (work/utf8) and Takuya Murakami (work/tm/utf8) for their previous work on the subject, from which I've shamelessly stolen ideas as seemed fit. Notable differences from their patches include:
>
> * The UTF-8 to UTF-16 conversion function can handle invalid UTF-8 (similar to TCL), so that old git repositories with legacy-encoded file names can be checked out without error.
>
> * The conversion functions don't use static buffers in order to be fully thread safe. This makes each usage a bit more complicated, but is necessary for the multi-threaded parts of git to work (e.g. git-grep, git-pack-objects, core.preloadindex).
>
> I've included a revised version of the thread-safe console patch that I've posted a while back. This version eliminates the call to TerminateThread and uses the fault-tolerant UTF-8 conversion function from the file name patch (so that file system and console output are in sync).
>
> Patch 3 breaks some of the i18n tests (t3901, t4201 and t8005), which I really don't know how to fix properly. MSys bash is completely broken when it comes to Unicode, as it reads test scripts in Windows system encoding (instead of UTF-8). Changing the encoding of the test scripts (e.g. to Cp1252) is not a solution as this will fail on Windows systems with different encodings. Currently, I've just changed the tests to expect failure if bash cannot pass unicode script content to other processes correctly.
>
> $ grep "failed [^0]" t/test-results/*
> t/test-results/t5407-post-rewrite-hook-6400.counts:failed 9
> t/test-results/t7400-submodule-basic-7004.counts:failed 1
> t/test-results/t7508-status-5936.counts:failed 18
> t/test-results/t9001-send-email-6652.counts:failed 4
> t/test-results/t9500-gitweb-standalone-no-errors-3620.counts:failed 83
> t/test-results/t9501-gitweb-standalone-http-status-5620.counts:failed 11
> t/test-results/t9502-gitweb-standalone-parse-output-3384.counts:failed 9
>
> Karsten Blees (5):
> Unicode file names on Windows
> Unicode file names for gitk and git-gui on Windows
> Unicode arguments and environment on Windows
> Thread-safe windows console output
> Mark unicode-related tests broken on msys
Big-Thanks-From: and Tested-By: Kirill Smelkov <ki...@mns.spb.ru>
Thanks a lot!
I've tested this together with my .pdf and .doc textconv patches and
this worked great both under Wine (my primary OS is Linux) and under
several real Windows XP.
The only glitch I've found so far is that file list in git-gui's
"Repository->Browse master's files" is not converted to UTF-16 and
displays as garbage.
Otherwise everything seems to work:
- patch 1: Looks sane although I can't review it in depth because of lack
of win32 knowledge.
- patch 2: There are more places in gitk where to reencode, e.g. "New
file ...", IIRC. Otherwise it works modulo already-mentioned garbage
in git-gui file list.
- patch 3: It works because I've been able to use it with my *.doc and
*.pdf textconv drivers on UTF-8 encoded filenames.
- patch 4: Works, as you've already said only with --no-pager. Would be
great to teach it to correctly activate UTF-8 -> UTF-16 conversion in
pager cases too, but what is now is already a very good start, imho.
Thanks again, and sorry I'm maybe not reviewing this stuff in-depth --
at least it works and let's hope we'll have something which addresses
issue80 in msysgit mainline someday.
Kirill
[snip]
> > I've tested this together with my .pdf and .doc textconv patches and
> > this worked great both under Wine (my primary OS is Linux) and under
> > several real Windows XP.
> >
> > The only glitch I've found so far is that file list in git-gui's
> > "Repository->Browse master's files" is not converted to UTF-16 and
> > displays as garbage.
> >
>
> I only touched code that obviously already dealt with file name encoding
> (i.e. 'encoding convertfrom...'), so I might have missed something. This
> should fix the repo browser:
> ---
> diff --git a/git-gui/lib/browser.tcl b/git-gui/lib/browser.tcl
> index c241572..575de43 100644
> --- a/git-gui/lib/browser.tcl
> +++ b/git-gui/lib/browser.tcl
> @@ -191,7 +191,7 @@ method _ls {tree_id {name {}}} {
> $w conf -state disabled
>
> set fd [git_read ls-tree -z $tree_id]
> - fconfigure $fd -blocking 0 -translation binary -encoding binary
> + fconfigure $fd -blocking 0 -translation binary -encoding utf-8
> fileevent $fd readable [cb _read $fd]
> }
Thanks for the fixup - now it works. Btw I'm keeping all the patches here
git://repo.or.cz/git/kirr.git win32/kirr
> > Otherwise everything seems to work:
> >
> > - patch 1: Looks sane although I can't review it in depth because of
> lack
> > of win32 knowledge.
> >
> > - patch 2: There are more places in gitk where to reencode, e.g. "New
> > file ...", IIRC. Otherwise it works modulo already-mentioned garbage
> > in git-gui file list.
> >
>
> Where can you do a "New file..." in gitk?
It's when there is a new file added by commit, which is binary and for
which there is no textconv. Then it reads like
diff --git a/<nice-unicode-name> b/<nice-unicode-name>
new file mode 100644
index 0000000..XXXXXXX
Binary files /dev/null and b/<garbaged-unicode-name> differ
^^^^^^^^^^^^^^^^^^^^^
here
Sorry for not being clear in the first place.
Also actually this and rename-to/rename-from problems are not win32
specific. I suggest to send them directly to gitk (only the encoding
should maybe not be hardcoded to utf-8, but use $diffencoding instead,
e.g. like here:
http://repo.or.cz/w/git.git/commitdiff/1f2cecfd53137b76d39b2dcd7bcf7e918cd745b3)
Thanks for the info.
> > Thanks again, and sorry I'm maybe not reviewing this stuff in-depth --
> > at least it works and let's hope we'll have something which addresses
> > issue80 in msysgit mainline someday.
> >
> >
> > Kirill
>
> Yes, I agree. After all, the Windows OS had full Unicode support from the
> start, so I think it's time that 'Git for Windows' earns that name (with
> the encoding issues, it feels more like a 'Git for DOS' right now).
Here it's not like 'Git for DOS' (we already have git-gui, gitk,
git-cheetah ...), but issue80 is really a show-stopper.
Thanks again,
Kirill
I'm not telling it should handle it in different way, quite the opposite - it
should (but does not yet) handle it in the same way. IIRC gitk has built-in
line-matcher with logic which lines to encoding-convert and which not. I'd
change it to "convert all lines..."
> How did you get that output? Can you check the exact
> output of git diff-tree -p <commit>?
Here you are:
$ git diff-tree -p 34baf5
34baf57217ded50132e1f414ad76f6aab93bff47
diff --git a/hardware/mns/УС-1/Серия/УС-1_Э3_ИИ108-09.dwg b/hardware/mns/УС-1/Серия/УС-1_Э3_ИИ108-09.dwg
new file mode 100644
index 0000000..b0bf8a4
Binary files /dev/null and b/hardware/mns/УС-1/Серия/УС-1_Э3_ИИ108-09.dwg differ
This is ok, but in gitk under Linux I'm getting:
----------------- hardware/mns/УС-1/Серия/УС-1_Э3_ИИ108-09.dwg -----------------
new file mode 100644
index 0000000..b0bf8a4
Binary files /dev/null and b/hardware/mns/УС-1/Ð¡ÐµÑ€Ð¸Ñ /УС-1_Ð3_ИИ108-09.dwg differ
Here are steps how to reproduce this from scratch:
$ mkdir test
$ cd test/
$ git init
$ echo Hello > hello.txt
$ git add hello.txt
$ git commit -m init
$ dd if=/dev/zero of=привет.bin bs=1K count=4
$ git add привет.bin
$ git commit -m second
$ git diff-tree -p HEAD
71d7f6cacfa1364bd9428f4ae51d05356ec30e51
diff --git a/привет.bin b/привет.bin
new file mode 100644
index 0000000..08e7df1
Binary files /dev/null and b/привет.bin differ
# so it's ok
$ gitk # and there I'm seeing
---------------------------------- привет.bin ----------------------------------
new file mode 100644
index 0000000..08e7df1
Binary files /dev/null and b/привет.bin differ
<rant>
I see and agree that there are various encoding envolved. But from a
user perspective, wouldn't one expect output of _one_ command to be in
the same encoding?
If that one command is `git log -p` we have all three cases represented
here.
</rant>
Anyway thanks for clarification and remark about my patch. I have almost
zero Tcl skills - that's why originally I've just copied nearby lines...
> > > Yes, I agree. After all, the Windows OS had full Unicode support from
> the
> > > start, so I think it's time that 'Git for Windows' earns that name
> (with
> > > the encoding issues, it feels more like a 'Git for DOS' right now).
> >
> > Here it's not like 'Git for DOS' (we already have git-gui, gitk,
> > git-cheetah ...), but issue80 is really a show-stopper.
> >
>
> What I meant by 'Git for DOS' is, well, the last DOS-based Windows was
> taken off market about 8 years ago. It is sad that msysgit today is still
> based on the 'Windows for DOS' compatibility API (inheriting the code page
> limitations of Win3.1/95/98/ME), instead of the native Unicode API that
> has been around since the first release of Windows NT in 1993. For a
> distributed SCM desigend for projects of international scope, the 'for
> Windows' attribute just doesn't fit as long as issue80 is not resolved.
> </rant> (sorry)
:) let's hope it will be resolved one day.
Sorry again for the dalay,
Kirill
On Sun, 24 Oct 2010, Karsten Blees wrote:
> This patch series tries to solve the Unicode problems of Git for
> Windows, at least for git, gitk and git-gui.
Sorry, I did not really have time to follow this issue. So could some kind
soul update me as to the status?
Thanks!
Dscho
Did you figure out a way to do this without breaking the pager? IIRC,
there were some reports that with your previous branch, the pager
would have to be disabled (due to some problems in our less, I seem to
recall)... At least I love my pager too much to want to lose it for a
rarely-enough feature as unicode filenames.
This feature is currently required by compat/unsetenv.c, I believe.
> - complexity -----------+------------+---------------+------------
> changed files vs. noenv | 2 | 6 | 4
> inserted lines | 31 | 271 | 158
> deleted lines | 12 | 82 | 133
> - performance (µs) -----+------------+---------------+------------
> startup | 22 | 0 | 36
> getenv | 3 | 3-21 | 0.06
> getenv total (35 calls) | 105 | 154 | 2
> spawn | 42 | 49 | 17
Thanks, a lot, this settles all my issues with performance. If all we
lose is 36 µs on start-up per process for doing the full conversion,
then I'm not concerned.
The only thing I'd like to know is how large your environment was when
you did this test; it certainly affects performance.
Is this patch related to unicode? Sounds to me like if it fixes two
tests, perhaps it's worth having no matter what?
> * 97a4c3a Win32: move main macro to a function
> * 0b40c49 gitk: fix file name encoding in diff hunk headers
> * c8f515b git-gui: fix encoding in git-gui file browser
> * 6bca2da Win32: fix potential multi-threading issue
> * dc50b47 Win32 dirent: improve dirent implementation
> * b599186 Win32 dirent: clarify #include directives
> * 36caf8c Win32 dirent: change FILENAME_MAX to MAX_PATH
> * 54d7ee9 Win32 dirent: remove unused dirent.d_reclen member
> * 5f26eb8 Win32 dirent: remove unused dirent.d_ino member
> * d79d001 MSVC: link dynamically to the CRT
> * 6bdbe8e MSVC: fix winansi.c compile errors
> * 585d81a MSVC: include <io.h> for mktemp
> * 1ca9bd4 MSVC: don't use uninitialized variables
> * 0eae8f2 MSVC: disable const qualifier compiler warnings
> * cc32418 Remove dummy S_I[RWX]USR definitions
> * a8770a1 (origin/devel, devel) Partly revert "Work around funny CR issue"
As above, it sounds like a lot of these patches are simply
improvements no matter what. Perhaps we should weed out the ones that
are completely uncontroversial, so the ones that are controversial
gets more manageable?
Here's a branch that are the patches that jumps out as uncontroversial
(there might be more, though):
http://repo.or.cz/w/git/kusma.git/shortlog/refs/heads/for-devel
Dscho, Karsten: would you be OK with me just pushing these directly to
'devel'? I believe they all have been discussed...
b688b9a MinGW: disable command line globbing (fixes t5505 an
605f727 Win32: move main macro to a function
8a2b741 Win32: fix potential multi-threading issue
5b4cdf7 Win32 dirent: improve dirent implementation
5eb0af2 Win32 dirent: clarify #include directives
1cb318d Win32 dirent: change FILENAME_MAX to MAX_PATH
5cbd51a Win32 dirent: remove unused dirent.d_reclen member
b52a1f5 Win32 dirent: remove unused dirent.d_ino member
b65a6c4 Remove dummy S_I[RWX]USR definitions
I lost track of what the long-term goal of this series is.
One thing that I am worried about is that with these patches git is
designed to talk to its environment in UTF-8. Is this correct? I'd
prefer to have a program that works "the Windows way", which would mean
that it talks to its environment using the current codepage setting.
More precisely, git.exe should work in a CMD with standard settings
without any wrappers (like msysgit's git.cmd).
-- Hannes
On Tue, 24 May 2011, Erik Faye-Lund wrote:
> Here's a branch that are the patches that jumps out as uncontroversial
> (there might be more, though):
> http://repo.or.cz/w/git/kusma.git/shortlog/refs/heads/for-devel
>
> Dscho, Karsten: would you be OK with me just pushing these directly to
> 'devel'? I believe they all have been discussed...
Sure! Go ahead...
Ciao,
Dscho
On Tue, 24 May 2011, Erik Faye-Lund wrote:
> On Tue, May 24, 2011 at 8:42 PM, <karste...@dcon.de> wrote:
> > - complexity -----------+------------+---------------+------------
> > changed files vs. noenv | � � �2 � � | � � � 6 � � � | � � �4
> > inserted lines � � � � �| � � 31 � � | � � 271 � � � | � �158
> > deleted lines � � � � � | � � 12 � � | � � �82 � � � | � �133
> > - performance (�s) -----+------------+---------------+------------
> > startup � � � � � � � � | � � 22 � � | � � � 0 � � � | � � 36
> > getenv � � � � � � � � �| � � �3 � � | � �3-21 � � � | � � �0.06
> > getenv total (35 calls) | � �105 � � | � � 154 � � � | � � �2
> > spawn � � � � � � � � � | � � 42 � � | � � �49 � � � | � � 17
>
> Thanks, a lot, this settles all my issues with performance. If all we
> lose is 36 �s on start-up per process for doing the full conversion,
> then I'm not concerned.
Is it really microseconds? I thought on Windows we measure milliseconds.
This makes a difference: 30 calls of something that takes 36 milliseconds,
and we're above 1 second, which is noticable.
> The only thing I'd like to know is how large your environment was when
> you did this test; it certainly affects performance.
I'm really worried about this, too. And not only the number of entries in
the environment, but also the length of the entries.
You know, a couple of years back, I had a quizzy feeling about linking
curl into the git executable. I had the impression that whatever overhead
(memory & time) it adds, it might be noticable in some setup. Guess what:
Linus himself worked on avoiding to link curl to the git executable. For
performance.
Ciao,
Dscho
That's the metric he gave. I haven't tried to verify the numbers myself, though.
> This makes a difference: 30 calls of something that takes 36 milliseconds,
> and we're above 1 second, which is noticable.
I agree.
>> The only thing I'd like to know is how large your environment was when
>> you did this test; it certainly affects performance.
>
> I'm really worried about this, too. And not only the number of entries in
> the environment, but also the length of the entries.
I'm not as worried about that as I was agains an artificially small
environment (which didn't seem to be the case). The total length of
the environment is limited to 32768 characters (in the best case:
maximum 65536 bytes, UTF-16 encoded) on Windows, so there's a limit to
how far the performance will escalate.
> You know, a couple of years back, I had a quizzy feeling about linking
> curl into the git executable. I had the impression that whatever overhead
> (memory & time) it adds, it might be noticable in some setup. Guess what:
> Linus himself worked on avoiding to link curl to the git executable. For
> performance.
Yeah, that's my main worry here too; is this something we will regret
and have to put in a lot of time to fix later?
But if it turns out the numbers really are in microseconds, then this
is too insignificant for me to bother worrying about ;)
On Wed, 25 May 2011, Erik Faye-Lund wrote:
> On Wed, May 25, 2011 at 9:16 AM, Johannes Schindelin
> <Johannes....@gmx.de> wrote:
>
> > On Tue, 24 May 2011, Erik Faye-Lund wrote:
> >
> >> On Tue, May 24, 2011 at 8:42 PM, �<karste...@dcon.de> wrote:
> >> > - complexity -----------+------------+---------------+------------
> >> > changed files vs. noenv | � � �2 � � | � � � 6 � � � | � � �4
> >> > inserted lines � � � � �| � � 31 � � | � � 271 � � � | � �158
> >> > deleted lines � � � � � | � � 12 � � | � � �82 � � � | � �133
> >> > - performance (�s) -----+------------+---------------+------------
> >> > startup � � � � � � � � | � � 22 � � | � � � 0 � � � | � � 36
> >> > getenv � � � � � � � � �| � � �3 � � | � �3-21 � � � | � � �0.06
> >> > getenv total (35 calls) | � �105 � � | � � 154 � � � | � � �2
> >> > spawn � � � � � � � � � | � � 42 � � | � � �49 � � � | � � 17
> >>
> >> Thanks, a lot, this settles all my issues with performance. If all we
> >> lose is 36 �s on start-up per process for doing the full conversion,
> >> then I'm not concerned.
> >
> > Is it really microseconds? I thought on Windows we measure
> > milliseconds.
>
> That's the metric he gave. I haven't tried to verify the numbers myself,
> though.
>
> > This makes a difference: 30 calls of something that takes 36
> > milliseconds, and we're above 1 second, which is noticable.
>
> I agree.
>
> >> The only thing I'd like to know is how large your environment was
> >> when you did this test; it certainly affects performance.
> >
> > I'm really worried about this, too. And not only the number of entries
> > in the environment, but also the length of the entries.
>
> I'm not as worried about that as I was agains an artificially small
> environment (which didn't seem to be the case). The total length of the
> environment is limited to 32768 characters (in the best case: maximum
> 65536 bytes, UTF-16 encoded) on Windows, so there's a limit to how far
> the performance will escalate.
I did not know that! Good to know!
> > You know, a couple of years back, I had a quizzy feeling about linking
> > curl into the git executable. I had the impression that whatever
> > overhead (memory & time) it adds, it might be noticable in some setup.
> > Guess what: Linus himself worked on avoiding to link curl to the git
> > executable. For performance.
>
> Yeah, that's my main worry here too; is this something we will regret
> and have to put in a lot of time to fix later?
>
> But if it turns out the numbers really are in microseconds, then this is
> too insignificant for me to bother worrying about ;)
Right. If it's microseconds, I'll retract my worries :-)
Ciao,
Dscho
Can somebody remind me why the unicode-v9-full implementation has not been
integrated into our development branch?
Are we waiting on something? (Was there something about reversing the order of
some of the patches?)
I guess it would certainly be nice if msys similarly implemented UTF-8
encoding (and symlink support, but that's a seperate issue).
//.ichael G.
On Wed, 25 May 2011 06:28:28 AM karste...@dcon.de wrote:
> Johannes Sixt <j...@kdbg.org> wrote on 24.05.2011 22:54:05:
> > I lost track of what the long-term goal of this series is.
> >
> > One thing that I am worried about is that with these patches git is
> > designed to talk to its environment in UTF-8. Is this correct?
>
> More precisely, the patches are designed so that git uses UTF-8 internally
> and talks to its environment in UTF-16 (which is the native encoding of
> all NT-based Windows versions).
>
On Wed, 8 Jun 2011, Michael Geddes wrote:
> Excellent work Karsten, and nice summary.
>
> Can somebody remind me why the unicode-v9-full implementation has not
> been integrated into our development branch?
>
> Are we waiting on something? (Was there something about reversing the
> order of some of the patches?)
>
> I guess it would certainly be nice if msys similarly implemented UTF-8
> encoding (and symlink support, but that's a seperate issue).
Well, Hannes was completely opposed until MSys learnt about UTF-8. At
least as far as I understood his mails.
There was a slight uncomfortable feeling on my side with the wholesale
conversion of the environment, but that has been resolved.
So I basically wanted to wait for violent disagreement, and absent that,
merge the series (once I have time, this week I was traveling to
Switzerland, next week I will be in France, so unless somebody else steps
up to the task, the merge will have to wait another ~2 weeks)...
Ciao,
Johannes
I also don't want to get back onto the symlink issue until the unicode issue
is resolved.
I guess maybe it's time to make some waves in the msys pond now?
>
> Ciao,
> Johannes
thanks.
Michael.
Karsten, first of all HUGE THANKS for getting unicode support into shape
again!
I'm late to the party, but here is a build of latest msysgit devel +
kb/unicode-v9-full for those who want to try it out:
http://repo.or.cz/w/msysgit/kirr.git/tree/refs/heads/x/built -> Git-kirr15.exe
(Only briefly tested)
And thanks Johannes for planning to merge it! Not only there is no
violent disagreement, but to me there is a strong wish from a lot of
people to try to resolve famous multilingual issue80, because that issue
blocks msysgit adoption in a lot of countries:
imho ascii-only filenames are like ascii only _contents_ in today's world.
In my company I had to hack on it and build several versions of utf-8
enabled msysgit installer for local use so we could use Git between both
linux and windows people. But I guess doing this work would be too much
for average case.
Thanks again and Hope we'll merge it,
Kirill