[PATCH 4/5] Thread-safe windows console output

329 views
Skip to first unread message

Karsten Blees

unread,
Oct 24, 2010, 6:00:22 PM10/24/10
to msy...@googlegroups.com, Karsten Blees
Replaces stdout and stderr with a pipe if they point to the console. A
background thread reads from the pipe, handles ANSI escape sequences and
UTF-8 to UTF-16 conversion, then writes to the console.

Global variables are either initialized on startup (single threaded) or
exclusively modified by the background thread. Threads communicate through
the pipe, no further synchronization is necessary.

Due to the byte-oriented pipe, ANSI escape sequences can no longer be
expected to arrive in one piece. The string-based ansi_emulate() has been
replaced by a simple stateful parser (this also fixes colored diff hunk
headers, which were broken as of commit 2efcc977).

Overrides isatty to return true for the pipes redirecting to the console.

Exec/spawn obtain the original console handle to pass to the next process
via winansi_get_osfhandle().

All other overrides are gone, the default stdio implementations work as
expected with the piped stdout/stderr descriptors.

Limitations: doesn't track reopened or duped file descriptors, i.e.:
- fdopen(1/2) returns fully buffered streams
- dup(1/2), dup2(1/2) returns normal pipe descriptors (i.e. isatty() =
false, winansi_get_osfhandle won't return the original console handle)

Currently, only the git-format-patch command uses xfdopen(xdup(1)) (see
"realstdout" in builtin/log.c), but works well with these limitations.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---
compat/mingw.c | 9 +-
compat/mingw.h | 12 +--
compat/winansi.c | 363 +++++++++++++++++++++++++++++++++--------------------
3 files changed, 236 insertions(+), 148 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 1285d27..27cada4 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1135,9 +1135,9 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **env,
memset(&si, 0, sizeof(si));
si.cb = sizeof(si);
si.dwFlags = STARTF_USESTDHANDLES;
- si.hStdInput = (HANDLE) _get_osfhandle(fhin);
- si.hStdOutput = (HANDLE) _get_osfhandle(fhout);
- si.hStdError = (HANDLE) _get_osfhandle(fherr);
+ si.hStdInput = winansi_get_osfhandle(fhin);
+ si.hStdOutput = winansi_get_osfhandle(fhout);
+ si.hStdError = winansi_get_osfhandle(fherr);

if (utftowcs(wcmd, cmd, MAX_PATH) < 0)
return -1;
@@ -2063,5 +2063,8 @@ void mingw_startup()
_setmode(_fileno(stdin), _O_BINARY);
_setmode(_fileno(stdout), _O_BINARY);
_setmode(_fileno(stderr), _O_BINARY);
+
+ /* initialize Unicode console */
+ winansi_init();
}

diff --git a/compat/mingw.h b/compat/mingw.h
index 27f7510..4dbd3b9 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -279,14 +279,10 @@ sig_handler_t mingw_signal(int sig, sig_handler_t handler);
* ANSI emulation wrappers
*/

-int winansi_fputs(const char *str, FILE *stream);
-int winansi_printf(const char *format, ...) __attribute__((format (printf, 1, 2)));
-int winansi_fprintf(FILE *stream, const char *format, ...) __attribute__((format (printf, 2, 3)));
-int winansi_vfprintf(FILE *stream, const char *format, va_list list);
-#define fputs winansi_fputs
-#define printf(...) winansi_printf(__VA_ARGS__)
-#define fprintf(...) winansi_fprintf(__VA_ARGS__)
-#define vfprintf winansi_vfprintf
+void winansi_init(void);
+int winansi_isatty(int fd);
+HANDLE winansi_get_osfhandle(int fd);
+#define isatty winansi_isatty

/*
* git specific compatibility
diff --git a/compat/winansi.c b/compat/winansi.c
index a5ca2d9..5680ed5 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -3,18 +3,13 @@
*/

#include "../git-compat-util.h"
-#include <malloc.h>
#include <wingdi.h>
#include <winreg.h>

/*
Functions to be wrapped:
*/
-#undef printf
-#undef fprintf
-#undef fputs
-#undef vfprintf
-/* TODO: write */
+#undef isatty

/*
ANSI codes used by git: m, K
@@ -27,8 +22,10 @@ static HANDLE console;
static WORD plain_attr;
static WORD attr;
static int negative;
-static FILE *last_stream = NULL;
static int non_ascii_used = 0;
+static HANDLE hthread, hread, hwrite;
+static HANDLE hwrite1 = INVALID_HANDLE_VALUE, hwrite2 = INVALID_HANDLE_VALUE;
+static HANDLE hconsole1, hconsole2;

typedef struct _CONSOLE_FONT_INFOEX {
ULONG cbSize;
@@ -71,64 +68,39 @@ static void warn_if_raster_font(void)
}
}

- if (!(fontFamily & TMPF_TRUETYPE))
- warning("Your console font probably doesn\'t support "
- "Unicode. If you experience strange characters in the output, "
- "consider switching to a TrueType font such as Lucida Console!");
+ if (!(fontFamily & TMPF_TRUETYPE)) {
+ const wchar_t *msg = L"\nWarning: Your console font probably doesn\'t "
+ "support Unicode. If you experience strange characters in the "
+ "output, consider switching to a TrueType font such as Lucida "
+ "Console!\n";
+ WriteConsoleW(console, msg, wcslen(msg), NULL, NULL);
+ }
}

-static int is_console(FILE *stream)
+static int is_console(int fd)
{
CONSOLE_SCREEN_BUFFER_INFO sbi;
HANDLE hcon;

- static int initialized = 0;
-
- /* use cached value if stream hasn't changed */
- if (stream == last_stream)
- return console != NULL;
-
- last_stream = stream;
- console = NULL;
-
- /* get OS handle of the stream */
- hcon = (HANDLE) _get_osfhandle(_fileno(stream));
+ /* get OS handle of the file descriptor */
+ hcon = (HANDLE) _get_osfhandle(fd);
if (hcon == INVALID_HANDLE_VALUE)
return 0;

+ /* check if its a device (i.e. console, printer, serial port) */
+ if (GetFileType(hcon) != FILE_TYPE_CHAR)
+ return 0;
+
/* check if its a handle to a console output screen buffer */
if (!GetConsoleScreenBufferInfo(hcon, &sbi))
return 0;

- if (!initialized) {
- attr = plain_attr = sbi.wAttributes;
- negative = 0;
- initialized = 1;
- /* check console font on exit */
- atexit(warn_if_raster_font);
- }
-
- console = hcon;
+ /* initialize attributes */
+ attr = plain_attr = sbi.wAttributes;
+ negative = 0;
return 1;
}

-static int write_console(const char *str, size_t len)
-{
- /* convert utf-8 to utf-16, write directly to console */
- int wlen = MultiByteToWideChar(CP_UTF8, 0, str, len, NULL, 0);
- wchar_t *wbuf = (wchar_t *) alloca(wlen * sizeof(wchar_t));
- MultiByteToWideChar(CP_UTF8, 0, str, len, wbuf, wlen);
-
- WriteConsoleW(console, wbuf, wlen, NULL, NULL);
-
- /* remember if non-ascii characters are printed */
- if (wlen != len)
- non_ascii_used = 1;
-
- /* return original (utf-8 encoded) length */
- return len;
-}
-
#define FOREGROUND_ALL (FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)
#define BACKGROUND_ALL (BACKGROUND_RED | BACKGROUND_GREEN | BACKGROUND_BLUE)

@@ -172,18 +144,13 @@ static void erase_in_line(void)
&dummy);
}

-
-static const char *set_attr(const char *str)
+static void set_attr(char func, const int *params, int paramlen)
{
- const char *func;
- size_t len = strspn(str, "0123456789;");
- func = str + len;
-
- switch (*func) {
+ int i;
+ switch (func) {
case 'm':
- do {
- long val = strtol(str, (char **)&str, 10);
- switch (val) {
+ for (i = 0; i < paramlen; i++) {
+ switch (params[i]) {
case 0: /* reset */
attr = plain_attr;
negative = 0;
@@ -306,9 +273,7 @@ static const char *set_attr(const char *str)
/* Unsupported code */
break;
}
- str++;
- } while (*(str-1) == ';');
-
+ }
set_console_attr();
break;
case 'K':
@@ -318,112 +283,236 @@ static const char *set_attr(const char *str)
/* Unsupported code */
break;
}
-
- return func + 1;
}

-static int ansi_emulate(const char *str, FILE *stream)
+#define BUFFER_SIZE 4096
+#define MAX_PARAMS 16
+
+static void write_console(char *str, size_t len)
{
- int rv = 0;
- const char *pos = str;
+ /* only called from console_thread, so a static buffer will do */
+ static wchar_t wbuf[2 * BUFFER_SIZE + 1];

- fflush(stream);
+ /* convert utf-8 to utf-16 */
+ int wlen = utftowcsn(wbuf, str, 2 * BUFFER_SIZE + 1, len);

- while (*pos) {
- pos = strstr(str, "\033[");
- if (pos) {
- size_t len = pos - str;
+ /* write directly to console */
+ WriteConsoleW(console, wbuf, wlen, NULL, NULL);

- if (len) {
- size_t out_len = write_console(str, len);
- rv += out_len;
- if (out_len < len)
- return rv;
- }
+ /* remember if non-ascii characters are printed */
+ if (wlen != len)
+ non_ascii_used = 1;
+}

- str = pos + 2;
- rv += 2;
+enum {
+ TEXT = 0, ESCAPE = 033, BRACKET = '[', EXIT = -1
+};

- pos = set_attr(str);
- rv += pos - str;
- str = pos;
- } else {
- size_t len = strlen(str);
- rv += write_console(str, len);
- return rv;
+static DWORD WINAPI console_thread(LPVOID unused)
+{
+ char buffer[BUFFER_SIZE];
+ DWORD bytes;
+ int start, end, c, parampos = 0, state = TEXT;
+ int params[MAX_PARAMS];
+
+ while (state != EXIT) {
+ /* read next chunk of bytes from the pipe */
+ if (!ReadFile(hread, buffer, BUFFER_SIZE, &bytes, NULL)) {
+ /* exit if pipe has been closed */
+ if (GetLastError() == ERROR_BROKEN_PIPE)
+ break;
+ /* ignore other errors */
+ continue;
}
+
+ /* scan the bytes and handle ANSI control codes */
+ start = end = 0;
+ while (end < bytes) {
+ c = buffer[end++];
+ switch (state) {
+ case TEXT:
+ if (c == ESCAPE) {
+ /* print text seen so far */
+ if (end - 1 > start)
+ write_console(buffer + start, end - 1 - start);
+
+ /* then start parsing escape sequence */
+ start = end - 1;
+ memset(params, 0, sizeof(params));
+ parampos = 0;
+ state = ESCAPE;
+ }
+ break;
+
+ case ESCAPE:
+ /* continue if "\033[", otherwise bail out */
+ state = (c == BRACKET) ? BRACKET : TEXT;
+ break;
+
+ case BRACKET:
+ /* parse [0-9;]* into array of parameters */
+ switch (c) {
+ case '0' ... '9':
+ params[parampos] *= 10;
+ params[parampos] += c - '0';
+ break;
+ case ';':
+ /* next parameter, bail out if out of bounds */
+ parampos++;
+ if (parampos >= MAX_PARAMS)
+ state = TEXT;
+ break;
+ case 'q':
+ /* "\033[q": terminate the thread */
+ state = EXIT;
+ break;
+ default:
+ /* end of escape sequence, change console attributes */
+ set_attr(c, params, parampos + 1);
+ start = end;
+ state = TEXT;
+ break;
+ }
+ break;
+ }
+ }
+
+ /* print remaining text unless we're parsing an escape sequence */
+ if (state == TEXT && end > start)
+ write_console(buffer + start, end - start);
}
- return rv;
+
+ /* check if the console font supports unicode */
+ warn_if_raster_font();
+
+ CloseHandle(hread);
+ return 0;
}

-int winansi_fputs(const char *str, FILE *stream)
+static void winansi_exit(void)
{
- int rv;
-
- if (!is_console(stream))
- return fputs(str, stream);
+ DWORD dummy;
+ /* flush all streams */
+ _flushall();
+
+ /* close the write ends of the pipes */
+ if (hwrite1 != INVALID_HANDLE_VALUE)
+ CloseHandle(hwrite1);
+ if (hwrite2 != INVALID_HANDLE_VALUE)
+ CloseHandle(hwrite2);
+
+ /* send termination sequence to the thread */
+ WriteFile(hwrite, "\033[q", 3, &dummy, NULL);
+ CloseHandle(hwrite);
+
+ /* allow the thread to copy remaining data to the console */
+ WaitForSingleObject(hthread, 1000);
+ CloseHandle(hthread);
+}

- rv = ansi_emulate(str, stream);
+static void die_lasterr(const char *fmt, ...)
+{
+ va_list params;
+ va_start(params, fmt);
+ errno = err_win_to_posix(GetLastError());
+ die_errno(fmt, params);
+ va_end(params);
+}

- if (rv >= 0)
- return 0;
- else
- return EOF;
+static HANDLE duplicate_handle(HANDLE hnd)
+{
+ HANDLE hresult, hproc = GetCurrentProcess();
+ if (!DuplicateHandle(hproc, hnd, hproc, &hresult, 0, TRUE,
+ DUPLICATE_SAME_ACCESS))
+ die_lasterr("DuplicateHandle(%li) failed", (long) hnd);
+ return hresult;
}

-int winansi_vfprintf(FILE *stream, const char *format, va_list list)
+static HANDLE redirect_console(int fd, HANDLE *phcon, int new_fd)
{
- int len, rv;
- char small_buf[256];
- char *buf = small_buf;
- va_list cp;
+ /* get original console handle */
+ HANDLE hcon = (HANDLE) _get_osfhandle(fd);
+ if (hcon == INVALID_HANDLE_VALUE)
+ die_errno("_get_osfhandle(%i) failed", fd);

- if (!is_console(stream))
- goto abort;
+ /* save a copy to phcon and console (used by the background thread) */
+ console = *phcon = duplicate_handle(hcon);

- va_copy(cp, list);
- len = vsnprintf(small_buf, sizeof(small_buf), format, cp);
- va_end(cp);
+ /* duplicate new_fd over fd (closes fd and associated handle (hcon)) */
+ if (_dup2(new_fd, fd))
+ die_errno("_dup2(%i, %i) failed", new_fd, fd);

- if (len > sizeof(small_buf) - 1) {
- buf = malloc(len + 1);
- if (!buf)
- goto abort;
+ /* no buffering, or stdout / stderr will be out of sync */
+ setbuf(&_iob[fd], NULL);
+ return (HANDLE) _get_osfhandle(fd);
+}

- len = vsnprintf(buf, len + 1, format, list);
- }
+void winansi_init(void)
+{
+ int con1, con2, hwrite_fd;

- rv = ansi_emulate(buf, stream);
+ /* check if either stdout or stderr is a console output screen buffer */
+ con1 = is_console(1);
+ con2 = is_console(2);
+ if (!con1 && !con2)
+ return;

- if (buf != small_buf)
- free(buf);
- return rv;
+ /* create an anonymous pipe */
+ if (!CreatePipe(&hread, &hwrite, NULL, BUFFER_SIZE))
+ die_lasterr("CreatePipe failed");

-abort:
- rv = vfprintf(stream, format, list);
- return rv;
-}
+ /* start console spool thread on the pipe's read end */
+ hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
+ if (hthread == INVALID_HANDLE_VALUE)
+ die_lasterr("CreateThread(console_thread) failed");

-int winansi_fprintf(FILE *stream, const char *format, ...)
-{
- va_list list;
- int rv;
+ /* schedule cleanup routine */
+ if (atexit(winansi_exit))
+ die_errno("atexit(winansi_exit) failed");

- va_start(list, format);
- rv = winansi_vfprintf(stream, format, list);
- va_end(list);
+ /* create a file descriptor for the write end of the pipe */
+ hwrite_fd = _open_osfhandle((long) duplicate_handle(hwrite), _O_BINARY);
+ if (hwrite_fd == -1)
+ die_errno("_open_osfhandle(%li) failed", (long) hwrite);

- return rv;
+ /* redirect stdout / stderr to the pipe */
+ if (con1)
+ hwrite1 = redirect_console(1, &hconsole1, hwrite_fd);
+ if (con2)
+ hwrite2 = redirect_console(2, &hconsole2, hwrite_fd);
+
+ /* close pipe file descriptor (also closes the duped hwrite) */
+ close(hwrite_fd);
}

-int winansi_printf(const char *format, ...)
+static int is_same_handle(HANDLE hnd, int fd)
{
- va_list list;
- int rv;
+ return hnd != INVALID_HANDLE_VALUE && hnd == (HANDLE) _get_osfhandle(fd);
+}

- va_start(list, format);
- rv = winansi_vfprintf(stdout, format, list);
- va_end(list);
+/*
+ * Return true if stdout / stderr is a pipe redirecting to the console.
+ */
+int winansi_isatty(int fd)
+{
+ if (fd == 1 && is_same_handle(hwrite1, 1))
+ return 1;
+ else if (fd == 2 && is_same_handle(hwrite2, 2))
+ return 1;
+ else
+ return isatty(fd);
+}

- return rv;
+/*
+ * Returns the real console handle if stdout / stderr is a pipe redirecting
+ * to the console. Allows spawn / exec to pass the console to the next process.
+ */
+HANDLE winansi_get_osfhandle(int fd)
+{
+ if (fd == 1 && is_same_handle(hwrite1, 1))
+ return hconsole1;
+ else if (fd == 2 && is_same_handle(hwrite2, 2))
+ return hconsole2;
+ else
+ return (HANDLE) _get_osfhandle(fd);
}
--
1.7.3.1.msysgit.0.8.g31c0ec

Karsten Blees

unread,
Oct 24, 2010, 6:00:19 PM10/24/10
to msy...@googlegroups.com, Karsten Blees
Replaces Windows "ANSI" APIs dealing with file- or path names with their
Unicode equivalent, adding UTF-8/UTF-16LE conversion as necessary.

Adds trivial wrappers for access, chmod and chdir.

The opendir/readdir/closedir API is a complete rewrite, as hacking into
the mingw version is no longer possible with UTF-8/16 conversion.

To support repositories with legacy-encoded file names, the UTF-8 to UTF-16
conversion function tries to create valid, unique file names even for
invalid UTF-8 byte sequences, so that these repositories can be checked out
without error.

The current implementation leaves invalid UTF-8 bytes in range 0xa0 - 0xff
as is (producing printable Unicode chars \u00a0 - \u00ff, equivalent to
ISO-8859-1), and converts 0x80 - 0x9f to hex-code (\u0080 - \u009f are
control chars).

The Windows MultiByteToWideChar API was not used as it either drops invalid
UTF-8 sequences (on Win2k/XP; producing non-unique or even empty file
names) or converts them to the replacement char \ufffd (Vista/7; causing
ERROR_INVALID_NAME in subsequent calls to file system APIs).

The simplest way to convert a repository with legacy-encoded (e.g. Cp1252)
file names to UTF-8 ist to checkout with an old msysgit version and
"git add --all & git commit" with the new version.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

compat/mingw.c | 394 +++++++++++++++++++++++++++++++++++++++----------------
compat/mingw.h | 74 +++++++++--
2 files changed, 343 insertions(+), 125 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 1829e94..8f20b4b 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1,6 +1,7 @@
#include "../git-compat-util.h"
#include "win32.h"
#include <conio.h>
+#include <wchar.h>
#include "../strbuf.h"
#include "../cache.h"
#include "../run-command.h"
@@ -121,10 +122,10 @@ int err_win_to_posix(DWORD winerr)
return error;
}

-static int make_hidden(const char *path)
+static int make_hidden(const wchar_t *path)
{
- DWORD attribs = GetFileAttributes(path);
- if (SetFileAttributes(path, FILE_ATTRIBUTE_HIDDEN | attribs))
+ DWORD attribs = GetFileAttributesW(path);
+ if (SetFileAttributesW(path, FILE_ATTRIBUTE_HIDDEN | attribs))
return 0;
errno = err_win_to_posix(GetLastError());
return -1;
@@ -132,19 +133,22 @@ static int make_hidden(const char *path)

void mingw_mark_as_git_dir(const char *dir)
{
- if (hide_dotfiles != HIDE_DOTFILES_FALSE && !is_bare_repository() &&
- make_hidden(dir))
- warning("Failed to make '%s' hidden", dir);
+ wchar_t wdir[MAX_PATH];
+ if (hide_dotfiles != HIDE_DOTFILES_FALSE && !is_bare_repository())
+ if (utftowcs(wdir, dir, MAX_PATH) < 0 || make_hidden(wdir))
+ warning("Failed to make '%s' hidden", dir);
git_config_set("core.hideDotFiles",
hide_dotfiles == HIDE_DOTFILES_FALSE ? "false" :
(hide_dotfiles == HIDE_DOTFILES_DOTGITONLY ?
"dotGitOnly" : "true"));
}

-#undef mkdir
int mingw_mkdir(const char *path, int mode)
{
- int ret = mkdir(path);
+ wchar_t wpath[MAX_PATH];
+ if (utftowcs(wpath, path, MAX_PATH) < 0)
+ return -1;
+ int ret = _wmkdir(wpath);
if (!ret && hide_dotfiles == HIDE_DOTFILES_TRUE) {
/*
* In Windows a file or dir starting with a dot is not
@@ -153,7 +157,7 @@ int mingw_mkdir(const char *path, int mode)
*/
const char *start = basename((char*)path);
if (*start == '.')
- return make_hidden(path);
+ return make_hidden(wpath);
}
return ret;
}
@@ -206,14 +210,16 @@ static int ask_user_yes_no(const char *format, ...)
}
}

-#undef unlink
int mingw_unlink(const char *pathname)
{
int ret, tries = 0;
+ wchar_t wpathname[MAX_PATH];
+ if (utftowcs(wpathname, pathname, MAX_PATH) < 0)
+ return -1;

/* read-only files cannot be removed */
- chmod(pathname, 0666);
- while ((ret = unlink(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ _wchmod(wpathname, 0666);
+ while ((ret = _wunlink(wpathname)) == -1 && tries < ARRAY_SIZE(delay)) {
if (errno != EACCES)
break;
/*
@@ -229,43 +235,40 @@ int mingw_unlink(const char *pathname)
while (ret == -1 && errno == EACCES &&
ask_user_yes_no("Unlink of file '%s' failed. "
"Should I try again?", pathname))
- ret = unlink(pathname);
+ ret = _wunlink(wpathname);
return ret;
}

-static int is_dir_empty(const char *path)
+static int is_dir_empty(const wchar_t *wpath)
{
- struct strbuf buf = STRBUF_INIT;
- WIN32_FIND_DATAA findbuf;
+ WIN32_FIND_DATAW findbuf;
HANDLE handle;
-
- strbuf_addf(&buf, "%s\\*", path);
- handle = FindFirstFileA(buf.buf, &findbuf);
- if (handle == INVALID_HANDLE_VALUE) {
- strbuf_release(&buf);
+ wchar_t wbuf[MAX_PATH + 2];
+ wcscpy(wbuf, wpath);
+ wcscat(wbuf, L"\\*");
+ handle = FindFirstFileW(wbuf, &findbuf);
+ if (handle == INVALID_HANDLE_VALUE)
return GetLastError() == ERROR_NO_MORE_FILES;
- }

- while (!strcmp(findbuf.cFileName, ".") ||
- !strcmp(findbuf.cFileName, ".."))
- if (!FindNextFile(handle, &findbuf)) {
- strbuf_release(&buf);
+ while (!wcscmp(findbuf.cFileName, L".") ||
+ !wcscmp(findbuf.cFileName, L".."))
+ if (!FindNextFileW(handle, &findbuf))
return GetLastError() == ERROR_NO_MORE_FILES;
- }
FindClose(handle);
- strbuf_release(&buf);
return 0;
}

-#undef rmdir
int mingw_rmdir(const char *pathname)
{
int ret, tries = 0;
+ wchar_t wpathname[MAX_PATH];
+ if (utftowcs(wpathname, pathname, MAX_PATH) < 0)
+ return -1;

- while ((ret = rmdir(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ while ((ret = _wrmdir(wpathname)) == -1 && tries < ARRAY_SIZE(delay)) {
if (errno != EACCES)
break;
- if (!is_dir_empty(pathname)) {
+ if (!is_dir_empty(wpathname)) {
errno = ENOTEMPTY;
break;
}
@@ -282,16 +285,16 @@ int mingw_rmdir(const char *pathname)
while (ret == -1 && errno == EACCES &&
ask_user_yes_no("Deletion of directory '%s' failed. "
"Should I try again?", pathname))
- ret = rmdir(pathname);
+ ret = _wrmdir(wpathname);
return ret;
}

-#undef open
int mingw_open (const char *filename, int oflags, ...)
{
va_list args;
unsigned mode;
int fd;
+ wchar_t wfilename[MAX_PATH];

va_start(args, oflags);
mode = va_arg(args, int);
@@ -300,10 +303,12 @@ int mingw_open (const char *filename, int oflags, ...)
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";

- fd = open(filename, oflags, mode);
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ fd = _wopen(wfilename, oflags, mode);

if (fd < 0 && (oflags & O_CREAT) && errno == EACCES) {
- DWORD attrs = GetFileAttributes(filename);
+ DWORD attrs = GetFileAttributesW(wfilename);
if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY))
errno = EISDIR;
}
@@ -315,7 +320,7 @@ int mingw_open (const char *filename, int oflags, ...)
* such a file is created.
*/
const char *start = basename((char*)filename);
- if (*start == '.' && make_hidden(filename))
+ if (*start == '.' && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
}
return fd;
@@ -338,38 +343,69 @@ ssize_t mingw_write(int fd, const void *buf, size_t count)
return write(fd, buf, min(count, 31 * 1024 * 1024));
}

-#undef fopen
FILE *mingw_fopen (const char *filename, const char *otype)
{
int hide = 0;
FILE *file;
+ wchar_t wfilename[MAX_PATH], wotype[MAX_PATH];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- file = fopen(filename, otype);
- if (file && hide && make_hidden(filename))
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0 ||
+ utftowcs(wotype, otype, 10) < 0)
+ return NULL;
+ file = _wfopen(wfilename, wotype);
+ if (file && hide && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
return file;
}

-#undef freopen
FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream)
{
int hide = 0;
FILE *file;
+ wchar_t wfilename[MAX_PATH], wotype[10];
if (hide_dotfiles == HIDE_DOTFILES_TRUE &&
basename((char*)filename)[0] == '.')
hide = access(filename, F_OK);
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
- file = freopen(filename, otype, stream);
- if (file && hide && make_hidden(filename))
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0 ||
+ utftowcs(wotype, otype, 10) < 0)
+ return NULL;
+ file = _wfreopen(wfilename, wotype, stream);
+ if (file && hide && make_hidden(wfilename))
warning("Could not mark '%s' as hidden.", filename);
return file;
}

+int mingw_access(const char *filename, int mode)
+{
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ /* X_OK is not supported by the MSVCRT version */
+ return _waccess(wfilename, mode & ~X_OK);
+}
+
+int mingw_chdir(const char *dirname)
+{
+ wchar_t wdirname[MAX_PATH];
+ if (utftowcs(wdirname, dirname, MAX_PATH) < 0)
+ return -1;
+ return _wchdir(wdirname);
+}
+
+int mingw_chmod(const char *filename, int mode)
+{
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, filename, MAX_PATH) < 0)
+ return -1;
+ return _wchmod(wfilename, mode);
+}
+
/*
* The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
* Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
@@ -396,8 +432,11 @@ static inline time_t filetime_to_time_t(const FILETIME *ft)
static int do_lstat(int follow, const char *file_name, struct stat *buf)
{
WIN32_FILE_ATTRIBUTE_DATA fdata;
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
+ return -1;

- if (!(errno = get_file_attr(file_name, &fdata))) {
+ if (GetFileAttributesExW(wfilename, GetFileExInfoStandard, &fdata)) {
buf->st_ino = 0;
buf->st_gid = 0;
buf->st_uid = 0;
@@ -410,8 +449,8 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
if (fdata.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
- WIN32_FIND_DATAA findbuf;
- HANDLE handle = FindFirstFileA(file_name, &findbuf);
+ WIN32_FIND_DATAW findbuf;
+ HANDLE handle = FindFirstFileW(wfilename, &findbuf);
if (handle != INVALID_HANDLE_VALUE) {
if ((findbuf.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) &&
(findbuf.dwReserved0 == IO_REPARSE_TAG_SYMLINK)) {
@@ -430,6 +469,23 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
}
return 0;
}
+ switch (GetLastError()) {
+ case ERROR_ACCESS_DENIED:
+ case ERROR_SHARING_VIOLATION:
+ case ERROR_LOCK_VIOLATION:
+ case ERROR_SHARING_BUFFER_EXCEEDED:
+ errno = EACCES;
+ break;
+ case ERROR_BUFFER_OVERFLOW:
+ errno = ENAMETOOLONG;
+ break;
+ case ERROR_NOT_ENOUGH_MEMORY:
+ errno = ENOMEM;
+ break;
+ default:
+ errno = ENOENT;
+ break;
+ }
return -1;
}

@@ -442,7 +498,7 @@ static int do_lstat(int follow, const char *file_name, struct stat *buf)
static int do_stat_internal(int follow, const char *file_name, struct stat *buf)
{
int namelen;
- static char alt_name[PATH_MAX];
+ char alt_name[PATH_MAX];

if (!do_lstat(follow, file_name, buf))
return 0;
@@ -518,16 +574,19 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
{
FILETIME mft, aft;
int fh, rc;
+ wchar_t wfilename[MAX_PATH];
+ if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
+ return -1;

/* must have write permission */
- DWORD attrs = GetFileAttributes(file_name);
+ DWORD attrs = GetFileAttributesW(wfilename);
if (attrs != INVALID_FILE_ATTRIBUTES &&
(attrs & FILE_ATTRIBUTE_READONLY)) {
/* ignore errors here; open() will report them */
- SetFileAttributes(file_name, attrs & ~FILE_ATTRIBUTE_READONLY);
+ SetFileAttributesW(wfilename, attrs & ~FILE_ATTRIBUTE_READONLY);
}

- if ((fh = open(file_name, O_RDWR | O_BINARY)) < 0) {
+ if ((fh = _wopen(wfilename, O_RDWR | O_BINARY)) < 0) {
rc = -1;
goto revert_attrs;
}
@@ -550,7 +609,7 @@ revert_attrs:
if (attrs != INVALID_FILE_ATTRIBUTES &&
(attrs & FILE_ATTRIBUTE_READONLY)) {
/* ignore errors again */
- SetFileAttributes(file_name, attrs);
+ SetFileAttributesW(wfilename, attrs);
}
return rc;
}
@@ -563,10 +622,14 @@ unsigned int sleep (unsigned int seconds)

int mkstemp(char *template)
{
- char *filename = mktemp(template);
- if (filename == NULL)
+ wchar_t wtemplate[MAX_PATH];
+ if (utftowcs(wtemplate, template, MAX_PATH) < 0)
+ return -1;
+ if (_wmktemp(wtemplate) == NULL)
return -1;
- return open(filename, O_RDWR | O_CREAT, 0600);
+ if (wcstoutf(template, wtemplate, strlen(template) + 1) < 0)
+ return -1;
+ return _wopen(wtemplate, O_RDWR | O_CREAT, 0600);
}

int gettimeofday(struct timeval *tv, void *tz)
@@ -684,17 +747,18 @@ struct tm *localtime_r(const time_t *timep, struct tm *result)
return result;
}

-#undef getcwd
char *mingw_getcwd(char *pointer, int len)
{
int i;
- char *ret = getcwd(pointer, len);
- if (!ret)
- return ret;
+ wchar_t wpointer[MAX_PATH];
+ if (!_wgetcwd(wpointer, MAX_PATH))
+ return NULL;
+ if (wcstoutf(pointer, wpointer, len) < 0)
+ return NULL;
for (i = 0; pointer[i]; i++)
if (pointer[i] == '\\')
pointer[i] = '/';
- return ret;
+ return pointer;
}

#undef getenv
@@ -1412,33 +1476,38 @@ int mingw_rename(const char *pold, const char *pnew)
{
DWORD attrs, gle;
int tries = 0;
+ wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
+ if (utftowcs(wpold, pold, MAX_PATH) < 0)
+ return -1;
+ if (utftowcs(wpnew, pnew, MAX_PATH) < 0)
+ return -1;

/*
* Try native rename() first to get errno right.
* It is based on MoveFile(), which cannot overwrite existing files.
*/
- if (!rename(pold, pnew))
+ if (!_wrename(wpold, wpnew))
return 0;
if (errno != EEXIST)
return -1;
repeat:
- if (MoveFileEx(pold, pnew, MOVEFILE_REPLACE_EXISTING))
+ if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
return 0;
/* TODO: translate more errors */
gle = GetLastError();
if (gle == ERROR_ACCESS_DENIED &&
- (attrs = GetFileAttributes(pnew)) != INVALID_FILE_ATTRIBUTES) {
+ (attrs = GetFileAttributesW(wpnew)) != INVALID_FILE_ATTRIBUTES) {
if (attrs & FILE_ATTRIBUTE_DIRECTORY) {
errno = EISDIR;
return -1;
}
if ((attrs & FILE_ATTRIBUTE_READONLY) &&
- SetFileAttributes(pnew, attrs & ~FILE_ATTRIBUTE_READONLY)) {
- if (MoveFileEx(pold, pnew, MOVEFILE_REPLACE_EXISTING))
+ SetFileAttributesW(wpnew, attrs & ~FILE_ATTRIBUTE_READONLY)) {
+ if (MoveFileExW(wpold, wpnew, MOVEFILE_REPLACE_EXISTING))
return 0;
gle = GetLastError();
/* revert file attributes on failure */
- SetFileAttributes(pnew, attrs);
+ SetFileAttributesW(wpnew, attrs);
}
}
if (tries < ARRAY_SIZE(delay) && gle == ERROR_ACCESS_DENIED) {
@@ -1648,11 +1717,17 @@ void mingw_open_html(const char *unixpath)

int link(const char *oldpath, const char *newpath)
{
- typedef BOOL (WINAPI *T)(const char*, const char*, LPSECURITY_ATTRIBUTES);
+ typedef BOOL (WINAPI *T)(LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
static T create_hard_link = NULL;
+ wchar_t woldpath[MAX_PATH], wnewpath[MAX_PATH];
+ if (utftowcs(woldpath, oldpath, MAX_PATH) < 0)
+ return -1;
+ if (utftowcs(wnewpath, newpath, MAX_PATH) < 0)
+ return -1;
+
if (!create_hard_link) {
create_hard_link = (T) GetProcAddress(
- GetModuleHandle("kernel32.dll"), "CreateHardLinkA");
+ GetModuleHandle("kernel32.dll"), "CreateHardLinkW");
if (!create_hard_link)
create_hard_link = (T)-1;
}
@@ -1660,7 +1735,7 @@ int link(const char *oldpath, const char *newpath)
errno = ENOSYS;
return -1;
}
- if (!create_hard_link(newpath, oldpath, NULL)) {
+ if (!create_hard_link(wnewpath, woldpath, NULL)) {
errno = err_win_to_posix(GetLastError());
return -1;
}
@@ -1682,65 +1757,73 @@ char *getpass(const char *prompt)
return strbuf_detach(&buf, NULL);
}

-#ifndef NO_MINGW_REPLACE_READDIR
/* MinGW readdir implementation to avoid extra lstats for Git */
-struct mingw_DIR
-{
- struct _finddata_t dd_dta; /* disk transfer area for this dir */
- struct mingw_dirent dd_dir; /* Our own implementation, including d_type */
- long dd_handle; /* _findnext handle */
- int dd_stat; /* 0 = next entry to read is first entry, -1 = off the end, positive = 0 based index of next entry */
- char dd_name[1]; /* given path for dir with search pattern (struct is extended) */
-};

-struct dirent *mingw_readdir(DIR *dir)
+mingw_DIR *mingw_opendir(const char *pathname)
{
- WIN32_FIND_DATAA buf;
- HANDLE handle;
- struct mingw_DIR *mdir = (struct mingw_DIR*)dir;
-
- if (!dir->dd_handle) {
- errno = EBADF; /* No set_errno for mingw */
+ wchar_t wpathname[MAX_PATH];
+ char filename[MAX_PATH];
+ int len = utftowcs(wpathname, pathname, MAX_PATH - 2);
+ if (len < 0)
return NULL;
- }

- if (dir->dd_handle == (long)INVALID_HANDLE_VALUE && dir->dd_stat == 0)
- {
- DWORD lasterr;
- handle = FindFirstFileA(dir->dd_name, &buf);
- lasterr = GetLastError();
- dir->dd_handle = (long)handle;
- if (handle == INVALID_HANDLE_VALUE && (lasterr != ERROR_NO_MORE_FILES)) {
- errno = err_win_to_posix(lasterr);
- return NULL;
- }
- } else if (dir->dd_handle == (long)INVALID_HANDLE_VALUE) {
- return NULL;
- } else if (!FindNextFileA((HANDLE)dir->dd_handle, &buf)) {
- DWORD lasterr = GetLastError();
- FindClose((HANDLE)dir->dd_handle);
- dir->dd_handle = (long)INVALID_HANDLE_VALUE;
- /* POSIX says you shouldn't set errno when readdir can't
- find any more files; so, if another error we leave it set. */
- if (lasterr != ERROR_NO_MORE_FILES)
- errno = err_win_to_posix(lasterr);
+ /* append optional '/' and wildcard '*' */
+ if (len && wpathname[len - 1] != '/' && wpathname[len - 1] != '\\')
+ wpathname[len++] = '/';
+ wpathname[len++] = '*';
+ wpathname[len] = 0;
+
+ WIN32_FIND_DATAW fdata;
+ HANDLE h = FindFirstFileW(wpathname, &fdata);
+ if (h == INVALID_HANDLE_VALUE) {
+ errno = err_win_to_posix(GetLastError());
return NULL;
}

- /* We get here if `buf' contains valid data. */
- strcpy(dir->dd_dir.d_name, buf.cFileName);
- ++dir->dd_stat;
+ mingw_DIR *dir = (mingw_DIR*) xmalloc(sizeof(mingw_DIR));
+ dir->pos = dir->size = 0;
+ dir->entries = NULL;

- /* Set file type, based on WIN32_FIND_DATA */
- mdir->dd_dir.d_type = 0;
- if (buf.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
- mdir->dd_dir.d_type |= DT_DIR;
- else
- mdir->dd_dir.d_type |= DT_REG;
+ int alloc = 0;
+ do {
+ ALLOC_GROW(dir->entries, dir->size + 1, alloc);
+ len = wcstoutf(filename, fdata.cFileName, MAX_PATH);
+
+ struct mingw_dirent *ent = (struct mingw_dirent*) xmalloc(
+ sizeof(struct mingw_dirent) + len);
+ memcpy(&ent->d_name, filename, len + 1);
+ if (fdata.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
+ ent->d_type = DT_DIR;
+ else
+ ent->d_type = DT_REG;

- return (struct dirent*)&dir->dd_dir;
+ dir->entries[dir->size++] = ent;
+
+ } while (FindNextFileW(h, &fdata));
+ FindClose(h);
+ return dir;
+}
+
+struct mingw_dirent *mingw_readdir(mingw_DIR *dir)
+{
+ if (dir->pos < dir->size)
+ return dir->entries[(dir->pos)++];
+ return NULL;
+}
+
+void mingw_rewinddir(mingw_DIR *dir)
+{
+ dir->pos = 0;
+}
+
+void mingw_closedir(mingw_DIR *dir)
+{
+ int i;
+ for (i = 0; i < dir->size; i++)
+ free(dir->entries[i]);
+ free(dir->entries);
+ free(dir);
}
-#endif // !NO_MINGW_REPLACE_READDIR

const char *get_windows_home_directory()
{
@@ -1783,3 +1866,84 @@ int mingw_offset_1st_component(const char *path)

return offset + is_dir_sep(path[offset]);
}
+
+int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen)
+{
+ int upos = 0, wpos = 0;
+ if (!utf || !wcs || wcslen < 1) {
+ errno = EINVAL;
+ return -1;
+ }
+ /* reserve space for \0 */
+ wcslen--;
+ if (utflen < 0)
+ utflen = INT_MAX;
+
+ while (upos < utflen) {
+ int c = ((int) utf[upos++]) & 0xff;
+ if (utflen == INT_MAX && c == 0)
+ break;
+
+ if (wpos >= wcslen) {
+ wcs[wpos] = 0;
+ errno = ENAMETOOLONG;
+ return -1;
+ }
+
+ if (c < 0x80) {
+ /* ASCII */
+ wcs[wpos++] = c;
+ } else if (c >= 0xc2 && c < 0xe0 && upos < utflen &&
+ (utf[upos] & 0xc0) == 0x80) {
+ /* 2-byte utf-8 */
+ c = ((c & 0x1f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ wcs[wpos++] = c;
+ } else if (c >= 0xe0 && c < 0xf0 && upos + 1 < utflen &&
+ (utf[upos] & 0xc0) == 0x80 &&
+ (utf[upos + 1] & 0xc0) == 0x80) {
+ /* 3-byte utf-8 */
+ c = ((c & 0x0f) << 12);
+ c |= ((utf[upos++] & 0x3f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ wcs[wpos++] = c;
+ } else if (c >= 0xf0 && c < 0xf5 && upos + 2 < utflen &&
+ wpos + 1 < wcslen &&
+ (utf[upos] & 0xc0) == 0x80 &&
+ (utf[upos + 1] & 0xc0) == 0x80 &&
+ (utf[upos + 2] & 0xc0) == 0x80) {
+ /* 4-byte utf-8: convert to \ud8xx \udcxx surrogate pair */
+ c = ((c & 0x07) << 18);
+ c |= ((utf[upos++] & 0x3f) << 12);
+ c |= ((utf[upos++] & 0x3f) << 6);
+ c |= (utf[upos++] & 0x3f);
+ c -= 0x10000;
+ wcs[wpos++] = 0xd800 | (c >> 10);
+ wcs[wpos++] = 0xdc00 | (c & 0x3ff);
+ } else if (c >= 0xa0) {
+ /* invalid utf-8 byte, printable unicode char: convert 1:1 */
+ wcs[wpos++] = c;
+ } else {
+ /* invalid utf-8 byte, non-printable unicode: convert to hex */
+ static const char *hex = "0123456789abcdef";
+ wcs[wpos++] = hex[c >> 4];
+ if (wpos < wcslen)
+ wcs[wpos++] = hex[c & 0x0f];
+ }
+ }
+ wcs[wpos] = 0;
+ return wpos;
+}
+
+int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen)
+{
+ if (!wcs || !utf || utflen < 1) {
+ errno = EINVAL;
+ return -1;
+ }
+ utflen = WideCharToMultiByte(CP_UTF8, 0, wcs, -1, utf, utflen, NULL, NULL);
+ if (utflen)
+ return utflen - 1;
+ errno = ENAMETOOLONG;
+ return -1;
+}
diff --git a/compat/mingw.h b/compat/mingw.h
index 5587d14..41b0ad4 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -193,6 +193,16 @@ FILE *mingw_fopen (const char *filename, const char *otype);
FILE *mingw_freopen (const char *filename, const char *otype, FILE *stream);
#define freopen mingw_freopen

+int mingw_access(const char *filename, int mode);
+#undef access
+#define access mingw_access
+
+int mingw_chdir(const char *dirname);
+#define chdir mingw_chdir
+
+int mingw_chmod(const char *filename, int mode);
+#define chmod mingw_chmod
+
char *mingw_getcwd(char *pointer, int len);
#define getcwd mingw_getcwd

@@ -298,6 +308,42 @@ void mingw_mark_as_git_dir(const char *dir);
char **make_augmented_environ(const char *const *vars);
void free_environ(char **env);

+/**
+ * Converts UTF-8 encoded string to UTF-16LE. To support legacy-encoded
+ * repositories, invalid UTF-8 bytes 0xa0 - 0xff are converted to corresponding
+ * printable Unicode chars \u00a0 - \u00ff, and invalid UTF-8 bytes 0x80 - 0x9f
+ * (which would make non-printable Unicode) are converted to hex-code.
+ * Maximum space requirement for the target buffer is two wide chars per UTF-8
+ * char (((strlen(utf) * 2) + 1) [* sizeof(wchar_t)]).
+ *
+ * @param wcs wide char target buffer
+ * @param utf string to convert
+ * @param wcslen size of target buffer (in wchar_t's)
+ * @param utflen size of string to convert, or -1 if 0-terminated
+ * @return length of converted string (_wcslen(wcs)), or -1 on failure (errno
+ * is set to EINVAL or ENAMETOOLONG)
+ * @see mbstowcs, MultiByteToWideChar
+ */
+int utftowcsn(wchar_t *wcs, const char *utf, size_t wcslen, int utflen);
+static inline int utftowcs(wchar_t *wcs, const char *utf, size_t wcslen)
+{
+ return utftowcsn(wcs, utf, wcslen, -1);
+}
+
+/**
+ * Converts UTF-16LE encoded string to UTF-8.
+ * Maximum space requirement for the target buffer is three UTF-8 chars per
+ * wide char ((_wcslen(wcs) * 3) + 1).
+ *
+ * @param utf target buffer
+ * @param wcs wide string to convert
+ * @param utflen size of target buffer
+ * @return length of converted string, or -1 on failure (errno is set to EINVAL
+ * or ENAMETOOLONG)
+ * @see wcstombs, WideCharToMultiByte
+ */
+int wcstoutf(char *utf, const wchar_t *wcs, size_t utflen);
+
/*
* A replacement of main() that ensures that argv[0] has a path
* and that default fmode and std(in|out|err) are in binary mode
@@ -316,7 +362,6 @@ int main(int argc, const char **argv) \
} \
static int mingw_main(c,v)

-#ifndef NO_MINGW_REPLACE_READDIR
/*
* A replacement of readdir, to ensure that it reads the file type at
* the same time. This avoid extra unneeded lstats in git on MinGW
@@ -332,18 +377,27 @@ static int mingw_main(c,v)

struct mingw_dirent
{
- long d_ino; /* Always zero. */
- union {
- unsigned short d_reclen; /* Always zero. */
- unsigned char d_type; /* Reimplementation adds this */
- };
- unsigned short d_namlen; /* Length of name in d_name. */
- char d_name[FILENAME_MAX]; /* File name. */
+ unsigned char d_type;
+ char d_name[1];
};
+typedef struct mingw_DIR
+{
+ size_t size, pos;
+ struct mingw_dirent **entries;
+} mingw_DIR;
+mingw_DIR *mingw_opendir(const char *pathname);
+struct mingw_dirent *mingw_readdir(mingw_DIR *dir);
+void mingw_rewinddir(mingw_DIR *dir);
+void mingw_closedir(mingw_DIR *dir);
+
+#define NO_D_INO_IN_DIRENT 1
+#define _DIRENT_HAVE_D_TYPE 1
#define dirent mingw_dirent
+#define DIR mingw_DIR
+#define opendir(x) mingw_opendir(x)
#define readdir(x) mingw_readdir(x)
-struct dirent *mingw_readdir(DIR *dir);
-#endif // !NO_MINGW_REPLACE_READDIR
+#define rewinddir(x) mingw_rewinddir(x)
+#define closedir(x) mingw_closedir(x)

/*
* Used by Pthread API implementation for Windows
--
1.7.3.1.msysgit.0.8.g31c0ec

Karsten Blees

unread,
Oct 24, 2010, 6:00:20 PM10/24/10
to msy...@googlegroups.com, Karsten Blees
Assume git input / output is UTF-8 encoded.

Adds fixed utf-8 enconding where no specific encoding is specified.
Setting the default encoding to utf-8 in the startup code would have been
simpler, but usage of 'encoding system' is explicitly discouraged by the
TCL docs.

Also fixes gitk encoding problem with 'rename from' / 'rename to' lines.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---

git-gui/git-gui.sh | 6 +++---
git-gui/lib/index.tcl | 6 +++---
gitk-git/gitk | 15 ++++++++-------
3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
index 3c9a8aa..6180553 100755
--- a/git-gui/git-gui.sh
+++ b/git-gui/git-gui.sh
@@ -1558,7 +1558,7 @@ proc read_diff_index {fd after} {
set i [split [string range $buf_rdi $c [expr {$z1 - 2}]] { }]
set p [string range $buf_rdi $z1 [expr {$z2 - 1}]]
merge_state \
- [encoding convertfrom $p] \
+ [encoding convertfrom utf-8 $p] \
[lindex $i 4]? \
[list [lindex $i 0] [lindex $i 2]] \
[list]
@@ -1591,7 +1591,7 @@ proc read_diff_files {fd after} {
set i [split [string range $buf_rdf $c [expr {$z1 - 2}]] { }]
set p [string range $buf_rdf $z1 [expr {$z2 - 1}]]
merge_state \
- [encoding convertfrom $p] \
+ [encoding convertfrom utf-8 $p] \
?[lindex $i 4] \
[list] \
[list [lindex $i 0] [lindex $i 2]]
@@ -1614,7 +1614,7 @@ proc read_ls_others {fd after} {
set pck [split $buf_rlo "\0"]
set buf_rlo [lindex $pck end]
foreach p [lrange $pck 0 end-1] {
- set p [encoding convertfrom $p]
+ set p [encoding convertfrom utf-8 $p]
if {[string index $p end] eq {/}} {
set p [string range $p 0 end-1]
}
diff --git a/git-gui/lib/index.tcl b/git-gui/lib/index.tcl
index e9db0c4..107c8a8 100644
--- a/git-gui/lib/index.tcl
+++ b/git-gui/lib/index.tcl
@@ -112,7 +112,7 @@ proc write_update_indexinfo {fd pathList totalCnt batch after} {
set info [lindex $s 2]
if {$info eq {}} continue

- puts -nonewline $fd "$info\t[encoding convertto $path]\0"
+ puts -nonewline $fd "$info\t[encoding convertto utf-8 $path]\0"
display_file $path $new
}

@@ -180,7 +180,7 @@ proc write_update_index {fd pathList totalCnt batch after} {
?M {set new M_}
?? {continue}
}
- puts -nonewline $fd "[encoding convertto $path]\0"
+ puts -nonewline $fd "[encoding convertto utf-8 $path]\0"
display_file $path $new
}

@@ -241,7 +241,7 @@ proc write_checkout_index {fd pathList totalCnt batch after} {
?M -
?T -
?D {
- puts -nonewline $fd "[encoding convertto $path]\0"
+ puts -nonewline $fd "[encoding convertto utf-8 $path]\0"
display_file $path ?_
}
}
diff --git a/gitk-git/gitk b/gitk-git/gitk
index b9c8955..81c47a4 100644
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -7222,7 +7222,7 @@ proc gettreeline {gtf id} {
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
}
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
lappend treefilelist($id) $fname
}
if {![eof $gtf]} {
@@ -7444,7 +7444,7 @@ proc gettreediffline {gdtf ids} {
if {[string index $file 0] eq "\""} {
set file [lindex $file 0]
}
- set file [encoding convertfrom $file]
+ set file [encoding convertfrom utf-8 $file]
if {$file ne [lindex $treediff end]} {
lappend treediff $file
lappend sublist $file
@@ -7585,7 +7585,7 @@ proc makediffhdr {fname ids} {
global ctext curdiffstart treediffs diffencoding
global ctext_file_names jump_to_here targetline diffline

- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
set diffencoding [get_path_encoding $fname]
set i [lsearch -exact $treediffs($ids) $fname]
if {$i >= 0} {
@@ -7618,7 +7618,7 @@ proc getblobdiffline {bdf ids} {
}
if {![string compare -length 5 "diff " $line]} {
if {![regexp {^diff (--cc|--git) } $line m type]} {
- set line [encoding convertfrom $line]
+ set line [encoding convertfrom utf-8 $line]
$ctext insert end "$line\n" hunksep
continue
}
@@ -7665,7 +7665,7 @@ proc getblobdiffline {bdf ids} {
makediffhdr $fname $ids

} elseif {![string compare -length 16 "* Unmerged path " $line]} {
- set fname [encoding convertfrom [string range $line 16 end]]
+ set fname [encoding convertfrom utf-8 [string range $line 16 end]]
$ctext insert end "\n"
set curdiffstart [$ctext index "end - 1c"]
lappend ctext_file_names $fname
@@ -7720,7 +7720,7 @@ proc getblobdiffline {bdf ids} {
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
}
- set fname [encoding convertfrom $fname]
+ set fname [encoding convertfrom utf-8 $fname]
set i [lsearch -exact $treediffs($ids) $fname]
if {$i >= 0} {
setinlist difffilestart $i $curdiffstart
@@ -7739,6 +7739,7 @@ proc getblobdiffline {bdf ids} {
set diffinhdr 0
continue
}
+ set line [encoding convertfrom utf-8 $line]
$ctext insert end "$line\n" filesep

} else {
@@ -11296,7 +11297,7 @@ proc cache_gitattr {attr pathlist} {
foreach row [split $rlist "\n"] {
if {[regexp "(.*): $attr: (.*)" $row m path value]} {
if {[string index $path 0] eq "\""} {
- set path [encoding convertfrom [lindex $path 0]]
+ set path [encoding convertfrom utf-8 [lindex $path 0]]
}
set path_attr_cache($attr,$path) $value
}
--
1.7.3.1.msysgit.0.8.g31c0ec

karste...@dcon.de

unread,
Oct 27, 2010, 10:45:43 AM10/27/10
to Traian Stanev, msy...@googlegroups.com

Traian Stanev <tst...@gmail.com> wrote on 26.10.2010 21:01:43:


>
>
> On Oct 24, 6:00 pm, Karsten Blees <bl...@dcon.de> wrote:
> > Replaces stdout and stderr with a pipe if they point to the console. A
> > background thread reads from the pipe, handles ANSI escape sequences and
> > UTF-8 to UTF-16 conversion, then writes to the console.
> > ...
>
> Hi Karsten,
>
> I tried to apply this patch to the devel branch, but I get merge
> problems with the mingw.c change. Which branch are you working
> against? Of course I could apply the changes manually, but it would be
> better if I got them to merge successfully. I apologize if the answer
> is obvious -- I am new to msysgit.
>
> Thanks,
> Traian

Hi Traian,

the patch series applies cleanly to v1.7.3.1.msysgit.0 (which was current at the time...). There is a minor clash in patch #3 when merging to the current HEAD. Note that google groups has messed up the message order and threading (again...anyone know how to fix this?). There are 5 patches to be applied in order (you replied to #4).

If you don't get the patches to apply cleanly, you can import the attached bundle (already rebased to v1.7.3.2.msysgit.0) with "git pull unicode.bundle kb/unicode:kb/unicode".

When testing the console stuff, use --no-pager (or it will be 'less' pretty :-)

Hope that helps,
Karsten

unicode.bundle

Stefan Springer

unread,
Dec 16, 2010, 4:52:57 AM12/16/10
to msysGit
Hi,

it would be nice if someone could provide binaries or even an
installer containing the mentioned patches so that a broader class of
users can test this version (like Takuya Murakami did at http://tmurakam.org/git/).

Thanks a lot in advance,

Stefan

On 27 Okt., 15:45, karsten.bl...@dcon.de wrote:
>  unicode.bundle
> 25KAnzeigenHerunterladen

Erik Faye-Lund

unread,
Dec 16, 2010, 4:58:14 AM12/16/10
to Stefan Springer, msysGit
On Thu, Dec 16, 2010 at 10:52 AM, Stefan Springer
<stefan.s...@googlemail.com> wrote:
> Hi,
>
> it would be nice if someone could provide binaries or even an
> installer containing the mentioned patches so that a broader class of
> users can test this version (like Takuya Murakami did at http://tmurakam.org/git/).

Since you care, why don't you? It's just a matter of installing
msysGit (the self-contained Git for Windows development environment),
apply the patches and build.

Kirill Smelkov

unread,
Dec 18, 2010, 11:04:51 AM12/18/10
to Stefan Springer, msysGit
On Thu, Dec 16, 2010 at 01:52:57AM -0800, Stefan Springer wrote:
> Hi,
>
> it would be nice if someone could provide binaries or even an
> installer containing the mentioned patches so that a broader class of
> users can test this version (like Takuya Murakami did at http://tmurakam.org/git/).
>
> Thanks a lot in advance,

FYI:

I've put my builds with Karsten's UTF-8, mine (textconv for *.doc &
*.pdf) and other patches here:

http://repo.or.cz/w/msysgit/kirr.git/tree/refs/heads/x/built


Thought don't expect this to be regularly updated.


Thanks,
Kirill


P.S. the builds were done under wine, and I had to tweak installation
scripts a bit so the installer could be build at all. Otherwise it
seems to work ok under both wine and windows xp.

Johannes Schindelin

unread,
Dec 18, 2010, 4:14:56 PM12/18/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

On Sat, 18 Dec 2010, Kirill Smelkov wrote:

> P.S. the builds were done under wine, and I had to tweak installation
> scripts a bit so the installer could be build at all. Otherwise it seems
> to work ok under both wine and windows xp.

Would it be possible to adjust the scripts in such a way that they work
both in Wine and Windows? It seems that Wine sets the environment variable
WINELOADERNOEXEC...

And if it is possible, it would be nice if you contributed your changes
back to msysGit.

Thanks,
Johannes

Kirill Smelkov

unread,
Dec 19, 2010, 6:42:40 AM12/19/10
to Johannes Schindelin, Stefan Springer, msysGit
On Sat, Dec 18, 2010 at 10:14:56PM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Sat, 18 Dec 2010, Kirill Smelkov wrote:
>
> > P.S. the builds were done under wine, and I had to tweak installation
> > scripts a bit so the installer could be build at all. Otherwise it seems
> > to work ok under both wine and windows xp.
>
> Would it be possible to adjust the scripts in such a way that they work
> both in Wine and Windows? It seems that Wine sets the environment variable
> WINELOADERNOEXEC...

Hi Johannes, thanks for replying.

I think yes it is possible, but special-casing wine is not neccessary
because we are already 99% there. Here is the only change I've done to
iss scripts:


* Compiling installer was crashing on something related to icon, and I was
out of time budget for this task to investigate, so I've simply commented this.

diff --git a/share/WinGit/install.iss b/share/WinGit/install.iss
index 7743921..2e49cc5 100644
--- a/share/WinGit/install.iss
+++ b/share/WinGit/install.iss
@@ -30,7 +30,7 @@ PrivilegesRequired=none
UninstallDisplayIcon=etc\git.ico

; Cosmetic
-SetupIconFile=etc\git.ico
+; SetupIconFile=etc\git.ico -- does not compile under wine
WizardImageBackColor=clWhite
WizardImageStretch=no
WizardImageFile=git.bmp

Another change needed to produce the installer under wine is this
(already in Junio's master):

http://repo.or.cz/w/git.git/commit/15431ca651050ba315fa4e2e74527f6d115e706c

It fixes `git commit -m` to work under wine which was failing before.
The patch was Cc'ed to msysgit mailing list.

Another change which is related to wordpad -- on wine wordpad
is in C:\Windows\System32 , not in Program Files:

---- 8< ----

From: Kirill Smelkov <ki...@landau.phys.spbu.ru>
Date: Sun, 19 Dec 2010 14:01:16 +0300
Subject: [PATCH] wordpad: On Wine, wordpad.exe is in C:\Windows\System32\

I've also suppressed ls errors, so that `wordpad something.rtf` terminal
output is not noised with messages about some paths not being there.

Signed-off-by: Kirill Smelkov <ki...@landau.phys.spbu.ru>
---

If needed, the patch can be split into two.

bin/wordpad | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bin/wordpad b/bin/wordpad
index 37d4d38..7ce7937 100644
--- a/bin/wordpad
+++ b/bin/wordpad
@@ -1,6 +1,6 @@
#!/bin/sh

-WORDPAD="$(ls "$PROGRAMFILES/Windows NT"/*/wordpad.exe | head -n 1)"
+WORDPAD="$(ls {"$PROGRAMFILES/Windows NT","$SYSTEMROOT"}/*/wordpad.exe 2>&- | head -1)"

test $# = 1 &&
case "$1" in
--
1.7.3.4.568.g1b974

---- 8< ----


That's all.

Then, I have another bunch of tweaks, so working on msysgit itself under
wine is not that painful. They are not strictly neccessary, nor are ready
for inclusion, but I'll post them here too for completeness:


* LS_COLORS as inherited from my Linux installation (Debian Lenny) is
not understood by msys ls, and so there are no colors at all.

diff --git a/etc/bash_profile b/etc/bash_profile
index 77b9c8d..3b701d7 100644
--- a/etc/bash_profile
+++ b/etc/bash_profile
@@ -6,3 +6,5 @@ alias less='less -r'
alias ls='ls -F --color --show-control-chars'
alias ll='ls -l'

+# it does not recognize st: and other prefixes inherited from linux
+unset LS_COLORS


* Compiling under Wine is slow, so on multicore system I tend to use
-j<2*N> for under-wine builds.

diff --git a/etc/profile b/etc/profile
index 8026e74..9725b7d 100644
--- a/etc/profile
+++ b/etc/profile
@@ -87,7 +87,7 @@ test -e /bin/git.exe -o -e /git/git.exe || {
echo -------------------------------------------------------

cd /git &&
- make install
+ make install -j8

case $? in
0)


* Also, git-completion is hooooooribly slow under wine, so command
prompt becomes significantly more responsive without it.

@@ -140,10 +140,11 @@ EOF
esac


-. /git/contrib/completion/git-completion.bash
+#. /git/contrib/completion/git-completion.bash
PS1='\[\033]0;$MSYSTEM:\w\007
-\033[32m\]\u@\h \[\033[33m\w$(__git_ps1)\033[0m\]
+\033[32m\]\u@\h \[\033[33m\w$\033[0m\]
$ '
+# \033[32m\]\u@\h \[\033[33m\w$(__git_ps1)\033[0m\]

# set default options for 'less'
export LESS=-FRSX


* And for intermixed editing both from under Linux and from under Wine
of the same sources:

diff --git a/etc/gitconfig b/etc/gitconfig
index 5a8c3b3..6f771d5 100644
--- a/etc/gitconfig
+++ b/etc/gitconfig
@@ -1,8 +1,13 @@
[core]
symlinks = false
- autocrlf = true
+ autocrlf = false


* And also just for convenience (we already turned colors on for diff, but
not for other commands):

[color]
diff = auto
+ status = auto
+ branch = auto
+ interactive = true
+[interactive]
+ singlekey = true
[pack]
packSizeLimit = 2g
[help]


> And if it is possible, it would be nice if you contributed your changes
> back to msysGit.

I'm trying. Where possible I send patches to git.git and cc this list.
In this regard I regret msysgit switched from tracking Junio's next to
maint. To me maint is usually too outdated for waiting for patches to
propagate. Maybe tracking Junio's master would be a good balance between
both worlds?

Also, e.g. mine textconv-out-of-the-box and Karsten's UTF-8 patches
specific to msysgit and already sent to the list got zero responce. In
msysgit issue tracker people are told to post patches to the list, but
when they do they are not getting feedback.

I understand we all have just a few bits of spare time, but zero
feedback project-wise can be a huge demotivator.

Thanks,
Kirill


P.S.

Sorry if I'm touching the pain point...

Johannes Schindelin

unread,
Dec 19, 2010, 4:51:46 PM12/19/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

On Sun, 19 Dec 2010, Kirill Smelkov wrote:

> On Sat, Dec 18, 2010 at 10:14:56PM +0100, Johannes Schindelin wrote:
>
> > On Sat, 18 Dec 2010, Kirill Smelkov wrote:
> >
> > > P.S. the builds were done under wine, and I had to tweak
> > > installation scripts a bit so the installer could be build at all.
> > > Otherwise it seems to work ok under both wine and windows xp.
> >
> > Would it be possible to adjust the scripts in such a way that they
> > work both in Wine and Windows? It seems that Wine sets the environment
> > variable WINELOADERNOEXEC...
>
> Hi Johannes, thanks for replying.
>
> I think yes it is possible, but special-casing wine is not neccessary
> because we are already 99% there.

Probably we do want a special-casing, as the commented-out line is
responsible for the nice icon of the installer (IIRC).

Would you mind putting the commits on http://repo.or.cz/w/msysgit/kirr.git
for easy cherry-picking/pulling?

> [...]


>
> > And if it is possible, it would be nice if you contributed your
> > changes back to msysGit.
>
> I'm trying. Where possible I send patches to git.git and cc this list.
> In this regard I regret msysgit switched from tracking Junio's next to
> maint. To me maint is usually too outdated for waiting for patches to
> propagate. Maybe tracking Junio's master would be a good balance between
> both worlds?

I have no problem to go back to tracking 'next'. Since msysGit is
experimental (all the expectancies on the bug tracker and especially on
the many blog entries saying how much msysGit sucks to the contrary) it is
my opinion that 'next' is the appropriate branch to track anyway.

> Also, e.g. mine textconv-out-of-the-box and Karsten's UTF-8 patches
> specific to msysgit and already sent to the list got zero responce. In
> msysgit issue tracker people are told to post patches to the list, but
> when they do they are not getting feedback.

I am really sorry. But at some stage, I felt too alone reviewing patches
(and at the same time I got a lot of hate flying my way on the Git mailing
list), and I felt really bad about Git, so the only possible consequence
was to pull out.

It is a pity that not many other people stepped in, but I have to think
about my sanity, I hope you understand.

Could you prepare a to-be-pulled branch incorporating your
textconv-out-of-the-box and the UTF-8 patches, and also a branch of
msysgit.git incorporating your antiword stuff (although I would like the
InnoSetup part separated from the part adding /src/antiword/*)?

Since there are not many people willing (or capable) to review the patch
series, maybe it is better to just take them and give people who are
interested a chance to debug what does not work on their end.

If we do that, however, I am fully prepared to ignore users who think that
any developer including yours truly owes them any free help desk services.

> I understand we all have just a few bits of spare time, but zero
> feedback project-wise can be a huge demotivator.

Yes, I understand. I understand a lot about getting and being demotivated
:-)

If you have any idea how I can help (without decreasing my other-than-git
time budget), please let me know!

Ciao,
Dscho

Pat Thoyts

unread,
Dec 19, 2010, 5:01:13 PM12/19/10
to Johannes Schindelin, Kirill Smelkov, Stefan Springer, msysGit
On 19 December 2010 21:51, Johannes Schindelin
<Johannes....@gmx.de> wrote:
[snip]

>> I'm trying. Where possible I send patches to git.git and cc this list.
>> In this regard I regret msysgit switched from tracking Junio's next to
>> maint.  To me maint is usually too outdated for waiting for patches to
>> propagate. Maybe tracking Junio's master would be a good balance between
>> both worlds?
>
> I have no problem to go back to tracking 'next'. Since msysGit is
> experimental (all the expectancies on the bug tracker and especially on
> the many blog entries saying how much msysGit sucks to the contrary) it is
> my opinion that 'next' is the appropriate branch to track anyway.
>
[snip]

This bit is my fault. When I started doing the rebase-merges it seemed
more logical to be following the current release than the experimental
branch. Also, when I actually prepared a bunch of updates and sent
them upstream to junio he was after patches against master rather than
next.
Ideally we want all the patches applied to junio's repo anyway, so
passing the work up to there is not bad plan. The more patches we have
on msysgit, the more work it is to try passing them upstream.
I've no particular preference to tracking junio/next or junio/maint --
but is next rewound? That could cause problems.

Johannes Schindelin

unread,
Dec 19, 2010, 5:07:37 PM12/19/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

On Sun, 19 Dec 2010, Johannes Schindelin wrote:

> On Sun, 19 Dec 2010, Kirill Smelkov wrote:
>
> > I regret msysgit switched from tracking Junio's next to maint. To me
> > maint is usually too outdated for waiting for patches to propagate.
> > Maybe tracking Junio's master would be a good balance between both
> > worlds?
>
> I have no problem to go back to tracking 'next'.

I started a rebasing merge onto 'next'.

BTW I have substantial problems running msysGit under Wine. But then,
maybe Ubuntu's 1.2-0ubuntu6~l is not really new enough?

Ciao,
Dscho

Johannes Schindelin

unread,
Dec 19, 2010, 5:11:22 PM12/19/10
to Pat Thoyts, Kirill Smelkov, Stefan Springer, msysGit
Hi,

No, that should not cause problems. I safeguarded the rebasing-merge.sh
script for that scenario explicitly. The only problem arises when Junio
takes something into 'next', we rebase onto that, and then Junio decides
to drop the patch. Then we have to cherry-pick it again.

Ciao,
Dscho

Johannes Schindelin

unread,
Dec 19, 2010, 5:13:18 PM12/19/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

FYI the symptom looks like this (before hogging the CPU until I Ctrl+C):

-- snipsnap --
fixme:netapi32:NetWkstaUserGetInfo Level 1 processing is partially implemented
fixme:advapi:LsaOpenPolicy ((null),0x69f544,0x00000001,0x69f560) stub
fixme:advapi:LsaClose (0xcafe) stub
fixme:advapi:LsaOpenPolicy ((null),0x69f530,0x000f0fff,0x69f52c) stub
fixme:netbios:NetServerEnum Stub ((null) 101 0x69effc -1 0x69f000 0x69f004 8 L"DOMAIN" (nil))
fixme:advapi:LsaClose (0xcafe) stub
-------------------------------------------------------
Building and Installing Git
-------------------------------------------------------
assertion "!inheap (s)" failed: file "../../../../src/winsup/cygwin/cygheap.cc", line 309
assertion "!inheap (s)" failed: file "../../../../src/winsup/cygwin/cygheap.cc", line 309
assertion "!inheap (s)" failed: file "../../../../src/winsup/cygwin/cygheap.cc", line 309

Johannes Schindelin

unread,
Dec 19, 2010, 6:06:13 PM12/19/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

On Sun, 19 Dec 2010, Johannes Schindelin wrote:

> On Sun, 19 Dec 2010, Johannes Schindelin wrote:
>
> > On Sun, 19 Dec 2010, Kirill Smelkov wrote:
> >
> > > I regret msysgit switched from tracking Junio's next to maint. To
> > > me maint is usually too outdated for waiting for patches to
> > > propagate. Maybe tracking Junio's master would be a good balance
> > > between both worlds?
> >
> > I have no problem to go back to tracking 'next'.
>
> I started a rebasing merge onto 'next'.

And I finished and pushed it (after running a test on Linux and one on a
Windows machine I am granted a remote login to; on Windows, t3032,
t3509, t4120, t5407, t5526, t7400, t7407, t7508 and some t9* failed,
haven't investigated those failures).

Ciao,
Dscho

Eric Sunshine

unread,
Dec 19, 2010, 6:29:36 PM12/19/10
to Johannes Schindelin, Kirill Smelkov, Stefan Springer, msysGit

Johannes Schindelin

unread,
Dec 19, 2010, 6:49:06 PM12/19/10
to Eric Sunshine, Kirill Smelkov, Stefan Springer, msysGit
Hi,

Thanks. I cherry-picked it back.

Ciao,
Dscho

Kirill Smelkov

unread,
Dec 20, 2010, 4:11:26 AM12/20/10
to Johannes Schindelin, Stefan Springer, msysGit
On Sun, Dec 19, 2010 at 10:51:46PM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Sun, 19 Dec 2010, Kirill Smelkov wrote:
>
> > On Sat, Dec 18, 2010 at 10:14:56PM +0100, Johannes Schindelin wrote:
> >
> > > On Sat, 18 Dec 2010, Kirill Smelkov wrote:
> > >
> > > > P.S. the builds were done under wine, and I had to tweak
> > > > installation scripts a bit so the installer could be build at all.
> > > > Otherwise it seems to work ok under both wine and windows xp.
> > >
> > > Would it be possible to adjust the scripts in such a way that they
> > > work both in Wine and Windows? It seems that Wine sets the environment
> > > variable WINELOADERNOEXEC...
> >
> > Hi Johannes, thanks for replying.
> >
> > I think yes it is possible, but special-casing wine is not neccessary
> > because we are already 99% there.
>
> Probably we do want a special-casing, as the commented-out line is
> responsible for the nice icon of the installer (IIRC).

Yes, I know it is responsible for icon, and no we don't want
special-casing imho. It's just a small bug sitting somewhere, probably
in wine, and we'd better fix it in the first place. I've already had
several such cases and usually it's not too dificult. It's just a matter
of spending some time on it.

> Would you mind putting the commits on http://repo.or.cz/w/msysgit/kirr.git
> for easy cherry-picking/pulling?

Sure, I've prepared my textconv stuff splitted into more patches as you've
asked here:

git://repo.or.cz/msysgit/kirr.git 4dscho/textconv-out-of-the-box

Kirill Smelkov (5):
Add script to compile/install/distribute antiword
WinGit: distribute antiword
Add script to compile/install poppler (for pdftotext)
WinGit: distribute pdftotext + friends
Git/WinGit: Add default textconv for .doc & .pdf

bin/astextplain | 23 ++++
etc/gitattributes | 4 +
etc/gitconfig | 3 +
share/WinGit/copy-files.sh | 7 +-
src/antiword/.gitignore | 1 +
src/antiword/patches/0001-Add-.gititgnore.patch | 23 ++++
...antiword-Tweak-Makefile.Linux-for-msysGit.patch | 50 +++++++
...name-about-and-path-separators-win32-only.patch | 46 +++++++
.../0004-Implement-runtime-prefix-detection.patch | 140 ++++++++++++++++++++
src/antiword/release.sh | 26 ++++
src/poppler/.gitignore | 3 +
src/poppler/release.sh | 55 ++++++++
12 files changed, 379 insertions(+), 2 deletions(-)
create mode 100644 bin/astextplain
create mode 100644 etc/gitattributes
create mode 100644 src/antiword/.gitignore
create mode 100644 src/antiword/patches/0001-Add-.gititgnore.patch
create mode 100644 src/antiword/patches/0002-antiword-Tweak-Makefile.Linux-for-msysGit.patch
create mode 100644 src/antiword/patches/0003-Tech-szBasename-about-and-path-separators-win32-only.patch
create mode 100644 src/antiword/patches/0004-Implement-runtime-prefix-detection.patch
create mode 100644 src/antiword/release.sh
create mode 100644 src/poppler/.gitignore
create mode 100755 src/poppler/release.sh

Will try to do my other bits and Karsten's UTF-8 in the evening (I'm out of
time now and there are conflicts after recent rewind to next).


> > [...]
> >
> > > And if it is possible, it would be nice if you contributed your
> > > changes back to msysGit.
> >
> > I'm trying. Where possible I send patches to git.git and cc this list.
> > In this regard I regret msysgit switched from tracking Junio's next to
> > maint. To me maint is usually too outdated for waiting for patches to
> > propagate. Maybe tracking Junio's master would be a good balance between
> > both worlds?
>
> I have no problem to go back to tracking 'next'. Since msysGit is
> experimental (all the expectancies on the bug tracker and especially on
> the many blog entries saying how much msysGit sucks to the contrary) it is
> my opinion that 'next' is the appropriate branch to track anyway.

Thanks for doing this.

> > Also, e.g. mine textconv-out-of-the-box and Karsten's UTF-8 patches
> > specific to msysgit and already sent to the list got zero responce. In
> > msysgit issue tracker people are told to post patches to the list, but
> > when they do they are not getting feedback.
>
> I am really sorry. But at some stage, I felt too alone reviewing patches
> (and at the same time I got a lot of hate flying my way on the Git mailing
> list), and I felt really bad about Git, so the only possible consequence
> was to pull out.
>
> It is a pity that not many other people stepped in, but I have to think
> about my sanity, I hope you understand.

I think I do. Some of us more or less been through something like that and the
outcome here is that time cures...

> Could you prepare a to-be-pulled branch incorporating your
> textconv-out-of-the-box and the UTF-8 patches, and also a branch of
> msysgit.git incorporating your antiword stuff (although I would like the
> InnoSetup part separated from the part adding /src/antiword/*)?

So far I've prepared only textconv bits (see above). Will try to get UTF-8 and
other things into shape in the evening. Sorry I'd like for you to get it
through 1 pull, but have to run right now.

> Since there are not many people willing (or capable) to review the patch
> series, maybe it is better to just take them and give people who are
> interested a chance to debug what does not work on their end.
>
> If we do that, however, I am fully prepared to ignore users who think that
> any developer including yours truly owes them any free help desk services.

Yes, let's please do it. And the back side of doing things in public is
that you have learn how to filter noise out.


> > I understand we all have just a few bits of spare time, but zero
> > feedback project-wise can be a huge demotivator.
>
> Yes, I understand. I understand a lot about getting and being demotivated
> :-)
>
> If you have any idea how I can help (without decreasing my other-than-git
> time budget), please let me know!

May I suggest to please consider coming back? Not to spend time replying
to useless comments on the issue tracker and deleting them afterwards?
Somehow I understand why several projects don't use bug tracker at
all, but the issue has not technical roots.


> Ciao,
> Dscho

Have a nice day,
Kirill

Kirill Smelkov

unread,
Dec 20, 2010, 3:41:48 AM12/20/10
to Johannes Schindelin, Stefan Springer, msysGit

Yes, 1.2 is too old. I build wine from their git repo from master
branch. E.g. under recent wine-1.3.9-100-ged3ac5f git builds just fine.
msys.bat (through wineconsole cmd /c msys.bat) runs just fine too as
well as gitk and git gui.


Thanks,
Kirill

Johannes Schindelin

unread,
Dec 20, 2010, 5:26:37 AM12/20/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

On Mon, 20 Dec 2010, Kirill Smelkov wrote:

> On Sun, Dec 19, 2010 at 10:51:46PM +0100, Johannes Schindelin wrote:
>
> > On Sun, 19 Dec 2010, Kirill Smelkov wrote:
> >
> > > On Sat, Dec 18, 2010 at 10:14:56PM +0100, Johannes Schindelin wrote:
> > >
> > > > On Sat, 18 Dec 2010, Kirill Smelkov wrote:
> > > >
> > > > > P.S. the builds were done under wine, and I had to tweak
> > > > > installation scripts a bit so the installer could be build at
> > > > > all. Otherwise it seems to work ok under both wine and windows
> > > > > xp.
> > > >
> > > > Would it be possible to adjust the scripts in such a way that they
> > > > work both in Wine and Windows? It seems that Wine sets the
> > > > environment variable WINELOADERNOEXEC...
> > >
> > > Hi Johannes, thanks for replying.
> > >
> > > I think yes it is possible, but special-casing wine is not
> > > neccessary because we are already 99% there.
> >
> > Probably we do want a special-casing, as the commented-out line is
> > responsible for the nice icon of the installer (IIRC).
>
> Yes, I know it is responsible for icon, and no we don't want
> special-casing imho. It's just a small bug sitting somewhere, probably
> in wine, and we'd better fix it in the first place. I've already had
> several such cases and usually it's not too dificult. It's just a matter
> of spending some time on it.

Right. I am of two minds here...

Thanks. I merged, installed the two things, and pushed. We might want to
add some stripping code there (as you rightfully remarked in your commit
messages, we are not good with stripping yet).

> Will try to do my other bits and Karsten's UTF-8 in the evening (I'm out
> of time now and there are conflicts after recent rewind to next).

Thanks!

> > If you have any idea how I can help (without decreasing my
> > other-than-git time budget), please let me know!
>
> May I suggest to please consider coming back? Not to spend time replying
> to useless comments on the issue tracker and deleting them afterwards?
> Somehow I understand why several projects don't use bug tracker at all,
> but the issue has not technical roots.

Well, I am delighted that you contribute so much useful stuff!

Thank you,
Dscho

Kirill Smelkov

unread,
Dec 20, 2010, 4:06:49 PM12/20/10
to Johannes Schindelin, Stefan Springer, Karsten Blees, msysGit
[ Cc'ing Karsten because of UTF-8 patches ]

On Mon, Dec 20, 2010 at 11:26:37AM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Mon, 20 Dec 2010, Kirill Smelkov wrote:
>

> [...]


>
> > Yes, I know it is responsible for icon, and no we don't want
> > special-casing imho. It's just a small bug sitting somewhere, probably
> > in wine, and we'd better fix it in the first place. I've already had
> > several such cases and usually it's not too dificult. It's just a matter
> > of spending some time on it.
>
> Right. I am of two minds here...

Ok, I'll try to have a look onto it next time I build the installer.


> > > Would you mind putting the commits on
> > > http://repo.or.cz/w/msysgit/kirr.git for easy cherry-picking/pulling?
> >
> > Sure, I've prepared my textconv stuff splitted into more patches as you've
> > asked here:
> >
> > git://repo.or.cz/msysgit/kirr.git 4dscho/textconv-out-of-the-box
> >
> > Kirill Smelkov (5):
> > Add script to compile/install/distribute antiword
> > WinGit: distribute antiword
> > Add script to compile/install poppler (for pdftotext)
> > WinGit: distribute pdftotext + friends
> > Git/WinGit: Add default textconv for .doc & .pdf

[...]

>
> Thanks. I merged, installed the two things, and pushed.

Thanks a lot!

> We might want to
> add some stripping code there (as you rightfully remarked in your commit
> messages, we are not good with stripping yet).

Yes. If all goes well I might try to look into this next time I add
something big.

> > Will try to do my other bits and Karsten's UTF-8 in the evening (I'm out
> > of time now and there are conflicts after recent rewind to next).
>
> Thanks!

Here is the next round with small bits:

git://repo.or.cz/msysgit/kirr.git 4dscho/misc

Kirill Smelkov (3):


wordpad: On Wine, wordpad.exe is in C;\Windows\System32\

wordpad: Silence unimportant internal ls errors
gitconfig: Turn on colors for status, branch and interactive (add -p)

bin/wordpad | 2 +-
etc/gitconfig | 3 +++
2 files changed, 4 insertions(+), 1 deletions(-)

Unfortunately Karsten's UTF-8 patch1 needs to be ported after win32
opendir/readdir/closedir rework in git.git (d1b6e6e0 "win32: use our own
dirent.h")


Karsten, would you like to have a look into this, since you were the
original poster and probably know this code well?


> > > If you have any idea how I can help (without decreasing my
> > > other-than-git time budget), please let me know!
> >
> > May I suggest to please consider coming back? Not to spend time replying
> > to useless comments on the issue tracker and deleting them afterwards?
> > Somehow I understand why several projects don't use bug tracker at all,
> > but the issue has not technical roots.
>
> Well, I am delighted that you contribute so much useful stuff!

Thanks, I couldn't even dream it will be merged so fast by recent
msysgit standards! Now we need textconv driver for ReleaseNotes.rtf :)
and also support for .mdb is in the work. I'm glad there is something
git-related which made you cheer up a bit.


Thanks again,
Kirill

Johannes Schindelin

unread,
Dec 21, 2010, 6:10:28 AM12/21/10
to Kirill Smelkov, Stefan Springer, Karsten Blees, msysGit
Hi,

On Tue, 21 Dec 2010, Kirill Smelkov wrote:

> Here is the next round with small bits:
>
> git://repo.or.cz/msysgit/kirr.git 4dscho/misc

Thanks, merged and pushed.

> I couldn't even dream it will be merged so fast by recent msysgit
> standards!

Yeah, well... sorry!

> Now we need textconv driver for ReleaseNotes.rtf :) and also support for
> .mdb is in the work. I'm glad there is something git-related which made
> you cheer up a bit.

Heh, good plan!

Ciao,
Dscho

karste...@dcon.de

unread,
Dec 21, 2010, 6:12:33 AM12/21/10
to Kirill Smelkov, Karsten Blees, Johannes Schindelin, msysGit, Stefan Springer, Erik Faye-Lund

Kirill Smelkov <ki...@landau.phys.spbu.ru> wrote on 20.12.2010 22:06:49:

>  [ Cc'ing Karsten because of UTF-8 patches ]


Thanks

[ Cc'ing Erik because of dirent replacement ]

[snip]

>
> Unfortunately Karsten's UTF-8 patch1 needs to be ported after win32
> opendir/readdir/closedir rework in git.git (d1b6e6e0 "win32: use our own
> dirent.h")
>
>
> Karsten, would you like to have a look into this, since you were the
> original poster and probably know this code well?
>

Yes, I've actualy been working on this already...unfortunately, Erik's dirent replacement and mine are vastly different. For now, I just adapted my version from the last unicode patch series to the new dirent.[ch] and improved a bit on the error handling.

Erik, was there a good reason to stick (mostly) to the MINGW implementation (e.g. union trick for compatibility with MINGW's struct dirent, additional GetFileAttributes call etc.)? Postponing actual directory operations to readdir seems to complicate things unnecessarily...

Changes in this version:
- rebased to msysgit 51da7a90 (based on git v1.7.3.4)
- added Tested-by: Kirill Smelkov
- moved unicode dirent replacement to new compat/win32/dirent.[ch]
- improved error handling in opendir (special handling for ENOTDIR,
  checking GetLastError after FindNextFile returns false)
- fixed encoding in gitk file browser
- probably fixed MSVC build errors in winansi.c

Hope you don't mind getting this as bundle, apply with "git pull unicode-v2.bundle kb/unicode:kb/unicode".

Cheers,
Karsten

unicode-v2.bundle

Erik Faye-Lund

unread,
Dec 21, 2010, 7:26:31 AM12/21/10
to karste...@dcon.de, Kirill Smelkov, Karsten Blees, Johannes Schindelin, msysGit, Stefan Springer, Marius Storm-Olsen
[ Cc'ing Marius because of mingw_readdir ]

On Tue, Dec 21, 2010 at 12:12 PM, <karste...@dcon.de> wrote:
>
> Kirill Smelkov <ki...@landau.phys.spbu.ru> wrote on 20.12.2010 22:06:49:
>
>>  [ Cc'ing Karsten because of UTF-8 patches ]
>
> Thanks
>
> [ Cc'ing Erik because of dirent replacement ]
>
> [snip]
>>
>> Unfortunately Karsten's UTF-8 patch1 needs to be ported after win32
>> opendir/readdir/closedir rework in git.git (d1b6e6e0 "win32: use our own
>> dirent.h")
>>
>>
>> Karsten, would you like to have a look into this, since you were the
>> original poster and probably know this code well?
>>
>
> Yes, I've actualy been working on this already...unfortunately, Erik's
> dirent replacement and mine are vastly different. For now, I just adapted my
> version from the last unicode patch series to the new dirent.[ch] and
> improved a bit on the error handling.
>
> Erik, was there a good reason to stick (mostly) to the MINGW implementation
> (e.g. union trick for compatibility with MINGW's struct dirent, additional
> GetFileAttributes call etc.)?

I don't quite understand the question. Are you asking me why I didn't
rewrite the code? If so, the answer is because I simply had no reason
to rewrite it. Why would I? The current code is tested over time, and
writing new code means writing new bugs.

I just moved code around (plus some bug-fixes), we already had all of
this split out between the MinGW and MSVC ports, really. The code from
the MSVC port wasn't tested too well, so it had some bugs. I fixed up
what I could find.

> Postponing actual directory operations to
> readdir seems to complicate things unnecessarily...

This is probably a question where you'd like Marius' input on.

But I don't quite agree with the way you've done it; allocating a list
etc seems like a little too much unnecessary heap-stressing. I'd
rather see this as walking the handle returned by FindFirstFile, but
perhaps the logic can be rewritten to be clearer, I dunno.

However, writing a full review is a bit painful as you sent this as a
bundle - I can't just reply to the parts I dislike.

(And please, don't get me wrong. I like the topic, I just don't
particularly like all the details of the implementation)

Johannes Schindelin

unread,
Dec 21, 2010, 7:45:09 AM12/21/10
to karste...@dcon.de, Kirill Smelkov, Karsten Blees, msysGit, Stefan Springer, Erik Faye-Lund
Hi,

On Tue, 21 Dec 2010, karste...@dcon.de wrote:

> Hope you don't mind getting this as bundle, apply with "git pull
> unicode-v2.bundle kb/unicode:kb/unicode".

Actually, it is better to "git pull unicode-v2.bundle kb/unicode",
especially if you called the branch kb/unicode already, as I did :-)

I pushed out that branch for better visibility. Erik, may I ask you to
merge into 'devel' when all the dirent issues have been hashed out?

Thanks,
Dscho

Kirill Smelkov

unread,
Dec 21, 2010, 9:23:11 AM12/21/10
to wine-p...@winehq.org, Kirill Smelkov, Johannes Schindelin, msy...@googlegroups.com
As stated on MSDN. The fix is needed for InnoSetup's iscc.exe which,
bedore updating, first removes 'MAINICON' this way.
---
dlls/kernel32/resource.c | 25 +++++++++++++++++--------
1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/dlls/kernel32/resource.c b/dlls/kernel32/resource.c
index 5091804..0123eff 100644
--- a/dlls/kernel32/resource.c
+++ b/dlls/kernel32/resource.c
@@ -705,8 +705,10 @@ static void res_free_str( LPWSTR str )
HeapFree( GetProcessHeap(), 0, str );
}

+/* resdata=NULL + overwrite_existing=TRUE means "remove resource" */
static BOOL update_add_resource( QUEUEDUPDATES *updates, LPCWSTR Type, LPCWSTR Name,
- struct resource_data *resdata, BOOL overwrite_existing )
+ LANGID Lang, struct resource_data *resdata,
+ BOOL overwrite_existing )
{
struct resource_dir_entry *restype, *resname;
struct resource_data *existing;
@@ -736,7 +738,7 @@ static BOOL update_add_resource( QUEUEDUPDATES *updates, LPCWSTR Type, LPCWSTR N
* If there's an existing resource entry with matching (Type,Name,Language)
* it needs to be removed before adding the new data.
*/
- existing = find_resource_data( &resname->children, resdata->lang );
+ existing = find_resource_data( &resname->children, Lang );
if (existing)
{
if (!overwrite_existing)
@@ -745,7 +747,8 @@ static BOOL update_add_resource( QUEUEDUPDATES *updates, LPCWSTR Type, LPCWSTR N
HeapFree( GetProcessHeap(), 0, existing );
}

- add_resource_data_entry( &resname->children, resdata );
+ if (resdata)
+ add_resource_data_entry( &resname->children, resdata );

return TRUE;
}
@@ -1007,7 +1010,7 @@ static BOOL enumerate_mapped_resources( QUEUEDUPDATES *updates,
resdata = allocate_resource_data( Lang, data->CodePage, p, data->Size, FALSE );
if (resdata)
{
- if (!update_add_resource( updates, Type, Name, resdata, FALSE ))
+ if (!update_add_resource( updates, Type, Name, Lang, resdata, FALSE ))
HeapFree( GetProcessHeap(), 0, resdata );
}
}
@@ -1697,10 +1700,16 @@ BOOL WINAPI UpdateResourceW( HANDLE hUpdate, LPCWSTR lpType, LPCWSTR lpName,
updates = GlobalLock(hUpdate);
if (updates)
{
- struct resource_data *data;
- data = allocate_resource_data( wLanguage, 0, lpData, cbData, TRUE );
- if (data)
- ret = update_add_resource( updates, lpType, lpName, data, TRUE );
+ if (lpData==NULL && cbData==0) {
+ /* remove resource */
+ ret = update_add_resource( updates, lpType, lpName, wLanguage, NULL, TRUE );
+ }
+ else {
+ struct resource_data *data;
+ data = allocate_resource_data( wLanguage, 0, lpData, cbData, TRUE );
+ if (data)
+ ret = update_add_resource( updates, lpType, lpName, wLanguage, data, TRUE );
+ }
GlobalUnlock(hUpdate);
}
return ret;
--
1.7.3.4.568.g1b974

Erik Faye-Lund

unread,
Dec 21, 2010, 9:29:42 AM12/21/10
to Kirill Smelkov, wine-p...@winehq.org, Johannes Schindelin, msy...@googlegroups.com
On Tue, Dec 21, 2010 at 3:23 PM, Kirill Smelkov
<ki...@landau.phys.spbu.ru> wrote:
> As stated on MSDN. The fix is needed for InnoSetup's iscc.exe which,
> bedore updating, first removes 'MAINICON' this way.
> ---
>  dlls/kernel32/resource.c |   25 +++++++++++++++++--------
>  1 files changed, 17 insertions(+), 8 deletions(-)
>

Wrong list? Is this intended for Wine?

Johannes Schindelin

unread,
Dec 21, 2010, 9:42:20 AM12/21/10
to Erik Faye-Lund, Kirill Smelkov, wine-p...@winehq.org, msy...@googlegroups.com
Hi,

Correct list. This apparently fixes a bug in Wine requiring a patch for
our InnoSetup script.

Ciao,
Dscho

Kirill Smelkov

unread,
Dec 21, 2010, 9:44:15 AM12/21/10
to Erik Faye-Lund, wine-p...@winehq.org, Johannes Schindelin, msy...@googlegroups.com

It was posted to wine-patches, so where is the question? I've just Cc'ed
msysgit because the patch is related to recent thread here...

Erik Faye-Lund

unread,
Dec 21, 2010, 9:54:10 AM12/21/10
to Kirill Smelkov, wine-p...@winehq.org, Johannes Schindelin, msy...@googlegroups.com
On Tue, Dec 21, 2010 at 3:44 PM, Kirill Smelkov

<ki...@landau.phys.spbu.ru> wrote:
> On Tue, Dec 21, 2010 at 03:29:42PM +0100, Erik Faye-Lund wrote:
>> On Tue, Dec 21, 2010 at 3:23 PM, Kirill Smelkov
>> <ki...@landau.phys.spbu.ru> wrote:
>> > As stated on MSDN. The fix is needed for InnoSetup's iscc.exe which,
>> > bedore updating, first removes 'MAINICON' this way.
>> > ---
>> >  dlls/kernel32/resource.c |   25 +++++++++++++++++--------
>> >  1 files changed, 17 insertions(+), 8 deletions(-)
>> >
>>
>> Wrong list? Is this intended for Wine?
>
> It was posted to wine-patches, so where is the question? I've just Cc'ed
> msysgit because the patch is related to recent thread here...
>

Sorry for the noise. Somehow my mind was completely unable to see that
(despite my eyes doing the effort to look for it). Let's just consider
it temporary insanity ;)

Kirill Smelkov

unread,
Dec 21, 2010, 10:16:00 AM12/21/10
to Erik Faye-Lund, wine-p...@winehq.org, Johannes Schindelin, msy...@googlegroups.com

No problem. But Wine guys will kill us for non-patch traffic on their
patches-only list :) Oh well ...

Johannes Schindelin

unread,
Dec 21, 2010, 3:37:02 PM12/21/10
to Kirill Smelkov, Erik Faye-Lund, msy...@googlegroups.com
Hi,

If they don't want to see how important that patch is, that's their
problem.

:-)

Ciao,
Dscho

Johannes Schindelin

unread,
Dec 21, 2010, 7:53:18 PM12/21/10
to Kirill Smelkov, Stefan Springer, msysGit
Hi,

FWIW the error had nothing to do with 1.2 vs 1.3, but with yet another
instance of the old DLL base address problem. I solved it locally by
calling "rebase -b 0x60000000 msys-1.0.dll" from cmd with the rebase.exe
from the 'full' branch.

Ciao,
Dscho

karste...@dcon.de

unread,
Dec 22, 2010, 8:06:16 PM12/22/10
to kusm...@gmail.com, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer

Erik Faye-Lund <kusm...@gmail.com> wrote on 21.12.2010 13:26:31:

> [ Cc'ing Marius because of mingw_readdir ]
>
> On Tue, Dec 21, 2010 at 12:12 PM,  <karste...@dcon.de> wrote:
> >
> > Kirill Smelkov <ki...@landau.phys.spbu.ru> wrote on 20.12.2010 22:06:49:
> >
> >>  [ Cc'ing Karsten because of UTF-8 patches ]
> >
> > Thanks
> >
> > [ Cc'ing Erik because of dirent replacement ]
> >
> > [snip]
> >>
> >> Unfortunately Karsten's UTF-8 patch1 needs to be ported after win32
> >> opendir/readdir/closedir rework in git.git (d1b6e6e0 "win32: use our own
> >> dirent.h")
> >>
> >>
> >> Karsten, would you like to have a look into this, since you were the
> >> original poster and probably know this code well?
> >>
> >
> > Yes, I've actualy been working on this already...unfortunately, Erik's
> > dirent replacement and mine are vastly different. For now, I just adapted my
> > version from the last unicode patch series to the new dirent.[ch] and
> > improved a bit on the error handling.
> >
> > Erik, was there a good reason to stick (mostly) to the MINGW implementation
> > (e.g. union trick for compatibility with MINGW's struct dirent, additional
> > GetFileAttributes call etc.)?
>
> I don't quite understand the question. Are you asking me why I didn't
> rewrite the code? If so, the answer is because I simply had no reason
> to rewrite it. Why would I?

It just seemed strange to me that you would completely rewrite opendir/closedir and leave readdir almost untouched. I would've expected some middle ground, e.g. moving FindFirstFile to opendir instead of adding an extra GetFileAttributes.

Except if you were e.g. planning to port the rest of the MinGW dirent stuff to git, that's why I asked.

> The current code is tested over time, and writing new code means
> writing new bugs.
>

Well, tested over time against the MinGW runtime version, not the rewritten opendir/closedir from the MSVC port (and there's enough new code here).

I'm more the 'less code means less bugs' guy, so I usually go for the simpler/shorter solution if I see any, even if it means a complete rewrite (and I surely introduce new bugs in the process, but at least *less* new bugs :-)

> I just moved code around (plus some bug-fixes), we already had all of
> this split out between the MinGW and MSVC ports, really. The code from
> the MSVC port wasn't tested too well, so it had some bugs. I fixed up
> what I could find.
>

Just looked at the old version...'it had some bugs' is quite an understatement - good job.

> > Postponing actual directory operations to
> > readdir seems to complicate things unnecessarily...
>
> This is probably a question where you'd like Marius' input on.
>

This was necessary to be compatible with the other dirent functions from the MinGW runtime. As these are no longer used...

> But I don't quite agree with the way you've done it; allocating a list
> etc seems like a little too much unnecessary heap-stressing. I'd
> rather see this as walking the handle returned by FindFirstFile, but
> perhaps the logic can be rewritten to be clearer, I dunno.
>

I like freeing kernel resources early, and not relying on clients to call closedir for this (dir.c seems to work the same way, bulk reading into dir_struct.entries). However, the days of overly restrictive handle or memory limits are long past, so this is probably a matter of taste.

Re-implementing rewinddir/telldir/seekdir would be much simpler with the array approach, though.

> However, writing a full review is a bit painful as you sent this as a
> bundle - I can't just reply to the parts I dislike.
>

I switched to bundles because google-groups messed up the threads from git-send-email. How do you send to the list?

Anyway, here's the diff for the dirent part (I already noticed the inline declarations...old C++ habit):
---
 compat/win32/dirent.c |  137 +++++++++--------------
 compat/win32/dirent.h |   11 +-

diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index 7a0debe..55fc31c 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -1,108 +1,79 @@
-#include "../git-compat-util.h"
-#include "dirent.h"
+#include "../../git-compat-util.h"
+#include <wchar.h>
+#include "../../cache.h"
 
 struct DIR {
-        struct dirent dd_dir; /* includes d_type */
-        HANDLE dd_handle;     /* FindFirstFile handle */
-        int dd_stat;          /* 0-based index */
-        char dd_name[1];      /* extend struct */
+        size_t size, pos;
+        struct dirent **entries;
 };
 
-DIR *opendir(const char *name)
+DIR *opendir(const char *dirname)
 {
-        DWORD attrs = GetFileAttributesA(name);
-        int len;
-        DIR *p;
-
-        /* check for valid path */
-        if (attrs == INVALID_FILE_ATTRIBUTES) {
-                errno = ENOENT;
+        wchar_t wdirname[MAX_PATH];
+        char filename[MAX_PATH];
+        DWORD err;
+        int len = utftowcs(wdirname, dirname, MAX_PATH - 2);
+        if (len < 0)
                 return NULL;
-        }
 
-        /* check if it's a directory */
-        if (!(attrs & FILE_ATTRIBUTE_DIRECTORY)) {
-                errno = ENOTDIR;
+        /* append optional '/' and wildcard '*' */
+        if (len && wdirname[len - 1] != '/' && wdirname[len - 1] != '\\')
+                wdirname[len++] = '/';
+        wdirname[len++] = '*';
+        wdirname[len] = 0;
+
+        WIN32_FIND_DATAW fdata;
+        HANDLE h = FindFirstFileW(wdirname, &fdata);
+        if (h == INVALID_HANDLE_VALUE) {
+                err = GetLastError();
+                errno = (err == ERROR_DIRECTORY) ? ENOTDIR : err_win_to_posix(err);
                 return NULL;
         }
 
-        /* check that the pattern won't be too long for FindFirstFileA */
-        len = strlen(name);
-        if (is_dir_sep(name[len - 1]))
-                len--;
-        if (len + 2 >= MAX_PATH) {
-                errno = ENAMETOOLONG;
-                return NULL;
-        }
+        DIR *dir = xmalloc(sizeof(DIR));
+        dir->pos = dir->size = 0;
+        dir->entries = NULL;
 
-        p = malloc(sizeof(DIR) + len + 2);
-        if (!p)
-                return NULL;
+        int alloc = 0;
+        do {
+                ALLOC_GROW(dir->entries, dir->size + 1, alloc);
+                len = wcstoutf(filename, fdata.cFileName, MAX_PATH);
 
-        memset(p, 0, sizeof(DIR) + len + 2);
-        strcpy(p->dd_name, name);
-        p->dd_name[len] = '/';
-        p->dd_name[len+1] = '*';
+                struct dirent *ent = xmalloc(sizeof(struct dirent) + len);
+                memcpy(&ent->d_name, filename, len + 1);
+                if (fdata.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
+                        ent->d_type = DT_DIR;
+                else
+                        ent->d_type = DT_REG;
 
-        p->dd_handle = INVALID_HANDLE_VALUE;
-        return p;
-}
+                dir->entries[dir->size++] = ent;
 
-struct dirent *readdir(DIR *dir)
-{
-        WIN32_FIND_DATAA buf;
-        HANDLE handle;
+        } while (FindNextFileW(h, &fdata));
+        err = GetLastError();
+        FindClose(h);
 
-        if (!dir || !dir->dd_handle) {
-                errno = EBADF; /* No set_errno for mingw */
+        /* handle FindNextFile errors except ERROR_NO_MORE_FILES */
+        if (err != ERROR_NO_MORE_FILES) {
+                errno = err_win_to_posix(err);
+                closedir(dir);
                 return NULL;
         }
+        return dir;
+}
 
-        if (dir->dd_handle == INVALID_HANDLE_VALUE && dir->dd_stat == 0) {
-                DWORD lasterr;
-                handle = FindFirstFileA(dir->dd_name, &buf);
-                lasterr = GetLastError();
-                dir->dd_handle = handle;
-                if (handle == INVALID_HANDLE_VALUE && (lasterr != ERROR_NO_MORE_FILES)) {
-                        errno = err_win_to_posix(lasterr);
-                        return NULL;
-                }
-        } else if (dir->dd_handle == INVALID_HANDLE_VALUE) {
-                return NULL;
-        } else if (!FindNextFileA(dir->dd_handle, &buf)) {
-                DWORD lasterr = GetLastError();
-                FindClose(dir->dd_handle);
-                dir->dd_handle = INVALID_HANDLE_VALUE;
-                /* POSIX says you shouldn't set errno when readdir can't
-                   find any more files; so, if another error we leave it set. */
-                if (lasterr != ERROR_NO_MORE_FILES)
-                        errno = err_win_to_posix(lasterr);
-                return NULL;
-        }
-
-        /* We get here if `buf' contains valid data.  */
-        strcpy(dir->dd_dir.d_name, buf.cFileName);
-        ++dir->dd_stat;
-
-        /* Set file type, based on WIN32_FIND_DATA */
-        dir->dd_dir.d_type = 0;
-        if (buf.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
-                dir->dd_dir.d_type |= DT_DIR;
-        else
-                dir->dd_dir.d_type |= DT_REG;
-
-        return &dir->dd_dir;
+struct dirent *readdir(DIR *dir)
+{
+        if (dir->pos < dir->size)
+                return dir->entries[(dir->pos)++];
+        return NULL;
 }
 
 int closedir(DIR *dir)
 {
-        if (!dir) {
-                errno = EBADF;
-                return -1;
-        }
-
-        if (dir->dd_handle != INVALID_HANDLE_VALUE)
-                FindClose(dir->dd_handle);
+        int i;
+        for (i = 0; i < dir->size; i++)
+                free(dir->entries[i]);
+        free(dir->entries);
         free(dir);
         return 0;
 }
diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
index 927a25c..b46255d 100644
--- a/compat/win32/dirent.h
+++ b/compat/win32/dirent.h
@@ -8,13 +8,12 @@ typedef struct DIR DIR;
 #define DT_REG     2
 #define DT_LNK     3
 
+#define NO_D_INO_IN_DIRENT 1
+#define _DIRENT_HAVE_D_TYPE 1
+
 struct dirent {
-        long d_ino;                      /* Always zero. */
-        char d_name[FILENAME_MAX];       /* File name. */
-        union {
-                unsigned short d_reclen; /* Always zero. */
-                unsigned char  d_type;   /* Reimplementation adds this */
-        };
+        unsigned char d_type;
+        char d_name[1];
 };
 
 DIR *opendir(const char *dirname);
--

karste...@dcon.de

unread,
Dec 24, 2010, 9:59:26 AM12/24/10
to kusm...@gmail.com, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer
Karsten Blees/DCon wrote on 23.12.2010 02:06:16:
>
> Erik Faye-Lund <kusm...@gmail.com> wrote on 21.12.2010 13:26:31:
>
[snip]
>
> > But I don't quite agree with the way you've done it; allocating a list
> > etc seems like a little too much unnecessary heap-stressing. I'd
> > rather see this as walking the handle returned by FindFirstFile, but
> > perhaps the logic can be rewritten to be clearer, I dunno.
> >
>


OK, I gave the 'walk the handle' approach a shot, moving FindFirstFile and FindClose out of readdir. Here's what I came up with (sorry for the noise).

I also tested that keeping the handle open doesn't impact performance of recursive listings. I would have expected this to suck due to disk seeks when jumping back and forth between parent- and subdirs, but it's exactly identical to the array version, even with cold cache (after reboot).

Changes in v3:
- new dirent version, moved to a separate commit (included as diff for quoting), the rest is as in v2.

Merry Christmas and a happy New Year everyone!
Karsten



---
Unicode file names on Windows (dirent)

Changes opendir/readdir to use Windows Unicode APIs and convert between
UTF-8/UTF-16.

Moves FindFirstFile to opendir, and FindClose to closedir, with the
following implications:
- DIR.dd_name is no longer needed
- chdir(one); opendir(relative); chdir(two); readdir() works as expected
  (i.e. lists one/relative instead of two/relative)
- DIR.dd_handle is a valid handle for the entire lifetime of the DIR
  structure
- thus, all checks for dd_handle == INVALID_HANDLE_VALUE have been removed
- the special case that the directory has been fully read (which was
  previously explicitly tracked with dd_handle == INVALID_HANDLE_VALUE &&
  dd_stat != 0) is now handled implicitly by the FindNextFile error
  handling code (if a client continues to call readdir after receiving
  NULL, FindNextFile will continue to fail with ERROR_NO_MORE_FILES, to
  the same effect)
- extracting dirent data from WIN32_FIND_DATA is needed in two places, so
  moved to its own method
- GetFileAttributes is no longer needed. The same information can be
  obtained from the FindFirstFile error code, which is ERROR_DIRECTORY if
  the name is NOT a directory (-> ENOTDIR), otherwise we can use
  err_win_to_posix (e.g. ERROR_PATH_NOT_FOUND -> ENOENT). The
  ERROR_DIRECTORY case could be fixed in err_win_to_posix, but this
  probably breaks other functionality.

Readdir no longer fails if dd_handle is 0, which is a valid handle value
(at least the documentation says nothing to the contrary).

Removes the ERROR_NO_MORE_FILES check after FindFirstFile (this was
fortunately a NOOP (searching for '*' always finds '.' and '..'),
otherwise the subsequent code would have copied data from an uninitialized
buffer).

There are no proper inodes on Windows, so remove dirent.d_ino and #define
NO_D_INO_IN_DIRENT (this skips e.g. an ineffective qsort in fsck.c).

Removes the union around dirent.d_type (which was necessary for
compatibility with the MinGW dirent runtime, which is no longer used).

Increases the size of dirent.d_name (converting UTF-16 to UTF-8 may grow
by factor three in the worst case).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---
 compat/win32/dirent.c |  107 +++++++++++++++++++++----------------------------
 compat/win32/dirent.h |   10 ++---
 2 files changed, 50 insertions(+), 67 deletions(-)

diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index 7a0debe..c399b28 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -1,78 +1,74 @@
-#include "../git-compat-util.h"
-#include "dirent.h"
+#include "../../git-compat-util.h"
 
 struct DIR {
         struct dirent dd_dir; /* includes d_type */
         HANDLE dd_handle;     /* FindFirstFile handle */
         int dd_stat;          /* 0-based index */
-        char dd_name[1];      /* extend struct */
 };
 
+static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW *fdata)
+{
+        /* convert UTF-16 name to UTF-8 */
+        wcstoutf(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
+
+        /* Set file type, based on WIN32_FIND_DATA */
+        if (fdata->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
+                ent->d_type = DT_DIR;
+        else
+                ent->d_type = DT_REG;
+}
+
 DIR *opendir(const char *name)
 {
-        DWORD attrs = GetFileAttributesA(name);
+        wchar_t wname[MAX_PATH];
+        WIN32_FIND_DATAW fdata;
+        HANDLE h;
         int len;
-        DIR *p;
+        DIR *dir;
 
-        /* check for valid path */
-        if (attrs == INVALID_FILE_ATTRIBUTES) {
-                errno = ENOENT;
+        /* convert name to UTF-16, check length (-2 for "/*") */
+        len = utftowcs(wname, name, MAX_PATH - 2);
+        if (len < 0)
                 return NULL;
-        }
-
-        /* check if it's a directory */
-        if (!(attrs & FILE_ATTRIBUTE_DIRECTORY)) {
-                errno = ENOTDIR;
-                return NULL;
-        }
 
-        /* check that the pattern won't be too long for FindFirstFileA */
-        len = strlen(name);
-        if (is_dir_sep(name[len - 1]))
-                len--;
-        if (len + 2 >= MAX_PATH) {
-                errno = ENAMETOOLONG;
+        /* append optional '/' and wildcard '*' */
+        if (len && !is_dir_sep(wname[len - 1]))
+                wname[len++] = '/';
+        wname[len++] = '*';
+        wname[len] = 0;
+
+        /* open find handle */
+        h = FindFirstFileW(wname, &fdata);
+        if (h == INVALID_HANDLE_VALUE) {
+                DWORD err = GetLastError();
+                errno = (err == ERROR_DIRECTORY) ? ENOTDIR : err_win_to_posix(err);
                 return NULL;
         }
 
-        p = malloc(sizeof(DIR) + len + 2);
-        if (!p)
-                return NULL;
-
-        memset(p, 0, sizeof(DIR) + len + 2);
-        strcpy(p->dd_name, name);
-        p->dd_name[len] = '/';
-        p->dd_name[len+1] = '*';
-
-        p->dd_handle = INVALID_HANDLE_VALUE;
-        return p;
+        /* initialize DIR structure and copy first dir entry */
+        dir = xmalloc(sizeof(DIR));
+        dir->dd_handle = h;
+        dir->dd_stat = 0;
+        finddata2dirent(&dir->dd_dir, &fdata);
+        return dir;
 }
 
 struct dirent *readdir(DIR *dir)
 {
-        WIN32_FIND_DATAA buf;
-        HANDLE handle;
+        WIN32_FIND_DATAW fdata;
 
-        if (!dir || !dir->dd_handle) {
+        if (!dir) {
                 errno = EBADF; /* No set_errno for mingw */
                 return NULL;
         }
 
-        if (dir->dd_handle == INVALID_HANDLE_VALUE && dir->dd_stat == 0) {
-                DWORD lasterr;
-                handle = FindFirstFileA(dir->dd_name, &buf);
-                lasterr = GetLastError();
-                dir->dd_handle = handle;
-                if (handle == INVALID_HANDLE_VALUE && (lasterr != ERROR_NO_MORE_FILES)) {
-                        errno = err_win_to_posix(lasterr);
-                        return NULL;
-                }
-        } else if (dir->dd_handle == INVALID_HANDLE_VALUE) {
-                return NULL;
-        } else if (!FindNextFileA(dir->dd_handle, &buf)) {
+        if (!dir->dd_stat) {
+                /* first entry, dirent has already been set up by opendir */
+        } else if (FindNextFileW(dir->dd_handle, &fdata)) {
+                /* FindNextFileW succeeded, convert fdata to dirent */
+                finddata2dirent(&dir->dd_dir, &fdata);
+        } else {
                 DWORD lasterr = GetLastError();
-                FindClose(dir->dd_handle);
-                dir->dd_handle = INVALID_HANDLE_VALUE;
                 /* POSIX says you shouldn't set errno when readdir can't
                    find any more files; so, if another error we leave it set. */
                 if (lasterr != ERROR_NO_MORE_FILES)
@@ -80,17 +76,7 @@ struct dirent *readdir(DIR *dir)
                 return NULL;
         }
 
-        /* We get here if `buf' contains valid data.  */
-        strcpy(dir->dd_dir.d_name, buf.cFileName);
         ++dir->dd_stat;
-
-        /* Set file type, based on WIN32_FIND_DATA */
-        dir->dd_dir.d_type = 0;
-        if (buf.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
-                dir->dd_dir.d_type |= DT_DIR;
-        else
-                dir->dd_dir.d_type |= DT_REG;
-
         return &dir->dd_dir;
 }
 
@@ -101,8 +87,7 @@ int closedir(DIR *dir)
                 return -1;
         }
 
-        if (dir->dd_handle != INVALID_HANDLE_VALUE)
-                FindClose(dir->dd_handle);
+        FindClose(dir->dd_handle);
         free(dir);
         return 0;
 }
diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
index 927a25c..9e6a0da 100644
--- a/compat/win32/dirent.h
+++ b/compat/win32/dirent.h
@@ -8,13 +8,11 @@ typedef struct DIR DIR;
 #define DT_REG     2
 #define DT_LNK     3
 
+#define NO_D_INO_IN_DIRENT 1
+
 struct dirent {
-        long d_ino;                      /* Always zero. */
-        char d_name[FILENAME_MAX];       /* File name. */
-        union {
-                unsigned short d_reclen; /* Always zero. */
-                unsigned char  d_type;   /* Reimplementation adds this */
-        };
+        unsigned char d_type;      /* file type to prevent lstat after readdir */
+        char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
 };
 
 DIR *opendir(const char *dirname);
--
1.7.3.4.3631.g37a754
unicode-v3.bundle

Erik Faye-Lund

unread,
Dec 28, 2010, 12:05:08 PM12/28/10
to karste...@dcon.de, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer
On Fri, Dec 24, 2010 at 3:59 PM, <karste...@dcon.de> wrote:
>
> Karsten Blees/DCon wrote on 23.12.2010 02:06:16:
>>
>> Erik Faye-Lund <kusm...@gmail.com> wrote on 21.12.2010 13:26:31:
>>
> [snip]
>>
>> > But I don't quite agree with the way you've done it; allocating a list
>> > etc seems like a little too much unnecessary heap-stressing. I'd
>> > rather see this as walking the handle returned by FindFirstFile, but
>> > perhaps the logic can be rewritten to be clearer, I dunno.
>> >
>>
>
> OK, I gave the 'walk the handle' approach a shot, moving FindFirstFile and
> FindClose out of readdir. Here's what I came up with (sorry for the noise).
>

Nice! I've reviewed it, and it's much closer to what I'd prefer to
see. Sorry for the somewhat late reply, but I haven't had much
git-time recently ;)

I think it should be split in at least two patches, possibly three:
One patch that does all preparations of changing the logic, one that
does the NO_D_INO_IN_DIRENT-change, and finally one that adds
Unicode-support.

Anyway, code review follows.

> I also tested that keeping the handle open doesn't impact performance of
> recursive listings. I would have expected this to suck due to disk seeks
> when jumping back and forth between parent- and subdirs, but it's exactly
> identical to the array version, even with cold cache (after reboot).
>

Nice!

> Readdir no longer fails if dd_handle is 0, which is a valid handle value
> (at least the documentation says nothing to the contrary).

No, 0 is not a valid handle:
http://msdn.microsoft.com/en-us/library/ms220923(v=VS.90).aspx

But it's not generated by FindFirstFile, so not checking for it is an
improvement.

> Removes the ERROR_NO_MORE_FILES check after FindFirstFile (this was
> fortunately a NOOP (searching for '*' always finds '.' and '..'),
> otherwise the subsequent code would have copied data from an uninitialized
> buffer).
>
> There are no proper inodes on Windows, so remove dirent.d_ino and #define
> NO_D_INO_IN_DIRENT (this skips e.g. an ineffective qsort in fsck.c).
>

Nice, but I think this part deserves a separate patch. More comments
on the implementation.

> Removes the union around dirent.d_type (which was necessary for
> compatibility with the MinGW dirent runtime, which is no longer used).
>
> Increases the size of dirent.d_name (converting UTF-16 to UTF-8 may grow
> by factor three in the worst case).
>

I've questioned (and hopefully answered) the factor of three at the
end of my reply. Perhaps some of the reasoning could be put in the
commit message?

> Signed-off-by: Karsten Blees <bl...@dcon.de>
> ---
>  compat/win32/dirent.c |  107
> +++++++++++++++++++++----------------------------
>  compat/win32/dirent.h |   10 ++---
>  2 files changed, 50 insertions(+), 67 deletions(-)
>
> diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
> index 7a0debe..c399b28 100644
> --- a/compat/win32/dirent.c
> +++ b/compat/win32/dirent.c
> @@ -1,78 +1,74 @@
> -#include "../git-compat-util.h"
> -#include "dirent.h"
> +#include "../../git-compat-util.h"
>
>  struct DIR {
>          struct dirent dd_dir; /* includes d_type */
>          HANDLE dd_handle;     /* FindFirstFile handle */
>          int dd_stat;          /* 0-based index */
> -        char dd_name[1];      /* extend struct */
>  };

While not needed now, dd_name would be if we decided to implement
rewinddir(). I'm not objecting, just pointing it out.

>
> +static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW
> *fdata)
> +{
> +        /* convert UTF-16 name to UTF-8 */
> +        wcstoutf(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
> +
> +        /* Set file type, based on WIN32_FIND_DATA */
> +        if (fdata->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
> +                ent->d_type = DT_DIR;
> +        else
> +                ent->d_type = DT_REG;
> +}
> +
>  DIR *opendir(const char *name)
>  {
> -        DWORD attrs = GetFileAttributesA(name);
> +        wchar_t wname[MAX_PATH];
> +        WIN32_FIND_DATAW fdata;
> +        HANDLE h;
>          int len;
> -        DIR *p;
> +        DIR *dir;

Why change the name of this variable?

Why change this logic? Was the old flawed? Sure, not having called
GetFileAttributes means you need to check for len = 0, but otherwise?

And why are you removing the ENAMETOOLONG error-path?

> +        /* open find handle */
> +        h = FindFirstFileW(wname, &fdata);
> +        if (h == INVALID_HANDLE_VALUE) {
> +                DWORD err = GetLastError();
> +                errno = (err == ERROR_DIRECTORY) ? ENOTDIR :
> err_win_to_posix(err);

Nice.

> -        p = malloc(sizeof(DIR) + len + 2);
> -        if (!p)
> -                return NULL;
> -
> -        memset(p, 0, sizeof(DIR) + len + 2);
> -        strcpy(p->dd_name, name);
> -        p->dd_name[len] = '/';
> -        p->dd_name[len+1] = '*';
> -
> -        p->dd_handle = INVALID_HANDLE_VALUE;
> -        return p;
> +        /* initialize DIR structure and copy first dir entry */
> +        dir = xmalloc(sizeof(DIR));

OK, now you're changing the logic to die() instead of returning an
error on out of memory. Not a big deal, but at least worth mentioning
in the commit message.

I did this in an earlier version of the readdir-patches I did, but
ended up deciding against it to be more POSIX-ish.

> +        dir->dd_handle = h;
> +        dir->dd_stat = 0;
> +        finddata2dirent(&dir->dd_dir, &fdata);
> +        return dir;
>  }
>
>  struct dirent *readdir(DIR *dir)
>  {
> -        WIN32_FIND_DATAA buf;
> -        HANDLE handle;
> +        WIN32_FIND_DATAW fdata;

Again, changing the type doesn't justify changing the variable name, does it?

>
> -        if (!dir || !dir->dd_handle) {
> +        if (!dir) {
>                  errno = EBADF; /* No set_errno for mingw */
>                  return NULL;
>          }
>
> -        if (dir->dd_handle == INVALID_HANDLE_VALUE && dir->dd_stat == 0) {
> -                DWORD lasterr;
> -                handle = FindFirstFileA(dir->dd_name, &buf);
> -                lasterr = GetLastError();
> -                dir->dd_handle = handle;
> -                if (handle == INVALID_HANDLE_VALUE && (lasterr !=
> ERROR_NO_MORE_FILES)) {
> -                        errno = err_win_to_posix(lasterr);
> -                        return NULL;
> -                }
> -        } else if (dir->dd_handle == INVALID_HANDLE_VALUE) {
> -                return NULL;
> -        } else if (!FindNextFileA(dir->dd_handle, &buf)) {
> +        if (!dir->dd_stat) {
> +                /* first entry, dirent has already been set up by opendir
> */
> +        } else if (FindNextFileW(dir->dd_handle, &fdata)) {
> +                /* FindNextFileW succeeded, convert fdata to dirent */
> +                finddata2dirent(&dir->dd_dir, &fdata);
> +        } else {

Nice. But I suspect this would be even clearer if you did:

if (dir->dd_stat) {
/* not first entry, dirent has not been set up yet */
if (FindNextFileW(..))
...
else {
...
}
}

> +#define NO_D_INO_IN_DIRENT 1
> +
>  struct dirent {
> -        long d_ino;                      /* Always zero. */

Nice. But I think NO_D_INO_IN_DIRENT should be defined in Makefile
(both for MinGW and MSVC) instead of this header-file.

> -        union {
> -                unsigned short d_reclen; /* Always zero. */
> -                unsigned char  d_type;   /* Reimplementation adds this */
> -        };
> +        unsigned char d_type;      /* file type to prevent lstat after
> readdir */

Nice. d_reclen isn't even defined in POSIX, we don't need it. I think
removing it is worth a mention in the commit message, though.

> -        char d_name[FILENAME_MAX];       /* File name. */

<snip>


> +        char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion)
> */

There's no point in changing from FILENAME_MAX to MAX_PATH, they are
both 260 on Windows.

I'm having a bit of problems understanding the "* 3"-part, so please
allow me to think out loud a bit ;)

The longest UTF-8 encoding is 6 8-bit bytes for one code-point. And
the longest UTF-16 encoding is two 16-bit words for one code point.
But can you simply divide one by the other to get the maximum ratio?
Wikipedia suggests that 16 bits are stored in the first 3 bytes of
UTF-8. But UTF-16 doesn't have full 16 bits utilization of it's first
16-bit word; it has 63486 code-points (U+0000..U+D7FF and
U+E000..U+FFFF). So it seems to me that the * 3 is an over-estimation.
Good.

Also, MAX_PATH is 260, but in reality the limit is DRIVELETTER + ':' +
'\ ' + max-256-character-path + NUL, (see
http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx)

Because all of the possible drive letters, ':' '\' and NUL is within
ASCII, they'll always be one-byte encodings. So MAX_PATH * 3 is again
an overestimate, "3 + _MAX_FNAME * 3 + 1" would be a slightly more
accurate estimate. But it's uglier and probably not worth the saved
memory.

It COULD be that I was wrong in changing the code from over-allocating
to a fixed-size field. It just becomes a bit more obvious when you
multiply it by three. wasting 260 bytes for something that is often,
let's say, 20-60 bytes (a very unscientific guesstimate :P) isn't too
bad - this usually disappears in memory fragmentation anyway. But 780
bytes feels a bit aggressive. Then again, the code might become more
error-prone, I dunno.

karste...@dcon.de

unread,
Jan 4, 2011, 6:23:12 PM1/4/11
to kusm...@gmail.com, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer

Erik Faye-Lund <kusm...@gmail.com> wrote on 28.12.2010 18:05:08:

> On Fri, Dec 24, 2010 at 3:59 PM,  <karste...@dcon.de> wrote:
> >
> > Karsten Blees/DCon wrote on 23.12.2010 02:06:16:
> >>
> >> Erik Faye-Lund <kusm...@gmail.com> wrote on 21.12.2010 13:26:31:
> >>
> > [snip]
> >>
> >> > But I don't quite agree with the way you've done it; allocating a list
> >> > etc seems like a little too much unnecessary heap-stressing. I'd
> >> > rather see this as walking the handle returned by FindFirstFile, but
> >> > perhaps the logic can be rewritten to be clearer, I dunno.
> >> >
> >>
> >
> > OK, I gave the 'walk the handle' approach a shot, moving FindFirstFile and
> > FindClose out of readdir. Here's what I came up with (sorry for the noise).
> >
>
> Nice! I've reviewed it, and it's much closer to what I'd prefer to
> see. Sorry for the somewhat late reply, but I haven't had much
> git-time recently ;)
>

Thanks for the detailed review. I returned from skiing only yesterday, so no git-time here either :-)

> I think it should be split in at least two patches, possibly three:
> One patch that does all preparations of changing the logic, one that
> does the NO_D_INO_IN_DIRENT-change, and finally one that adds
> Unicode-support.
>

I wouldn't want to split this in too many mini-patches. After all, dirent is only a small part of the Unicode patch, and removing d_ino is not a big deal, IMO. I agree on separating preparation and Unicode patch, though (and the preparation patch is now the first in the series so it can be tested with the rest of the IO APIs still using ANSI functions).

> > Readdir no longer fails if dd_handle is 0, which is a valid handle value
> > (at least the documentation says nothing to the contrary).
>
> No, 0 is not a valid handle:
> http://msdn.microsoft.com/en-us/library/ms220923(v=VS.90).aspx
>
> But it's not generated by FindFirstFile, so not checking for it is an
> improvement.
>

Changed the commit message accordingly.
 
> > Increases the size of dirent.d_name (converting UTF-16 to UTF-8 may grow
> > by factor three in the worst case).
> >
>
> I've questioned (and hopefully answered) the factor of three at the
> end of my reply. Perhaps some of the reasoning could be put in the
> commit message?
>

I've documented the maximum space requirements with the wcstoutf and utftowcs functions. See below for the reasoning.

> >          int dd_stat;          /* 0-based index */
> > -        char dd_name[1];      /* extend struct */
> >  };
>
> While not needed now, dd_name would be if we decided to implement
> rewinddir(). I'm not objecting, just pointing it out.
>

...and then it should hold an absolute path to be independant of chdir.

> >          int len;
> > -        DIR *p;
> > +        DIR *dir;
>
> Why change the name of this variable?
>

For consistency with readdir / closedir, and because I prefer meaningful rather than generic variable names. As almost all occurences of the variable are changed anyway, it can be renamed with minimal impact.
Just removing duplicate code. Utftowcs already combines NULL-check (EINVAL), length-check (ENAMETOOLONG), strcpy with conversion and strlen. Now pointed out in the commit message.

Conditionally appending '/' instead of len-- is to handle the empty string case (search for "*" rather than "/*"). This is more fault tolerant than required by POSIX, but should never happen in Git anyway.

> > +        /* initialize DIR structure and copy first dir entry */
> > +        dir = xmalloc(sizeof(DIR));
>
> OK, now you're changing the logic to die() instead of returning an
> error on out of memory. Not a big deal, but at least worth mentioning
> in the commit message.
>
> I did this in an earlier version of the readdir-patches I did, but
> ended up deciding against it to be more POSIX-ish.
>

Well, this is not exactly a POSIX compatibility contest, and these git support functions are meant to be used, aren't they? Considering the error handling in dir.c/read_directory_recursive, I'd rather die() than let git work on an incomplete directory listing. (And no, I don't want to fix up the error handling in dir.c, at least not in the context of this patch series...)

Added to commit message.

> > -        WIN32_FIND_DATAA buf;
> > -        HANDLE handle;
> > +        WIN32_FIND_DATAW fdata;
>
> Again, changing the type doesn't justify changing the variable name, does it?
>

See above.
Definitely, thanks.

> > +#define NO_D_INO_IN_DIRENT 1
> > +
> >  struct dirent {
> > -        long d_ino;                      /* Always zero. */
>
> Nice. But I think NO_D_INO_IN_DIRENT should be defined in Makefile
> (both for MinGW and MSVC) instead of this header-file.
>

Yep.

> > -        union {
> > -                unsigned short d_reclen; /* Always zero. */
> > -                unsigned char  d_type;   /* Reimplementation adds this */
> > -        };
> > +        unsigned char d_type;      /* file type to prevent lstat after
> > readdir */
>
> Nice. d_reclen isn't even defined in POSIX, we don't need it. I think
> removing it is worth a mention in the commit message, though.
>

Added to commit message.

> > -        char d_name[FILENAME_MAX];       /* File name. */
> <snip>
> > +        char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion)
> > */
>
> There's no point in changing from FILENAME_MAX to MAX_PATH, they are
> both 260 on Windows.
>

All the rest of the code (in mingw.c and dirent.c) uses MAX_PATH, not FILENAME_MAX. Furthermore, we're adapting directly to a Windows API, and MAX_PATH is defined in the Windows SDK headers, while FILENAME_MAX is from the stdio API (which we've largely replaced in mingw.c). So I think MAX_PATH is better here, even if both constants have the same value.

> I'm having a bit of problems understanding the "* 3"-part, so please
> allow me to think out loud a bit ;)
>
> The longest UTF-8 encoding is 6 8-bit bytes for one code-point. And
> the longest UTF-16 encoding is two 16-bit words for one code point.
> But can you simply divide one by the other to get the maximum ratio?
> Wikipedia suggests that 16 bits are stored in the first 3 bytes of
> UTF-8. But UTF-16 doesn't have full 16 bits utilization of it's first
> 16-bit word; it has 63486 code-points (U+0000..U+D7FF and
> U+E000..U+FFFF). So it seems to me that the * 3 is an over-estimation.
> Good.
>

Unicode defines code points in range 0..10FFFF, which is also the maximum range that can be encoded in UTF-16. While UTF-8 is prepared to support code points above 10FFFF, the longest *valid* UTF-8 encoding is actually only 4 bytes. And these 4-byte encodings are represented as surrogate pairs in UTF-16, so it's only two UTF-8 bytes per UTF-16 word. The worst case encoding length of * 3 is for code points 0800..FFFF.

  Code point   | UTF-16 | UTF-8 | ratio
---------------+--------+-------+-------
000000..00007F |    1   |   1   |   1
000080..0007FF |    1   |   2   |   2
000800..00FFFF |    1   |   3   |   3
010000..10FFFF |    2   |   4   |   2

> Also, MAX_PATH is 260, but in reality the limit is DRIVELETTER + ':' +
> '\ ' + max-256-character-path + NUL, (see
> http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx)
>
> Because all of the possible drive letters, ':' '\' and NUL is within
> ASCII, they'll always be one-byte encodings. So MAX_PATH * 3 is again
> an overestimate, "3 + _MAX_FNAME * 3 + 1" would be a slightly more
> accurate estimate. But it's uglier and probably not worth the saved
> memory.
>

The opendir and FindFirstFile APIs only return file/directory names, not paths, so drive letters, ':' and '\' are irrelevant here.

I simply chose MAX_PATH * 3 because WIN32_FIND_DATAW defines "WCHAR cFileName[MAX_PATH];". As this includes the NUL-terminator, which encodes as only one UTF-8 byte, the actual required length would be ((MAX_PATH - 1) * 3) + 1 (so "MAX_PATH * 3" wastes two bytes for simplicity).

I wouldn't want to account for specific file system semantics (as outlined in the article you linked to), just to save a few bytes (or even call GetVolumeInformation to get to the actual lpMaximumComponentLength).

> It COULD be that I was wrong in changing the code from over-allocating
> to a fixed-size field. It just becomes a bit more obvious when you
> multiply it by three. wasting 260 bytes for something that is often,
> let's say, 20-60 bytes (a very unscientific guesstimate :P) isn't too
> bad - this usually disappears in memory fragmentation anyway. But 780
> bytes feels a bit aggressive. Then again, the code might become more
> error-prone, I dunno.

An alternative would be to make d_name a pointer and ALLOC_GROW it as needed, but I don't think that it's worth the trouble. Recursive directory scans will most likely reuse freed DIR structs immediately, so allocating larger fixed sized blocks may even be beneficial over dynamically allocated strings (not that it matters compared to IO costs).

Bye,
Karsten


Changes in v4:
- rebased to current devel (0b44847a)
- separated dirent rewrite and dirent unicode patch
- moved NO_D_INO_IN_DIRENT to Makefile
- improved readdir logic



---
[PATCH 1/7] Prepare dirent for Unicode file name support

Improves the dirent implementation by removing the relics that were once
necessary to plug into the now unused MinGW runtime, in preparation for
Unicode file name support.

Moves FindFirstFile to opendir, and FindClose to closedir, with the
following implications:
- DIR.dd_name is no longer needed
- chdir(one); opendir(relative); chdir(two); readdir() works as expected
  (i.e. lists one/relative instead of two/relative)
- DIR.dd_handle is a valid handle for the entire lifetime of the DIR
  structure
- thus, all checks for dd_handle == INVALID_HANDLE_VALUE and dd_handle == 0
  have been removed
- the special case that the directory has been fully read (which was
  previously explicitly tracked with dd_handle == INVALID_HANDLE_VALUE &&
  dd_stat != 0) is now handled implicitly by the FindNextFile error
  handling code (if a client continues to call readdir after receiving
  NULL, FindNextFile will continue to fail with ERROR_NO_MORE_FILES, to
  the same effect)
- extracting dirent data from WIN32_FIND_DATA is needed in two places, so
  moved to its own method
- GetFileAttributes is no longer needed. The same information can be
  obtained from the FindFirstFile error code, which is ERROR_DIRECTORY if
  the name is NOT a directory (-> ENOTDIR), otherwise we can use
  err_win_to_posix (e.g. ERROR_PATH_NOT_FOUND -> ENOENT). The
  ERROR_DIRECTORY case could be fixed in err_win_to_posix, but this
  probably breaks other functionality.

Removes the ERROR_NO_MORE_FILES check after FindFirstFile (this was
fortunately a NOOP (searching for '*' always finds '.' and '..'),
otherwise the subsequent code would have copied data from an uninitialized
buffer).

Changes malloc to git support function xmalloc, so opendir will die() if
out of memory, rather than failing with ENOMEM and letting git work on
incomplete directory listings (error handling in dir.c is quite sparse).

There are no proper inodes on Windows, so remove dirent.d_ino and #define
NO_D_INO_IN_DIRENT in the Makefile (this skips e.g. an ineffective qsort in
fsck.c).

Removes the union around dirent.d_type and the unused dirent.d_reclen
member (which was necessary for compatibility with the MinGW dirent
runtime, which is no longer used).

Changes FILENAME_MAX to MAX_PATH to conform to the other WIN32 code.

Signed-off-by: Karsten Blees <bl...@dcon.de>
---
 Makefile              |    2 +
 compat/win32/dirent.c |  116 +++++++++++++++++++++++--------------------------
 compat/win32/dirent.h |    8 +---
 3 files changed, 59 insertions(+), 67 deletions(-)

diff --git a/Makefile b/Makefile
index ff35154..f89d81e 100644
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,7 @@ ifeq ($(uname_S),Windows)
         BLK_SHA1 = YesPlease
         NO_POSIX_GOODIES = UnfortunatelyYes
         NATIVE_CRLF = YesPlease
+        NO_D_INO_IN_DIRENT = YesPlease
 
         CC = compat/vcbuild/scripts/clink.pl
         AR = compat/vcbuild/scripts/lib.pl
@@ -1170,6 +1171,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
         NO_INET_PTON = YesPlease
         NO_INET_NTOP = YesPlease
         NO_POSIX_GOODIES = UnfortunatelyYes
+        NO_D_INO_IN_DIRENT = YesPlease
         COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/win32
         COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
         COMPAT_OBJS += compat/mingw.o compat/winansi.o \
diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index 7a0debe..19ba557 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -1,96 +1,91 @@
-#include "../git-compat-util.h"
-#include "dirent.h"
+#include "../../git-compat-util.h"
 
 struct DIR {
         struct dirent dd_dir; /* includes d_type */
         HANDLE dd_handle;     /* FindFirstFile handle */
         int dd_stat;          /* 0-based index */
-        char dd_name[1];      /* extend struct */
 };
 
+static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)
+{
+        /* copy file name from WIN32_FIND_DATA to dirent */
+        memcpy(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
+
+        /* Set file type, based on WIN32_FIND_DATA */
+        if (fdata->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
+                ent->d_type = DT_DIR;
+        else
+                ent->d_type = DT_REG;
+}
+
 DIR *opendir(const char *name)
 {
-        DWORD attrs = GetFileAttributesA(name);
+        char wname[MAX_PATH];
+        WIN32_FIND_DATAA fdata;
+        HANDLE h;
         int len;
-        DIR *p;
+        DIR *dir;
 
-        /* check for valid path */
-        if (attrs == INVALID_FILE_ATTRIBUTES) {
-                errno = ENOENT;
+        /* check that name is not NULL */
+        if (!name) {
+                errno = EINVAL;
                 return NULL;
         }
-
-        /* check if it's a directory */
-        if (!(attrs & FILE_ATTRIBUTE_DIRECTORY)) {
-                errno = ENOTDIR;
-                return NULL;
-        }
-
         /* check that the pattern won't be too long for FindFirstFileA */
         len = strlen(name);
-        if (is_dir_sep(name[len - 1]))
-                len--;
         if (len + 2 >= MAX_PATH) {
                 errno = ENAMETOOLONG;
                 return NULL;
         }
-
-        p = malloc(sizeof(DIR) + len + 2);
-        if (!p)
+        /* copy name to temp buffer */
+        memcpy(wname, name, len + 1);
+
+        /* append optional '/' and wildcard '*' */
+        if (len && !is_dir_sep(wname[len - 1]))
+                wname[len++] = '/';
+        wname[len++] = '*';
+        wname[len] = 0;
+
+        /* open find handle */
+        h = FindFirstFileA(wname, &fdata);
+        if (h == INVALID_HANDLE_VALUE) {
+                DWORD err = GetLastError();
+                errno = (err == ERROR_DIRECTORY) ? ENOTDIR : err_win_to_posix(err);
                 return NULL;
+        }
 
-        memset(p, 0, sizeof(DIR) + len + 2);
-        strcpy(p->dd_name, name);
-        p->dd_name[len] = '/';
-        p->dd_name[len+1] = '*';
-
-        p->dd_handle = INVALID_HANDLE_VALUE;
-        return p;
+        /* initialize DIR structure and copy first dir entry */
+        dir = xmalloc(sizeof(DIR));
+        dir->dd_handle = h;
+        dir->dd_stat = 0;
+        finddata2dirent(&dir->dd_dir, &fdata);
+        return dir;
 }
 
 struct dirent *readdir(DIR *dir)
 {
-        WIN32_FIND_DATAA buf;
-        HANDLE handle;
-
-        if (!dir || !dir->dd_handle) {
+        if (!dir) {
                 errno = EBADF; /* No set_errno for mingw */
                 return NULL;
         }
 
-        if (dir->dd_handle == INVALID_HANDLE_VALUE && dir->dd_stat == 0) {
-                DWORD lasterr;
-                handle = FindFirstFileA(dir->dd_name, &buf);
-                lasterr = GetLastError();
-                dir->dd_handle = handle;
-                if (handle == INVALID_HANDLE_VALUE && (lasterr != ERROR_NO_MORE_FILES)) {
-                        errno = err_win_to_posix(lasterr);
+        /* if first entry, dirent has already been set up by opendir */
+        if (dir->dd_stat) {
+                /* get next entry and convert from WIN32_FIND_DATA to dirent */
+                WIN32_FIND_DATAA fdata;
+                if (FindNextFileA(dir->dd_handle, &fdata)) {
+                        finddata2dirent(&dir->dd_dir, &fdata);
+                } else {
+                        DWORD lasterr = GetLastError();
+                        /* POSIX says you shouldn't set errno when readdir can't
+                           find any more files; so, if another error we leave it set. */
+                        if (lasterr != ERROR_NO_MORE_FILES)
+                                errno = err_win_to_posix(lasterr);
                         return NULL;
                 }
-        } else if (dir->dd_handle == INVALID_HANDLE_VALUE) {
-                return NULL;
-        } else if (!FindNextFileA(dir->dd_handle, &buf)) {
-                DWORD lasterr = GetLastError();
-                FindClose(dir->dd_handle);
-                dir->dd_handle = INVALID_HANDLE_VALUE;
-                /* POSIX says you shouldn't set errno when readdir can't
-                   find any more files; so, if another error we leave it set. */
-                if (lasterr != ERROR_NO_MORE_FILES)
-                        errno = err_win_to_posix(lasterr);
-                return NULL;
         }
 
-        /* We get here if `buf' contains valid data.  */
-        strcpy(dir->dd_dir.d_name, buf.cFileName);
         ++dir->dd_stat;
-
-        /* Set file type, based on WIN32_FIND_DATA */
-        dir->dd_dir.d_type = 0;
-        if (buf.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
-                dir->dd_dir.d_type |= DT_DIR;
-        else
-                dir->dd_dir.d_type |= DT_REG;
-
         return &dir->dd_dir;
 }
 
@@ -101,8 +96,7 @@ int closedir(DIR *dir)
                 return -1;
         }
 
-        if (dir->dd_handle != INVALID_HANDLE_VALUE)
-                FindClose(dir->dd_handle);
+        FindClose(dir->dd_handle);
         free(dir);
         return 0;
 }
diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
index 927a25c..8838cd6 100644
--- a/compat/win32/dirent.h
+++ b/compat/win32/dirent.h
@@ -9,12 +9,8 @@ typedef struct DIR DIR;
 #define DT_LNK     3
 
 struct dirent {
-        long d_ino;                      /* Always zero. */
-        char d_name[FILENAME_MAX];       /* File name. */
-        union {
-                unsigned short d_reclen; /* Always zero. */
-                unsigned char  d_type;   /* Reimplementation adds this */
-        };
+        unsigned char d_type;      /* file type to prevent lstat after readdir */
+        char d_name[MAX_PATH];     /* file name */
 };
 
 DIR *opendir(const char *dirname);
--
1.7.3.4.3789.g0b448



---
[PATCH 3/7] Unicode file names on Windows (dirent)

Changes opendir/readdir to use Windows Unicode APIs and convert between
UTF-8/UTF-16.

Removes parameter checks that are already covered by utftowcs.

Increases the size of dirent.d_name to accommodate the full
WIN32_FIND_DATA.cFileName converted to UTF-8 (UTF-16 to UTF-8 conversion
may grow by factor three in the worst case).

Signed-off-by: Karsten Blees <bl...@dcon.de>
---
 compat/win32/dirent.c |   31 +++++++++++--------------------
 compat/win32/dirent.h |    2 +-
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
index 19ba557..2340d8c 100644
--- a/compat/win32/dirent.c
+++ b/compat/win32/dirent.c
@@ -6,10 +6,10 @@ struct DIR {
         int dd_stat;          /* 0-based index */
 };
 
-static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)
+static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW *fdata)
 {
-        /* copy file name from WIN32_FIND_DATA to dirent */
-        memcpy(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
+        /* convert UTF-16 name to UTF-8 */
+        wcstoutf(ent->d_name, fdata->cFileName, sizeof(ent->d_name));
 
         /* Set file type, based on WIN32_FIND_DATA */
         if (fdata->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
@@ -20,25 +20,16 @@ static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAA *fdata)
 
 DIR *opendir(const char *name)
 {
-        char wname[MAX_PATH];
-        WIN32_FIND_DATAA fdata;
+        wchar_t wname[MAX_PATH];
+        WIN32_FIND_DATAW fdata;
         HANDLE h;
         int len;
         DIR *dir;
 
-        /* check that name is not NULL */
-        if (!name) {
-                errno = EINVAL;
+        /* convert name to UTF-16, check length (-2 for '/' '*') */
+        len = utftowcs(wname, name, MAX_PATH - 2);
+        if (len < 0)
                 return NULL;
-        }
-        /* check that the pattern won't be too long for FindFirstFileA */
-        len = strlen(name);
-        if (len + 2 >= MAX_PATH) {
-                errno = ENAMETOOLONG;
-                return NULL;
-        }
-        /* copy name to temp buffer */
-        memcpy(wname, name, len + 1);
 
         /* append optional '/' and wildcard '*' */
         if (len && !is_dir_sep(wname[len - 1]))
@@ -47,7 +38,7 @@ DIR *opendir(const char *name)
         wname[len] = 0;
 
         /* open find handle */
-        h = FindFirstFileA(wname, &fdata);
+        h = FindFirstFileW(wname, &fdata);
         if (h == INVALID_HANDLE_VALUE) {
                 DWORD err = GetLastError();
                 errno = (err == ERROR_DIRECTORY) ? ENOTDIR : err_win_to_posix(err);
@@ -72,8 +63,8 @@ struct dirent *readdir(DIR *dir)
         /* if first entry, dirent has already been set up by opendir */
         if (dir->dd_stat) {
                 /* get next entry and convert from WIN32_FIND_DATA to dirent */
-                WIN32_FIND_DATAA fdata;
-                if (FindNextFileA(dir->dd_handle, &fdata)) {
+                WIN32_FIND_DATAW fdata;
+                if (FindNextFileW(dir->dd_handle, &fdata)) {
                         finddata2dirent(&dir->dd_dir, &fdata);
                 } else {
                         DWORD lasterr = GetLastError();
diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
index 8838cd6..058207e 100644
--- a/compat/win32/dirent.h
+++ b/compat/win32/dirent.h
@@ -10,7 +10,7 @@ typedef struct DIR DIR;
 
 struct dirent {
         unsigned char d_type;      /* file type to prevent lstat after readdir */
-        char d_name[MAX_PATH];     /* file name */
+        char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
 };
 
 DIR *opendir(const char *dirname);
--
1.7.3.4.3789.g0b448

unicode-v4.bundle

Johannes Schindelin

unread,
Jan 4, 2011, 6:52:53 PM1/4/11
to karste...@dcon.de, kusm...@gmail.com, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer
Hi,

On Wed, 5 Jan 2011, karste...@dcon.de wrote:

> I wouldn't want to split this in too many mini-patches. After all,
> dirent is only a small part of the Unicode patch, and removing d_ino is
> not a big deal, IMO. I agree on separating preparation and Unicode
> patch, though (and the preparation patch is now the first in the series
> so it can be tested with the rest of the IO APIs still using ANSI
> functions).

I guess that Erik's experience matches mine: small commits are not only
more readable, but bugs have a much harder time to hide. And probably most
importantly: small patches are way more fun to review...

Don't get me wrong: I am really happy that you put in this tremendous
amount of work. But I have to admit that I ran out of git time about
half-way through your mail :-( This should not stop you, of course; I am
sure that you and Erik finalize this series soon and it will go into
'devel'.

Ciao,
Dscho

Erik Faye-Lund

unread,
Jan 4, 2011, 7:08:26 PM1/4/11
to karste...@dcon.de, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer
On Wed, Jan 5, 2011 at 12:23 AM, <karste...@dcon.de> wrote:
>
> Erik Faye-Lund <kusm...@gmail.com> wrote on 28.12.2010 18:05:08:
>
>> On Fri, Dec 24, 2010 at 3:59 PM,  <karste...@dcon.de> wrote:
>> >
>> > Karsten Blees/DCon wrote on 23.12.2010 02:06:16:
>> >>
>> >> Erik Faye-Lund <kusm...@gmail.com> wrote on 21.12.2010 13:26:31:
>> >>
>> > [snip]
>> >>
>> >> > But I don't quite agree with the way you've done it; allocating a
>> >> > list
>> >> > etc seems like a little too much unnecessary heap-stressing. I'd
>> >> > rather see this as walking the handle returned by FindFirstFile, but
>> >> > perhaps the logic can be rewritten to be clearer, I dunno.
>> >> >
>> >>
>> >
>> > OK, I gave the 'walk the handle' approach a shot, moving FindFirstFile
>> > and
>> > FindClose out of readdir. Here's what I came up with (sorry for the
>> > noise).
>> >
>>
>> Nice! I've reviewed it, and it's much closer to what I'd prefer to
>> see. Sorry for the somewhat late reply, but I haven't had much
>> git-time recently ;)
>>
>
> Thanks for the detailed review. I returned from skiing only yesterday, so no
> git-time here either :-)
>

I don't have time for a thorough review right now, but I'll point out
what I spotted. I think your answers to my questions are very
reasonable.

>> I think it should be split in at least two patches, possibly three:
>> One patch that does all preparations of changing the logic, one that
>> does the NO_D_INO_IN_DIRENT-change, and finally one that adds
>> Unicode-support.
>>
>
> I wouldn't want to split this in too many mini-patches. After all, dirent is
> only a small part of the Unicode patch, and removing d_ino is not a big
> deal, IMO. I agree on separating preparation and Unicode patch, though (and
> the preparation patch is now the first in the series so it can be tested
> with the rest of the IO APIs still using ANSI functions).
>

I wouldn't worry too much about the patch-count. While long
patch-series are a bit more intimidating, smaller patches is much
easier to review because the reviewer has less pretext per patch to
keep in mind.

I would actually suggest to have even one more patch; one that does
the consistency fixes (variable renames / FILENAME_MAX->MAX_PATH etc).

But it's totally up to you. I don't think the current state is at all
that bad. The one split you did is by far the most important.

> @@ -1,96 +1,91 @@
> -#include "../git-compat-util.h"
> -#include "dirent.h"
> +#include "../../git-compat-util.h"
>

Our dirent.h is already included through git-compat-util.h; good change.

>  DIR *opendir(const char *name)
>  {
> -        DWORD attrs = GetFileAttributesA(name);
> +        char wname[MAX_PATH];

Shouldn't the w-prefix appear first in the next patch, which changes the type?

Johannes Schindelin

unread,
Jan 6, 2011, 8:41:51 AM1/6/11
to karste...@dcon.de, kusm...@gmail.com, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer
Hi,

On Thu, 6 Jan 2011, karste...@dcon.de wrote:

> Erik Faye-Lund <kusm...@gmail.com> wrote on 05.01.2011 01:08:26:
>
> > I would actually suggest to have even one more patch; one that does
> > the consistency fixes (variable renames / FILENAME_MAX->MAX_PATH etc).
> >
> > But it's totally up to you. I don't think the current state is at all
> > that bad. The one split you did is by far the most important.
>

> If it's really up to me, I'd like to keep it at that and remember the
> lesson for next time :-) I think there are more pressing issues to spend
> my time on (e.g. Unicode-aware bash and less)

I think it is time to conclude the discussion. IMO the patch series is
important, and Karsten put some serious work into it. Erik, if you're fine
with it, could you apply to 'devel' and push?

Thanks to both of you!
Dscho

Erik Faye-Lund

unread,
Jan 6, 2011, 9:59:26 AM1/6/11
to karste...@dcon.de, Karsten Blees, Johannes Schindelin, Kirill Smelkov, msysGit, Stefan Springer
(Culling Marius from the CC-list, as he keeps bouncing. Does anyone
have an updated e-mail for him?)

On Thu, Jan 6, 2011 at 2:31 PM, <karste...@dcon.de> wrote:


>
> Erik Faye-Lund <kusm...@gmail.com> wrote on 05.01.2011 01:08:26:
>
>> On Wed, Jan 5, 2011 at 12:23 AM,  <karste...@dcon.de> wrote:
>> >
>> > I wouldn't want to split this in too many mini-patches. After all,dirent
>> > is
>> > only a small part of the Unicode patch, and removing d_ino is not a big
>> > deal, IMO. I agree on separating preparation and Unicode patch, though
>> > (and
>> > the preparation patch is now the first in the series so it can be tested
>> > with the rest of the IO APIs still using ANSI functions).
>> >
>>
>> I wouldn't worry too much about the patch-count. While long
>> patch-series are a bit more intimidating, smaller patches is much
>> easier to review because the reviewer has less pretext per patch to
>> keep in mind.
>>
>

> Yes, that's perfectly reasonable from a reviewer's point of view, but from
> an author's perspective, there are also some considerable drawbacks (I'm not
> objecting, just pointing it out :-):
>
> - With long patch series, it's exceedingly more time consuming to build and
> run the test suite for every intermediate state.
>
> - Sometimes you have to throw in additional code that will be rendered
> useless by the next patch, that's just unnecessary noise (e.g. the
> additional error handling in opendir that went away with utftowcs).
>
> - When discovering obvious bugs/inconsistencies in the course of a larger
> change, it's quite disruptive to back out and create a separate patch, so I
> just fixed these inline without further reflection (I think
> FILENAME_MAX->MAX_PATH and removing the static buffer in do_lstat_internal
> are good examples). Unfortunately, 'obvious' is a subjective measure...
>
> - Ripping patches apart as an afterthought, without any real benefit to the
> end result, is really tedious work.
>

(This is 100% philosophical, so don't read too much into it)

A patch is reviewed a lot more times than it's written. And as a
contributer, you're the one relying on other people to review your
work. So I think it's perfectly fine that this burden is on the
contributor.

This of course applies to me as well when I'm contributing patches,
and I know the pain it some times is. But I still think it's worth it.

>> I would actually suggest to have even one more patch; one that does
>> the consistency fixes (variable renames / FILENAME_MAX->MAX_PATH etc).
>>
>> But it's totally up to you. I don't think the current state is at all
>> that bad. The one split you did is by far the most important.
>>
>

> If it's really up to me, I'd like to keep it at that and remember the lesson
> for next time :-) I think there are more pressing issues to spend my time on
> (e.g. Unicode-aware bash and less)
>

OK by me.

>> > @@ -1,96 +1,91 @@
>> > -#include "../git-compat-util.h"
>> > -#include "dirent.h"
>> > +#include "../../git-compat-util.h"
>> >
>>
>> Our dirent.h is already included through git-compat-util.h; good change.
>

> The first thing that puzzled me with your patch was that MinGW's <dirent.h>
> and your "dirent.h" were obviously incompatible, until I discovered
> -Icompat/win32 in the Makefile.

My patch was all about doing our own version, so compatibility with
MinGW-runtime was no longer a goal. It's entirely possible that it
could have been documented a bit better, but this series was rushed a
bit to make sure Junio's 'next' wasn't win32-broken for too long.
sorry about that ;)

> Btw., that change was already in v2 and v3.

Yeah, I just didn't notice it until now.

>>
>> >  DIR *opendir(const char *name)
>> >  {
>> > -        DWORD attrs = GetFileAttributesA(name);
>> > +        char wname[MAX_PATH];
>>
>> Shouldn't the w-prefix appear first in the next patch, which changesthe
>> type?
>

> At the risk of beeing too inventive, if you read 'w' as 'wildcard', it makes
> perfect sense :-)
>

I'm not buying that argument, but I'll gladly fix this up for you :)

Actually, I'm not entirely happy with "wname" for the UTF-16 version
either, because it's a search-pattern (as opposed to a
directory-name). So I'll rename it to "pattern" instead, OK? Then
there's no need to rename it for the following patch...

Thanks for a very enjoyable series :)

Erik Faye-Lund

unread,
Jan 6, 2011, 10:08:23 AM1/6/11
to Johannes Schindelin, karste...@dcon.de, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer

I think it looks like the series is in OK shape, and the end-result is valuable.

But I've only really reviewed the dirent-stuff myself. Has anyone
reviewed the other patches?

Erik Faye-Lund

unread,
Jan 6, 2011, 11:39:33 AM1/6/11
to Johannes Schindelin, karste...@dcon.de, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer

Unfortunately, the series does not play well with MSVC:
* There's a few decl-after-stmt's
* some string-concatination mismatches (the warning string in
warn_if_raster_font is a unicode-string attempted concatinated with a
series of ascii-strings...)
* usage of _iob-array (removed in later versions of MSVC)

...And I'm getting linker errors due to ___wgetmainargs. I'm not sure
what can be done about this...

Erik Faye-Lund

unread,
Jan 6, 2011, 1:10:09 PM1/6/11
to Johannes Schindelin, karste...@dcon.de, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer

Oh yeah, and the "case '0' ... '9':"-construct don't work on MSVC either.

karste...@dcon.de

unread,
Jan 6, 2011, 3:49:22 PM1/6/11
to kusm...@gmail.com, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer, patt...@gmail.com

Erik Faye-Lund <kusm...@gmail.com> wrote on 06.01.2011 19:10:09:

> On Thu, Jan 6, 2011 at 5:39 PM, Erik Faye-Lund <kusm...@gmail.com> wrote:
> >>
> >> But I've only really reviewed the dirent-stuff myself. Has anyone
> >> reviewed the other patches?
> >>

I had a discussion with Pat Thoyts about TerminateThread in an earlier version of the thread safe console patch, I don't know if he reviewed the changes in depth [adding CC].

The rest is only tested by Kirill, I think.

> >
> > Unfortunately, the series does not play well with MSVC:


I guess I have to set up an MSVC build environment, after all.

> > * There's a few decl-after-stmt's


Would it be wise to add -Wdeclaration-after-statement (or even -Werror=declaration-after-statement) to the MinGW CFLAGS, then?

> > * some string-concatination mismatches (the warning string in
> > warn_if_raster_font is a unicode-string attempted concatinated with a
> > series of ascii-strings...)
> > * usage of _iob-array (removed in later versions of MSVC)
> >
> > ...And I'm getting linker errors due to ___wgetmainargs. I'm not sure
> > what can be done about this...
> >
>
> Oh yeah, and the "case '0' ... '9':"-construct don't work on MSVC either.

Could you try if that works, until I've got the MSVC stuff set up?

From d98000fd2542fc5f5abc62da641389934d7ea47d Mon Sep 17 00:00:00 2001
From: Karsten Blees <bl...@dcon.de>
Date: Thu, 6 Jan 2011 21:33:38 +0100
Subject: [PATCH] fix MSVC build problems

---
 compat/mingw.c   |   10 ++++++----
 compat/winansi.c |   28 ++++++++++++----------------
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index d4c0b71..893c3a7 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -146,9 +146,10 @@ void mingw_mark_as_git_dir(const char *dir)
 int mingw_mkdir(const char *path, int mode)
 {
         wchar_t wpath[MAX_PATH];
+        int ret;
         if (utftowcs(wpath, path, MAX_PATH) < 0)
                 return -1;
-        int ret = _wmkdir(wpath);
+        ret = _wmkdir(wpath);
         if (!ret && hide_dotfiles == HIDE_DOTFILES_TRUE) {
                 /*
                  * In Windows a file or dir starting with a dot is not
@@ -574,12 +575,13 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
 {
         FILETIME mft, aft;
         int fh, rc;
+        DWORD attrs;
         wchar_t wfilename[MAX_PATH];
         if (utftowcs(wfilename, file_name, MAX_PATH) < 0)
                 return -1;
 
         /* must have write permission */
-        DWORD attrs = GetFileAttributesW(wfilename);
+        attrs = GetFileAttributesW(wfilename);
         if (attrs != INVALID_FILE_ATTRIBUTES &&
             (attrs & FILE_ATTRIBUTE_READONLY)) {
                 /* ignore errors here; open() will report them */
@@ -2024,8 +2026,8 @@ typedef struct {
         int newmode;
 } _startupinfo;
 
-extern int __wgetmainargs(int *argc, wchar_t ***argv, wchar_t ***env, int glob,
-                _startupinfo *si);
+_CRTIMP int __cdecl __wgetmainargs(int *argc, wchar_t ***argv, wchar_t ***env,
+                int glob, _startupinfo *si);
 
 void mingw_startup()
 {
diff --git a/compat/winansi.c b/compat/winansi.c
index 7e565ad..5672b20 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -73,9 +73,9 @@ static void warn_if_raster_font(void)
 
         if (!(fontFamily & TMPF_TRUETYPE)) {
                 const wchar_t *msg = L"\nWarning: Your console font probably doesn\'t "
-                        "support Unicode. If you experience strange characters in the "
-                        "output, consider switching to a TrueType font such as Lucida "
-                        "Console!\n";
+                        L"support Unicode. If you experience strange characters in the "
+                        L"output, consider switching to a TrueType font such as Lucida "
+                        L"Console!\n";
                 WriteConsoleW(console, msg, wcslen(msg), NULL, NULL);
         }
 }
@@ -354,27 +354,22 @@ static DWORD WINAPI console_thread(LPVOID unused)
 
                         case BRACKET:
                                 /* parse [0-9;]* into array of parameters */
-                                switch (c) {
-                                case '0' ... '9':
+                                if (c >= '0' && c <= '9') {
                                         params[parampos] *= 10;
                                         params[parampos] += c - '0';
-                                        break;
-                                case ';':
+                                } else if (c == ';') {
                                         /* next parameter, bail out if out of bounds */
                                         parampos++;
                                         if (parampos >= MAX_PARAMS)
                                                 state = TEXT;
-                                        break;
-                                case 'q':
+                                } else if (c == 'q') {
                                         /* "\033[q": terminate the thread */
                                         state = EXIT;
-                                        break;
-                                default:
+                                } else {
                                         /* end of escape sequence, change console attributes */
                                         set_attr(c, params, parampos + 1);
                                         start = end;
                                         state = TEXT;
-                                        break;
                                 }
                                 break;
                         }
@@ -431,9 +426,10 @@ static HANDLE duplicate_handle(HANDLE hnd)
         return hresult;
 }
 
-static HANDLE redirect_console(int fd, HANDLE *phcon, int new_fd)
+static HANDLE redirect_console(FILE *stream, HANDLE *phcon, int new_fd)
 {
         /* get original console handle */
+        int fd = _fileno(stream);
         HANDLE hcon = (HANDLE) _get_osfhandle(fd);
         if (hcon == INVALID_HANDLE_VALUE)
                 die_errno("_get_osfhandle(%i) failed", fd);
@@ -446,7 +442,7 @@ static HANDLE redirect_console(int fd, HANDLE *phcon, int new_fd)
                 die_errno("_dup2(%i, %i) failed", new_fd, fd);
 
         /* no buffering, or stdout / stderr will be out of sync */
-        setbuf(&_iob[fd], NULL);
+        setbuf(stream, NULL);
         return (HANDLE) _get_osfhandle(fd);
 }
 
@@ -480,9 +476,9 @@ void winansi_init(void)
 
         /* redirect stdout / stderr to the pipe */
         if (con1)
-                hwrite1 = redirect_console(1, &hconsole1, hwrite_fd);
+                hwrite1 = redirect_console(stdout, &hconsole1, hwrite_fd);
         if (con2)
-                hwrite2 = redirect_console(2, &hconsole2, hwrite_fd);
+                hwrite2 = redirect_console(stderr, &hconsole2, hwrite_fd);
 
         /* close pipe file descriptor (also closes the duped hwrite) */
         close(hwrite_fd);
--
1.7.3.4.3796.g0486.dirty

Johannes Schindelin

unread,
Jan 6, 2011, 8:32:06 PM1/6/11
to karste...@dcon.de, kusm...@gmail.com, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer, patt...@gmail.com
Hi,

On Thu, 6 Jan 2011, karste...@dcon.de wrote:

> Erik Faye-Lund <kusm...@gmail.com> wrote on 06.01.2011 19:10:09:
>
> > On Thu, Jan 6, 2011 at 5:39 PM, Erik Faye-Lund <kusm...@gmail.com>
> wrote:
> > >>
> > >> But I've only really reviewed the dirent-stuff myself. Has anyone
> > >> reviewed the other patches?
>
> I had a discussion with Pat Thoyts about TerminateThread in an earlier
> version of the thread safe console patch, I don't know if he reviewed
> the changes in depth [adding CC].
>
> The rest is only tested by Kirill, I think.

Well, I don't want this work to go to waste just because nobody reviews
the patches. msysGit is experimental after all (if companies want to use
it in production, they're welcome to pay people real money for maintaining
this beast), so I think that we can just merge the patch series and if
problems arise, I invite everybody to fix the bugs.

> > > Unfortunately, the series does not play well with MSVC:
>
> I guess I have to set up an MSVC build environment, after all.

I heard that there is a free-of-cost version that does not even force you
to register for one month.

> > > * There's a few decl-after-stmt's
>
> Would it be wise to add -Wdeclaration-after-statement (or even
> -Werror=declaration-after-statement) to the MinGW CFLAGS, then?

I thought I cherry-picked that patch already... It's from almost precisely
2 years ago! Applied, thanks for the reminder!

Erik, if the current version compiles with MSVC, could you apply it?

Thanks,
Dscho

Erik Faye-Lund

unread,
Jan 7, 2011, 4:52:32 AM1/7/11
to karste...@dcon.de, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer, patt...@gmail.com
On Thu, Jan 6, 2011 at 9:49 PM, <karste...@dcon.de> wrote:
>
> Could you try if that works, until I've got the MSVC stuff set up?
>

Would it be possible for you to re-send this without whitespace
damage? All tabs have turned to spaces, which makes it awkward to
apply...

Johannes Schindelin

unread,
Jan 7, 2011, 5:27:43 AM1/7/11
to Erik Faye-Lund, karste...@dcon.de, Karsten Blees, Kirill Smelkov, Marius Storm-Olsen, msysGit, Stefan Springer, patt...@gmail.com
Hi,

Even more convenient (for application, not so much for review) would be a
public repository from which to pull... I hear github.com is a pretty neat
site :-)

Ciao,
Dscho

karste...@dcon.de

unread,
Jan 7, 2011, 7:18:20 AM1/7/11
to kusm...@gmail.com, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, patt...@gmail.com, Stefan Springer
Here's last night's result as attachment (setting up MSVC wasn't that difficult after all).


0001-fix-MSVC-build-problems.patch

karste...@dcon.de

unread,
Jan 7, 2011, 7:25:35 AM1/7/11
to Johannes Schindelin, Karsten Blees, Kirill Smelkov, kusm...@gmail.com, Marius Storm-Olsen, msysGit, patt...@gmail.com, Stefan Springer

Johannes Schindelin <Johannes....@gmx.de> wrote on 07.01.2011 02:32:06:

> > > > * There's a few decl-after-stmt's
> >
> > Would it be wise to add -Wdeclaration-after-statement (or even
> > -Werror=declaration-after-statement) to the MinGW CFLAGS, then?
>
> I thought I cherry-picked that patch already... It's from almost precisely
> 2 years ago! Applied, thanks for the reminder!
>

Isn't -Wold-style-definition in combination with -Werror a bit restrictive? This results in quite a few compile errors, mostly in 3rd-party code (nedmalloc, fnmatch etc.).

Johannes Schindelin

unread,
Jan 7, 2011, 7:56:39 AM1/7/11
to karste...@dcon.de, msysGit
Hi,

Oh, you're right. I only test-compiled on Linux. So what would you prefer:
skip -Werror or -Wold-style-definitions? (I'd rather not change nedmalloc
and fnmatch, since we might want to keep up-to-date on those and merging
can be a bitch.)

Ciao,
Dscho

karste...@dcon.de

unread,
Jan 7, 2011, 8:10:47 AM1/7/11
to Johannes Schindelin, msysGit
I don't think we need to check old-style definitions (not even as a warning)...


0001-Allow-old-style-definitions.patch

Erik Faye-Lund

unread,
Jan 7, 2011, 10:12:23 AM1/7/11
to karste...@dcon.de, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, patt...@gmail.com, Stefan Springer

Thanks. This works apart from this part:

libgit.lib(msvc.obj) : error LNK2001: unresolved external symbol ___wgetmainargs

Erik Faye-Lund

unread,
Jan 7, 2011, 10:30:50 AM1/7/11
to Johannes Schindelin, karste...@dcon.de, Kirill Smelkov, msysGit, Stefan Springer, patt...@gmail.com
On Fri, Jan 7, 2011 at 2:32 AM, Johannes Schindelin
<Johannes....@gmx.de> wrote:
> Hi,
>
> On Thu, 6 Jan 2011, karste...@dcon.de wrote:
>
>> Erik Faye-Lund <kusm...@gmail.com> wrote on 06.01.2011 19:10:09:
>>
>> > On Thu, Jan 6, 2011 at 5:39 PM, Erik Faye-Lund <kusm...@gmail.com>
>> wrote:
>> > >>
>> > >> But I've only really reviewed the dirent-stuff myself. Has anyone
>> > >> reviewed the other patches?
>>
>> I had a discussion with Pat Thoyts about TerminateThread in an earlier
>> version of the thread safe console patch, I don't know if he reviewed
>> the changes in depth [adding CC].
>>
>> The rest is only tested by Kirill, I think.
>
> Well, I don't want this work to go to waste just because nobody reviews
> the patches. msysGit is experimental after all (if companies want to use
> it in production, they're welcome to pay people real money for maintaining
> this beast), so I think that we can just merge the patch series and if
> problems arise, I invite everybody to fix the bugs.
>
>> > > Unfortunately, the series does not play well with MSVC:
>>
>> I guess I have to set up an MSVC build environment, after all.
>
> I heard that there is a free-of-cost version that does not even force you
> to register for one month.
>

Indeed, Visual Studio 2008 Express works fine for Git development, and
can be downloaded from:
http://www.microsoft.com/express/Downloads/

They claim that you should register (which is free) after a month, but
this doesn't seem to be enforced. I've been running it on my netbook
for a couple of months without registering.

> I thought I cherry-picked that patch already... It's from almost precisely
> 2 years ago! Applied, thanks for the reminder!
>
> Erik, if the current version compiles with MSVC, could you apply it?

I think the series needs a second round. I just want to see that the
MSVC-fixes gets squashed appropriately, and that the end result looks
sane. But it seems like the series is definitely getting there.

I do have some objections agains one of the patches, though; the whole
approach of replacing the environment at start-up with an
UTF-8-version seems too intrusive to me. wgetenv + utftowcs should be
sufficient and more importantly: not cost a whole lot of startup-time
for programs that doesn't even consult the environment. Git contains a
LOT of small programs starting each other, and increasing the
startup-time could end up hurting a lot.

karste...@dcon.de

unread,
Jan 7, 2011, 7:10:44 PM1/7/11
to kusm...@gmail.com, Karsten Blees, Johannes Schindelin, Kirill Smelkov, Marius Storm-Olsen, msysGit, patt...@gmail.com, Stefan Springer
Hmm...the Makefile change (i.e. -MT -> -MD, no -NODEFAULTLIB:msvcrt.lib) should have fixed that. Did you do a 'make clean'? I noticed that changing mingw.c doesn't trigger a rebuild of msvc.c, and library references are compiled into the .o files, I think (___wgetmainargs vs. __imp____wgetmainargs makes a huge difference).

Johannes Schindelin

unread,
Jan 7, 2011, 7:52:57 PM1/7/11
to karste...@dcon.de, msysGit
Hi,

On Fri, 7 Jan 2011, karste...@dcon.de wrote:

> Johannes Schindelin <Johannes....@gmx.de> wrote on 07.01.2011

> 13:56:39:


>
> > On Fri, 7 Jan 2011, karste...@dcon.de wrote:
> >
> > > Johannes Schindelin <Johannes....@gmx.de> wrote on 07.01.2011
> > > 02:32:06:
> > >
> > > > > > > * There's a few decl-after-stmt's
> > > > >
> > > > > Would it be wise to add -Wdeclaration-after-statement (or even
> > > > > -Werror=declaration-after-statement) to the MinGW CFLAGS, then?
> > > >
> > > > I thought I cherry-picked that patch already... It's from almost
> > > > precisely 2 years ago! Applied, thanks for the reminder!
> > >
> > > Isn't -Wold-style-definition in combination with -Werror a bit
> > > restrictive? This results in quite a few compile errors, mostly in
> > > 3rd-party code (nedmalloc, fnmatch etc.).
> >
> > Oh, you're right. I only test-compiled on Linux. So what would you
> > prefer: skip -Werror or -Wold-style-definitions? (I'd rather not
> > change nedmalloc and fnmatch, since we might want to keep up-to-date
> > on those and merging can be a bitch.)
>

> I don't think we need to check old-style definitions (not even as a
> warning)...

I just applied a commit that prevents the old-style check for compat/.
This is not set in stone, though...

Ciao,
Dscho

Johannes Schindelin

unread,
Jan 8, 2011, 9:28:16 AM1/8/11
to Kirill Smelkov, Erik Faye-Lund, msy...@googlegroups.com
Hi all,

I just pushed an msysgit.git branch called 'wine' which I need to be able
to even start msysGit under Wine. If nobody objects, I will merge it into
'devel' soon.

On Tue, 21 Dec 2010, Kirill Smelkov wrote:

> > >> On Tue, Dec 21, 2010 at 3:23 PM, Kirill Smelkov
> > >> <ki...@landau.phys.spbu.ru> wrote:
> > >> > As stated on MSDN. The fix is needed for InnoSetup's iscc.exe
> > >> > which, bedore updating, first removes 'MAINICON' this way.

What's the status on that patch?

Thanks,
Dscho

karste...@dcon.de

unread,
Jan 8, 2011, 1:57:55 PM1/8/11
to kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer
Erik Faye-Lund <kusm...@gmail.com> wrote on 07.01.2011 16:30:50:
> On Fri, Jan 7, 2011 at 2:32 AM, Johannes Schindelin
> <Johannes....@gmx.de> wrote:
[snip]
> >
> > Erik, if the current version compiles with MSVC, could you apply it?
>
> I think the series needs a second round. I just want to see that the
> MSVC-fixes gets squashed appropriately, and that the end result looks
> sane. But it seems like the series is definitely getting there.
>


Here you go. Tests on MinGW are looking good, but I haven't figured out yet how to run the test suite for the MSVC version.

Changes in v5:
- rebased to current devel (fd6937e0)
- created separate patches for many minor fixes / improvements
- fixed MSVC compile issues
- opendir: renamed wname to pattern
- added unicode-aware mktemp (used by both mkstemp and mkdtemp)
- mingw_startup: initialized _startupinfo before calling __wgetmainargs



> I do have some objections agains one of the patches, though; the whole
> approach of replacing the environment at start-up with an
> UTF-8-version seems too intrusive to me. wgetenv + utftowcs should be
> sufficient and more importantly: not cost a whole lot of startup-time
> for programs that doesn't even consult the environment. Git contains a
> LOT of small programs starting each other, and increasing the
> startup-time could end up hurting a lot.

Doing the conversion on startup is MUCH easier to implement, however. With lazy conversion, as you suggested, we'd have to keep track of every value returned from getenv. Nico Rieck's version followed this approach (see branch work/utf8), but it's not half way there yet (CreateProcess still 100% ANSI, unsetenv mixes ANSI and wide environments, segfaults and memory leaks to fix...).

But I actually didn't check whether my 'faster lookups' scheme would add up, so I threw in a few QueryPerformanceCounter() calls today to see what's going on.

With ~50 environment entries, the additional startup time (including allocation, utf-8 conversion and initial qsort) is ~30µs. Getenv with bsearch is ~0.2µs, while linear search (what _wgetenv would do) is ~2µs (just lookup, without conversion and allocating / tracking returned values yet).

Launching an external git command calls getenv 5 times (2*PATH, 2*GIT_TRACE, 1*GIT_EXEC_PATH), so you're correct that in this scenario, the sorted version is slower (30+5*0.2=31 vs. 5*2=10 µs). Builtin plumbing commands (ls-tree, cat-file...), however, call getenv ~25 times, here the sorted version wins slightly (35 vs. 50 µs). And with builtin porcelain commands (status, log, show...), there are ~45 getenv calls, with the sorted version beeing about twice as fast.

Considering that performance critical commands are usually builtin, and the typical 'LOT of small programs starting each other' case will be porcelain scripts calling builtin plumbing, I think burning time on startup rather than on lookup is actually a Good Thing.

All of these figures are of course negligible compared to starting a process on Windows :-)
unicode-v5.bundle

Johannes Schindelin

unread,
Jan 8, 2011, 2:17:25 PM1/8/11
to karste...@dcon.de, kusm...@gmail.com, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer
Hi,

On Sat, 8 Jan 2011, karste...@dcon.de wrote:

> Erik Faye-Lund <kusm...@gmail.com> wrote on 07.01.2011 16:30:50:
> > On Fri, Jan 7, 2011 at 2:32 AM, Johannes Schindelin
> > <Johannes....@gmx.de> wrote:
> [snip]
> > >
> > > Erik, if the current version compiles with MSVC, could you apply it?
> >
> > I think the series needs a second round. I just want to see that the
> > MSVC-fixes gets squashed appropriately, and that the end result looks
> > sane. But it seems like the series is definitely getting there.
> >
>
> Here you go.

Thanks! I force-pushed a new kb/unicode to repo.or.cz.

> > I do have some objections agains one of the patches, though; the whole
> > approach of replacing the environment at start-up with an
> > UTF-8-version seems too intrusive to me. wgetenv + utftowcs should be
> > sufficient and more importantly: not cost a whole lot of startup-time
> > for programs that doesn't even consult the environment. Git contains a
> > LOT of small programs starting each other, and increasing the
> > startup-time could end up hurting a lot.
>
> Doing the conversion on startup is MUCH easier to implement, however.

I'm more wary of the consequences of double-conversion, but also wary of
the unintended consequences of converting an environment that contains
non-UTF-8 strings.

Your analysis about the performance is impressively complete, though.

Ciao,
Dscho

karste...@dcon.de

unread,
Jan 8, 2011, 4:16:18 PM1/8/11
to Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Stefan Springer

Johannes Schindelin <Johannes....@gmx.de> wrote on 08.01.2011 20:17:25:

> Hi,
>
> On Sat, 8 Jan 2011, karste...@dcon.de wrote:
>
> > Erik Faye-Lund <kusm...@gmail.com> wrote on 07.01.2011 16:30:50:
> > > On Fri, Jan 7, 2011 at 2:32 AM, Johannes Schindelin
> > > <Johannes....@gmx.de> wrote:
> > [snip]
> > > >
> > > > Erik, if the current version compiles with MSVC, could you apply it?
> > >
> > > I think the series needs a second round. I just want to see that the
> > > MSVC-fixes gets squashed appropriately, and that the end result looks
> > > sane. But it seems like the series is definitely getting there.
> > >
> >
> > Here you go.
>
> Thanks! I force-pushed a new kb/unicode to repo.or.cz.


Thanks

>
> > > I do have some objections agains one of the patches, though; the whole
> > > approach of replacing the environment at start-up with an
> > > UTF-8-version seems too intrusive to me. wgetenv + utftowcs should be
> > > sufficient and more importantly: not cost a whole lot of startup-time
> > > for programs that doesn't even consult the environment. Git contains a
> > > LOT of small programs starting each other, and increasing the
> > > startup-time could end up hurting a lot.
> >
> > Doing the conversion on startup is MUCH easier to implement, however.
>
> I'm more wary of the consequences of double-conversion, but also wary of
> the unintended consequences of converting an environment that contains
> non-UTF-8 strings.
>


But that wouldn't change with a lazy conversion approach, and also applies to arguments and file names.

Assuming that we get valid UTF-16 from Windows, wcstoutf is straightforward and lossless. Converting back is lossless, too, unless git changes anything, and in this case utftowcs has some tweaks to make the best of invalid UTF-8. The conversion is certainly not symmetric, but that's exactly the central point of issue 80, I think: that you can check out no matter how broken the encoding of the repository may be...

You won't ever get around the double-conversion when porting software using char* to an OS that natively uses wchar_t* (even the Win32 *A functions do that under the hood). And the best we can do with respect to invalid UTF-8 is to treat it the same way in file names, arguments, environment and console output, IMO.

Kirill Smelkov

unread,
Jan 9, 2011, 9:28:32 AM1/9/11
to Johannes Schindelin, Erik Faye-Lund, msy...@googlegroups.com
On Sat, Jan 08, 2011 at 03:28:16PM +0100, Johannes Schindelin wrote:
> Hi all,
>
> I just pushed an msysgit.git branch called 'wine' which I need to be able
> to even start msysGit under Wine. If nobody objects, I will merge it into
> 'devel' soon.

> commit cc878bcc50cc5e79ac29a8844dac0b1c89df9e6b
> Author: Johannes Schindelin <johannes....@gmx.de>
> Date: Sat Jan 8 14:22:30 2011 +0100
>
> Add the rebase.exe tool to be able to fix msys-1.0.dll
>
> Depending on your setup, the base address of msys-1.0.dll might clash
> with other .dll files' addresses. The symptom looks like this:
>
> assertion "!inheap (s)" failed: file "../../../../src/winsup/cygwin/cygheap.cc", line 309
>
> In such a case, simply fire up cmd.exe, change the working directory to
> $MSYSGIT_ROOT/bin and execute something like
>
> ..\mingw\bin\rebase -b 0x67000000 msys-1.0.dll
>
> The rebase.exe tool was cherry-picked from the 'full' branch, where it
> was built from /src/rebase/.

I'd like to first clarify why we need this.

On several x86 hosts with Debian Lenny and Squeeze and Wine regularly
updated from upstream Git for ~ 1 year already I've been able to run
msysGit just fine without rebasing msys-1.0.dll .

Are you maybe on x86-64 or if not, what is your setup?


> On Tue, 21 Dec 2010, Kirill Smelkov wrote:
>
> > > >> On Tue, Dec 21, 2010 at 3:23 PM, Kirill Smelkov
> > > >> <ki...@landau.phys.spbu.ru> wrote:
> > > >> > As stated on MSDN. The fix is needed for InnoSetup's iscc.exe
> > > >> > which, bedore updating, first removes 'MAINICON' this way.
>
> What's the status on that patch?

They've merged it long ago (in usual style of stripping "extra
comments"):

http://source.winehq.org/git/wine.git/?a=commit;h=d5cb11a45a3aab5ac372e571b55e1fc35709f13d

> Thanks,
> Dscho


Thanks,
Kirill


P.S. Sorry for being silent on other topics. Rebuilding gettext was a
major pain which took lots of time...

Johannes Schindelin

unread,
Jan 9, 2011, 10:55:51 AM1/9/11
to Kirill Smelkov, Erik Faye-Lund, msy...@googlegroups.com
Hi,

On Sun, 9 Jan 2011, Kirill Smelkov wrote:

> On Sat, Jan 08, 2011 at 03:28:16PM +0100, Johannes Schindelin wrote:
>
> > I just pushed an msysgit.git branch called 'wine' which I need to be
> > able to even start msysGit under Wine. If nobody objects, I will merge
> > it into 'devel' soon.
>
> > commit cc878bcc50cc5e79ac29a8844dac0b1c89df9e6b
> > Author: Johannes Schindelin <johannes....@gmx.de>
> > Date: Sat Jan 8 14:22:30 2011 +0100
> >
> > Add the rebase.exe tool to be able to fix msys-1.0.dll
> >
> > Depending on your setup, the base address of msys-1.0.dll might clash
> > with other .dll files' addresses. The symptom looks like this:
> >
> > assertion "!inheap (s)" failed: file "../../../../src/winsup/cygwin/cygheap.cc", line 309
> >
> > In such a case, simply fire up cmd.exe, change the working directory to
> > $MSYSGIT_ROOT/bin and execute something like
> >
> > ..\mingw\bin\rebase -b 0x67000000 msys-1.0.dll
> >
> > The rebase.exe tool was cherry-picked from the 'full' branch, where it
> > was built from /src/rebase/.
>
> I'd like to first clarify why we need this.
>
> On several x86 hosts with Debian Lenny and Squeeze and Wine regularly
> updated from upstream Git for ~ 1 year already I've been able to run
> msysGit just fine without rebasing msys-1.0.dll .
>
> Are you maybe on x86-64 or if not, what is your setup?

This is a 32-bit Ubuntu (IIRC 10.04) with a Wine built from 8bcc61fb.

> > On Tue, 21 Dec 2010, Kirill Smelkov wrote:
> >
> > > > >> On Tue, Dec 21, 2010 at 3:23 PM, Kirill Smelkov
> > > > >> <ki...@landau.phys.spbu.ru> wrote:
> > > > >> > As stated on MSDN. The fix is needed for InnoSetup's iscc.exe
> > > > >> > which, bedore updating, first removes 'MAINICON' this way.
> >
> > What's the status on that patch?
>
> They've merged it long ago (in usual style of stripping "extra
> comments"):
>
> http://source.winehq.org/git/wine.git/?a=commit;h=d5cb11a45a3aab5ac372e571b55e1fc35709f13d

Thanks!

> P.S. Sorry for being silent on other topics. Rebuilding gettext was a
> major pain which took lots of time...

Oh wow! Impressive stuff...

Ciao,
Dscho

Erik Faye-Lund

unread,
Jan 9, 2011, 11:13:06 AM1/9/11
to Kirill Smelkov, Johannes Schindelin, msy...@googlegroups.com
On Sun, Jan 9, 2011 at 3:28 PM, Kirill Smelkov <ki...@landau.phys.spbu.ru> wrote:
> P.S. Sorry for being silent on other topics. Rebuilding gettext was a
> major pain which took lots of time...
>

Gettext for msysGit? If so, I hope you were aware that I already
submitted the patches for that - I just haven't merged it yet...

Kirill Smelkov

unread,
Jan 10, 2011, 12:41:52 PM1/10/11
to karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, msysGit, patt...@gmail.com, Stefan Springer
On Sat, Jan 08, 2011 at 07:57:55PM +0100, karste...@dcon.de wrote:
> Erik Faye-Lund <kusm...@gmail.com> wrote on 07.01.2011 16:30:50:
> > On Fri, Jan 7, 2011 at 2:32 AM, Johannes Schindelin
> > <Johannes....@gmx.de> wrote:
> [snip]
> > >
> > > Erik, if the current version compiles with MSVC, could you apply it?
> >
> > I think the series needs a second round. I just want to see that the
> > MSVC-fixes gets squashed appropriately, and that the end result looks
> > sane. But it seems like the series is definitely getting there.
> >
>
> Here you go. Tests on MinGW are looking good, but I haven't figured out
> yet how to run the test suite for the MSVC version.
>
> Changes in v5:
> - rebased to current devel (fd6937e0)
> - created separate patches for many minor fixes / improvements
> - fixed MSVC compile issues
> - opendir: renamed wname to pattern
> - added unicode-aware mktemp (used by both mkstemp and mkdtemp)
> - mingw_startup: initialized _startupinfo before calling __wgetmainargs

Karsten, thanks for doing this.

I'm slowly starting to play again with this topic so here is the
first nitpick:


> commit e77f07658cc02a7a1f49425d067b32f8a2b87702
> Author: Karsten Blees <bl...@dcon.de>
> Date: Fri Jan 7 20:34:30 2011 +0100
>
> Win32: Unicode file name support (dirent)
>
> Changes opendir/readdir to use Windows Unicode APIs and convert between
> UTF-8/UTF-16.
>
> Removes parameter checks that are already covered by utftowcs.
>
> Increases the size of dirent.d_name to accommodate the full
> WIN32_FIND_DATA.cFileName converted to UTF-8 (UTF-16 to UTF-8 conversion
> may grow by factor three in the worst case).
>
> Signed-off-by: Karsten Blees <bl...@dcon.de>
>
> diff --git a/compat/win32/dirent.c b/compat/win32/dirent.c
> index 82a515c..37f56b7 100644
> --- a/compat/win32/dirent.c
> +++ b/compat/win32/dirent.c
> @@ -6,10 +6,10 @@ struct DIR {

[...]

> diff --git a/compat/win32/dirent.h b/compat/win32/dirent.h
> index 8838cd6..058207e 100644
> --- a/compat/win32/dirent.h
> +++ b/compat/win32/dirent.h
> @@ -10,7 +10,7 @@ typedef struct DIR DIR;
>
> struct dirent {
> unsigned char d_type; /* file type to prevent lstat after readdir */
> - char d_name[MAX_PATH]; /* file name */
> + char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> };

Is't one UTF-8 character up to 6 bytes long?


Thanks,
Kirill

karste...@dcon.de

unread,
Jan 10, 2011, 1:02:50 PM1/10/11
to Kirill Smelkov, Johannes Schindelin, kusm...@gmail.com, msysGit, patt...@gmail.com, Stefan Springer
> >     unsigned char d_type;      /* file type to prevent lstat afterreaddir */

> > -   char d_name[MAX_PATH];     /* file name */
> > +   char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> >  };
>
> Is't one UTF-8 character up to 6 bytes long?
>


No, from http://groups.google.com/group/msysgit/msg/b6ac872ef4821ee2 :

Unicode defines code points in range 0..10FFFF, which is also the maximum
range that can be encoded in UTF-16. While UTF-8 is prepared to support
code points above 10FFFF, the longest *valid* UTF-8 encoding is actually
only 4 bytes. And these 4-byte encodings are represented as surrogate
pairs in UTF-16, so it's only two UTF-8 bytes per UTF-16 word. The worst
case encoding length of * 3 is for code points 0800..FFFF.

  Code point   | UTF-16 | UTF-8 | ratio
---------------+--------+-------+-------
000000..00007F |    1   |   1   |   1
000080..0007FF |    1   |   2   |   2
000800..00FFFF |    1   |   3   |   3
010000..10FFFF |    2   |   4   |   2

>
> Thanks,
> Kirill

Erik Faye-Lund

unread,
Jan 10, 2011, 1:12:19 PM1/10/11
to karste...@dcon.de, Kirill Smelkov, Johannes Schindelin, msysGit, patt...@gmail.com, Stefan Springer

So, just to try to boil it down to something simple:
UTF-8 is up to 6 bytes, supporting 1<<32 code-points. But Unicode does
not have any valid code-points above 10FFFF, meaning that any > 3 byte
UTF-8 encoding must be an invalid Unicode code-point. No?

Erik Faye-Lund

unread,
Jan 10, 2011, 1:52:27 PM1/10/11
to karste...@dcon.de, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer

Uhm, what? Maintaining sorted lists etc is a much bigger hassle than
just converting the input of mingw_getenv to utf-16, looking up using
_wgetenv, converting the result to utf-8. And the same thing for
putenv.

This is the same thing as the ANSI version of the environment does,
only with a different encoding.

> With
> lazy conversion, as you suggested, we'd have to keep track of every value
> returned from getenv. Nico Rieck's version followed this approach (see
> branch work/utf8), but it's not half way there yet (CreateProcess still 100%
> ANSI, unsetenv mixes ANSI and wide environments, segfaults and memory leaks
> to fix...).
>

I... don't think you understood my suggestion. I don't suggest we
replace the environment, just that we wrap it up (as we already do for
getenv) and convert the encoding.

Looking a bit deeper, I've found another thing I'm not entirely happy
with: this adds yet another set of UTF-16 <-> UTF-8 recoding paths. We
already have WideCharToMultiByte/MultiByteToWideChar from winapi and
reencode_string from utf8.c. Why yet another?

Kirill Smelkov

unread,
Jan 10, 2011, 1:55:43 PM1/10/11
to Erik Faye-Lund, karste...@dcon.de, Johannes Schindelin, msysGit, patt...@gmail.com, Stefan Springer

Sorry for not looking around before posting, but then imho it would be
better to put this more explicit explanation somewhere into code
comments and also mention it in the patch log.


Thanks,
Kirill

Kirill Smelkov

unread,
Jan 10, 2011, 2:18:53 PM1/10/11
to Johannes Schindelin, Erik Faye-Lund, msy...@googlegroups.com

Hmm, strange. But now with your changes it crashes on my side:

kirr@mini:~/src/tools/git/msysgit$ wine cmd /c msys.bat
0 [main] us 0 init_cheap: VirtualAlloc pointer is null, Win32 error 487
AllocationBase 0x7E475000, BaseAddress 0x7E540000, RegionSize 0xD0000, State 0x2000
Z:\home\kirr\src\tools\git\msysgit\bin\bash.exe: *** Couldn't reserve space for cygwin's heap, Win32 error 487


I guess this needs to be investigated further and either wine or msys to
be fixed...


> > > On Tue, 21 Dec 2010, Kirill Smelkov wrote:

[...]

> > P.S. Sorry for being silent on other topics. Rebuilding gettext was a
> > major pain which took lots of time...
>
> Oh wow! Impressive stuff...

Not quite so. Just running configure takes ~ 4 hours under wine. After
spending ~1.5 days on this I decided to give up and just install
pre-compiled gettext...


On Sun, Jan 09, 2011 at 05:13:06PM +0100, Erik Faye-Lund wrote:


> On Sun, Jan 9, 2011 at 3:28 PM, Kirill Smelkov <ki...@landau.phys.spbu.ru> wrote:
> > P.S. Sorry for being silent on other topics. Rebuilding gettext was a
> > major pain which took lots of time...
> >
>

> Gettext for msysGit? If so, I hope you were aware that I already
> submitted the patches for that - I just haven't merged it yet...

No, I wasn't and thanks for pointing it out.

I need gettext for glib which is needed by mdbtools which is going to be
(I hope) textconv driver for .mdb (ms access) files.

So let's maybe merge your work/gettext branch without waiting for git to
require gettext? I'll have a base to continue and we'll already have
some running time testing gettext bits when git.git merges ab/i18n...


Thanks,
Kirill

Kirill Smelkov

unread,
Jan 10, 2011, 3:43:29 PM1/10/11
to karste...@dcon.de, Johannes Schindelin, Erik Faye-Lund, msysGit, patt...@gmail.com, Stefan Springer

Yes, I saw this.

> I didn't think it would be necessary to
> document the reasoning behind it (after all UTF-x to UTF-y is only
> shifting bits around, so it's actually pretty straightforward). I can add
> that, however, if you think it's necessary.

Yes, please do so.

As I see it, the question was brought up by two people already (Erik and
me), so to avoid confusion for others working on the code, it would be
better to document the rationale.


Thanks,
Kirill

karste...@dcon.de

unread,
Jan 10, 2011, 5:41:44 PM1/10/11
to kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer
No it's not. Getenv is supposed to return a pointer to a string owned by the runtime (usually a pointer into char **environ). When you convert to UTF-8 on demand, you need to keep track of the allocated memory for the converted UTF-8 strings, as user code is not supposed to free() it. This means that you have to maintain two environment lists (the first beeing the original wchar_t environment, the second containing all values that have already been converted and returned from getenv). And all environment functions need to query / update / merge both lists, which is MUCH more complex and error-prone than maintaining only a single list. The sorting we get for free (in terms of complexity, not necessarily performance), as it is required by CreateProcess anyway.

And no, using a static (or even local) buffer for the getenv return value is not an option. While POSIX allows this ("The string pointed to may be overwritten by a subsequent call to getenv()..."), it is incompatible with near every usage in git. E.g. (from ident.c:254):
return fmt_ident(getenv("GIT_AUTHOR_NAME"),
         getenv("GIT_AUTHOR_EMAIL"),
         getenv("GIT_AUTHOR_DATE"),
         flag);

Please have a look at Nico's patch in work/utf8 (bcce1013). While beeing far from complete, it pretty much shows where the 'convert to utf-8 in getenv' approach leads to.

> > With
> > lazy conversion, as you suggested, we'd have to keep track of every value
> > returned from getenv. Nico Rieck's version followed this approach (see
> > branch work/utf8), but it's not half way there yet (CreateProcess still 100%
> > ANSI, unsetenv mixes ANSI and wide environments, segfaults and memory leaks
> > to fix...).
> >
>
> I... don't think you understood my suggestion. I don't suggest we
> replace the environment, just that we wrap it up (as we already do for
> getenv) and convert the encoding.
>


I realize my explanation could have been clearer...

> Looking a bit deeper, I've found another thing I'm not entirely happy
> with: this adds yet another set of UTF-16 <-> UTF-8 recoding paths. We
> already have WideCharToMultiByte/MultiByteToWideChar from winapi and
> reencode_string from utf8.c. Why yet another?


The wcstoutf function is only a small wrapper around WideCharToMultiByte, to adapt to errno and return the correct string length (without trailing \0).

There is no equivalent for utftowcs, though, as both iconv and MultiByteToWideChar fail badly for invalid UTF-8 (and MultiByteToWideChar in different ways, depending on Windows version). While msysgit is exprimental, it would be quite cruel for devoted long-time users if they could no longer use their old, ANSI-encoded msysgit repositories. (This is all explained in the commit message, btw.)

Erik Faye-Lund

unread,
Jan 10, 2011, 6:32:14 PM1/10/11
to karste...@dcon.de, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer
On Mon, Jan 10, 2011 at 11:41 PM, <karste...@dcon.de> wrote:

>
> Erik Faye-Lund <kusm...@gmail.com> wrote on 10.01.2011 19:52:27:
>>
>> Uhm, what? Maintaining sorted lists etc is a much bigger hassle than
>> just converting the input of mingw_getenv to utf-16, looking up using
>> _wgetenv, converting the result to utf-8. And the same thing for
>> putenv.
>>
>> This is the same thing as the ANSI version of the environment does,
>> only with a different encoding.
>>
>
> No it's not. Getenv is supposed to return a pointer to a string owned by the
> runtime (usually a pointer into char **environ). When you convert to UTF-8
> on demand, you need to keep track of the allocated memory for the converted
> UTF-8 strings, as user code is not supposed to free() it. This means that
> you have to maintain two environment lists (the first beeing the original
> wchar_t environment, the second containing all values that have already been
> converted and returned from getenv). And all environment functions need to
> query / update / merge both lists, which is MUCH more complex and
> error-prone than maintaining only a single list.

Wrong. POSIX says the following: "The return value from getenv() may
point to static data which may be overwritten by subsequent calls to
getenv()". Simply converting into a static buffer and returning it is
OK.

See http://pubs.opengroup.org/onlinepubs/009695399/functions/getenv.html
for details.

> The sorting we get for free
> (in terms of complexity, not necessarily performance), as it is required by
> CreateProcess anyway.
>
> And no, using a static (or even local) buffer for the getenv return value is
> not an option. While POSIX allows this ("The string pointed to may be
> overwritten by a subsequent call to getenv()..."), it is incompatible with
> near every usage in git. E.g. (from ident.c:254):
> return fmt_ident(getenv("GIT_AUTHOR_NAME"),
>          getenv("GIT_AUTHOR_EMAIL"),
>          getenv("GIT_AUTHOR_DATE"),
>          flag);
>

Patches fixing this would be nice, I'm sure it'd be appreciated upstreams.

Or we could have a couple of buffers we cycle through (like we do in
some other places already).

> Please have a look at Nico's patch in work/utf8 (bcce1013). While beeing far
> from complete, it pretty much shows where the 'convert to utf-8 in getenv'
> approach leads to.
>

I think this approach is better than what you're proposing, as it's
less intrusive as it only affects the functions in question. But I
think cycling through a set of buffers would be even better.

In fact, I don't really see the point in caching the result either. Is
git really fetching the same environment variables from many different
places without modifying it? And even if it is, is it affecting
performance enough to justify complicating the code?

>> Looking a bit deeper, I've found another thing I'm not entirely happy
>> with: this adds yet another set of UTF-16 <-> UTF-8 recoding paths. We
>> already have WideCharToMultiByte/MultiByteToWideChar from winapi and
>> reencode_string from utf8.c. Why yet another?
>
> The wcstoutf function is only a small wrapper around WideCharToMultiByte, to
> adapt to errno and return the correct string length (without trailing \0).
>

Yeah, I saw that after posting... I don't quite agree that an internal
function should modify errno like that, though (and it prevents you
from querying the length of the string by barfing on "!utf"). And with
the only useful difference left being modifying semantics, I find that
it hides more than it solves (I had to read the code to figure out
it's semantics, I already know WideCharToMultiByte's semantics by
heart).

> There is no equivalent for utftowcs, though, as both iconv and
> MultiByteToWideChar fail badly for invalid UTF-8 (and MultiByteToWideChar in
> different ways, depending on Windows version). While msysgit is exprimental,
> it would be quite cruel for devoted long-time users if they could no longer
> use their old, ANSI-encoded msysgit repositories. (This is all explained in
> the commit message, btw.)
>

Well, it wasn't quite explained well enough for me to understand it.
Are you saying that since i18n.commitEncoding defaults to "utf-8" and
some people have repos with ANSI (I guess you mean Windows 1252 and/or
ISO-8859-1) that out-date i18n.commitEncoding (or Git for Windows'
honoring of it), we should try to detect that and show the characters
as "ANSI"?

If that is the case, my biggest wonder is how this works on Linux.
Perhaps iconv already tries to be graceful? If so, reencode_string
should do the trick, no?

Johannes Schindelin

unread,
Jan 10, 2011, 7:24:43 PM1/10/11
to Kirill Smelkov, Erik Faye-Lund, msy...@googlegroups.com
Hi,

On Mon, 10 Jan 2011, Kirill Smelkov wrote:

> On Sun, Jan 09, 2011 at 04:55:51PM +0100, Johannes Schindelin wrote:
>
> > This is a 32-bit Ubuntu (IIRC 10.04) with a Wine built from 8bcc61fb.
>
> Hmm, strange. But now with your changes it crashes on my side:
>
> kirr@mini:~/src/tools/git/msysgit$ wine cmd /c msys.bat
> 0 [main] us 0 init_cheap: VirtualAlloc pointer is null, Win32 error 487
> AllocationBase 0x7E475000, BaseAddress 0x7E540000, RegionSize 0xD0000, State 0x2000
> Z:\home\kirr\src\tools\git\msysgit\bin\bash.exe: *** Couldn't reserve space for cygwin's heap, Win32 error 487

Could you find out what memory ranges would be okay on your side?
Otherwise we'll never get at a stable situation.

Ciao,
Dscho

Kirill Smelkov

unread,
Jan 11, 2011, 4:06:01 PM1/11/11
to Johannes Schindelin, Erik Faye-Lund, msy...@googlegroups.com

I think it's wrong to start guessing because we don't understand the
cause yet -- just blindly adjusting for kirr & dscho setup could still
result in failure on 3rd case. Let's please fix it properly - I'll try
to look into the problem, but doubt that could do it this week -- too
tight on schedule...

> Otherwise we'll never get at a stable situation.

Other patches besides this one from the 'wine' branch are ok. Let's
merge them to devel now, and investigate heap failure.


Thanks,
Kirill

Johannes Schindelin

unread,
Jan 11, 2011, 5:07:44 PM1/11/11
to Kirill Smelkov, Erik Faye-Lund, msy...@googlegroups.com
Hi,

On Wed, 12 Jan 2011, Kirill Smelkov wrote:

> On Tue, Jan 11, 2011 at 01:24:43AM +0100, Johannes Schindelin wrote:
>
> > On Mon, 10 Jan 2011, Kirill Smelkov wrote:
> >
> > > On Sun, Jan 09, 2011 at 04:55:51PM +0100, Johannes Schindelin wrote:
> > >
> > > > This is a 32-bit Ubuntu (IIRC 10.04) with a Wine built from
> > > > 8bcc61fb.
> > >
> > > Hmm, strange. But now with your changes it crashes on my side:
> > >
> > > kirr@mini:~/src/tools/git/msysgit$ wine cmd /c msys.bat
> > > 0 [main] us 0 init_cheap: VirtualAlloc pointer is null, Win32 error 487
> > > AllocationBase 0x7E475000, BaseAddress 0x7E540000, RegionSize 0xD0000, State 0x2000
> > > Z:\home\kirr\src\tools\git\msysgit\bin\bash.exe: *** Couldn't reserve space for cygwin's heap, Win32 error 487
> >
> > Could you find out what memory ranges would be okay on your side?
>
> I think it's wrong to start guessing because we don't understand the
> cause yet -- just blindly adjusting for kirr & dscho setup could still
> result in failure on 3rd case. Let's please fix it properly - I'll try
> to look into the problem, but doubt that could do it this week -- too
> tight on schedule...

We had that discussion already, the most informative answers came from
Peter Harris and Laurent Boulard:

http://thread.gmane.org/gmane.comp.version-control.msysgit/5931/focus=6139

> > Otherwise we'll never get at a stable situation.
>
> Other patches besides this one from the 'wine' branch are ok. Let's
> merge them to devel now, and investigate heap failure.

I cannot continue without the rebase, because the current msys-1.0.dll
triggers the error I mentioned when I start under wine.

Ciao,
Dscho

Kirill Smelkov

unread,
Jan 12, 2011, 3:12:40 AM1/12/11
to Johannes Schindelin, Erik Faye-Lund, msy...@googlegroups.com
On Tue, Jan 11, 2011 at 11:07:44PM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Wed, 12 Jan 2011, Kirill Smelkov wrote:
>
> > On Tue, Jan 11, 2011 at 01:24:43AM +0100, Johannes Schindelin wrote:
> >
> > > On Mon, 10 Jan 2011, Kirill Smelkov wrote:
> > >
> > > > On Sun, Jan 09, 2011 at 04:55:51PM +0100, Johannes Schindelin wrote:
> > > >
> > > > > This is a 32-bit Ubuntu (IIRC 10.04) with a Wine built from
> > > > > 8bcc61fb.
> > > >
> > > > Hmm, strange. But now with your changes it crashes on my side:
> > > >
> > > > kirr@mini:~/src/tools/git/msysgit$ wine cmd /c msys.bat
> > > > 0 [main] us 0 init_cheap: VirtualAlloc pointer is null, Win32 error 487
> > > > AllocationBase 0x7E475000, BaseAddress 0x7E540000, RegionSize 0xD0000, State 0x2000
> > > > Z:\home\kirr\src\tools\git\msysgit\bin\bash.exe: *** Couldn't reserve space for cygwin's heap, Win32 error 487
> > >
> > > Could you find out what memory ranges would be okay on your side?
> >
> > I think it's wrong to start guessing because we don't understand the
> > cause yet -- just blindly adjusting for kirr & dscho setup could still
> > result in failure on 3rd case. Let's please fix it properly - I'll try
> > to look into the problem, but doubt that could do it this week -- too
> > tight on schedule...
>
> We had that discussion already, the most informative answers came from
> Peter Harris and Laurent Boulard:
>
> http://thread.gmane.org/gmane.comp.version-control.msysgit/5931/focus=6139

Thanks, I'll study those.


> > > Otherwise we'll never get at a stable situation.
> >
> > Other patches besides this one from the 'wine' branch are ok. Let's
> > merge them to devel now, and investigate heap failure.
>
> I cannot continue without the rebase, because the current msys-1.0.dll
> triggers the error I mentioned when I start under wine.

If you want you can apply your rebase to msys.dll now, so I will be in
position to fix it :)


Thanks,
Kirill

karste...@dcon.de

unread,
Jan 16, 2011, 6:25:31 PM1/16/11
to kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer

Hi,

sorry for not replying sooner, I had quite a busy week.

From your last mail I guess that there has been a thorough misunderstanding of the environment code, so I've tried to address this by splitting that patch into six individual features. I found a few minor bugs and some areas of improvement in the process, which I have fixed.

Changes in v6:
- separated argument / environment features into individual patches
- mingw_fopen: reduced wotype conversion buffer size
- make_environment_block: fixed envblk conversion buffer size (was too small for invalid UTF-8), fixed memory leak (tmpenv wasn't free()d), fixed removing entries (next bsearch needs a sorted list, too, so use memmove instead of assigning NULL)
- utftowcs/wcstoutf: improved documentation
- utftowcs: improved invalid UTF-8 detection
- mingw_startup: replaced xstrdup with xmemdupz where the length is known in advance (slightly faster)
- new bsearchenv that returns insert position on failure (enables us to use positioned inserts rather that append + qsort)
- new do_putenv that is used by both mingw_putenv and make_environment_block
- removed mingw_unsetenv, as do_putenv (and thus, mingw_putenv) can now do this, too
- moved initenv logic to mingw_startup (saves additional counting and copying)
- changed environment code to use array semantics rather than pointer arithmetic (easier to read and understand, I hope)




Erik Faye-Lund <kusm...@gmail.com> wrote on 11.01.2011 00:32:14:

> On Mon, Jan 10, 2011 at 11:41 PM,  <karste...@dcon.de> wrote:
> >
> > Erik Faye-Lund <kusm...@gmail.com> wrote on 10.01.2011 19:52:27:
> >>
> >> Uhm, what? Maintaining sorted lists etc is a much bigger hassle than
> >> just converting the input of mingw_getenv to utf-16, looking up using
> >> _wgetenv, converting the result to utf-8. And the same thing for
> >> putenv.
> >>

I think this is the root of the misunderstanding. Having a sorted environment and UTF-8 conversion are two entirely different features.

"Maintaining sorted lists" is nothing new at all, we already had the sorting and a full set of environment management functions (to prepare the environment of the child process for CreateProcess). I only changed this from 'linear search merge before qsort' to 'binary search merge after qsort' (now in patch "Win32: improve environment merging", this reduces the code by ~20% that I've filled up with comments).

Reusing that code to implement a sorted process environment is only a relatively small step (now in patch "Win32: improve environment performance").

As for the UTF-8 conversion, that part is exactly 9 lines of code in mingw_startup (now in patch "Win32: Unicode environment (incoming)"), plus a minimal putenv that is completely replaced later by the performance patch.

> >> This is the same thing as the ANSI version of the environment does,
> >> only with a different encoding.
> >>
> >
> > No it's not. Getenv is supposed to return a pointer to a string owned by the
> > runtime (usually a pointer into char **environ). When you convert to UTF-8
> > on demand, you need to keep track of the allocated memory for the converted
> > UTF-8 strings, as user code is not supposed to free() it. This means that
> > you have to maintain two environment lists (the first beeing the original
> > wchar_t environment, the second containing all values that have already been
> > converted and returned from getenv). And all environment functions need to
> > query / update / merge both lists, which is MUCH more complex and
> > error-prone than maintaining only a single list.
>
> Wrong. POSIX says the following: "The return value from getenv() may
> point to static data which may be overwritten by subsequent calls to
> getenv()". Simply converting into a static buffer and returning it is
> OK.
>
> See http://pubs.opengroup.org/onlinepubs/009695399/functions/getenv.html
> for details.
>

The Linux man page has this to say: "As typically implemented, getenv() returns a pointer to a string within the environment list." - which is fully POSIX compliant, as static data "MAY be overwritten by subsequent calls", not "MUST".

Such a "typical" implementation of getenv() not only allows using several environment variables at a time without copying them, it is also thread safe as long as the environment is not modified.

BOTH of these properties are currently required by git, and, obviously, the C libraries of all currently supported git target platforms provide such a typical implementation.

> > The sorting we get for free
> > (in terms of complexity, not necessarily performance), as it is required by
> > CreateProcess anyway.
> >
> > And no, using a static (or even local) buffer for the getenv return value is
> > not an option. While POSIX allows this ("The string pointed to may be
> > overwritten by a subsequent call to getenv()..."), it is incompatible with
> > near every usage in git. E.g. (from ident.c:254):
> > return fmt_ident(getenv("GIT_AUTHOR_NAME"),
> >          getenv("GIT_AUTHOR_EMAIL"),
> >          getenv("GIT_AUTHOR_DATE"),
> >          flag);
> >
>
> Patches fixing this would be nice, I'm sure it'd be appreciated upstreams.
>

There are about 120 references to getenv() throughout git, often following the pattern above, some in multi threaded code. Adding abundant string copying and synchronization will invariably affect performance on ALL git platforms, with the risk of introducing new memory leaks and deadlocks.

If you're willing to get tarred and feathered on the git mailing list for this, go ahead :-) But I think the result is pretty predictable: Adapting to degenerate API implementations should be in compat.

> Or we could have a couple of buffers we cycle through (like we do in
> some other places already).
>

Cycling through a fixed set of buffers in an API that is used by multiple threads is one of the more elusive programming errors (failing sporadically, or even worse: succeeding with wrong data). If you know of any such patterns in git, please point them out so that we may fix these bugs.

> > Please have a look at Nico's patch in work/utf8 (bcce1013). While beeing far
> > from complete, it pretty much shows where the 'convert to utf-8 in getenv'
> > approach leads to.
> >
>
> I think this approach is better than what you're proposing, as it's
> less intrusive as it only affects the functions in question. But I
> think cycling through a set of buffers would be even better.
>

As I see it, it boils down to these options:
- converting char **environ on startup (my approach, 9 lines of code (28 with minimal putenv))
- keeping a separate list (Nico's approach, ~90 lines of code)
- using static buffers and changing every getenv() usage in git (your approach, ~500 lines of code?)

> In fact, I don't really see the point in caching the result either. Is
> git really fetching the same environment variables from many different
> places without modifying it? And even if it is, is it affecting
> performance enough to justify complicating the code?
>
> >> Looking a bit deeper, I've found another thing I'm not entirely happy
> >> with: this adds yet another set of UTF-16 <-> UTF-8 recoding paths. We
> >> already have WideCharToMultiByte/MultiByteToWideChar from winapi and
> >> reencode_string from utf8.c. Why yet another?
> >
> > The wcstoutf function is only a small wrapper around WideCharToMultiByte, to
> > adapt to errno and return the correct string length (without trailing \0).
> >
>
> Yeah, I saw that after posting... I don't quite agree that an internal
> function should modify errno like that, though

Checking parameters and setting up errno appropriately is needed in nearly all ~30 usages of these functions. You don't want to suggest that we duplicate those 10 lines of code in 30 places, do you?

> (and it prevents you
> from querying the length of the string by barfing on "!utf").

I don't like functions that do different things based on supplied or missing input parameters. If calculating the target buffer size in advance was needed anywhere, I would have added an appropriate helper function ("wcstoutflen(widestring)" is much clearer than "wcstoutf(NULL, widestring, 0)").

> And with
> the only useful difference left being modifying semantics, I find that
> it hides more than it solves

The semantics conform to mbstowcs and wcstombs exactly, and are very similar to strncpy, memcpy and memmove.

But this is rather a coincidence, my primary concern when designing these functions was to require minimal code changes for run-of-the-mill stdio wrappers (+3 lines, as seen in mingw_chdir and mingw_chmod).

> (I had to read the code to figure out
> it's semantics, I already know WideCharToMultiByte's semantics by
> heart).
>

I believe the documentation of wcstoutf / utftowcs is pretty extensive already (at least compared to usual git standards). If you point out which parts were particularly incomprehensive, I can try to improve it.

On the other hand, at least for wcstoutf the documentation is now longer than the source code, so you may actually save time reading the code directly :-)

> > There is no equivalent for utftowcs, though, as both iconv and
> > MultiByteToWideChar fail badly for invalid UTF-8 (and MultiByteToWideChar in
> > different ways, depending on Windows version). While msysgit is exprimental,
> > it would be quite cruel for devoted long-time users if they could no longer
> > use their old, ANSI-encoded msysgit repositories. (This is all explained in
> > the commit message, btw.)
> >
>
> Well, it wasn't quite explained well enough for me to understand it.
> Are you saying that since i18n.commitEncoding defaults to "utf-8" and
> some people have repos with ANSI (I guess you mean Windows 1252 and/or
> ISO-8859-1) that out-date i18n.commitEncoding (or Git for Windows'
> honoring of it), we should try to detect that and show the characters
> as "ANSI"?
>

The whole patch series is actually mostly about (non-ASCII) file names. The fault tolerant utftowcs in particular enables us to check out repos created by current msysgit/msvcgit versions with the new, UTF-8-aware versions.

By "ANSI" I mean any code page that can be used as default Windows "ANSI" code page, i.e. that the Win32 *A functions use to convert to and from native Unicode, and that MultiByteToWideChar / WideCharToMultiByte use with the CP_ACP constant (HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP setting in the registry). This includes all cp125x code pages as well as some double byte character sets for CJK (see http://msdn.microsoft.com/en-us/goglobal/bb964654). Note that UTF-8 (65001) is NOT on this list, you'll get a blue screen on reboot if you try to set this as default code page.

The current msysgit/msvcgit versions (before the patch) exclusively use the Win32 *A API, so git will see and create file names in one of these encodings. As there is no translation between git tree objects and readdir/open/stat..., the tree objects will also contain file names in the default encoding of the originating host (this is also true for Linux, just that the default encoding here will most likely be UTF-8).

When checking out a repository, file names will be passed from git tree objects to CreateFileA, which tries to convert them from the system's default code page to Windows native UTF-16. This will fail if the file name contains invalid byte sequences, e.g. greek file names added on a greek Windows (all chars > 0x80) will most likely be invalid on a korean (cp949) Windows.

I think the same is true for Linux if you try to check out on a file system that treats file names as text (e.g. HFS, JFS, NTFS, NFS, SMB...). Checking out on byte-array based file systems will work, but result in mangled file names (which can be fixed with convmv).


The new msysgit/msvcgit versions (after the patch) will assume file names to be UTF-8-encoded, so now we can flawlessly interoperate with Linux, cygwin-git, JGit etc... Greek file names added with the current msysgit/msvcgit, however, will be mostly invalid UTF-8.

With a conversion function that fails for invalid UTF-8, we wouldn't be able to check out the greek repo even on a greek Windows system.

With MultiByteToWideChar on XP, invalid UTF-8 is dropped, so we would pass empty names to CreateFileW (-> ERROR_INVALID_NAME), or abridged names, so that the files would probably overwrite each other.

With MultiByteToWideChar on Vista/7, we would pass file names containing \ufffd (replacement char) to CreateFileW (-> ERROR_INVALID_NAME).

With the fault tolerant utftowcs, however, we can check out the repository regardless of file name encoding, fix mangled file names in the working copy (as reported by git-status), and then commit as UTF-8 (just as on Linux, except that there is no extra utility).

The UTF-8 migration is much simpler, though, if you keep an old msysgit/msvcgit version around: checkout with the old version, then git-add --all & git-commit with the new one. With git-filter-branch + iconv, it may also be possible to convert the whole history instead of just the branch heads, but I haven't checked that.


The rest of the patch series simply ensures that UTF-8 file names can be printed properly on the console, can be passed in and out of git via command line arguments and environment variables, and are displayed correctly in gitk / git-gui (even with non-UTF-8 system encoding).

Regarding i18n.commitEncoding and i18n.logOutputEncoding, these options are pretty much obsolete with Unicode/UTF-8 support. After all, Windows natively supports Unicode, there is no point in limiting the character set by converting to a legacy encoding.
unicode-v6.bundle

Robin Rosenberg

unread,
Jan 17, 2011, 2:44:50 AM1/17/11
to karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer
karste...@dcon.de skrev 2011-01-17 00:25:
> The new msysgit/msvcgit versions (after the patch) will assume file names
> to be UTF-8-encoded, so now we can flawlessly interoperate with Linux,
> cygwin-git, JGit etc...

JGit/EGit compatible. Sweet. Thanks for taking the lead on this.

-- robin

Alex Budovski

unread,
Jan 17, 2011, 5:27:41 AM1/17/11
to karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer, Robin Rosenberg
Karsten,

I've a few questions about the v6 bundle:

1) Re: d12b818 -- Firstly, the warning numbers are transposed in the
commit message (should be swapped).

"MSVC also warns about implicit casts from non-const to const, which is
stupid - disable C4700 warning." (should be C4090)

It's actually the other way around: from const -> non-const. It warns
when you *lose* the const qualifier, not when you add it, which is
correct. This is a full /error/ in C++, not just a puny warning (like
in C).

E.g.

int *a = 0;
const int *b = a; // OK
a = b; // this loses const-qualifier! => C4090

2) Why disable the use of uninitialized variable warning (C4700)? This
sounds extremely dangerous. You wrote that git code often contains
statements like:

int foo = foo;

This leaves foo with an indeterminate value. Why on earth does git do
this? (I haven't looked at the code). I can't see how this can be
anything but a programming error.

If anything, the warning seems like it should be left on -- to catch
such programming errors.

Alex Budovski

unread,
Jan 17, 2011, 6:00:49 AM1/17/11
to karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Stefan Springer, Robin Rosenberg
Also, e10ecbe8f says

"MSys bash doesn't support unicode at all. Testing a unicode-enabled git
with an encoding-agnostic bash cannot work."

Does that mean these changes to support unicode will only work with
regular cmd.exe and not git bash?

If so that sounds a very problematic, as git bash provides many useful
posix utilities/functionality (grep, sed, awk, perl, bash scripts,
find, etc) that cmd.exe doesn't. It would be a shame to forego all of
these things in order to have unicode support.

Johannes Schindelin

unread,
Jan 17, 2011, 7:47:15 AM1/17/11
to Kirill Smelkov, Erik Faye-Lund, msy...@googlegroups.com, ki...@landau.phys.spbu.ru
Hi,

On Mon, 17 Jan 2011, Kirill Smelkov wrote:

> I think we need to move this forward. I propose that now we merge your
> wine branch with comments in rebase patch itself or in merge commit,
> that rebasing is not working unversally and that e.g. it fails for wine
> on Debian Lenny/Squeeze version, and also that those who need to use
> msysgit on wine are welcome to fix it.
>
> I'm going to investigate it eventually, but this may take a looong time
> because of free time issues...

Bad news. I just tried on a 64-bit current Ubuntu, and the rebased
msys-1.0.dll actually _breaks_ things. So I pulled it out, and will come
up with a script to mark msys-1.0.dll with the "assume unchanged" bit when
I have a need to work on 32-bit again.

I merged the rest (and also the commit adding the build script for
rebase.exe) and pushed things to 'devel'.

Ciao,
Dscho

karste...@dcon.de

unread,
Jan 17, 2011, 7:57:04 AM1/17/11
to Alex Budovski, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer

Alex Budovski <abud...@gmail.com> wrote on 17.01.2011 11:27:41:

> Karsten,
>
> I've a few questions about the v6 bundle:
>
> 1) Re: d12b818 -- Firstly, the warning numbers are transposed in the
> commit message (should be swapped).
>
>  "MSVC also warns about implicit casts from non-const to const, which is
>     stupid - disable C4700 warning." (should be C4090)
>


Oops...

> It's actually the other way around: from const -> non-const. It warns
> when you *lose* the const qualifier, not when you add it, which is
> correct. This is a full /error/ in C++, not just a puny warning (like
> in C).
>
> E.g.
>
> int *a = 0;
> const int *b = a;  // OK
> a = b; // this loses const-qualifier! => C4090
>


Losing const qualifier is currently an error in GCC, so on getting loads of 'changed const qualifier' warnings in MSVC (129, to be exact), I just assumed it would be for adding const. However, it turns out that MSVC is just more picky...most of the warnings seem to originate from casting const* to void (which should be OK, as void indicates the data is treated as opaque). E.g.

const char **x = NULL;
x = realloc(x, 5 * sizeof(*x));

...will be fine on GCC, but produce C4090 on MSVC.

As we don't have that much MSVC-only code, and losing const qualifiers will be caught by GCC, I'd still propose to disable this warning.

> 2) Why disable the use of uninitialized variable warning (C4700)? This
> sounds extremely dangerous. You wrote that git code often contains
> statements like:
>
> int foo = foo;
>
> This leaves foo with an indeterminate value. Why on earth does git do
> this? (I haven't looked at the code). I can't see how this can be
> anything but a programming error.
>
> If anything, the warning seems like it should be left on -- to catch
> such programming errors.

I've blamed some of these, and the commit messages said this is to disable the warning. The subsequent code in fact initializes these variables before using them, just in a non-trivial way (so the compiler doesn't catch it).

There are only 27 occurences, though, and I think we don't lose anything by writing 'int foo = 0;', so these might be fixed in git instead of diabling the warning. Any objections?

karste...@dcon.de

unread,
Jan 17, 2011, 9:05:07 AM1/17/11
to Alex Budovski, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer

Alex Budovski <abud...@gmail.com> wrote on 17.01.2011 12:00:49:

> Also, e10ecbe8f says
>
>  "MSys bash doesn't support unicode at all. Testing a unicode-enabled git
>     with an encoding-agnostic bash cannot work."
>
> Does that mean these changes to support unicode will only work with
> regular cmd.exe and not git bash?
>


No, most things will work quite well, some things will result in mangled text or file names.

MSys uses the Windows default encoding for everything, which will be translated correctly to file names, command line arguments etc. by the Win32 *A functions. Reading or writing UTF-8 encoded files or streams, however, will be problematic (e.g. executing UTF-8-encoded test scripts, or piping UTF-8-encoded git output to MSys less (I've posted some workarounds for this somewhere on this thread)).

But let's tackle one problem at a time, shall we? I'm already hacking on a Unicode-enabled MSys.dll in my spare time, though it's too soon to tell if that'll work...might be simpler to switch to GnuWin32 or Cygwin.

Alex Budovski

unread,
Jan 17, 2011, 8:02:29 PM1/17/11
to karste...@dcon.de, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer
> No, most things will work quite well, some things will result in mangled
> text or file names.
>
> MSys uses the Windows default encoding for everything, which will be
> translated correctly to file names, command line arguments etc. by the Win32
> *A functions. Reading or writing UTF-8 encoded files or streams, however,
> will be problematic (e.g. executing UTF-8-encoded test scripts, or piping
> UTF-8-encoded git output to MSys less (I've posted some workarounds for this
> somewhere on this thread)).
>
> But let's tackle one problem at a time, shall we? I'm already hacking on a
> Unicode-enabled MSys.dll in my spare time, though it's too soon to tell if
> that'll work...might be simpler to switch to GnuWin32 or Cygwin.

Oh, I greatly appreciate the work you've been doing -- was just
curious to see what scope it'd have.

So If I understand correctly, these changes will make git compatible
with non-ANSI filenames?

And also perhaps gitk and git-gui will be able to display utf-8 encoded text?

Anything else?

Just trying to understand what impact this patchset will have more precisely.

Alex Budovski

unread,
Jan 17, 2011, 8:43:29 PM1/17/11
to karste...@dcon.de, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer
> Losing const qualifier is currently an error in GCC, so on getting loads of
> 'changed const qualifier' warnings in MSVC (129, to be exact), I just
> assumed it would be for adding const. However, it turns out that MSVC is
> just more picky...most of the warnings seem to originate from casting const*
> to void (which should be OK, as void indicates the data is treated as
> opaque). E.g.

Well, it's not really OK, as even gcc warns if you assign a const T*
to a nonconst void *. (warning: initialization discards qualifiers
from pointer target type).

>
> const char **x = NULL;
> x = realloc(x, 5 * sizeof(*x));
>
> ...will be fine on GCC, but produce C4090 on MSVC.

I did a bit of investigating; it looks like VC takes issue with
conversion from const T ** => void *, but gcc doesn't. Both compilers
take issue with conversion from const T * => void * (as they should).

gcc is probably correct in that const T ** points to a vector of const
T * 's, which themselves aren't const (only the T's they point to
are).

> As we don't have that much MSVC-only code, and losing const qualifiers will
> be caught by GCC, I'd still propose to disable this warning.

Given the above discussion, agreed.

>
>> 2) Why disable the use of uninitialized variable warning (C4700)? This
>> sounds extremely dangerous. You wrote that git code often contains
>> statements like:
>>
>> int foo = foo;
>>
>> This leaves foo with an indeterminate value. Why on earth does git do
>> this? (I haven't looked at the code). I can't see how this can be
>> anything but a programming error.
>>
>> If anything, the warning seems like it should be left on -- to catch
>> such programming errors.
>
> I've blamed some of these, and the commit messages said this is to disable
> the warning. The subsequent code in fact initializes these variables before
> using them, just in a non-trivial way (so the compiler doesn't catch it).
>
> There are only 27 occurences, though, and I think we don't lose anything by
> writing 'int foo = 0;', so these might be fixed in git instead of diabling
> the warning. Any objections?
>

Indeed, the practice looks suspicious at best. I'd prefer they fix it
as you suggest, and we keep the warning. It could uncover some real
bugs.

karste...@dcon.de

unread,
Jan 18, 2011, 7:44:35 AM1/18/11
to Alex Budovski, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer
Alex Budovski <abud...@gmail.com> wrote on 18.01.2011 02:02:29:

> > No, most things will work quite well, some things will result in mangled
> > text or file names.
> >
> > MSys uses the Windows default encoding for everything, which will be
> > translated correctly to file names, command line arguments etc. bythe Win32

> > *A functions. Reading or writing UTF-8 encoded files or streams, however,
> > will be problematic (e.g. executing UTF-8-encoded test scripts, or piping
> > UTF-8-encoded git output to MSys less (I've posted some workarounds for this
> > somewhere on this thread)).
> >
> > But let's tackle one problem at a time, shall we? I'm already hacking on a
> > Unicode-enabled MSys.dll in my spare time, though it's too soon to tell if
> > that'll work...might be simpler to switch to GnuWin32 or Cygwin.
>
> Oh, I greatly appreciate the work you've been doing -- was just
> curious to see what scope it'd have.
>
> So If I understand correctly, these changes will make git compatible
> with non-ANSI filenames?
>


Yes, I tested with greek and cyrillic file names on my cp1252 Windows box.

But more importantly, by using UTF-8 file names, we can interoperate flawlessly with other UTF-8-based systems. I.e. msysgit/msvcgit on all Windows versions (independent of language/encoding), UTF-8 Unixes/Linuxes (including cygwin on Windows), JGit/EGit, and even other UTF-8 encoded SCMs (e.g. SVN via git-svn)).

> And also perhaps gitk and git-gui will be able to display utf-8 encoded text?
>


Gitk and git-gui already had quite good Unicode/UTF-8 support. I just changed the file name encoding from system default (i.e. some Windows "ANSI" code page) to UTF-8.

> Anything else?
>
> Just trying to understand what impact this patchset will have more precisely.

Then I suggest that you try it out and report what's still broken for you :-) I don't pretend to use ALL git features (or that I even know them)...

karste...@dcon.de

unread,
Jan 21, 2011, 10:56:01 PM1/21/11
to kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer, Alex Budovski

Hi,

here's the fixes from this week (pretty slim, might be time to merge the stuff, if nothing else comes up?)

Changes in v7:
- disable C4090 compiler warning (cast const** -> void*)
- fixed C4700 compiler warning (using uninitialized variable) in git core code

Ciao,
Karsten
unicode-v7.bundle

Clifford Caoile

unread,
Feb 4, 2011, 9:40:40 AM2/4/11
to karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, Kirill Smelkov, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer, Alex Budovski
Hi!

On Sat, Jan 22, 2011 at 12:56, <karste...@dcon.de> wrote:
> here's the fixes from this week (pretty slim, might be time to merge the
> stuff, if nothing else comes up?)
>
> Changes in v7:

I finally tried out this kb/unicode PATCH v7. Thank you for this work.
It seems to preserve the file names with my manual tests in Japanese
filenames on Windows XP Japanese. In particular:

- Creating new files with non ASCII names works.
- Creating new files inside directories with non ASCII names works.
- Renaming files and directories with non ASCII names works.
- Doing a merge with separate changes to the same file with non ASCII
names works.
I then pushed this local created repository to a Linux machine and
checked it out there.
- The file names were preserved, which is good (and the purpose of the
patch set).

However, the log messages show lots of jumbled octal notation, like this:

--- example: git log --numstat origin/utf8 ----
commit 58659c6b5409fe788c6129c255a441b2d2ae057e
Author: Clifford Caoile <pi...@users.sf.net>
Date: Thu Feb 3 16:07:11 2011 +0900

大阪の「地名の由来と変遷」説明をテキストファイルに追加した

参考:
http://ja.wikipedia.org/w/index.php?title=大阪

32 0
"\346\235\261\344\272\254\343\200\201\344\272\254\351\203\275\343\200\201\345\244\247\351\230\252\343\200\201\345\220\215\345\217\244\345\261\213/\343\203\236\343\203\274\343\202\270\350\251\246\351\250\223\345\257\276\345\277\234.txt"
--- example: git log --numstat origin/utf8 ----

The attachment can be examined with git clone
unicode-jp-tests.gitbundle and git branch -r.

I haven't taken a good look at what needs to be fixed for that issue.
I must say I really like what this patch series enables. It's not
complete but it's a good start.

Also, how can I write test cases to test this new functionality? In
particular I would like to compare the behavior to a typical
Linux-based git.

Best regards,
Clifford Caoile

unicode-jp-tests.gitbundle

karste...@dcon.de

unread,
Feb 4, 2011, 1:57:03 PM2/4/11
to pi...@users.sourceforge.net, Alex Budovski, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, piy...@gmail.com, Robin Rosenberg, Stefan Springer

piy...@gmail.com wrote on 04.02.2011 15:40:40:

> Hi!
>
> On Sat, Jan 22, 2011 at 12:56,  <karste...@dcon.de> wrote:
> > here's the fixes from this week (pretty slim, might be time to merge the
> > stuff, if nothing else comes up?)
> >
> > Changes in v7:
>
> I finally tried out this kb/unicode PATCH v7. Thank you for this work.
> It seems to preserve the file names with my manual tests in Japanese
> filenames on Windows XP Japanese. In particular:
>
> - Creating new files with non ASCII names works.
> - Creating new files inside directories with non ASCII names works.
> - Renaming files and directories with non ASCII names works.
> - Doing a merge with separate changes to the same file with non ASCII
> names works.
> I then pushed this local created repository to a Linux machine and
> checked it out there.
> - The file names were preserved, which is good (and the purpose of the
> patch set).
>


Thanks for testing. Good to know that it works for Far East charsets, too.

> However, the log messages show lots of jumbled octal notation, like this:

[snip]
> "\346\235\261\344\272\254\343\200\201...

git config core.quotepath off

Then use "git --no-pager ...", or install cygwin and add

export LESSCHARSET=utf-8
export GIT_PAGER=C:/cygwin/bin/less.exe

to your %gitinstalldir%/etc/profile

> The attachment can be examined with git clone
> unicode-jp-tests.gitbundle and git branch -r.
>


My european Lucida Console font doesn't have japanese glyphs, so I see only boxes in the console, but it looks quite cute in gitk (not that it means anything to me :-).

> I haven't taken a good look at what needs to be fixed for that issue.
> I must say I really like what this patch series enables. It's not
> complete but it's a good start.
>
> Also, how can I write test cases to test this new functionality? In
> particular I would like to compare the behavior to a typical
> Linux-based git.
>

I'm afraid the MSYS toolchain is completely Unicode-ignorant, you might succeed with cygwin (that's why you need --no-pager above, MSYS less just doesn't cut it). I'm working on MSYS Unicode support (whenever I find the time, no promises on that end...)

Bye,
Karsten

Kirill Smelkov

unread,
Feb 4, 2011, 2:05:50 PM2/4/11
to Clifford Caoile, karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer, Alex Budovski

Maybe try setting core.quotepath=false in your gitconfig:

---- 8< ----

kirr@tugrik:~/tmp/trashme/jp/x$ git config --local --add core.quotepath true
kirr@tugrik:~/tmp/trashme/jp/x$ git log -1 --numstat 58659c


commit 58659c6b5409fe788c6129c255a441b2d2ae057e
Author: Clifford Caoile <pi...@users.sf.net>
Date: Thu Feb 3 16:07:11 2011 +0900

大阪の「地名の由来と変遷」説明をテキストファイルに追加した

参考:
http://ja.wikipedia.org/w/index.php?title=大阪

32 0 "\346\235\261\344\272\254\343\200\201\344\272\254\351\203\275\343\200\201\345\244\247\351\230\252\343\200\201\345\220\215\345\217\244\345\261\213/\343\203\236\343\203\274\343\202\270\350\251\246\351\250\223\345\257\276\345\277\234.txt"

kirr@tugrik:~/tmp/trashme/jp/x$
kirr@tugrik:~/tmp/trashme/jp/x$ git config --local --add core.quotepath false
kirr@tugrik:~/tmp/trashme/jp/x$ git log -1 --numstat 58659c


commit 58659c6b5409fe788c6129c255a441b2d2ae057e
Author: Clifford Caoile <pi...@users.sf.net>
Date: Thu Feb 3 16:07:11 2011 +0900

大阪の「地名の由来と変遷」説明をテキストファイルに追加した

参考:
http://ja.wikipedia.org/w/index.php?title=大阪

32 0 東京、京都、大阪、名古屋/マージ試験対応.txt

---- 8< ----

We should probably also do this by default in shipped msysgit /etc/gitconfig
out of the box ...


Thanks,
Kirill

Clifford Caoile

unread,
Feb 4, 2011, 9:59:44 PM2/4/11
to Kirill Smelkov, karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer, Alex Budovski
Hi!

2011/2/5 Kirill Smelkov <ki...@landau.phys.spbu.ru>:


>
> Maybe try setting core.quotepath=false in your gitconfig:

>[snip]

Whoops, I did not know about that config.
Thanks, this suggestion works on my Linux machine (Ubuntu 10.04 LTS,
VM, git 1.7.0.4) and now Git for Windows with cmd.exe + Cygwin less.

> We should probably also do this by default in shipped msysgit /etc/gitconfig
> out of the box ...

My Linux machine's git did not have this set in the global or system
config. I wonder if this means the Git for Windows should also be
conservative and NOT set this config.

Best regards,
Clifford Caoile

Clifford Caoile

unread,
Feb 4, 2011, 9:57:59 PM2/4/11
to karste...@dcon.de, Alex Budovski, Johannes Schindelin, Kirill Smelkov, kusm...@gmail.com, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer
Hi!

On Sat, Feb 5, 2011 at 03:57, <karste...@dcon.de> wrote:
>
> piy...@gmail.com wrote on 04.02.2011 15:40:40:
>> However, the log messages show lots of jumbled octal notation, like this:
> [snip]
>> "\346\235\261\344\272\254\343\200\201...
>
> git config core.quotepath off

Whoops, I did not know about that config.
Actually that output was from a Linux machine via a putty terminal emulator.

As Kirill demonstrated, after setting that config on the Linux
machine, the --stat and --numstat reports of the file names show up
correctly instead of the "jumbled octal". Also using gitk on Windows
shows it works, as well.

My confidence in using this patch set for daily use has increased. :)

> Then use "git --no-pager ...", or install cygwin and add
>
> export LESSCHARSET=utf-8
> export GIT_PAGER=C:/cygwin/bin/less.exe
>
> to your %gitinstalldir%/etc/profile

Good to know, thanks!
Just for the record, I'm a cmd.exe interface user when using Git for
Windows. I did the following and now git from cmd.exe is more
pleasant:

cmd> set LESSCHARSET=utf-8
cmd> set GIT_PAGER=c:/app/cyg/bin/less.exe

>> Also, how can I write test cases to test this new functionality? In
>> particular I would like to compare the behavior to a typical
>> Linux-based git.
>
> I'm afraid the MSYS toolchain is completely Unicode-ignorant, you might
> succeed with cygwin (that's why you need --no-pager above, MSYS less just
> doesn't cut it). I'm working on MSYS Unicode support (whenever I find the
> time, no promises on that end...)

Instead of using the MSYS toolchain like the bash and perl, would it
be better to drop down directly to C and use the Win32 APIs directly
when crafting a test program? If so, would it be acceptable as a
automated git test? (e.g. run from msysgit env's run-tests.sh)

Best regards,
Clifford Caoile

Kirill Smelkov

unread,
Feb 7, 2011, 3:57:10 AM2/7/11
to Clifford Caoile, karste...@dcon.de, kusm...@gmail.com, Johannes Schindelin, msysGit, patt...@gmail.com, Robin Rosenberg, Stefan Springer, Alex Budovski

My rationale here is that if we agree on UTF-8 as the next ASCII (which
is approach taken in kb/unicode) there is no reason to quote paths in
the first place. And for complex setups there always is a knob to turn
quoting on...


Thanks,
Kirill

Erik Faye-Lund

unread,
Feb 7, 2011, 4:59:22 PM2/7/11
to Kirill Smelkov, Johannes Schindelin, msy...@googlegroups.com
On Mon, Jan 10, 2011 at 8:18 PM, Kirill Smelkov
<ki...@landau.phys.spbu.ru> wrote:
> On Sun, Jan 09, 2011 at 05:13:06PM +0100, Erik Faye-Lund wrote:
>> On Sun, Jan 9, 2011 at 3:28 PM, Kirill Smelkov <ki...@landau.phys.spbu.ru> wrote:
>> > P.S. Sorry for being silent on other topics. Rebuilding gettext was a
>> > major pain which took lots of time...
>> >
>>
>> Gettext for msysGit? If so, I hope you were aware that I already
>> submitted the patches for that - I just haven't merged it yet...
>
> No, I wasn't and thanks for pointing it out.
>
> I need gettext for glib which is needed by mdbtools which is going to be
> (I hope) textconv driver for .mdb (ms access) files.
>
> So let's maybe merge your work/gettext branch without waiting for git to
> require gettext?  I'll have a base to continue and we'll already have
> some running time testing gettext bits when git.git merges ab/i18n...
>

Sorry for being slow. I've just merged and pushed my
work/gettext-branch, I hope this helps.

Erik Faye-Lund

unread,
Feb 8, 2011, 5:00:26 AM2/8/11
to Kirill Smelkov, Johannes Schindelin, msy...@googlegroups.com
On Tue, Feb 8, 2011 at 10:55 AM, Kirill Smelkov <ki...@mns.spb.ru> wrote:
> Thanks and nevermind, but pushed to where? msysgit.git master/devel is
> at "Updated to Git v1.7.4" (a5f56c), and
> http://repo.or.cz/w/msysgit/kusma.git/shortlog/refs/heads/work/gettext
> is 4 months old.
>
> Sorry if I misunderstood something and thanks,
> Kirill
>

I could swear I pushed it to http://repo.or.cz/w/msysgit.git, but it's
not there now. I'm not on the same computer, so I can't verify. Could
it be that someone rewound it?

Anyway, what I did was simply to merge
http://repo.or.cz/w/msysgit/kusma.git/shortlog/refs/heads/work/gettext
into "Updated to Git v1.7.4" (a5f56c). I guess I could do it again,
but I'd prefer to wait until I know what happened. I suspect I'll be
able to tell when I get home after work.

Erik Faye-Lund

unread,
Feb 8, 2011, 2:36:00 PM2/8/11
to Kirill Smelkov, Johannes Schindelin, msy...@googlegroups.com

Well, it seems I only TRIED to push; there was an error (because I had
my remote set up incorrectly) that I didn't notice. Pushed for real,
now.

Kirill Smelkov

unread,
Feb 8, 2011, 4:55:32 AM2/8/11
to Erik Faye-Lund, Johannes Schindelin, msy...@googlegroups.com

Thanks and nevermind, but pushed to where? msysgit.git master/devel is

Kirill Smelkov

unread,
Feb 9, 2011, 4:17:38 AM2/9/11
to Erik Faye-Lund, Johannes Schindelin, msy...@googlegroups.com

Thanks a lot.

Kirill Smelkov

unread,
Feb 12, 2011, 5:52:52 PM2/12/11
to Johannes Schindelin, Erik Faye-Lund, msy...@googlegroups.com, ki...@landau.phys.spbu.ru, ki...@mns.spb.ru
On Mon, Jan 17, 2011 at 01:47:15PM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Mon, 17 Jan 2011, Kirill Smelkov wrote:
>
> > I think we need to move this forward. I propose that now we merge your
> > wine branch with comments in rebase patch itself or in merge commit,
> > that rebasing is not working unversally and that e.g. it fails for wine
> > on Debian Lenny/Squeeze version, and also that those who need to use
> > msysgit on wine are welcome to fix it.
> >
> > I'm going to investigate it eventually, but this may take a looong time
> > because of free time issues...
>
> Bad news. I just tried on a 64-bit current Ubuntu, and the rebased
> msys-1.0.dll actually _breaks_ things. So I pulled it out, and will come
> up with a script to mark msys-1.0.dll with the "assume unchanged" bit when
> I have a need to work on 32-bit again.

Some indirectly-related progress here:

I've managed to cross-compile msys.dll on linux, and also have found why
it takes so much time to start even simple msys process[1] (see also [2]).

Could we please apply attached patch to our msys?

Thanks,
Kirill


[1] http://repo.or.cz/w/msys/kirr.git/shortlog/refs/heads/x/kirr
[2] http://bugs.winehq.org/show_bug.cgi?id=13606


P.S. sorry for posting patch as attachment - I'm having problems with
mail posting/delivery this days...

0001-msys.dll-Don-t-pull-user32.dll-friends-just-to-detec.patch
It is loading more messages.
0 new messages