[PATCH] revisions --stdin: accept CRLF line terminators

32 views
Skip to first unread message

Johannes Sixt

unread,
Aug 11, 2015, 4:21:38 PM8/11/15
to Git Mailing List, msysGit
On Windows, 'git rebase -i' with rebase.missingCommitsCheck set to
warn or error reports:

Dropped commits (newer to older):
'atal: bad revision '410dee56...

The error comes from the git rev-list --stdin invocation in
git-rebase--interactive.sh (function check_todo_list). It is caused by
CRs that end up in the file "$todo".miss, because many tools of the MSYS
toolset force LF to CRLF conversion when regular files are written via
stdout.

To fix the error, permit CRLF line terminators when revisions and
pathspec are read using the --stdin option.

Signed-off-by: Johannes Sixt <j...@kdbg.org>
---
This fixes a new failure in the test suite (t3404.8[67]) on Windows, but
I got around to debug it only now.

revision.c | 4 ++++
t/t6017-rev-list-stdin.sh | 16 ++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/revision.c b/revision.c
index cf60c5d..4efedeb 100644
--- a/revision.c
+++ b/revision.c
@@ -1641,6 +1641,8 @@ static void read_pathspec_from_stdin(struct rev_info *revs, struct strbuf *sb,
int len = sb->len;
if (len && sb->buf[len - 1] == '\n')
sb->buf[--len] = '\0';
+ if (len && sb->buf[len - 1] == '\r')
+ sb->buf[--len] = '\0';
ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
prune->path[prune->nr++] = xstrdup(sb->buf);
}
@@ -1661,6 +1663,8 @@ static void read_revisions_from_stdin(struct rev_info *revs,
int len = sb.len;
if (len && sb.buf[len - 1] == '\n')
sb.buf[--len] = '\0';
+ if (len && sb.buf[len - 1] == '\r')
+ sb.buf[--len] = '\0';
if (!len)
break;
if (sb.buf[0] == '-') {
diff --git a/t/t6017-rev-list-stdin.sh b/t/t6017-rev-list-stdin.sh
index 667b375..34c43cf 100755
--- a/t/t6017-rev-list-stdin.sh
+++ b/t/t6017-rev-list-stdin.sh
@@ -75,4 +75,20 @@ test_expect_success 'not only --stdin' '
test_cmp expect actual
'

+test_expect_success 'accept CRLF line terminators' '
+ cat >expect <<-\EOF &&
+ 7
+
+ file-2
+ EOF
+ q_to_cr >input <<-\EOF &&
+ masterQ
+ ^master^Q
+ --Q
+ file-2Q
+ EOF
+ git log --pretty=tformat:%s --name-only --stdin <input >actual &&
+ test_cmp expect actual
+'
+
test_done
--
2.3.2.245.gb5bf9d3

Junio C Hamano

unread,
Aug 11, 2015, 5:35:07 PM8/11/15
to Johannes Sixt, Git Mailing List, msysGit
This will strip lone CR at the end of line, in addition to CRLF at
the end of line (which you want to handle) and LF at the end of line
(we always have removed), making it work even on ancient MacOS.

If we really cared, I guess we could write it like this:

if (len && sb.buf[len - 1] == '\n') {
len--;
if (len && sb.buf[len - 1] == '\r')
len--;
sb.buf[len] = '\0';
}

but your version should be fine as-is.

Thanks.
> --

Junio C Hamano

unread,
Aug 11, 2015, 6:14:16 PM8/11/15
to Johannes Sixt, Git Mailing List, msysGit
Junio C Hamano <git...@pobox.com> writes:

> Johannes Sixt <j...@kdbg.org> writes:
>
>> On Windows, 'git rebase -i' with rebase.missingCommitsCheck set to
>> warn or error reports:
>>
>> Dropped commits (newer to older):
>> 'atal: bad revision '410dee56...
>>
>> The error comes from the git rev-list --stdin invocation in
>> git-rebase--interactive.sh (function check_todo_list)....

We have other places that take long list of things via --stdin
option. It somehow feels incomplete to patch only rev-list and not
others, doesn't it?

I looked at hits from 'grep -e --stdin Documentation/'. Here are
the findings.

1. These use strbuf_getline() to get one line at a time into a
strbuf and expects the line termination stripped off (i.e. these
callers do not want to worry about having LF at the end):

check-attr --stdin
check-ingore --stdin
check-mailmap --stdin
checkout-index --stdin
hash-object --stdin-paths
http-fetch --stdin
notes --stdin
send-pack --stdin
update-index --index-info

2. Any command in the "log" family uses strbuf_getwholeline(), so it
needs to know about the LF at the end explicitly:

rev-list --stdin
show --stdin
cherry-pick --stdin
...

3. This uses fgets() into a fixed buffer; it calls get_sha1_hex() on
it, and the expected input is one 40-hex per line, so it does not
matter if there is an extra CR at the end immediately before LF.

diff-tree --stdin

4. This slurps everything in-core, instead of going line-by-line.

update-ref --stdin

Now, I am wondering if it makes sense to do these two things:

* Teach revision.c::read_revisions_from_stdin() to use
strbuf_getline() instead of strbuf_getwholeline().

* Teach strbuf_getline() to remove CR at the end when stripping the
LF at the end, only if "term" parameter is set to LF.

Doing so would solve 1. and 2., but we obviously need to audit all
the other uses of strbuf_getline() to see if they can benefit (or if
some of them may be broken because they _always_ need LF terminated
lines, i.e. CRLF terminated input is illegal to them).

As to 3., I think it is OK. The code structure of 4. is too ugly
and needs to be revamped to go one line at a time first before even
thinking about how to proceed, I would think.

Thoughts?

Johannes Sixt

unread,
Aug 12, 2015, 2:24:53 PM8/12/15
to Junio C Hamano, Git Mailing List, msysGit
Am 12.08.2015 um 00:14 schrieb Junio C Hamano:
> Now, I am wondering if it makes sense to do these two things:
>
> * Teach revision.c::read_revisions_from_stdin() to use
> strbuf_getline() instead of strbuf_getwholeline().
>
> * Teach strbuf_getline() to remove CR at the end when stripping the
> LF at the end, only if "term" parameter is set to LF.
>
> Doing so would solve 1. and 2., but we obviously need to audit all
> the other uses of strbuf_getline() to see if they can benefit (or if
> some of them may be broken because they _always_ need LF terminated
> lines, i.e. CRLF terminated input is illegal to them).

I can see what I can do with these. Don't hold your breath, though.

> As to 3., I think it is OK. The code structure of 4. is too ugly
> and needs to be revamped to go one line at a time first before even
> thinking about how to proceed, I would think.

Regarding update-ref --stdin (your 4.), I notice that the input format
is very strict, so the solution is to allow an optional CR before the
LF. I alread have a patch, but it skips all trailing space, which is
probably too lenient. (I only needed the patch once for a debug
sesssion, but there is no obvious breakage without the patch.)

-- Hannes

Reply all
Reply to author
Forward
0 new messages