[msysGit] [PATCH v3 0/5] End-of-line normalization, redesigned

Eyvind Bernhardsen

May 12, 2010, 7:00:50 PM
to g...@vger.kernel.org, msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian
After Finn Arne's bombshell of a patch, I was almost ready to throw in
the towel on this series. Then I realized that just because autocrlf
is safe to use now doesn't mean it solves my CRLF-related problems.

The reason is that since autocrlf doesn't require your text files to
be normalized any more, it also doesn't guarantee that they are. If
you need to interoperate with some other SCM, have tools that require
a specific line ending, or you just like your repository free of CR
characters, autocrlf doesn't do that.

This series does that. There have been some changes since v2:

- Series is now based on Finn Arne's "safe autocrlf" patch (I took the
one from "pu" since Junio seems to have fixed some whitespace
damage).

- Removed core.eolStyle. This gets more explanation below.

- Added "crlf=lf" and "crlf=crlf"; they turn on normalization and
convert line endings to LF or CRLF on checkout, respectively. Yes,
I know.

- RFC patch: As promised, rename "crlf" attribute as "eolconv",
keeping "crlf" as an alias for backwards compatibility. I think
this one might be worth it, but perhaps not as implemented (see the
fix I made for git-cvsserver.perl to understand why).

- RFC patch: Rename "core.autocrlf" as "core.eolconv". This one is
mainly for fun, not so much for inclusion: it might have the same
problems as adding an alias for "crlf" and I'm not too bothered
about the name any more anyway, as I'll explain below.


So if I've removed eolStyle, how does the user say what line endings
to use for a normalized text file in the working directory? Using
"core.autocrlf". There are three reasons why that isn't completely
insane:

1. A user who wants CRLFs in text files probably doesn't want them
just in files that happen to have normalized line endings.

2. You can force CRLF in the working directory now, so if you just
want .vcproj files and the like to have CRLFs, you check in a
.gitattributes containing "*.vcproj crlf=crlf" or add that line to
your .git/info/attributes. No need to use autocrlf at all.

3. With the "safe autocrlf" patch, core.autocrlf is actually safe to
use in a non-normalized repository, so "core.autocrlf=true" is no
longer an insane default.

Given the intended usage for autocrlf it's not even a particularly bad
name any more: "I don't care how you do it, I just want CRLFs in my
text files". Even "autocrlf=input" isn't that bad if you squint a
bit. After a few beers.

Summary: the new "core.autocrlf" is for when you don't want to mess up
an existing repository with unwanted CRLFs, and the new "crlf"
mechanisms are for normalizing text files.


Eyvind Bernhardsen (4):
Add tests for per-repository eol normalization
Add per-repository eol normalization
Rename "crlf" attribute as "eolconv"
Rename "core.autocrlf" config variable as "core.eolconv"

Finn Arne Gangstad (1):
autocrlf: Make it work also for un-normalized repositories

Documentation/config.txt | 26 ++++---
Documentation/gitattributes.txt | 157 ++++++++++++++++++++++++++++++---------
attr.c | 2 +-
cache.h | 9 ++-
config.c | 13 ++-
convert.c | 115 +++++++++++++++++++++++-----
environment.c | 2 +-
git-cvsserver.perl | 8 ++-
t/t0020-crlf.sh | 106 ++++++++++++++++++++++++++
t/t0025-crlf-auto.sh | 134 +++++++++++++++++++++++++++++++++
10 files changed, 497 insertions(+), 75 deletions(-)
create mode 100755 t/t0025-crlf-auto.sh

Eyvind Bernhardsen

May 12, 2010, 7:00:51 PM
to g...@vger.kernel.org, msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian
From: Finn Arne Gangstad <fin...@pvv.org>

Previously, autocrlf would only work well for normalized
repositories. Any text files that contained CRLF in the repository
would cause problems, and would be modified when handled with
core.autocrlf set.

Change autocrlf to not do any conversions to files that in the
repository already contain a CR. git with autocrlf set will never
create such a file, or change a LF only file to contain CRs, so the
(new) assumption is that if a file contains a CR, it is intentional,
and autocrlf should not change that.

The following sequence should now always be a NOP even with autocrlf
set (assuming a clean working directory):

git checkout <something>
touch *
git add -A . (will add nothing)
git commit (nothing to commit)

Previously this would break for any text file containing a CR.

Some of you may have been following Eyvind's excellent thread about
trying to make end-of-line translation in git a bit smoother.

I decided to attack the problem from a different angle: Is it possible
to make autocrlf behave non-destructively for all the previous problem cases?

Stealing the problem from Eyvind's initial mail (paraphrased and
summarized a bit):

1. Setting autocrlf globally is a pain since autocrlf does not work well
with CRLF in the repo
2. Setting it in individual repos is hard since you do it "too late"
(the clone will get it wrong)
3. If someone checks in a file with CRLF later, you get into problems again
4. If a repository once has contained CRLF, you can't tell autocrlf
at which commit everything is sane again
5. autocrlf does needless work if you know that all your users want
the same EOL style.

I believe that this patch makes autocrlf a safe (and good) default
setting for Windows, and this solves problems 1-4 (it solves 2 by being
set by default, which is early enough for clone).

I implemented it by looking for CR characters in the index, and
aborting any conversion attempt if one is found.
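
As a rough illustration of the rule described above, here is a minimal
standalone sketch in C. It is not the code from the patch: the helper
names buffer_has_cr() and autocrlf_may_convert() are made up for this
example, and the "index copy" is just a buffer passed in by the caller
rather than something read from the object database.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer contains at least one carriage return. */
static int buffer_has_cr(const char *data, size_t len)
{
	assert(data != NULL || len == 0);
	return len > 0 && memchr(data, '\r', len) != NULL;
}

/*
 * Hypothetical helper: may autocrlf convert this path on checkin?
 * "indexed" is the content currently recorded for the path, or NULL
 * if the path is new (nothing recorded yet).
 */
static int autocrlf_may_convert(const char *indexed, size_t indexed_len)
{
	if (indexed == NULL)
		return 1;	/* new file: nothing to preserve */
	/* A CR in the recorded copy is assumed to be intentional. */
	return !buffer_has_cr(indexed, indexed_len);
}
```

A new path, or one whose recorded copy is LF-only, stays eligible for
conversion; a recorded copy with even one CR is left alone.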

Signed-off-by: Finn Arne Gangstad <fi...@pvv.org>
Signed-off-by: Junio C Hamano <git...@pobox.com>
Signed-off-by: Eyvind Bernhardsen <eyvind.be...@gmail.com>
---
convert.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
t/t0020-crlf.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 0 deletions(-)

diff --git a/convert.c b/convert.c
index 4f8fcb7..46622b0 100644
--- a/convert.c
+++ b/convert.c
@@ -120,6 +120,43 @@ static void check_safe_crlf(const char *path, int action,
}
}

+static int has_cr_in_index(const char *path)
+{
+ int pos, len;
+ unsigned long sz;
+ enum object_type type;
+ void *data;
+ int has_cr;
+ struct index_state *istate = &the_index;
+
+ len = strlen(path);
+ pos = index_name_pos(istate, path, len);
+ if (pos < 0) {
+ /*
+ * We might be in the middle of a merge, in which
+ * case we would read stage #2 (ours).
+ */
+ int i;
+ for (i = -pos - 1;
+ (pos < 0 && i < istate->cache_nr &&
+ !strcmp(istate->cache[i]->name, path));
+ i++)
+ if (ce_stage(istate->cache[i]) == 2)
+ pos = i;
+ }
+ if (pos < 0)
+ return 0;
+ data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return 0;
+ }
+
+ has_cr = memchr(data, '\r', sz) != NULL;
+ free(data);
+ return has_cr;
+}
+
static int crlf_to_git(const char *path, const char *src, size_t len,
struct strbuf *buf, int action, enum safe_crlf checksafe)
{
@@ -145,6 +182,13 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
*/
if (is_binary(len, &stats))
return 0;
+
+ /*
+ * If the file in the index has any CR in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ if (has_cr_in_index(path))
+ return 0;
}

check_safe_crlf(path, action, &stats, checksafe);
@@ -203,6 +247,11 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
return 0;

if (action == CRLF_GUESS) {
+ /* If we have any CR or CRLF line endings, we do not touch it */
+ /* This is the new safer autocrlf-handling */
+ if (stats.cr > 0 || stats.crlf > 0)
+ return 0;
+
/* If we have any bare CR characters, we're not going to touch it */
if (stats.cr != stats.crlf)
return 0;
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index c3e7e32..234a94f 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -453,5 +453,57 @@ test_expect_success 'invalid .gitattributes (must not crash)' '
git diff

'
+# Some more tests here to add new autocrlf functionality.
+# We want to have a known state here, so start a bit from scratch
+
+test_expect_success 'setting up for new autocrlf tests' '
+ git config core.autocrlf false &&
+ git config core.safecrlf false &&
+ rm -rf .????* * &&
+ for w in I am all LF; do echo $w; done >alllf &&
+ for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >mixed &&
+ for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
+ git add -A . &&
+ git commit -m "alllf, allcrlf and mixed only" &&
+ git tag -a -m "message" autocrlf-checkpoint
+'
+
+test_expect_success 'report no change after setting autocrlf' '
+ git config core.autocrlf true &&
+ touch * &&
+ git diff --exit-code
+'
+
+test_expect_success 'files are clean after checkout' '
+ rm * &&
+ git checkout -f &&
+ git diff --exit-code
+'
+
+cr_to_Q_no_NL () {
+ tr '\015' Q | tr -d '\012'
+}
+
+test_expect_success 'LF only file gets CRLF with autocrlf' '
+ test "$(cr_to_Q_no_NL < alllf)" = "IQamQallQLFQ"
+'
+
+test_expect_success 'Mixed file is still mixed with autocrlf' '
+ test "$(cr_to_Q_no_NL < mixed)" = "OhhereisCRLFQintext"
+'
+
+test_expect_success 'CRLF only file has CRLF with autocrlf' '
+ test "$(cr_to_Q_no_NL < allcrlf)" = "IQamQallQCRLFQ"
+'
+
+test_expect_success 'New CRLF file gets LF in repo' '
+ tr -d "\015" < alllf | append_cr > alllf2 &&
+ git add alllf2 &&
+ git commit -m "alllf2 added" &&
+ git config core.autocrlf false &&
+ rm * &&
+ git checkout -f &&
+ test_cmp alllf alllf2
+'

test_done
--
1.7.1.3.g448cb.dirty

Eyvind Bernhardsen

May 12, 2010, 7:00:54 PM
to g...@vger.kernel.org, msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian
As discussed at length on the list, "crlf" is a pretty bad name for an
attribute that enables end-of-line conversion, and the addition of "lf"
and "crlf" values for it doesn't help.

Rename the attribute "eolconv", but fall back to "crlf" for backwards
compatibility if "eolconv" is not set.

Signed-off-by: Eyvind Bernhardsen <eyvind.be...@gmail.com>
---
Documentation/gitattributes.txt | 51 ++++++++++++++++++++------------------
attr.c | 2 +-
convert.c | 15 ++++++++---
git-cvsserver.perl | 8 ++++-
t/t0025-crlf-auto.sh | 31 ++++++++++++++++-------
5 files changed, 67 insertions(+), 40 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index bb3b446..2887f85 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -92,30 +92,33 @@ such as 'git checkout' and 'git merge' run. They also affect how
git stores the contents you prepare in the working tree in the
repository upon 'git add' and 'git commit'.

-`crlf`
-^^^^^^
+`eolconv`
+^^^^^^^^^

This attribute enables and controls end-of-line normalization. When a
text file is normalized, its line endings are converted to LF in the
repository. Text files can have their line endings converted to
-CRLF in the working directory, using the `crlf` attribute for
+CRLF in the working directory, using the `eolconv` attribute for
individual files or the `core.autocrlf` configuration variable for all
files.

+For compatibility with older versions of git, `crlf` is an alias for
+this attribute.
+
Set::

- Setting the `crlf` attribute on a path enables end-of-line
+ Setting the `eolconv` attribute on a path enables end-of-line
normalization and marks the path as a text file. End-of-line
conversion takes place without guessing the content type.

Unset::

- Unsetting the `crlf` attribute on a path tells git not to
+ Unsetting the `eolconv` attribute on a path tells git not to
attempt any end-of-line conversion upon checkin or checkout.

Set to string value "auto"::

- When `crlf` is set to "auto", the path is marked for automatic
+ When `eolconv` is set to "auto", the path is marked for automatic
end-of-line normalization. If git decides that the content is
text, its line endings are normalized to LF on checkin.

@@ -134,13 +137,13 @@ Set to string value "lf"::

Unspecified::

- Leaving the `crlf` attribute unspecified tells git to apply
+ Leaving the `eolconv` attribute unspecified tells git to apply
end-of-line normalization only if the `core.autocrlf`
configuration variable is set, the content appears to be text,
and the file is either new or already normalized in the
repository.

-Any other value causes git to act as if `crlf` has been left
+Any other value causes git to act as if `eolconv` has been left
unspecified.


@@ -157,10 +160,10 @@ the working directory, and prevent .jpg files from being normalized
regardless of their content.

------------------------
-*.txt crlf
-*.vcproj crlf=crlf
-*.sh crlf=lf
-*.jpg -crlf
+*.txt eolconv
+*.vcproj eolconv=crlf
+*.sh eolconv=lf
+*.jpg -eolconv
------------------------

Other source code management systems normalize all text files in their
@@ -185,24 +188,24 @@ files without conversion to CRLF in the working directory.

If you want to interoperate with a source code management system that
enforces end-of-line normalization, or you simply want all text files
-in your repository to be normalized, you should instead set the `crlf`
+in your repository to be normalized, you should instead set the `eolconv`
attribute to "auto" for _all_ files.

------------------------
-* crlf=auto
+* eolconv=auto
------------------------

This ensures that all files that git considers to be text will have
normalized (LF) line endings in the repository.

-NOTE: When `crlf=auto` normalization is enabled in an existing
+NOTE: When `eolconv=auto` normalization is enabled in an existing
repository, any text files containing CRLFs should be normalized. If
they are not they will be normalized the next time someone tries to
change them, causing unfortunate misattribution. From a clean working
directory:

-------------------------------------------------
-$ echo "* crlf=auto" >>.gitattributes
+$ echo "* eolconv=auto" >>.gitattributes
# ...this should be the first line in .gitattributes
$ rm .git/index # Remove the index to force git to
$ git reset # re-scan the working directory
@@ -213,17 +216,17 @@ $ git commit -m "Introduce end-of-line normalization"
-------------------------------------------------

If any files that should not be normalized show up in 'git status',
-unset their `crlf` attribute before running 'git add -u'.
+unset their `eolconv` attribute before running 'git add -u'.

------------------------
-manual.pdf -crlf
+manual.pdf -eolconv
------------------------

Conversely, text files that git does not detect can have normalization
enabled manually.

------------------------
-weirdchars.txt crlf
+weirdchars.txt eolconv
------------------------

If `core.safecrlf` is set to "true" or "warn", git verifies if
@@ -309,11 +312,11 @@ Interaction between checkin/checkout attributes
In the check-in codepath, the worktree file is first converted
with `filter` driver (if specified and corresponding driver
defined), then the result is processed with `ident` (if
-specified), and then finally with `crlf` (again, if specified
+specified), and then finally with `eolconv` (again, if specified
and applicable).

In the check-out codepath, the blob content is first converted
-with `crlf`, and then `ident` and fed to `filter`.
+with `eolconv`, and then `ident` and fed to `filter`.


Generating diff text
@@ -717,7 +720,7 @@ You do not want any end-of-line conversions applied to, nor textual diffs
produced for, any binary file you track. You would need to specify e.g.

------------
-*.jpg -crlf -diff
+*.jpg -eolconv -diff
------------

but that may become cumbersome, when you have many attributes. Using
@@ -730,7 +733,7 @@ the same time. The system knows a built-in attribute macro, `binary`:

which is equivalent to the above. Note that the attribute macros can only
be "Set" (see the above example that sets "binary" macro as if it were an
-ordinary attribute --- setting it in turn unsets "crlf" and "diff").
+ordinary attribute --- setting it in turn unsets "eolconv" and "diff").


DEFINING ATTRIBUTE MACROS
@@ -741,7 +744,7 @@ at the toplevel (i.e. not in any subdirectory). The built-in attribute
macro "binary" is equivalent to:

------------
-[attr]binary -diff -crlf
+[attr]binary -diff -eolconv
------------


diff --git a/attr.c b/attr.c
index f5346ed..7f924bc 100644
--- a/attr.c
+++ b/attr.c
@@ -287,7 +287,7 @@ static void free_attr_elem(struct attr_stack *e)
}

static const char *builtin_attr[] = {
- "[attr]binary -diff -crlf",
+ "[attr]binary -diff -eolconv",
NULL,
};

diff --git a/convert.c b/convert.c
index 0eb3d4b..b46f85d 100644
--- a/convert.c
+++ b/convert.c
@@ -438,11 +438,13 @@ static int read_convert_config(const char *var, const char *value, void *cb)

static void setup_convert_check(struct git_attr_check *check)
{
+ static struct git_attr *attr_eolconv;
static struct git_attr *attr_crlf;
static struct git_attr *attr_ident;
static struct git_attr *attr_filter;

if (!attr_crlf) {
+ attr_eolconv = git_attr("eolconv");
attr_crlf = git_attr("crlf");
attr_ident = git_attr("ident");
attr_filter = git_attr("filter");
@@ -452,6 +454,7 @@ static void setup_convert_check(struct git_attr_check *check)
check[0].attr = attr_crlf;
check[1].attr = attr_ident;
check[2].attr = attr_filter;
+ check[3].attr = attr_eolconv;
}

static int count_ident(const char *cp, unsigned long size)
@@ -639,7 +642,7 @@ static int git_path_check_ident(const char *path, struct git_attr_check *check)
int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
{
- struct git_attr_check check[3];
+ struct git_attr_check check[4];
int crlf = CRLF_GUESS;
int ident = 0, ret = 0;
const char *filter = NULL;
@@ -647,7 +650,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
setup_convert_check(check);
if (!git_checkattr(path, ARRAY_SIZE(check), check)) {
struct convert_driver *drv;
- crlf = git_path_check_crlf(path, check + 0);
+ crlf = git_path_check_crlf(path, check + 3);
+ if (crlf == CRLF_GUESS)
+ crlf = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
if (drv && drv->clean)
@@ -669,7 +674,7 @@ int convert_to_git(const char *path, const char *src, size_t len,

int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
{
- struct git_attr_check check[3];
+ struct git_attr_check check[4];
int crlf = CRLF_GUESS;
int ident = 0, ret = 0;
const char *filter = NULL;
@@ -677,7 +682,9 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
setup_convert_check(check);
if (!git_checkattr(path, ARRAY_SIZE(check), check)) {
struct convert_driver *drv;
- crlf = git_path_check_crlf(path, check + 0);
+ crlf = git_path_check_crlf(path, check + 3);
+ if (crlf == CRLF_GUESS)
+ crlf = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
if (drv && drv->smudge)
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 13751db..ede47a6 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -2369,8 +2369,12 @@ sub kopts_from_path
if ( defined ( $cfg->{gitcvs}{usecrlfattr} ) and
$cfg->{gitcvs}{usecrlfattr} =~ /\s*(1|true|yes)\s*$/i )
{
- my ($val) = check_attr( "crlf", $path );
- if ( $val eq "set" )
+ my ($val) = check_attr( "eolconv", $path );
+ if ( $val eq "unspecified" )
+ {
+ $val = check_attr( "crlf", $path );
+ }
+ if ( $val =~ /^(set|crlf|lf)$/ )
{
return "";
}
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index f11fee4..05e5725 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -41,9 +41,22 @@ test_expect_success 'default settings cause no changes' '
test -z "$onediff" -a -z "$twodiff"
'

-test_expect_success 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'eolconv=true causes a CRLF file to be normalized' '

rm -f .gitattributes tmp one two &&
+ echo "two eolconv" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ # Note, "normalized" means that git will normalize it if added
+ has_cr two &&
+ twodiff=`git diff two` &&
+ test -n "$twodiff"
+'
+
+test_expect_success 'crlf=true also causes a CRLF file to be normalized' '
+
+ # Backwards compatibility check
+ rm -f .gitattributes tmp one two &&
echo "two crlf" > .gitattributes &&
git read-tree --reset -u HEAD &&

@@ -53,11 +66,11 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
test -n "$twodiff"
'

-test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=false' '
+test_expect_success 'eolconv=crlf gives a normalized file CRLFs with autocrlf=false' '

rm -f .gitattributes tmp one two &&
git config core.autocrlf false &&
- echo "one crlf=crlf" > .gitattributes &&
+ echo "one eolconv=crlf" > .gitattributes &&
git read-tree --reset -u HEAD &&

has_cr one &&
@@ -65,11 +78,11 @@ test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=false
test -z "$onediff"
'

-test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=input' '
+test_expect_success 'eolconv=crlf gives a normalized file CRLFs with autocrlf=input' '

rm -f .gitattributes tmp one two &&
git config core.autocrlf input &&
- echo "one crlf=crlf" > .gitattributes &&
+ echo "one eolconv=crlf" > .gitattributes &&
git read-tree --reset -u HEAD &&

has_cr one &&
@@ -77,11 +90,11 @@ test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=input
test -z "$onediff"
'

-test_expect_success 'crlf=lf gives a normalized file LFs with autocrlf=true' '
+test_expect_success 'eolconv=lf gives a normalized file LFs with autocrlf=true' '

rm -f .gitattributes tmp one two &&
git config core.autocrlf true &&
- echo "one crlf=lf" > .gitattributes &&
+ echo "one eolconv=lf" > .gitattributes &&
git read-tree --reset -u HEAD &&

! has_cr one &&
@@ -102,11 +115,11 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
test -z "$onediff" -a -z "$twodiff"
'

-test_expect_success 'crlf=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'eolconv=auto, autocrlf=true _does_ normalize CRLF files' '

rm -f .gitattributes tmp one two &&
git config core.autocrlf true &&
- echo "* crlf=auto" > .gitattributes &&
+ echo "* eolconv=auto" > .gitattributes &&
git read-tree --reset -u HEAD &&

has_cr one &&
--
1.7.1.3.g448cb.dirty

Linus Torvalds

May 12, 2010, 9:38:54 PM
to Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian


On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
>
> ------------------------
> -*.txt crlf
> -*.vcproj crlf=crlf
> -*.sh crlf=lf
> -*.jpg -crlf
> +*.txt eolconv
> +*.vcproj eolconv=crlf
> +*.sh eolconv=lf
> +*.jpg -eolconv
> ------------------------
...
> ------------------------
> -* crlf=auto
> +* eolconv=auto
> ------------------------

If you are doing the renaming, then I seriously object to this.

It makes no sense to say "eolconv=crlf" and then say "eolconv=auto". They
are two totally different things. One is _how_ line endings should look
like, and the other is _whether_ line endings exist or not.

And "eolconv=crlf" makes no sense anyway. I assume "conv" is
conversion, but a conversion implies a from and a to. That's just a
"to", and it would make much more sense to just say "eol=crlf" for that
case.

Now, it _does_ make sense to say "eolconv=auto", but that's because it's
that totally different case: it's not about what the line ending
character is, it's about whether any eol conversion is done at all. So
for _that_ case, it makes sense to use "eolconv", although even for that
case I think the name is not very _good_.

So if you rename these things, keep them separate. Make the "am I a
text-file" boolean be a boolean (plus "auto"), and just call it "text".
And make the "what end of line to use" be just "eol" then.

So you can have

* text=auto,eol=crlf

that means "autodetect whether it is text, and use crlf as eol".

Now, I'd further suggest:

- "eol=xyz" with no "text" attribute automatically implies "text" being
true.
- "text=xyz" with no "eol" attribute implies "eol=native"

so now you can write:

*.jpg -text
*.txt text
*.vcproj eol=crlf
*.sh eol=lf
* text=auto

and that means:

- jpg files are binary
- *.txt files are text, and we use the default ("native") line ending for
them (implicit, since we don't have any matching eol rule)
- *.vcproj files are text (implicit), and we use CRLF line endings
- *.sh files are text (implicit), and we use UNIX style line endings
- everything else is auto-detected, and we implicitly use native line
endings for them

Doesn't that look finally sane?

Because if we really rename the attributes, let's rename them _right_.

Linus
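
The two implication rules proposed above ("eol" without "text" implies
text, "text" without "eol" implies native) could be sketched roughly as
follows. This is only an illustration of the proposal, not existing git
code: the enum and function names are invented here, and leaving "eol"
unresolved for "-text" paths is an assumption, since the message does
not say what should happen in that case.

```c
#include <assert.h>

enum text_mode { TEXT_UNSPECIFIED, TEXT_NO, TEXT_YES, TEXT_AUTO };
enum eol_mode { EOL_UNSPECIFIED, EOL_LF, EOL_CRLF, EOL_NATIVE };

struct eol_policy {
	enum text_mode text;
	enum eol_mode eol;
};

/* Hypothetical resolver for one path's "text" and "eol" attributes. */
static struct eol_policy resolve_policy(enum text_mode text, enum eol_mode eol)
{
	struct eol_policy p;

	p.text = text;
	p.eol = eol;

	/* "eol=xyz" with no "text" attribute implies "text" being true. */
	if (p.text == TEXT_UNSPECIFIED && p.eol != EOL_UNSPECIFIED)
		p.text = TEXT_YES;

	/* "text=xyz" with no "eol" attribute implies "eol=native". */
	if ((p.text == TEXT_YES || p.text == TEXT_AUTO) &&
	    p.eol == EOL_UNSPECIFIED)
		p.eol = EOL_NATIVE;

	/* After resolution, an eol setting never exists without a text setting. */
	assert(p.text != TEXT_UNSPECIFIED || p.eol == EOL_UNSPECIFIED);
	return p;
}
```

With this, "*.vcproj eol=crlf" resolves to (text, crlf), "* text=auto"
resolves to (auto, native), and "*.jpg -text" stays binary with no eol.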

Robert Buck

May 13, 2010, 5:58:45 AM
to Linus Torvalds, Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
Quick question here, while people would be in the convert.c functions
when making the above changes. This question is related to detecting
whether a file is text, but the question could be spun off to a
different thread if you so wish...

Have you considered skipping the UTF8 BOM and, provided that the
remaining content is considered text, allowing auto conversions? The check
is simple, and would cover at least 50% of latin-derived languages.
Since you have the buffer at hand, and are in the same file
(convert.c), simply check for an initial EF BB BF. This would fix some
text files created on Windows (someone had mentioned Notepad I
believe). Out of the box experience for eol and text detection for
Windows users would be improved.

Bob
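
For reference, the "initial EF BB BF" check mentioned above is small
enough to sketch in a few lines of C. The function name
utf8_bom_length() is made up for this example; it is not something
that exists in convert.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 3 if the buffer starts with the UTF-8
 * byte order mark (EF BB BF), and 0 otherwise, so a caller could skip
 * that many bytes before inspecting the content.
 */
static size_t utf8_bom_length(const char *buf, size_t len)
{
	static const char bom[3] = { '\xEF', '\xBB', '\xBF' };

	assert(buf != NULL || len == 0);
	if (len >= sizeof(bom) && memcmp(buf, bom, sizeof(bom)) == 0)
		return sizeof(bom);
	return 0;
}
```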

Robert Buck

May 13, 2010, 5:39:03 AM
to Linus Torvalds, Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
[...]
> Because if we really rename the attributes, let's rename them _right_.
>
>                        Linus
>

Love it!

Eyvind Bernhardsen

May 13, 2010, 6:59:15 AM
to Linus Torvalds, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian
On 13. mai 2010, at 03.38, Linus Torvalds wrote:

> so now you can write:
>
> *.jpg -text
> *.txt text
> *.vcproj eol=crlf
> *.sh eol=lf
> * text=auto

[...]

> Doesn't that look finally sane?
>
> Because if we really rename the attributes, let's rename them _right_.

Beautiful.

Do you agree that "native" eol should only be CRLF if autocrlf is true? Otherwise, if .gitattributes looks like this:

*.txt text

git will put CRLFs in .txt files but LFs in .c files, and I don't think that makes much sense.
--
Eyvind

Eyvind Bernhardsen

May 13, 2010, 7:47:45 AM
to Robert Buck, git@vger.kernel.org List, msysGit
I just did a quick test with a plain text file; it was detected as text both with and without a utf8 BOM. Looking at the code, characters >= 128 are considered printable so the BOM shouldn't make any difference at all. Do you have an example utf8 text file that is misdetected as binary?
--
Eyvind
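
To illustrate why the BOM does not trip the detector, here is a
simplified model of a "count printable versus non-printable bytes"
heuristic. It is a sketch written for this thread, not a copy of
convert.c: the names, the exact set of control characters treated as
printable, and the 1-in-128 threshold are assumptions made for the
example. The one property taken from the message above is that bytes
>= 128 count as printable.

```c
#include <assert.h>
#include <stddef.h>

struct byte_stats {
	unsigned nul, printable, nonprintable;
};

/* Classify every byte except line endings, which are ignored here. */
static void gather_byte_stats(const unsigned char *buf, size_t len,
			      struct byte_stats *st)
{
	size_t i;

	assert(st != NULL && (buf != NULL || len == 0));
	st->nul = st->printable = st->nonprintable = 0;
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c == '\r' || c == '\n')
			continue;
		if (c == 127) {
			st->nonprintable++;	/* DEL */
		} else if (c < 32) {
			switch (c) {
			case '\b': case '\t': case '\033': case '\014':
				st->printable++;	/* BS, HT, ESC, FF */
				break;
			case 0:
				st->nul++;
				st->nonprintable++;
				break;
			default:
				st->nonprintable++;
			}
		} else {
			st->printable++;	/* includes all bytes >= 128 */
		}
	}
}

/* Binary if any NUL is seen or non-printables exceed 1/128 of printables. */
static int looks_binary(const struct byte_stats *st)
{
	if (st->nul)
		return 1;
	return (st->printable >> 7) < st->nonprintable;
}
```

Under this model the three BOM bytes simply add to the printable count,
so a text file is classified the same way with or without them.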

Robert Buck

May 13, 2010, 9:19:15 AM
to Eyvind Bernhardsen, git@vger.kernel.org List, msysGit
Sorry, my bad. I misread a line in convert.c. It handles UTF-8 beautifully.

Linus Torvalds

May 13, 2010, 5:45:14 PM
to Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian


On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
>
> Do you agree that "native" eol should only be CRLF if autocrlf is true?

Not really. We're trying to get _away_ from .gitattributes depending on
autocrlf, aren't we?

> Otherwise, if .gitattributes looks like this:
>
> *.txt text
>
> git will put CRLFs in .txt files but LFs in .c files, and I don't think
> that makes much sense.

Well, but that's what you asked for, isn't it? And I don't see why you say
*.c files would have LF's, since that depends on what you put in them: and
under Windows, that might well be CRLF.

And I do think it's perfectly reasonable to override the "native" mode in
your .git/config. If we're renaming the attributes, we might as well then
introduce a

[core]
eol=lf

to set the "native" EOL for that repo, exactly because presumably a number
of Windows people would like to see the saner LF-only model rather than
the traditional native CRLF.

In fact, maybe it would even make sense to just make LF the default
"native" end-of-line sequence even on windows, so that Windows people who
actually want CRLF would have to set core.eol=crlf. Whatever. That would
be for the Windows git users to fight out, I don't care.

But if we are going to clean up text attribute handling, then I really
think we want to totally break that old "core.autocrlf" dependency.

Linus
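
A possible shape for the proposed "core.eol" override, sketched in C.
This is only an illustration of the idea above: the function name is
invented, the config value is passed in as a plain string (NULL when
unset), and falling back to the platform default for unrecognized
values is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum eol { EOL_LF, EOL_CRLF };

/*
 * Hypothetical helper: pick the "native" end-of-line for a repository.
 * core_eol is the configured value of core.eol, or NULL if unset;
 * platform_default is what the platform would use on its own.
 */
static enum eol resolve_native_eol(const char *core_eol,
				   enum eol platform_default)
{
	assert(platform_default == EOL_LF || platform_default == EOL_CRLF);
	if (core_eol != NULL) {
		if (strcmp(core_eol, "lf") == 0)
			return EOL_LF;
		if (strcmp(core_eol, "crlf") == 0)
			return EOL_CRLF;
	}
	return platform_default;	/* unset or unrecognized value */
}
```

Note that nothing here consults core.autocrlf, which is the point of
the proposal: the native line ending is decided by one setting only.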

Robert Buck

May 13, 2010, 10:34:27 PM
to Linus Torvalds, Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
Probably a newbie question, lots to read, lots already read, but I
really want to verify if I have this correct. So in a nutshell, in the
gitattributes file

* text
*.foo binary

means autoconvert everything regardless of the autocrlf setting,
except for .foo files ? So now we can dispense with the autocrlf
attribute altogether if we so wish?

- Bob

Jonathan Nieder

May 14, 2010, 12:56:46 AM
to Robert Buck, Linus Torvalds, Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
Hi Bob,

Robert Buck wrote:

> * text
> *.foo binary
>
> means autoconvert everything regardless of the autocrlf setting,
> except for .foo files ? So now we can dispense with the autocrlf
> attribute altogether if we so wish?

If I understand correctly, there is no autocrlf attribute, just a
configuration item. If you put

* crlf
*.foo -crlf

in your .gitattributes with current git, this means:

- if the '[core] autocrlf' configuration is not set, do not convert
anything;

- otherwise, convert everything except for .foo files

Eyvind’s series improves that in a few ways.

- [from Finn Arne Gangstad] If the in-repository copy of a file
contains any carriage returns, do not try to convert it. This
makes it easier to deal with mistakes.

- For files with crlf enabled through attributes, always convert,
whether '[core] autocrlf' is enabled or not.

- Use the '[core] autocrlf' setting to determine the desired
line-ending for checked-out files (\r\n if true, \n otherwise).
A new eol attribute is provided to override that setting.

- The crlf attribute gets a new synonym "text" to avoid confusion.

There is also some change to the result of file type autodetection,
but as long as your .gitattributes uses '* crlf' or '* -crlf', there
is no need to worry about this.

Hope that helps,
Jonathan
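
The checkout-side behaviour summarized above can be written down as a
small decision function. This is a simplified sketch of that summary,
not code from the series: the names are invented, core.autocrlf is
reduced to a boolean (the "input" value is ignored), the content is
assumed to have been detected as text already, and the new eol
attribute override is left out.

```c
#include <assert.h>

enum crlf_attr { ATTR_UNSPECIFIED, ATTR_SET, ATTR_UNSET };
enum checkout_action { NO_CONVERSION, CHECKOUT_LF, CHECKOUT_CRLF };

/* Hypothetical helper: what happens to a text file's line endings on checkout? */
static enum checkout_action checkout_action(enum crlf_attr attr, int autocrlf,
					    int repo_copy_has_cr)
{
	assert(autocrlf == 0 || autocrlf == 1);

	if (attr == ATTR_UNSET)
		return NO_CONVERSION;	/* "-crlf": never touch the file */

	if (attr == ATTR_UNSPECIFIED) {
		/* Plain autocrlf: skip files that already contain a CR. */
		if (!autocrlf || repo_copy_has_cr)
			return NO_CONVERSION;
		return CHECKOUT_CRLF;
	}

	/* Attribute set: always converted; autocrlf only picks the ending. */
	return autocrlf ? CHECKOUT_CRLF : CHECKOUT_LF;
}
```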

Dmitry Potapov

May 14, 2010, 6:16:48 AM
to Eyvind Bernhardsen, Robert Buck, git@vger.kernel.org List, msysGit
On Thu, May 13, 2010 at 01:47:45PM +0200, Eyvind Bernhardsen wrote:
>
> I just did a quick test with a plain text file; it was detected as
> text both with and without a utf8 BOM. Looking at the code,
> characters >= 128 are considered printable so the BOM shouldn't make
> any difference at all. Do you have an example utf8 text file that is
> misdetected as binary?

Though a UTF-8 BOM does not present any problem for the automatic text
detector, it is another piece from Microsoft that creates some
interoperability issues when you work with non-ASCII text files.
In short:

1. Microsoft editors and tools like to add a utf8 BOM to files, and
you cannot turn this behavior off.
2. Many tools (such as the Microsoft compiler) are incapable of recognizing
UTF-8 files without a BOM, so they screw up all non-ASCII chars.

#1 is a problem, because it creates changes consisting solely of adding
a utf8 BOM. Moreover, users of non-Windows platforms are not exactly
thrilled with having a utf8 BOM at the beginning of every text file.

Probably, the ability to automatically add a utf8 BOM on Windows to text
files (which are marked as "unicode") could be helpful, but it is just a
part of the problem of how to deal with text files in "legacy" encodings,
which are still widely used on Windows.



Dmitry
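
The point quoted above, that bytes >= 128 count as printable and so a BOM cannot flip a file to "binary", can be illustrated with a simplified text/binary heuristic. This is a sketch under stated assumptions, not git's actual detector: the function name `looks_binary`, the set of accepted control characters, and the 1-in-128 threshold are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified heuristic (assumed, not git's real code):
 *  - any NUL byte means binary;
 *  - DEL and most control characters count as non-printable;
 *  - everything else, including bytes >= 128, counts as printable;
 *  - the buffer is binary if non-printables exceed printable/128.
 */
static int looks_binary(const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t printable = 0, nonprintable = 0, i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = p[i];

        if (c == 0)
            return 1;
        if (c == 127) {
            nonprintable++;
        } else if (c < 32) {
            switch (c) {
            case '\b': case '\t': case '\n':
            case '\r': case '\033': case '\014':
                printable++;
                break;
            default:
                nonprintable++;
            }
        } else {
            printable++;
        }
    }
    return (printable >> 7) < nonprintable;
}
```

Under these rules the three BOM bytes (0xEF 0xBB 0xBF) are simply three more printable bytes, so a file is classified the same way with or without them.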

Eyvind Bernhardsen

May 14, 2010, 5:32:03 PM5/14/10
to Robert Buck, Linus Torvalds, git@vger.kernel.org List, msysGit, Junio C Hamano, Dmitry Potapov, Finn Arne Gangstad, Jay Soffian, Jonathan Nieder
On 14. mai 2010, at 04.34, Robert Buck wrote:

> Probably a newbie question, lots to read, lots already read, but I
> really want to verify if I have this correct. So in a nutshell, in the
> gitattributes file
>
> * text

I missed this when I replied to Jonathan, but you probably want "* text=auto" here. "* text" would force git to treat all files as text files.

Also, as Jonathan said, if you want CRLF line endings you currently have to have core.autocrlf set to "true" (which is the default on Windows).
--
Eyvind

Linus Torvalds

May 14, 2010, 5:27:57 PM5/14/10
to Eyvind Bernhardsen, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian


On Fri, 14 May 2010, Eyvind Bernhardsen wrote:

> On 13. mai 2010, at 23.45, Linus Torvalds wrote:
>
> > On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
> >>
> >> Do you agree that "native" eol should only be CRLF if autocrlf is true?
> >
> > Not really. We're trying to get _away_ from .gitattributes depending on
> > autocrlf, aren't we?
>
> I'm not sure we still are. I certainly was when I started this series,
> but that was because autocrlf just plain didn't work with many existing
> repositories. When "safe autocrlf" fixed that, I decided that the extra
> complexity of core.eolStyle wasn't worth it.

The thing is, I disagree with your notion of "safe autocrlf". I think it's
ugly, and I don't think it's safe at all. It adds a _feeling_ of safety
that isn't actually safe.

In short:

- core.autocrlf is _always_ dangerous. Your "safe" thing isn't any safer
at all, since it depends on something that isn't reliable (previous
state).

Example: new binary files, or changed files, or renames.

- so if you want text conversion, but you want it to be truly safe, and
only happen for certain files, YOU MUST NOT ENABLE autocrlf.

- Ergo: if you make the .gitattributes behaviour depend on autocrlf,
you're still screwed, and you've not actually improved on anything at
all in the end.

It's really that simple. I think "autocrlf" actually works pretty well,
but at the same time, I think we made mistakes in the initial design.
Let's not make them again.

Linus

Eyvind Bernhardsen

May 14, 2010, 5:16:29 PM5/14/10
to Linus Torvalds, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck, Finn Arne Gangstad, Jay Soffian
On 13. mai 2010, at 23.45, Linus Torvalds wrote:

> On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
>>
>> Do you agree that "native" eol should only be CRLF if autocrlf is true?
>
> Not really. We're trying to get _away_ from .gitattributes depending on
> autocrlf, aren't we?

I'm not sure we still are. I certainly was when I started this series, but that was because autocrlf just plain didn't work with many existing repositories. When "safe autocrlf" fixed that, I decided that the extra complexity of core.eolStyle wasn't worth it.

I could be wrong, and I'd be happy to add it later. I don't think this series requires it, though.

I'd like to make my terms explicit: when I say "core.autocrlf", I mean a config value that makes git normalize all text files automagically. "core.eol" would be a different config value that simply tells git what line endings to put in files that are explicitly flagged as "text" (or automatically detected by "text=auto").

>> Otherwise, if .gitattributes looks like this:
>>
>> *.txt text
>>
>> git will put CRLFs in .txt files but LFs in .c files, and I don't think
>> that makes much sense.
>
> Well, but that's what you asked for, isn't it? And I don't see why you say
> *.c files would have LF's, since that depends on what you put in them: and
> under Windows, that might well be CRLF.

That's not an interesting problem. If you're okay with CRLFs in your repository there's no need for you to use text file normalization at all, and you're certainly not going to bother to set any text attributes. Everything will Just Work.

To make it more relevant, let's consider what would happen if you suddenly wanted to share that repository with a Linux user. You would clearly have been better off if the text files had been normalized, but I can only see three ways this could happen:

1. You set "* text=auto" when you created the repository
2. text=auto is the default for all files
3. autocrlf=true is set by default on Windows

The first option is unrealistic, and we probably agree that the second one is a bad idea. That's why, once Finn Arne fixed autocrlf, I realized it's not all that bad.

> And I do think it's perfectly reasonable to override the "native" mode in
> your .git/config. If we're renaming the attributes, we might as well then
> introduce a
>
> [core]
> eol=lf
>
> to set the "native" EOL for that repo, exactly because presumably a number
> of Windows people would like to see the saner LF-only model rather than
> the traditional native CRLF.

But they can equally easily set "core.autocrlf=false". Although the name still grates.

> In fact, maybe it would even make sense to just make LF the default
> "native" end-of-line sequence even on windows, so that Windows people who
> actually want CRLF would have to set core.eol=crlf. Whatever. That would
> be for the Windows git users to fight out, I don't care.

This is the crux of the problem. It's possible that I'm just being prejudiced, but I think that if someone wants CRLF as a _default_ they probably want it to be the default for all text files, not just normalized ones.

> But if we are going to clean up text attribute handling, then I really
> think we want to totally break that old "core.autocrlf" dependency.

"core.autocrlf=true" is exactly equivalent to "core.eol=crlf" in a repository with "* text=auto" (setting the "text" attribute disables the index check).

In a repository that doesn't care, "core.autocrlf=true" will normalize your text files and put CRLFs in them, while "core.eol=crlf" won't do a thing.

Unless you're simply arguing for renaming autocrlf to eol?
--
Eyvind

Eyvind Bernhardsen

May 14, 2010, 5:21:53 PM5/14/10
to Jonathan Nieder, Robert Buck, Linus Torvalds, g...@vger.kernel.org, msysGit, Junio C Hamano, Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
On 14. mai 2010, at 06.56, Jonathan Nieder wrote:

[Lots of good answers cut]

> - The crlf attribute gets a new synonym "text" to avoid confusion.

I would prefer to phrase that as "the text attribute has the synonym 'crlf' for backwards compatibility". If I wanted to avoid confusion I wouldn't have renamed it ;)
--
Eyvind

Eyvind Bernhardsen

May 15, 2010, 4:23:52 PM5/15/10
to Dmitry Potapov, Robert Buck, git@vger.kernel.org List, msysGit
On 14. mai 2010, at 12.16, Dmitry Potapov wrote:

> Probably, the ability to automatically add a utf8 BOM on Windows to text
> files (which are marked as "unicode") could be helpful, but it is just a
> part of the problem of how to deal with text files in "legacy" encodings,
> which are still widely used on Windows.

Sounds like something a clean/smudge filter should be able to do. The clean filter converts legacy encoded text to utf8 and strips any utf8 BOM before checking the file in, and the smudge filter writes the file out as utf8 with a BOM (which hopefully works no matter what your code page is? I don't know much about Windows i18n).

Adding this to convert.c would be more difficult, at least politically, since I assume it would be Windows-specific code.
--
Eyvind
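
The BOM half of that clean/smudge idea can be sketched in C as below. This is a minimal sketch: the helper names `utf8_bom_len` and `add_utf8_bom` are hypothetical, the legacy-encoding conversion is left out entirely, and the caller is assumed to provide a destination buffer with room for three extra bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char utf8_bom[3] = { 0xEF, 0xBB, 0xBF };

/*
 * "Clean" side: number of leading bytes to drop before check-in,
 * i.e. 3 if the buffer starts with a UTF-8 BOM, otherwise 0.
 */
static size_t utf8_bom_len(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= sizeof(utf8_bom) && !memcmp(buf, utf8_bom, sizeof(utf8_bom)))
        return sizeof(utf8_bom);
    return 0;
}

/*
 * "Smudge" side: copy src into dst, prepending a BOM unless one is
 * already present. dst must hold at least len + 3 bytes.
 * Returns the number of bytes written.
 */
static size_t add_utf8_bom(char *dst, const char *src, size_t len)
{
    size_t off = 0;

    if (!utf8_bom_len(src, len)) {
        memcpy(dst, utf8_bom, sizeof(utf8_bom));
        off = sizeof(utf8_bom);
    }
    if (len)
        memcpy(dst + off, src, len);
    return off + len;
}
```

A clean filter built on this would write `buf + utf8_bom_len(buf, len)` to its output, and the smudge filter would call `add_utf8_bom()` on the way out.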

Eyvind Bernhardsen

May 15, 2010, 4:47:25 PM5/15/10
to Linus Torvalds, g...@vger.kernel.org, msysGit, Junio C Hamano, Finn Arne Gangstad
Introduce a new configuration variable, "core.eol", that allows the user
to set which line endings to use for end-of-line-normalized files in the
working directory. It defaults to "native", which means CRLF on Windows
and LF everywhere else.

For backwards compatibility, "core.autocrlf" will override core.eol if
core.eol is left unset. This means that

[core]
autocrlf = true

will give CRLFs in the working directory even on platforms with LF as
their native line ending.

If core.eol is set explicitly (including setting it to "native"), it
will override core.autocrlf so that

[core]
autocrlf = true
eol = lf

normalizes all files that look like text, but does not put CRLFs in the
working directory.

Signed-off-by: Eyvind Bernhardsen <eyvind.be...@gmail.com>
---

It turns out that my resistance to "core.eol" was mostly laziness, so I
just implemented it.

I decided that "core.autocrlf" has to override the native line ending if
"core.eol" isn't set explicitly, which gives some extra complexity in
convert.c.

For 1.8 I would consider making core.autocrlf just turn on normalization
and leave the working directory line ending decision to core.eol, but
that _will_ break people's setups.

Patch is on top of my latest series.
--
Eyvind
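
The precedence between "core.eol" and "core.autocrlf" described in the commit message above can be condensed into one small function. This is a sketch of the rule as stated, not the code in the patch below: the `sk_`-prefixed enums and `worktree_eol` are hypothetical names, and the platform's native line ending is passed in as a parameter rather than chosen at compile time.

```c
#include <assert.h>

enum sk_autocrlf { SK_AUTOCRLF_FALSE, SK_AUTOCRLF_TRUE, SK_AUTOCRLF_INPUT };
enum sk_eol { SK_EOL_UNSET, SK_EOL_LF, SK_EOL_CRLF };

/*
 * Line ending to write into the working directory for a normalized
 * text file:
 *  - an explicit core.eol always wins;
 *  - otherwise autocrlf=true forces CRLF and autocrlf=input forces LF;
 *  - otherwise fall back to the platform's native line ending.
 */
static enum sk_eol worktree_eol(enum sk_eol core_eol,
                                enum sk_autocrlf autocrlf,
                                enum sk_eol native)
{
    assert(native == SK_EOL_LF || native == SK_EOL_CRLF);

    if (core_eol != SK_EOL_UNSET)
        return core_eol;

    switch (autocrlf) {
    case SK_AUTOCRLF_TRUE:
        return SK_EOL_CRLF;
    case SK_AUTOCRLF_INPUT:
        return SK_EOL_LF;
    default:
        return native;
    }
}
```

For instance, `worktree_eol(SK_EOL_LF, SK_AUTOCRLF_TRUE, SK_EOL_CRLF)` yields LF, which corresponds to the "autocrlf = true, eol = lf" configuration shown in the commit message.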

Documentation/config.txt | 8 ++++
Documentation/gitattributes.txt | 6 ++-
Makefile | 3 +
cache.h | 13 ++++++
config.c | 12 ++++++
convert.c | 39 +++++++++++-------
environment.c | 1 +
t/t0026-eol-config.sh | 83 +++++++++++++++++++++++++++++++++++++++
8 files changed, 149 insertions(+), 16 deletions(-)
create mode 100755 t/t0026-eol-config.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 207351b..7cc15a4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -207,6 +207,14 @@ core.autocrlf::
the file's `text` attribute, or if `text` is unspecified,
based on the file's contents. See linkgit:gitattributes[5].

+core.eol::
+ Sets the line ending type to use in the working directory for
+ files that have the `text` property set. Alternatives are
+ 'lf', 'crlf' and 'native', which uses the platform's native
+ line ending. The default value is `native`. See
+ linkgit:gitattributes[5] for more information on end-of-line
+ conversion.
+
core.safecrlf::
If true, makes git check if converting `CRLF` is reversible when
end-of-line conversion is active. Git will verify if a command
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 25753b7..8268c09 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -207,7 +207,11 @@ attribute to "auto" for _all_ files.
------------------------

This ensures that all files that git considers to be text will have
-normalized (LF) line endings in the repository.
+normalized (LF) line endings in the repository. The `core.eol`
+configuration variable controls which line endings git will use for
+normalized files in your working directory; the default is to use the
+native line ending for your platform, or CRLF if `core.autocrlf` is
+set.

NOTE: When `text=auto` normalization is enabled in an existing
repository, any text files containing CRLFs should be normalized. If
diff --git a/Makefile b/Makefile
index 910f471..419532e 100644
--- a/Makefile
+++ b/Makefile
@@ -224,6 +224,8 @@ all::
#
# Define CHECK_HEADER_DEPENDENCIES to check for problems in the hard-coded
# dependency rules.
+#
+# Define NATIVE_CRLF if your platform uses CRLF for line endings.

GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -989,6 +991,7 @@ ifeq ($(uname_S),Windows)
NO_CURL = YesPlease
NO_PYTHON = YesPlease
BLK_SHA1 = YesPlease
+ NATIVE_CRLF = YesPlease

CC = compat/vcbuild/scripts/clink.pl
AR = compat/vcbuild/scripts/lib.pl
diff --git a/cache.h b/cache.h
index d1f669e..ac6bfbd 100644
--- a/cache.h
+++ b/cache.h
@@ -568,6 +568,19 @@ enum auto_crlf {

extern enum auto_crlf auto_crlf;

+enum eol {
+ EOL_UNSET,
+ EOL_CRLF,
+ EOL_LF,
+#ifdef NATIVE_CRLF
+ EOL_NATIVE = EOL_CRLF
+#else
+ EOL_NATIVE = EOL_LF
+#endif
+};
+
+extern enum eol eol;
+
enum branch_track {
BRANCH_TRACK_UNSPECIFIED = -1,
BRANCH_TRACK_NEVER = 0,
diff --git a/config.c b/config.c
index b60a1ff..4edd940 100644
--- a/config.c
+++ b/config.c
@@ -477,6 +477,18 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}

+ if (!strcmp(var, "core.eol")) {
+ if (value && !strcasecmp(value, "lf"))
+ eol = EOL_LF;
+ else if (value && !strcasecmp(value, "crlf"))
+ eol = EOL_CRLF;
+ else if (value && !strcasecmp(value, "native"))
+ eol = EOL_NATIVE;
+ else
+ eol = EOL_UNSET;
+ return 0;
+ }
+
if (!strcmp(var, "core.notesref")) {
notes_ref_name = xstrdup(value);
return 0;
diff --git a/convert.c b/convert.c
index a309e07..b7ee469 100644
--- a/convert.c
+++ b/convert.c
@@ -20,12 +20,6 @@ enum action {
CRLF_AUTO,
};

-enum eol {
- EOL_UNSET,
- EOL_LF,
- EOL_CRLF,
-};
-
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, cr, lf, crlf;
@@ -244,12 +238,27 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
char *to_free = NULL;
struct text_stat stats;

- if ((action == CRLF_BINARY) || (action == CRLF_INPUT) ||
- (action != CRLF_CRLF && auto_crlf != AUTO_CRLF_TRUE))
+ if (!len)
return 0;

- if (!len)
+ switch (action) {
+ case CRLF_CRLF:
+ break;
+ case CRLF_BINARY:
+ case CRLF_INPUT:
return 0;
+ case CRLF_GUESS:
+ if (auto_crlf == AUTO_CRLF_FALSE)
+ return 0;
+ /* fall through */
+ case CRLF_TEXT:
+ case CRLF_AUTO:
+ if (eol == EOL_LF ||
+ (eol == EOL_UNSET &&
+ (auto_crlf == AUTO_CRLF_INPUT ||
+ (auto_crlf == AUTO_CRLF_FALSE && EOL_NATIVE == EOL_LF))))
+ return 0;
+ }

gather_stats(src, len, &stats);

@@ -670,7 +679,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
{
struct git_attr_check check[5];
enum action action = CRLF_GUESS;
- enum eol eol = EOL_UNSET;
+ enum eol eol_attr = EOL_UNSET;
int ident = 0, ret = 0;
const char *filter = NULL;

@@ -682,7 +691,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
action = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
- eol = git_path_check_eol(path, check + 3);
+ eol_attr = git_path_check_eol(path, check + 3);
if (drv && drv->clean)
filter = drv->clean;
}
@@ -692,7 +701,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- action = determine_action(action, eol);
+ action = determine_action(action, eol_attr);
ret |= crlf_to_git(path, src, len, dst, action, checksafe);
if (ret) {
src = dst->buf;
@@ -705,7 +714,7 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
{
struct git_attr_check check[5];
enum action action = CRLF_GUESS;
- enum eol eol = EOL_UNSET;
+ enum eol eol_attr = EOL_UNSET;
int ident = 0, ret = 0;
const char *filter = NULL;

@@ -717,7 +726,7 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
action = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
- eol = git_path_check_eol(path, check + 3);
+ eol_attr = git_path_check_eol(path, check + 3);
if (drv && drv->smudge)
filter = drv->smudge;
}
@@ -727,7 +736,7 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
src = dst->buf;
len = dst->len;
}
- action = determine_action(action, eol);
+ action = determine_action(action, eol_attr);
ret |= crlf_to_worktree(path, src, len, dst, action);
if (ret) {
src = dst->buf;
diff --git a/environment.c b/environment.c
index db4a5e9..83d38d3 100644
--- a/environment.c
+++ b/environment.c
@@ -40,6 +40,7 @@ const char *editor_program;
const char *excludes_file;
enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
int read_replace_refs = 1;
+enum eol eol = EOL_UNSET;
enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
enum branch_track git_branch_track = BRANCH_TRACK_REMOTE;
diff --git a/t/t0026-eol-config.sh b/t/t0026-eol-config.sh
new file mode 100755
index 0000000..5b6c297
--- /dev/null
+++ b/t/t0026-eol-config.sh
@@ -0,0 +1,83 @@
+#!/bin/sh
+
+test_description='CRLF conversion'
+
+. ./test-lib.sh
+
+has_cr() {
+ tr '\015' Q <"$1" | grep Q >/dev/null
+}
+
+test_expect_success setup '
+
+ git config core.autocrlf false &&
+
+ echo "one text" > .gitattributes &&
+
+ for w in Hello world how are you; do echo $w; done >one &&
+ for w in I am very very fine thank you; do echo $w; done >two &&
+ git add . &&
+
+ git commit -m initial &&
+
+ one=`git rev-parse HEAD:one` &&
+ two=`git rev-parse HEAD:two` &&
+
+ echo happy.
+'
+
+test_expect_success 'eol=lf puts LFs in normalized file' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.eol lf &&
+ git read-tree --reset -u HEAD &&
+
+ ! has_cr one &&
+ ! has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'eol=crlf puts CRLFs in normalized file' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.eol crlf &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ ! has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'eol=lf overrides autocrlf=true' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.eol lf &&
+ git config core.autocrlf true &&
+ git read-tree --reset -u HEAD &&
+
+ ! has_cr one &&
+ ! has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'autocrlf=true overrides unset eol' '
+
+ rm -f .gitattributes tmp one two &&
+ git config --unset-all core.eol &&
+ git config core.autocrlf true &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_done
--
1.7.1.5.gd739a

Dmitry Potapov

May 16, 2010, 1:19:27 AM5/16/10
to Eyvind Bernhardsen, Robert Buck, git@vger.kernel.org List, msysGit
On Sat, May 15, 2010 at 10:23:52PM +0200, Eyvind Bernhardsen wrote:
> On 14. mai 2010, at 12.16, Dmitry Potapov wrote:
>
> > Probably, the ability to automatically add a utf8 BOM on Windows to text
> > files (which are marked as "unicode") could be helpful, but it is just a
> > part of the problem of how to deal with text files in "legacy" encodings,
> > which are still widely used on Windows.
>
> Sounds like something a clean/smudge filter should be able to do.

Yes, it should, if you have a handful of files that need such conversion.
However, if you want it for every text file, running filters is slow
(especially on Windows), and they are not capable of autodetecting text.

> (which hopefully works no matter what your code
> page is? I don't know much about Windows i18n).

Yes, it does. I am not an expert on Windows either, but as far as I
know, BOMs are used to mark unicode files, which could be either UTF-8
or UTF-16. BTW, UTF-16 files are treated by Git as "binary" now, which
is not always convenient, because it is impossible to "merge" or "diff" them.

> Adding this to convert.c would be more difficult, at least
> politically, since I assume it would be Windows-specific code.

I don't think it needs any Windows-specific code. We already have some
functions to convert text from different charsets, which could be used.
But this feature should be developed and tested by people who work on
Windows regularly and need this feature, because there is no substitute
for testing and experience of how well it works in practice. Currently,
I rarely use Windows and can get by with clean/smudge filters.


Dmitry

Eyvind Bernhardsen

May 16, 2010, 6:37:54 AM5/16/10
to Dmitry Potapov, Robert Buck, git@vger.kernel.org List, msysGit
On 16. mai 2010, at 07.19, Dmitry Potapov wrote:

> On Sat, May 15, 2010 at 10:23:52PM +0200, Eyvind Bernhardsen wrote:
>> (which hopefully works no matter what your code
>> page is? I don't know much about Windows i18n).
>
> Yes, it does. I am not an expert on Windows either, but as far as I
> know, BOMs are used to mark unicode files, which could be either UTF-8
> or UTF-16. BTW, UTF-16 files are treated by Git as "binary" now, which
> is not always convenient, because it is impossible to "merge" or "diff" them.

Okay, so something that checks text files to see if they're utf16 (maybe just accept anything with a utf16 BOM as utf16?) and converts them to utf8 might be useful on any platform. Stripping utf8 BOMs and optionally re-adding them on output would be a natural extension. "core.autoutf", anyone?

>> Adding this to convert.c would be more difficult, at least
>> politically, since I assume it would be Windows-specific code.
>
> I don't think it needs any Windows-specific code. We already have some
> functions to convert text from different charsets, which could be used.
> But this feature should be developed and tested by people who work on
> Windows regularly and need this feature, because there is no substitute
> for testing and experience of how well it works in practice. Currently,
> I rarely use Windows and can get by with clean/smudge filters.

Yeah, the problem is finding someone who needs the feature _and_ is able/willing to implement it. I try to keep a Unix-like experience on Windows, so I don't usually run into utf8 BOMs.
--
Eyvind
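
The "accept anything with a utf16 BOM as utf16" check floated above could look roughly like this in C. It is a sketch only: `sniff_bom` and `enum bom_kind` are hypothetical names, no such function is claimed to exist in git, and the actual UTF-16 to UTF-8 conversion is not shown.

```c
#include <assert.h>
#include <stddef.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/*
 * Classify a buffer by its leading byte-order mark, if any.
 * Caveat: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and
 * would be reported as UTF-16LE by this simple check.
 */
static enum bom_kind sniff_bom(const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(data != NULL || len == 0);
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16_LE;
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

Files without any BOM fall through to `BOM_NONE`, so BOM-less UTF-16 would still need some other detection strategy.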

Robert Buck

May 16, 2010, 6:39:57 AM5/16/10
to Eyvind Bernhardsen, Linus Torvalds, g...@vger.kernel.org, msysGit, Junio C Hamano, Finn Arne Gangstad
Looking forward to this change. In terms of usability it is really
nice. Eager to see it in a release.