[GSoC][PATCH v4 0/7] clone: dir-iterator refactoring with tests

Matheus Tavares

Mar 22, 2019, 7:22:49 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com
This patchset contains:
- a replacement of the explicit recursive dir iteration at
  copy_or_link_directory with the dir-iterator API;
- some refactoring and behaviour changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects; and
- tests for these types of files

Changes since v3 include:
- Addressed Duy's and Ævar's comments and suggestions in v2,
  including but not limited to:
  - Add patch to replace strcmp with fspathcmp
  - Code comments refactoring
  - Unindent snippet at mkdir_if_missing
  - Made t5604 added subtests pass under GIT_TEST_MULTI_PACK_INDEX=1
    and GIT_TEST_COMMIT_GRAPH=1
- Re-implemented patch 2 with linkat(), to be simpler and have a safer
  behaviour when cloning repos with symlinks at .git/objects
- Split first patch's tests into patches 1 and 2, tweaked them a little
  to reflect the previous item's changes, and replaced some usages of the
  string 'link' with 'symlink' just to avoid confusion with 'hardlinks',
  which are also known simply as 'links'.

v3: https://public-inbox.org/git/20190226122829...@gmail.com/

Matheus Tavares (6):
clone: better handle symlinked files at .git/objects/
dir-iterator: add flags parameter to dir_iterator_begin
clone: copy hidden paths at local clone
clone: extract function from copy_or_link_directory
clone: use dir-iterator to avoid explicit dir traversal
clone: Replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
clone: test for our behavior on odd objects/* content

builtin/clone.c | 72 ++++++++++---------
dir-iterator.c | 28 +++++++-
dir-iterator.h | 39 +++++++++--
refs/files-backend.c | 2 +-
t/t5604-clone-reference.sh | 137 +++++++++++++++++++++++++++++++++++++
5 files changed, 236 insertions(+), 42 deletions(-)

--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:22:52 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Alex Riesen, Junio C Hamano
From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files at .git/object directory, such as symlinks to other
dirs, or unknown files.

I'm bending over backwards here to avoid a SHA1 dependency. See [1]
for an earlier and simpler version that hardcoded a SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625...@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Helped-by: Matheus Tavares <matheus.b...@usp.br>
---
t/t5604-clone-reference.sh | 116 +++++++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..708b1a2c66 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,120 @@ test_expect_success 'clone, dissociate from alternates' '
( cd C && git fsck )
'

+test_expect_success 'setup repo with garbage in objects/*' '
+ git init S &&
+ (
+ cd S &&
+ test_commit A &&
+
+ cd .git/objects &&
+ >.some-hidden-file &&
+ >some-file &&
+ mkdir .some-hidden-dir &&
+ >.some-hidden-dir/some-file &&
+ >.some-hidden-dir/.some-dot-file &&
+ mkdir some-dir &&
+ >some-dir/some-file &&
+ >some-dir/.some-dot-file
+ )
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ git clone $option S S$option || return 1 &&
+ git -C S$option fsck || return 1
+ done &&
+ find S-* -name "*some*" | sort >actual &&
+ cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-file
+ S--dissociate/.git/objects/some-dir
+ S--dissociate/.git/objects/some-dir/.some-dot-file
+ S--dissociate/.git/objects/some-dir/some-file
+ S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-file
+ S--local/.git/objects/some-dir
+ S--local/.git/objects/some-dir/.some-dot-file
+ S--local/.git/objects/some-dir/some-file
+ S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-file
+ S--no-hardlinks/.git/objects/some-dir
+ S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/some-dir/some-file
+ S--no-hardlinks/.git/objects/some-file
+ EOF
+ test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+ git init T &&
+ (
+ cd T &&
+ test_commit A &&
+ git gc &&
+ (
+ cd .git/objects &&
+ mv pack packs &&
+ ln -s packs pack
+ ) &&
+ test_commit B &&
+ (
+ cd .git/objects &&
+ find ?? -type d >loose-dirs &&
+ last_loose=$(tail -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+ mv $last_loose a-loose-dir &&
+ ln -s a-loose-dir $last_loose &&
+ find . -type f | sort >../../../T.objects-files.raw &&
+ echo unknown_content> unknown_file
+ )
+ ) &&
+ git -C T fsck &&
+ git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ git clone $option T T$option || return 1 &&
+ git -C T$option fsck || return 1 &&
+ git -C T$option rev-list --all --objects >T$option.objects &&
+ test_cmp T.objects T$option.objects &&
+ (
+ cd T$option/.git/objects &&
+ find . -type f | sort >../../../T$option.objects-files.raw
+ )
+ done &&
+
+ for raw in $(ls T*.raw)
+ do
+ sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" \
+ -e "/multi-pack-index/d" -e "/commit-graph/d" <$raw >$raw.de-sha || return 1
+ done &&
+
+ cat >expected-files <<-EOF &&
+ ./Y/Z
+ ./Y/Z
+ ./a-loose-dir/Z
+ ./Y/Z
+ ./info/packs
+ ./pack/pack-Z.idx
+ ./pack/pack-Z.pack
+ ./packs/pack-Z.idx
+ ./packs/pack-Z.pack
+ ./unknown_file
+ EOF
+
+ for option in --local --dissociate --no-hardlinks
+ do
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ done &&
+
+ cat >expected-files <<-EOF &&
+ ./info/alternates
+ EOF
+ test_cmp expected-files T--shared.objects-files.raw
+'
+
test_done
--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:22:55 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano
There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: with --no-hardlinks, all symlinks are
dereferenced, but without it Git tries to hardlink the files with the
link() function, whose behaviour on symlinks is OS-specific. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink,
whilst on GNU/Linux it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
$ touch a
$ ln -s a b
$ link b c
$ ls -li a b c
155 [...] a
156 [...] b -> a
156 [...] c -> a

But on NetBSD:
$ ls -li a b c
2609160 [...] a
2609164 [...] b -> a
2609160 [...] c

It's not good for the result of a local clone to be OS-dependent, and
since the behaviour on GNU/Linux may result in broken symlinks, let's
re-implement it with linkat() instead of link(), using a flag to always
follow symlinks and make the hardlink point to the target file. Besides
standardizing the behaviour, this ensures that no broken symlinks are
produced. Also, add tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.
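
As a rough illustration of the linkat() call described above, here is a
minimal standalone sketch. The helper names hardlink_target() and
same_regular_file() are made up for this example and are not part of
Git; the sketch assumes a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_FOLLOW */
#include <sys/stat.h>   /* lstat, S_ISREG */
#include <unistd.h>     /* linkat */

/*
 * Hypothetical helper (not part of Git): create dst as a hardlink to
 * the file that src ultimately refers to, dereferencing src if it is
 * a symlink. Returns 0 on success and -1 with errno set on failure,
 * exactly like linkat().
 */
int hardlink_target(const char *src, const char *dst)
{
	return linkat(AT_FDCWD, src, AT_FDCWD, dst, AT_SYMLINK_FOLLOW);
}

/*
 * Hypothetical check: return 1 if a and b are both regular files (not
 * symlinks) sharing the same device and inode, 0 otherwise.
 */
int same_regular_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (lstat(a, &sa) || lstat(b, &sb))
		return 0;
	return S_ISREG(sa.st_mode) && S_ISREG(sb.st_mode) &&
	       sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With 'b' being a symlink to 'a' as in the listings above,
hardlink_target("b", "c") should leave 'c' sharing an inode with 'a'
on any conforming platform, which is the NetBSD-style result.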

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 26 +++++++++++++++++++-------
2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..b76f33c635 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (unlink(dest->buf) && errno != ENOENT)
die_errno(_("failed to unlink '%s'"), dest->buf);
if (!option_no_hardlinks) {
- if (!link(src->buf, dest->buf))
+ if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
continue;
if (option_local > 0)
die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 708b1a2c66..76d45f1187 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
test_cmp expected actual
'

-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
git init T &&
(
cd T &&
@@ -282,10 +282,18 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
cd .git/objects &&
find ?? -type d >loose-dirs &&
last_loose=$(tail -n 1 loose-dirs) &&
- rm -f loose-dirs &&
mv $last_loose a-loose-dir &&
ln -s a-loose-dir $last_loose &&
+ first_loose=$(head -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+ (
+ cd $first_loose &&
+ obj=$(ls *) &&
+ mv $obj ../an-object &&
+ ln -s ../an-object $obj
+ ) &&
find . -type f | sort >../../../T.objects-files.raw &&
+ find . -type l | sort >../../../T.objects-symlinks.raw &&
echo unknown_content> unknown_file
)
) &&
@@ -294,7 +302,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
'


-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
for option in --local --no-hardlinks --shared --dissociate
do
git clone $option T T$option || return 1 &&
@@ -303,7 +311,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
test_cmp T.objects T$option.objects &&
(
cd T$option/.git/objects &&
- find . -type f | sort >../../../T$option.objects-files.raw
+ find . -type f | sort >../../../T$option.objects-files.raw &&
+ find . -type l | sort >../../../T$option.objects-symlinks.raw
)
done &&

@@ -317,6 +326,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
./Y/Z
./Y/Z
./a-loose-dir/Z
+ ./an-object
./Y/Z
./info/packs
./pack/pack-Z.idx
@@ -326,15 +336,17 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
./unknown_file
EOF

- for option in --local --dissociate --no-hardlinks
+ for option in --local --no-hardlinks --dissociate
do
- test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+ test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
done &&

cat >expected-files <<-EOF &&
./info/alternates
EOF
- test_cmp expected-files T--shared.objects-files.raw
+ test_cmp expected-files T--shared.objects-files.raw &&
+ test_must_be_empty T--shared.objects-symlinks.raw
'

test_done
--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:22:58 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Michael Haggerty, Ramsay Jones, Junio C Hamano
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

The currently available flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort immediately if an error occurs while
fetching the next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include their
contents in the iteration. These new flags will be used in a subsequent
patch.

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 28 +++++++++++++++++++++++++---
dir-iterator.h | 39 +++++++++++++++++++++++++++++++++------
refs/files-backend.c | 2 +-
3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@ struct dir_iterator_int {
* that will be included in this iteration.
*/
struct dir_iterator_level *levels;
+
+ /* Combination of flags for this dir-iterator */
+ unsigned flags;
};

int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;
+ int ret;

while (1) {
struct dir_iterator_level *level =
@@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)

level->dir = opendir(iter->base.path.buf);
if (!level->dir && errno != ENOENT) {
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
warning("error opening directory %s: %s",
iter->base.path.buf, strerror(errno));
/* Popping the level is handled below */
@@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (!de) {
/* This level is exhausted; pop up a level. */
if (errno) {
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
warning("error reading directory %s: %s",
iter->base.path.buf, strerror(errno));
} else if (closedir(level->dir))
@@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
continue;

strbuf_addstr(&iter->base.path, de->d_name);
- if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
- if (errno != ENOENT)
+
+ if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+ ret = stat(iter->base.path.buf, &iter->base.st);
+ else
+ ret = lstat(iter->base.path.buf, &iter->base.st);
+
+ if (ret < 0) {
+ if (errno != ENOENT) {
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
warning("error reading path '%s': %s",
iter->base.path.buf,
strerror(errno));
+ }
continue;
}

@@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
return ITER_OK;
}
}
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ return ITER_ERROR;
}

int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
}

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);

+ iter->flags = flags;
iter->levels_nr = 1;
iter->levels[0].initialized = 0;

diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..890d5d8dbb 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -19,7 +19,7 @@
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
+ * struct iterator *iter = dir_iterator_begin(path, 0);
*
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
@@ -40,6 +40,20 @@
* dir_iterator_advance() again.
*/

+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ * in case of an error while trying to fetch the next entry, which is
+ * to emit a warning and keep going. With this flag, resources are
+ * freed and ITER_ERROR is returned immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ * directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
struct dir_iterator {
/* The current path: */
struct strbuf path;
@@ -54,20 +68,28 @@ struct dir_iterator {
/* The current basename: */
const char *basename;

- /* The result of calling lstat() on path: */
+ /*
+ * The result of calling lstat() on path or stat(), if the
+ * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+ * dir_iterator's initialization.
+ */
struct stat st;
};

/*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
*
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ * - path is the starting directory. An internal copy will be made.
+ * - flags is a combination of the possible flags to initialize a
+ * dir-iterator or 0 for default behaviour.
*/
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);

/*
* Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +98,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
* dir_iterator and associated resources and return ITER_ERROR. It is
* a bug to use iterator or call this function again after it has
* returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir-iterator will return ITER_ERROR when failing
+ * to fetch the next entry or just emit a warning and try to fetch the
+ * next is defined by the 'pedantic' option at dir-iterator's
+ * initialization.
*/
int dir_iterator_advance(struct dir_iterator *iterator);

diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..2ce9783097 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,

base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
strbuf_addf(&sb, "%s/logs", gitdir);
- iter->dir_iterator = dir_iterator_begin(sb.buf);
+ iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
iter->ref_store = ref_store;
strbuf_release(&sb);

--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:23:01 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano
Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is odd behaviour.
This was probably unintentional: the intention was likely to skip
only '.' and '..', but the check ended up skipping all directories
starting with '.'. Besides being more natural, the new behaviour is
more permissive to the user.

Also adjust tests to reflect this behaviour change.
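
The difference between the old and the new check can be seen in a
small standalone sketch. Both functions below are stand-ins written
for this example; Git's own is_dot_or_dotdot() lives in dir.h and may
be implemented differently.

```c
/*
 * Stand-in for the old check: true for every name starting with '.',
 * which also matches hidden directories such as ".some-hidden-dir".
 */
int starts_with_dot(const char *name)
{
	return name[0] == '.';
}

/*
 * Stand-in for the new check: true only for the special entries "."
 * and "..", so hidden directories are no longer skipped.
 */
int is_dot_or_dotdot_sketch(const char *name)
{
	return name[0] == '.' &&
	       (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}
```

For ".some-hidden-dir" the old check is true (the directory is
skipped) while the new one is false (the directory is copied); both
agree on "." and "..".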

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index b76f33c635..60c6780c06 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
continue;
}
if (S_ISDIR(buf.st_mode)) {
- if (de->d_name[0] != '.')
+ if (!is_dot_or_dotdot(de->d_name))
copy_or_link_directory(src, dest,
src_repo, src_baselen);
continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 76d45f1187..0992baa5ac 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
done &&
find S-* -name "*some*" | sort >actual &&
cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-dir
+ S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+ S--dissociate/.git/objects/.some-hidden-dir/some-file
S--dissociate/.git/objects/.some-hidden-file
S--dissociate/.git/objects/some-dir
S--dissociate/.git/objects/some-dir/.some-dot-file
S--dissociate/.git/objects/some-dir/some-file
S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-dir
+ S--local/.git/objects/.some-hidden-dir/.some-dot-file
+ S--local/.git/objects/.some-hidden-dir/some-file
S--local/.git/objects/.some-hidden-file
S--local/.git/objects/some-dir
S--local/.git/objects/some-dir/.some-dot-file
S--local/.git/objects/some-dir/some-file
S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir
+ S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
S--no-hardlinks/.git/objects/.some-hidden-file
S--no-hardlinks/.git/objects/some-dir
S--no-hardlinks/.git/objects/some-dir/.some-dot-file
--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:23:04 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano
Extract the dir creation code snippet from copy_or_link_directory into
its own function named mkdir_if_missing. This change will help remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 60c6780c06..c17bbf1bfc 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,21 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
fclose(in);
}

+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+ struct stat st;
+
+ if (!mkdir(pathname, mode))
+ return;
+
+ if (errno != EEXIST)
+ die_errno(_("failed to create directory '%s'"), pathname);
+ else if (stat(pathname, &st))
+ die_errno(_("failed to stat '%s'"), pathname);
+ else if (!S_ISDIR(st.st_mode))
+ die(_("%s exists and is not a directory"), pathname);
+}
+
static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
const char *src_repo, int src_baselen)
{
@@ -404,14 +419,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (!dir)
die_errno(_("failed to open '%s'"), src->buf);

- if (mkdir(dest->buf, 0777)) {
- if (errno != EEXIST)
- die_errno(_("failed to create directory '%s'"), dest->buf);
- else if (stat(dest->buf, &buf))
- die_errno(_("failed to stat '%s'"), dest->buf);
- else if (!S_ISDIR(buf.st_mode))
- die(_("%s exists and is not a directory"), dest->buf);
- }
+ mkdir_if_missing(dest->buf, 0777);

strbuf_addch(src, '/');
src_len = src->len;
--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:23:07 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano
Replace the use of the opendir/readdir/closedir API to traverse
directories recursively in copy_or_link_directory with the
dir-iterator API. This simplifies the code and avoids recursive calls
to copy_or_link_directory.

This also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could finish
successfully even though the .git/objects copy didn't fully succeed.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 44 ++++++++++++++++++++++----------------------
1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index c17bbf1bfc..4ee45e7862 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
#include "transport.h"
#include "strbuf.h"
#include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
#include "sigchain.h"
#include "branch.h"
#include "remote.h"
@@ -408,42 +410,36 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
}

static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
- const char *src_repo, int src_baselen)
+ const char *src_repo)
{
- struct dirent *de;
- struct stat buf;
int src_len, dest_len;
- DIR *dir;
-
- dir = opendir(src->buf);
- if (!dir)
- die_errno(_("failed to open '%s'"), src->buf);
+ struct dir_iterator *iter;
+ int iter_status;
+ unsigned flags;

mkdir_if_missing(dest->buf, 0777);

+ flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+ iter = dir_iterator_begin(src->buf, flags);
+
strbuf_addch(src, '/');
src_len = src->len;
strbuf_addch(dest, '/');
dest_len = dest->len;

- while ((de = readdir(dir)) != NULL) {
+ while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
strbuf_setlen(src, src_len);
- strbuf_addstr(src, de->d_name);
+ strbuf_addstr(src, iter->relative_path);
strbuf_setlen(dest, dest_len);
- strbuf_addstr(dest, de->d_name);
- if (stat(src->buf, &buf)) {
- warning (_("failed to stat %s\n"), src->buf);
- continue;
- }
- if (S_ISDIR(buf.st_mode)) {
- if (!is_dot_or_dotdot(de->d_name))
- copy_or_link_directory(src, dest,
- src_repo, src_baselen);
+ strbuf_addstr(dest, iter->relative_path);
+
+ if (S_ISDIR(iter->st.st_mode)) {
+ mkdir_if_missing(dest->buf, 0777);
continue;
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+ if (!strcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, dest, src_repo);
continue;
}
@@ -460,7 +456,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (copy_file_with_time(dest->buf, src->buf, 0666))
die_errno(_("failed to copy file to '%s'"), dest->buf);
}
- closedir(dir);
+
+ if (iter_status != ITER_DONE) {
+ strbuf_setlen(src, src_len);
+ die(_("failed to iterate over '%s'"), src->buf);
+ }
}

static void clone_local(const char *src_repo, const char *dest_repo)
@@ -478,7 +478,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
get_common_dir(&dest, dest_repo);
strbuf_addstr(&src, "/objects");
strbuf_addstr(&dest, "/objects");
- copy_or_link_directory(&src, &dest, src_repo, src.len);
+ copy_or_link_directory(&src, &dest, src_repo);
strbuf_release(&src);
strbuf_release(&dest);
}
--
2.20.1

Matheus Tavares

Mar 22, 2019, 7:23:10 PM
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano
Replace the use of strcmp with fspathcmp at copy_or_link_directory, which
is more permissive/friendly to case-insensitive file systems.
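
As a rough standalone picture of what such a comparison does, consider
the sketch below. path_cmp_sketch() is a made-up stand-in: in Git,
whether case is ignored comes from the repository configuration
(core.ignoreCase) rather than from a parameter.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/*
 * Hypothetical stand-in for a filesystem-aware path comparison:
 * compare a and b, ignoring case when ignore_case is nonzero.
 * Returns 0 when the paths are considered equal.
 */
int path_cmp_sketch(const char *a, const char *b, int ignore_case)
{
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

On a case-insensitive file system, a path reported as "INFO/Alternates"
would then still be recognized as "info/alternates" and handled by
copy_alternates() instead of being copied bit-for-bit.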

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Suggested-by: Nguyễn Thái Ngọc Duy <pcl...@gmail.com>
---
builtin/clone.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4ee45e7862..763ad5e31f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -439,7 +439,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(iter->relative_path, "info/alternates")) {
+ if (!fspathcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, dest, src_repo);
continue;
}
--
2.20.1

Matheus Tavares Bernardino

Mar 24, 2019, 2:09:44 PM
to Ævar Arnfjörð Bjarmason, git, Thomas Gummerer, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen, Junio C Hamano
Ævar, maybe I'm missing something here, but do we really need the
first sed command ("s!/..\$!/X!") ?

SZEDER Gábor

Mar 24, 2019, 4:56:30 PM
to Matheus Tavares, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Alex Riesen, Junio C Hamano
On Fri, Mar 22, 2019 at 08:22:31PM -0300, Matheus Tavares wrote:
> From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
>
> Add tests for what happens when we perform a local clone on a repo
> containing odd files at .git/object directory, such as symlinks to other
> dirs, or unknown files.
>
> I'm bending over backwards here to avoid a SHA1 dependency. See [1]

s/SHA1/SHA-1/

> for an earlier and simpler version that hardcoded a SHA-1s.

s/SHA-1s/SHA-1/ or s/a SHA-1s/SHA-1s/, depending on what you consider
multiple occurrences of the same SHA-1.

> This behavior has been the same for a *long* time, but hasn't been
> tested for.
>
> There's a good post-hoc argument to be made for copying over unknown
> things, e.g. I'd like a git version that doesn't know about the
> commit-graph to copy it under "clone --local" so a newer git version
> can make use of it.
>
> In follow-up commits we'll look at changing some of this behavior, but
> for now let's just assert it as-is so we'll notice what we'll change
> later.
>
> 1. https://public-inbox.org/git/20190226002625...@gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
> Helped-by: Matheus Tavares <matheus.b...@usp.br>


> +test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
> + git init T &&
> + (
> + cd T &&
> + test_commit A &&
> + git gc &&
> + (
> + cd .git/objects &&
> + mv pack packs &&
> + ln -s packs pack
> + ) &&
> + test_commit B &&
> + (
> + cd .git/objects &&
> + find ?? -type d >loose-dirs &&
> + last_loose=$(tail -n 1 loose-dirs) &&
> + rm -f loose-dirs &&
> + mv $last_loose a-loose-dir &&
> + ln -s a-loose-dir $last_loose &&
> + find . -type f | sort >../../../T.objects-files.raw &&
> + echo unknown_content> unknown_file
> + )

Please drop these inner subshells. They are unnecessary, because the
outer subshell alone is sufficient to ensure that the test script
returns to the original directory if one of the commands were to fail.

> + ) &&
> + git -C T fsck &&
> + git -C T rev-list --all --objects >T.objects
> +'
> +
> +
> +test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
> + for option in --local --no-hardlinks --shared --dissociate
> + do
> + git clone $option T T$option || return 1 &&
> + git -C T$option fsck || return 1 &&
> + git -C T$option rev-list --all --objects >T$option.objects &&
> + test_cmp T.objects T$option.objects &&
> + (
> + cd T$option/.git/objects &&
> + find . -type f | sort >../../../T$option.objects-files.raw
> + )

Nit: this might be a bit easier on the eyes when written as

(
cd T$option/.git/objects &&
find . -type f
) | sort >T$option.objects-files.raw

because it would avoid that '../../../'.
Perhaps

echo ./info/alternates >expected-files

Matheus Tavares Bernardino

Mar 26, 2019, 3:43:35 PM
to SZEDER Gábor, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen, Junio C Hamano
On Sun, Mar 24, 2019 at 5:56 PM SZEDER Gábor <szede...@gmail.com> wrote:
>
> On Fri, Mar 22, 2019 at 08:22:31PM -0300, Matheus Tavares wrote:
> > From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
> >
> > Add tests for what happens when we perform a local clone on a repo
> > containing odd files at .git/object directory, such as symlinks to other
> > dirs, or unknown files.
> >
> > I'm bending over backwards here to avoid a SHA1 dependency. See [1]
>
> s/SHA1/SHA-1/
>

Thanks, nice catch.

> > for an earlier and simpler version that hardcoded a SHA-1s.
>
> s/SHA-1s/SHA-1/ or s/a SHA-1s/SHA-1s/, depending on what you consider
> multiple occurrences of the same SHA-1.
>

Yes, I think it should be just "SHA-1s". Thanks.
Ok!

> > + ) &&
> > + git -C T fsck &&
> > + git -C T rev-list --all --objects >T.objects
> > +'
> > +
> > +
> > +test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
> > + for option in --local --no-hardlinks --shared --dissociate
> > + do
> > + git clone $option T T$option || return 1 &&
> > + git -C T$option fsck || return 1 &&
> > + git -C T$option rev-list --all --objects >T$option.objects &&
> > + test_cmp T.objects T$option.objects &&
> > + (
> > + cd T$option/.git/objects &&
> > + find . -type f | sort >../../../T$option.objects-files.raw
> > + )
>
> Nit: this might be a bit easier on the eyes when written as
>
> (
> cd T$option/.git/objects &&
> find . -type f
> ) | sort >T$option.objects-files.raw
>
> because it would avoid that '../../../'.

Sounds good, but in the next patch of this series, another 'find'
statement will be added inside this subshell, so I think that change
is not really possible, unfortunately.
Indeed, much simpler. Thanks.

Thomas Gummerer

Mar 28, 2019, 5:49:53 PM3/28/19
to Matheus Tavares, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Alex Riesen, Junio C Hamano
On 03/22, Matheus Tavares wrote:
> From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
>
> Add tests for what happens when we perform a local clone on a repo
> containing odd files at .git/object directory, such as symlinks to other
> dirs, or unknown files.
>
> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
> for an earlier and simpler version that hardcoded a SHA-1s.
>
> This behavior has been the same for a *long* time, but hasn't been
> tested for.
>
> There's a good post-hoc argument to be made for copying over unknown
> things, e.g. I'd like a git version that doesn't know about the
> commit-graph to copy it under "clone --local" so a newer git version
> can make use of it.
>
> In follow-up commits we'll look at changing some of this behavior, but
> for now let's just assert it as-is so we'll notice what we'll change
> later.
>
> 1. https://public-inbox.org/git/20190226002625...@gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
> Helped-by: Matheus Tavares <matheus.b...@usp.br>

The trailers should usually be in the order in which things happened. So
having Ævar's S-o-b first makes sense, but the Helped-by should come
before your S-o-b, as you made the changes first before sending out
the patch series.

When sending someone else's patch in a slightly modified version, it
may also be useful to add which parts you changed, as it was done in
e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
2018-10-31) for example.

Iirc, the test that is added in this patch does not work on some
platforms, notably MacOS. That would mean that we would break
bisectability at this patch on some platforms if we were to introduce
it here. Therefore I think it would be better to squash this patch
into the next one which fixes these inconsistencies.

Note that I can't test this at the moment, so this concern is only
based on previous discussions that I remember. If that's already
addressed somehow, all the better!

Thomas Gummerer

Mar 28, 2019, 6:10:52 PM3/28/19
to Matheus Tavares, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano, Johannes Schindelin
On 03/22, Matheus Tavares wrote:
This line is starting to get a bit long, might be worth breaking it up
to keep to 80 characters per line.

I notice that we are currently not using 'linkat()' anywhere else in
our codebase. It looks like it has been introduced in POSIX.1-2008,
which sounds fairly recent by git's standards. So I wonder if this is
really supported on all platforms that git is being built on.

I also wonder what would need to be done on Windows if we were to
introduce this. I see we define the 'link()' function in
'compat/mingw.c' for that currently, so I guess something similar
would be needed for 'linkat()'. I added Dscho to Cc for Windows
expertise.

While I agree with the goal of consistency across all platforms here,
I don't know if it's actually worth going through the pain of doing
that, especially for somewhat of an edge case in local clones.

If the test in the previous patch passes on all platforms, I'd be okay
with just calling the behaviour here undefined, especially as git
would never actually create symlinks in the .git/objects directory.

Thomas Gummerer

Mar 28, 2019, 6:19:13 PM3/28/19
to Matheus Tavares, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Michael Haggerty, Ramsay Jones, Junio C Hamano
On 03/22, Matheus Tavares wrote:
> Add the possibility of giving flags to dir_iterator_begin to initialize
> a dir-iterator with special options.
>
> Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
> dir_iterator_advance abort imediatelly in the case of an error while

s/imediatelly/immediately/

Ævar Arnfjörð Bjarmason

Mar 29, 2019, 4:38:17 AM3/29/19
to Thomas Gummerer, Matheus Tavares, g...@vger.kernel.org, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano, Johannes Schindelin
For better or worse this particular quest started because I pointed out
(with some WIP patches) that for understanding this change we should
test whatever we did now, to ensure that the refactoring didn't have
unintended side-effects.

But that's a separate question from whether or not we want to keep the
current behavior.

I think the current behavior is clearly insane, so I think we should
change it with some follow-up patches. In particular options like
--dissociate should clearly (in my mind at least) have behavior similar
to "cp -L", and --local should hardlink to the *target* of the symlink,
if anything, at least for objects/{??,pack,info}

I think that changes the portability story with linkat(), since it's not
something we should be planning to keep, just an intermediate step so we
don't have a gigantic patch that both adds tests, refactors and changes
the behavior.

> While I agree with the goal of consistency across all platforms here,
> I don't know if it's actually worth going through the pain of doing
> that, especially for somewhat of an edge case in local clones.

Note that we explicitly clone everything under objects/, including
recursively cloning unknown directories and their files.

So this is not just say about how we handle symlinks that we don't
expect now (nothing uses them), but if we want to make the promise that
nothing in objects/ will ever use symlinks. Or more specifically, that
if a new version of git starts using it that something doing local
clones might produce a broken copy of such a repo.

Maybe we'll still say "we don't care". Just saying it's a slightly
different question...

Matheus Tavares Bernardino

Mar 29, 2019, 9:16:13 AM3/29/19
to Thomas Gummerer, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Michael Haggerty, Ramsay Jones, Junio C Hamano
Thanks!

Matheus Tavares Bernardino

Mar 29, 2019, 10:06:33 AM3/29/19
to Thomas Gummerer, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen, Junio C Hamano
Ok, thanks for letting me know. I'll fix it.

> When sending someone else's patch in a slightly modified version, it
> may also be useful to add which parts you changed, as it was done in
> e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
> 2018-10-31) for example.

Thanks, I didn't know about that! I searched the log and didn't see
many of these on patches with 'Helped-by' tags, is there a particular
case to use it or not?

> Iirc, the test that is added in this patch does not work on some
> platforms, notably MacOS. That would mean that we would break
> bisectability at this patch on some platforms if we were to introduce
> it here. Therefore I think it would be better to squash this patch
> into the next one which fixes these inconsistencies.
> Note that I can't test this at the moment, so this concern is only
> based on previous discussions that I remember. If that's already
> addressed somehow, all the better!

Yes, it is already addressed :) The section of these tests that used
to break on some platforms is now moved to the next patch which also
fixes the platform inconsistencies. Now both patches (this and the
next) work on macOS, NetBSD and GNU/Linux. Also every test and job is
passing at travis-ci, except for the job named "Documentation"[1]. But,
it's weird since these patches don't even touch Documentation/... And
master is failing the same job at my fork as well [2]... Any thoughts
on that?

[1] https://travis-ci.org/MatheusBernardino/git/builds/512713775
[2] https://travis-ci.org/MatheusBernardino/git/builds/513028692

Matheus Tavares Bernardino

Mar 29, 2019, 10:28:07 AM3/29/19
to Thomas Gummerer, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre, Junio C Hamano, Johannes Schindelin
Ok, what if instead of using linkat() we use 'realpath(const char
*path, char *resolved_path)', which will resolve any symlinks at
'path' and store the canonical path at 'resolved_path'? Then, we can
still keep using link() but now, with the certainty that all platforms
will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
Would that be a better idea?
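
As a rough shell-level analogue of this idea (a sketch only; it assumes
the 'realpath' utility is installed, which is not guaranteed on every
system, and the file names are made up): resolve the symlink first, then
hardlink the resolved path.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

: >a            # a regular file
ln -s a b       # b is a symlink pointing at a

# Resolve the symlink to its canonical path, then hardlink that path.
resolved=$(realpath b) &&
ln "$resolved" c

# c should be a regular file sharing a's inode, not another symlink.
if [ ! -L c ] && [ c -ef a ]
then
	echo "c is a hardlink to a"
fi
```

Since the path handed to 'ln' no longer names a symlink, the result
should not depend on how the platform's link() treats symlinks.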

Johannes Schindelin

Mar 29, 2019, 11:40:47 AM3/29/19
to Thomas Gummerer, Matheus Tavares, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano
Hi Thomas,

On Thu, 28 Mar 2019, Thomas Gummerer wrote:

> On 03/22, Matheus Tavares wrote:
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 50bde99618..b76f33c635 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > if (unlink(dest->buf) && errno != ENOENT)
> > die_errno(_("failed to unlink '%s'"), dest->buf);
> > if (!option_no_hardlinks) {
> > - if (!link(src->buf, dest->buf))
> > + if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
>
> [...]
>
> I notice that we are currently not using 'linkat()' anywhere else in
> our codebase. It looks like it has been introduced in POSIX.1-2008,
> which sounds fairly recent by git's standards. So I wonder if this is
> really supported on all platforms that git is being built on.

I bet you it isn't.

> I also wonder what would need to be done on Windows if we were to
> introduce this. I see we define the 'link()' function in
> 'compat/mingw.c' for that currently, so I guess something similar
> would be needed for 'linkat()'. I added Dscho to Cc for Windows
> expertise.

Indeed, `linkat()` would have to be implemented in `compat/mingw.c`. It
would be a bit involved because the last parameter of that function
changes behavior noticeably, but the main difficulty (to determine the
path from a file descriptor) should be overcome using
`HANDLE olddirhandle = _get_osfhandle(olddirfd);` and then calling
`GetFinalPathNameByHandleW(olddirhandle, wbuf, sizeof(wbuf));`.

So yes, this is *not* something I'd do lightly.

The bigger problem will be to continue to support older Unices such as
SunOS and AIX. I highly doubt that they have that function. You should
find out, Matheus.

Ciao,
Johannes

Thomas Gummerer

Mar 29, 2019, 3:32:01 PM3/29/19
to Matheus Tavares Bernardino, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen, Junio C Hamano
On 03/29, Matheus Tavares Bernardino wrote:
> On Thu, Mar 28, 2019 at 6:49 PM Thomas Gummerer <t.gum...@gmail.com> wrote:
> > When sending someone else's patch in a slightly modified version, it
> > may also be useful to add which parts you changed, as it was done in
> > e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
> > 2018-10-31) for example.
>
> Thanks, I didn't know about that! I searched the log and didn't see
> many of these on patches with 'Helped-by' tags, is there a particular
> case to use it or not?

Helped-by tags are usually used when you want to give someone credit
for help you got on a patch that you originally authored. It's up to
you at which point of involvement you actually want to add the tag, I
tend to add them whenever someone's input significantly
changes/improves the patch. I think adding it here might be okay,
it's just less common when sending a patch that someone else authored
originally.

> > Iirc, the test that is added in this patch does not work on some
> > platforms, notably MacOS. That would mean that we would break
> > bisectability at this patch on some platforms if we were to introduce
> > it here. Therefore I think it would be better to squash this patch
> > into the next one which fixes these inconsistencies.
> > Note that I can't test this at the moment, so this concern is only
> > based on previous discussions that I remember. If that's already
> > addressed somehow, all the better!
>
> Yes, it is already addressed :) The section of these tests that used
> to break on some platforms is now moved to the next patch which also
> fixes the platform inconsistencies. Now both patches (this and the
> next) work on macOS, NetBSD and GNU/Linux.

Great!

> Also every test and job is
> passing at travis-ci, except for the job named "Documentation"[1]. But,
> it's weird since these patches don't even touch Documentation/... And
> master is failing the same job at my fork as well [2]... Any thoughts
> on that?

Yeah, this error seems to have nothing to do with your patch series.
Since the last run of travis on master [*1*] at least the asciidoc
package doesn't seem to have changed, so from a first look I don't
quite understand what's going on there. In any case, I don't think
you need to worry about that for now, as it hasn't been triggered by
your changes (I won't discourage you from looking at why it is failing
and to try and fix that, but I think your time is probably better
spent looking at this patch series and the proposal for GSoC for
now).

*1*: https://travis-ci.org/git/git/builds/508784487

SZEDER Gábor

Mar 29, 2019, 3:42:06 PM3/29/19
to Thomas Gummerer, Matheus Tavares Bernardino, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen, Junio C Hamano
On Fri, Mar 29, 2019 at 07:31:58PM +0000, Thomas Gummerer wrote:
> > Also every test and job is
> > passing at travis-ci, except for the job named "Documentation"[1]. But,
> > it's weird since these patches don't even touch Documentation/... And
> > master is failing the same job at my fork as well [2]... Any thoughts
> > on that?
>
> Yeah, this error seems to have nothing to do with your patch series.
> Since the last run of travis on master [*1*] at least the asciidoc
> package doesn't seem to have changed, so from a first look I don't
> quite understand what's going on there.

https://public-inbox.org/git/20190329123520.2...@gmail.com/

Thomas Gummerer

Mar 29, 2019, 4:05:20 PM3/29/19
to Matheus Tavares Bernardino, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre, Junio C Hamano, Johannes Schindelin
On 03/29, Matheus Tavares Bernardino wrote:
> On Thu, Mar 28, 2019 at 7:10 PM Thomas Gummerer <t.gum...@gmail.com> wrote:
> > I notice that we are currently not using 'linkat()' anywhere else in
> > our codebase. It looks like it has been introduced in POSIX.1-2008,
> > which sounds fairly recent by git's standards. So I wonder if this is
> > really supported on all platforms that git is being built on.
> >
> > I also wonder what would need to be done on Windows if we were to
> > introduce this. I see we define the 'link()' function in
> > 'compat/mingw.c' for that currently, so I guess something similar
> > would be needed for 'linkat()'. I added Dscho to Cc for Windows
> > expertise.
>
> Ok, what if instead of using linkat() we use 'realpath(const char
> *path, char *resolved_path)', which will resolve any symlinks at
> 'path' and store the canonical path at 'resolved_path'? Then, we can
> still keep using link() but now, with the certainty that all platforms
> will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> Would that be a better idea?

Yeah, I think that is a good idea. Note that 'realpath()' itself is
not used anywhere in our codebase either, but there is
'strbuf_realpath()', that from reading the function documentation does
exactly what 'realpath()' would do. So using 'strbuf_realpath()'
would probably be the right thing to do here.

Thomas Gummerer

Mar 29, 2019, 4:15:44 PM3/29/19
to Ævar Arnfjörð Bjarmason, Matheus Tavares, g...@vger.kernel.org, Christian Couder, Nguyễn Thái Ngọc Duy, kerne...@googlegroups.com, Benoit Pierre, Junio C Hamano, Johannes Schindelin
On 03/29, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Mar 28 2019, Thomas Gummerer wrote:
> > I notice that we are currently not using 'linkat()' anywhere else in
> > our codebase. It looks like it has been introduced in POSIX.1-2008,
> > which sounds fairly recent by git's standards. So I wonder if this is
> > really supported on all platforms that git is being built on.
> >
> > I also wonder what would need to be done on Windows if we were to
> > introduce this. I see we define the 'link()' function in
> > 'compat/mingw.c' for that currently, so I guess something similar
> > would be needed for 'linkat()'. I added Dscho to Cc for Windows
> > expertise.
>
> For better or worse this particular quest started because I pointed out
> (with some WIP patches) that for understanding this change we should
> test whatever we did now, to ensure that the refactoring didn't have
> unintended side-effects.
>
> But that's a separate question from whether or not we want to keep the
> current behavior.
>
> I think the current behavior is clearly insane, so I think we should
> change it with some follow-up patches. In particular options like
> --dissociate should clearly (in my mind at least) have behavior similar
> to "cp -L", and --local should hardlink to the *target* of the symlink,
> if anything, at least for objects/{??,pack,info}

Right, I definitely agree with all of that. Adding tests for the
current behaviour is definitely a good thing if we can do it in a sane
way. And I also agree that the current behaviour is insane, and
should be fixed, but that may not want to be part of this patch
series.

> I think that changes the portability story with linkat(), since it's not
> something we should be planning to keep, just an intermediate step so we
> don't have a gigantic patch that both adds tests, refactors and changes
> the behavior.

Fair enough, but that also means that this patch series necessarily
has to introduce the changes in behaviour as well as switching clone
to use dir-iterator. Of course we could say that the switch-over to
using dir-iterator could be done as a separate patch series, but that
seems a bit too much of a change in scope of this series.

Now I think Matheus has actually found a nice solution to this issue
using 'strbuf_realpath()', which gives us the same behaviour as using
'linkat()' in this patch would give us, so this might not be that big
an issue in the end.

Matheus Tavares Bernardino

Mar 29, 2019, 10:49:54 PM3/29/19
to Thomas Gummerer, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen, Junio C Hamano
On Fri, Mar 29, 2019 at 4:32 PM Thomas Gummerer <t.gum...@gmail.com> wrote:
>
> On 03/29, Matheus Tavares Bernardino wrote:
> > On Thu, Mar 28, 2019 at 6:49 PM Thomas Gummerer <t.gum...@gmail.com> wrote:
> > > When sending someone else's patch in a slightly modified version, it
> > > may also be useful to add which parts you changed, as it was done in
> > > e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
> > > 2018-10-31) for example.
> >
> > Thanks, I didn't know about that! I searched the log and didn't see
> > many of these on patches with 'Helped-by' tags, is there a particular
> > case to use it or not?
>
> Helped-by tags are usually used when you want to give someone credit
> for help you got on a patch that you originally authored. It's up to
> you at which point of involvement you actually want to add the tag, I
> tend to add them whenever someone's input significantly
> changes/improves the patch. I think adding it here might be okay,
> it's just less common when sending a patch that someone else authored
> originally.
>

Ok, got it, thanks!
Ok, thanks again.

Matheus Tavares Bernardino

Mar 30, 2019, 1:32:15 AM3/30/19
to Thomas Gummerer, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre, Junio C Hamano, Johannes Schindelin
Thanks. While I was looking for realpath() at git codebase (before I
saw your email), I got a little confused: Besides strbuf_realpath() I
also found real_path(), real_path_if_valid() and real_pathdup(). All
these last three use strbuf_realpath() but they also initialize the
struct strbuf internally and just return a 'char *', which is much
more convenient in some cases. What seems weird to me is that, whilst
real_pathdup() releases the internally initialized struct strbuf
(leaving just the returned string to be free'd by the user), the other
two don't. So, if struct strbuf changes in the future to have more
dynamically allocated resources, these functions will also have to be
modified. Also, since real_pathdup() can already do what the other two
do, do you know if there is a reason to keep all of them?

One last question: I found some places which don't free the string
returned by, for example, real_path() (e.g., find_worktree() at
worktree.c). Would it be a valid/good patch (or patches) to add free()
calls in these places? (I'm currently trying to get more people here at
USP to contribute to git, and maybe this could be a nice first
contribution for them...)

Thomas Gummerer

Mar 30, 2019, 3:27:41 PM3/30/19
to Matheus Tavares Bernardino, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre, Junio C Hamano, Johannes Schindelin
On 03/30, Matheus Tavares Bernardino wrote:
> On Fri, Mar 29, 2019 at 5:05 PM Thomas Gummerer <t.gum...@gmail.com> wrote:
> >
> > On 03/29, Matheus Tavares Bernardino wrote:
> > > Ok, what if instead of using linkat() we use 'realpath(const char
> > > *path, char *resolved_path)', which will resolve any symlinks at
> > > 'path' and store the canonical path at 'resolved_path'? Then, we can
> > > still keep using link() but now, with the certainty that all platforms
> > > will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> > > Would that be a better idea?
> >
> > Yeah, I think that is a good idea. Note that 'realpath()' itself is
> > not used anywhere in our codebase either, but there is
> > 'strbuf_realpath()', that from reading the function documentation does
> > exactly what 'realpath()' would do. So using 'strbuf_realpath()'
> > would probably be the right thing to do here.
>
> Thanks. While I was looking for realpath() at git codebase (before I
> saw your email), I got a little confused: Besides strbuf_realpath() I
> also found real_path(), real_path_if_valid() and real_pathdup(). All
> these last three use strbuf_realpath() but they also initialize the
> struct strbuf internally and just return a 'char *', which is much
> more convenient in some cases.

Right, feel free to use whichever is most convenient for you, and
whichever works in the context.

> What seems weird to me is that, whilst
> real_pathdup() releases the internally initialized struct strbuf
> (leaving just the returned string to be free'd by the user), the other
> two don't. So, if struct strbuf changes in the future to have more
> dynamically allocated resources, these functions will also have to be
> modified. Also, since real_pathdup() can already do what the other two
> do, do you know if there is a reason to keep all of them?

Right, '*dup()' functions usually leave the return value to be free'd
by the caller. And while 'real_pathdup()' could do what the others do
already it also takes more effort to use it. Users don't need to free
the return value from 'real_path()' to avoid a memory leak. This
alone justifies its existence I think.

> One last question: I found some places which don't free the string
> returned by, for example, real_path() (e.g., find_worktree() at
> worktree.c). Would it be a valid/good patch (or patches) to add free()
> calls in these places? (I'm currently trying to get more people here at
> USP to contribute to git, and maybe this could be a nice first
> contribution for them...)

Trying to plug memory leaks in the codebase is definitely something
that I think is worth doing. Sometimes it's not worth actually
free'ing the memory, for example just before the program exits, in
which case we can use the UNLEAK annotation. It was introduced in
0e5bba53af ("add UNLEAK annotation for reducing leak false positives",
2017-09-08) if you want more background.

That said, the memory from 'real_path()' should actually not be
free'd. The strbuf there has a static lifetime, so it is valid until
git exits. If we were to free the return value of the function we'd
actually free an internal buffer of the strbuf, that is still valid.
So if someone were to use 'real_path()' after that, the memory that
strbuf still thinks it owns would actually have been free'd, which
would result in undefined behaviour, and probably would make git
segfault.

Matheus Tavares

Mar 30, 2019, 6:49:18 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
This patchset contains:
- a replacement of explicit recursive dir iteration at
copy_or_link_directory for the dir-iterator API;
- some refactoring and behaviour changes at local clone, mainly to
take care of symlinks and hidden files at .git/objects; and
- tests for this type of files

Changes since v4:
- Improved and fixed errors at messages from patches 1, 3, 5, 6 and 7.
- At first patch:
- Simplified construction, changing a multi-line cat for an echo.
- Removed unnecessary subshells.
- Disabled gc.auto, just to make sure we don't get any undesired
behaviour for this test
- Removed the first section of a sed command ("s!/..\$!/X!;")
that converts SHA-1s to fixed strings. No SHA-1 seemed to
be changed by this section and neither it seemed to be used
after the command.
- At second patch, removed linkat() usage, which is POSIX.1-2008
and may not be supported on all platforms git is being built on.
Now the same effect is achieved using real_pathdup() + link().
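
For readers unfamiliar with the sed expressions that remain in that test,
here is a small sketch of what they do to a list of object paths. The
input lines are made up (the hex strings are placeholders, not real
object names); the sed script is the one kept in the test.

```shell
#!/bin/sh
# Made-up paths; the hex strings are placeholders, not real object names.
hex38=0123456789abcdef0123456789abcdef012345
hex40=${hex38}67

# - a two-character fan-out directory becomes "Y"
# - a run of 38 or more hex digits becomes "Z"
# - commit-graph and multi-pack-index entries are dropped
normalized=$(
	printf '%s\n' \
		"./ab/$hex38" \
		"./pack/pack-$hex40.idx" \
		"./info/commit-graph" |
	sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
		-e "/multi-pack-index/d"
)
echo "$normalized"
```

With these inputs the loose object path collapses to './Y/Z' and the
pack index to './pack/pack-Z.idx', which is the shape of the lines the
test puts into 'expected-files'.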

v4: https://public-inbox.org/git/20190322232237.13293...@usp.br/

Matheus Tavares (6):
clone: better handle symlinked files at .git/objects/
dir-iterator: add flags parameter to dir_iterator_begin
clone: copy hidden paths at local clone
clone: extract function from copy_or_link_directory
clone: use dir-iterator to avoid explicit dir traversal
clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
clone: test for our behavior on odd objects/* content

builtin/clone.c | 75 ++++++++++++---------
dir-iterator.c | 28 +++++++-
dir-iterator.h | 39 +++++++++--
refs/files-backend.c | 2 +-
t/t5604-clone-reference.sh | 133 +++++++++++++++++++++++++++++++++++++
5 files changed, 235 insertions(+), 42 deletions(-)

--
2.20.1

Matheus Tavares

Mar 30, 2019, 6:49:21 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Alex Riesen
From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files at .git/objects directory, such as symlinks to other
dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625...@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..207650cb95 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,115 @@ test_expect_success 'clone, dissociate from alternates' '
( cd C && git fsck )
'

+test_expect_success 'setup repo with garbage in objects/*' '
+ git init S &&
+ (
+ cd S &&
+ test_commit A &&
+
+ cd .git/objects &&
+ >.some-hidden-file &&
+ >some-file &&
+ mkdir .some-hidden-dir &&
+ >.some-hidden-dir/some-file &&
+ >.some-hidden-dir/.some-dot-file &&
+ mkdir some-dir &&
+ >some-dir/some-file &&
+ >some-dir/.some-dot-file
+ )
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ git clone $option S S$option || return 1 &&
+ git -C S$option fsck || return 1
+ done &&
+ find S-* -name "*some*" | sort >actual &&
+ cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-file
+ S--dissociate/.git/objects/some-dir
+ S--dissociate/.git/objects/some-dir/.some-dot-file
+ S--dissociate/.git/objects/some-dir/some-file
+ S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-file
+ S--local/.git/objects/some-dir
+ S--local/.git/objects/some-dir/.some-dot-file
+ S--local/.git/objects/some-dir/some-file
+ S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-file
+ S--no-hardlinks/.git/objects/some-dir
+ S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/some-dir/some-file
+ S--no-hardlinks/.git/objects/some-file
+ EOF
+ test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+ git init T &&
+ (
+ cd T &&
+ git config gc.auto 0 &&
+ test_commit A &&
+ git gc &&
+ test_commit B &&
+
+ cd .git/objects &&
+ mv pack packs &&
+ ln -s packs pack &&
+ find ?? -type d >loose-dirs &&
+ last_loose=$(tail -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+ mv $last_loose a-loose-dir &&
+ ln -s a-loose-dir $last_loose &&
+ find . -type f | sort >../../../T.objects-files.raw &&
+ echo unknown_content> unknown_file
+ ) &&
+ git -C T fsck &&
+ git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ git clone $option T T$option || return 1 &&
+ git -C T$option fsck || return 1 &&
+ git -C T$option rev-list --all --objects >T$option.objects &&
+ test_cmp T.objects T$option.objects &&
+ (
+ cd T$option/.git/objects &&
+ find . -type f | sort >../../../T$option.objects-files.raw
+ )
+ done &&
+
+ for raw in $(ls T*.raw)
+ do
+ sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
+ -e "/multi-pack-index/d" <$raw >$raw.de-sha || return 1
+ done &&
+
+ cat >expected-files <<-EOF &&
+ ./Y/Z
+ ./Y/Z
+ ./a-loose-dir/Z
+ ./Y/Z
+ ./info/packs
+ ./pack/pack-Z.idx
+ ./pack/pack-Z.pack
+ ./packs/pack-Z.idx
+ ./packs/pack-Z.pack
+ ./unknown_file
+ EOF
+
+ for option in --local --dissociate --no-hardlinks
+ do
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ done &&
+
+ echo ./info/alternates >expected-files &&

Matheus Tavares

Mar 30, 2019, 6:49:24 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: using --no-hardlinks all symlinks are
dereferenced but without it, Git will try to hardlink the files with the
link() function, which has an OS-specific behaviour on symlinks. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink
whilst on GNU/Linux, it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
$ touch a
$ ln -s a b
$ link b c
$ ls -li a b c
155 [...] a
156 [...] b -> a
156 [...] c -> a

But on NetBSD:
$ ls -li a b c
2609160 [...] a
2609164 [...] b -> a
2609160 [...] c

It's not good to have the result of a local clone be OS-dependent and
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.
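
The dereference-then-link idea described above can be tried outside of
Git with plain POSIX calls. The following is only a sketch under
assumptions: link_dereferenced() and demo_link_dereferenced() are
hypothetical names, and realpath(3) stands in for Git's real_pathdup(),
which is what the patch actually uses.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git's code): hard-link `dest` to the file
 * that `src` resolves to, so the result does not depend on how the
 * platform's link() treats a symlink. Returns 0 on success, -1 on error.
 */
int link_dereferenced(const char *src, const char *dest)
{
	char *resolved = realpath(src, NULL); /* allocated by realpath() */
	int status;

	if (!resolved)
		return -1;
	status = link(resolved, dest);
	free(resolved);
	return status;
}

/*
 * Demo: in a fresh temporary directory, create file "a" and symlink
 * "b" -> "a", then create "c" through the helper. Returns 1 if "c" is
 * a regular file sharing a's inode, 0 if not, -1 if the setup failed.
 */
int demo_link_dereferenced(void)
{
	char dir[] = "/tmp/link-demo-XXXXXX";
	char a[64], b[64], c[64];
	struct stat sa, sc;
	int fd, ok;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/a", dir);
	snprintf(b, sizeof(b), "%s/b", dir);
	snprintf(c, sizeof(c), "%s/c", dir);

	fd = open(a, O_CREAT | O_WRONLY, 0600);
	if (fd < 0 || close(fd) || symlink("a", b))
		return -1;
	if (link_dereferenced(b, c) || lstat(a, &sa) || lstat(c, &sc))
		return -1;
	ok = S_ISREG(sc.st_mode) && sa.st_ino == sc.st_ino;

	/* Best-effort cleanup of the temporary files */
	unlink(c);
	unlink(b);
	unlink(a);
	rmdir(dir);
	return ok;
}
```

With this approach "c" should end up as a hardlink to "a" on any
platform, which matches the NetBSD listing above rather than the
GNU/Linux one.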

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 5 ++++-
t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..f975b509f1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,10 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (unlink(dest->buf) && errno != ENOENT)
die_errno(_("failed to unlink '%s'"), dest->buf);
if (!option_no_hardlinks) {
- if (!link(src->buf, dest->buf))
+ char *resolved_path = real_pathdup(src->buf, 1);
+ int status = link(resolved_path, dest->buf);
+ free(resolved_path);
+ if (!status)
continue;
if (option_local > 0)
die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
test_cmp expected actual
'

-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
git init T &&
(
cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
ln -s packs pack &&
find ?? -type d >loose-dirs &&
last_loose=$(tail -n 1 loose-dirs) &&
- rm -f loose-dirs &&
mv $last_loose a-loose-dir &&
ln -s a-loose-dir $last_loose &&
+ first_loose=$(head -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+
+ cd $first_loose &&
+ obj=$(ls *) &&
+ mv $obj ../an-object &&
+ ln -s ../an-object $obj &&
+
+ cd ../ &&
find . -type f | sort >../../../T.objects-files.raw &&
+ find . -type l | sort >../../../T.objects-symlinks.raw &&
echo unknown_content> unknown_file
) &&
git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
'


-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
for option in --local --no-hardlinks --shared --dissociate
do
git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
test_cmp T.objects T$option.objects &&
(
cd T$option/.git/objects &&
- find . -type f | sort >../../../T$option.objects-files.raw
+ find . -type f | sort >../../../T$option.objects-files.raw &&
+ find . -type l | sort >../../../T$option.objects-symlinks.raw
)
done &&

@@ -314,6 +324,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
./Y/Z
./Y/Z
./a-loose-dir/Z
+ ./an-object
./Y/Z
./info/packs
./pack/pack-Z.idx
@@ -323,13 +334,15 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
./unknown_file
EOF

- for option in --local --dissociate --no-hardlinks
+ for option in --local --no-hardlinks --dissociate
do
- test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+ test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
done &&

echo ./info/alternates >expected-files &&
- test_cmp expected-files T--shared.objects-files.raw
+ test_cmp expected-files T--shared.objects-files.raw &&
+ test_must_be_empty T--shared.objects-symlinks.raw
'

test_done
--
2.20.1

Matheus Tavares

Mar 30, 2019, 6:49:28 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Michael Haggerty, Ramsay Jones
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort immediately in the case of an error while
trying to fetch next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include its
contents in the iteration. These new flags will be used in a subsequent
patch.

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.
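
As a rough illustration of what DIR_ITERATOR_FOLLOW_SYMLINKS changes,
the sketch below picks between stat() and lstat() depending on a flag,
in the same way the patch does inside dir_iterator_advance(). The two
flag names are copied from the patch; stat_entry() and
classify_symlink_to_dir() are hypothetical names used only here.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Flag names mirror the patch; everything else here is illustrative. */
#define DIR_ITERATOR_PEDANTIC (1 << 0)
#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)

/* stat() the entry, following symlinks only when the flag asks for it */
int stat_entry(const char *path, unsigned flags, struct stat *st)
{
	if (flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Demo: create a directory "d" and a symlink "l" -> "d", then report
 * how the symlink is classified. Returns 'd' (directory), 'l'
 * (symlink), or '?' for any other type or on failure.
 */
char classify_symlink_to_dir(unsigned flags)
{
	char base[] = "/tmp/iter-demo-XXXXXX";
	char d[64], l[64];
	struct stat st;
	char kind = '?';

	if (!mkdtemp(base))
		return kind;
	snprintf(d, sizeof(d), "%s/d", base);
	snprintf(l, sizeof(l), "%s/l", base);
	if (!mkdir(d, 0700) && !symlink("d", l) &&
	    !stat_entry(l, flags, &st)) {
		if (S_ISDIR(st.st_mode))
			kind = 'd';
		else if (S_ISLNK(st.st_mode))
			kind = 'l';
	}

	/* Best-effort cleanup of the temporary entries */
	unlink(l);
	rmdir(d);
	rmdir(base);
	return kind;
}
```

Only when the entry is reported as a directory would an iterator
descend into it, which is why the flag decides whether the contents of
symlinked directories take part in the iteration.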

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 28 +++++++++++++++++++++++++---
dir-iterator.h | 39 +++++++++++++++++++++++++++++++++------
refs/files-backend.c | 2 +-
3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@ struct dir_iterator_int {
* that will be included in this iteration.
*/
struct dir_iterator_level *levels;
+
+ /* Combination of flags for this dir-iterator */
+ unsigned flags;
};

int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;
+ int ret;

while (1) {
struct dir_iterator_level *level =
@@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)

level->dir = opendir(iter->base.path.buf);
if (!level->dir && errno != ENOENT) {
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
warning("error opening directory %s: %s",
iter->base.path.buf, strerror(errno));
/* Popping the level is handled below */
@@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (!de) {
/* This level is exhausted; pop up a level. */
if (errno) {
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
warning("error reading directory %s: %s",
iter->base.path.buf, strerror(errno));
} else if (closedir(level->dir))
@@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
continue;

strbuf_addstr(&iter->base.path, de->d_name);
- if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
- if (errno != ENOENT)
+
+ if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+ ret = stat(iter->base.path.buf, &iter->base.st);
+ else
+ ret = lstat(iter->base.path.buf, &iter->base.st);
+
+ if (ret < 0) {
+ if (errno != ENOENT) {
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
warning("error reading path '%s': %s",
iter->base.path.buf,
strerror(errno));
+ }
continue;
}

@@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
return ITER_OK;
}
}
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ return ITER_ERROR;
}

int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
}

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);

+ iter->flags = flags;
iter->levels_nr = 1;
iter->levels[0].initialized = 0;

diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..93646c3bea 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -19,7 +19,7 @@
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
+ * struct iterator *iter = dir_iterator_begin(path, 0);
*
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
@@ -40,6 +40,20 @@
* dir_iterator_advance() again.
*/

+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ * in case of an error while trying to fetch the next entry, which is
+ * to emit a warning and keep going. With this flag, resources are
+ * freed and ITER_ERROR is returned immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ * directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
struct dir_iterator {
/* The current path: */
struct strbuf path;
@@ -54,20 +68,28 @@ struct dir_iterator {
/* The current basename: */
const char *basename;

- /* The result of calling lstat() on path: */
+ /*
+ * The result of calling lstat() on path or stat(), if the
+ * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+ * dir_iterator's initialization.
+ */
struct stat st;
};

/*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
*
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ * - path is the starting directory. An internal copy will be made.
+ * - flags is a combination of the possible flags to initialize a
+ * dir-iterator or 0 for default behaviour.
*/
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);

/*
* Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +98,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
* dir_iterator and associated resources and return ITER_ERROR. It is
* a bug to use iterator or call this function again after it has
* returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir-iterator will return ITER_ERROR when failing
+ * to fetch the next entry or just emit a warning and try to fetch the
+ * next is defined by the 'pedantic' option at dir-iterator's
+ * initialization.
*/
int dir_iterator_advance(struct dir_iterator *iterator);

diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..2ce9783097 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,

base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
strbuf_addf(&sb, "%s/logs", gitdir);
- iter->dir_iterator = dir_iterator_begin(sb.buf);
+ iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
iter->ref_store = ref_store;
strbuf_release(&sb);

--
2.20.1

Matheus Tavares

Mar 30, 2019, 6:49:31 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is an odd behaviour.
This is probably unintentional: the intention was likely to skip only
'.' and '..', but the check ended up skipping all directories whose
names start with '.'. Besides being more natural, the new behaviour is
more permissive to the user.

Also adjust tests to reflect this behaviour change.
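
The difference between the two checks can be seen in isolation. In the
sketch below, skipped_by_old_check() restates the old
"de->d_name[0] != '.'" condition, and skipped_by_new_check() is a
stand-in written for this example that has the same intent as Git's
is_dot_or_dotdot(); both function names are hypothetical.

```c
#include <string.h>

/* Old check: skips every directory whose name starts with '.' */
int skipped_by_old_check(const char *name)
{
	return name[0] == '.';
}

/* New check (stand-in for is_dot_or_dotdot()): skips only "." and ".." */
int skipped_by_new_check(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}
```

A directory such as ".some-hidden-dir" is skipped by the old check but
not by the new one, while "." and ".." are skipped by both.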

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f975b509f1..81e1a39c61 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
continue;
}
if (S_ISDIR(buf.st_mode)) {
- if (de->d_name[0] != '.')
+ if (!is_dot_or_dotdot(de->d_name))
copy_or_link_directory(src, dest,
src_repo, src_baselen);
continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 0800c3853f..c3998f2f9e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
done &&
find S-* -name "*some*" | sort >actual &&
cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-dir
+ S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+ S--dissociate/.git/objects/.some-hidden-dir/some-file
S--dissociate/.git/objects/.some-hidden-file
S--dissociate/.git/objects/some-dir
S--dissociate/.git/objects/some-dir/.some-dot-file
S--dissociate/.git/objects/some-dir/some-file
S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-dir
+ S--local/.git/objects/.some-hidden-dir/.some-dot-file
+ S--local/.git/objects/.some-hidden-dir/some-file
S--local/.git/objects/.some-hidden-file
S--local/.git/objects/some-dir
S--local/.git/objects/some-dir/.some-dot-file
S--local/.git/objects/some-dir/some-file
S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir
+ S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
S--no-hardlinks/.git/objects/.some-hidden-file
S--no-hardlinks/.git/objects/some-dir
S--no-hardlinks/.git/objects/some-dir/.some-dot-file
--
2.20.1

Matheus Tavares

Mar 30, 2019, 6:49:34 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
Extract dir creation code snippet from copy_or_link_directory to its own
function named mkdir_if_missing. This change will help to remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 81e1a39c61..f348eb02d4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,21 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
fclose(in);
}

+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+ struct stat st;
+
+ if (!mkdir(pathname, mode))
+ return;
+
+ if (errno != EEXIST)
+ die_errno(_("failed to create directory '%s'"), pathname);
+ else if (stat(pathname, &st))
+ die_errno(_("failed to stat '%s'"), pathname);
+ else if (!S_ISDIR(st.st_mode))
+ die(_("%s exists and is not a directory"), pathname);
+}
+
static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
const char *src_repo, int src_baselen)
{
@@ -404,14 +419,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (!dir)
die_errno(_("failed to open '%s'"), src->buf);

- if (mkdir(dest->buf, 0777)) {
- if (errno != EEXIST)
- die_errno(_("failed to create directory '%s'"), dest->buf);
- else if (stat(dest->buf, &buf))
- die_errno(_("failed to stat '%s'"), dest->buf);
- else if (!S_ISDIR(buf.st_mode))
- die(_("%s exists and is not a directory"), dest->buf);
- }
+ mkdir_if_missing(dest->buf, 0777);

strbuf_addch(src, '/');
src_len = src->len;
--
2.20.1

Matheus Tavares

Mar 30, 2019, 6:49:37 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
Replace usage of opendir/readdir/closedir API to traverse directories
recursively, at copy_or_link_directory function, by the dir-iterator
API. This simplifies the code and avoids recursive calls to
copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
error on readdir or stat, inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
though the .git/objects copy didn't fully succeed.
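
To show the general shape of a traversal without recursive calls, here
is a minimal standalone sketch. It is not the dir-iterator
implementation: count_files_iteratively() is a hypothetical function
that keeps pending directories on an explicit stack, does not follow
symlinks (to stay clear of cycle handling), and gives up on the first
opendir/readdir/lstat error, loosely in the spirit of the pedantic
behaviour described above.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count regular files below `root` without recursion: directories that
 * still have to be visited wait on an explicit stack.
 * Returns the number of regular files, or -1 on any error.
 */
long count_files_iteratively(const char *root)
{
	char **stack = NULL, **grown, *path, *dir = strdup(root);
	size_t nr = 0, alloc = 0, len;
	long count = 0;
	struct dirent *de;
	struct stat st;
	DIR *d = NULL;

	if (!dir)
		return -1;
	while (dir) {
		if (!(d = opendir(dir)))
			goto error;
		/* errno is reset so a NULL from readdir() can be told apart */
		while (errno = 0, (de = readdir(d)) != NULL) {
			if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
				continue;
			len = strlen(dir) + strlen(de->d_name) + 2;
			if (!(path = malloc(len)))
				goto error;
			snprintf(path, len, "%s/%s", dir, de->d_name);
			if (lstat(path, &st)) {
				free(path);
				goto error;
			}
			if (!S_ISDIR(st.st_mode)) {
				count += S_ISREG(st.st_mode) ? 1 : 0;
				free(path);
				continue;
			}
			if (nr == alloc) { /* grow the stack of pending dirs */
				alloc = alloc ? 2 * alloc : 8;
				grown = realloc(stack, alloc * sizeof(*stack));
				if (!grown) {
					free(path);
					goto error;
				}
				stack = grown;
			}
			stack[nr++] = path; /* visit this directory later */
		}
		if (errno)
			goto error;
		closedir(d);
		d = NULL;
		free(dir);
		dir = nr ? stack[--nr] : NULL;
	}
	free(stack);
	return count;

error:
	if (d)
		closedir(d);
	free(dir);
	while (nr)
		free(stack[--nr]);
	free(stack);
	return -1;
}
```

The caller sees a single loop and a single error path, which is the
same simplification the patch gets from dir_iterator_advance().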

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 44 ++++++++++++++++++++++----------------------
1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f348eb02d4..ebe8d83334 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
#include "transport.h"
#include "strbuf.h"
#include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
#include "sigchain.h"
#include "branch.h"
#include "remote.h"
@@ -408,42 +410,36 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
}

static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
- const char *src_repo, int src_baselen)
+ const char *src_repo)
{
- struct dirent *de;
- struct stat buf;
int src_len, dest_len;
- DIR *dir;
-
- dir = opendir(src->buf);
- if (!dir)
- die_errno(_("failed to open '%s'"), src->buf);
+ struct dir_iterator *iter;
+ int iter_status;
+ unsigned flags;

mkdir_if_missing(dest->buf, 0777);

+ flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+ iter = dir_iterator_begin(src->buf, flags);
+
strbuf_addch(src, '/');
src_len = src->len;
strbuf_addch(dest, '/');
dest_len = dest->len;

- while ((de = readdir(dir)) != NULL) {
+ while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
strbuf_setlen(src, src_len);
- strbuf_addstr(src, de->d_name);
+ strbuf_addstr(src, iter->relative_path);
strbuf_setlen(dest, dest_len);
- strbuf_addstr(dest, de->d_name);
- if (stat(src->buf, &buf)) {
- warning (_("failed to stat %s\n"), src->buf);
- continue;
- }
- if (S_ISDIR(buf.st_mode)) {
- if (!is_dot_or_dotdot(de->d_name))
- copy_or_link_directory(src, dest,
- src_repo, src_baselen);
+ strbuf_addstr(dest, iter->relative_path);
+
+ if (S_ISDIR(iter->st.st_mode)) {
+ mkdir_if_missing(dest->buf, 0777);
continue;
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+ if (!strcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, dest, src_repo);
continue;
}
@@ -463,7 +459,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (copy_file_with_time(dest->buf, src->buf, 0666))
die_errno(_("failed to copy file to '%s'"), dest->buf);
}
- closedir(dir);
+
+ if (iter_status != ITER_DONE) {
+ strbuf_setlen(src, src_len);
+ die(_("failed to iterate over '%s'"), src->buf);
+ }
}

static void clone_local(const char *src_repo, const char *dest_repo)
@@ -481,7 +481,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
get_common_dir(&dest, dest_repo);
strbuf_addstr(&src, "/objects");
strbuf_addstr(&dest, "/objects");
- copy_or_link_directory(&src, &dest, src_repo, src.len);
+ copy_or_link_directory(&src, &dest, src_repo);
strbuf_release(&src);
strbuf_release(&dest);
}
--
2.20.1

Matheus Tavares

Mar 30, 2019, 6:49:40 PM3/30/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
Replace the use of strcmp by fspathcmp at copy_or_link_directory, which
is more permissive/friendly to case-insensitive file systems.
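
The effect of the switch can be sketched with a small stand-in. The
function path_cmp() below is hypothetical and takes the
case-insensitivity setting as a plain parameter; how Git determines
that setting for fspathcmp() is outside the scope of this example.

```c
#include <string.h>
#include <strings.h>

/*
 * Stand-in for the idea behind fspathcmp(): compare two paths the way
 * the file system would. Returns 0 when the paths are considered equal.
 */
int path_cmp(const char *a, const char *b, int ignore_case)
{
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

With ignore_case set, "INFO/Alternates" compares equal to
"info/alternates", so the special handling of the alternates file would
still apply on a file system that does not distinguish the two
spellings.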

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Suggested-by: Nguyễn Thái Ngọc Duy <pcl...@gmail.com>
---
builtin/clone.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index ebe8d83334..bf56a01638 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -439,7 +439,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(iter->relative_path, "info/alternates")) {
+ if (!fspathcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, dest, src_repo);
continue;
}
--
2.20.1

Thomas Gummerer

Mar 31, 2019, 1:40:41 PM3/31/19
to Matheus Tavares, Junio C Hamano, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
Is there any reason why we can't use 'real_path()' here? As I
mentioned in [*1*], 'real_path()' doesn't require the callers to free
any memory, so the above could become much simpler, and could just be

+ if (!link(real_path(src->buf), dest->buf))

*1*: <20190330192...@hank.intra.tgummerer.com>
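
The ownership difference between the two styles of helper can be
sketched with realpath(3). This is an analogy only: resolve_static()
and resolve_dup() are hypothetical names, and the static char array
merely plays the role that a static buffer plays inside real_path().

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical analogue of a real_path()-style helper: the result
 * lives in a static buffer, so callers must not free it, and it is
 * overwritten by the next call. Returns NULL if resolution fails.
 */
const char *resolve_static(const char *path)
{
	static char buf[PATH_MAX];

	return realpath(path, buf);
}

/*
 * Hypothetical analogue of a real_pathdup()-style helper: the caller
 * owns the returned string and must free() it.
 */
char *resolve_dup(const char *path)
{
	return realpath(path, NULL);
}
```

The static variant is convenient for a one-shot use such as passing the
result straight to link(), but the pointer must not be kept across
another call to the same helper.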

Thomas Gummerer

Mar 31, 2019, 2:12:14 PM3/31/19
to Matheus Tavares, Junio C Hamano, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Michael Haggerty, Ramsay Jones
On 03/30, Matheus Tavares wrote:
Minor nit: I'd define this variable closer to where it is actually
used, inside the second 'while(1)' loop in this function. That would
make it clearer that it's only used there and not in other places in
the function as well, which I had first expected when I read this.

Outside of this context, we already mention error handling when
'ok != ITER_DONE' in this example. This still can't happen with the
way the dir iterator is used here, but it serves as a reminder if
people are using the DIR_ITERATOR_PEDANTIC flag. Good.

I feel like at this point we are repeating documentation that already
exists for the flags. Should we ever find a reason to return
ITER_ERROR without the pedantic flag, this comment is likely to become
out of date. I think not adding this note is probably better in this
case.

Thomas Gummerer

Mar 31, 2019, 2:16:38 PM3/31/19
to Matheus Tavares, Junio C Hamano, g...@vger.kernel.org, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
On 03/30, Matheus Tavares wrote:
> This patchset contains:
> - a replacement of explicit recursive dir iteration at
> copy_or_link_directory for the dir-iterator API;
> - some refactoring and behaviour changes at local clone, mainly to
> take care of symlinks and hidden files at .git/objects; and
> - tests for this type of files

Thanks. I read through the series, and only found a few minor nits.

One note on the cover letter, as I'm not sure I mentioned this before.
But as the series progresses and there are fewer changes in individual
patches, it is useful to include a 'range-diff', so reviewers can
quickly see what changed in the series. This is especially useful if
they can still remember the last iteration, so they don't necessarily
have to re-read the whole series.

This can be added using the '--range-diff' option in 'git
format-patch'.

> Changes since v4:
> - Improved and fixed errors at messages from patches 1, 3, 5, 6 and 7.
> - At first patch:
> - Simplified construction, changing a multi-line cat for an echo.
> - Removed unnecessary subshells.
> - Disabled gc.auto, just to make sure we don't get any undesired
> behaviour for this test
> - Removed the first section of a sed command ("s!/..\$!/X!;")
> that converts SHA-1s to fixed strings. No SHA-1 seemed to
> be changed by this section and neither it seemed to be used
> after the command.
> - At second patch, removed linkat() usage, which is POSIX.1-2008
> and may not be supported in all platforms git is being built.
> Now the same effect is achieved using real_pathdup() + link().
>
> v4: https://public-inbox.org/git/20190322232237.13293...@usp.br/
>
> Matheus Tavares (6):
> clone: better handle symlinked files at .git/objects/
> dir-iterator: add flags parameter to dir_iterator_begin
> clone: copy hidden paths at local clone
> clone: extract function from copy_or_link_directory
> clone: use dir-iterator to avoid explicit dir traversal
> clone: replace strcmp by fspathcmp
>
> Ævar Arnfjörð Bjarmason (1):

Matheus Tavares Bernardino

Mar 31, 2019, 11:57:04 PM3/31/19
to Thomas Gummerer, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre, Junio C Hamano, Johannes Schindelin
Thanks for the great explanation, Thomas. I hadn't noticed that the
strbuf variable inside real_path() is declared as static. I also took
some time, now, to better understand how strbuf functions deal with
the buf attribute (especially how it's realloc'ed) and now I think I
understand it better. Thanks again for the help!


Matheus Tavares Bernardino

Mar 31, 2019, 11:59:48 PM3/31/19
to Thomas Gummerer, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP
Yes, you are right. I will change this! I sent this v5 before
carefully reading your previous email and studying the strbuf functions
and real_path(). Now that I have done that, I see that real_path() is
the best option here. Thanks!

Matheus Tavares Bernardino

Apr 1, 2019, 9:56:26 AM4/1/19
to Thomas Gummerer, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP
On Sun, Mar 31, 2019 at 3:16 PM Thomas Gummerer <t.gum...@gmail.com> wrote:
>
> On 03/30, Matheus Tavares wrote:
> > This patchset contains:
> > - a replacement of explicit recursive dir iteration at
> > copy_or_link_directory for the dir-iterator API;
> > - some refactoring and behaviour changes at local clone, mainly to
> > take care of symlinks and hidden files at .git/objects; and
> > - tests for this type of files
>
> Thanks. I read through the series, and only found a few minor nits.
>
> One note on the cover letter, as I'm not sure I mentioned this before.
> But as the series progresses and there are fewer changes in individual
> patches, it is useful to include a 'range-diff', so reviewers can
> quickly see what changed in the series. This is especially useful if
> they can still remember the last iteration, so they don't necessarily
> have to re-read the whole series.
>
> This can be added using the '--range-diff' option in 'git
> format-patch'.

Thanks! I think you've said it earlier, but I forgot to use it. I will
include it in v6! Thanks for reminding me about it.

Matheus Tavares Bernardino

Apr 10, 2019, 4:25:04 PM4/10/19
to Thomas Gummerer, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones
Hi, Thomas

Sorry for the late reply, but now that I submitted my GSoC proposal I
can finally come back to this series.
Right, thanks.

> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 970793d07a..93646c3bea 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -19,7 +19,7 @@
> > * A typical iteration looks like this:
> > *
> > * int ok;
> > - * struct iterator *iter = dir_iterator_begin(path);
> > + * struct iterator *iter = dir_iterator_begin(path, 0);
>
> Outside of this context, we already mention error handling when
> 'ok != ITER_DONE' in this example. This still can't happen with the
> way the dir iterator is used here, but it serves as a reminder if
> people are using the DIR_ITERATOR_PEDANTIC flag. Good.

This made me think again about the documentation saying that
dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
while the implementation never actually does so (except when the
pedantic flag is used). Maybe the idea was to make API users implement
the check for ITER_ERROR in case dir-iterator needs to start returning
it in the future.

But do you think such a change in dir-iterator is likely to happen?
Maybe we could just make dir_iterator_abort() be void and remove this
section from documentation. Then, for dir_iterator_advance() users
would only need to check for ITER_ERROR if the pedantic flag was given
at dir-iterator creation...

Also CC-ed Michael in case he has some input

Thomas Gummerer

Apr 11, 2019, 5:09:33 PM4/11/19
to Matheus Tavares Bernardino, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones
On 04/10, Matheus Tavares Bernardino wrote:
> > > diff --git a/dir-iterator.h b/dir-iterator.h
> > > index 970793d07a..93646c3bea 100644
> > > --- a/dir-iterator.h
> > > +++ b/dir-iterator.h
> > > @@ -19,7 +19,7 @@
> > > * A typical iteration looks like this:
> > > *
> > > * int ok;
> > > - * struct iterator *iter = dir_iterator_begin(path);
> > > + * struct iterator *iter = dir_iterator_begin(path, 0);
> >
> > Outside of this context, we already mention error handling when
> > 'ok != ITER_DONE' in this example. This still can't happen with the
> > way the dir iterator is used here, but it serves as a reminder if
> > people are using the DIR_ITERATOR_PEDANTIC flag. Good.
>
> This made me think again about the documentation saying that
> dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
> while the implementation never actually does so (except when the
> pedantic flag is used). Maybe the idea was to make API users implement
> the check for ITER_ERROR in case dir-iterator needs to start returning
> it in the future.

Yeah, I think that was the intention.

> But do you think such a change in dir-iterator is likely to happen?
> Maybe we could just make dir_iterator_abort() be void and remove this
> section from documentation. Then, for dir_iterator_advance() users
> would only need to check for ITER_ERROR if the pedantic flag was given
> at dir-iterator creation...

Dunno. In a world where we have the pedantic flag, I think only
returning ITER_ERROR if that flag is given might be what we want to
do. I can't think of a reason why we would want to return ITER_ERROR
without the pedantic flag in that case.

Though I think I would change the example the other way in that case,
and pass DIR_ITERATOR_PEDANTIC to 'dir_iterator_begin()', as it would
be easy to forget error handling otherwise, even when it is
necessary. I'd rather err on the side of showing too much error
handling, than having people forget it and having users run into some
odd edge cases in the wild that the tests don't cover.
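
To make the contract discussed above concrete (ITER_ERROR only with the pedantic flag, and callers always checking the final status), here is a minimal standalone sketch. It does not use the real dir-iterator API: toy_iter, toy_iter_advance(), toy_count_entries() and the TOY_* constants are hypothetical names, and the numeric status values are assumptions of the sketch.

```c
#include <stddef.h>

/* Status codes named after the ones in the thread; values are assumed. */
enum { TOY_ITER_OK = 0, TOY_ITER_DONE = -1, TOY_ITER_ERROR = -2 };

/* Hypothetical flag standing in for DIR_ITERATOR_PEDANTIC. */
#define TOY_ITER_PEDANTIC (1u << 0)

struct toy_iter {
	const int *readable; /* readable[i] == 0 simulates a failed lstat() */
	size_t nr;           /* number of entries */
	size_t pos;          /* next entry to look at */
	unsigned int flags;
};

/*
 * Advance to the next usable entry. A bad entry is reported as
 * TOY_ITER_ERROR only when the pedantic flag is set; otherwise it is
 * silently skipped.
 */
int toy_iter_advance(struct toy_iter *it)
{
	while (it->pos < it->nr) {
		size_t i = it->pos++;

		if (!it->readable[i]) {
			if (it->flags & TOY_ITER_PEDANTIC)
				return TOY_ITER_ERROR;
			continue;
		}
		return TOY_ITER_OK;
	}
	return TOY_ITER_DONE;
}

/* Typical caller loop: returns the number of entries seen, or -1 on error. */
int toy_count_entries(const int *readable, size_t nr, unsigned int flags)
{
	struct toy_iter it = { readable, nr, 0, flags };
	int ok, visited = 0;

	while ((ok = toy_iter_advance(&it)) == TOY_ITER_OK)
		visited++;

	if (ok != TOY_ITER_DONE)
		return -1; /* error handling goes here */
	return visited;
}
```

The point of the caller loop is that the 'ok != TOY_ITER_DONE' check costs nothing when the flag is unset, and is already in place when someone later turns the flag on.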

Matheus Tavares Bernardino

Apr 23, 2019, 1:07:38 PM4/23/19
to Thomas Gummerer, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones
Ok. I began doing the change, but got stuck in a specific decision.
What I was trying to do is:

1) Make dir_iterator_advance() return ITER_ERROR only when the
pedantic flag is given;
2) Make dir_iterator_abort() be void.

The first change is trivial. But the second is not so easy: Since the
[only] current API user defines other iterators on top of
dir-iterator, it would require somewhat big surgery on refs/* to make
this change. Should I proceed and make the changes at refs/* or should
I keep dir_iterator_abort() returning int, although it can never fail?

There's also a third option: The only operation that may fail during
dir_iterator_abort() is closedir(). But even on
dir_iterator_advance(), I'm treating this error as "non-fatal" in the
sense that it's not caught by the pedantic flag (although a warning is
emitted). I did it like this because it doesn't seem like a major
error during dir iteration... But I could change this and make
DIR_ITERATOR_PEDANTIC return ITER_ERROR upon closedir() errors for
both dir-iterator advance() and abort() functions. What do you think?
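
For reference, the third option could look roughly like the sketch below. This is only an illustration of the idea, not code from the series: toy_close_level() and TOY_CLOSE_PEDANTIC are hypothetical names, and the warning is written with plain fprintf() instead of git's warning helpers.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag standing in for DIR_ITERATOR_PEDANTIC. */
#define TOY_CLOSE_PEDANTIC (1u << 0)

/*
 * Close a directory handle. A closedir() failure always produces a
 * warning, but it is reported to the caller as an error (-1) only
 * when the pedantic flag is set. A NULL handle is a no-op.
 */
int toy_close_level(DIR *dir, unsigned int flags)
{
	if (!dir)
		return 0;

	if (closedir(dir)) {
		fprintf(stderr, "warning: error closing directory: %s\n",
			strerror(errno));
		return (flags & TOY_CLOSE_PEDANTIC) ? -1 : 0;
	}
	return 0;
}
```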

> Though I think I would change the example the other way in that case,
> and pass DIR_ITERATOR_PEDANTIC to 'dir_iterator_begin()', as it would
> be easy to forget error handling otherwise, even when it is
> necessary. I'd rather err on the side of showing too much error
> handling, than having people forget it and having users run into some
> odd edge cases in the wild that the tests don't cover.

Yes, I agree.

Thomas Gummerer

Apr 24, 2019, 2:36:26 PM4/24/19
to Matheus Tavares Bernardino, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones
Maybe I'm missing something, but wouldn't this change in refs.c be
enough? (Other than actually making dir_iterator_abort not return
anything)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5848f32ef8..81863c3ee0 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2125,13 +2125,12 @@ static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
{
struct files_reflog_iterator *iter =
(struct files_reflog_iterator *)ref_iterator;
- int ok = ITER_DONE;

if (iter->dir_iterator)
- ok = dir_iterator_abort(iter->dir_iterator);
+ dir_iterator_abort(iter->dir_iterator);

base_ref_iterator_free(ref_iterator);
- return ok;
+ return ITER_DONE;
}

static struct ref_iterator_vtable files_reflog_iterator_vtable = {

Currently the only thing calling dir_iterator_abort() is
files_reflog_iterator_abort() from what I can see, and
dir_iterator_abort() always returns ITER_DONE.

That said, I don't know if this is actually worth pursuing. Having it
return some value and having the caller check that makes it more
future proof, as we won't have to change all the callers in the future
if we want to start returning anything other than ITER_DONE. Just
leaving it as it is now doesn't actually hurt anybody I think, but may
help in the future.

> There's also a third option: The only operation that may fail during
> dir_iterator_abort() is closedir(). But even on
> dir_iterator_advance(), I'm treating this error as "non-fatal" in the
> sense that it's not caught by the pedantic flag (although a warning is
> emitted). I did it like this because it doesn't seem like a major
> error during dir iteration... But I could change this and make
> DIR_ITERATOR_PEDANTIC return ITER_ERROR upon closedir() errors for
> both dir-iterator advance() and abort() functions. What do you think?

I think this might be the right way to go. We don't really need an
error from closedir, but at the same time if we are being pedantic,
maybe it should be an error. I don't have a strong opinion here
either way, other than I think it should probably keep returning an
int.

Matheus Tavares Bernardino

Apr 26, 2019, 12:14:02 AM4/26/19
to Thomas Gummerer, Junio C Hamano, git, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones
Yes, indeed. But I thought that since the reason for making
dir_iterator_abort() be void is that it always returns ITER_DONE, the
same change should be applied to files_reflog_iterator_abort() as it
would fall into the same case. And this, in turn, would require
changes to ref_iterator_abort() and many other functions at
refs/iterator.c and refs/files-backend.c

> Currently the only thing calling dir_iterator_abort() is
> files_reflog_iterator_abort() from what I can see, and
> dir_iterator_abort() always returns ITER_DONE.
>
> That said, I don't know if this is actually worth pursuing. Having it
> return some value and having the caller check that makes it more
> future proof, as we won't have to change all the callers in the future
> if we want to start returning anything other than ITER_DONE. Just
> leaving it as it is now doesn't actually hurt anybody I think, but may
> help in the future.

Ok, I understand.

> > There's also a third option: The only operation that may fail during
> > dir_iterator_abort() is closedir(). But even on
> > dir_iterator_advance(), I'm treating this error as "non-fatal" in the
> > sense that it's not caught by the pedantic flag (although a warning is
> > emitted). I did it like this because it doesn't seem like a major
> > error during dir iteration... But I could change this and make
> > DIR_ITERATOR_PEDANTIC return ITER_ERROR upon closedir() errors for
> > both dir-iterator advance() and abort() functions. What do you think?
>
> I think this might be the right way to go. We don't really need an
> error from closedir, but at the same time if we are being pedantic,
> maybe it should be an error. I don't have a strong opinion here
> either way, other than I think it should probably keep returning an
> int.

I know I suggested this option, but searching the code base I saw no
other place that checks closedir()'s return besides dir-iterator. So
maybe the best option would be to keep dir_iterator_abort() always
returning ITER_DONE, even upon closedir() errors. Then, I can document
that the pedantic flag only affects dir_iterator_advance() behavior
(but closedir() errors wouldn't be considered here as well).

I got stuck in this for a while, but finally this option seems good to me now...

Matheus Tavares

May 2, 2019, 10:48:42 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com
This patchset contains:
- a replacement of explicit recursive dir iteration at
copy_or_link_directory for the dir-iterator API;
- some refactoring and behaviour changes at local clone, mainly to
take care of symlinks and hidden files at .git/objects, together
with tests for these types of files.
- dir-iterator refactoring and feature adding with tests.

Changes since v5:
- Add tests for the dir-iterator API
- Refactor the dir-iterator state machine model, simplifying its
mechanics to improve readability.
- Change warning() to warning_errno() at dir-iterator.c
- Add a recursive symlinks check for dir_iterator_advance() in order
to avoid unwanted recursions with DIR_ITERATOR_FOLLOW_SYMLINKS
- Add tests for the dir-iterator flags feature
- Make warnings be emitted both when DIR_ITERATOR_PEDANTIC is
supplied and when it's not. It contains more relevant information
on the error, so I thought it should be always printed.
- Make dir_iterator_begin() check if the given argument is a valid
path to a directory.
- Adjusted some minor codestyle problems and commit messages
- Address Thomas's comments on v5
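
As an illustration of the recursive symlinks check listed above: one common way to detect such a loop is to compare the device and inode numbers of a directory reached through a symlink with those of the directories already on the iteration stack. The sketch below shows that generic technique only; it is not necessarily how the patch implements the check, and struct dir_id and is_ancestor_dir() are hypothetical names.

```c
#include <stddef.h>
#include <sys/types.h>

/* Identity of a directory as reported by stat(): device plus inode. */
struct dir_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Return 1 if the candidate directory is already on the stack of
 * directories currently being iterated (so entering it again would
 * recurse forever), and 0 otherwise.
 */
int is_ancestor_dir(const struct dir_id *stack, size_t nr,
		    struct dir_id candidate)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (stack[i].dev == candidate.dev &&
		    stack[i].ino == candidate.ino)
			return 1;
	return 0;
}
```

A caller would stat() the symlink target, build a struct dir_id from st_dev and st_ino, and skip the entry when is_ancestor_dir() returns 1.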

v5: https://public-inbox.org/git/20190330224907.3277-...@usp.br/
travis build: https://travis-ci.org/MatheusBernardino/git/builds/527176611

Note: I tried to use --range-diff as Thomas suggested but I'm not sure
the output is as desired. Please, let me know if I did something wrong
using it.

Daniel Ferreira (1):
dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
clone: better handle symlinked files at .git/objects/
dir-iterator: use warning_errno when possible
dir-iterator: refactor state machine model
dir-iterator: add flags parameter to dir_iterator_begin
clone: copy hidden paths at local clone
clone: extract function from copy_or_link_directory
clone: use dir-iterator to avoid explicit dir traversal
clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
clone: test for our behavior on odd objects/* content

Makefile | 1 +
builtin/clone.c | 75 +++++----
dir-iterator.c | 289 +++++++++++++++++++++--------------
dir-iterator.h | 59 +++++--
refs/files-backend.c | 17 ++-
t/helper/test-dir-iterator.c | 58 +++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t0066-dir-iterator.sh | 163 ++++++++++++++++++++
t/t5604-clone-reference.sh | 133 ++++++++++++++++
10 files changed, 634 insertions(+), 163 deletions(-)
create mode 100644 t/helper/test-dir-iterator.c
create mode 100755 t/t0066-dir-iterator.sh

Range-diff against v5:
1: 3d422dd4de = 1: a630b1a129 clone: test for our behavior on odd objects/* content
2: 35819e6ed1 ! 2: 51e06687fc clone: better handle symlinked files at .git/objects/
@@ -45,10 +45,7 @@
die_errno(_("failed to unlink '%s'"), dest->buf);
if (!option_no_hardlinks) {
- if (!link(src->buf, dest->buf))
-+ char *resolved_path = real_pathdup(src->buf, 1);
-+ int status = link(resolved_path, dest->buf);
-+ free(resolved_path);
-+ if (!status)
++ if (!link(real_path(src->buf), dest->buf))
continue;
if (option_local > 0)
die_errno(_("failed to create link '%s'"), dest->buf);
3: 2afe3208a4 < -: ---------- dir-iterator: add flags parameter to dir_iterator_begin
-: ---------- > 3: c8a860e3a5 dir-iterator: add tests for dir-iterator API
-: ---------- > 4: b975351080 dir-iterator: use warning_errno when possible
-: ---------- > 5: 0fdbd1633e dir-iterator: refactor state machine model
-: ---------- > 6: 7b2a9ae947 dir-iterator: add flags parameter to dir_iterator_begin
4: 71d64e6278 = 7: b9f298cbc6 clone: copy hidden paths at local clone
5: 35e36756db = 8: 0e7b1e49e2 clone: extract function from copy_or_link_directory
6: 1bfda87879 ! 9: f726ce2733 clone: use dir-iterator to avoid explicit dir traversal
@@ -8,10 +8,14 @@
copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
- error on readdir or stat, inside dir_iterator_advance. Previously it
+ error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
- though the .git/objects copy didn't fully succeed.
+ though the .git/objects copy didn't fully succeed. Also, with the
+ dir-iterator API, recursive symlinks will be detected and skipped. This
+ is another behavior improvement, since the current version would
+ continue to copy the same content over and over until stat() returned an
+ ELOOP error.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>

@@ -44,12 +48,15 @@
- die_errno(_("failed to open '%s'"), src->buf);
+ struct dir_iterator *iter;
+ int iter_status;
-+ unsigned flags;
++ unsigned int flags;

mkdir_if_missing(dest->buf, 0777);

+ flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+ iter = dir_iterator_begin(src->buf, flags);
++
++ if (!iter)
++ die_errno(_("failed to start iterator over '%s'"), src->buf);
+
strbuf_addch(src, '/');
src_len = src->len;
7: 3861b30108 = 10: 6a57bb3887 clone: replace strcmp by fspathcmp
--
2.20.1

Matheus Tavares

May 2, 2019, 10:48:45 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Junio C Hamano, Alex Riesen
From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625...@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..207650cb95 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
+ ln -s packs pack &&
+ find ?? -type d >loose-dirs &&
+ last_loose=$(tail -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+ mv $last_loose a-loose-dir &&
+ ln -s a-loose-dir $last_loose &&
+ find . -type f | sort >../../../T.objects-files.raw &&
+ echo unknown_content> unknown_file
+ ) &&
+ git -C T fsck &&
+ git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ for option in --local --dissociate --no-hardlinks
+ do
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1

Matheus Tavares

May 2, 2019, 10:48:48 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Junio C Hamano, Michael Haggerty
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..d1aba3b13f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (unlink(dest->buf) && errno != ENOENT)
die_errno(_("failed to unlink '%s'"), dest->buf);
if (!option_no_hardlinks) {
- if (!link(src->buf, dest->buf))
+ if (!link(real_path(src->buf), dest->buf))
continue;
if (option_local > 0)
die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
test_cmp expected actual
'

-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
git init T &&
(
cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
ln -s packs pack &&
find ?? -type d >loose-dirs &&
last_loose=$(tail -n 1 loose-dirs) &&
- rm -f loose-dirs &&
mv $last_loose a-loose-dir &&
ln -s a-loose-dir $last_loose &&
+ first_loose=$(head -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+
+ cd $first_loose &&
+ obj=$(ls *) &&
+ mv $obj ../an-object &&
+ ln -s ../an-object $obj &&
+
+ cd ../ &&
find . -type f | sort >../../../T.objects-files.raw &&
+ find . -type l | sort >../../../T.objects-symlinks.raw &&
echo unknown_content> unknown_file
) &&
git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
'


-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
for option in --local --no-hardlinks --shared --dissociate
do
git clone $option T T$option || return 1 &&

Matheus Tavares

May 2, 2019, 10:48:52 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Daniel Ferreira, Junio C Hamano
From: Daniel Ferreira <bnm...@gmail.com>

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir-iterator.

Create t/t0066-dir-iterator.sh, which tests that dir-iterator does
iterate through a whole directory tree as expected.

Signed-off-by: Daniel Ferreira <bnm...@gmail.com>
[matheus.bernardino: update to use test-tool and some minor aesthetics]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
Makefile | 1 +
t/helper/test-dir-iterator.c | 33 ++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t0066-dir-iterator.sh | 55 ++++++++++++++++++++++++++++++++++++
5 files changed, 91 insertions(+)
create mode 100644 t/helper/test-dir-iterator.c
create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index 9f1b6e8926..61da7e4f35 100644
--- a/Makefile
+++ b/Makefile
@@ -713,6 +713,7 @@ TEST_BUILTINS_OBJS += test-config.o
TEST_BUILTINS_OBJS += test-ctype.o
TEST_BUILTINS_OBJS += test-date.o
TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-dir-iterator.o
TEST_BUILTINS_OBJS += test-drop-caches.o
TEST_BUILTINS_OBJS += test-dump-cache-tree.o
TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000000..84f50bed8c
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,33 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+/* Argument is a directory path to iterate over */
+int cmd__dir_iterator(int argc, const char **argv)
+{
+ struct strbuf path = STRBUF_INIT;
+ struct dir_iterator *diter;
+
+ if (argc < 2)
+ die("BUG: test-dir-iterator needs one argument");
+
+ strbuf_add(&path, argv[1], strlen(argv[1]));
+
+ diter = dir_iterator_begin(path.buf);
+
+ while (dir_iterator_advance(diter) == ITER_OK) {
+ if (S_ISDIR(diter->st.st_mode))
+ printf("[d] ");
+ else if (S_ISREG(diter->st.st_mode))
+ printf("[f] ");
+ else
+ printf("[?] ");
+
+ printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
+ diter->path.buf);
+ }
+
+ return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 53c06932c4..89b3bfcad8 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -13,6 +13,7 @@ static struct test_cmd cmds[] = {
{ "ctype", cmd__ctype },
{ "date", cmd__date },
{ "delta", cmd__delta },
+ { "dir-iterator", cmd__dir_iterator },
{ "drop-caches", cmd__drop_caches },
{ "dump-cache-tree", cmd__dump_cache_tree },
{ "dump-fsmonitor", cmd__dump_fsmonitor },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ffab4d19d7..0a831c839c 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -9,6 +9,7 @@ int cmd__config(int argc, const char **argv);
int cmd__ctype(int argc, const char **argv);
int cmd__date(int argc, const char **argv);
int cmd__delta(int argc, const char **argv);
+int cmd__dir_iterator(int argc, const char **argv);
int cmd__drop_caches(int argc, const char **argv);
int cmd__dump_cache_tree(int argc, const char **argv);
int cmd__dump_fsmonitor(int argc, const char **argv);
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
new file mode 100755
index 0000000000..6e06dc038d
--- /dev/null
+++ b/t/t0066-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test the dir-iterator functionality'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+ mkdir -p dir &&
+ mkdir -p dir/a/b/c/ &&
+ >dir/b &&
+ >dir/c &&
+ mkdir -p dir/d/e/d/ &&
+ >dir/a/b/c/d &&
+ >dir/a/e &&
+ >dir/d/e/d/a &&
+
+ mkdir -p dir2/a/b/c/ &&
+ >dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+ cat >expected-iteration-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir/a
+ [d] (a/b) [b] ./dir/a/b
+ [d] (a/b/c) [c] ./dir/a/b/c
+ [d] (d) [d] ./dir/d
+ [d] (d/e) [e] ./dir/d/e
+ [d] (d/e/d) [d] ./dir/d/e/d
+ [f] (a/b/c/d) [d] ./dir/a/b/c/d
+ [f] (a/e) [e] ./dir/a/e
+ [f] (b) [b] ./dir/b
+ [f] (c) [c] ./dir/c
+ [f] (d/e/d/a) [a] ./dir/d/e/d/a
+ EOF
+
+ test-tool dir-iterator ./dir >out &&
+ sort <out >./actual-iteration-sorted-output &&
+
+ test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+ cat >expected-pre-order-output <<-EOF &&
+ [d] (a) [a] ./dir2/a
+ [d] (a/b) [b] ./dir2/a/b
+ [d] (a/b/c) [c] ./dir2/a/b/c
+ [f] (a/b/c/d) [d] ./dir2/a/b/c/d
+ EOF
+
+ test-tool dir-iterator ./dir2 >actual-pre-order-output &&
+
+ test_cmp expected-pre-order-output actual-pre-order-output
+'
+
+test_done
--
2.20.1

Matheus Tavares

May 2, 2019, 10:48:55 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Junio C Hamano, Michael Haggerty
Replace warning(..., strerror(errno)) with warning_errno(...). This helps
to unify warning display and simplifies the code a bit. Also, improve
warning messages by surrounding paths with quotation marks and using
more meaningful statements.
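
To illustrate the pattern, and why errno has to be captured before any other call can overwrite it (as in the saved_errno dance in dir_iterator_abort() below), here is a small standalone sketch. toy_warning_errno() is a hypothetical stand-in that writes into a caller-supplied buffer; it is not git's warning_errno(), and its "message: strerror" output format is an assumption of the sketch.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<message>: <strerror(errno)>" into buf. errno is captured
 * first so that the formatting calls cannot clobber it, and restored
 * before returning. Returns 0 on success, -1 if buf is too small.
 */
int toy_warning_errno(char *buf, size_t size, const char *fmt, ...)
{
	int saved_errno = errno;
	va_list ap;
	int len, tail;

	va_start(ap, fmt);
	len = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t)len >= size)
		return -1;

	tail = snprintf(buf + len, size - (size_t)len, ": %s",
			strerror(saved_errno));
	errno = saved_errno;
	if (tail < 0 || (size_t)tail >= size - (size_t)len)
		return -1;
	return 0;
}
```

The call site shrinks from passing strerror(errno) by hand to just passing the path, which is the simplification this patch is after.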

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..0c8880868a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -71,8 +71,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)

level->dir = opendir(iter->base.path.buf);
if (!level->dir && errno != ENOENT) {
- warning("error opening directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error opening directory '%s'",
+ iter->base.path.buf);
/* Popping the level is handled below */
}

@@ -122,11 +122,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (!de) {
/* This level is exhausted; pop up a level. */
if (errno) {
- warning("error reading directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error reading directory '%s'",
+ iter->base.path.buf);
} else if (closedir(level->dir))
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);

level->dir = NULL;
if (--iter->levels_nr == 0)
@@ -140,9 +140,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
strbuf_addstr(&iter->base.path, de->d_name);
if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
if (errno != ENOENT)
- warning("error reading path '%s': %s",
- iter->base.path.buf,
- strerror(errno));
+ warning_errno("failed to stat '%s'",
+ iter->base.path.buf);
continue;
}

@@ -170,9 +169,11 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
&iter->levels[iter->levels_nr - 1];

if (level->dir && closedir(level->dir)) {
+ int saved_errno = errno;
strbuf_setlen(&iter->base.path, level->prefix_len);
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ errno = saved_errno;
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);
}
}

--
2.20.1

Matheus Tavares

May 2, 2019, 10:48:59 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Daniel Ferreira, Michael Haggerty, Ramsay Jones, Junio C Hamano, Jeff King, Johannes Schindelin
dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability by factoring out three functions and simplifying
its mechanics. The refactored model will no longer depend on
level.initialized and level.dir_state to keep track of the iteration
state and will run in a single loop.

Also, dir_iterator_begin() currently does not check if the given string
represents a valid directory path. Since the refactored model will have
to stat() the given path at initialization, let's also check for this
kind of error and make dir_iterator_begin() return NULL on failure,
with errno appropriately set. Also add tests for this new behavior.
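
The validation described above boils down to a stat() call followed by an S_ISDIR() check. A minimal standalone sketch, outside of git, is shown below. toy_check_directory() is a hypothetical helper, and the explicit NULL/empty guard with ENOENT is a choice of this sketch rather than something the patch does.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Return 0 if path names an existing directory. Otherwise return -1
 * with errno set: whatever stat() reported, or ENOTDIR when the path
 * exists but is not a directory.
 */
int toy_check_directory(const char *path)
{
	struct stat st;

	/* Sketch-only guard: treat NULL and "" like a missing path. */
	if (!path || !*path) {
		errno = ENOENT;
		return -1;
	}

	if (stat(path, &st) < 0)
		return -1; /* errno already set by stat() */

	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;
		return -1;
	}
	return 0;
}
```

A caller can then tell "does not exist" from "exists but is not a directory" by inspecting errno after a failure.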

Improve documentation at dir-iterator.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.

Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.

Original-patch-by: Daniel Ferreira <bnm...@gmail.com>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---

When dir_iterator_begin() fails at refs/files-backend.c, I used the same
idea Daniel proposed, which is to initialize an empty iterator with
empty_ref_iterator_begin(). Still, I'm not sure whether we shouldn't
abort execution there instead of returning an empty iterator.

dir_iterator_begin() will fail if the given argument is an empty string,
NULL, an invalid path, a non-directory path, or on some other stat()
errors. Maybe, on these kinds of errors, we don't want the users of
refs/files-backend.c to carry on, and should therefore call die() right
there?

(Also, NULL and empty string arguments were considered a bug in the
previous version of dir_iterator_begin)

dir-iterator.c | 234 ++++++++++++++++++-----------------
dir-iterator.h | 15 ++-
refs/files-backend.c | 17 ++-
t/helper/test-dir-iterator.c | 5 +
t/t0066-dir-iterator.sh | 13 ++
5 files changed, 163 insertions(+), 121 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 0c8880868a..594fe4d67b 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
#include "dir-iterator.h"

struct dir_iterator_level {
- int initialized;
-
DIR *dir;

/*
@@ -13,16 +11,6 @@ struct dir_iterator_level {
* (including a trailing '/'):
*/
size_t prefix_len;
-
- /*
- * The last action that has been taken with the current entry
- * (needed for directories, which have to be included in the
- * iteration and also iterated into):
- */
- enum {
- DIR_STATE_ITER,
- DIR_STATE_RECURSE
- } dir_state;
};

/*
@@ -34,9 +22,11 @@ struct dir_iterator_int {
struct dir_iterator base;

/*
- * The number of levels currently on the stack. This is always
- * at least 1, because when it becomes zero the iteration is
- * ended and this struct is freed.
+ * The number of levels currently on the stack. After the first
+ * call to dir_iterator_begin(), if it succeeds to open the
+ * first level's dir, this will always be at least 1. Then,
+ * when it comes to zero the iteration is ended and this
+ * struct is freed.
*/
size_t levels_nr;

@@ -50,113 +40,118 @@ struct dir_iterator_int {
struct dir_iterator_level *levels;
};

+/*
+ * Push a level in the iter stack and initialize it with information from
+ * the directory pointed by iter->base->path. It is assumed that this
+ * strbuf points to a valid directory path. Return 0 on success and -1
+ * otherwise, leaving the stack unchanged.
+ */
+static int push_level(struct dir_iterator_int *iter)
+{
+ struct dir_iterator_level *level;
+
+ ALLOC_GROW(iter->levels, iter->levels_nr + 1, iter->levels_alloc);
+ level = &iter->levels[iter->levels_nr++];
+
+ if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
+ strbuf_addch(&iter->base.path, '/');
+ level->prefix_len = iter->base.path.len;
+
+ level->dir = opendir(iter->base.path.buf);
+ if (!level->dir) {
+ if (errno != ENOENT) {
+ warning_errno("error opening directory '%s'",
+ iter->base.path.buf);
+ }
+ iter->levels_nr--;
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Pop the top level on the iter stack, releasing any resources associated
+ * with it. Return the new value of iter->levels_nr.
+ */
+static int pop_level(struct dir_iterator_int *iter)
+{
+ struct dir_iterator_level *level =
+ &iter->levels[iter->levels_nr - 1];
+
+ if (level->dir && closedir(level->dir))
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);
+ level->dir = NULL;
+
+ return --iter->levels_nr;
+}
+
+/*
+ * Populate iter->base with the necessary information on the next iteration
+ * entry, represented by the given dirent de. Return 0 on success and -1
+ * otherwise.
+ */
+static int prepare_next_entry_data(struct dir_iterator_int *iter,
+ struct dirent *de)
+{
+ strbuf_addstr(&iter->base.path, de->d_name);
+ /*
+ * We have to reset these because the path strbuf might have
+ * been realloc()ed at the previous strbuf_addstr().
+ */
+ iter->base.relative_path = iter->base.path.buf +
+ iter->levels[0].prefix_len;
+ iter->base.basename = iter->base.path.buf +
+ iter->levels[iter->levels_nr - 1].prefix_len;
+
+ if (lstat(iter->base.path.buf, &iter->base.st)) {
+ if (errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+ return -1;
+ }
+
+ return 0;
+}
+
int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;

+ if (S_ISDIR(iter->base.st.st_mode)) {
+ if (push_level(iter) && iter->levels_nr == 0) {
+ /* Pushing the first level failed */
+ return dir_iterator_abort(dir_iterator);
+ }
+ }
+
+ /* Loop until we find an entry that we can give back to the caller. */
while (1) {
+ struct dirent *de;
struct dir_iterator_level *level =
&iter->levels[iter->levels_nr - 1];
- struct dirent *de;

- if (!level->initialized) {
- /*
- * Note: dir_iterator_begin() ensures that
- * path is not the empty string.
- */
- if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
- strbuf_addch(&iter->base.path, '/');
- level->prefix_len = iter->base.path.len;
-
- level->dir = opendir(iter->base.path.buf);
- if (!level->dir && errno != ENOENT) {
- warning_errno("error opening directory '%s'",
+ strbuf_setlen(&iter->base.path, level->prefix_len);
+ errno = 0;
+ de = readdir(level->dir);
+
+ if (!de) {
+ if (errno)
+ warning_errno("error reading directory '%s'",
iter->base.path.buf);
- /* Popping the level is handled below */
- }
-
- level->initialized = 1;
- } else if (S_ISDIR(iter->base.st.st_mode)) {
- if (level->dir_state == DIR_STATE_ITER) {
- /*
- * The directory was just iterated
- * over; now prepare to iterate into
- * it.
- */
- level->dir_state = DIR_STATE_RECURSE;
- ALLOC_GROW(iter->levels, iter->levels_nr + 1,
- iter->levels_alloc);
- level = &iter->levels[iter->levels_nr++];
- level->initialized = 0;
- continue;
- } else {
- /*
- * The directory has already been
- * iterated over and iterated into;
- * we're done with it.
- */
- }
+ else if (pop_level(iter) == 0)
+ return dir_iterator_abort(dir_iterator);
+ continue;
}

- if (!level->dir) {
- /*
- * This level is exhausted (or wasn't opened
- * successfully); pop up a level.
- */
- if (--iter->levels_nr == 0)
- return dir_iterator_abort(dir_iterator);
+ if (is_dot_or_dotdot(de->d_name))
+ continue;

+ if (prepare_next_entry_data(iter, de))
continue;
- }

- /*
- * Loop until we find an entry that we can give back
- * to the caller:
- */
- while (1) {
- strbuf_setlen(&iter->base.path, level->prefix_len);
- errno = 0;
- de = readdir(level->dir);
-
- if (!de) {
- /* This level is exhausted; pop up a level. */
- if (errno) {
- warning_errno("error reading directory '%s'",
- iter->base.path.buf);
- } else if (closedir(level->dir))
- warning_errno("error closing directory '%s'",
- iter->base.path.buf);
-
- level->dir = NULL;
- if (--iter->levels_nr == 0)
- return dir_iterator_abort(dir_iterator);
- break;
- }
-
- if (is_dot_or_dotdot(de->d_name))
- continue;
-
- strbuf_addstr(&iter->base.path, de->d_name);
- if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
- if (errno != ENOENT)
- warning_errno("failed to stat '%s'",
- iter->base.path.buf);
- continue;
- }
-
- /*
- * We have to set these each time because
- * the path strbuf might have been realloc()ed.
- */
- iter->base.relative_path =
- iter->base.path.buf + iter->levels[0].prefix_len;
- iter->base.basename =
- iter->base.path.buf + level->prefix_len;
- level->dir_state = DIR_STATE_ITER;
-
- return ITER_OK;
- }
+ return ITER_OK;
}
}

@@ -187,17 +182,32 @@ struct dir_iterator *dir_iterator_begin(const char *path)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
-
- if (!path || !*path)
- BUG("empty path passed to dir_iterator_begin()");
+ int saved_errno;

strbuf_init(&iter->base.path, PATH_MAX);
strbuf_addstr(&iter->base.path, path);

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+ iter->levels_nr = 0;

- iter->levels_nr = 1;
- iter->levels[0].initialized = 0;
+ /*
+ * Note: stat already checks for NULL or empty strings and
+ * inexistent paths.
+ */
+ if (stat(iter->base.path.buf, &iter->base.st) < 0) {
+ saved_errno = errno;
+ goto error_out;
+ }
+
+ if (!S_ISDIR(iter->base.st.st_mode)) {
+ saved_errno = ENOTDIR;
+ goto error_out;
+ }

return dir_iterator;
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ errno = saved_errno;
+ return NULL;
}
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..0822821e56 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -8,19 +8,23 @@
*
* Iterate over a directory tree, recursively, including paths of all
* types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * symlinks except for the original path. Note that the original path
+ * is not included in the iteration.
*
* Every time dir_iterator_advance() is called, update the members of
* the dir_iterator structure to reflect the next path in the
* iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined, directory paths are always given before
+ * their contents.
*
* A typical iteration looks like this:
*
* int ok;
* struct iterator *iter = dir_iterator_begin(path);
*
+ * if (!iter)
+ * goto error_handler;
+ *
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
* ok = dir_iterator_abort(iter);
@@ -59,8 +63,9 @@ struct dir_iterator {
};

/*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path. On success, return a
+ * dir_iterator that holds the internal state of the iteration.
+ * In case of failure, return NULL and set errno accordingly.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..97a54532e3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,13 +2143,22 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
const char *gitdir)
{
- struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
- struct ref_iterator *ref_iterator = &iter->base;
+ struct dir_iterator *diter;
+ struct files_reflog_iterator *iter;
+ struct ref_iterator *ref_iterator;
struct strbuf sb = STRBUF_INIT;

- base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
strbuf_addf(&sb, "%s/logs", gitdir);
- iter->dir_iterator = dir_iterator_begin(sb.buf);
+
+ diter = dir_iterator_begin(sb.buf);
+ if (!diter)
+ return empty_ref_iterator_begin();
+
+ iter = xcalloc(1, sizeof(*iter));
+ ref_iterator = &iter->base;
+
+ base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+ iter->dir_iterator = diter;
iter->ref_store = ref_store;
strbuf_release(&sb);

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 84f50bed8c..fab1ff6237 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -17,6 +17,11 @@ int cmd__dir_iterator(int argc, const char **argv)

diter = dir_iterator_begin(path.buf);

+ if (!diter) {
+ printf("dir_iterator_begin failure: %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
while (dir_iterator_advance(diter) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
printf("[d] ");
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 6e06dc038d..c739ed7911 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -52,4 +52,17 @@ test_expect_success 'dir-iterator should list files in the correct order' '
test_cmp expected-pre-order-output actual-pre-order-output
'

+test_expect_success 'begin should fail upon inexistent paths' '
+ test_must_fail test-tool dir-iterator ./inexistent-path \
+ >actual-inexistent-path-output &&
+ echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+ test_cmp expected-inexistent-path-output actual-inexistent-path-output
+'
+
+test_expect_success 'begin should fail upon non directory paths' '
+ test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
+ echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+ test_cmp expected-non-dir-output actual-non-dir-output

Matheus Tavares

May 2, 2019, 10:49:05 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones, Junio C Hamano
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of continuing to look for
the next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.

These new flags will be used in a subsequent patch.
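
As a rough illustration of how such bit flags combine and are tested, here is a
minimal sketch. The SKETCH_* names and both helper functions are hypothetical
stand-ins that only mirror the two flags described above; they are not the
dir-iterator code itself.

```c
#include <assert.h>

/* Hypothetical stand-ins mirroring the two flags named above. */
#define SKETCH_PEDANTIC        (1u << 0)
#define SKETCH_FOLLOW_SYMLINKS (1u << 1)
#define SKETCH_ALL_FLAGS       (SKETCH_PEDANTIC | SKETCH_FOLLOW_SYMLINKS)

/* Return 1 if an error while advancing should abort the iteration. */
int sketch_abort_on_error(unsigned int flags)
{
	assert(!(flags & ~SKETCH_ALL_FLAGS)); /* reject unknown bits */
	return (flags & SKETCH_PEDANTIC) != 0;
}

/* Return 1 if entries should be examined with stat() instead of lstat(). */
int sketch_use_stat(unsigned int flags)
{
	assert(!(flags & ~SKETCH_ALL_FLAGS)); /* reject unknown bits */
	return (flags & SKETCH_FOLLOW_SYMLINKS) != 0;
}
```

A caller would OR the flags it wants together, or pass 0 for the default
behavior, and each decision point masks out the single bit it cares about.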

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---

refs/files-backend.c currently passes no flags where it calls
dir_iterator_begin(), to keep the behavior it previously had. But
since dir_iterator_advance() can now return ITER_ERROR only when
DIR_ITERATOR_PEDANTIC is used, and refs/files-backend.c already checks
for ITER_ERROR, should we perhaps use this flag when initializing the
iterator there?

Another uncertainty I had is why we ignore ENOENT at dir-iterator. Is it
so that files may be removed during iteration? If not, maybe we should
consider reporting these errors, since, for example, broken symlinks
are silently ignored in this version: stat() fails with ENOENT when
trying to dereference them.
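
The ENOENT behavior on broken symlinks can be checked in isolation. The
sketch below is not part of the patch; probe_dangling_symlink() is a
hypothetical helper, and it assumes a writable /tmp. It creates a dangling
symlink and records how stat() and lstat() each see it.

```c
#define _XOPEN_SOURCE 700 /* for mkdtemp(), symlink() and lstat() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a dangling symlink in a fresh temporary directory and report
 * how stat() and lstat() see it. Return 0 on success, or -1 if the
 * scratch setup itself failed. On success, *stat_errno holds the errno
 * left by stat() (0 if it succeeded) and *lstat_ok is 1 if lstat()
 * succeeded.
 */
int probe_dangling_symlink(int *stat_errno, int *lstat_ok)
{
	char dir[] = "/tmp/dangling-sketch-XXXXXX";
	char link_path[sizeof(dir) + 16];
	struct stat st;

	assert(stat_errno && lstat_ok);
	if (!mkdtemp(dir))
		return -1;
	snprintf(link_path, sizeof(link_path), "%s/link", dir);
	if (symlink("no-such-target", link_path)) {
		rmdir(dir);
		return -1;
	}

	errno = 0;
	*stat_errno = stat(link_path, &st) ? errno : 0;
	*lstat_ok = !lstat(link_path, &st);

	/* Clean up the scratch link and directory. */
	unlink(link_path);
	rmdir(dir);
	return 0;
}
```

On a POSIX system, stat() on the dangling link is expected to fail with
ENOENT while lstat() succeeds, which is why an iterator that follows
symlinks and ignores ENOENT would skip such entries without a warning.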

dir-iterator.c | 82 +++++++++++++++++++++++++------
dir-iterator.h | 50 ++++++++++++++-----
refs/files-backend.c | 2 +-
t/helper/test-dir-iterator.c | 34 ++++++++++---
t/t0066-dir-iterator.sh | 95 ++++++++++++++++++++++++++++++++++++
5 files changed, 228 insertions(+), 35 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..52db87bdc9 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -6,6 +6,9 @@
struct dir_iterator_level {
DIR *dir;

+ /* The inode number of this level's directory. */
+ ino_t ino;
+
/*
* The length of the directory part of path at this level
* (including a trailing '/'):
@@ -38,13 +41,16 @@ struct dir_iterator_int {
* that will be included in this iteration.
*/
struct dir_iterator_level *levels;
+
+ /* Combination of flags for this dir-iterator */
+ unsigned int flags;
};

/*
* Push a level in the iter stack and initialize it with information from
* the directory pointed by iter->base->path. It is assumed that this
* strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
*/
static int push_level(struct dir_iterator_int *iter)
{
@@ -56,14 +62,17 @@ static int push_level(struct dir_iterator_int *iter)
if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
strbuf_addch(&iter->base.path, '/');
level->prefix_len = iter->base.path.len;
+ level->ino = iter->base.st.st_ino;

level->dir = opendir(iter->base.path.buf);
if (!level->dir) {
+ int saved_errno = errno;
if (errno != ENOENT) {
warning_errno("error opening directory '%s'",
iter->base.path.buf);
}
iter->levels_nr--;
+ errno = saved_errno;
return -1;
}

@@ -90,11 +99,13 @@ static int pop_level(struct dir_iterator_int *iter)
/*
* Populate iter->base with the necessary information on the next iteration
* entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
*/
static int prepare_next_entry_data(struct dir_iterator_int *iter,
struct dirent *de)
{
+ int err, saved_errno;
+
strbuf_addstr(&iter->base.path, de->d_name);
/*
* We have to reset these because the path strbuf might have
@@ -105,12 +116,34 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter,
iter->base.basename = iter->base.path.buf +
iter->levels[iter->levels_nr - 1].prefix_len;

- if (lstat(iter->base.path.buf, &iter->base.st)) {
- if (errno != ENOENT)
- warning_errno("failed to stat '%s'", iter->base.path.buf);
- return -1;
- }
+ if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+ err = stat(iter->base.path.buf, &iter->base.st);
+ else
+ err = lstat(iter->base.path.buf, &iter->base.st);
+
+ saved_errno = errno;
+ if (err && errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+ errno = saved_errno;
+ return err;
+}
+
+/*
+ * Look for a recursive symlink at iter->base.path pointing to any directory on
+ * the previous stack levels. If it is found, return 1. If not, return 0.
+ */
+static int find_recursive_symlinks(struct dir_iterator_int *iter)
+{
+ int i;
+
+ if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
+ !S_ISDIR(iter->base.st.st_mode))
+ return 0;

+ for (i = 0; i < iter->levels_nr; ++i)
+ if (iter->base.st.st_ino == iter->levels[i].ino)
+ return 1;
return 0;
}

@@ -119,11 +152,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;

- if (S_ISDIR(iter->base.st.st_mode)) {
- if (push_level(iter) && iter->levels_nr == 0) {
- /* Pushing the first level failed */
- return dir_iterator_abort(dir_iterator);
- }
+ if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
+ if (iter->levels_nr == 0)
+ goto error_out;
}

/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +170,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
de = readdir(level->dir);

if (!de) {
- if (errno)
+ if (errno) {
warning_errno("error reading directory '%s'",
iter->base.path.buf);
- else if (pop_level(iter) == 0)
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
+ } else if (pop_level(iter) == 0) {
return dir_iterator_abort(dir_iterator);
+ }
continue;
}

if (is_dot_or_dotdot(de->d_name))
continue;

- if (prepare_next_entry_data(iter, de))
+ if (prepare_next_entry_data(iter, de)) {
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
continue;
+ }
+
+ if (find_recursive_symlinks(iter)) {
+ warning("ignoring recursive symlink at '%s'",
+ iter->base.path.buf);
+ continue;
+ }

return ITER_OK;
}
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ return ITER_ERROR;
}

int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +227,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
}

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +238,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
iter->levels_nr = 0;
+ iter->flags = flags;

/*
* Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 0822821e56..28e50dabdb 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
+ * unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ * struct iterator *iter = dir_iterator_begin(path, flags);
*
* if (!iter)
* goto error_handler;
@@ -44,6 +45,24 @@
* dir_iterator_advance() again.
*/

+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ * in case of an error at dir_iterator_advance(), which is to keep
+ * looking for a next valid entry. With this flag, resources are freed
+ * and ITER_ERROR is returned immediately. In both cases, a meaningful
+ * warning is emitted.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ * i.e., linked directories' contents will be iterated over and
+ * iter->base.st will contain information on the referred files,
+ * not the symlinks themselves, which is the default behavior.
+ * Recursive symlinks are skipped.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
struct dir_iterator {
/* The current path: */
struct strbuf path;
@@ -58,29 +77,38 @@ struct dir_iterator {
/* The current basename: */
const char *basename;

- /* The result of calling lstat() on path: */
+ /*
+ * The result of calling lstat() on path; or stat(), if the
+ * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+ * dir_iterator's initialization.
+ */
struct stat st;
};

/*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
*
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ * - path is the starting directory. An internal copy will be made.
+ * - flags is a combination of the possible flags to initialize a
+ * dir-iterator or 0 for default behavior.
*/
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);

/*
* Advance the iterator to the first or next item and return ITER_OK.
* If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
*/
int dir_iterator_advance(struct dir_iterator *iterator);

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 97a54532e3..ce78656823 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,

strbuf_addf(&sb, "%s/logs", gitdir);

- diter = dir_iterator_begin(sb.buf);
+ diter = dir_iterator_begin(sb.buf, 0);
if (!diter)
return empty_ref_iterator_begin();

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@
#include "iterator.h"
#include "dir-iterator.h"

-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * test-tool dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
int cmd__dir_iterator(int argc, const char **argv)
{
struct strbuf path = STRBUF_INIT;
struct dir_iterator *diter;
+ unsigned int flags = 0;
+ int iter_status;
+
+ for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+ if (strcmp(*argv, "--follow-symlinks") == 0)
+ flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+ else if (strcmp(*argv, "--pedantic") == 0)
+ flags |= DIR_ITERATOR_PEDANTIC;
+ else
+ die("invalid option '%s'", *argv);
+ }

- if (argc < 2)
- die("BUG: test-dir-iterator needs one argument");
-
- strbuf_add(&path, argv[1], strlen(argv[1]));
+ if (!*argv || argc != 1)
+ die("dir-iterator needs exactly one non-option argument");

- diter = dir_iterator_begin(path.buf);
+ strbuf_add(&path, *argv, strlen(*argv));
+ diter = dir_iterator_begin(path.buf, flags);

if (!diter) {
printf("dir_iterator_begin failure: %d\n", errno);
exit(EXIT_FAILURE);
}

- while (dir_iterator_advance(diter) == ITER_OK) {
+ while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
printf("[d] ");
else if (S_ISREG(diter->st.st_mode))
printf("[f] ");
+ else if (S_ISLNK(diter->st.st_mode))
+ printf("[s] ");
else
printf("[?] ");

@@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv)
diter->path.buf);
}

+ if (iter_status != ITER_DONE) {
+ printf("dir_iterator_advance failure\n");
+ return 1;
+ }
+
return 0;
}
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index c739ed7911..8f996a31fa 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
test_cmp expected-non-dir-output actual-non-dir-output
'

+test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
+ cat >expected-no-permissions-output <<-EOF &&
+ [d] (a) [a] ./dir3/a
+ EOF
+
+ mkdir -p dir3/a &&
+ > dir3/a/b &&
+ chmod 0 dir3/a &&
+
+ test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
+ test_cmp expected-no-permissions-output actual-no-permissions-output &&
+ chmod 755 dir3/a &&
+ rm -rf dir3
+'
+
+test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
+ cat >expected-no-permissions-pedantic-output <<-EOF &&
+ [d] (a) [a] ./dir3/a
+ dir_iterator_advance failure
+ EOF
+
+ mkdir -p dir3/a &&
+ > dir3/a/b &&
+ chmod 0 dir3/a &&
+
+ test_must_fail test-tool dir-iterator --pedantic ./dir3 \
+ >actual-no-permissions-pedantic-output &&
+ test_cmp expected-no-permissions-pedantic-output \
+ actual-no-permissions-pedantic-output &&
+ chmod 755 dir3/a &&
+ rm -rf dir3
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+ mkdir -p dir4/a &&
+ mkdir -p dir4/b/c &&
+ >dir4/a/d &&
+ ln -s d dir4/a/e &&
+ ln -s ../b dir4/a/f &&
+
+ mkdir -p dir5/a/b &&
+ mkdir -p dir5/a/c &&
+ ln -s ../c dir5/a/b/d &&
+ ln -s ../ dir5/a/b/e &&
+ ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+ cat >expected-no-follow-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir4/a
+ [d] (b) [b] ./dir4/b
+ [d] (b/c) [c] ./dir4/b/c
+ [f] (a/d) [d] ./dir4/a/d
+ [s] (a/e) [e] ./dir4/a/e
+ [s] (a/f) [f] ./dir4/a/f
+ EOF
+
+ test-tool dir-iterator ./dir4 >out &&
+ sort <out >actual-no-follow-sorted-output &&
+
+ test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+ cat >expected-follow-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir4/a
+ [d] (a/f) [f] ./dir4/a/f
+ [d] (a/f/c) [c] ./dir4/a/f/c
+ [d] (b) [b] ./dir4/b
+ [d] (b/c) [c] ./dir4/b/c
+ [f] (a/d) [d] ./dir4/a/d
+ [f] (a/e) [e] ./dir4/a/e
+ EOF
+
+ test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+ sort <out >actual-follow-sorted-output &&
+
+ test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
+
+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
+ cat >expected-rec-symlinks-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir5/a
+ [d] (a/b) [b] ./dir5/a/b
+ [d] (a/b/d) [d] ./dir5/a/b/d
+ [d] (a/c) [c] ./dir5/a/c
+ EOF
+
+ test-tool dir-iterator --follow-symlinks ./dir5 >out &&
+ sort <out >actual-rec-symlinks-sorted-output &&
+
+ test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output

Matheus Tavares

May 2, 2019, 10:49:08 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Junio C Hamano, Michael Haggerty
Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is odd behaviour.
This was probably unintentional: the intention was likely to skip '.'
and '..' only, but the check ended up skipping all directories
starting with '.'. Besides being more natural, the new behaviour is
more permissive to the user.
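
The difference between the two checks can be seen side by side in this
minimal sketch. Both function names are hypothetical;
sketch_is_dot_or_dotdot() is only a stand-in written for illustration and
is assumed to behave like the is_dot_or_dotdot() helper used in the patch.

```c
#include <assert.h>
#include <string.h>

/* The old check: true for every name that begins with '.'. */
int skipped_by_old_check(const char *name)
{
	assert(name);
	return name[0] == '.';
}

/* Stand-in for the new check: true only for "." and "..". */
int sketch_is_dot_or_dotdot(const char *name)
{
	assert(name);
	return !strcmp(name, ".") || !strcmp(name, "..");
}
```

A directory named ".hidden" is skipped by the old check but not by the new
one, while "." and ".." are skipped by both.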

Also adjust tests to reflect this behaviour change.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index d1aba3b13f..f117a6b206 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
continue;
}
if (S_ISDIR(buf.st_mode)) {
- if (de->d_name[0] != '.')
+ if (!is_dot_or_dotdot(de->d_name))
copy_or_link_directory(src, dest,
src_repo, src_baselen);
continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 0800c3853f..c3998f2f9e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
done &&
find S-* -name "*some*" | sort >actual &&
cat >expected <<-EOF &&

Matheus Tavares

May 2, 2019, 10:49:11 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Junio C Hamano, Michael Haggerty
Extract the directory creation code from copy_or_link_directory into its
own function, named mkdir_if_missing. This change will help to remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.
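
For readers who want to try the same logic outside of git, here is a
sketch of a variant that returns an error code instead of calling
die_errno(). The name sketch_mkdir_if_missing() and the choice of ENOTDIR
for the "exists but is not a directory" case are assumptions made for this
example, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return 0 if pathname is a directory after the call (newly created or
 * already present). Return -1 otherwise, with errno set.
 */
int sketch_mkdir_if_missing(const char *pathname, mode_t mode)
{
	struct stat st;

	assert(pathname);
	if (!mkdir(pathname, mode))
		return 0;		/* created it */
	if (errno != EEXIST)
		return -1;		/* real mkdir() failure */
	if (stat(pathname, &st))
		return -1;		/* exists, but cannot be examined */
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;	/* exists, but is not a directory */
		return -1;
	}
	return 0;
}
```

Trying mkdir() first and inspecting the path only on EEXIST avoids a
separate existence check before the creation attempt.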

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f117a6b206..1ee6d6050e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,21 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
fclose(in);
}

+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+ struct stat st;
+
+ if (!mkdir(pathname, mode))
+ return;
+
+ if (errno != EEXIST)
+ die_errno(_("failed to create directory '%s'"), pathname);
+ else if (stat(pathname, &st))
+ die_errno(_("failed to stat '%s'"), pathname);
+ else if (!S_ISDIR(st.st_mode))
+ die(_("%s exists and is not a directory"), pathname);
+}
+
static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
const char *src_repo, int src_baselen)
{
@@ -404,14 +419,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (!dir)
die_errno(_("failed to open '%s'"), src->buf);

- if (mkdir(dest->buf, 0777)) {
- if (errno != EEXIST)
- die_errno(_("failed to create directory '%s'"), dest->buf);
- else if (stat(dest->buf, &buf))
- die_errno(_("failed to stat '%s'"), dest->buf);
- else if (!S_ISDIR(buf.st_mode))
- die(_("%s exists and is not a directory"), dest->buf);
- }
+ mkdir_if_missing(dest->buf, 0777);

strbuf_addch(src, '/');
src_len = src->len;
--
2.20.1

Matheus Tavares

May 2, 2019, 10:49:13 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Michael Haggerty, Junio C Hamano
Replace the use of the opendir/readdir/closedir API to traverse
directories recursively, in the copy_or_link_directory function, with
the dir-iterator API. This simplifies the code and avoids recursive
calls to copy_or_link_directory.

This change also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
though the .git/objects copy didn't fully succeed. Also, with the
dir-iterator API, recursive symlinks will be detected and skipped. This
is another behavior improvement, since the current version would
continue to copy the same content over and over until stat() returned an
ELOOP error.
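
The recursive symlink detection mentioned above boils down to comparing
the inode of a candidate directory against the inodes of the directories
already on the traversal stack. A minimal sketch of that comparison, with
a hypothetical function name and a plain array standing in for the
iterator's level stack:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Return 1 if candidate matches the inode of any directory already on
 * the traversal stack (ancestors[0..nr-1]), meaning that descending
 * into it would revisit an ancestor. Return 0 otherwise.
 */
int sketch_is_ancestor_inode(const ino_t *ancestors, size_t nr,
			     ino_t candidate)
{
	size_t i;

	assert(ancestors || nr == 0);
	for (i = 0; i < nr; i++)
		if (ancestors[i] == candidate)
			return 1;
	return 0;
}
```

Note that inode numbers are only unique within one file system, so a
comparison on the inode alone, as sketched here, could report a false match
when the traversal crosses a mount point.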

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 47 +++++++++++++++++++++++++----------------------
1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 1ee6d6050e..f99acd878f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
#include "transport.h"
#include "strbuf.h"
#include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
#include "sigchain.h"
#include "branch.h"
#include "remote.h"
@@ -408,42 +410,39 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
}

static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
- const char *src_repo, int src_baselen)
+ const char *src_repo)
{
- struct dirent *de;
- struct stat buf;
int src_len, dest_len;
- DIR *dir;
-
- dir = opendir(src->buf);
- if (!dir)
- die_errno(_("failed to open '%s'"), src->buf);
+ struct dir_iterator *iter;
+ int iter_status;
+ unsigned int flags;

mkdir_if_missing(dest->buf, 0777);

+ flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+ iter = dir_iterator_begin(src->buf, flags);
+
+ if (!iter)
+ die_errno(_("failed to start iterator over '%s'"), src->buf);
+
strbuf_addch(src, '/');
src_len = src->len;
strbuf_addch(dest, '/');
dest_len = dest->len;

- while ((de = readdir(dir)) != NULL) {
+ while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
strbuf_setlen(src, src_len);
- strbuf_addstr(src, de->d_name);
+ strbuf_addstr(src, iter->relative_path);
strbuf_setlen(dest, dest_len);
- strbuf_addstr(dest, de->d_name);
- if (stat(src->buf, &buf)) {
- warning (_("failed to stat %s\n"), src->buf);
- continue;
- }
- if (S_ISDIR(buf.st_mode)) {
- if (!is_dot_or_dotdot(de->d_name))
- copy_or_link_directory(src, dest,
- src_repo, src_baselen);
+ strbuf_addstr(dest, iter->relative_path);
+
+ if (S_ISDIR(iter->st.st_mode)) {
+ mkdir_if_missing(dest->buf, 0777);
continue;
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+ if (!strcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, dest, src_repo);
continue;
}
@@ -460,7 +459,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (copy_file_with_time(dest->buf, src->buf, 0666))
die_errno(_("failed to copy file to '%s'"), dest->buf);
}
- closedir(dir);
+
+ if (iter_status != ITER_DONE) {
+ strbuf_setlen(src, src_len);
+ die(_("failed to iterate over '%s'"), src->buf);
+ }
}

static void clone_local(const char *src_repo, const char *dest_repo)
@@ -478,7 +481,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)

Matheus Tavares

May 2, 2019, 10:49:16 AM5/2/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, kerne...@googlegroups.com, Junio C Hamano, Michael Haggerty
Replace the use of strcmp by fspathcmp at copy_or_link_directory, which
is more permissive/friendly to case-insensitive file systems.
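
To illustrate why this matters, here is a sketch of a path comparison
that ignores case only when asked to. The function sketch_pathcmp() and
its ignore_case parameter are hypothetical; the parameter stands in for
whatever setting tells the program that the file system is
case-insensitive.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/*
 * Compare two paths, ignoring case only when the caller says the file
 * system is case-insensitive. Return 0 when the paths are considered
 * equal.
 */
int sketch_pathcmp(const char *a, const char *b, int ignore_case)
{
	assert(a && b);
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

With ignore_case set, a path spelled "INFO/Alternates" still matches
"info/alternates", so the special handling of that file is not bypassed
on such file systems.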

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Suggested-by: Nguyễn Thái Ngọc Duy <pcl...@gmail.com>
---
builtin/clone.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f99acd878f..6e0f194c3b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -442,7 +442,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(iter->relative_path, "info/alternates")) {
+ if (!fspathcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, dest, src_repo);
continue;
}
--
2.20.1

Matheus Tavares

Jun 18, 2019, 7:28:24 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com
This patchset contains:
- tests for the dir-iterator API;
- dir-iterator refactoring to make its state machine simpler,
plus new features with tests;
- a replacement of explicit recursive dir iteration at
copy_or_link_directory for the dir-iterator API;
- some refactoring and behavior changes at local clone, mainly to
take care of symlinks and hidden files at .git/objects, together
with tests for these types of files.

Changes since v6:
- Rebased onto master;
- Added to dir-iterator documentation that ENOENT errors and hence broken
symlinks are both ignored.

With the changes brought by this patchset, dir_iterator_begin() may now
return NULL (setting errno) when it finds an error. Also, it's possible
to pass a pedantic flag to it so that dir_iterator_advance() returns
immediately on errors. But at refs/files-backend.c, the only user of
the API so far, the flag wasn't used and an empty iterator is
returned in case of errors at dir_iterator_begin(). These actions were
taken in order to keep the files-backend's behavior as close as
possible to the one it previously had. But since it already has guards
for possible errors at dir_iterator_advance(), I'm wondering whether I
should send a follow-up patch making it use the pedantic flag.

Also, should I perhaps call die_errno() on dir_iterator_begin() errors
at files-backend? I mean, we should continue returning an empty
iterator on ENOENT errors since ".git/logs", the dir it iterates over,
may not be present. But we could possibly abort on other errors, just
to be sure...
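
The policy being proposed (treat a missing directory as empty, abort on
anything else) can be sketched with opendir() standing in for
dir_iterator_begin(). The enum and function names are hypothetical, and
this only illustrates the errno classification, not the files-backend
code.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>

enum sketch_begin_result {
	SKETCH_BEGIN_OK,	/* directory opened; iterate normally */
	SKETCH_BEGIN_EMPTY,	/* directory absent; treat as no entries */
	SKETCH_BEGIN_FATAL	/* any other failure; caller should abort */
};

/* Classify the outcome of trying to start an iteration over path. */
enum sketch_begin_result sketch_classify_begin(const char *path)
{
	DIR *dir;

	assert(path);
	dir = opendir(path);
	if (dir) {
		closedir(dir);
		return SKETCH_BEGIN_OK;
	}
	return errno == ENOENT ? SKETCH_BEGIN_EMPTY : SKETCH_BEGIN_FATAL;
}
```

A caller would return an empty iterator for SKETCH_BEGIN_EMPTY and report
the error for SKETCH_BEGIN_FATAL, for example when the path exists but is
not a directory or cannot be read.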

Any comments on this possible follow-up patch will be highly appreciated.

v6: https://public-inbox.org/git/20190502144829.4394-...@usp.br/
travis build: https://travis-ci.org/matheustavares/git/builds/547451528

Daniel Ferreira (1):
dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
clone: better handle symlinked files at .git/objects/
dir-iterator: use warning_errno when possible
dir-iterator: refactor state machine model
dir-iterator: add flags parameter to dir_iterator_begin
clone: copy hidden paths at local clone
clone: extract function from copy_or_link_directory
clone: use dir-iterator to avoid explicit dir traversal
clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
clone: test for our behavior on odd objects/* content

Makefile | 1 +
builtin/clone.c | 75 +++++----
dir-iterator.c | 289 +++++++++++++++++++++--------------
dir-iterator.h | 60 ++++++--
refs/files-backend.c | 17 ++-
t/helper/test-dir-iterator.c | 58 +++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t0066-dir-iterator.sh | 163 ++++++++++++++++++++
t/t5604-clone-reference.sh | 133 ++++++++++++++++
10 files changed, 635 insertions(+), 163 deletions(-)
create mode 100644 t/helper/test-dir-iterator.c
create mode 100755 t/t0066-dir-iterator.sh

--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:28:46 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Alex Riesen, Junio C Hamano
From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625...@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..207650cb95 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+ git init T &&
+ (
+ cd T &&
+ git config gc.auto 0 &&
+ test_commit A &&
+ git gc &&
+ test_commit B &&
+
+ cd .git/objects &&
+ mv pack packs &&
+ ln -s packs pack &&
+ find ?? -type d >loose-dirs &&
+ last_loose=$(tail -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+ mv $last_loose a-loose-dir &&
+ ln -s a-loose-dir $last_loose &&
+ find . -type f | sort >../../../T.objects-files.raw &&
+ echo unknown_content> unknown_file
+ ) &&
+ git -C T fsck &&
+ git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ for option in --local --dissociate --no-hardlinks
+ do
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ done &&
+
+ echo ./info/alternates >expected-files &&
+ test_cmp expected-files T--shared.objects-files.raw
+'
+
test_done
--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:28:56 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Junio C Hamano, Jeff King
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5b9ebe9947..4a0a2455a7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -445,7 +445,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (unlink(dest->buf) && errno != ENOENT)
die_errno(_("failed to unlink '%s'"), dest->buf);
if (!option_no_hardlinks) {
- if (!link(src->buf, dest->buf))
+ if (!link(real_path(src->buf), dest->buf))
continue;
if (option_local > 0)
die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
test_cmp expected actual
'

-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
git init T &&
(
cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
ln -s packs pack &&
find ?? -type d >loose-dirs &&
last_loose=$(tail -n 1 loose-dirs) &&
- rm -f loose-dirs &&
mv $last_loose a-loose-dir &&
ln -s a-loose-dir $last_loose &&
+ first_loose=$(head -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+
+ cd $first_loose &&
+ obj=$(ls *) &&
+ mv $obj ../an-object &&
+ ln -s ../an-object $obj &&
+
+ cd ../ &&
find . -type f | sort >../../../T.objects-files.raw &&
+ find . -type l | sort >../../../T.objects-symlinks.raw &&
echo unknown_content> unknown_file
) &&
git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
'


-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
for option in --local --no-hardlinks --shared --dissociate
do
git clone $option T T$option || return 1 &&
2.22.0

Matheus Tavares

Jun 18, 2019, 7:29:03 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Daniel Ferreira, Junio C Hamano
From: Daniel Ferreira <bnm...@gmail.com>

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir-iterator.

Create t/t0066-dir-iterator.sh, which tests that dir-iterator does
iterate through a whole directory tree as expected.

Signed-off-by: Daniel Ferreira <bnm...@gmail.com>
[matheus.bernardino: update to use test-tool and some minor aesthetics]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
Makefile | 1 +
t/helper/test-dir-iterator.c | 33 ++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t0066-dir-iterator.sh | 55 ++++++++++++++++++++++++++++++++++++
5 files changed, 91 insertions(+)
create mode 100644 t/helper/test-dir-iterator.c
create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index f58bf14c7b..7e2a44cccc 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,7 @@ TEST_BUILTINS_OBJS += test-config.o
TEST_BUILTINS_OBJS += test-ctype.o
TEST_BUILTINS_OBJS += test-date.o
TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-dir-iterator.o
TEST_BUILTINS_OBJS += test-drop-caches.o
TEST_BUILTINS_OBJS += test-dump-cache-tree.o
TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000000..84f50bed8c
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,33 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+/* Argument is a directory path to iterate over */
+int cmd__dir_iterator(int argc, const char **argv)
+{
+ struct strbuf path = STRBUF_INIT;
+ struct dir_iterator *diter;
+
+ if (argc < 2)
+ die("BUG: test-dir-iterator needs one argument");
+
+ strbuf_add(&path, argv[1], strlen(argv[1]));
+
+ diter = dir_iterator_begin(path.buf);
+
+ while (dir_iterator_advance(diter) == ITER_OK) {
+ if (S_ISDIR(diter->st.st_mode))
+ printf("[d] ");
+ else if (S_ISREG(diter->st.st_mode))
+ printf("[f] ");
+ else
+ printf("[?] ");
+
+ printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
+ diter->path.buf);
+ }
+
+ return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 087a8c0cc9..7bc9bb231e 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,6 +19,7 @@ static struct test_cmd cmds[] = {
{ "ctype", cmd__ctype },
{ "date", cmd__date },
{ "delta", cmd__delta },
+ { "dir-iterator", cmd__dir_iterator },
{ "drop-caches", cmd__drop_caches },
{ "dump-cache-tree", cmd__dump_cache_tree },
{ "dump-fsmonitor", cmd__dump_fsmonitor },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7e703f3038..ec0ffbd0cb 100644
+ >dir/d/e/d/a &&
+
+ mkdir -p dir2/a/b/c/ &&
+ >dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+ cat >expected-iteration-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir/a
+ [d] (a/b) [b] ./dir/a/b
+ [d] (a/b/c) [c] ./dir/a/b/c
+ [d] (d) [d] ./dir/d
+ [d] (d/e) [e] ./dir/d/e
+ [d] (d/e/d) [d] ./dir/d/e/d
+ [f] (a/b/c/d) [d] ./dir/a/b/c/d
+ [f] (a/e) [e] ./dir/a/e
+ [f] (b) [b] ./dir/b
+ [f] (c) [c] ./dir/c
+ [f] (d/e/d/a) [a] ./dir/d/e/d/a
+ EOF
+
+ test-tool dir-iterator ./dir >out &&
+ sort <out >./actual-iteration-sorted-output &&
+
+ test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+ cat >expected-pre-order-output <<-EOF &&
+ [d] (a) [a] ./dir2/a
+ [d] (a/b) [b] ./dir2/a/b
+ [d] (a/b/c) [c] ./dir2/a/b/c
+ [f] (a/b/c/d) [d] ./dir2/a/b/c/d
+ EOF
+
+ test-tool dir-iterator ./dir2 >actual-pre-order-output &&
+
+ test_cmp expected-pre-order-output actual-pre-order-output
+'
+
+test_done
--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:29:08 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Junio C Hamano, Michael Haggerty
Replace warning(..., strerror(errno)) with warning_errno(...). This
unifies how warnings are displayed and simplifies the code a bit. Also,
improve the warning messages by surrounding paths with quotation marks
and using more meaningful statements.
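
For readers unfamiliar with the idiom, the following is a rough standalone sketch of what a warning_errno()-style helper does. It is not git's implementation; the function name and the choice to format into a caller-supplied buffer are assumptions made so the example is self-contained. The key point is that errno is captured on entry, before any library call can overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a warning_errno()-style helper (not git's
 * implementation): format the caller's message into buf, then append
 * ": " and the text for the errno value that was current on entry.
 * If the message does not fit in buf, the errno text is omitted.
 * Returns 0 on success and -1 if formatting failed.
 */
static int format_warning_errno(char *buf, size_t len, const char *fmt, ...)
{
	int saved_errno = errno;	/* capture before any libc call can clobber it */
	va_list ap;
	int n;

	assert(buf && len > 0 && fmt);

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);

	if (n >= 0 && (size_t)n < len)
		snprintf(buf + n, len - n, ": %s", strerror(saved_errno));

	errno = saved_errno;	/* leave errno as the caller saw it */
	return n < 0 ? -1 : 0;
}
```

The same save-and-restore step is why dir_iterator_abort() in the diff below stores errno before calling strbuf_setlen().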

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..0c8880868a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -71,8 +71,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)

level->dir = opendir(iter->base.path.buf);
if (!level->dir && errno != ENOENT) {
- warning("error opening directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error opening directory '%s'",
+ iter->base.path.buf);
/* Popping the level is handled below */
}

@@ -122,11 +122,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (!de) {
/* This level is exhausted; pop up a level. */
if (errno) {
- warning("error reading directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error reading directory '%s'",
+ iter->base.path.buf);
} else if (closedir(level->dir))
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);

level->dir = NULL;
if (--iter->levels_nr == 0)
@@ -140,9 +140,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
strbuf_addstr(&iter->base.path, de->d_name);
if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
if (errno != ENOENT)
- warning("error reading path '%s': %s",
- iter->base.path.buf,
- strerror(errno));
+ warning_errno("failed to stat '%s'",
+ iter->base.path.buf);
continue;
}

@@ -170,9 +169,11 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
&iter->levels[iter->levels_nr - 1];

if (level->dir && closedir(level->dir)) {
+ int saved_errno = errno;
strbuf_setlen(&iter->base.path, level->prefix_len);
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ errno = saved_errno;
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);
}
}

--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:29:18 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Daniel Ferreira, Jeff King, Johannes Schindelin, Michael Haggerty, Junio C Hamano, Ramsay Jones
dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability by factoring out three functions and simplifying
its mechanics. The refactored model will no longer depend on
level.initialized and level.dir_state to keep track of the iteration
state and will run in a single loop.

Also, dir_iterator_begin() currently does not check whether the given
string represents a valid directory path. Since the refactored model
will have to stat() the given path at initialization, let's also check
for this kind of error and make dir_iterator_begin() return NULL on
failure, with errno set appropriately. Also add tests for this new
behavior.
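
The check described above can be sketched in isolation as follows. This is a standalone illustration built on stat() only; the function name is hypothetical and it is not the code from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical check mirroring the behavior described above: return 0
 * if `path` names a directory. Otherwise return -1 with errno set,
 * either by stat() itself (e.g. ENOENT) or to ENOTDIR when the path
 * exists but is not a directory.
 */
static int check_is_directory(const char *path)
{
	struct stat st;

	assert(path);

	if (stat(path, &st) < 0)
		return -1;	/* errno already set by stat() */

	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;
		return -1;
	}
	return 0;
}
```

Because stat() follows symlinks, a symlink that points to a directory passes this check, which is consistent with the header's note that symlinks are not followed "except for the original path".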

Improve documentation at dir-iterator.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.

Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.

Original-patch-by: Daniel Ferreira <bnm...@gmail.com>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 234 ++++++++++++++++++-----------------
dir-iterator.h | 15 ++-
refs/files-backend.c | 17 ++-
t/helper/test-dir-iterator.c | 5 +
t/t0066-dir-iterator.sh | 13 ++
5 files changed, 163 insertions(+), 121 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 0c8880868a..594fe4d67b 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
#include "dir-iterator.h"

struct dir_iterator_level {
- int initialized;
-
DIR *dir;

/*
@@ -13,16 +11,6 @@ struct dir_iterator_level {
* (including a trailing '/'):
+ level->prefix_len = iter->base.path.len;
+
+ level->dir = opendir(iter->base.path.buf);
+ if (!level->dir) {
+ if (errno != ENOENT) {
+ warning_errno("error opening directory '%s'",
+ iter->base.path.buf);
+ }
+ iter->levels_nr--;
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Pop the top level on the iter stack, releasing any resources associated
+ * with it. Return the new value of iter->levels_nr.
+ */
+static int pop_level(struct dir_iterator_int *iter)
+{
+ struct dir_iterator_level *level =
+ &iter->levels[iter->levels_nr - 1];
+
+ if (level->dir && closedir(level->dir))
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);
+ level->dir = NULL;
+
+ return --iter->levels_nr;
+}
+
+/*
+ * Populate iter->base with the necessary information on the next iteration
+ * entry, represented by the given dirent de. Return 0 on success and -1
+ * otherwise.
+ */
+static int prepare_next_entry_data(struct dir_iterator_int *iter,
+ struct dirent *de)
+{
+ strbuf_addstr(&iter->base.path, de->d_name);
+ /*
+ * We have to reset these because the path strbuf might have
+ * been realloc()ed at the previous strbuf_addstr().
+ */
+ iter->base.relative_path = iter->base.path.buf +
+ iter->levels[0].prefix_len;
+ iter->base.basename = iter->base.path.buf +
+ iter->levels[iter->levels_nr - 1].prefix_len;
+
+ if (lstat(iter->base.path.buf, &iter->base.st)) {
+ if (errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+ return -1;
+ }
+
+ return 0;
+}
+
int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;

+ if (S_ISDIR(iter->base.st.st_mode)) {
+ if (push_level(iter) && iter->levels_nr == 0) {
+ /* Pushing the first level failed */
+ return dir_iterator_abort(dir_iterator);
+ }
+ }
+
+ /* Loop until we find an entry that we can give back to the caller. */
while (1) {
+ struct dirent *de;
struct dir_iterator_level *level =
&iter->levels[iter->levels_nr - 1];
- struct dirent *de;

- if (!level->initialized) {
- /*
- * Note: dir_iterator_begin() ensures that
- * path is not the empty string.
- */
- if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
- strbuf_addch(&iter->base.path, '/');
- level->prefix_len = iter->base.path.len;
-
- level->dir = opendir(iter->base.path.buf);
- if (!level->dir && errno != ENOENT) {
- warning_errno("error opening directory '%s'",
+ strbuf_setlen(&iter->base.path, level->prefix_len);
+ errno = 0;
+ de = readdir(level->dir);
+
+ if (!de) {
+ if (errno)
+ warning_errno("error reading directory '%s'",
+ else if (pop_level(iter) == 0)
@@ -187,17 +182,32 @@ struct dir_iterator *dir_iterator_begin(const char *path)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
-
- if (!path || !*path)
- BUG("empty path passed to dir_iterator_begin()");
+ int saved_errno;

strbuf_init(&iter->base.path, PATH_MAX);
strbuf_addstr(&iter->base.path, path);

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+ iter->levels_nr = 0;

- iter->levels_nr = 1;
- iter->levels[0].initialized = 0;
+ /*
+ * Note: stat already checks for NULL or empty strings and
+ * inexistent paths.
+ */
+ if (stat(iter->base.path.buf, &iter->base.st) < 0) {
+ saved_errno = errno;
+ goto error_out;
+ }
+
+ if (!S_ISDIR(iter->base.st.st_mode)) {
+ saved_errno = ENOTDIR;
+ goto error_out;
+ }

return dir_iterator;
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ errno = saved_errno;
+ return NULL;
}
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..0822821e56 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -8,19 +8,23 @@
*
* Iterate over a directory tree, recursively, including paths of all
* types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * symlinks except for the original path. Note that the original path
+ * is not included in the iteration.
*
* Every time dir_iterator_advance() is called, update the members of
* the dir_iterator structure to reflect the next path in the
* iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined, directory paths are always given before
+ * their contents.
*
* A typical iteration looks like this:
*
* int ok;
* struct iterator *iter = dir_iterator_begin(path);
*
+ * if (!iter)
+ * goto error_handler;
+ *
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
* ok = dir_iterator_abort(iter);
@@ -59,8 +63,9 @@ struct dir_iterator {
};

/*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path. On success, return a
+ * dir_iterator that holds the internal state of the iteration.
+ * In case of failure, return NULL and set errno accordingly.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..7ed81046d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,13 +2143,22 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
const char *gitdir)
{
- struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
- struct ref_iterator *ref_iterator = &iter->base;
+ struct dir_iterator *diter;
+ struct files_reflog_iterator *iter;
+ struct ref_iterator *ref_iterator;
struct strbuf sb = STRBUF_INIT;

- base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
strbuf_addf(&sb, "%s/logs", gitdir);
- iter->dir_iterator = dir_iterator_begin(sb.buf);
+
+ diter = dir_iterator_begin(sb.buf);
+ if(!diter)
+ return empty_ref_iterator_begin();
+
+ iter = xcalloc(1, sizeof(*iter));
+ ref_iterator = &iter->base;
+
+ base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+ iter->dir_iterator = diter;
iter->ref_store = ref_store;
strbuf_release(&sb);

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 84f50bed8c..fab1ff6237 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -17,6 +17,11 @@ int cmd__dir_iterator(int argc, const char **argv)

diter = dir_iterator_begin(path.buf);

+ if (!diter) {
+ printf("dir_iterator_begin failure: %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
while (dir_iterator_advance(diter) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
printf("[d] ");
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 6e06dc038d..c739ed7911 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -52,4 +52,17 @@ test_expect_success 'dir-iterator should list files in the correct order' '
test_cmp expected-pre-order-output actual-pre-order-output
'

+test_expect_success 'begin should fail upon inexistent paths' '
+ test_must_fail test-tool dir-iterator ./inexistent-path \
+ >actual-inexistent-path-output &&
+ echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+ test_cmp expected-inexistent-path-output actual-inexistent-path-output
+'
+
+test_expect_success 'begin should fail upon non directory paths' '
+ test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
+ echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+ test_cmp expected-non-dir-output actual-non-dir-output

Matheus Tavares

Jun 18, 2019, 7:29:26 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones, Junio C Hamano
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of continuing to look for
the next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.

These new flags will be used in a subsequent patch.

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.
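
As a rough illustration of how two such flags can steer the iteration, here is a standalone sketch. The SKETCH_* names and both helper functions are hypothetical and local to this example; only the general logic (stat() versus lstat(), and ENOENT always being tolerated) follows the description in this patch.

```c
#define _POSIX_C_SOURCE 200809L	/* for lstat() under strict ISO C modes */

#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Illustrative flag values; the names are local to this sketch. */
#define SKETCH_PEDANTIC        (1u << 0)
#define SKETCH_FOLLOW_SYMLINKS (1u << 1)

/*
 * Fill `st` for `path`. With SKETCH_FOLLOW_SYMLINKS the link target is
 * examined (stat); otherwise the link itself is examined (lstat).
 * Returns 0 on success and -1 on failure, with errno set by the call.
 */
static int stat_with_flags(const char *path, unsigned int flags,
			   struct stat *st)
{
	assert(path && st);

	if (flags & SKETCH_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Decide whether an error seen while advancing should end the
 * iteration. ENOENT is always tolerated (an entry may vanish during
 * iteration); other errors stop the iteration only when
 * SKETCH_PEDANTIC is set.
 */
static int should_stop_on_error(unsigned int flags, int err)
{
	return err != ENOENT && (flags & SKETCH_PEDANTIC) != 0;
}
```

The two flags are independent bits, so a caller can combine them with a bitwise OR or pass 0 to keep the default behavior.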

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 82 +++++++++++++++++++++++++------
dir-iterator.h | 51 ++++++++++++++-----
refs/files-backend.c | 2 +-
t/helper/test-dir-iterator.c | 34 ++++++++++---
t/t0066-dir-iterator.sh | 95 ++++++++++++++++++++++++++++++++++++
5 files changed, 229 insertions(+), 35 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..52db87bdc9 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -6,6 +6,9 @@
struct dir_iterator_level {
DIR *dir;

+ /* The inode number of this level's directory. */
+ ino_t ino;
+
/*
* The length of the directory part of path at this level
* (including a trailing '/'):
@@ -38,13 +41,16 @@ struct dir_iterator_int {
* that will be included in this iteration.
*/
struct dir_iterator_level *levels;
+
+ /* Combination of flags for this dir-iterator */
+ unsigned int flags;
};

/*
* Push a level in the iter stack and initialize it with information from
* the directory pointed by iter->base->path. It is assumed that this
* strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
*/
static int push_level(struct dir_iterator_int *iter)
{
@@ -56,14 +62,17 @@ static int push_level(struct dir_iterator_int *iter)
if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
strbuf_addch(&iter->base.path, '/');
level->prefix_len = iter->base.path.len;
+ level->ino = iter->base.st.st_ino;

level->dir = opendir(iter->base.path.buf);
if (!level->dir) {
+ int saved_errno = errno;
if (errno != ENOENT) {
warning_errno("error opening directory '%s'",
iter->base.path.buf);
}
iter->levels_nr--;
+ errno = saved_errno;
return -1;
}

@@ -90,11 +99,13 @@ static int pop_level(struct dir_iterator_int *iter)
/*
* Populate iter->base with the necessary information on the next iteration
* entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
*/
static int prepare_next_entry_data(struct dir_iterator_int *iter,
struct dirent *de)
{
+ int err, saved_errno;
+
strbuf_addstr(&iter->base.path, de->d_name);
/*
* We have to reset these because the path strbuf might have
@@ -105,12 +116,34 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter,
iter->base.basename = iter->base.path.buf +
iter->levels[iter->levels_nr - 1].prefix_len;

- if (lstat(iter->base.path.buf, &iter->base.st)) {
- if (errno != ENOENT)
- warning_errno("failed to stat '%s'", iter->base.path.buf);
- return -1;
- }
+ if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+ err = stat(iter->base.path.buf, &iter->base.st);
+ else
+ err = lstat(iter->base.path.buf, &iter->base.st);
+
+ saved_errno = errno;
+ if (err && errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+ errno = saved_errno;
+ return err;
+}
+
+/*
+ * Look for a recursive symlink at iter->base.path pointing to any directory on
+ * the previous stack levels. If it is found, return 1. If not, return 0.
+ */
+static int find_recursive_symlinks(struct dir_iterator_int *iter)
+{
+ int i;
+
+ if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
+ !S_ISDIR(iter->base.st.st_mode))
+ return 0;

+ for (i = 0; i < iter->levels_nr; ++i)
+ if (iter->base.st.st_ino == iter->levels[i].ino)
+ return 1;
return 0;
}

@@ -119,11 +152,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;

- if (S_ISDIR(iter->base.st.st_mode)) {
- if (push_level(iter) && iter->levels_nr == 0) {
- /* Pushing the first level failed */
- return dir_iterator_abort(dir_iterator);
- }
+ if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
+ if (iter->levels_nr == 0)
+ goto error_out;
}

/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +170,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
de = readdir(level->dir);

if (!de) {
- if (errno)
+ if (errno) {
warning_errno("error reading directory '%s'",
iter->base.path.buf);
- else if (pop_level(iter) == 0)
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
+ } else if (pop_level(iter) == 0) {
return dir_iterator_abort(dir_iterator);
+ }
continue;
}

if (is_dot_or_dotdot(de->d_name))
continue;

- if (prepare_next_entry_data(iter, de))
+ if (prepare_next_entry_data(iter, de)) {
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
continue;
+ }
+
+ if (find_recursive_symlinks(iter)) {
+ warning("ignoring recursive symlink at '%s'",
+ iter->base.path.buf);
+ continue;
+ }

return ITER_OK;
}
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ return ITER_ERROR;
}

int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +227,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
}

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +238,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
iter->levels_nr = 0;
+ iter->flags = flags;

/*
* Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 0822821e56..42751091a5 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
+ * unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ * struct dir_iterator *iter = dir_iterator_begin(path, flags);
*
* if (!iter)
* goto error_handler;
@@ -44,6 +45,25 @@
* dir_iterator_advance() again.
*/

+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ * in case of an error at dir_iterator_advance(), which is to keep
+ * looking for a next valid entry. With this flag, resources are freed
+ * and ITER_ERROR is returned immediately. In both cases, a meaningful
+ * warning is emitted. Note: ENOENT errors are always ignored so that
+ * the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ * i.e., linked directories' contents will be iterated over and
+ * iter->base.st will contain information on the referred files,
+ * not the symlinks themselves, which is the default behavior.
+ * Recursive symlinks are skipped with a warning and broken symlinks
+ * are ignored.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
struct dir_iterator {
/* The current path: */
struct strbuf path;
@@ -58,29 +78,38 @@ struct dir_iterator {
/* The current basename: */
const char *basename;

- /* The result of calling lstat() on path: */
+ /*
+ * The result of calling lstat() on path; or stat(), if the
+ * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+ * dir_iterator's initialization.
+ */
struct stat st;
};

/*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
*
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ * - path is the starting directory. An internal copy will be made.
+ * - flags is a combination of the possible flags to initialize a
+ * dir-iterator or 0 for default behavior.
*/
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);

/*
* Advance the iterator to the first or next item and return ITER_OK.
* If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
*/
int dir_iterator_advance(struct dir_iterator *iterator);

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7ed81046d4..b1f8f53a09 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,

strbuf_addf(&sb, "%s/logs", gitdir);

- diter = dir_iterator_begin(sb.buf);
+ diter = dir_iterator_begin(sb.buf, 0);
if(!diter)
return empty_ref_iterator_begin();

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@
#include "iterator.h"
#include "dir-iterator.h"

-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
int cmd__dir_iterator(int argc, const char **argv)
{
struct strbuf path = STRBUF_INIT;
struct dir_iterator *diter;
+ unsigned int flags = 0;
+ int iter_status;
+
+ for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+ if (strcmp(*argv, "--follow-symlinks") == 0)
+ flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+ else if (strcmp(*argv, "--pedantic") == 0)
+ flags |= DIR_ITERATOR_PEDANTIC;
+ else
+ die("invalid option '%s'", *argv);
+ }

- if (argc < 2)
- die("BUG: test-dir-iterator needs one argument");
-
- strbuf_add(&path, argv[1], strlen(argv[1]));
+ if (!*argv || argc != 1)
+ die("dir-iterator needs exactly one non-option argument");

- diter = dir_iterator_begin(path.buf);
+ strbuf_add(&path, *argv, strlen(*argv));
+ diter = dir_iterator_begin(path.buf, flags);

if (!diter) {
printf("dir_iterator_begin failure: %d\n", errno);
exit(EXIT_FAILURE);
}

- while (dir_iterator_advance(diter) == ITER_OK) {
+ while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
printf("[d] ");
else if (S_ISREG(diter->st.st_mode))
printf("[f] ");
+ else if (S_ISLNK(diter->st.st_mode))
+ printf("[s] ");
else
printf("[?] ");

@@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv)
diter->path.buf);
}

+ if (iter_status != ITER_DONE) {
+ printf("dir_iterator_advance failure\n");
+ return 1;
+ }
+
return 0;
}
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index c739ed7911..8f996a31fa 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+ mkdir -p dir4/a &&
+ mkdir -p dir4/b/c &&
+ >dir4/a/d &&
+ ln -s d dir4/a/e &&
+ ln -s ../b dir4/a/f &&
+
+ mkdir -p dir5/a/b &&
+ mkdir -p dir5/a/c &&
+ ln -s ../c dir5/a/b/d &&
+ ln -s ../ dir5/a/b/e &&
+ ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+ cat >expected-no-follow-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir4/a
+ [d] (b) [b] ./dir4/b
+ [d] (b/c) [c] ./dir4/b/c
+ [f] (a/d) [d] ./dir4/a/d
+ [s] (a/e) [e] ./dir4/a/e
+ [s] (a/f) [f] ./dir4/a/f
+ EOF
+
+ test-tool dir-iterator ./dir4 >out &&
+ sort <out >actual-no-follow-sorted-output &&
+
+ test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+ cat >expected-follow-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir4/a
+ [d] (a/f) [f] ./dir4/a/f
+ [d] (a/f/c) [c] ./dir4/a/f/c
+ [d] (b) [b] ./dir4/b
+ [d] (b/c) [c] ./dir4/b/c
+ [f] (a/d) [d] ./dir4/a/d
+ [f] (a/e) [e] ./dir4/a/e
+ EOF
+
+ test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+ sort <out >actual-follow-sorted-output &&
+
+ test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
+
+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
+ cat >expected-rec-symlinks-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir5/a
+ [d] (a/b) [b] ./dir5/a/b
+ [d] (a/b/d) [d] ./dir5/a/b/d
+ [d] (a/c) [c] ./dir5/a/c
+ EOF
+
+ test-tool dir-iterator --follow-symlinks ./dir5 >out &&
+ sort <out >actual-rec-symlinks-sorted-output &&
+
+ test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output

Matheus Tavares

Jun 18, 2019, 7:29:31 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Jeff King, Junio C Hamano
Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is odd behaviour.
This was probably unintentional: the intention was most likely to skip
only '.' and '..', but the check ended up skipping every directory
whose name starts with '.'. Besides being more natural, the new
behaviour is more permissive to the user.

Also adjust tests to reflect this behaviour change.
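
The difference between the old check and the new one can be illustrated with a small standalone sketch. Both helper names below are hypothetical stand-ins written for this example, not Git's own functions; only the behaviour of the two predicates is the point:

```c
#include <assert.h>
#include <stddef.h>

/* Old condition (negated): true for ANY name that starts with '.'. */
static int starts_with_dot(const char *name)
{
	assert(name != NULL);
	return name[0] == '.';
}

/*
 * Hypothetical stand-in for is_dot_or_dotdot(): true only for the two
 * special directory entries "." and "..".
 */
static int is_dot_or_dotdot_sketch(const char *name)
{
	assert(name != NULL);
	return name[0] == '.' &&
	       (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}
```

With the old predicate a directory such as ".some-hidden-dir" is skipped; with the new one only "." and ".." are.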

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4a0a2455a7..9dd083e34d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -430,7 +430,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
continue;
}
if (S_ISDIR(buf.st_mode)) {
- if (de->d_name[0] != '.')
+ if (!is_dot_or_dotdot(de->d_name))
copy_or_link_directory(src, dest,
src_repo, src_baselen);
continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 0800c3853f..c3998f2f9e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
done &&
find S-* -name "*some*" | sort >actual &&
cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-dir
+ S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+ S--dissociate/.git/objects/.some-hidden-dir/some-file
S--dissociate/.git/objects/.some-hidden-file
S--dissociate/.git/objects/some-dir
S--dissociate/.git/objects/some-dir/.some-dot-file
S--dissociate/.git/objects/some-dir/some-file
S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-dir
+ S--local/.git/objects/.some-hidden-dir/.some-dot-file
+ S--local/.git/objects/.some-hidden-dir/some-file
S--local/.git/objects/.some-hidden-file
S--local/.git/objects/some-dir
S--local/.git/objects/some-dir/.some-dot-file
S--local/.git/objects/some-dir/some-file
S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir
+ S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
S--no-hardlinks/.git/objects/.some-hidden-file
S--no-hardlinks/.git/objects/some-dir
S--no-hardlinks/.git/objects/some-dir/.some-dot-file
--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:29:37 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Jeff King, Junio C Hamano
Extract dir creation code snippet from copy_or_link_directory to its own
function named mkdir_if_missing. This change will help to remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9dd083e34d..96566c1bab 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -394,6 +394,21 @@ static void copy_alternates(struct strbuf *src, const char *src_repo)
fclose(in);
}

+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+ struct stat st;
+
+ if (!mkdir(pathname, mode))
+ return;
+
+ if (errno != EEXIST)
+ die_errno(_("failed to create directory '%s'"), pathname);
+ else if (stat(pathname, &st))
+ die_errno(_("failed to stat '%s'"), pathname);
+ else if (!S_ISDIR(st.st_mode))
+ die(_("%s exists and is not a directory"), pathname);
+}
+
static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
const char *src_repo, int src_baselen)
{
@@ -406,14 +421,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (!dir)
die_errno(_("failed to open '%s'"), src->buf);

- if (mkdir(dest->buf, 0777)) {
- if (errno != EEXIST)
- die_errno(_("failed to create directory '%s'"), dest->buf);
- else if (stat(dest->buf, &buf))
- die_errno(_("failed to stat '%s'"), dest->buf);
- else if (!S_ISDIR(buf.st_mode))
- die(_("%s exists and is not a directory"), dest->buf);
- }
+ mkdir_if_missing(dest->buf, 0777);

strbuf_addch(src, '/');
src_len = src->len;
--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:29:42 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Junio C Hamano, Jeff King
Replace the opendir/readdir/closedir calls that copy_or_link_directory
uses to traverse directories recursively with the dir-iterator API.
This simplifies the code and avoids recursive calls to
copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
though the .git/objects copy didn't fully succeed. Also, with the
dir-iterator API, recursive symlinks will be detected and skipped. This
is another behavior improvement, since the current version would
continue to copy the same content over and over until stat() returned an
ELOOP error.
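
For readers unfamiliar with the idea, the general technique of replacing a recursive directory walk with an explicit stack can be sketched in plain POSIX C. This is not the dir-iterator implementation: count_entries() and MAX_DEPTH are names made up for this example, symlinks are not followed (lstat() is used), and any failure aborts the walk, loosely mirroring the "pedantic" behaviour described above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_DEPTH 64 /* arbitrary limit chosen for this sketch */

/*
 * Count every entry below `root` (skipping "." and "..") without
 * recursion: one open DIR handle is kept per directory level.
 * Returns the count, or -1 on the first opendir/readdir/lstat failure.
 */
static long count_entries(const char *root)
{
	DIR *stack[MAX_DEPTH];
	size_t dir_len[MAX_DEPTH]; /* length of each level's path */
	char path[PATH_MAX];
	int depth = 0;
	long count = 0;
	int n;

	assert(root != NULL);
	n = snprintf(path, sizeof(path), "%s", root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	stack[0] = opendir(path);
	if (!stack[0])
		return -1;
	dir_len[0] = (size_t)n;
	depth = 1;

	while (depth > 0) {
		size_t len = dir_len[depth - 1];
		struct dirent *de;
		struct stat st;

		errno = 0;
		de = readdir(stack[depth - 1]);
		if (!de) {
			int err = errno;

			closedir(stack[--depth]); /* level exhausted: pop */
			if (err)
				goto fail;
			continue;
		}
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;

		/* Overwrite whatever followed this level's directory path. */
		n = snprintf(path + len, sizeof(path) - len, "/%s", de->d_name);
		if (n < 0 || (size_t)n >= sizeof(path) - len)
			goto fail;
		if (lstat(path, &st))
			goto fail;
		count++;

		if (S_ISDIR(st.st_mode)) {
			if (depth == MAX_DEPTH)
				goto fail;
			stack[depth] = opendir(path);
			if (!stack[depth])
				goto fail;
			dir_len[depth] = len + (size_t)n; /* push */
			depth++;
		}
	}
	return count;

fail:
	while (depth > 0)
		closedir(stack[--depth]);
	return -1;
}
```

The caller sees a flat loop, which is what lets copy_or_link_directory drop its self-recursion.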

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 47 +++++++++++++++++++++++++----------------------
1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 96566c1bab..47cb4a2a8e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
#include "transport.h"
#include "strbuf.h"
#include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
#include "sigchain.h"
#include "branch.h"
#include "remote.h"
@@ -410,42 +412,39 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
}

static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
- const char *src_repo, int src_baselen)
+ const char *src_repo)
{
- struct dirent *de;
- struct stat buf;
int src_len, dest_len;
- DIR *dir;
-
- dir = opendir(src->buf);
- if (!dir)
- die_errno(_("failed to open '%s'"), src->buf);
+ struct dir_iterator *iter;
+ int iter_status;
+ unsigned int flags;

mkdir_if_missing(dest->buf, 0777);

+ flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+ iter = dir_iterator_begin(src->buf, flags);
+
+ if (!iter)
+ die_errno(_("failed to start iterator over '%s'"), src->buf);
+
strbuf_addch(src, '/');
src_len = src->len;
strbuf_addch(dest, '/');
dest_len = dest->len;

- while ((de = readdir(dir)) != NULL) {
+ while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
strbuf_setlen(src, src_len);
- strbuf_addstr(src, de->d_name);
+ strbuf_addstr(src, iter->relative_path);
strbuf_setlen(dest, dest_len);
- strbuf_addstr(dest, de->d_name);
- if (stat(src->buf, &buf)) {
- warning (_("failed to stat %s\n"), src->buf);
- continue;
- }
- if (S_ISDIR(buf.st_mode)) {
- if (!is_dot_or_dotdot(de->d_name))
- copy_or_link_directory(src, dest,
- src_repo, src_baselen);
+ strbuf_addstr(dest, iter->relative_path);
+
+ if (S_ISDIR(iter->st.st_mode)) {
+ mkdir_if_missing(dest->buf, 0777);
continue;
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+ if (!strcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, src_repo);
continue;
}
@@ -462,7 +461,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (copy_file_with_time(dest->buf, src->buf, 0666))
die_errno(_("failed to copy file to '%s'"), dest->buf);
}
- closedir(dir);
+
+ if (iter_status != ITER_DONE) {
+ strbuf_setlen(src, src_len);
+ die(_("failed to iterate over '%s'"), src->buf);
+ }
}

static void clone_local(const char *src_repo, const char *dest_repo)
@@ -480,7 +483,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
get_common_dir(&dest, dest_repo);
strbuf_addstr(&src, "/objects");
strbuf_addstr(&dest, "/objects");
- copy_or_link_directory(&src, &dest, src_repo, src.len);
+ copy_or_link_directory(&src, &dest, src_repo);
strbuf_release(&src);
strbuf_release(&dest);
}
--
2.22.0

Matheus Tavares

Jun 18, 2019, 7:29:47 PM6/18/19
to g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Jeff King, Junio C Hamano
Replace the use of strcmp by fspathcmp at copy_or_link_directory, which
is more permissive/friendly to case-insensitive file systems.
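
As a rough illustration of why this matters, a case-aware path comparison can be sketched as below. path_cmp_sketch() and its `ignore_case` parameter are assumptions made for this example; they stand in for fspathcmp() and for whatever the caller knows about the file system's case sensitivity:

```c
#include <assert.h>
#include <string.h>
#include <strings.h> /* strcasecmp() */

/*
 * Hypothetical stand-in for fspathcmp(): compare two paths, folding
 * case only when the file system is known to be case-insensitive.
 * Returns 0 when the paths are considered equal.
 */
static int path_cmp_sketch(const char *a, const char *b, int ignore_case)
{
	assert(a != NULL && b != NULL);
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

On a case-insensitive file system, "INFO/Alternates" would then still be recognised as the alternates file, which a plain strcmp would miss.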

Suggested-by: Nguyễn Thái Ngọc Duy <pcl...@gmail.com>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 47cb4a2a8e..8da696ef30 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -444,7 +444,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(iter->relative_path, "info/alternates")) {
+ if (!fspathcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, src_repo);
continue;
}
--
2.22.0

Matheus Tavares Bernardino

Jun 19, 2019, 12:36:52 AM6/19/19
to git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira
I got ahead of myself in this last paragraph. ".git/logs" is one of the dirs
that files-backend.c is used to iterate over, but it doesn't mean it's the only
one. This dir, in particular, is iterated when we run 'git rev-list --reflog',
for example. And upon ENOENTs, the iteration is expected to end successfully
but with no entries.

(also adding Michael and Daniel to CC, in case they have some input on
these ideas)

Junio C Hamano

Jun 20, 2019, 4:18:23 PM6/20/19
to Matheus Tavares, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com
A higher level question is what's the benefit of using dir-iterator
API in the first place. After subtracting 356 added lines to t/,
it still adds 279 lines while removing only 163 lines, so it is not
like "we have a perfect dir-iterator API that can be applied as-is
but an older code that predates dir-iterator API was still using an
old way, so let's make the latter use the former."


Matheus Tavares Bernardino

Jun 21, 2019, 9:41:44 AM6/21/19
to Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP
Yes, indeed the dir-iterator API didn't nicely fit in clone without
some tweaking. Yet I think most of those line additions were not only
to adjust the API, but also trying to improve both dir-iterator and
local clone (I should have maybe split those changes into other
patchsets, though). For example, these changes make local clone better
handle possible symlinks and hidden files at git dir. And the API
changes should make it easier to apply it as-is in other sections of
the codebase from now on.

As for the benefit of using the API here, I think it mainly resides in
the security it brings, avoiding recursive iteration (even though it
should be shallow in local clone) and more carefully handling
symlinks.

Junio C Hamano

Jun 25, 2019, 2:00:24 PM6/25/19
to Matheus Tavares, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Matheus Tavares <matheus.b...@usp.br> writes:

This hunk, which claims to have 25 lines in the postimage ...
... adds 20 lines, making the postimage 26 lines long.

Did you hand edit your patch? It is OK to do so, as long as you
know what you are doing ;-). Adjust the length of the postimage on
the @@ ... @@ line to make it consistent with the patch text, and
also make sure a tweak you do here won't make later patches not
apply.
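
The consistency rule for a hunk header can be checked mechanically. The sketch below (count_hunk_lines() and struct hunk_counts are names invented for this example) counts the lines of a hunk body; in a header of the form "@@ -a,b +c,d @@", `b` should equal the preimage count and `d` the postimage count:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hunk_counts {
	int pre;  /* lines in the preimage: context + removed */
	int post; /* lines in the postimage: context + added */
};

/*
 * Count the lines of a unified-diff hunk body, i.e. the text that
 * follows the "@@ ... @@" line. Lines are classified by first byte.
 */
static struct hunk_counts count_hunk_lines(const char *body)
{
	struct hunk_counts c = { 0, 0 };
	const char *line = body;

	assert(body != NULL);
	while (*line) {
		const char *eol = strchr(line, '\n');

		switch (*line) {
		case ' ': /* context line: present in both images */
			c.pre++;
			c.post++;
			break;
		case '-': /* removed line: preimage only */
			c.pre++;
			break;
		case '+': /* added line: postimage only */
			c.post++;
			break;
		default: /* e.g. "\ No newline at end of file" */
			break;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return c;
}
```

Adding one '+' line by hand, as happened here, raises the postimage count by one, so the second length in the header has to be bumped accordingly.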

Matheus Tavares Bernardino

Jun 25, 2019, 2:11:48 PM6/25/19
to Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Oh, I'm sorry for that, I'll be more careful with hand editing next
time. Thanks for the advice. I think for this time it won't affect the
later patches as it was a minor addition at one comment, but should I
perhaps re-send it?

Johannes Schindelin

Jun 26, 2019, 9:34:38 AM6/26/19
to Matheus Tavares, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones, Junio C Hamano
Hi Matheus,

On Tue, 18 Jun 2019, Matheus Tavares wrote:

>[...]
> +/*
> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> + * the previous stack levels. If it is found, return 1. If not, return 0.
> + */
> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> +{
> + int i;
> +
> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> + !S_ISDIR(iter->base.st.st_mode))
> + return 0;
>
> + for (i = 0; i < iter->levels_nr; ++i)
> + if (iter->base.st.st_ino == iter->levels[i].ino)

This does not work on Windows. Remember, Git relies on (too) many areas
where Linux is strong, and the `lstat()` call is one of them. Therefore,
Git overuses that call.

In the Git for Windows project, we struggled a bit to emulate it in the
best way.

It is pretty expensive, for example, to find out the number of hard
links, the device ID, an equivalent of the inode, etc. Many `lstat()`
calls are really only interested in the `mtime`, though, meaning that we
would waste a ton of time if we tried to be more faithful in our `lstat()`
emulation.

Therefore, we simply assign `0` as inode.

Sure, this violates the POSIX standard, but imagine this: the FAT
filesystem (which is still in use!) does not have _anything_ resembling
inodes.

I fear, therefore, that we will require at least a workaround for the
situation where `st_ino` is always zero.

Ciao,
Johannes

Junio C Hamano

Jun 26, 2019, 2:04:58 PM6/26/19
to Johannes Schindelin, Matheus Tavares, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Johannes Schindelin <Johannes....@gmx.de> writes:

> Hi Matheus,
>
> On Tue, 18 Jun 2019, Matheus Tavares wrote:
>
>>[...]
>> +/*
>> + * Look for a recursive symlink at iter->base.path pointing to any directory on
>> + * the previous stack levels. If it is found, return 1. If not, return 0.
>> + */
>> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
>> +{
>> + int i;
>> +
>> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
>> + !S_ISDIR(iter->base.st.st_mode))
>> + return 0;
>>
>> + for (i = 0; i < iter->levels_nr; ++i)
>> + if (iter->base.st.st_ino == iter->levels[i].ino)
>
> This does not work on Windows. [[ Windows port does not have
> usable st_ino field ]]]

And if you cross mountpoint, st_ino alone does not guarantee
uniqueness; you'd need to combine it with st_dev, I would think,
even on POSIX systems.

Duy Nguyen

Jun 27, 2019, 5:21:18 AM6/27/19
to Junio C Hamano, Johannes Schindelin, Matheus Tavares, Git Mailing List, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones
which should be protected by USE_STDEV. There's other code that
ignores st_ino on Windows in entry.c. Maybe it's time to define
USE_STINO instead of spreading "#if GIT_WINDOWS_NATIVE" more.
--
Duy

Matheus Tavares Bernardino

Jun 27, 2019, 1:23:32 PM6/27/19
to Junio C Hamano, Johannes Schindelin, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Ok, thanks for letting me know. I'm trying to think of another
approach to test for recursive symlinks that does not rely on inodes:
Given any symlink, we could get its real_path() and compare it with
the path of the directory currently being iterated. If the first is a
prefix of the second, then we mark it as a recursive symlink.

What do you think of this idea?
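
The prefix test in this proposal could look roughly like the sketch below. It assumes both paths have already been resolved to absolute, symlink-free form (for instance with real_path() or POSIX realpath(3)); is_dir_prefix() is a hypothetical helper named for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if directory `dir` is `path` itself or one of its
 * ancestors, comparing whole path components so that "/repo/a" is
 * not treated as a prefix of "/repo/ab". Both arguments are assumed
 * to be resolved absolute paths.
 */
static int is_dir_prefix(const char *dir, const char *path)
{
	size_t n;

	assert(dir != NULL && path != NULL);
	n = strlen(dir);

	/* Ignore trailing slashes, but keep a lone "/" intact. */
	while (n > 1 && dir[n - 1] == '/')
		n--;
	if (strncmp(dir, path, n))
		return 0;
	if (n == 1 && dir[0] == '/')
		return 1; /* the root is an ancestor of every absolute path */
	return path[n] == '\0' || path[n] == '/';
}
```

For the dir5 test layout, a link resolving to ".../dir5/a" found while iterating ".../dir5/a/b" would be flagged, whereas one resolving to ".../dir5/a/c" would not.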

Johannes Schindelin

Jun 27, 2019, 2:47:52 PM6/27/19
to Matheus Tavares Bernardino, Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Hi Matheus,
I think this would be pretty expensive. Too expensive.

A better method might be to rely on st_ino/st_dev when we can, and just
not bother looking for recursive symlinks when we cannot, like I did in
https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a

Ciao,
Johannes

Matheus Tavares Bernardino

Jun 27, 2019, 3:34:10 PM6/27/19
to Johannes Schindelin, Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Hmm, yes unfortunately :(

> A better method might be to rely on st_ino/st_dev when we can, and just
> not bother looking for recursive symlinks when we cannot,

What if we fall back on the path prefix strategy when st_ino is not
available? I mean, if we don't look for recursive symlinks, they would
be iterated over and over until we get an ELOOP error. So I think
using real_path() should be less expensive in this case. (But just as
a fallback to st_ino, of course)
Nice! At dir-iterator.h the documentation says that recursive symlinks
will be ignored. If we don't implement any fallback, should we add
that this is not available on Windows, perhaps?

> Ciao,
> Johannes

Johannes Schindelin

Jun 28, 2019, 8:50:51 AM6/28/19
to Matheus Tavares Bernardino, Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
I do not really care, unless it breaks things on Windows that were not
broken before.

You might also want to guard this behind `USE_STDEV` as Duy suggested (and
maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
looked for it and did not find it because of that naming mistake).

Ciao,
Dscho

Matheus Tavares Bernardino

Jun 28, 2019, 10:16:49 AM6/28/19
to Johannes Schindelin, Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Ok, just to confirm, what I should do is send your fixup patch with
the USE_STDEV guard addition, right? Also, the USE_STDEV docs say it is
used "from the update-index perspective", should I make it more
generic as we're using it for other purposes or is it OK like this?

Thanks,
Matheus

Johannes Schindelin

Jul 1, 2019, 8:15:16 AM7/1/19
to Matheus Tavares Bernardino, Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones
Hi Matheus,
I thought Duy had verified that `USE_STDEV` would make sense in this
instance, but I agree with you that the idea of that compile time flag was
not to guard against a missing `st_dev` field, but about trusting it in
the presence of network filesystems.

So no, I revert my vote for using `USE_STDEV`.

Thanks for the sanity check.

Ciao,
Dscho

SZEDER Gábor

Jul 3, 2019, 4:57:30 AM7/3/19
to Matheus Tavares, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Olga Telezhnaya, kerne...@googlegroups.com, Michael Haggerty, Daniel Ferreira, Ramsay Jones, Junio C Hamano
> diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
> index c739ed7911..8f996a31fa 100755
> --- a/t/t0066-dir-iterator.sh
> +++ b/t/t0066-dir-iterator.sh
> @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
> test_cmp expected-non-dir-output actual-non-dir-output
> '
>
> +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
> + cat >expected-no-permissions-output <<-EOF &&
> + [d] (a) [a] ./dir3/a
> + EOF
> +
> + mkdir -p dir3/a &&
> + > dir3/a/b &&

Style nit: space between redirection op and pathname.

> + chmod 0 dir3/a &&
> +
> + test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
> + test_cmp expected-no-permissions-output actual-no-permissions-output &&
> + chmod 755 dir3/a &&
> + rm -rf dir3
> +'
> +
> +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
> + cat >expected-no-permissions-pedantic-output <<-EOF &&
> + [d] (a) [a] ./dir3/a
> + dir_iterator_advance failure
> + EOF
> +
> + mkdir -p dir3/a &&
> + > dir3/a/b &&

Likewise.
Unnecessary redirection; 'sort' is capable of opening the file on its
own.

> +
> + test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
> +'
> +
> +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
> + cat >expected-follow-sorted-output <<-EOF &&
> + [d] (a) [a] ./dir4/a
> + [d] (a/f) [f] ./dir4/a/f
> + [d] (a/f/c) [c] ./dir4/a/f/c
> + [d] (b) [b] ./dir4/b
> + [d] (b/c) [c] ./dir4/b/c
> + [f] (a/d) [d] ./dir4/a/d
> + [f] (a/e) [e] ./dir4/a/e
> + EOF
> +
> + test-tool dir-iterator --follow-symlinks ./dir4 >out &&
> + sort <out >actual-follow-sorted-output &&

Likewise.

> + test_cmp expected-follow-sorted-output actual-follow-sorted-output
> +'
> +
> +
> +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
> + cat >expected-rec-symlinks-sorted-output <<-EOF &&
> + [d] (a) [a] ./dir5/a
> + [d] (a/b) [b] ./dir5/a/b
> + [d] (a/b/d) [d] ./dir5/a/b/d
> + [d] (a/c) [c] ./dir5/a/c
> + EOF
> +
> + test-tool dir-iterator --follow-symlinks ./dir5 >out &&
> + sort <out >actual-rec-symlinks-sorted-output &&

Likewise.

Matheus Tavares Bernardino

Jul 8, 2019, 6:21:54 PM7/8/19
to SZEDER Gábor, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira, Ramsay Jones, Junio C Hamano
Thanks for the review. I'll address those issues in v8.

Best,
Matheus

Matheus Tavares

Jul 10, 2019, 7:59:21 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com
This patchset contains:
- tests for the dir-iterator API;
- dir-iterator refactoring to make its state machine simpler,
plus new features with tests;
- a replacement of explicit recursive dir iteration at
copy_or_link_directory for the dir-iterator API;
- some refactoring and behavior changes at local clone, mainly to
take care of symlinks and hidden files at .git/objects, together
with tests for these types of files.

Changes since v7[1]:
- Applied some style fixes at tests, as suggested by SZEDER
- Removed the code to find circular symlinks as suggested in this[2]
thread. The way it was previously implemented wouldn't work on Windows.
Dscho therefore suggested that I remove this section until we come up with a
more portable implementation.

[1]: https://public-inbox.org/git/cover.1560898723.gi...@usp.br/
[2]: https://public-inbox.org/git/nycvar.QRO.7.76.6...@tvgsbejvaqbjf.bet/
travis build: https://travis-ci.org/matheustavares/git/builds/557047597

Daniel Ferreira (1):
dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
clone: better handle symlinked files at .git/objects/
dir-iterator: use warning_errno when possible
dir-iterator: refactor state machine model
dir-iterator: add flags parameter to dir_iterator_begin
clone: copy hidden paths at local clone
clone: extract function from copy_or_link_directory
clone: use dir-iterator to avoid explicit dir traversal
clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
clone: test for our behavior on odd objects/* content

Makefile | 1 +
builtin/clone.c | 75 +++++-----
dir-iterator.c | 263 ++++++++++++++++++++---------------
dir-iterator.h | 64 +++++++--
refs/files-backend.c | 17 ++-
t/helper/test-dir-iterator.c | 58 ++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t0066-dir-iterator.sh | 148 ++++++++++++++++++++
t/t5604-clone-reference.sh | 133 ++++++++++++++++++
10 files changed, 597 insertions(+), 164 deletions(-)
create mode 100644 t/helper/test-dir-iterator.c
create mode 100755 t/t0066-dir-iterator.sh

Range-diff against v7:
1: 437b1eb1c7 ! 1: a2016d9d3b clone: test for our behavior on odd objects/* content
@@ -98,7 +98,7 @@
+ mv $last_loose a-loose-dir &&
+ ln -s a-loose-dir $last_loose &&
+ find . -type f | sort >../../../T.objects-files.raw &&
-+ echo unknown_content> unknown_file
++ echo unknown_content >unknown_file
+ ) &&
+ git -C T fsck &&
+ git -C T rev-list --all --objects >T.objects
2: 108bea2652 ! 2: 47a4f9b31c clone: better handle symlinked files at .git/objects/
@@ -80,7 +80,7 @@
+ cd ../ &&
find . -type f | sort >../../../T.objects-files.raw &&
+ find . -type l | sort >../../../T.objects-symlinks.raw &&
- echo unknown_content> unknown_file
+ echo unknown_content >unknown_file
) &&
git -C T fsck &&
@@
3: 2c0232be6c ! 3: bbce6a601b dir-iterator: add tests for dir-iterator API
@@ -129,7 +129,7 @@
+ EOF
+
+ test-tool dir-iterator ./dir >out &&
-+ sort <out >./actual-iteration-sorted-output &&
++ sort out >./actual-iteration-sorted-output &&
+
+ test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
4: 0b76044165 = 4: 0cc5f1f0b4 dir-iterator: use warning_errno when possible
5: 44c47d579c ! 5: f871b5d3f4 dir-iterator: refactor state machine model
@@ -340,14 +340,14 @@
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
- *
+- * struct iterator *iter = dir_iterator_begin(path);
++ * struct dir_iterator *iter = dir_iterator_begin(path);
++ *
+ * if (!iter)
+ * goto error_handler;
-+ *
+ *
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
- * ok = dir_iterator_abort(iter);
@@
};

6: 86fc04ad0e ! 6: fe838d7eb4 dir-iterator: add flags parameter to dir_iterator_begin
@@ -22,16 +22,6 @@
diff --git a/dir-iterator.c b/dir-iterator.c
--- a/dir-iterator.c
+++ b/dir-iterator.c
-@@
- struct dir_iterator_level {
- DIR *dir;
-
-+ /* The inode number of this level's directory. */
-+ ino_t ino;
-+
- /*
- * The length of the directory part of path at this level
- * (including a trailing '/'):
@@
* that will be included in this iteration.
*/
@@ -51,10 +41,6 @@
static int push_level(struct dir_iterator_int *iter)
{
@@
- if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
- strbuf_addch(&iter->base.path, '/');
- level->prefix_len = iter->base.path.len;
-+ level->ino = iter->base.st.st_ino;

level->dir = opendir(iter->base.path.buf);
if (!level->dir) {
@@ -96,33 +82,17 @@
+ err = stat(iter->base.path.buf, &iter->base.st);
+ else
+ err = lstat(iter->base.path.buf, &iter->base.st);
-+
+
+- return 0;
+ saved_errno = errno;
+ if (err && errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+ errno = saved_errno;
+ return err;
-+}
-+
-+/*
-+ * Look for a recursive symlink at iter->base.path pointing to any directory on
-+ * the previous stack levels. If it is found, return 1. If not, return 0.
-+ */
-+static int find_recursive_symlinks(struct dir_iterator_int *iter)
-+{
-+ int i;
-+
-+ if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
-+ !S_ISDIR(iter->base.st.st_mode))
-+ return 0;
-
-+ for (i = 0; i < iter->levels_nr; ++i)
-+ if (iter->base.st.st_ino == iter->levels[i].ino)
-+ return 1;
- return 0;
}

+ int dir_iterator_advance(struct dir_iterator *dir_iterator)
@@
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;
@@ -165,12 +135,6 @@
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
continue;
-+ }
-+
-+ if (find_recursive_symlinks(iter)) {
-+ warning("ignoring recursive symlink at '%s'",
-+ iter->base.path.buf);
-+ continue;
+ }

return ITER_OK;
@@ -207,7 +171,7 @@
* A typical iteration looks like this:
*
* int ok;
-- * struct iterator *iter = dir_iterator_begin(path);
+- * struct dir_iterator *iter = dir_iterator_begin(path);
+ * unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ * struct dir_iterator *iter = dir_iterator_begin(path, flags);
*
@@ -230,9 +194,12 @@
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ * i.e., linked directories' contents will be iterated over and
+ * iter->base.st will contain information on the referred files,
-+ * not the symlinks themselves, which is the default behavior.
-+ * Recursive symlinks are skipped with a warning and broken symlinks
-+ * are ignored.
++ * not the symlinks themselves, which is the default behavior. Broken
++ * symlinks are ignored.
++ *
++ * Warning: circular symlinks are also followed when
++ * DIR_ITERATOR_FOLLOW_SYMLINKS is set. The iteration may end up with
++ * an ELOOP if they happen and DIR_ITERATOR_PEDANTIC is set.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
@@ -383,7 +350,7 @@
+ EOF
+
+ mkdir -p dir3/a &&
-+ > dir3/a/b &&
++ >dir3/a/b &&
+ chmod 0 dir3/a &&
+
+ test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
@@ -399,7 +366,7 @@
+ EOF
+
+ mkdir -p dir3/a &&
-+ > dir3/a/b &&
++ >dir3/a/b &&
+ chmod 0 dir3/a &&
+
+ test_must_fail test-tool dir-iterator --pedantic ./dir3 \
@@ -435,7 +402,7 @@
+ EOF
+
+ test-tool dir-iterator ./dir4 >out &&
-+ sort <out >actual-no-follow-sorted-output &&
++ sort out >actual-no-follow-sorted-output &&
+
+ test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
@@ -452,24 +419,9 @@
+ EOF
+
+ test-tool dir-iterator --follow-symlinks ./dir4 >out &&
-+ sort <out >actual-follow-sorted-output &&
++ sort out >actual-follow-sorted-output &&
+
+ test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
-+
-+
-+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
-+ cat >expected-rec-symlinks-sorted-output <<-EOF &&
-+ [d] (a) [a] ./dir5/a
-+ [d] (a/b) [b] ./dir5/a/b
-+ [d] (a/b/d) [d] ./dir5/a/b/d
-+ [d] (a/c) [c] ./dir5/a/c
-+ EOF
-+
-+ test-tool dir-iterator --follow-symlinks ./dir5 >out &&
-+ sort <out >actual-rec-symlinks-sorted-output &&
-+
-+ test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
-+'
+
test_done
7: 17685057cd = 7: 3da6408e04 clone: copy hidden paths at local clone
8: c7f3a8640e = 8: af7430eb2c clone: extract function from copy_or_link_directory
9: 7934036d30 ! 9: e8308c7408 clone: use dir-iterator to avoid explicit dir traversal
@@ -11,11 +11,7 @@
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
- though the .git/objects copy didn't fully succeed. Also, with the
- dir-iterator API, recursive symlinks will be detected and skipped. This
- is another behavior improvement, since the current version would
- continue to copy the same content over and over until stat() returned an
- ELOOP error.
+ though the .git/objects copy didn't fully succeed.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>

10: 2e25c03c07 = 10: 782ca07eed clone: replace strcmp by fspathcmp
--
2.22.0

Matheus Tavares

Jul 10, 2019, 7:59:35 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Alex Riesen
From: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625...@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..11250cab40 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,115 @@ test_expect_success 'clone, dissociate from alternates' '
( cd C && git fsck )
'

+test_expect_success 'setup repo with garbage in objects/*' '
+ git init S &&
+ (
+ cd S &&
+ test_commit A &&
+
+ cd .git/objects &&
+ >.some-hidden-file &&
+ >some-file &&
+ mkdir .some-hidden-dir &&
+ >.some-hidden-dir/some-file &&
+ >.some-hidden-dir/.some-dot-file &&
+ mkdir some-dir &&
+ >some-dir/some-file &&
+ >some-dir/.some-dot-file
+ )
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ git clone $option S S$option || return 1 &&
+ git -C S$option fsck || return 1
+ done &&
+ find S-* -name "*some*" | sort >actual &&
+ cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-file
+ S--dissociate/.git/objects/some-dir
+ S--dissociate/.git/objects/some-dir/.some-dot-file
+ S--dissociate/.git/objects/some-dir/some-file
+ S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-file
+ S--local/.git/objects/some-dir
+ S--local/.git/objects/some-dir/.some-dot-file
+ S--local/.git/objects/some-dir/some-file
+ S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-file
+ S--no-hardlinks/.git/objects/some-dir
+ S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/some-dir/some-file
+ S--no-hardlinks/.git/objects/some-file
+ EOF
+ test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+ git init T &&
+ (
+ cd T &&
+ git config gc.auto 0 &&
+ test_commit A &&
+ git gc &&
+ test_commit B &&
+
+ cd .git/objects &&
+ mv pack packs &&
+ ln -s packs pack &&
+ find ?? -type d >loose-dirs &&
+ last_loose=$(tail -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+ mv $last_loose a-loose-dir &&
+ ln -s a-loose-dir $last_loose &&
+ find . -type f | sort >../../../T.objects-files.raw &&
+ echo unknown_content >unknown_file
+ ) &&
+ git -C T fsck &&
+ git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+ for option in --local --no-hardlinks --shared --dissociate
+ do
+ git clone $option T T$option || return 1 &&
+ git -C T$option fsck || return 1 &&
+ git -C T$option rev-list --all --objects >T$option.objects &&
+ test_cmp T.objects T$option.objects &&
+ (
+ cd T$option/.git/objects &&
+ find . -type f | sort >../../../T$option.objects-files.raw
+ )
+ done &&
+
+ for raw in $(ls T*.raw)
+ do
+ sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
+ -e "/multi-pack-index/d" <$raw >$raw.de-sha || return 1
+ done &&
+
+ cat >expected-files <<-EOF &&
+ ./Y/Z
+ ./Y/Z
+ ./a-loose-dir/Z
+ ./Y/Z
+ ./info/packs
+ ./pack/pack-Z.idx
+ ./pack/pack-Z.pack
+ ./packs/pack-Z.idx
+ ./packs/pack-Z.pack
+ ./unknown_file
+ EOF
+
+ for option in --local --dissociate --no-hardlinks
+ do
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ done &&
+
+ echo ./info/alternates >expected-files &&
+ test_cmp expected-files T--shared.objects-files.raw

Matheus Tavares

Jul 10, 2019, 7:59:47 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com
There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: with --no-hardlinks, all symlinks are
dereferenced, but without it Git will try to hardlink the files with the
link() function, whose behaviour on symlinks is OS-specific. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink,
whilst on GNU/Linux it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
$ touch a
$ ln -s a b
$ link b c
$ ls -li a b c
155 [...] a
156 [...] b -> a
156 [...] c -> a

But on NetBSD:
$ ls -li a b c
2609160 [...] a
2609164 [...] b -> a
2609160 [...] c

It's not good for the result of a local clone to be OS-dependent;
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5b9ebe9947..4a0a2455a7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -445,7 +445,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (unlink(dest->buf) && errno != ENOENT)
die_errno(_("failed to unlink '%s'"), dest->buf);
if (!option_no_hardlinks) {
- if (!link(src->buf, dest->buf))
+ if (!link(real_path(src->buf), dest->buf))
continue;
if (option_local > 0)
die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 11250cab40..459ad8a20b 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
test_cmp expected actual
'

-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
git init T &&
(
cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
ln -s packs pack &&
find ?? -type d >loose-dirs &&
last_loose=$(tail -n 1 loose-dirs) &&
- rm -f loose-dirs &&
mv $last_loose a-loose-dir &&
ln -s a-loose-dir $last_loose &&
+ first_loose=$(head -n 1 loose-dirs) &&
+ rm -f loose-dirs &&
+
+ cd $first_loose &&
+ obj=$(ls *) &&
+ mv $obj ../an-object &&
+ ln -s ../an-object $obj &&
+
+ cd ../ &&
find . -type f | sort >../../../T.objects-files.raw &&
+ find . -type l | sort >../../../T.objects-symlinks.raw &&
echo unknown_content >unknown_file
) &&
git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
'


-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
for option in --local --no-hardlinks --shared --dissociate
do
git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
test_cmp T.objects T$option.objects &&
(
cd T$option/.git/objects &&
- find . -type f | sort >../../../T$option.objects-files.raw
+ find . -type f | sort >../../../T$option.objects-files.raw &&
+ find . -type l | sort >../../../T$option.objects-symlinks.raw
)
done &&

@@ -314,6 +324,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
./Y/Z
./Y/Z
./a-loose-dir/Z
+ ./an-object
./Y/Z
./info/packs
./pack/pack-Z.idx
@@ -323,13 +334,15 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
./unknown_file
EOF

- for option in --local --dissociate --no-hardlinks
+ for option in --local --no-hardlinks --dissociate
do
- test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+ test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+ test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
done &&

echo ./info/alternates >expected-files &&
- test_cmp expected-files T--shared.objects-files.raw
+ test_cmp expected-files T--shared.objects-files.raw &&
+ test_must_be_empty T--shared.objects-symlinks.raw
'

test_done
--
2.22.0

Matheus Tavares

Jul 10, 2019, 7:59:54 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Daniel Ferreira
From: Daniel Ferreira <bnm...@gmail.com>

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir-iterator.

Create t/t0066-dir-iterator.sh, which tests that dir-iterator does
iterate through a whole directory tree as expected.

Signed-off-by: Daniel Ferreira <bnm...@gmail.com>
[matheus.bernardino: update to use test-tool and some minor aesthetics]
Helped-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
Makefile | 1 +
t/helper/test-dir-iterator.c | 33 ++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t0066-dir-iterator.sh | 55 ++++++++++++++++++++++++++++++++++++
5 files changed, 91 insertions(+)
create mode 100644 t/helper/test-dir-iterator.c
create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index f58bf14c7b..7e2a44cccc 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,7 @@ TEST_BUILTINS_OBJS += test-config.o
TEST_BUILTINS_OBJS += test-ctype.o
TEST_BUILTINS_OBJS += test-date.o
TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-dir-iterator.o
TEST_BUILTINS_OBJS += test-drop-caches.o
TEST_BUILTINS_OBJS += test-dump-cache-tree.o
TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000000..84f50bed8c
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,33 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+/* Argument is a directory path to iterate over */
+int cmd__dir_iterator(int argc, const char **argv)
+{
+ struct strbuf path = STRBUF_INIT;
+ struct dir_iterator *diter;
+
+ if (argc < 2)
+ die("BUG: test-dir-iterator needs one argument");
+
+ strbuf_add(&path, argv[1], strlen(argv[1]));
+
+ diter = dir_iterator_begin(path.buf);
+
+ while (dir_iterator_advance(diter) == ITER_OK) {
+ if (S_ISDIR(diter->st.st_mode))
+ printf("[d] ");
+ else if (S_ISREG(diter->st.st_mode))
+ printf("[f] ");
+ else
+ printf("[?] ");
+
+ printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
+ diter->path.buf);
+ }
+
+ return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 087a8c0cc9..7bc9bb231e 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,6 +19,7 @@ static struct test_cmd cmds[] = {
{ "ctype", cmd__ctype },
{ "date", cmd__date },
{ "delta", cmd__delta },
+ { "dir-iterator", cmd__dir_iterator },
{ "drop-caches", cmd__drop_caches },
{ "dump-cache-tree", cmd__dump_cache_tree },
{ "dump-fsmonitor", cmd__dump_fsmonitor },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7e703f3038..ec0ffbd0cb 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -9,6 +9,7 @@ int cmd__config(int argc, const char **argv);
int cmd__ctype(int argc, const char **argv);
int cmd__date(int argc, const char **argv);
int cmd__delta(int argc, const char **argv);
+int cmd__dir_iterator(int argc, const char **argv);
int cmd__drop_caches(int argc, const char **argv);
int cmd__dump_cache_tree(int argc, const char **argv);
int cmd__dump_fsmonitor(int argc, const char **argv);
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
new file mode 100755
index 0000000000..59bce868f4
--- /dev/null
+++ b/t/t0066-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test the dir-iterator functionality'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+ mkdir -p dir &&
+ mkdir -p dir/a/b/c/ &&
+ >dir/b &&
+ >dir/c &&
+ mkdir -p dir/d/e/d/ &&
+ >dir/a/b/c/d &&
+ >dir/a/e &&
+ >dir/d/e/d/a &&
+
+ mkdir -p dir2/a/b/c/ &&
+ >dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+ cat >expected-iteration-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir/a
+ [d] (a/b) [b] ./dir/a/b
+ [d] (a/b/c) [c] ./dir/a/b/c
+ [d] (d) [d] ./dir/d
+ [d] (d/e) [e] ./dir/d/e
+ [d] (d/e/d) [d] ./dir/d/e/d
+ [f] (a/b/c/d) [d] ./dir/a/b/c/d
+ [f] (a/e) [e] ./dir/a/e
+ [f] (b) [b] ./dir/b
+ [f] (c) [c] ./dir/c
+ [f] (d/e/d/a) [a] ./dir/d/e/d/a
+ EOF
+
+ test-tool dir-iterator ./dir >out &&
+ sort out >./actual-iteration-sorted-output &&
+
+ test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+ cat >expected-pre-order-output <<-EOF &&
+ [d] (a) [a] ./dir2/a
+ [d] (a/b) [b] ./dir2/a/b
+ [d] (a/b/c) [c] ./dir2/a/b/c
+ [f] (a/b/c/d) [d] ./dir2/a/b/c/d
+ EOF
+
+ test-tool dir-iterator ./dir2 >actual-pre-order-output &&
+
+ test_cmp expected-pre-order-output actual-pre-order-output
+'
+
+test_done
--
2.22.0

Matheus Tavares

Jul 10, 2019, 8:00:01 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Michael Haggerty
Replace warning(..., strerror(errno)) with warning_errno(...). This
unifies how warnings are displayed and simplifies the code a bit. Also,
improve warning messages by surrounding paths with quotation marks and
using more meaningful statements.
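
To make the pattern concrete, the following sketch shows roughly what a
warning_errno()-style helper does: it captures errno up front and appends
strerror() to the formatted message, so callers no longer pass
strerror(errno) by hand. format_warning_errno() is a hypothetical stand-in
that writes into a buffer; it is not Git's implementation:

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "warning: <message>: <strerror(errno)>" into out.
 * Return the length written, or -1 if out is too small.
 * (Hypothetical stand-in for illustration only.)
 */
int format_warning_errno(char *out, size_t size, const char *fmt, ...)
{
	int saved_errno = errno; /* capture before any call can clobber it */
	va_list ap;
	int n, m;

	n = snprintf(out, size, "warning: ");
	if (n < 0 || (size_t)n >= size)
		return -1;

	va_start(ap, fmt);
	m = vsnprintf(out + n, size - n, fmt, ap);
	va_end(ap);
	if (m < 0 || (size_t)m >= size - n)
		return -1;
	n += m;

	m = snprintf(out + n, size - n, ": %s", strerror(saved_errno));
	if (m < 0 || (size_t)m >= size - n)
		return -1;

	errno = saved_errno; /* leave errno as the caller had it */
	return n + m;
}
```

This also shows why the dir_iterator_abort() hunk saves and restores errno
around strbuf_setlen(): the error code has to survive until the message is
formatted.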

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..0c8880868a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -71,8 +71,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)

level->dir = opendir(iter->base.path.buf);
if (!level->dir && errno != ENOENT) {
- warning("error opening directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error opening directory '%s'",
+ iter->base.path.buf);
/* Popping the level is handled below */
}

@@ -122,11 +122,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (!de) {
/* This level is exhausted; pop up a level. */
if (errno) {
- warning("error reading directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error reading directory '%s'",
+ iter->base.path.buf);
} else if (closedir(level->dir))
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);

level->dir = NULL;
if (--iter->levels_nr == 0)
@@ -140,9 +140,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
strbuf_addstr(&iter->base.path, de->d_name);
if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
if (errno != ENOENT)
- warning("error reading path '%s': %s",
- iter->base.path.buf,
- strerror(errno));
+ warning_errno("failed to stat '%s'",
+ iter->base.path.buf);
continue;
}

@@ -170,9 +169,11 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
&iter->levels[iter->levels_nr - 1];

if (level->dir && closedir(level->dir)) {
+ int saved_errno = errno;
strbuf_setlen(&iter->base.path, level->prefix_len);
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ errno = saved_errno;
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);
}
}

--
2.22.0

Matheus Tavares

Jul 10, 2019, 8:00:09 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Daniel Ferreira, Jeff King, Ramsay Jones, Michael Haggerty
dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability by factoring out three functions and simplifying
its mechanics. The refactored model will no longer depend on
level.initialized and level.dir_state to keep track of the iteration
state and will run in a single loop.

Also, dir_iterator_begin() currently does not check if the given string
represents a valid directory path. Since the refactored model will have
to stat() the given path at initialization, let's also check for this
kind of error and make dir_iterator_begin() return NULL on failure,
with errno appropriately set. Also add tests for this new behavior.
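
The two checks described above amount to something like the sketch below.
validate_iteration_root() is a hypothetical name used only for illustration;
in the patch the checks live inside dir_iterator_begin() itself:

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Return 0 if path names a directory (following symlinks, as stat()
 * does). Otherwise return -1 with errno set: whatever stat() reported
 * (for example ENOENT), or ENOTDIR for an existing non-directory.
 * (Hypothetical helper mirroring the checks described above.)
 */
int validate_iteration_root(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1; /* errno already set by stat() */

	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;
		return -1;
	}

	return 0;
}
```

A caller can then tell a missing path from a non-directory by inspecting
errno, which is what the two new t0066 tests appear to rely on.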

Improve documentation at dir-iterator.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.

Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.

Original-patch-by: Daniel Ferreira <bnm...@gmail.com>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 234 ++++++++++++++++++-----------------
dir-iterator.h | 17 ++-
refs/files-backend.c | 17 ++-
t/helper/test-dir-iterator.c | 5 +
t/t0066-dir-iterator.sh | 13 ++
5 files changed, 164 insertions(+), 122 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 0c8880868a..594fe4d67b 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
#include "dir-iterator.h"

struct dir_iterator_level {
- int initialized;
-
DIR *dir;

/*
@@ -13,16 +11,6 @@ struct dir_iterator_level {
* (including a trailing '/'):
*/
size_t prefix_len;
-
- /*
- * The last action that has been taken with the current entry
- * (needed for directories, which have to be included in the
- * iteration and also iterated into):
- */
- enum {
- DIR_STATE_ITER,
- DIR_STATE_RECURSE
- } dir_state;
};

/*
@@ -34,9 +22,11 @@ struct dir_iterator_int {
struct dir_iterator base;

/*
- * The number of levels currently on the stack. This is always
- * at least 1, because when it becomes zero the iteration is
- * ended and this struct is freed.
+ * The number of levels currently on the stack. After the first
+ * call to dir_iterator_begin(), if it succeeds to open the
+ * first level's dir, this will always be at least 1. Then,
+ * when it comes to zero the iteration is ended and this
+ * struct is freed.
*/
size_t levels_nr;

@@ -50,113 +40,118 @@ struct dir_iterator_int {
struct dir_iterator_level *levels;
};

+/*
+ * Push a level in the iter stack and initialize it with information from
+ * the directory pointed by iter->base->path. It is assumed that this
+ * strbuf points to a valid directory path. Return 0 on success and -1
+ * otherwise, leaving the stack unchanged.
+ */
+static int push_level(struct dir_iterator_int *iter)
+{
+ struct dir_iterator_level *level;
+
+ ALLOC_GROW(iter->levels, iter->levels_nr + 1, iter->levels_alloc);
+ level = &iter->levels[iter->levels_nr++];
+
+ if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
+ strbuf_addch(&iter->base.path, '/');
+ level->prefix_len = iter->base.path.len;
+
+ level->dir = opendir(iter->base.path.buf);
+ if (!level->dir) {
+ if (errno != ENOENT) {
+ warning_errno("error opening directory '%s'",
+ iter->base.path.buf);
+ }
+ iter->levels_nr--;
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Pop the top level on the iter stack, releasing any resources associated
+ * with it. Return the new value of iter->levels_nr.
+ */
+static int pop_level(struct dir_iterator_int *iter)
+{
+ struct dir_iterator_level *level =
+ &iter->levels[iter->levels_nr - 1];
+
+ if (level->dir && closedir(level->dir))
+ warning_errno("error closing directory '%s'",
+ iter->base.path.buf);
+ level->dir = NULL;
+
+ return --iter->levels_nr;
+}
+
+/*
+ * Populate iter->base with the necessary information on the next iteration
+ * entry, represented by the given dirent de. Return 0 on success and -1
+ * otherwise.
+ */
+static int prepare_next_entry_data(struct dir_iterator_int *iter,
+ struct dirent *de)
+{
+ strbuf_addstr(&iter->base.path, de->d_name);
+ /*
+ * We have to reset these because the path strbuf might have
+ * been realloc()ed at the previous strbuf_addstr().
+ */
+ iter->base.relative_path = iter->base.path.buf +
+ iter->levels[0].prefix_len;
+ iter->base.basename = iter->base.path.buf +
+ iter->levels[iter->levels_nr - 1].prefix_len;
+
+ if (lstat(iter->base.path.buf, &iter->base.st)) {
+ if (errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+ return -1;
+ }
+
+ return 0;
+}
+
int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;

+ if (S_ISDIR(iter->base.st.st_mode)) {
+ if (push_level(iter) && iter->levels_nr == 0) {
+ /* Pushing the first level failed */
+ return dir_iterator_abort(dir_iterator);
+ }
+ }
+
+ /* Loop until we find an entry that we can give back to the caller. */
while (1) {
+ struct dirent *de;
struct dir_iterator_level *level =
&iter->levels[iter->levels_nr - 1];
- struct dirent *de;

- if (!level->initialized) {
- /*
- * Note: dir_iterator_begin() ensures that
- * path is not the empty string.
- */
- if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
- strbuf_addch(&iter->base.path, '/');
- level->prefix_len = iter->base.path.len;
-
- level->dir = opendir(iter->base.path.buf);
- if (!level->dir && errno != ENOENT) {
- warning_errno("error opening directory '%s'",
+ strbuf_setlen(&iter->base.path, level->prefix_len);
+ errno = 0;
+ de = readdir(level->dir);
+
+ if (!de) {
+ if (errno)
+ warning_errno("error reading directory '%s'",
iter->base.path.buf);
- /* Popping the level is handled below */
- }
-
- level->initialized = 1;
- } else if (S_ISDIR(iter->base.st.st_mode)) {
- if (level->dir_state == DIR_STATE_ITER) {
- /*
- * The directory was just iterated
- * over; now prepare to iterate into
- * it.
- */
- level->dir_state = DIR_STATE_RECURSE;
- ALLOC_GROW(iter->levels, iter->levels_nr + 1,
- iter->levels_alloc);
- level = &iter->levels[iter->levels_nr++];
- level->initialized = 0;
- continue;
- } else {
- /*
- * The directory has already been
- * iterated over and iterated into;
- * we're done with it.
- */
- }
+ else if (pop_level(iter) == 0)
+ return dir_iterator_abort(dir_iterator);
+ continue;
}

- if (!level->dir) {
- /*
- * This level is exhausted (or wasn't opened
- * successfully); pop up a level.
- */
- if (--iter->levels_nr == 0)
- return dir_iterator_abort(dir_iterator);
+ if (is_dot_or_dotdot(de->d_name))
+ continue;

+ if (prepare_next_entry_data(iter, de))
continue;
- }

- /*
- * Loop until we find an entry that we can give back
- * to the caller:
- */
- while (1) {
- strbuf_setlen(&iter->base.path, level->prefix_len);
- errno = 0;
- de = readdir(level->dir);
-
- if (!de) {
- /* This level is exhausted; pop up a level. */
- if (errno) {
- warning_errno("error reading directory '%s'",
- iter->base.path.buf);
- } else if (closedir(level->dir))
- warning_errno("error closing directory '%s'",
- iter->base.path.buf);
-
- level->dir = NULL;
- if (--iter->levels_nr == 0)
- return dir_iterator_abort(dir_iterator);
- break;
- }
-
- if (is_dot_or_dotdot(de->d_name))
- continue;
-
- strbuf_addstr(&iter->base.path, de->d_name);
- if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
- if (errno != ENOENT)
- warning_errno("failed to stat '%s'",
- iter->base.path.buf);
- continue;
- }
-
- /*
- * We have to set these each time because
- * the path strbuf might have been realloc()ed.
- */
- iter->base.relative_path =
- iter->base.path.buf + iter->levels[0].prefix_len;
- iter->base.basename =
- iter->base.path.buf + level->prefix_len;
- level->dir_state = DIR_STATE_ITER;
-
- return ITER_OK;
- }
+ return ITER_OK;
}
}

@@ -187,17 +182,32 @@ struct dir_iterator *dir_iterator_begin(const char *path)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
-
- if (!path || !*path)
- BUG("empty path passed to dir_iterator_begin()");
+ int saved_errno;

strbuf_init(&iter->base.path, PATH_MAX);
strbuf_addstr(&iter->base.path, path);

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+ iter->levels_nr = 0;

- iter->levels_nr = 1;
- iter->levels[0].initialized = 0;
+ /*
+ * Note: stat already checks for NULL or empty strings and
+ * inexistent paths.
+ */
+ if (stat(iter->base.path.buf, &iter->base.st) < 0) {
+ saved_errno = errno;
+ goto error_out;
+ }
+
+ if (!S_ISDIR(iter->base.st.st_mode)) {
+ saved_errno = ENOTDIR;
+ goto error_out;
+ }

return dir_iterator;
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ errno = saved_errno;
+ return NULL;
}
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..9b4cb7acd2 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -8,18 +8,22 @@
*
* Iterate over a directory tree, recursively, including paths of all
* types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * symlinks except for the original path. Note that the original path
+ * is not included in the iteration.
*
* Every time dir_iterator_advance() is called, update the members of
* the dir_iterator structure to reflect the next path in the
* iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined, directory paths are always given before
+ * their contents.
*
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
+ * struct dir_iterator *iter = dir_iterator_begin(path);
+ *
+ * if (!iter)
+ * goto error_handler;
*
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
@@ -59,8 +63,9 @@ struct dir_iterator {
};

/*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path. On success, return a
+ * dir_iterator that holds the internal state of the iteration.
+ * In case of failure, return NULL and set errno accordingly.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..7ed81046d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,13 +2143,22 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
const char *gitdir)
{
- struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
- struct ref_iterator *ref_iterator = &iter->base;
+ struct dir_iterator *diter;
+ struct files_reflog_iterator *iter;
+ struct ref_iterator *ref_iterator;
struct strbuf sb = STRBUF_INIT;

- base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
strbuf_addf(&sb, "%s/logs", gitdir);
- iter->dir_iterator = dir_iterator_begin(sb.buf);
+
+ diter = dir_iterator_begin(sb.buf);
+ if(!diter)
+ return empty_ref_iterator_begin();
+
+ iter = xcalloc(1, sizeof(*iter));
+ ref_iterator = &iter->base;
+
+ base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+ iter->dir_iterator = diter;
iter->ref_store = ref_store;
strbuf_release(&sb);

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 84f50bed8c..fab1ff6237 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -17,6 +17,11 @@ int cmd__dir_iterator(int argc, const char **argv)

diter = dir_iterator_begin(path.buf);

+ if (!diter) {
+ printf("dir_iterator_begin failure: %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
while (dir_iterator_advance(diter) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
printf("[d] ");
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 59bce868f4..cc4b19c34c 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -52,4 +52,17 @@ test_expect_success 'dir-iterator should list files in the correct order' '
test_cmp expected-pre-order-output actual-pre-order-output
'

+test_expect_success 'begin should fail upon inexistent paths' '
+ test_must_fail test-tool dir-iterator ./inexistent-path \
+ >actual-inexistent-path-output &&
+ echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+ test_cmp expected-inexistent-path-output actual-inexistent-path-output
+'
+
+test_expect_success 'begin should fail upon non directory paths' '
+ test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
+ echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+ test_cmp expected-non-dir-output actual-non-dir-output

Matheus Tavares

Jul 10, 2019, 8:00:18 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Michael Haggerty, Ramsay Jones, Daniel Ferreira
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

The currently supported flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of continuing to look
for the next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.

These new flags will be used in a subsequent patch.

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
dir-iterator.c | 56 +++++++++++++++++--------
dir-iterator.h | 55 ++++++++++++++++++++-----
refs/files-backend.c | 2 +-
t/helper/test-dir-iterator.c | 34 +++++++++++----
t/t0066-dir-iterator.sh | 80 ++++++++++++++++++++++++++++++++++++
5 files changed, 191 insertions(+), 36 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..b17e9f970a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -38,13 +38,16 @@ struct dir_iterator_int {
* that will be included in this iteration.
*/
struct dir_iterator_level *levels;
+
+ /* Combination of flags for this dir-iterator */
+ unsigned int flags;
};

/*
* Push a level in the iter stack and initialize it with information from
* the directory pointed by iter->base->path. It is assumed that this
* strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
*/
static int push_level(struct dir_iterator_int *iter)
{
@@ -59,11 +62,13 @@ static int push_level(struct dir_iterator_int *iter)

level->dir = opendir(iter->base.path.buf);
if (!level->dir) {
+ int saved_errno = errno;
if (errno != ENOENT) {
warning_errno("error opening directory '%s'",
iter->base.path.buf);
}
iter->levels_nr--;
+ errno = saved_errno;
return -1;
}

@@ -90,11 +95,13 @@ static int pop_level(struct dir_iterator_int *iter)
/*
* Populate iter->base with the necessary information on the next iteration
* entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
*/
static int prepare_next_entry_data(struct dir_iterator_int *iter,
struct dirent *de)
{
+ int err, saved_errno;
+
strbuf_addstr(&iter->base.path, de->d_name);
/*
* We have to reset these because the path strbuf might have
@@ -105,13 +112,17 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter,
iter->base.basename = iter->base.path.buf +
iter->levels[iter->levels_nr - 1].prefix_len;

- if (lstat(iter->base.path.buf, &iter->base.st)) {
- if (errno != ENOENT)
- warning_errno("failed to stat '%s'", iter->base.path.buf);
- return -1;
- }
+ if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+ err = stat(iter->base.path.buf, &iter->base.st);
+ else
+ err = lstat(iter->base.path.buf, &iter->base.st);

- return 0;
+ saved_errno = errno;
+ if (err && errno != ENOENT)
+ warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+ errno = saved_errno;
+ return err;
}

int dir_iterator_advance(struct dir_iterator *dir_iterator)
@@ -119,11 +130,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
struct dir_iterator_int *iter =
(struct dir_iterator_int *)dir_iterator;

- if (S_ISDIR(iter->base.st.st_mode)) {
- if (push_level(iter) && iter->levels_nr == 0) {
- /* Pushing the first level failed */
- return dir_iterator_abort(dir_iterator);
- }
+ if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
+ if (iter->levels_nr == 0)
+ goto error_out;
}

/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +148,32 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
de = readdir(level->dir);

if (!de) {
- if (errno)
+ if (errno) {
warning_errno("error reading directory '%s'",
iter->base.path.buf);
- else if (pop_level(iter) == 0)
+ if (iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
+ } else if (pop_level(iter) == 0) {
return dir_iterator_abort(dir_iterator);
+ }
continue;
}

if (is_dot_or_dotdot(de->d_name))
continue;

- if (prepare_next_entry_data(iter, de))
+ if (prepare_next_entry_data(iter, de)) {
+ if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+ goto error_out;
continue;
+ }

return ITER_OK;
}
+
+error_out:
+ dir_iterator_abort(dir_iterator);
+ return ITER_ERROR;
}

int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +199,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
}

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +210,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)

ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
iter->levels_nr = 0;
+ iter->flags = flags;

/*
* Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 9b4cb7acd2..08229157c6 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@
* A typical iteration looks like this:
*
* int ok;
- * struct dir_iterator *iter = dir_iterator_begin(path);
+ * unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ * struct dir_iterator *iter = dir_iterator_begin(path, flags);
*
* if (!iter)
* goto error_handler;
@@ -44,6 +45,29 @@
* dir_iterator_advance() again.
*/

+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ * in case of an error at dir_iterator_advance(), which is to keep
+ * looking for a next valid entry. With this flag, resources are freed
+ * and ITER_ERROR is returned immediately. In both cases, a meaningful
+ * warning is emitted. Note: ENOENT errors are always ignored so that
+ * the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ * i.e., linked directories' contents will be iterated over and
+ * iter->base.st will contain information on the referred files,
+ * not the symlinks themselves, which is the default behavior. Broken
+ * symlinks are ignored.
+ *
+ * Warning: circular symlinks are also followed when
+ * DIR_ITERATOR_FOLLOW_SYMLINKS is set. The iteration may end up with
+ * an ELOOP if they happen and DIR_ITERATOR_PEDANTIC is set.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
struct dir_iterator {
/* The current path: */
struct strbuf path;
@@ -58,29 +82,38 @@ struct dir_iterator {
/* The current basename: */
const char *basename;

- /* The result of calling lstat() on path: */
+ /*
+ * The result of calling lstat() on path; or stat(), if the
+ * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+ * dir_iterator's initialization.
+ */
struct stat st;
};

/*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
*
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ * - path is the starting directory. An internal copy will be made.
+ * - flags is a combination of the possible flags to initialize a
+ * dir-iterator or 0 for default behavior.
*/
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);

/*
* Advance the iterator to the first or next item and return ITER_OK.
* If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
*/
int dir_iterator_advance(struct dir_iterator *iterator);

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7ed81046d4..b1f8f53a09 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,

strbuf_addf(&sb, "%s/logs", gitdir);

- diter = dir_iterator_begin(sb.buf);
+ diter = dir_iterator_begin(sb.buf, 0);
if(!diter)
return empty_ref_iterator_begin();

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@
#include "iterator.h"
#include "dir-iterator.h"

-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * test-tool dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
int cmd__dir_iterator(int argc, const char **argv)
{
struct strbuf path = STRBUF_INIT;
struct dir_iterator *diter;
+ unsigned int flags = 0;
+ int iter_status;
+
+ for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+ if (strcmp(*argv, "--follow-symlinks") == 0)
+ flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+ else if (strcmp(*argv, "--pedantic") == 0)
+ flags |= DIR_ITERATOR_PEDANTIC;
+ else
+ die("invalid option '%s'", *argv);
+ }

- if (argc < 2)
- die("BUG: test-dir-iterator needs one argument");
-
- strbuf_add(&path, argv[1], strlen(argv[1]));
+ if (!*argv || argc != 1)
+ die("dir-iterator needs exactly one non-option argument");

- diter = dir_iterator_begin(path.buf);
+ strbuf_add(&path, *argv, strlen(*argv));
+ diter = dir_iterator_begin(path.buf, flags);

if (!diter) {
printf("dir_iterator_begin failure: %d\n", errno);
exit(EXIT_FAILURE);
}

- while (dir_iterator_advance(diter) == ITER_OK) {
+ while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
printf("[d] ");
else if (S_ISREG(diter->st.st_mode))
printf("[f] ");
+ else if (S_ISLNK(diter->st.st_mode))
+ printf("[s] ");
else
printf("[?] ");

@@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv)
diter->path.buf);
}

+ if (iter_status != ITER_DONE) {
+ printf("dir_iterator_advance failure\n");
+ return 1;
+ }
+
return 0;
}
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index cc4b19c34c..9354d3f1ed 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -65,4 +65,84 @@ test_expect_success 'begin should fail upon non directory paths' '
test_cmp expected-non-dir-output actual-non-dir-output
'

+test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
+ cat >expected-no-permissions-output <<-EOF &&
+ [d] (a) [a] ./dir3/a
+ EOF
+
+ mkdir -p dir3/a &&
+ >dir3/a/b &&
+ chmod 0 dir3/a &&
+
+ test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
+ test_cmp expected-no-permissions-output actual-no-permissions-output &&
+ chmod 755 dir3/a &&
+ rm -rf dir3
+'
+
+test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
+ cat >expected-no-permissions-pedantic-output <<-EOF &&
+ [d] (a) [a] ./dir3/a
+ dir_iterator_advance failure
+ EOF
+
+ mkdir -p dir3/a &&
+ >dir3/a/b &&
+ chmod 0 dir3/a &&
+
+ test_must_fail test-tool dir-iterator --pedantic ./dir3 \
+ >actual-no-permissions-pedantic-output &&
+ test_cmp expected-no-permissions-pedantic-output \
+ actual-no-permissions-pedantic-output &&
+ chmod 755 dir3/a &&
+ rm -rf dir3
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+ mkdir -p dir4/a &&
+ mkdir -p dir4/b/c &&
+ >dir4/a/d &&
+ ln -s d dir4/a/e &&
+ ln -s ../b dir4/a/f &&
+
+ mkdir -p dir5/a/b &&
+ mkdir -p dir5/a/c &&
+ ln -s ../c dir5/a/b/d &&
+ ln -s ../ dir5/a/b/e &&
+ ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+ cat >expected-no-follow-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir4/a
+ [d] (b) [b] ./dir4/b
+ [d] (b/c) [c] ./dir4/b/c
+ [f] (a/d) [d] ./dir4/a/d
+ [s] (a/e) [e] ./dir4/a/e
+ [s] (a/f) [f] ./dir4/a/f
+ EOF
+
+ test-tool dir-iterator ./dir4 >out &&
+ sort out >actual-no-follow-sorted-output &&
+
+ test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+ cat >expected-follow-sorted-output <<-EOF &&
+ [d] (a) [a] ./dir4/a
+ [d] (a/f) [f] ./dir4/a/f
+ [d] (a/f/c) [c] ./dir4/a/f/c
+ [d] (b) [b] ./dir4/b
+ [d] (b/c) [c] ./dir4/b/c
+ [f] (a/d) [d] ./dir4/a/d
+ [f] (a/e) [e] ./dir4/a/e
+ EOF
+
+ test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+ sort out >actual-follow-sorted-output &&
+
+ test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
test_done
--
2.22.0

Matheus Tavares

Jul 10, 2019, 8:00:22 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com
Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is an odd behaviour.
This was probably unintentional: the intention was likely to skip only
'.' and '..', but the check ended up skipping every directory whose
name starts with '.'. Besides being more natural, the new behaviour is
more permissive to the user.

Also adjust tests to reflect this behaviour change.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
builtin/clone.c | 2 +-
t/t5604-clone-reference.sh | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4a0a2455a7..9dd083e34d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -430,7 +430,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
continue;
}
if (S_ISDIR(buf.st_mode)) {
- if (de->d_name[0] != '.')
+ if (!is_dot_or_dotdot(de->d_name))
copy_or_link_directory(src, dest,
src_repo, src_baselen);
continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 459ad8a20b..4894237ab8 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
done &&
find S-* -name "*some*" | sort >actual &&
cat >expected <<-EOF &&
+ S--dissociate/.git/objects/.some-hidden-dir
+ S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+ S--dissociate/.git/objects/.some-hidden-dir/some-file
S--dissociate/.git/objects/.some-hidden-file
S--dissociate/.git/objects/some-dir
S--dissociate/.git/objects/some-dir/.some-dot-file
S--dissociate/.git/objects/some-dir/some-file
S--dissociate/.git/objects/some-file
+ S--local/.git/objects/.some-hidden-dir
+ S--local/.git/objects/.some-hidden-dir/.some-dot-file
+ S--local/.git/objects/.some-hidden-dir/some-file
S--local/.git/objects/.some-hidden-file
S--local/.git/objects/some-dir
S--local/.git/objects/some-dir/.some-dot-file
S--local/.git/objects/some-dir/some-file
S--local/.git/objects/some-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir
+ S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+ S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
S--no-hardlinks/.git/objects/.some-hidden-file
S--no-hardlinks/.git/objects/some-dir
S--no-hardlinks/.git/objects/some-dir/.some-dot-file
--
2.22.0

Matheus Tavares

Jul 10, 2019, 8:00:26 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com
Extract the directory creation code from copy_or_link_directory into
its own function named mkdir_if_missing. This change will help to remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9dd083e34d..96566c1bab 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -394,6 +394,21 @@ static void copy_alternates(struct strbuf *src, const char *src_repo)
fclose(in);
}

+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+ struct stat st;
+
+ if (!mkdir(pathname, mode))
+ return;
+
+ if (errno != EEXIST)
+ die_errno(_("failed to create directory '%s'"), pathname);
+ else if (stat(pathname, &st))
+ die_errno(_("failed to stat '%s'"), pathname);
+ else if (!S_ISDIR(st.st_mode))
+ die(_("%s exists and is not a directory"), pathname);
+}
+
static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
const char *src_repo, int src_baselen)
{
@@ -406,14 +421,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (!dir)
die_errno(_("failed to open '%s'"), src->buf);

- if (mkdir(dest->buf, 0777)) {
- if (errno != EEXIST)
- die_errno(_("failed to create directory '%s'"), dest->buf);
- else if (stat(dest->buf, &buf))
- die_errno(_("failed to stat '%s'"), dest->buf);
- else if (!S_ISDIR(buf.st_mode))
- die(_("%s exists and is not a directory"), dest->buf);
- }
+ mkdir_if_missing(dest->buf, 0777);

strbuf_addch(src, '/');
src_len = src->len;
--
2.22.0

Matheus Tavares

Jul 10, 2019, 8:00:32 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Jeff King
Replace the use of the opendir/readdir/closedir API to traverse
directories recursively, in the copy_or_link_directory function, with
the dir-iterator API. This simplifies the code and avoids recursive
calls to copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
though the .git/objects copy didn't fully succeed.

Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 47 +++++++++++++++++++++++++----------------------
1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 96566c1bab..47cb4a2a8e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
#include "transport.h"
#include "strbuf.h"
#include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
#include "sigchain.h"
#include "branch.h"
#include "remote.h"
@@ -410,42 +412,39 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
}

static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
- const char *src_repo, int src_baselen)
+ const char *src_repo)
{
- struct dirent *de;
- struct stat buf;
int src_len, dest_len;
- DIR *dir;
-
- dir = opendir(src->buf);
- if (!dir)
- die_errno(_("failed to open '%s'"), src->buf);
+ struct dir_iterator *iter;
+ int iter_status;
+ unsigned int flags;

mkdir_if_missing(dest->buf, 0777);

+ flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+ iter = dir_iterator_begin(src->buf, flags);
+
+ if (!iter)
+ die_errno(_("failed to start iterator over '%s'"), src->buf);
+
strbuf_addch(src, '/');
src_len = src->len;
strbuf_addch(dest, '/');
dest_len = dest->len;

- while ((de = readdir(dir)) != NULL) {
+ while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
strbuf_setlen(src, src_len);
- strbuf_addstr(src, de->d_name);
+ strbuf_addstr(src, iter->relative_path);
strbuf_setlen(dest, dest_len);
- strbuf_addstr(dest, de->d_name);
- if (stat(src->buf, &buf)) {
- warning (_("failed to stat %s\n"), src->buf);
- continue;
- }
- if (S_ISDIR(buf.st_mode)) {
- if (!is_dot_or_dotdot(de->d_name))
- copy_or_link_directory(src, dest,
- src_repo, src_baselen);
+ strbuf_addstr(dest, iter->relative_path);
+
+ if (S_ISDIR(iter->st.st_mode)) {
+ mkdir_if_missing(dest->buf, 0777);
continue;
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+ if (!strcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, src_repo);
continue;
}
@@ -462,7 +461,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
if (copy_file_with_time(dest->buf, src->buf, 0666))
die_errno(_("failed to copy file to '%s'"), dest->buf);
}
- closedir(dir);
+
+ if (iter_status != ITER_DONE) {
+ strbuf_setlen(src, src_len);
+ die(_("failed to iterate over '%s'"), src->buf);
+ }
}

static void clone_local(const char *src_repo, const char *dest_repo)
@@ -480,7 +483,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
get_common_dir(&dest, dest_repo);
strbuf_addstr(&src, "/objects");
strbuf_addstr(&dest, "/objects");
- copy_or_link_directory(&src, &dest, src_repo, src.len);
+ copy_or_link_directory(&src, &dest, src_repo);
strbuf_release(&src);
strbuf_release(&dest);
}
--
2.22.0

Matheus Tavares

Jul 10, 2019, 8:00:38 PM7/10/19
to Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin, kerne...@googlegroups.com, Jeff King
Replace the use of strcmp with fspathcmp in copy_or_link_directory, which
is more permissive/friendly to case-insensitive file systems.

Suggested-by: Nguyễn Thái Ngọc Duy <pcl...@gmail.com>
Signed-off-by: Matheus Tavares <matheus.b...@usp.br>
---
builtin/clone.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 47cb4a2a8e..8da696ef30 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -444,7 +444,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
}

/* Files that cannot be copied bit-for-bit... */
- if (!strcmp(iter->relative_path, "info/alternates")) {
+ if (!fspathcmp(iter->relative_path, "info/alternates")) {
copy_alternates(src, src_repo);
continue;
}
--
2.22.0

Johannes Schindelin

Jul 11, 2019, 7:56:43 AM7/11/19
to Matheus Tavares, Junio C Hamano, g...@vger.kernel.org, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, kerne...@googlegroups.com
Hi Matheus,

On Wed, 10 Jul 2019, Matheus Tavares wrote:

> - a replacement of explicit recursive dir iteration at
> copy_or_link_directory for the dir-iterator API;

As far as I can see, it was not replaced, but just dropped. Which is
good, as it will most likely address the CI failures.

Thanks,
Dscho

Matheus Tavares Bernardino

Jul 11, 2019, 11:24:58 AM7/11/19
to Johannes Schindelin, Junio C Hamano, git, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Christian Couder, Nguyễn Thái Ngọc Duy, SZEDER Gábor, Olga Telezhnaya, Kernel USP
You mean the circular symlink checker, right? Yes, it was dropped. In
that item I was referring to the directory iteration code in
builtin/clone.c (using opendir/readdir), which was replaced by the
dir-iterator API.

> Thanks,
> Dscho