[PATCH v2 2/2] repos: Add KAS_CLONE_DEPTH to implement git shallow clone/fetch

Felix Moessbauer

May 17, 2024, 10:48:25 AM
to kas-...@googlegroups.com, jan.k...@siemens.com, ch...@wiggins.nz, Marek Vasut, Felix Moessbauer
From: Marek Vasut <ma...@denx.de>

Add a new environment variable KAS_CLONE_DEPTH which adds '--depth=N'
to the 'git clone' and 'git fetch' commands. This forces git to
perform a shallow clone, which saves bandwidth and CI runner disk
space. The depth 'N' is taken from the value of KAS_CLONE_DEPTH.

This is useful when CI always starts with an empty work directory
that is discarded after the CI run. In that case, it makes no sense
to clone the entire repository; instead, clone just enough history
to reproduce the desired state of the repository and assemble its
checkout.

This is also useful when cloning massive repositories which would
otherwise take a long time to clone. Shallow cloning is supported for
specific commits, branches and tags (but disabled for legacy refspec).

[Felix: rebased, sanitized input values, adapted doc entry to new
format, ported test over to monkeykas infrastructure, forwarded the
variable from kas-container]

Signed-off-by: Marek Vasut <ma...@denx.de>
Signed-off-by: Felix Moessbauer <felix.mo...@siemens.com>
---
docs/command-line/environment-variables.inc | 7 ++++++
kas-container | 2 +-
kas/context.py | 5 +++++
kas/repos.py | 21 +++++++++++++++++
tests/conftest.py | 1 +
tests/test_commands.py | 25 ++++++++++++++++++++-
tests/test_commands/test-shallow.yml | 23 +++++++++++++++++++
7 files changed, 82 insertions(+), 2 deletions(-)
create mode 100644 tests/test_commands/test-shallow.yml
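
Reading aid (not part of the patch): a minimal, standalone Python sketch
of the behaviour the commit message describes, based only on what the
diff below shows. The helper names parse_clone_depth, build_clone_cmd and
build_fetch_cmd are hypothetical and are not kas APIs, and ValueError
stands in for KasUserError so that the sketch runs without kas installed.
Branch handling in the fetch path is left out.

```python
def parse_clone_depth(environ):
    """Return the shallow-clone depth; 0 means 'do a full clone'."""
    raw = environ.get('KAS_CLONE_DEPTH', '0')
    # str.isdigit() refuses signs, blanks and the empty string, so
    # negative or non-numeric values are rejected up front.
    if not raw.isdigit():
        raise ValueError('KAS_CLONE_DEPTH must be a number')
    return int(raw)


def build_clone_cmd(url, dest, depth, legacy_refspec=None):
    """Assemble a 'git clone' argument list (hypothetical helper)."""
    cmd = ['git', 'clone', '-q']
    # Depth 0 and legacy refspecs both fall back to a full clone.
    if depth and not legacy_refspec:
        cmd.extend(['--depth', str(depth)])
    cmd.extend([url, dest])
    return cmd


def build_fetch_cmd(depth, tag=None, commit=None, legacy_refspec=None):
    """Assemble a 'git fetch' argument list (hypothetical helper)."""
    cmd = ['git', 'fetch', '-q']
    if depth and not legacy_refspec:
        cmd.extend(['--depth', str(depth)])
    if tag:
        # Fetch only the requested tag.
        cmd.extend(['origin', f'+{tag}:refs/tags/{tag}'])
        return cmd
    if depth and commit:
        # In shallow mode, fetch only the pinned commit.
        cmd.extend(['origin', commit])
    return cmd
```

With KAS_CLONE_DEPTH=1, build_clone_cmd('URL', 'DIR', 1) yields
'git clone -q --depth 1 URL DIR', while a repository that still uses the
legacy refspec key gets a plain 'git clone -q URL DIR'.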

diff --git a/docs/command-line/environment-variables.inc b/docs/command-line/environment-variables.inc
index db1ac928a..c5a17f511 100644
--- a/docs/command-line/environment-variables.inc
+++ b/docs/command-line/environment-variables.inc
@@ -72,6 +72,13 @@ Variables Glossary
 | ``DISTRO_APT_PREMIRRORS``| Specifies alternatives for apt URLs. Just like   |
 | (C)                      | ``KAS_PREMIRRORS``.                              |
 +--------------------------+--------------------------------------------------+
+| ``KAS_CLONE_DEPTH``      | Perform shallow git clone/fetch using --depth=N  |
+| (C, K)                   | specified by this variable. This is useful in    |
+|                          | case CI always starts with empty work directory  |
+|                          | and this directory is always discarded after the |
+|                          | CI run. Shallow cloning is supported for 'tag',  |
+|                          | 'branch' and 'commit'.                           |
++--------------------------+--------------------------------------------------+
 | ``SSH_PRIVATE_KEY``      | Variable containing the private key that should  |
 | (K)                      | be added to an internal ssh-agent. This key      |
 |                          | cannot be password protected. This setting is    |
diff --git a/kas-container b/kas-container
index 39c984a29..a94107a88 100755
--- a/kas-container
+++ b/kas-container
@@ -546,7 +546,7 @@ if [ -n "${KAS_REPO_REF_DIR}" ]; then
-e KAS_REPO_REF_DIR=/repo-ref
fi

-for var in TERM KAS_DISTRO KAS_MACHINE KAS_TARGET KAS_TASK \
+for var in TERM KAS_DISTRO KAS_MACHINE KAS_TARGET KAS_TASK KAS_CLONE_DEPTH \
KAS_PREMIRRORS DISTRO_APT_PREMIRRORS BB_NUMBER_THREADS PARALLEL_MAKE \
GIT_CREDENTIAL_USEHTTPPATH; do
if [ -n "$(eval echo \$${var})" ]; then
diff --git a/kas/context.py b/kas/context.py
index 6ca529151..7f2300c72 100644
--- a/kas/context.py
+++ b/kas/context.py
@@ -25,6 +25,7 @@

 import os
 import logging
+from kas.kasusererror import KasUserError

 try:
     import distro
@@ -78,6 +79,10 @@ class Context:
         self.__kas_build_dir = os.path.abspath(build_dir)
         ref_dir = os.environ.get('KAS_REPO_REF_DIR', None)
         self.__kas_repo_ref_dir = os.path.abspath(ref_dir) if ref_dir else None
+        clone_depth = os.environ.get('KAS_CLONE_DEPTH', '0')
+        if not clone_depth.isdigit():
+            raise KasUserError('KAS_CLONE_DEPTH must be a number')
+        self.repo_clone_depth = int(clone_depth)
         self.setup_initial_environ()
         self.config = None
         self.args = args
diff --git a/kas/repos.py b/kas/repos.py
index 4b2b0c3e5..af17585da 100644
--- a/kas/repos.py
+++ b/kas/repos.py
@@ -523,6 +523,16 @@ class GitRepo(RepoImpl):

     def clone_cmd(self, srcdir, createref):
         cmd = ['git', 'clone', '-q']
+
+        depth = get_context().repo_clone_depth
+        if depth:
+            if self.refspec:
+                logging.warning('Shallow cloning is not supported for legacy '
+                                f'refspec on repository "{self.name}". '
+                                'Performing full clone.')
+            else:
+                cmd.extend(['--depth', str(depth)])
+
         if createref:
             cmd.extend([self.effective_url, '--bare', srcdir])
         elif srcdir:
@@ -543,8 +553,19 @@ class GitRepo(RepoImpl):

     def fetch_cmd(self):
         cmd = ['git', 'fetch', '-q']
+
+        depth = get_context().repo_clone_depth
+        if depth and not self.refspec:
+            cmd.extend(['--depth', str(depth)])
+
         if self.tag:
             cmd.extend(['origin', f'+{self.tag}:refs/tags/{self.tag}'])
+            return cmd
+
+        # only fetch this commit
+        if depth and self.commit:
+            cmd.extend(['origin', self.commit])
+            return cmd

         branch = self.branch or self.refspec
         if branch and branch.startswith('refs/'):
diff --git a/tests/conftest.py b/tests/conftest.py
index ef0dc9d10..d3ad2daa3 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -32,6 +32,7 @@ ENVVARS_KAS = [
     'KAS_TARGET',
     'KAS_TASK',
     'KAS_PREMIRRORS',
+    'KAS_CLONE_DEPTH',
     'SSH_PRIVATE_KEY',
     'SSH_PRIVATE_KEY_FILE',
     'SSH_AUTH_SOCK',
diff --git a/tests/test_commands.py b/tests/test_commands.py
index 849882e6a..6d4585164 100644
--- a/tests/test_commands.py
+++ b/tests/test_commands.py
@@ -29,7 +29,8 @@ import yaml
 import subprocess
 import pytest
 from kas import kas
-from kas.libkas import TaskExecError
+from kas.libkas import run_cmd
+from kas.libkas import TaskExecError, KasUserError


 def test_for_all_repos(monkeykas, tmpdir):
@@ -107,6 +108,28 @@ def test_checkout_create_refs(monkeykas, tmpdir):
assert os.path.exists('kas/.git/objects/info/alternates')


+def test_checkout_shallow(monkeykas, tmpdir):
+    tdir = str(tmpdir / 'test_commands')
+    shutil.copytree('tests/test_commands', tdir)
+    monkeykas.chdir(tdir)
+    with monkeykas.context() as mp:
+        mp.setenv('KAS_CLONE_DEPTH', 'invalid')
+        with pytest.raises(KasUserError):
+            kas.kas(['checkout', 'test-shallow.yml'])
+
+    with monkeykas.context() as mp:
+        mp.setenv('KAS_CLONE_DEPTH', '1')
+        kas.kas(['checkout', 'test-shallow.yml'])
+        for repo in ['kas_1', 'kas_2', 'kas_3', 'kas_4']:
+            (rc, output) = run_cmd(['git', 'rev-list', '--count', 'HEAD'],
+                                   cwd=repo, fail=False, liveupdate=False)
+            assert rc == 0
+            if repo == 'kas_4':
+                assert int(output.strip()) >= 1
+            else:
+                assert output.strip() == '1'
+
+
 def test_repo_includes(monkeykas, tmpdir):
     tdir = str(tmpdir / 'test_commands')
     shutil.copytree('tests/test_repo_includes', tdir)
diff --git a/tests/test_commands/test-shallow.yml b/tests/test_commands/test-shallow.yml
new file mode 100644
index 000000000..dfd711cdd
--- /dev/null
+++ b/tests/test_commands/test-shallow.yml
@@ -0,0 +1,23 @@
+header:
+  version: 14
+
+repos:
+  this:
+
+  kas_1:
+    url: https://github.com/siemens/kas.git
+    branch: master
+
+  kas_2:
+    url: https://github.com/siemens/kas.git
+    tag: '4.3'
+    commit: f650ebe2495a9cbe2fdf4a2c8becc7b3db470d55
+
+  kas_3:
+    url: https://github.com/siemens/kas.git
+    commit: e42a64a666082b77fbc2758b07191b662d17f792
+
+  kas_4:
+    url: https://github.com/siemens/kas.git
+    # keep legacy refspec here for testing purposes
+    refspec: master
--
2.39.2

Jan Kiszka

May 17, 2024, 11:58:46 AM
to Felix Moessbauer, kas-...@googlegroups.com, ch...@wiggins.nz, Marek Vasut
On 17.05.24 16:48, 'Felix Moessbauer' via kas-devel wrote:
> From: Marek Vasut <ma...@denx.de>
>
> Add new environment variable KAS_CLONE_DEPTH which adds '--depth=N'
> to the 'git clone' and 'git fetch' commands. This forces git to
> perform shallow clone, which saves bandwidth and CI runner disk
> space. The depth 'N' is derived from KAS_CLONE_DEPTH value.
>
> This is useful in case CI always starts with empty work directory
> and this directory is always discarded after the CI run. In that
> case, it makes no sense to clone the entire repository, instead
> clone just enough to reproduce the desired state of the repository
> and assemble the checkout of it.
>
> This is also useful when cloning massive repositories which would
> otherwise take long time to clone. Shallow cloning is supported for
> specific commits, branches and tags (but disabled on refspec).

And what will happen if I request to update a branch-based shallow clone?

Jan
Siemens AG, Technology
Linux Expert Center

Jan Kiszka

May 22, 2024, 8:06:35 AM
to Felix Moessbauer, kas-...@googlegroups.com, ch...@wiggins.nz, Marek Vasut
On 17.05.24 17:58, 'Jan Kiszka' via kas-devel wrote:
> On 17.05.24 16:48, 'Felix Moessbauer' via kas-devel wrote:
>> From: Marek Vasut <ma...@denx.de>
>>
>> Add new environment variable KAS_CLONE_DEPTH which adds '--depth=N'
>> to the 'git clone' and 'git fetch' commands. This forces git to
>> perform shallow clone, which saves bandwidth and CI runner disk
>> space. The depth 'N' is derived from KAS_CLONE_DEPTH value.
>>
>> This is useful in case CI always starts with empty work directory
>> and this directory is always discarded after the CI run. In that
>> case, it makes no sense to clone the entire repository, instead
>> clone just enough to reproduce the desired state of the repository
>> and assemble the checkout of it.
>>
>> This is also useful when cloning massive repositories which would
>> otherwise take long time to clone. Shallow cloning is supported for
>> specific commits, branches and tags (but disabled on refspec).
>
> And what will happen if I request to update a branch-based shallow clone?
>

What already happens with this: if I request a remote branch that is
not HEAD there and have KAS_CLONE_DEPTH=1, kas checkout will fail.

But the whole concept of shallow clones on branches does not work well,
and I would prefer to reject such requests.

Jan
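
A possible explanation for the failure described above, offered as an
assumption rather than a diagnosis of kas: according to the git-clone
documentation, --depth implies --single-branch, so a shallow clone made
without --branch only contains the branch the remote HEAD points to. The
Python sketch below shows two standard git invocations that make a
non-default branch available in a shallow clone. The helper names are
hypothetical, and this is not necessarily how a later version of the
series handles it.

```python
def shallow_clone_branch_cmd(url, dest, branch, depth=1):
    """Shallow-clone a named branch instead of the remote HEAD."""
    # --branch selects the single branch that --depth restricts the
    # clone to.
    return ['git', 'clone', '-q', '--depth', str(depth),
            '--branch', branch, url, dest]


def shallow_fetch_branch_cmd(branch, depth=1, remote='origin'):
    """Fetch one more branch into an existing shallow clone."""
    # The explicit refspec creates refs/remotes/<remote>/<branch>, which
    # a single-branch clone of a different branch does not contain.
    refspec = f'+refs/heads/{branch}:refs/remotes/{remote}/{branch}'
    return ['git', 'fetch', '-q', '--depth', str(depth), remote, refspec]
```

Either command keeps the history truncated to the requested depth while
making the requested branch tip available for checkout.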

MOESSBAUER, Felix

May 23, 2024, 3:27:13 AM
to Kiszka, Jan, kas-...@googlegroups.com, ma...@denx.de, ch...@wiggins.nz
On Wed, 2024-05-22 at 14:06 +0200, Jan Kiszka wrote:
> On 17.05.24 17:58, 'Jan Kiszka' via kas-devel wrote:
> > On 17.05.24 16:48, 'Felix Moessbauer' via kas-devel wrote:
> > > From: Marek Vasut <ma...@denx.de>
> > >
> > > Add new environment variable KAS_CLONE_DEPTH which adds '--depth=N'
> > > to the 'git clone' and 'git fetch' commands. This forces git to
> > > perform shallow clone, which saves bandwidth and CI runner disk
> > > space. The depth 'N' is derived from KAS_CLONE_DEPTH value.
> > >
> > > This is useful in case CI always starts with empty work directory
> > > and this directory is always discarded after the CI run. In that
> > > case, it makes no sense to clone the entire repository, instead
> > > clone just enough to reproduce the desired state of the repository
> > > and assemble the checkout of it.
> > >
> > > This is also useful when cloning massive repositories which would
> > > otherwise take long time to clone. Shallow cloning is supported for
> > > specific commits, branches and tags (but disabled on refspec).
> >
> > And what will happen if I request to update a branch-based shallow clone?
> >
>
> What does happen with this already: If I request remote branch that is
> not HEAD there and have KAS_CLONE_DEPTH=1, kas checkout will fail.

Yes, I noticed that as well. We need a more holistic approach.

>
> But the whole concept of shallow clones on branches does not go well,
> and I would prefer to reject such requests.

IMHO this is a very valuable - and needed - feature, especially for the
huge OE repos. My v3 of that series (to be sent soon) will fix all
these issues and handle this in a fully transparent way.

Felix