[PATCH] repos: Add KAS_GIT_SHALLOW to implement git shallow clone/fetch

Marek Vasut

Oct 31, 2023, 3:06:15 PM
to kas-...@googlegroups.com, Marek Vasut
Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
to the 'git clone' and 'git fetch' commands. This forces git to
perform shallow clone, which saves bandwidth and CI runner disk
space. The depth 'N' is derived from KAS_GIT_SHALLOW value.

This is useful in case CI always starts with empty work directory
and this directory is always discarded after the CI run. In that
case, it makes no sense to clone the entire repository, instead
clone just enough to reproduce the desired state of the repository
and assemble the checkout of it.

This is also useful when cloning massive repositories which would
otherwise take a long time to clone.

Signed-off-by: Marek Vasut <ma...@denx.de>
---
docs/command-line.rst | 6 ++++++
kas/repos.py | 10 ++++++++++
tests/test_commands.py | 14 ++++++++++++++
tests/test_commands/test-shallow.yml | 9 +++++++++
4 files changed, 39 insertions(+)
create mode 100644 tests/test_commands/test-shallow.yml

diff --git a/docs/command-line.rst b/docs/command-line.rst
index 25be620..b59c722 100644
--- a/docs/command-line.rst
+++ b/docs/command-line.rst
@@ -46,6 +46,12 @@ Environment variables
 |                          | space-separated, its replacement. E.g.:          |
 |                          | "http://.*\.someurl\.io/ http://localmirror.net/"|
 +--------------------------+--------------------------------------------------+
+| ``KAS_GIT_SHALLOW``      | Perform shallow git clone/fetch using --depth=N  |
+|                          | specified by this variable. This is useful in    |
+|                          | case CI always starts with empty work directory  |
+|                          | and this directory is always discarded after the |
+|                          | CI run.                                          |
++--------------------------+--------------------------------------------------+
 | ``SSH_PRIVATE_KEY``      | Variable containing the private key that should  |
 |                          | be added to an internal ssh-agent. This key      |
 |                          | cannot be password protected. This setting is    |
diff --git a/kas/repos.py b/kas/repos.py
index 99205f4..b663f56 100644
--- a/kas/repos.py
+++ b/kas/repos.py
@@ -478,6 +478,11 @@ class GitRepo(RepoImpl):

     def clone_cmd(self, srcdir, createref):
         cmd = ['git', 'clone', '-q']
+
+        depth = os.environ.get('KAS_GIT_SHALLOW')
+        if depth:
+            cmd.extend(['--depth', depth])
+
         if createref:
             cmd.extend([self.effective_url, '--bare', srcdir])
         elif srcdir:
@@ -498,6 +503,11 @@ class GitRepo(RepoImpl):

     def fetch_cmd(self):
         cmd = ['git', 'fetch', '-q']
+
+        depth = os.environ.get('KAS_GIT_SHALLOW')
+        if depth:
+            cmd.extend(['--depth', depth])
+
         if self.tag:
             cmd.append('--tags')

diff --git a/tests/test_commands.py b/tests/test_commands.py
index 42100fe..2dda279 100644
--- a/tests/test_commands.py
+++ b/tests/test_commands.py
@@ -28,6 +28,7 @@ import json
import yaml
import pytest
from kas import kas
+from kas.libkas import run_cmd
from kas.libkas import TaskExecError


@@ -83,6 +84,19 @@ def test_checkout_create_refs(changedir, tmpdir):
     assert os.path.exists('kas/.git/objects/info/alternates')


+def test_checkout_shallow(changedir, tmpdir):
+    tdir = str(tmpdir / 'test_commands')
+    shutil.copytree('tests/test_commands', tdir)
+    os.chdir(tdir)
+    os.environ['KAS_GIT_SHALLOW'] = '1'
+    kas.kas(['checkout', 'test-shallow.yml'])
+    del os.environ['KAS_GIT_SHALLOW']
+    (rc, output) = run_cmd(['git', 'rev-list', '--count', 'HEAD'], cwd='kas',
+                           fail=False, liveupdate=False)
+    assert rc == 0
+    assert output.strip() == '1'
+
+
 def test_repo_includes(changedir, tmpdir):
     tdir = str(tmpdir / 'test_commands')
     shutil.copytree('tests/test_repo_includes', tdir)
diff --git a/tests/test_commands/test-shallow.yml b/tests/test_commands/test-shallow.yml
new file mode 100644
index 0000000..a3128d3
--- /dev/null
+++ b/tests/test_commands/test-shallow.yml
@@ -0,0 +1,9 @@
+header:
+  version: 14
+
+repos:
+  this:
+
+  kas:
+    url: https://github.com/siemens/kas.git
+    commit: master
--
2.42.0

MOESSBAUER, Felix

Nov 1, 2023, 2:24:03 AM
to ma...@denx.de, kas-...@googlegroups.com
On Tue, 2023-10-31 at 20:05 +0100, Marek Vasut wrote:
> Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
> to the 'git clone' and 'git fetch' commands. This forces git to
> perform shallow clone, which saves bandwidth and CI runner disk
> space. The depth 'N' is derived from KAS_GIT_SHALLOW value.

Hi Marek,

thanks for bringing this up. This is indeed a valuable addition.

For me, the name KAS_GIT_SHALLOW suggests it is a boolean variable,
while it actually is an integer. How about KAS_GIT_DEPTH?

>
> This is useful in case CI always starts with empty work directory
> and this directory is always discarded after the CI run. In that
> case, it makes no sense to clone the entire repository, instead
> clone just enough to reproduce the desired state of the repository
> and assemble the checkout of it.

When talking about the CI: Shall we auto-inherit this value from the
GIT_DEPTH variable that is set in the gitlab-ci? This would
automatically reduce the download size, but on the other hand it might
break existing projects. I'm unsure about it.
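
A minimal sketch of what such inheritance could look like, assuming a
dedicated kas variable takes precedence over GitLab CI's GIT_DEPTH. The
helper name resolve_clone_depth is hypothetical, and treating 0 as
"full clone" is an assumption modelled on GitLab's handling of GIT_DEPTH,
not existing kas behaviour:

```python
import os


def resolve_clone_depth(environ=None):
    """Return the shallow-clone depth as a positive int, or None for a full clone.

    Hypothetical helper, not part of kas. KAS_GIT_DEPTH (the name proposed
    in this thread) wins over GIT_DEPTH as set by GitLab CI.
    """
    environ = os.environ if environ is None else environ
    for name in ('KAS_GIT_DEPTH', 'GIT_DEPTH'):
        value = environ.get(name)
        if not value:
            continue
        depth = int(value)  # raises ValueError on non-numeric input
        # Assumption: a depth of 0 means "do not clone shallow".
        return depth if depth > 0 else None
    return None
```

Projects that rely on full history would still have to opt out explicitly,
which is the compatibility risk raised above.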

With this implementation, it is not possible to perform a shallow clone
of a non-default branch (except if the branch's tip is within the N
commits we cloned). If we have the branch information from the kas
file, we should also append it to the clone command in this case (git
clone --depth <N> <REPO> -b foo).
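
A rough sketch of that command assembly, written as a standalone function
rather than as a change to GitRepo.clone_cmd. The function name and its
parameters are made up for illustration:

```python
def build_clone_cmd(url, dest, depth=None, branch=None):
    """Assemble a 'git clone' argument list (hypothetical helper, not kas code)."""
    cmd = ['git', 'clone', '-q']
    if depth:
        cmd.extend(['--depth', str(depth)])
        if branch:
            # A shallow clone only fetches the tip of a single branch,
            # so the wanted branch has to be named explicitly.
            cmd.extend(['--branch', branch])
    cmd.extend([url, dest])
    return cmd


if __name__ == '__main__':
    print(build_clone_cmd('https://example.com/repo.git', 'repo',
                          depth=1, branch='foo'))
```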


>          if createref:
>              cmd.extend([self.effective_url, '--bare', srcdir])
>          elif srcdir:
> @@ -498,6 +503,11 @@ class GitRepo(RepoImpl):
>  
>      def fetch_cmd(self):
>          cmd = ['git', 'fetch', '-q']
> +
> +        depth = os.environ.get('KAS_GIT_SHALLOW')
> +        if depth:
> +            cmd.extend(['--depth', depth])
> +

Same here, we need to add the branch information.

In KAS we already have support for reference repos to cache the
download on the CI runners themselves (see KAS_REPO_REF_DIR for details).
How does this play together with shallow clones? Can we add a test for
that as well?

Best regards,
Felix

Jan Kiszka

Nov 1, 2023, 11:38:51 AM
to MOESSBAUER, Felix, ma...@denx.de, kas-...@googlegroups.com
On 01.11.23 07:23, 'MOESSBAUER, Felix' via kas-devel wrote:
> On Tue, 2023-10-31 at 20:05 +0100, Marek Vasut wrote:
>> Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
>> to the 'git clone' and 'git fetch' commands. This forces git to
>> perform shallow clone, which saves bandwidth and CI runner disk
>> space. The depth 'N' is derived from KAS_GIT_SHALLOW value.
>
> Hi Marek,
>
> thanks for bringing this up. This is indeed a valuable addition.
>
> For me, the name KAS_GIT_SHALLOW suggests it is a boolean variable,
> while it actually is an integer. How about KAS_GIT_DEPTH?
>

I'm wondering what the advantages are of having this as an environment
variable. Why not add a command line option to kas?

Furthermore, I'm wondering if this will not explode when the referenced
commit is outside of the specified depth - which can happen at any time.
How would you manage that in practice? Apparently, this is also the
reason why bitbake itself does not support shallow clones.

Jan

--
Siemens AG, Technology
Linux Expert Center

Marek Vasut

Nov 1, 2023, 12:13:48 PM
to Jan Kiszka, MOESSBAUER, Felix, kas-...@googlegroups.com
On 11/1/23 16:38, Jan Kiszka wrote:
> On 01.11.23 07:23, 'MOESSBAUER, Felix' via kas-devel wrote:
>> On Tue, 2023-10-31 at 20:05 +0100, Marek Vasut wrote:
>>> Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
>>> to the 'git clone' and 'git fetch' commands. This forces git to
>>> perform shallow clone, which saves bandwidth and CI runner disk
>>> space. The depth 'N' is derived from KAS_GIT_SHALLOW value.
>>
>> Hi Marek,
>>
>> thanks for bringing this up. This is indeed a valuable addition.
>>
>> For me, the name KAS_GIT_SHALLOW suggests it is a boolean variable,
>> while it actually is an integer. How about KAS_GIT_DEPTH?
>>
>
> I'm wondering what the advantages are of having this as environment
> variable. Why not add a command line option to kas?

So, what is the preference with kas, variables or command line options ?

> Furthermore, I'm wondering if this will not explode when the referenced
> commit is outside of the specified depth - which can happen at any time.

It will blow up. I don't think you can clone a specific commit out of a
git repo with --depth=1 (at least not right now), so this would have to
be limited to branches. And then, git clone --depth=1 for a branch means
the topmost commit of that branch.

I can imagine one can create an empty local repo, set a remote in it,
fetch a specific commit from that remote with --depth=1, and then
check out that commit into a local branch to work around the aforementioned
git clone limitation.
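
The workaround described above could be sketched as the following command
sequence. The helper name and the default branch name 'kas-checkout' are
hypothetical. Fetching by SHA only works if the server permits it (see the
uploadpack.allowReachableSHA1InWant and uploadpack.allowAnySHA1InWant
settings), so this is an illustration, not a guaranteed recipe:

```python
def shallow_commit_checkout_cmds(url, commit, branch='kas-checkout'):
    """Return git commands to run, in order, inside an empty target directory.

    Hypothetical helper: it only builds the argument lists, it does not
    execute anything.
    """
    return [
        ['git', 'init', '-q'],
        ['git', 'remote', 'add', 'origin', url],
        # Fetch exactly one commit; the server must allow fetching by SHA.
        ['git', 'fetch', '-q', '--depth', '1', 'origin', commit],
        # FETCH_HEAD points at the commit that was just fetched.
        ['git', 'checkout', '-q', '-b', branch, 'FETCH_HEAD'],
    ]


if __name__ == '__main__':
    for cmd in shallow_commit_checkout_cmds('https://example.com/repo.git',
                                            '0123abc'):
        print(' '.join(cmd))
```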

Marek Vasut

Nov 1, 2023, 7:18:25 PM
to MOESSBAUER, Felix, kas-...@googlegroups.com
On 11/1/23 07:23, MOESSBAUER, Felix wrote:
> On Tue, 2023-10-31 at 20:05 +0100, Marek Vasut wrote:
>> Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
>> to the 'git clone' and 'git fetch' commands. This forces git to
>> perform shallow clone, which saves bandwidth and CI runner disk
>> space. The depth 'N' is derived from KAS_GIT_SHALLOW value.
>
> Hi Marek,

Hi,

> thanks for bringing this up. This is indeed a valuable addition.
>
> For me, the name KAS_GIT_SHALLOW suggests it is a boolean variable,

It refers to git clone --shallow , but depth is fine all the same.

> while it actually is an integer. How about KAS_GIT_DEPTH?

Will do in V2

>> This is useful in case CI always starts with empty work directory
>> and this directory is always discarded after the CI run. In that
>> case, it makes no sense to clone the entire repository, instead
>> clone just enough to reproduce the desired state of the repository
>> and assemble the checkout of it.
>
> When talking about the CI: Shall we auto-inherit this value from the
> GIT_DEPTH variable that is set in the gitlab-ci? This would
> automatically reduce the download size, but on the other hand it might
> break existing projects. I'm unsure about it.

I'd probably leave that one-liner gitlab-ci.yml tweak up to users.

I can imagine how such behavior could be undesired for some users, e.g.
in case they are doing some statistics on the git history.

Is this what you have in mind?

diff --git a/kas/repos.py b/kas/repos.py
index dfb2046..66ec3b1 100644
--- a/kas/repos.py
+++ b/kas/repos.py
@@ -482,6 +482,10 @@ class GitRepo(RepoImpl):
depth = os.environ.get('KAS_GIT_DEPTH')
if depth:
cmd.extend(['--depth', depth])
+ branch = self.branch
+ if branch and branch.startswith('refs/'):
+ branch = 'remotes/origin/' + self.remove_ref_prefix(branch)
+ cmd.extend(['-b', branch])

if createref:
cmd.extend([self.effective_url, '--bare', srcdir])

What about refspec: ? I don't think there is a single magic git-clone
command to clone one specific commit SHA into a local directory.

I added a test. Those two things are complementary; it basically boils
down to:
$ git clone --depth 1 --reference /local/path/ proto://remote/repo
Does it not?

Jan Kiszka

Nov 2, 2023, 2:27:49 AM
to Marek Vasut, MOESSBAUER, Felix, kas-...@googlegroups.com
On 01.11.23 17:13, Marek Vasut wrote:
> On 11/1/23 16:38, Jan Kiszka wrote:
>> On 01.11.23 07:23, 'MOESSBAUER, Felix' via kas-devel wrote:
>>> On Tue, 2023-10-31 at 20:05 +0100, Marek Vasut wrote:
>>>> Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
>>>> to the 'git clone' and 'git fetch' commands. This forces git to
>>>> perform shallow clone, which saves bandwidth and CI runner disk
>>>> space. The depth 'N' is derived from KAS_GIT_SHALLOW value.
>>>
>>> Hi Marek,
>>>
>>> thanks for bringing this up. This is indeed a valuable addition.
>>>
>>> For me, the name KAS_GIT_SHALLOW suggests it is a boolean variable,
>>> while it actually is an integer. How about KAS_GIT_DEPTH?
>>>
>>
>> I'm wondering what the advantages are of having this as environment
>> variable. Why not add a command line option to kas?
>
> So, what is the preference with kas, variables or command line options ?
>

Things we want to "sneak" into an existing configuration and invocation
of KAS and that are environment-specific should go into vars. The rest
into command line options or even the configs themselves.

>> Furthermore, I'm wondering if this will not explode when the referenced
>> commit is outside of the specified depth - which can happen at any time.
>
> It will blow up, I don't think you can clone a specific commit out of a
> git repo with --depth=1 (at least not right now), so this would have to
> be limited to branches . And then, git clone --depth=1 for branch means
> the topmost commit of that branch .
>

In that light, you must reject this option when the repos are not using
'commit'. And that actually raises the question if this switch shouldn't
become part of the config schema so that we can enforce that dependency
already at that level.

MOESSBAUER, Felix

Nov 4, 2023, 5:05:46 AM
to ma...@denx.de, Kiszka, Jan, kas-...@googlegroups.com
I don't like the idea of adding this to the config. In the end, whether
to use shallow clones is something the user should decide, not the
project.

I propose the following:

- add it as command line option, which globally enables it for all
(git) repos (including the ones with 'commit' only).
- in case of 'commit' repositories issue a warning that this should be
migrated to branch / tag + lockfile. That way, transitive dependencies
which do not yet use lockfiles do not make it impossible to clone
shallow.
- all is "best-effort" only. This is commonly accepted across git
users. Even git itself behaves like that: When using git log on a
single path, the log message might just show the latest available
commit instead of the commit that actually changed it.
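
As a sketch of the proposal above: a global option that enables shallow
cloning, plus a warning for repos pinned by 'commit' only. The option name
--git-depth, the program name kas-sketch, the function names and the dict
layout of repos are all assumptions for illustration, not the actual kas
interface:

```python
import argparse
import logging


def parse_args(argv):
    parser = argparse.ArgumentParser(prog='kas-sketch')
    # '--git-depth' is a made-up option name for this sketch.
    parser.add_argument('--git-depth', type=int, default=None, metavar='N',
                        help='shallow-clone all git repos with depth N')
    return parser.parse_args(argv)


def warn_commit_only(repos, depth):
    """Warn about repos that set 'commit' without 'branch' or 'tag'.

    repos maps a repo name to its config dict. Returns the sorted names
    that triggered a warning so callers can inspect them.
    """
    if depth is None:
        return []
    names = sorted(name for name, cfg in repos.items()
                   if cfg.get('commit')
                   and not (cfg.get('branch') or cfg.get('tag')))
    for name in names:
        logging.getLogger(__name__).warning(
            "repo '%s' uses 'commit' only; a shallow clone may miss it, "
            "consider branch/tag plus a lockfile", name)
    return names
```

Nothing is rejected here, which matches the "best-effort" stance: the clone
is still attempted and may fail later if the commit is out of reach.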

Best regards,
Felix

Jan Kiszka

Nov 4, 2023, 6:10:53 AM
to MOESSBAUER, Felix (T CED INW-CN), ma...@denx.de, kas-...@googlegroups.com
Shallow only makes sense if 'branch' is present and 'commit' absent. Any
other usage should be rejected as broken ('commit') or at least fragile.
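
A minimal sketch of such a strict check, in contrast to a warning. The
exception class, the function name and the dict layout are hypothetical:

```python
class ShallowConfigError(ValueError):
    """Raised when a repo config cannot be cloned shallow reliably."""


def check_shallow_allowed(name, repo, depth):
    """Reject shallow cloning unless 'branch' is set and 'commit' is not.

    Hypothetical helper: repo is the config dict of one repo, depth is the
    requested clone depth or None for a full clone.
    """
    if depth is None:
        return  # full clone, nothing to enforce
    if repo.get('commit'):
        raise ShallowConfigError(
            "repo '%s': 'commit' may lie outside depth %d" % (name, depth))
    if not repo.get('branch'):
        raise ShallowConfigError(
            "repo '%s': shallow clone requires 'branch'" % name)
```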

Marek Vasut

Nov 6, 2023, 8:36:51 AM
to Jan Kiszka, MOESSBAUER, Felix, kas-...@googlegroups.com
On 11/2/23 07:27, Jan Kiszka wrote:
> On 01.11.23 17:13, Marek Vasut wrote:
>> On 11/1/23 16:38, Jan Kiszka wrote:
>>> On 01.11.23 07:23, 'MOESSBAUER, Felix' via kas-devel wrote:
>>>> On Tue, 2023-10-31 at 20:05 +0100, Marek Vasut wrote:
>>>>> Add new environment variable KAS_GIT_SHALLOW which adds '--depth=N'
>>>>> to the 'git clone' and 'git fetch' commands. This forces git to
>>>>> perform shallow clone, which saves bandwidth and CI runner disk
>>>>> space. The depth 'N' is derived from KAS_GIT_SHALLOW value.
>>>>
>>>> Hi Marek,
>>>>
>>>> thanks for bringing this up. This is indeed a valuable addition.
>>>>
>>>> For me, the name KAS_GIT_SHALLOW suggests it is a boolean variable,
>>>> while it actually is an integer. How about KAS_GIT_DEPTH?
>>>>
>>>
>>> I'm wondering what the advantages are of having this as environment
>>> variable. Why not add a command line option to kas?
>>
>> So, what is the preference with kas, variables or command line options ?
>>
>
> Things we want to "sneak" into an existing configuration and invocation
> of KAS and that are environment-specific should go into vars. The rest
> into command line options or even the configs themselves.

Maybe this should be per-repo setting in the yaml file.

>>> Furthermore, I'm wondering if this will not explode when the referenced
>>> commit is outside of the specified depth - which can happen at any time.
>>
>> It will blow up, I don't think you can clone a specific commit out of a
>> git repo with --depth=1 (at least not right now), so this would have to
>> be limited to branches . And then, git clone --depth=1 for branch means
>> the topmost commit of that branch .
>>
>
> In that light, you must reject this option when the repos are not using
> 'commit'.

... are yes using commit ..., right ?

> And that actually raises the question if this switch shouldn't
> become part of the config schema so that we can enforce that dependency
> already at that level.

Probably. I'll revisit this when time permits.

Jan Kiszka

Nov 6, 2023, 9:20:55 AM
to Marek Vasut, MOESSBAUER, Felix, kas-...@googlegroups.com
Yes, bit flip between brain and keyboard.

Jan

>> And that actually raises the question if this switch shouldn't
>> become part of the config schema so that we can enforce that dependency
>> already at that level.
>
> Probably. I'll revisit this when time permits.
