sbuild race with sporadic build breakages

60 views
Skip to first unread message

Jan Kiszka

unread,
May 23, 2024, 1:09:06 PM5/23/24
to isar-users, Uladzimir Bely
Hi all,

I think I've seen it before but didn't look closer so far:

2024-05-23 15:16:22 - INFO - Log data follows:
2024-05-23 15:16:22 - INFO - | DEBUG: Executing python function sstate_task_prefunc
2024-05-23 15:16:22 - INFO - | DEBUG: Python function sstate_task_prefunc finished
2024-05-23 15:16:22 - INFO - | DEBUG: Executing python function do_dpkg_build
2024-05-23 15:16:22 - INFO - | DEBUG: Executing shell function dpkg_schroot_create_configs
2024-05-23 15:16:22 - INFO - | Creating /etc/schroot/chroot.d/isar-builder-3cef593b-e753-4463-a544-cea3d14532e2-39837
2024-05-23 15:16:22 - INFO - | DEBUG: Shell function dpkg_schroot_create_configs finished
2024-05-23 15:16:22 - INFO - | DEBUG: Executing shell function dpkg_runbuild
2024-05-23 15:16:22 - INFO - | sbuild (Debian sbuild) 0.85.0 (04 January 2023) on runner-czqurwfpg-project-10191315-concurrent-25-hauy0e5m
2024-05-23 15:16:22 - INFO - |
2024-05-23 15:16:22 - INFO - | +==============================================================================+
2024-05-23 15:16:22 - INFO - | | linux-cip-rt 5.10.214-cip46-rt19+r0 (arm64) Thu, 23 May 2024 15:16:21 +0000 |
2024-05-23 15:16:22 - INFO - | +==============================================================================+
2024-05-23 15:16:22 - INFO - |
2024-05-23 15:16:22 - INFO - | Package: linux-cip-rt
2024-05-23 15:16:22 - INFO - | Version: 5.10.214-cip46-rt19+r0
2024-05-23 15:16:22 - INFO - | Source Version: 5.10.214-cip46-rt19+r0
2024-05-23 15:16:22 - INFO - | Distribution: isar
2024-05-23 15:16:22 - INFO - | Machine Architecture: amd64
2024-05-23 15:16:22 - INFO - | Host Architecture: arm64
2024-05-23 15:16:22 - INFO - | Build Architecture: amd64
2024-05-23 15:16:22 - INFO - | Build Profiles: kernel debian nolibcdev cross
2024-05-23 15:16:22 - INFO - | Build Type: binary
2024-05-23 15:16:22 - INFO - |
2024-05-23 15:16:22 - INFO - | E: /etc/schroot/chroot.d/isar-builder-3cef593b-e753-4463-a544-cea3d14532e2-41541: Failed to stat file: No such file or directory
2024-05-23 15:16:22 - INFO - | E: debconf-set-selections failed
2024-05-23 15:16:22 - INFO - | Failed to set up chroot
2024-05-23 15:16:22 - INFO - | E: Error creating chroot session: skipping linux-cip-rt
(https://gitlab.com/cip-project/cip-core/isar-cip-core/-/jobs/6923376323)

So, we are not finding in dpkg_runbuild the schroot we created just
before. Looking at why the the former may look for a different schroot,
this jumped at me:

SBUILD_CHROOT ?= "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${@os.getpid()}"

We are using some pid of the calling process here, and we assume that
this pid is only taken once (or at least over the same process) so that
there are no difference in SBUILD_CHROOT down the line. But if you look
at the log above, that seems to have happened: the last number is
different.

I'm not yet understanding why the evaluation may happen multiple times
and also why it then only rarely causes this, but I also wonder if we
shouldn't better enforce an immediate, single evaluation of os.getpid to
avoid that risk.

Along this, I wonder why we make all those variables here weak. Is there
really a use case for overwriting them? Not in-tree, apparently.

Jan

--
Siemens AG, Technology
Linux Expert Center

MOESSBAUER, Felix

unread,
May 24, 2024, 3:26:30 AM5/24/24
to ub...@ilbers.de, isar-...@googlegroups.com, Kiszka, Jan, Schmidt, Adriaan
During parsing, variables are evaluated multiple times, e.g. because
they happen in different branches of the parsing operation. The problem
here might be the lazy expansion of the variable, though. By that, the
SBUILD_CHROOT variable is expanded when the final value is needed (e.g.
via a d.getVar()). If the SBUILD_CHROOT is used in various context, the
expansion also happens multiple times. We should better use immediate
expansion here (at least for the getpid part).

CC'ing Adriaan as he is more familiar with the parser.

Felix

Anton Mikanovich

unread,
May 24, 2024, 4:08:06 AM5/24/24
to Jan Kiszka, isar-users, Uladzimir Bely
Hello Jan,

We've already faced with this floating issue and made some investigations.
As you mentioned, sbuild is using SBUILD_CHROOT value with PID of calling
process (which is locked to the task and shouldn't be changed during the
operation). But in some cases sbuild (which was started with correct
PID) got
other incorrect value. It can happen on different steps of the package build
process, even on cleanup.
The most interesting finding is that this new incorrect PID is not just
random,
but it is a PID of other sbuild task executed on the same time in
parallel. It
means the issue is not in SBUILD_CHROOT expanding, because variables are
not sharing between different tasks.

Jan Kiszka

unread,
May 24, 2024, 4:52:31 AM5/24/24
to Anton Mikanovich, isar-users, Uladzimir Bely
So, the variable may actually also not be evaluated per task but at some
point centrally? Then this is even more broken. Let's just use something
like a task hash here, stable and still unique inside a run.

Anton Mikanovich

unread,
May 24, 2024, 5:30:49 AM5/24/24
to Jan Kiszka, isar-users, Uladzimir Bely
24/05/2024 11:52, Jan Kiszka wrote:
> So, the variable may actually also not be evaluated per task but at some
> point centrally? Then this is even more broken. Let's just use something
> like a task hash here, stable and still unique inside a run.
>
> Jan
>
Sbuild is getting SBUILD_CHROOT value only once via command line option.
And it is already calculated and expanded at this point.
So no meter which keys will be used for SBUILD_CHROOT path the issue will
probably be there: tasks paths still will be different and sbuild still will
failing in case getting the path value of other wrong task.

Jan Kiszka

unread,
May 24, 2024, 6:42:21 AM5/24/24
to Anton Mikanovich, isar-users, Uladzimir Bely
On 24.05.24 11:30, Anton Mikanovich wrote:
> 24/05/2024 11:52, Jan Kiszka wrote:
>> So, the variable may actually also not be evaluated per task but at some
>> point centrally? Then this is even more broken. Let's just use something
>> like a task hash here, stable and still unique inside a run.
>>
>> Jan
>>
> Sbuild is getting SBUILD_CHROOT value only once via command line option.
> And it is already calculated and expanded at this point.

And that is where I'm not so sure, given the problems we see.

> So no meter which keys will be used for SBUILD_CHROOT path the issue will
> probably be there: tasks paths still will be different and sbuild still
> will
> failing in case getting the path value of other wrong task.
>

Why speaks against this?

diff --git a/meta/classes/sbuild.bbclass b/meta/classes/sbuild.bbclass
index f1193c20..3f27a2ef 100644
--- a/meta/classes/sbuild.bbclass
+++ b/meta/classes/sbuild.bbclass
@@ -7,7 +7,7 @@ SCHROOT_MOUNTS ?= ""

inherit crossvars

-SBUILD_CHROOT ?= "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${@os.getpid()}"
+SBUILD_CHROOT ?= "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${BB_TASKHASH}"

SBUILD_CONF_DIR ?= "${SCHROOT_CONF}/${SBUILD_CHROOT}"
SCHROOT_CONF_FILE ?= "${SCHROOT_CONF}/chroot.d/${SBUILD_CHROOT}"


We definitely need to get rid of that getpid.

Anton Mikanovich

unread,
May 24, 2024, 7:55:22 AM5/24/24
to Jan Kiszka, isar-users, Uladzimir Bely, Baurzhan Ismagulov
24/05/2024 13:42, Jan Kiszka wrote:
> Why speaks against this?
>
> diff --git a/meta/classes/sbuild.bbclass b/meta/classes/sbuild.bbclass
> index f1193c20..3f27a2ef 100644
> --- a/meta/classes/sbuild.bbclass
> +++ b/meta/classes/sbuild.bbclass
> @@ -7,7 +7,7 @@ SCHROOT_MOUNTS ?= ""
>
> inherit crossvars
>
> -SBUILD_CHROOT ?= "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${@os.getpid()}"
> +SBUILD_CHROOT ?= "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${BB_TASKHASH}"
>
> SBUILD_CONF_DIR ?= "${SCHROOT_CONF}/${SBUILD_CHROOT}"
> SCHROOT_CONF_FILE ?= "${SCHROOT_CONF}/chroot.d/${SBUILD_CHROOT}"
>
>
> We definitely need to get rid of that getpid.
>
> Jan
>
Ok, I can prepare a patch and we can try.
The only point we shouldn't loose is task-taskhash mapping, but hope it will
be ok with the existing logging of schroot session during its creation.

Here is one of the logs we've caught on our CI.
The first log output of session ID (with correct PID) is done by Isar.
The second one (also correct) is done by sbuild process later.
And the third (incorrect) log output is done also by the same sbuild process
when the issue happened.
Log:

ERROR: Logfile of failure stored in:
/workspace/build-auto/isar_master/1364/build/tmp/work/debian-bookworm-mipsel/hello-isar/0.5-b48d156-r0/temp/log.do_dpkg_build.3652871
ERROR: Task
(mc:qemumipsel-bookworm:/workspace/build-auto/isar_master/1364/meta-isar/recipes-app/hello-isar/hello-isar.bb:do_dpkg_build)
failed with exit code '1'
| DEBUG: Executing python function sstate_task_prefunc
| DEBUG: Python function sstate_task_prefunc finished
| DEBUG: Executing python function do_dpkg_build
| DEBUG: Executing shell function dpkg_schroot_create_configs
| Creating
/etc/schroot/chroot.d/isar-jenkins-6c34c167-bdc8-498f-828a-50cd88b412b0-3652871
| DEBUG: Shell function dpkg_schroot_create_configs finished
| DEBUG: Executing shell function dpkg_runbuild
| sbuild (Debian sbuild) 0.85.0 (04 January 2023) on qaton.m.ilbers.de
|
|
+==============================================================================+
| | hello-isar 0.5 (mipsel)                      Wed, 15 May 2024
04:12:02 +0000 |
|
+==============================================================================+
|
| Package: hello-isar
| Version: 0.5
| Source Version: 0.5
| Distribution: isar
| Machine Architecture: amd64
| Host Architecture: mipsel
| Build Architecture: mipsel
| Build Type: binary
|
| I: NOTICE: Log filtering will replace
'var/run/schroot/mount/isar-jenkins-6c34c167-bdc8-498f-828a-50cd88b412b0-3652871-9097eeed-6320-4cf9-b2b9-229b1d1fe5a0'
with '<<CHROOT>>'
|
|
+------------------------------------------------------------------------------+
| | Chroot Setup
Commands                                                        |
|
+------------------------------------------------------------------------------+
|
|
| echo "Package: *\nPin: release n=isar\nPin-Priority: 1000" >
/etc/apt/preferences.d/isar-apt
|
--------------------------------------------------------------------------------------------
|
|
| I: Finished running 'echo "Package: *\nPin: release
n=isar\nPin-Priority: 1000" > /etc/apt/preferences.d/isar-apt'.
|
| echo "APT::Get::allow-downgrades 1;" > /etc/apt/apt.conf.d/50isar-apt
| ---------------------------------------------------------------------
|
|
| I: Finished running 'echo "APT::Get::allow-downgrades 1;" >
/etc/apt/apt.conf.d/50isar-apt'.
|
| rm -f /var/log/dpkg.log
| -----------------------
|
|
| I: Finished running 'rm -f /var/log/dpkg.log'.
|
| mkdir -p /var/cache/apt/archives
| --------------------------------
|
|
| I: Finished running 'mkdir -p /var/cache/apt/archives'.
|
| find /home/builder/hello-isar/rootfs/var/cache/apt/archives -maxdepth
1 -name '*.deb' -exec ln -t /var/cache/apt/archives/ -sf {} +
|
-----------------------------------------------------------------------------------------------------------------------------------
|
|
| I: Finished running 'find
/home/builder/hello-isar/rootfs/var/cache/apt/archives -maxdepth 1 -name
'*.deb' -exec ln -t /var/cache/apt/archives/ -sf {} +'.
|
| apt-get update -o Dir::Etc::SourceList="sources.list.d/isar-apt.list"
-o Dir::Etc::SourceParts="-" -o APT::Get::List-Cleanup="0"
|
--------------------------------------------------------------------------------------------------------------------------------
|
| Get:1 file:/isar-apt isar InRelease
| Ign:1 file:/isar-apt isar InRelease
| Get:2 file:/isar-apt isar Release [5102 B]
| Get:2 file:/isar-apt isar Release [5102 B]
| Get:3 file:/isar-apt isar Release.gpg
| Ign:3 file:/isar-apt isar Release.gpg
| Get:4 file:/isar-apt isar/main Sources [3759 B]
| Get:5 file:/isar-apt isar/main mipsel Packages [3187 B]
| Reading package lists...
|
| I: Finished running 'apt-get update -o
Dir::Etc::SourceList="sources.list.d/isar-apt.list" -o
Dir::Etc::SourceParts="-" -o APT::Get::List-Cleanup="0"'.
|
| Finished processing commands.
|
--------------------------------------------------------------------------------
| I: NOTICE: Log filtering will replace
'build/hello-isar-gmVcne/resolver-6OF4Lt' with '<<RESOLVERDIR>>'
|
|
+------------------------------------------------------------------------------+
| | Update chroot |
|
+------------------------------------------------------------------------------+
|
| Reading package lists...
| Building dependency tree...
| Reading state information...
| Calculating upgrade...
| 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
|
|
+------------------------------------------------------------------------------+
| | Fetch source
files                                                           |
|
+------------------------------------------------------------------------------+
|
|
| Local sources
| -------------
|
|
/workspace/build-auto/isar_master/1364/build/tmp/work/debian-bookworm-mipsel/hello-isar/0.5-b48d156-r0/hello-isar_0.5.dsc
exists in
/workspace/build-auto/isar_master/1364/build/tmp/work/debian-bookworm-mipsel/hello-isar/0.5-b48d156-r0;
copying to chroot
| I: NOTICE: Log filtering will replace
'build/hello-isar-gmVcne/hello-isar-0.5' with '<<PKGBUILDDIR>>'
| I: NOTICE: Log filtering will replace 'build/hello-isar-gmVcne' with
'<<BUILDDIR>>'
|
|
+------------------------------------------------------------------------------+
| | Install package build
dependencies                                           |
|
+------------------------------------------------------------------------------+
|
|
| Setup apt archive
| -----------------
|
| E:
/etc/schroot/chroot.d/isar-jenkins-6c34c167-bdc8-498f-828a-50cd88b412b0-2897105:
Failed to stat file: No such file or directory
| E: Can't mkdir /<<RESOLVERDIR>>/sbuild-build-depends-main-dummy: No
such file or directory
| W: Could not create build-depends dummy dir
/<<RESOLVERDIR>>/sbuild-build-depends-main-dummyNo such file or
directoryE: Setting up apt archive failed

Jan Kiszka

unread,
Jun 17, 2024, 3:44:56 AM6/17/24
to Anton Mikanovich, isar-users, Uladzimir Bely, Baurzhan Ismagulov
On 24.05.24 13:55, Anton Mikanovich wrote:
> 24/05/2024 13:42, Jan Kiszka wrote:
>> Why speaks against this?
>>
>> diff --git a/meta/classes/sbuild.bbclass b/meta/classes/sbuild.bbclass
>> index f1193c20..3f27a2ef 100644
>> --- a/meta/classes/sbuild.bbclass
>> +++ b/meta/classes/sbuild.bbclass
>> @@ -7,7 +7,7 @@ SCHROOT_MOUNTS ?= ""
>>     inherit crossvars
>>   -SBUILD_CHROOT ?=
>> "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${@os.getpid()}"
>> +SBUILD_CHROOT ?=
>> "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${BB_TASKHASH}"
>>     SBUILD_CONF_DIR ?= "${SCHROOT_CONF}/${SBUILD_CHROOT}"
>>   SCHROOT_CONF_FILE ?= "${SCHROOT_CONF}/chroot.d/${SBUILD_CHROOT}"
>>
>>
>> We definitely need to get rid of that getpid.
>>
>> Jan
>>
> Ok, I can prepare a patch and we can try.
> The only point we shouldn't loose is task-taskhash mapping, but hope it
> will
> be ok with the existing logging of schroot session during its creation.
>

What's the status of this? Just triggered this bug once again over isar
next, reminding me of it.

Baurzhan Ismagulov

unread,
Jun 17, 2024, 4:32:53 AM6/17/24
to isar-users, Jan Kiszka
On 2024-06-17 09:44, Jan Kiszka wrote:
> On 24.05.24 13:55, Anton Mikanovich wrote:
> > 24/05/2024 13:42, Jan Kiszka wrote:
> >> Why speaks against this?
> >>
> >> diff --git a/meta/classes/sbuild.bbclass b/meta/classes/sbuild.bbclass
> >> index f1193c20..3f27a2ef 100644
> >> --- a/meta/classes/sbuild.bbclass
> >> +++ b/meta/classes/sbuild.bbclass
> >> @@ -7,7 +7,7 @@ SCHROOT_MOUNTS ?= ""
> >>     inherit crossvars
> >>   -SBUILD_CHROOT ?=
> >> "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${@os.getpid()}"
> >> +SBUILD_CHROOT ?=
> >> "${DEBDISTRONAME}-${SCHROOT_USER}-${ISAR_BUILD_UUID}-${BB_TASKHASH}"
> >>     SBUILD_CONF_DIR ?= "${SCHROOT_CONF}/${SBUILD_CHROOT}"
> >>   SCHROOT_CONF_FILE ?= "${SCHROOT_CONF}/chroot.d/${SBUILD_CHROOT}"
> >>
> >> We definitely need to get rid of that getpid.
> >>
> > Ok, I can prepare a patch and we can try.
> > The only point we shouldn't loose is task-taskhash mapping, but hope it
> > will
> > be ok with the existing logging of schroot session during its creation.
>
> What's the status of this? Just triggered this bug once again over isar
> next, reminding me of it.

Anton will test and send the patch these days.

We are still analyzing the root cause. The issue is not reliably reproducible.
Given that bitbake differentiates tasks by pid in run.do_* and log.do_* files,
I think the pid itself should be fine. The mechanism of getting the pid could
have something, although I'm wondering what kind of issues @os.getpid() could
have.

Also, Anton's log of 2024-05-24 shows spurious ID change inside a signle sbuild
call:

ERROR: Logfile of failure stored in: /workspace/build-auto/isar_master/1364/build/tmp/work/debian-bookworm-mipsel/hello-isar/0.5-b48d156-r0/temp/log.do_dpkg_build.3652871
...
| I: NOTICE: Log filtering will replace 'var/run/schroot/mount/isar-jenkins-6c34c167-bdc8-498f-828a-50cd88b412b0-3652871-9097eeed-6320-4cf9-b2b9-229b1d1fe5a0' with '<<CHROOT>>'
...
| E: /etc/schroot/chroot.d/isar-jenkins-6c34c167-bdc8-498f-828a-50cd88b412b0-2897105: Failed to stat file: No such file or directory

And this is not a string corruption, this directory did actually exist at some
point. So looks rather like sbuild having a list of sessions and picking the
wrong one at some point.

By saying "We definitely need to get rid of that getpid", what kind of scenario
do you have in mind?

With kind regards,
Baurzhan

Jan Kiszka

unread,
Jun 17, 2024, 8:55:16 AM6/17/24
to isar-users
I think the issue is when the PID is collected into variables. From the
logs, this does not work reliable, therefore the mismatch and the failure.

Anton Mikanovich

unread,
Jun 17, 2024, 10:12:09 AM6/17/24
to Jan Kiszka, isar-users
17/06/2024 15:55, 'Jan Kiszka' via isar-users wrote:
> I think the issue is when the PID is collected into variables. From the
> logs, this does not work reliable, therefore the mismatch and the failure.
>
> Jan
>
I can't agree with that hypothesis because even if SBUILD_CHROOT will be
changed for any reason there is no way for sbuild to reread it from Bitbake
if sbuild was already started.
Anyway I've sent the patch so we can try to see if it change anything.

Jan Kiszka

unread,
Jun 17, 2024, 10:50:56 AM6/17/24
to Anton Mikanovich, isar-users
We have split functions here, schroot_create_configs vs. dpkg_runbuild.
It's not that things need to change within the very same sbuild session.

Uladzimir Bely

unread,
Jun 24, 2024, 4:00:19 AM6/24/24
to Jan Kiszka, isar-users, Felix MOESSBAUER


We were running CI with the patch from
https://groups.google.com/g/isar-users/c/koNMYOyo-YM applied for
several days and at some moment the issue happened again (with "heavy"
NoCross test):

```
Creating /etc/schroot/chroot.d/isar-jenkins-7a841d41-9632-420f-a97d-
bbf30e2732b9-
9f8d1f2d22de72cfbecc8e6ed8ceb6f59a829def67a2b9b7e2eb45a82af88ef2
DEBUG: Shell function dpkg_schroot_create_configs finished
DEBUG: Executing shell function dpkg_runbuild
sbuild (Debian sbuild) 0.85.0 (04 January 2023) on turar.m.ilbers.de

+======================================================================
========+
| libhello 0.3 (mipsel) Sat, 22 Jun 2024
12:34:22 +0000 |
+======================================================================
========+

Package: libhello
Version: 0.3
Source Version: 0.3
Distribution: isar
Machine Architecture: amd64
Host Architecture: mipsel
Build Architecture: mipsel
Build Type: binary

I: NOTICE: Log filtering will replace 'var/run/schroot/mount/isar-
jenkins-7a841d41-9632-420f-a97d-bbf30e2732b9-
9f8d1f2d22de72cfbecc8e6ed8ceb6f59a829def67a2b9b7e2eb45a82af88ef2-
3ab2cd48-c709-4e9e-888f-a45fb2da713e' with '<<CHROOT>>'

+----------------------------------------------------------------------
--------+
| Chroot Setup Commands
|
+----------------------------------------------------------------------
--------+


echo "Package: *\nPin: release n=isar\nPin-Priority: 1000" >
/etc/apt/preferences.d/isar-apt
-----------------------------------------------------------------------
---------------------


I: Finished running 'echo "Package: *\nPin: release n=isar\nPin-
Priority: 1000" > /etc/apt/preferences.d/isar-apt'.

echo "APT::Get::allow-downgrades 1;" > /etc/apt/apt.conf.d/50isar-apt
---------------------------------------------------------------------


I: Finished running 'echo "APT::Get::allow-downgrades 1;" >
/etc/apt/apt.conf.d/50isar-apt'.

rm -f /var/log/dpkg.log
-----------------------


I: Finished running 'rm -f /var/log/dpkg.log'.

mkdir -p /var/cache/apt/archives
--------------------------------


I: Finished running 'mkdir -p /var/cache/apt/archives'.

find /home/builder/libhello/rootfs/var/cache/apt/archives -maxdepth 1 -
name '*.deb' -exec ln -t /var/cache/apt/archives/ -sf {} +
-----------------------------------------------------------------------
----------------------------------------------------------


I: Finished running 'find
/home/builder/libhello/rootfs/var/cache/apt/archives -maxdepth 1 -name
'*.deb' -exec ln -t /var/cache/apt/archives/ -sf {} +'.

apt-get update -o Dir::Etc::SourceList="sources.list.d/isar-apt.list" -
o Dir::Etc::SourceParts="-" -o APT::Get::List-Cleanup="0"
-----------------------------------------------------------------------
---------------------------------------------------------

Get:1 file:/isar-apt isar InRelease
Ign:1 file:/isar-apt isar InRelease
Get:2 file:/isar-apt isar Release [5006 B]
Get:2 file:/isar-apt isar Release [5006 B]
Get:3 file:/isar-apt isar Release.gpg
Ign:3 file:/isar-apt isar Release.gpg
Reading package lists...

I: Finished running 'apt-get update -o
Dir::Etc::SourceList="sources.list.d/isar-apt.list" -o
Dir::Etc::SourceParts="-" -o APT::Get::List-Cleanup="0"'.

Finished processing commands.
-----------------------------------------------------------------------
---------
I: NOTICE: Log filtering will replace 'build/libhello-BBthl5/resolver-
X1hxT7' with '<<RESOLVERDIR>>'

+----------------------------------------------------------------------
--------+
| Update chroot
|
+----------------------------------------------------------------------
--------+

Reading package lists...
Building dependency tree...
Reading state information...
Calculating upgrade...
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

+----------------------------------------------------------------------
--------+
| Fetch source files
|
+----------------------------------------------------------------------
--------+


Local sources
-------------

/workspace/build/isar_ub_devel/436/build/tmp/work/debian-bookworm-
mipsel/libhello/0.3-f4a5669-r0/libhello_0.3.dsc exists in
/workspace/build/isar_ub_devel/436/build/tmp/work/debian-bookworm-
mipsel/libhello/0.3-f4a5669-r0; copying to chroot
I: NOTICE: Log filtering will replace 'build/libhello-BBthl5/libhello-
0.3' with '<<PKGBUILDDIR>>'
I: NOTICE: Log filtering will replace 'build/libhello-BBthl5' with
'<<BUILDDIR>>'

+----------------------------------------------------------------------
--------+
| Install package build dependencies
|
+----------------------------------------------------------------------
--------+


Setup apt archive
-----------------

E: /var/lib/schroot/session/isar-jenkins-7a841d41-9632-420f-a97d-
bbf30e2732b9-
8641cd44af3718c57bd0c9e341b9dd285c91d116b7bb1d5b93a87a4e4c043d01-
0f576f86-4a3b-4d43-a366-f14782e97097: Failed to open file: No such file
or directory
E: Can't chmod /<<RESOLVERDIR>> to 0770: No such file or directory
E: Failed to set 0770 permissions on dummy package dir
E: Setting up apt archive failed
```

And that's what we see in logs:

```
build/tmp$ grep -R f14782e97097
work/debian-bookworm-mipsel/libhello/0.3-f4a5669-
r0/temp/log.do_dpkg_build:E: /var/lib/schroot/session/isar-jenkins-
7a841d41-9632-420f-a97d-bbf30e2732b9-
8641cd44af3718c57bd0c9e341b9dd285c91d116b7bb1d5b93a87a4e4c043d01-
0f576f86-4a3b-4d43-a366-f14782e97097: Failed to open file: No such file
or directory
work/debian-bookworm-mipsel/libhello/0.3-f4a5669-
r0/temp/log.do_dpkg_build.694689:E: /var/lib/schroot/session/isar-
jenkins-7a841d41-9632-420f-a97d-bbf30e2732b9-
8641cd44af3718c57bd0c9e341b9dd285c91d116b7bb1d5b93a87a4e4c043d01-
0f576f86-4a3b-4d43-a366-f14782e97097: Failed to open file: No such file
or directory
work/debian-bookworm-amd64/kselftest/5.6.13-
r0/temp/log.do_dpkg_build.369632:I: NOTICE: Log filtering will replace
'var/run/schroot/mount/isar-jenkins-7a841d41-9632-420f-a97d-
bbf30e2732b9-
8641cd44af3718c57bd0c9e341b9dd285c91d116b7bb1d5b93a87a4e4c043d01-
0f576f86-4a3b-4d43-a366-f14782e97097' with '<<CHROOT>>'
work/debian-bookworm-amd64/kselftest/5.6.13-
r0/temp/log.do_dpkg_build.369632:SCHROOT_SESSION_ID=isar-jenkins-
7a841d41-9632-420f-a97d-bbf30e2732b9-
8641cd44af3718c57bd0c9e341b9dd285c91d116b7bb1d5b93a87a4e4c043d01-
0f576f86-4a3b-4d43-a366-f14782e97097
```

For some reason, when building "libhello" package for "mipsel"
architecture, sbuild tried to use existing directory, that "kselftest"
package form "amd64" used.

So, replacing pid with task hashsum does not help with this sporadic
issue (but reduces readability of log files).

We'll continue debugging the root cause of the issue.

--
Best regards,
Uladzimir.

Jan Kiszka

unread,
Jun 24, 2024, 5:13:20 AM6/24/24
to Uladzimir Bely, isar-users, Felix MOESSBAUER
Interesting - at least we ruled out one potential cause this way.

Thanks,

Anton Mikanovich

unread,
Sep 9, 2024, 6:53:41 AM9/9/24
to Jan Kiszka, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
23/05/2024 20:08, 'Jan Kiszka' via isar-users wrote:
> 2024-05-23 15:16:22 - INFO - | E: /etc/schroot/chroot.d/isar-builder-3cef593b-e753-4463-a544-cea3d14532e2-41541: Failed to stat file: No such file or directory

Hello Jan,

I've finally found the root cause of this issue and it turns out that
schroot
is not fully thread safe when managing config files. Luckly this issue looks
quite easy to fix with one additional lock file, I've reported this
findings to
the upstream:
https://codeberg.org/shelter/reschroot/issues/10

Jan Kiszka

unread,
Oct 1, 2024, 3:36:01 AM10/1/24
to Anton Mikanovich, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
Is there a way to mitigate this outside of sbuild? Upstream shows no
reactions so far (already pinged them again?), and then it will still
take a while to get that via stable debian updates and finally the kas
container.

Jan Kiszka

unread,
Nov 11, 2024, 4:31:09 AM11/11/24
to Anton Mikanovich, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
Ping: Would locking outside of schroot mean that isar had to serialize
all build jobs? Or why don't we have workaround yet?

BTW, you should also create a Debian bug for this. Seems there is none
yet, right?

Anton Mikanovich

unread,
Nov 11, 2024, 5:44:20 AM11/11/24
to Jan Kiszka, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
11/11/2024 11:31, Jan Kiszka wrote:
> Ping: Would locking outside of schroot mean that isar had to serialize
> all build jobs? Or why don't we have workaround yet?
>
> BTW, you should also create a Debian bug for this. Seems there is none
> yet, right?
>
> Jan
>
Hello Jan,

The solution I've prepared as a fix don't need Isar jobs to be serialized,
because shared locking is used wherever possible. The only change in Isar is
to make schroot_delete_configs function use config lock in exclusive
mode, but
it's quite fast operation, so no slowdown should be observed.

Here is how this change will be look like:

https://github.com/ilbers/isar/commit/30ea9c25164e516cba9180b5cd08d708ec49d220

And here are the changes proposed for schroot:

https://codeberg.org/shelter/reschroot/pulls/11/commits

After long testing of patched schroot+Isar in our CI we've found it working,
but some cosmetic changes still needed.

There is no Debian bug so far, we will create it.

Jan Kiszka

unread,
Nov 11, 2024, 7:04:31 AM11/11/24
to Anton Mikanovich, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
On 11.11.24 11:44, Anton Mikanovich wrote:
> 11/11/2024 11:31, Jan Kiszka wrote:
>> Ping: Would locking outside of schroot mean that isar had to serialize
>> all build jobs? Or why don't we have workaround yet?
>>
>> BTW, you should also create a Debian bug for this. Seems there is none
>> yet, right?
>>
>> Jan
>>
> Hello Jan,
>
> The solution I've prepared as a fix don't need Isar jobs to be serialized,
> because shared locking is used wherever possible. The only change in
> Isar is
> to make schroot_delete_configs function use config lock in exclusive
> mode, but
> it's quite fast operation, so no slowdown should be observed.
>
> Here is how this change will be look like:
>
> https://github.com/ilbers/isar/
> commit/30ea9c25164e516cba9180b5cd08d708ec49d220
>

Please send to the list now, for review and broader testing. Yes, that
"|| true" will likely require some clean up.

Jan

> And here are the changes proposed for schroot:
>
> https://codeberg.org/shelter/reschroot/pulls/11/commits
>
> After long testing of patched schroot+Isar in our CI we've found it
> working,
> but some cosmetic changes still needed.
>
> There is no Debian bug so far, we will create it.
>

Jan Kiszka

unread,
Nov 11, 2024, 9:29:07 AM11/11/24
to Anton Mikanovich, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
On 11.11.24 13:04, Jan Kiszka wrote:
> On 11.11.24 11:44, Anton Mikanovich wrote:
>> 11/11/2024 11:31, Jan Kiszka wrote:
>>> Ping: Would locking outside of schroot mean that isar had to serialize
>>> all build jobs? Or why don't we have workaround yet?
>>>
>>> BTW, you should also create a Debian bug for this. Seems there is none
>>> yet, right?
>>>
>>> Jan
>>>
>> Hello Jan,
>>
>> The solution I've prepared as a fix don't need Isar jobs to be serialized,
>> because shared locking is used wherever possible. The only change in
>> Isar is
>> to make schroot_delete_configs function use config lock in exclusive
>> mode, but
>> it's quite fast operation, so no slowdown should be observed.
>>
>> Here is how this change will be look like:
>>
>> https://github.com/ilbers/isar/
>> commit/30ea9c25164e516cba9180b5cd08d708ec49d220
>>
>
> Please send to the list now, for review and broader testing. Yes, that
> "|| true" will likely require some clean up.
>

Err, this patch depends on your so far ignored pull request - this does
not help us because existing schroot lacks --lock/unlock. We would need
a solution around current schroot. Is THAT possible as well?

Jan

Anton Mikanovich

unread,
Nov 11, 2024, 9:37:52 AM11/11/24
to Jan Kiszka, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
11/11/2024 16:28, Jan Kiszka wrote:
> Err, this patch depends on your so far ignored pull request - this does
> not help us because existing schroot lacks --lock/unlock. We would need
> a solution around current schroot. Is THAT possible as well?
>
> Jan
>
As schroot is not thread safe and have conflicts with itself, I don't
see other
ways of fixing the issue if we are not limiting the number of running
schroot
instances (= full serialization, even between different users and
workspaces).

Jan Kiszka

unread,
Nov 11, 2024, 9:52:17 AM11/11/24
to Anton Mikanovich, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
We could only lock known sessions against each other. But that would
already help everything that runs in a CI runner, maybe in a kas-isar
container. Not perfect but would probably address 99% of cases today.

Anton Mikanovich

unread,
Nov 11, 2024, 10:17:05 AM11/11/24
to Jan Kiszka, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
11/11/2024 16:52, Jan Kiszka wrote:
> We could only lock known sessions against each other. But that would
> already help everything that runs in a CI runner, maybe in a kas-isar
> container. Not perfect but would probably address 99% of cases today.
>
> Jan
>
This will result in serialization of do_dpkg_build tasks and will cause Isar
slowdown.

Jan Kiszka

unread,
Nov 11, 2024, 10:57:06 AM11/11/24
to Anton Mikanovich, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
OK, that was the information I was looking for - and was afraid of.

But even if your PR would be merged tomorrow (which is completely
unpredictable right now), it would still require backports to bookworm
at least, and those would need to be deployed locally or integrated into
future versions of kas. All that is not going to happen soon - if at
all. It may even be that only trixes schroot will have all what we need.

Can't we break up the calls to schroot into those that need looking and
those that actually build so that, inside do_dpkg_build, we only lock
certain commands, not the complete task?

MOESSBAUER, Felix

unread,
Nov 11, 2024, 12:58:39 PM11/11/24
to ami...@ilbers.de, isar-...@googlegroups.com, Kiszka, Jan, ub...@ilbers.de, i...@ilbers.de
On Mon, 2024-11-11 at 16:56 +0100, Jan Kiszka wrote:
> On 11.11.24 16:17, Anton Mikanovich wrote:
> > 11/11/2024 16:52, Jan Kiszka wrote:
> > > We could only lock known sessions against each other. But that
> > > would
> > > already help everything that runs in a CI runner, maybe in a kas-
> > > isar
> > > container. Not perfect but would probably address 99% of cases
> > > today.
> > >
> > > Jan
> > >
> > This will result in serialization of do_dpkg_build tasks and will
> > cause
> > Isar
> > slowdown.
> >
>
> OK, that was the information I was looking for - and was afraid of.
>
> But even if your PR would be merged tomorrow (which is completely
> unpredictable right now), it would still require backports to
> bookworm
> at least, and those would need to be deployed locally or integrated
> into
> future versions of kas. All that is not going to happen soon - if at
> all. It may even be that only trixes schroot will have all what we
> need.

We have similar problems with apt and proper rate-limit handling. But
that's OT.

At a second thought I'm wondering if we should drop schroot at all and
use mmdebstrap + unshare. By that, the sbuild also does no longer
require root access. I'm running sbuild with unshare backend for some
time locally now (not with ISAR, though).

Felix

Jan Kiszka

unread,
Nov 11, 2024, 2:48:37 PM11/11/24
to Moessbauer, Felix (FT RPD CED OES-DE), ami...@ilbers.de, isar-...@googlegroups.com, ub...@ilbers.de, i...@ilbers.de
https://lists.debian.org/debian-devel/2024/06/msg00257.html - there is
likely no simple answer for the backend yet.

BTW, would unshare work at all inside our build containers?

MOESSBAUER, Felix

unread,
Nov 13, 2024, 1:52:00 AM11/13/24
to ami...@ilbers.de, isar-...@googlegroups.com, Kiszka, Jan, ub...@ilbers.de, i...@ilbers.de
For me that reads like unschroot is just a unshare backend on steroids.
But as we can't use that prior to trixie, it probably does not help
here. Still I see no reason that blocks us from using the unshare
backend.

>
> BTW, would unshare work at all inside our build containers?

Yes, that works as namespaces can be stacked - that's how podman user
(non root) containers work.

Felix

Jan Kiszka

unread,
Nov 13, 2024, 2:01:03 AM11/13/24
to Moessbauer, Felix (FT RPD CED OES-DE), ami...@ilbers.de, isar-...@googlegroups.com, ub...@ilbers.de, i...@ilbers.de
My point was that there are apparently enough limitations of unshare to
motivate this additional backend work. And that concerns me that we
would only replace problems with different problems.

Jan

Anton Mikanovich

unread,
Dec 5, 2024, 10:57:12 AM12/5/24
to Jan Kiszka, isar-users, Felix Moessbauer, Uladzimir Bely, Baurzhan Ismagulov
11/11/2024 17:56, Jan Kiszka wrote:
> Can't we break up the calls to schroot into those that need looking and
> those that actually build so that, inside do_dpkg_build, we only lock
> certain commands, not the complete task?
>
> Jan
>
It looks like this approach can work, I've send the implementation in:

[RFC 0/1] External fix for sporadic schroot race issue

Testing will take some time, hope it will help.

Reply all
Reply to author
Forward
0 new messages