Advice Request: CI/CD docker Kiwi build fail


Jan Robinson

Jun 21, 2022, 4:58:13 AM
to kiwi
Hello All,

A Kiwi build that works 100% standalone fails in the CI/CD pipeline.

Any suggestions for getting past this failure would be appreciated.

Runner setup:
  [runners.docker]
    tls_verify = false
    image = "dr.group.net/ci-cd/bmw_sles15sp3_ci:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/dev:/dev"]
    shm_size = 0
    helper_image = "dr.group.net/public/gitlab-runner-helper:x86_64-${CI_RUNNER_REVISION}"


From the kiwi log:

[ INFO    ]: 07:26:16 | Check/Fix File Permissions
[ DEBUG   ]: 07:26:16 | EXEC: [chroot /builds/QXR2904/kiwi-build-templates-pxe/kiwi_pxe/build/image-root chkstat --system --set]
[ DEBUG   ]: 07:26:16 | EXEC: Failed with stderr: /usr/lib/utempter/utempter: will not give away capabilities or setXid bits on an insecure path
/etc/ssh/sshd_config: file has insecure permissions (world-writable)
ERROR: not all operations were successful.
, stdout: Checking permissions and ownerships - using the permissions files
    /etc/permissions
    /etc/permissions.easy
    /etc/permissions.local
setting /root/ to root:root 0700. (wrong permissions 0777)
setting /etc/ to root:root 0755. (wrong permissions 0777)
setting /home/ to root:root 0755. (wrong permissions 0777)
setting /usr/ to root:root 0755. (wrong permissions 0777)
setting /var/spool/ to root:root 0755. (wrong permissions 0777)
setting /etc/passwd to root:root 0644. (wrong permissions 0664)
setting /etc/shadow to root:shadow 0640. (wrong owner/group root:root permissions 0664)
setting /usr/lib/utempter/utempter to root:utmp 2755. (wrong owner/group root:dialout)
[ ERROR   ]: 07:26:16 | KiwiCommandError: chroot: stderr: /usr/lib/utempter/utempter: will not give away capabilities or setXid bits on an insecure path
/etc/ssh/sshd_config: file has insecure permissions (world-writable)
ERROR: not all operations were successful.
, stdout: Checking permissions and ownerships - using the permissions files
    /etc/permissions
    /etc/permissions.easy
    /etc/permissions.local
setting /root/ to root:root 0700. (wrong permissions 0777)
setting /etc/ to root:root 0755. (wrong permissions 0777)
setting /home/ to root:root 0755. (wrong permissions 0777)
setting /usr/ to root:root 0755. (wrong permissions 0777)
setting /var/spool/ to root:root 0755. (wrong permissions 0777)
setting /etc/passwd to root:root 0644. (wrong permissions 0664)
setting /etc/shadow to root:shadow 0640. (wrong owner/group root:root permissions 0664)
setting /usr/lib/utempter/utempter to root:utmp 2755. (wrong owner/group root:dialout)
[ INFO    ]: 07:26:16 | Cleaning up SystemPrepare instance

Thanks so much,
Jan

Marcus Schäfer

Jun 21, 2022, 4:52:56 PM
to kiwi-...@googlegroups.com
Hi Jan,

I think I have seen this problem when the host is using btrfs.
Does your container use btrfs as the storage backend? In that case
a btrfs snapshot is used per container to provide the storage, and
I remember running into this issue where chkstat was no longer
able to fix the file permissions.

I cannot explain the real issue but I observed the same problem some
time ago. You have several options:

a) Try the overlay storage backend (overlay2 on docker) in your
/etc/containers/storage.conf
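For option a), a minimal /etc/containers/storage.conf fragment could look like this (a sketch; the runroot/graphroot paths are the usual podman defaults, adjust to your setup):

```toml
# /etc/containers/storage.conf -- sketch for option a)
[storage]
# use the overlay driver instead of btrfs
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"
```

On docker the equivalent would be setting "storage-driver": "overlay2" in /etc/docker/daemon.json; note that locally stored images have to be re-pulled after switching drivers.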

b) Make chkstat a no-op by using a "post_bootstrap.sh" script
doing something nasty like:

cp /usr/bin/true /usr/bin/chkstat

Option b) is just there to unblock you, but I would not trust the
resulting image; you should still try to find the root cause.
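A hedged sketch of what such a post_bootstrap.sh hook could contain; the helper function, its root-prefix parameter, and the backup copy are my additions (not from the thread), added only so the logic can be dry-run outside the image root:

```shell
#!/bin/sh
# Sketch of a kiwi post_bootstrap.sh hook that neutralizes chkstat so the
# "Check/Fix File Permissions" step becomes a no-op.
# CAUTION: file permissions in the resulting image are then unverified.

neutralize_chkstat() {
    # $1: root prefix ("" in the real hook, which runs chrooted inside
    # the image root); the prefix only exists for dry-running the logic
    root="$1"
    if [ -e "$root/usr/bin/chkstat" ]; then
        # keep the original around so it can be restored or inspected
        cp "$root/usr/bin/chkstat" "$root/usr/bin/chkstat.orig"
        cp "$root/usr/bin/true" "$root/usr/bin/chkstat"
    fi
}
```

In the real hook you would simply call `neutralize_chkstat ""` (or just use Marcus' one-liner directly).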

Hope this helps a bit and I would be very interested if you can
share your findings here.

Thanks

Regards,
Marcus
--
Public Key available via: https://keybase.io/marcus_schaefer/key.asc
keybase search marcus_schaefer
-------------------------------------------------------
Marcus Schäfer Brunnenweg 18
Tel: +49 7562 905437 D-88260 Argenbühl
Germany
-------------------------------------------------------

Jan Robinson

Jun 22, 2022, 2:45:27 AM
to kiwi
Hi Marcus

Thank you for the hints.
Will give it a go.

Regards,
Jan

Alex Mantel

Jun 22, 2022, 4:25:13 AM
to kiwi-...@googlegroups.com
Hey Jan,

In addition to Marcus' suggestions, I would recommend comparing the
effective capabilities (CapEff) in your successful environment and in
your docker container. Even if the container is privileged, I would
take a manual look at the capabilities.
For instance:

cat /proc/$$/status | grep Cap

/usr/sbin/capsh --decode=00000000a80425fb | tr , \\n

If the capabilities are not the same in both environments, you can
figure out which ones are missing.
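To make the comparison mechanical, one could dump the raw mask in each environment and diff the results (a sketch; the `cap_eff` helper name is mine, and `capsh --decode` then translates a differing hex mask into capability names):

```shell
#!/bin/sh
# Print the effective capability mask of the current process; run this
# in both the working environment and the failing container, then
# compare the two values.
cap_eff() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}
cap_eff
```

For example: `cap_eff > caps.host` in one environment, `cap_eff > caps.docker` in the other, then `diff caps.host caps.docker`.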

Regards, Alex
--
Alex Mantel man...@pre-sense.de
PRESENSE Technologies GmbH Nagelsweg 41, D-20097 HH
Geschäftsführer/Managing Directors AG Hamburg, HRB 107844
Till Dörges, Jürgen Sander USt-IdNr.: DE263765024

Jan Robinson

Jun 22, 2022, 4:37:31 AM
to kiwi
Hello Alex,

Thanks for sharing. It is valuable.

I am looking at adding cap_add = <will see what> to the runner config.
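In config.toml that setting belongs under the [runners.docker] section; the capability names below are only hypothetical placeholders, since the actual list was still to be determined:

```toml
[runners.docker]
  # hypothetical example values -- not a recommendation
  cap_add = ["SYS_ADMIN", "MKNOD"]
```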

Can't wait to test. I haven't been able to work on this for two days now...

Regards,
Jan

Jan Robinson

Jun 23, 2022, 11:27:41 AM
to kiwi
Hello Marcus

Thanks so much for the suggestion: btrfs is the culprit.

Working docker host:
 Storage Driver: overlay2
  Backing Filesystem: xfs
 
Failing docker host:
 Storage Driver: btrfs

@Alex Comparing the capabilities, both were the same.

Thank you so much for the support.
Jan

Marcus Schäfer

Jun 24, 2022, 3:41:36 AM
to kiwi-...@googlegroups.com
Hi Jan,

> Thanks so much for the suggestion: btrfs the culprit.
>
> Working docker host:
> Storage Driver: overlay2
> Backing Filesystem: xfs
>
> Failing docker host:
> Storage Driver: btrfs

Hrm, yes, unfortunately that matches my past experience. If you
find the reason and maybe a solution/workaround, it would be great
if you could share it here.

Actually the btrfs based storage backend for podman/docker is
a nice solution: one snapshot per container.

But for building images on top of a btrfs-controlled build host I
have had many issues. For example, last time a user tried to build
lvm images on a build host with a btrfs rootfs, and that caused the
lvm tooling to react with very weird response messages, like the
ones you observed in your case.

The real problem with all this imho is that you don't expect the
build host rootfs to play any role here... but it does.

This is also the reason why we try to advertise our concept of
boxbuild, which eliminates this sort of issue:

https://osinside.github.io/kiwi/plugins/self_contained.html

Just in case you are interested.

No new level of indirection comes without new issues though :) When
using boxbuild in the cloud or on kubernetes (containers in general)
you need nested virtualization support.

I assume you run the builds in containers for the same reasons we
provide the box plugin. Moving the builds into containers is a good
idea, also because CI systems are often based on container
infrastructure... if only there weren't these unexpected after
effects ;)

We collect all findings in that regard in a troubleshooting
chapter:

https://osinside.github.io/kiwi/troubleshooting.html

So if you find anything that could be helpful, just let us know,
or maybe even create a PR against the troubleshooting chapter :)

Thanks in advance

Jan Robinson

Jun 29, 2022, 8:45:36 AM
to kiwi
Hello Marcus.

I have to recant a bit: it is not btrfs after all.

The build that worked, with an xfs docker host and a sles15.3 container, was run without a gitlab-runner.
As soon as we registered a runner (v15.0.0) in CI/CD on the xfs docker host and ran the kiwi build there, it failed with the same error.

The common ground for the failing builds is the gitlab-runner: executor = "docker".

We are now using a standalone runner with executor = "shell".

It has its drawbacks, but at least we can go on.

Regards,
Jan

Jan Robinson

Sep 14, 2022, 7:17:46 AM
to kiwi
Hi Marcus

Some update.

This kiwi build in CI/CD with executor = "docker" is now working 100%.
Docker itself is not used or running; podman.socket.service is.

# ls -l /var/run/docker.sock
lrwxrwxrwx 1 root root 23 Aug  8 08:05 /var/run/docker.sock -> /run/podman/podman.sock

The difference: instead of building directly into the volume mounted
via config.toml,
   volumes = ["/cache","/dev:/dev","/global/local/pod-result:/result"]
where "/result" was passed as "--target-dir /result" on the kiwi
command line, a directory inside the sles15-kiwi container is now
used, "--target-dir /workdir".
After the build the result is copied to "/result", which is mounted
in the container.

So the chroot etc. operations work as on a standalone server.
As expected.

build-job:
  stage: build
  tags:
    - podman-docker-exec
  script:
    - ./sle15/cicd-build.sh   # script creates "${CI_BUILDS_DIR}/workdir" in the container
    - cd workdir
    - rm -rf *.raw build/
    - cp -p * /result

  when: manual
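A hedged sketch of what sle15/cicd-build.sh does, per the description above; the helper names, the split into two functions, and the "sle15" description directory are assumptions, not taken from the actual script:

```shell
#!/bin/sh
# Sketch of sle15/cicd-build.sh: build into a directory inside the
# container filesystem instead of a bind-mounted volume, so chroot and
# permission operations behave as on a standalone server.

prepare_workdir() {
    # create "<builds-dir>/workdir" and print its path
    workdir="$1/workdir"
    mkdir -p "$workdir"
    echo "$workdir"
}

run_build() {
    # $1: target dir inside the container; the "sle15" description
    # directory is an assumption based on the script's repo location
    kiwi-ng system build --description sle15 --target-dir "$1"
}
```

The job would then call `run_build "$(prepare_workdir "$CI_BUILDS_DIR")"` and afterwards copy the results into the mounted /result, as the build-job does.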

config.toml
host = "unix:///run/podman/podman.sock"
volumes = ["/cache","/dev:/dev","/global/build/pod-result:/result"]

CI/CD server 
  • SLES-15.3
  • podman-3.4.7
  • gitlab-runner-15.3.0-1.x86_64

Container: sles15-kiwi:latest (built from scratch using buildah)
  • sles15.4
  • kiwi-ng 9.24.48

The podman socket path can be found here:

# systemctl status podman.socket
● podman.socket - Podman API Socket
     Loaded: loaded (/usr/lib/systemd/system/podman.socket; enabled; vendor preset: disabled)
     Active: active (listening) since Thu 2022-08-25 09:33:56 CEST; 2 weeks 6 days ago
   Triggers: ● podman.service
       Docs: man:podman-system-service(1)
     Listen: /run/podman/podman.sock (Stream)
     CGroup: /system.slice/podman.socket
Aug 25 09:33:56 itadell101 systemd[1]: Listening on Podman API Socket.


Suggestions welcome.
Thanks so much,
Jan

Marcus Schäfer

Sep 15, 2022, 3:11:06 AM
to kiwi-...@googlegroups.com
Hi Jan,

> Some update.
> This kiwi build in CI/CD with executor = "docker" is now working 100%.

Really cool, thanks for the feedback.

> Docker is not used or running but podman.socket.service
> # ls -l /var/run/docker.sock
> lrwxrwxrwx 1 root root 23 Aug 8 08:05 /var/run/docker.sock ->
> /run/podman/podman.sock

funny, docker.sock pointing to podman.sock

> The difference is, instead of in config.toml using
> volumes = ["/cache","/dev:/dev","/global/local/pod-result:/result"].
> Where "/result" will be used as "--target-dir /result", in the kiwi
> command line, a directory inside the sles15-kiwi container is used,
> "--target-dir /workdir".
> Then after the build the result is copied to "/result" that is mounted
> in the container.
> So the chroot etc. operations, works as in a standalone server.

Interesting concept. I'm happy your process chain is now working.
I could imagine that, without knowing the background information,
debugging an issue might not be so easy. But I also think this is
the case with any CI/CD automation.

Again thanks for sharing

Best regards,

Jan Robinson

Sep 16, 2022, 3:13:26 AM
to kiwi
Hi Marcus

Thanks for the comment.

> funny, docker.sock pointing to podman.sock

The reason for the link is described here. (The runner currently only knows the docker sock path.)
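The link itself can be recreated with something like this (a sketch; the socket paths are the ones shown earlier in the thread, the helper function is mine, and podman.socket must be enabled for /run/podman/podman.sock to exist):

```shell
#!/bin/sh
# Point a docker-style socket path at podman's socket so a runner that
# only knows the docker sock path still works.

link_docker_sock() {
    # $1: directory that should contain docker.sock (normally /var/run)
    ln -sfn /run/podman/podman.sock "$1/docker.sock"
}
```

On the real host: `systemctl enable --now podman.socket && link_docker_sock /var/run`. If I remember correctly, the podman-docker package sets up a similar compatibility link automatically.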

 
Regards,
Jan