[slurm-users] Debian RPM build for arm64?


Christopher Harrop - NOAA Affiliate via slurm-users

Jun 13, 2024, 6:38:12 PM
to slurm...@lists.schedmd.com
Hello,

Are the instructions for building Debian packages at https://slurm.schedmd.com/quickstart_admin.html#debuild expected to work on ARM machines?

I am having trouble with the "debuild -b -uc -us" step:

#10 29.01 configure: exit 1
#10 29.01 dh_auto_configure: error: cd obj-aarch64-linux-gnu && ../configure --build=aarch64-linux-gnu --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=\${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --sysconfdir=/etc/slurm --disable-debug --with-slurmrestd --with-pmix --enable-pam --with-systemdsystemunitdir=/lib/systemd/system/ SUCMD=/bin/su SLEEP_CMD=/bin/sleep returned exit code 1
#10 29.01 make[1]: *** [debian/rules:21: override_dh_auto_configure] Error 25
#10 29.01 make[1]: Leaving directory '/tmp/slurm-23.11.7'
#10 29.02 make: *** [debian/rules:6: build] Error 2
#10 29.02 dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
#10 29.02 debuild: fatal error at line 1182:
#10 29.02 dpkg-buildpackage -us -uc -ui -b failed
#10 ERROR: process "/bin/sh -c cd /tmp  && wget https://download.schedmd.com/slurm/slurm-23.11.7.tar.bz2  && tar -xaf slurm-23.11.7.tar.bz2  && cd slurm-23.11.7  && mk-build-deps -t \"apt-get -o Debug::pkgProblemResolver=yes -y\" -i debian/control  && debuild -b -uc -us  && cd ..  && ARCH=$(dpkg --print-architecture)  && dpkg --install slurm-smd_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-client_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-dev_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-doc_23.11.7-1_all.deb  && dpkg --install slurm-smd-libnss-slurm_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-libpam-slurm-adopt_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-libpmi0_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-libpmi2-0_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-libslurm-perl_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-sackd_23.11.7-1_${ARCH}.deb  && dpkg --install slurm-smd-sview_23.11.7-1_${ARCH}.deb" did not complete successfully: exit code: 29
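The `dh_auto_configure: error: ... returned exit code 1` line only says that `../configure` failed, not why. Autoconf prints the actual reason as a `configure: error: ...` line on the console and as `configure:NNNN: error: ...` in `obj-aarch64-linux-gnu/config.log`, and that line is usually far above the tail of a CI log. A minimal sketch for pulling those lines out of a saved build log, assuming the Docker build output was captured to a file (for example with `docker build --progress=plain . 2>&1 | tee build.log`); `find_configure_errors` is a hypothetical helper name, not part of Slurm or debhelper:

```shell
#!/bin/sh
# Sketch: extract autoconf error lines from a saved build log.
# find_configure_errors is a hypothetical helper, not a Slurm/debhelper tool.
# Usage: find_configure_errors build.log
find_configure_errors() {
    # Matches both "configure: error: ..." (console output) and
    # "configure:1234: error: ..." (config.log format), with or without
    # a BuildKit "#10 29.01 " step/timestamp prefix. Prints line numbers
    # so the surrounding context can be inspected; returns non-zero if
    # nothing matched.
    grep -nE 'configure(:[0-9]+)?: error:' "$1"
}
```

If nothing matches, the line just before `configure: exit 1` in the full log is the next place to look.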

Chris
---------------------------------------------------------------------------------------------------
Christopher W. Harrop                                                voice: (720) 649-0316
NOAA Global Systems Laboratory, R/GSL6                  fax: (303) 497-7259                 
325 Broadway                                                 
Boulder, CO 80303

Arnuld via slurm-users

Jun 13, 2024, 11:32:34 PM
to slurm...@lists.schedmd.com
I don't know much about Slurm, but if you want to start troubleshooting, you need to isolate the step where the error appears. From the output you posted, it looks like you are using an automated script to download, extract, and build Slurm. Look here:

 "/bin/sh -c cd /tmp  && wget https://download.schedmd.com/slurm/slurm-23.11.7.tar.bz2  && tar -xaf slurm-23.11.7.tar.bz2  && cd slurm-23.11.7  && mk-build-deps -t \"apt-get -o Debug::pkgProblemResolver=yes -y\" -i debian/control  && debuild -b -uc -us  && .....

Here, six steps have been chained together with &&. I would run these steps by hand, one by one, and see where the error occurs. You might get some extra information about the error this way.
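If running the steps by hand is not practical, a small wrapper can keep them in one script while still naming the step that fails. A sketch in POSIX sh; `run_step` is a hypothetical helper, and the commands in the usage comment are the ones from the RUN line quoted above:

```shell
#!/bin/sh
# Sketch: run build steps one at a time so the failing one is named.
# run_step is a hypothetical helper, not part of Slurm or debhelper.
run_step() {
    echo ">>> $*"
    # The command runs in the current shell, so "run_step cd DIR" still
    # changes the working directory for the following steps.
    if ! "$@"; then
        echo "FAILED: $*" >&2
        exit 1
    fi
}

# Usage, with the steps from the RUN line above:
#   run_step cd /tmp
#   run_step wget https://download.schedmd.com/slurm/slurm-23.11.7.tar.bz2
#   run_step tar -xaf slurm-23.11.7.tar.bz2
#   run_step cd slurm-23.11.7
#   run_step mk-build-deps -t "apt-get -o Debug::pkgProblemResolver=yes -y" -i debian/control
#   run_step debuild -b -uc -us
```

The last `>>>` line before `FAILED:` in the output is the step to investigate.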



--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Christopher Harrop via slurm-users

Jun 14, 2024, 11:47:05 AM
to slurm...@lists.schedmd.com
The commands were grouped like that because they are part of a RUN instruction in a Dockerfile. The build was happening on a GitHub Actions runner, so it is not easy to run them interactively one at a time. But I'm pretty confident that it was the "debuild -b -uc -us" step that failed.

I have since gathered some more information. I started an Ubuntu 22.04 arm EC2 instance (because I don't have access to an arm machine any other way) and ran the commands; they all completed and built the .deb packages just fine. My container, however, is using Ubuntu 20.04. Unfortunately, the arm architecture is not available for the Ubuntu 20.04 AMI on EC2 (at least for me), so I was not able to do a clean test of 20.04. I suspect it's a problem with 20.04 and that 22.04+ is required. I can add the "mxschmitt/action-tmate@v3" GitHub action to my CI step to get interactive access to the GitHub runner at failure time and see if I can reproduce the failure manually. I was hoping not to update to 22.04 yet due to downstream dependencies for my container, but it looks like that might be unavoidable.
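Since the same commands behave differently on the 22.04 EC2 instance and in the 20.04 container, printing the same short environment summary in both places and diffing the results can help narrow down what differs. A minimal sketch; `report_build_env` is a hypothetical helper, and the package list (debhelper, autoconf, gcc) is only an illustrative choice, not a list of packages known to cause this failure:

```shell
#!/bin/sh
# Sketch: print a short, diffable summary of the build environment.
# report_build_env is a hypothetical helper; the package list is illustrative.
# $1: os-release file (defaults to /etc/os-release).
report_build_env() {
    osrel=${1:-/etc/os-release}
    # Source os-release in a subshell so ID/VERSION_ID don't leak out.
    ( . "$osrel" && echo "os: $ID $VERSION_ID" )
    echo "kernel arch: $(uname -m)"
    # dpkg tools only exist on Debian-based systems; skip them elsewhere.
    if command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg arch: $(dpkg --print-architecture)"
        for pkg in debhelper autoconf gcc; do
            ver=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || ver="not installed"
            echo "$pkg: $ver"
        done
    fi
}
```

For example, run `report_build_env > env-$(hostname).txt` on each machine and compare the files with `diff`.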

Christopher Harrop - NOAA Affiliate via slurm-users

Jun 14, 2024, 2:07:03 PM
to slurm...@lists.schedmd.com
I have confirmed that the issue is Ubuntu 20.04. I used the tmate GitHub action to get access to the Ubuntu 20.04 GitHub arm runner and tried the steps manually, one by one. It did indeed fail, almost immediately, in the "debuild -b -uc -us" step. Given that the same experiment on an Ubuntu 22.04 arm EC2 instance was successful, it appears that 22.04+ is required. I was hoping not to have to go down that road, but am now looking at updating all downstream dependencies to 22.04.
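If 22.04 does turn out to be the minimum, a guard at the top of the build step makes the failure explicit instead of surfacing deep inside dh_auto_configure. A sketch in POSIX sh; `version_at_least` and `require_ubuntu_at_least` are hypothetical helpers, and the 22.04 threshold reflects only the experiments in this thread, not a documented Slurm requirement:

```shell
#!/bin/sh
# Sketch: refuse to build on Ubuntu releases older than a given minimum.
# Both functions are hypothetical helpers; 22.04 comes from this thread only.

# Succeeds if version $1 >= version $2; both are "MAJOR.MINOR" strings
# such as Ubuntu's VERSION_ID ("20.04", "22.04").
version_at_least() {
    a_major=${1%%.*}; a_minor=${1#*.}
    m_major=${2%%.*}; m_minor=${2#*.}
    [ "$a_major" -gt "$m_major" ] ||
        { [ "$a_major" -eq "$m_major" ] && [ "$a_minor" -ge "$m_minor" ]; }
}

# $1: minimum release; $2: os-release file (defaults to /etc/os-release).
require_ubuntu_at_least() {
    osrel=${2:-/etc/os-release}
    # Source os-release in a subshell so ID/VERSION_ID don't leak out.
    info=$( . "$osrel" && echo "$ID $VERSION_ID" )
    id=${info%% *}; ver=${info#* }
    if [ "$id" != "ubuntu" ] || ! version_at_least "$ver" "$1"; then
        echo "unsupported build image: ${id:-unknown} $ver (need ubuntu >= $1)" >&2
        return 1
    fi
}

# Usage inside the RUN step:
#   require_ubuntu_at_least 22.04 && debuild -b -uc -us
```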

If anyone can confirm/deny that 20.04 doesn’t work, I’d be interested in hearing your experience.