Hello Everyone,
I have a question about using Spack-generated Singularity containers on clusters. The Dockerfile is:
# Build stage with Spack pre-installed and ready to be used
FROM spack/ubuntu-bionic:latest as builder
# What we want to install and how we want to install it
# is specified in a manifest file (spack.yaml)
RUN mkdir /opt/spack-environment \
&&  (echo "spack:" \
&&   echo "  specs:" \
&&   echo "  - ope...@3.1.3" \
&&   echo "  - petsc" \
&&   echo "  - v...@8.2.0" \
&&   echo "  - bo...@1.70.0" \
&&   echo "  - ei...@3.3.8" \
&&   echo "  - cmake" \
&&   echo "  packages:" \
&&   echo "    all:" \
&&   echo "      providers:" \
&&   echo "        mpi:" \
&&   echo "        - ope...@3.1.3" \
&&   echo "  concretization: together" \
&&   echo "  config:" \
&&   echo "    install_tree: /opt/software" \
&&   echo "    build_jobs: 60" \
&&   echo "  view: /opt/view") > /opt/spack-environment/spack.yaml
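For reference, those echo lines should produce a spack.yaml roughly like this (the truncated package names are as they appear above):

```yaml
spack:
  specs:
  - ope...@3.1.3
  - petsc
  - v...@8.2.0
  - bo...@1.70.0
  - ei...@3.3.8
  - cmake
  packages:
    all:
      providers:
        mpi:
        - ope...@3.1.3
  concretization: together
  config:
    install_tree: /opt/software
    build_jobs: 60
  view: /opt/view
```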
# Install the software, remove unnecessary deps
RUN cd /opt/spack-environment && spack env activate . && spack config get config && spack install --fail-fast && spack gc -y
# Modifications to the environment that are necessary to run
RUN cd /opt/spack-environment && \
spack env activate --sh -d . >> /etc/profile.d/z10_spack_environment.sh
ENTRYPOINT ["/bin/bash", "--rcfile", "/etc/profile", "-l"]
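In case it matters, I convert the Docker image to a Singularity SIF along these lines (the image tag "myapp" is just a placeholder for my actual name):

```shell
# Build the Docker image from the Dockerfile above (placeholder tag)
docker build -t myapp:latest .

# Pull the image from the local Docker daemon into a SIF file
singularity build myapp.sif docker-daemon://myapp:latest
```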
I have used it successfully for single-node computations, so I don't think the problem is with the image itself or with the compatibility of the OpenMPI installations on the host and in the container.
However, when I try to run multi-node jobs, I get:
Warning: Permanently added '[172.19.17.149]:22222' (ECDSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
The mpiexec command is:
mpiexec -N 2 -n 72 -npernode 36 -machinefile "${PJM_O_NODEINF}" \
    singularity exec "${SIF_PATH_IN_HOST}" "${BIN_PATH_IN_SIF}"
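For context, the full job script looks roughly like this (a sketch; the #PJM resource lines and the two path variables are placeholders for my actual values):

```shell
#!/bin/bash
#PJM -L "node=2"                      # two nodes, matching mpiexec -N 2
#PJM -L "elapse=01:00:00"             # placeholder walltime
#PJM --mpi "proc=72"                  # 36 ranks per node

SIF_PATH_IN_HOST=/path/to/myapp.sif   # placeholder path to the SIF on the host
BIN_PATH_IN_SIF=/opt/view/bin/app     # placeholder path to the binary in the container

# mpiexec runs on the host; each launched rank starts the binary inside the container
mpiexec -N 2 -n 72 -npernode 36 -machinefile "${PJM_O_NODEINF}" \
    singularity exec "${SIF_PATH_IN_HOST}" "${BIN_PATH_IN_SIF}"
```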
Am I missing something here?
Could it be a problem with InfiniBand or resource-manager support when OpenMPI was built inside the container? Should I rebuild the image?
I greatly appreciate any help you can provide.