SPACK CONTAINERS ON CLUSTERS


MKamra

Mar 17, 2021, 03:01:58
to Spack

Hello Everyone,
I have a question regarding the use of Spack-generated Singularity containers on clusters.
DOCKERFILE:

# Build stage with Spack pre-installed and ready to be used
FROM spack/ubuntu-bionic:latest as builder
# What we want to install and how we want to install it
# is specified in a manifest file (spack.yaml)
RUN mkdir /opt/spack-environment \
&&  (echo "spack:" \
&&   echo "  specs:" \
&&   echo "  - ope...@3.1.3" \
&&   echo "  - petsc" \
&&   echo "  - v...@8.2.0" \
&&   echo "  - bo...@1.70.0" \
&&   echo "  - ei...@3.3.8" \
&&   echo "  - cmake" \
&&   echo "  packages:" \
&&   echo "    all:" \
&&   echo "      providers:" \
&&   echo "        mpi:" \
&&   echo "        - ope...@3.1.3" \
&&   echo "  concretization: together" \
&&   echo "  config:" \
&&   echo "    install_tree: /opt/software" \
&&   echo "    build_jobs: 60" \
&&   echo "  view: /opt/view") > /opt/spack-environment/spack.yaml



# Install the software, remove unnecessary deps
RUN cd /opt/spack-environment && \
    spack env activate . && \
    spack config get config && \
    spack install --fail-fast && \
    spack gc -y

# Modifications to the environment that are necessary to run
RUN cd /opt/spack-environment && \
    spack env activate --sh -d . >> /etc/profile.d/z10_spack_environment.sh

ENTRYPOINT ["/bin/bash", "--rcfile", "/etc/profile", "-l"]
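
For completeness, this is roughly how the image gets built and turned into the SIF used below; the image tag and file name are placeholders, and I am assuming a build through the local Docker daemon (another common route is pushing to a registry and using singularity pull):

# Build the image from the Dockerfile above (tag is a placeholder)
docker build -t spack-env:latest .

# Convert the local Docker image into a Singularity SIF (name is a placeholder)
singularity build spack-env.sif docker-daemon://spack-env:latest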


I have successfully used it for single-node computations, so I don't think the problem lies with the image itself or with the compatibility of the OpenMPI builds on the host and in the container.
However, when I try to use it for multi-node runs, I get:

Warning: Permanently added '[172.19.17.149]:22222' (ECDSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

The mpiexec command looks like this:



mpiexec -N 2 -n 72 -npernode 36 -machinefile "${PJM_O_NODEINF}" \
singularity exec "${SIF_PATH_IN_HOST}" "${BIN_PATH_IN_SIF}"
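
As a sanity check, I can also try launching something trivial across both nodes, with and without the container, along these lines (this is only a diagnostic sketch, reusing the same machinefile and SIF path as above):

# One trivial process per node, without the container ...
mpiexec -n 2 -npernode 1 -machinefile "${PJM_O_NODEINF}" hostname

# ... and the same launch through Singularity
mpiexec -n 2 -npernode 1 -machinefile "${PJM_O_NODEINF}" \
    singularity exec "${SIF_PATH_IN_HOST}" hostname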

Am I missing something here?
Could it be a problem with InfiniBand support or resource-manager support in the OpenMPI that was built inside the container? Should I rebuild the image?
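
For example, I could compare what the host-side and the container OpenMPI were built with (launchers, fabrics, scheduler support) using something like the following, although I am not sure which components actually matter here:

# Host-side OpenMPI: list the launch (plm), allocation (ras) and
# transport (btl/mtl) components it was built with
ompi_info | grep -Ei "plm|ras|btl|mtl"

# The same check for the OpenMPI inside the container
singularity exec "${SIF_PATH_IN_HOST}" ompi_info | grep -Ei "plm|ras|btl|mtl"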

I greatly appreciate any help you can provide.
