I have created a Compute Engine VM instance with Slurm, using one of the standard blueprints ("hpc-cluster-small.yaml"). On the VM I have a code base that depends on a variety of packages installed with the Spack package manager, including MPICH. When I run the code using srun, I get the following error:
"The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM support. This usually happens
when OMPI was not configured --with-slurm and we weren't able
to discover a SLURM installation in the usual places."
I've tried a number of things, including building the MPICH library with Spack against the existing Slurm installation that ships with this instance. I've also tried installing MPICH with a Spack-installed Slurm, but whenever I load this module, it seems to break Slurm altogether and I get errors like:
sinfo: error: resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
sinfo: error: fetch_config: DNS SRV lookup failed
sinfo: error: _establish_config_source: failed to fetch config
sinfo: fatal: Could not establish a configuration source
Is there a way to easily reconfigure the VM instance so that it recognizes different MPI implementations or different Slurm installations?
This tells Open MPI to build in support for Slurm specifically. You'll likely want the other lines too, to ensure it builds against the correct Spack library files.
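For illustration, here is a minimal sketch of what a Spack configuration along these lines might look like. It is not a confirmed fix. It assumes the error comes from an Open MPI build (the "OMPI" in the message), that the cluster's own Slurm lives under /usr/local, and that its version is 22.05.8. Both the prefix and the version are placeholders to replace with whatever `which sinfo` and `sinfo --version` report on your instance. Registering Slurm as a non-buildable external keeps Spack from installing a second copy, which may be what shadows the system `sinfo` and leaves it without a configuration source.

```yaml
# ~/.spack/packages.yaml (sketch; prefix and version are assumptions)
packages:
  slurm:
    # Reuse the Slurm that ships with the cluster image instead of
    # letting Spack build and load its own copy.
    buildable: false
    externals:
    - spec: slurm@22.05.8      # hypothetical version; check `sinfo --version`
      prefix: /usr/local       # assumed install prefix; check `which sinfo`
  openmpi:
    # Prefer Open MPI builds with Slurm scheduler support, which makes
    # Spack pass --with-slurm to Open MPI's configure step.
    variants: schedulers=slurm
```

After rebuilding Open MPI with this in place, `ompi_info | grep -i slurm` should list Slurm components if the support was compiled in, and `srun --mpi=list` shows which MPI plugin types the installed Slurm offers.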