Singularity and Rmpi


Nikki Tebaldi

Apr 2, 2021, 4:18:33 PM
to singularity

Hello! I am trying to containerize and run an R program that uses Rmpi (https://cran.r-project.org/web/packages/Rmpi/index.html) to launch a bunch of processes in parallel.

I get the following error when I run or execute a Singularity container built with OpenMPI and Rmpi on our HPC cluster:

"--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: /usr/local/lib/R/site-library/Rmpi/Rslaves.sh
Node: node26

while attempting to start process rank 0.
--------------------------------------------------------------------------"

Rmpi is installed in the container and the Rslaves.sh file is present in that location with permissions of 755, as expected. It's almost as if mpirun is trying to access Rmpi in the cluster environment, which does not have Rmpi installed.

Any advice or suggestions on where I might be going wrong?

Thank you!!
- Nikki

Nikki Tebaldi

Apr 2, 2021, 4:19:40 PM
to singularity, Nikki Tebaldi
I was unable to attach a def file to my previous post, so here it is:

Bootstrap: docker
From: ubuntu:20.04

%files
run_geobamdata.R /opt/
geobamdata_0.1.0.tar.gz /opt/
Rmpi_0.6-9.1.tar.gz /opt/

%environment
# Point to OMPI binaries, libraries, man pages
export OMPI_DIR=/opt/ompi
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
export MANPATH="$OMPI_DIR/share/man:$MANPATH"

%post
# Install packages needed by OpenMPI and geoBAM
apt update
echo "America/New_York" | tee /etc/timezone
DEBIAN_FRONTEND=noninteractive apt install -y tzdata
apt -y install locales gnupg software-properties-common build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev wget git bash gcc gfortran g++ make file libnetcdf-dev libnetcdff-dev
locale-gen en_US.UTF-8
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list

# Install OpenMPI
export OMPI_DIR=/opt/ompi
export OMPI_VERSION=4.1.0
export OMPI_URL="https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-$OMPI_VERSION.tar.bz2"
mkdir -p /tmp/ompi
mkdir -p /opt
# Download
cd /tmp/ompi && wget -O openmpi-$OMPI_VERSION.tar.bz2 $OMPI_URL && tar -xjf openmpi-$OMPI_VERSION.tar.bz2
# Compile and install
cd /tmp/ompi/openmpi-$OMPI_VERSION && ./configure --prefix=$OMPI_DIR && make -j8 install

# Set env variables so we can compile our application
export PATH=$OMPI_DIR/bin:$PATH
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$LD_LIBRARY_PATH
# Install R and geoBAM
apt update
apt -y install r-base r-base-dev
/usr/bin/Rscript -e 'Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1); install.packages("V8")'
/usr/bin/Rscript -e 'install.packages("dplyr", dependencies = TRUE, repos = "http://cran.rstudio.com/")'
/usr/bin/Rscript -e 'install.packages("reshape2", dependencies = TRUE, repos = "http://cran.rstudio.com/")'
/usr/bin/Rscript -e 'install.packages("settings", dependencies = TRUE, repos = "http://cran.rstudio.com/")'
/usr/bin/Rscript -e 'install.packages("devtools", dependencies = TRUE, repos = "http://cran.rstudio.com/")'
/usr/bin/Rscript -e "install.packages('ncdf4', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e "install.packages('foreach', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e "install.packages('parallel', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e "install.packages('doParallel', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e "install.packages('batchtools', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e "install.packages('yaml', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e "install.packages('data.table', dependencies=TRUE, repos='http://cran.rstudio.com/')"
/usr/bin/Rscript -e 'devtools::install_github("craigbrinkerhoff/geoBAMr", force = TRUE)'

# Install Rmpi
/usr/bin/R CMD INSTALL --configure-args="--with-mpi=/opt/ompi" /opt/Rmpi_0.6-9.1.tar.gz

# Install geobamdata
/usr/bin/R CMD INSTALL /opt/geobamdata_0.1.0.tar.gz
/usr/bin/rm /opt/Rmpi_0.6-9.1.tar.gz
/usr/bin/rm /opt/geobamdata_0.1.0.tar.gz

# Create directories for geobamdata
mkdir -p /opt/data/input
mkdir -p /opt/data/output

%runscript
exec /usr/bin/Rscript /opt/run_geobamdata.R "$@"

%labels
Version v0.0.1
Name rmpi

%help
This container has OpenMPI installed alongside Rmpi with geobamdata copied to the /opt directory.

Kandes, Martin

Apr 2, 2021, 6:30:35 PM
to singularity, Nikki Tebaldi
Hi Nikki,

I don't think Rmpi will work, at least not easily, with the standard way we usually run MPI-enabled Singularity containers. It should work for a single node, but the multi-node case could be tricky because Rmpi calls mpirun from within the container.

Marty
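For reference, the standard hybrid-model launch that this refers to looks roughly like the sketch below (module name, rank count, and image name are illustrative, not taken from the thread). The host's mpirun starts one container instance per rank, so nothing inside the image ever invokes mpirun itself:

```shell
# Hybrid model sketch: the HOST's mpirun launches the container once per rank.
# The host OpenMPI should be version-matched to the OpenMPI built into the image.
module load openmpi/4.1.0   # site-specific; illustrative module name
mpirun -n 8 singularity exec rmpi.sif Rscript /opt/run_geobamdata.R
```

By contrast, Rmpi's mpi.spawn.Rslaves() spawns ranks from inside the container, which is presumably why the host-side launcher goes looking for /usr/local/lib/R/site-library/Rmpi/Rslaves.sh on the compute node, where it does not exist.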


Nikki Tebaldi

Apr 2, 2021, 7:21:04 PM
to singu...@lbl.gov
Hi Marty,

Thank you for the response! That does shed some light on the error I am running into.

Do you have any recommendations for parallelizing R code across compute nodes using a container?

Thanks!!
-Nikki


Kandes, Martin

Apr 2, 2021, 11:05:26 PM
to singu...@lbl.gov
Nikki,

I think there may be a way to do this, but I've not done it myself before. I'm updating our container library right now, so I may have a chance to give it a try. I'll let you know if I can get something working. I just know Rmpi is a bit weird; e.g., for some reason it sets up its own MPI communicator in addition to MPI_COMM_WORLD. I ran into a bug with OpenMPI 3.1.4 and Rmpi where it would not return a non-zero exit value when it would disconnect from the communicator at the end of a job, which would not kill the Slurm job it was running in.

Anyhow, I’ll try and have a look. But maybe someone else knows the trick on how to run an MPI job across nodes from inside the container. I’m pretty sure I saw a talk on this once at first Singularity User Group. Videos are on Sylabs website. Maybe it was just binding host system compilers and MPI into container? 

Marty



Kandes, Martin

Apr 2, 2021, 11:06:35 PM
to singu...@lbl.gov
*when it would not disconnect 



Bennet Fauber

Apr 3, 2021, 8:32:23 AM
to singu...@lbl.gov
Nikki,

You should check whether you or your users will use the run_slurm() function from geobamdata. That spawns additional jobs in Slurm, and getting that function modified so it is 'container aware' may be challenging. It looks like geobamdata simply takes a list of input NetCDF files, then processes them using 4 cores for each input.

It may be simpler and more portable to stick to running the container on a single node and instead use a job array or some other mechanism for spawning additional jobs when you have very large lists of input files.

geobamdata appears to use doParallel on a single workstation, but it does not use doMPI for multiple nodes; instead it calls out to, probably, `srun` and submits additional jobs (tasks?) for each input. That seems to be done with `batchtools`. That may complicate matters, because you will have to ensure that the command `batchtools` passes to `srun` is also the one appropriate for the container.
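The single-node-plus-job-array approach suggested above could be sketched roughly as follows. This is an untested illustration: the directory layout, array size, image name, and the assumption that run_geobamdata.R accepts an input file path are all invented here, not taken from the thread:

```shell
#!/bin/bash
#SBATCH --job-name=geobam
#SBATCH --array=0-99          # one array task per input file (hypothetical count)
#SBATCH --cpus-per-task=4     # geobamdata reportedly uses 4 cores per input

# Hypothetical layout: inputs live in /scratch/geobam/input on the host.
FILES=(/scratch/geobam/input/*.nc)

# Each array task runs one container instance on a single node,
# processing the file selected by its array index.
singularity run --bind /scratch/geobam:/opt/data \
    rmpi.sif "${FILES[$SLURM_ARRAY_TASK_ID]}"
```

This sidesteps MPI and batchtools entirely: Slurm does the fan-out on the host, and the container never needs to see the Slurm binaries.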



Bennet Fauber

Apr 3, 2021, 9:03:42 AM
to singu...@lbl.gov
Nikki,

batchtools does use `sbatch`, not `srun`, so it will submit a job for each input behind the scenes.

You might find


useful if you continue to try to set this up so that geobamdata does job management for you, as it shows one way to modify the command that batchtools will send to each of the jobs.

Nikki Tebaldi

Apr 3, 2021, 12:13:26 PM
to singularity, Kandes, Martin
Marty,

Thank you! I will check out the videos on the Sylabs website. I was following the "Singularity and MPI applications" documentation from the Sylabs user guide and was using the hybrid model, but maybe the bind model will work differently, as you mentioned.

I am also wondering if it would work better to refactor my Rmpi code to run in “single program multiple data” (SPMD) mode instead of having rank 0 spawn the other ranks. But maybe Rmpi sets up its own MPI communicator no matter what.
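For what an SPMD refactor might look like: the untested R sketch below assumes the hybrid launch (every rank starts as its own R process via `mpirun -n 4 singularity exec rmpi.sif Rscript spmd_example.R`), so no ranks are spawned from inside the container. The file-partitioning line is purely illustrative:

```r
# SPMD sketch (untested): all ranks run this same script.
library(Rmpi)

# In Rmpi, communicator 0 refers to MPI_COMM_WORLD, which is what
# mpirun-launched (non-spawned) processes belong to.
rank <- mpi.comm.rank(0)
size <- mpi.comm.size(0)
cat(sprintf("Hello from rank %d of %d\n", rank, size))

# Hypothetical work split: each rank takes every size-th input file, e.g.
#   my_files <- all_files[seq(rank + 1, length(all_files), by = size)]

mpi.finalize()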

And finally, it may be better practice to run many instances of one container to accomplish parallelization instead of trying to do so inside of a container, but I am not sure whether that will introduce significant overhead.

Thanks again for pointing me in a few different directions! I look forward to hearing whether you can get something working, if you get the chance.

- Nikki

Nikki Tebaldi

Apr 3, 2021, 12:26:40 PM
to singularity, Bennet Fauber
Hi Bennet,

run_slurm was my attempt at getting batchtools to function, which it does, but then I ran into issues when I containerized the code and needed to make the Slurm binaries available to the container. As you mention, this is challenging!

I also attempted to use doMPI at first, but I had some trouble getting that running in the container on our cluster, so I decided to opt for the greater control of Rmpi, since I have done something similar for a Python program using MPI in a Singularity container.

I am starting to think (as I mentioned in a previous post) that it is better practice to run a container for each job, so in this case a job for each input file that needs to be processed. Or, if that proves too computationally heavy for many input files, I could chunk up the input files so that a container processes multiple files but only runs on a single node, and then run multiple instances of that.

Thanks!
- Nikki

Nikki Tebaldi

Apr 3, 2021, 12:36:57 PM
to singularity, Bennet Fauber
Bennet,

Thank you for the link to the batchtools SLURM template. Would this still require making the SLURM binaries (e.g. squeue) available to the container? Or maybe this does not include the use of a container.


Which gives the example command:
`singularity exec rocker_r-base.img Rscript -e 'batchtools::doJobCollection("<%= uri %>")'`

But I am not entirely sure how this functions and how it might interact with Slurm, as it still seems to be calling batchtools from inside a container. (I think I have a lot of things to test now!)

Thanks again,
- Nikki