singularity container on HPC


Haseeb Mahmud

Feb 15, 2018, 8:34:13 AM2/15/18
to singu...@lbl.gov
Hello,

I have built a NAMD container that uses the MPI build of NAMD 2.12, which I compiled from source. I am trying to execute this container on my HPC system with mpirun from a host Slurm script across multiple nodes; however, although the job runs, it appears to use only 1 processor on 1 node.

I built my container from a definition file with an Ubuntu operating system. In my %post section, I first configure and make OpenMPI 2.1.0, then I install MPICH with "apt install mpich", because my MPI build of NAMD won't compile without it. Also in %post I build the NAMD 2.12 MPI binary itself, since all the NAMD source files are in the container as well.
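For what it's worth, carrying two MPI stacks (a source-built OpenMPI plus apt's MPICH) in one image can leave namd2 linked against the wrong one. A minimal %post sketch that builds only OpenMPI 2.1.0 might look like the following; the download URL, install prefix, and package list are my assumptions, not from the original recipe:

```shell
# %post fragment: build a single OpenMPI 2.1.0 stack from source.
# The mirror URL, prefix, and package names are illustrative.
apt-get update && apt-get install -y build-essential wget
wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.0.tar.gz
tar -xzf openmpi-2.1.0.tar.gz
cd openmpi-2.1.0
./configure --prefix=/usr/local
make -j"$(nproc)" && make install
ldconfig   # refresh the linker cache so namd2 finds the new libmpi
```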

In my Slurm script, when I run "mpirun -np #P singularity exec namdimage.simg /path_to namd_executable_in_container/namd2 inputfile", I see no scaling.
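For context, this follows the usual hybrid model, where the host's mpirun launches one container process per rank. A sketch of such a Slurm script, with placeholder node/task counts and paths:

```shell
#!/bin/bash
# Hybrid-model Slurm script: the host's mpirun starts one container
# instance per MPI rank. Counts and paths below are placeholders.
#SBATCH --job-name=namd-test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=44

mpirun -np "${SLURM_NTASKS}" \
    singularity exec namdimage.simg \
    /path/to/namd2 inputfile
```

For this model to scale, the MPI inside the image has to be compatible with the mpirun on the host that launches it.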




My sample output looks like:
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running in non-SMP mode: numPes 1
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
..........

Info: Running on 1 processors, 1 nodes, 1 physical nodes.


Any ideas what I may be doing wrong?

Thanks,

Haseeb

victor sv

Feb 15, 2018, 10:07:25 AM2/15/18
to singu...@lbl.gov
Hi Haseeb,

First of all, I would like to understand which MPI family and version is running inside and outside the container.

Is NAMD linked against OpenMPI or MPICH?

Which MPI family and version is running on the host? It should be enough to show the output of "mpirun --version".
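One quick way to compare both sides, assuming mpirun is on the PATH inside the image (the image name here is a placeholder):

```shell
# MPI version on the host
mpirun --version

# MPI version inside the container
singularity exec namdimage.simg mpirun --version
```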

BR,
Víctor.

--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity+unsubscribe@lbl.gov.

Haseeb Mahmud

Feb 15, 2018, 9:49:41 PM2/15/18
to singu...@lbl.gov, vict...@gmail.com
Hello,

Since my email I rebuilt my container with NAMD linked against OpenMPI 2.1.0. I then ran the container on my host, which also uses OpenMPI 2.1.0, and I get new errors. It looks like it is now using both nodes, but I see the errors below.


[compute-0.cluster:64549] mca_btl_tcp_proc: unknown af_family received: 255
[compute-0.cluster:64549] unknown address family for tcp: 0
[compute-0.cluster:64549] mca_btl_tcp_proc: unknown af_family received: 255
[compute-0.cluster:64549] unknown address family for tcp: 0
*** Error in `/NAMD_2.12_Source/Linux-x86_64-g++/namd2': munmap_chunk(): invalid pointer: 0x00000000098c5cb0 ***
*** Error in `/NAMD_2.12_Source/Linux-x86_64-g++/namd2': corrupted double-linked list: 0x0000000008b4b540 ***

Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running in non-SMP mode: numPes 88
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 2 unique compute nodes (88-way SMP).
Charm++> cpu topology info is gathered in 0.019 seconds.

Info: Startup phase 9 took 0.00478208 s, 1237.92 MB of memory in use
Info: CREATING 44375 COMPUTE OBJECTS
Info: Startup phase 10 took 0.0075435 s, 1237.92 MB of memory in use
Info: Startup phase 11 took 0.000689846 s, 1237.92 MB of memory in use
Info: Startup phase 12 took 4.81852e-05 s, 1237.92 MB of memory in use
Info: Finished startup at 13.4259 s, 1237.92 MB of memory in use

Info: useSync: 0 useProxySync: 0
[compute-0:64562] *** Process received signal ***
[compute-0:64562] Signal: Segmentation fault (11)
[compute-0:64562] Signal code: Address not mapped (1)
[compute-0:64562] Failing at address: 0x7
[compute-0:64699] *** Process received signal ***
[compute-0:64699] Signal: Segmentation fault (11)
[compute-0:64699] Signal code: Address not mapped (1)
[compute-0:64699] Failing at address: 0xffffffffffffffff
[compute-0:64737] *** Process received signal ***
[compute-0:64737] Signal: Segmentation fault (11)



Haseeb 




Kandes, Martin

Feb 15, 2018, 10:31:29 PM2/15/18
to singu...@lbl.gov, vict...@gmail.com
Hi Haseeb,

I guess my first question would be: why do you need to build NAMD within a Singularity container? I've built it on a few systems before with different build options, and it usually plays nicely with most systems. If you're doing an 'mpi' comm build, here is a build script I've used before [1]. Maybe it'll help.

Marty

[1]

#!/bin/bash
#
# A build script for NAMD

declare -r COMPILER_MODULE='gnu/4.9.2'
declare -r MPI_MODULE='mvapich2_ib/2.1'

declare -r OS='linux'
declare -r ARCH='x86_64'

declare -r NAMD_NAME='NAMD'
declare -r NAMD_VERSION='2.12'
declare -r NAMD_TARBALL="${NAMD_NAME}_${NAMD_VERSION}_Source.tar.gz"
declare -r NAMD_DIR="${NAMD_NAME}_${NAMD_VERSION}_Source"
declare -r NAMD_URL='http://www.ks.uiuc.edu/Research/namd'
declare -r NAMD_COMPILER='g++'
declare -r NAMD_ARCH="Linux-${ARCH}-${NAMD_COMPILER}"

declare -r CHARM_NAME='charm'
declare -r CHARM_VERSION='6.7.1'
declare -r CHARM_TARFILE="${CHARM_NAME}-${CHARM_VERSION}.tar"
declare -r CHARM_DIR="${CHARM_NAME}-${CHARM_VERSION}"
declare -r CHARM_COMM='mpi'
declare -r CHARM_ARCH="${CHARM_COMM}-${OS}-${ARCH}"
declare -r CHARM_OPTIONS='mpicxx'

declare -r FFTW_NAME='fftw'
declare -r FFTW_TARBALL="${FFTW_NAME}-${OS}-${ARCH}.tar.gz"

declare -r TCL_NAME='tcl'
declare -r TCL_VERSION='8.5.9'
declare -r TCL_TARBALL="${TCL_NAME}${TCL_VERSION}-${OS}-${ARCH}.tar.gz"
declare -r TCL_THREADED_TARBALL="${TCL_NAME}${TCL_VERSION}-${OS}-${ARCH}-threaded.tar.gz"

module purge
module load "${COMPILER_MODULE}"
module load "${MPI_MODULE}"

tar -xzvf "${PWD}/tarballs/${NAMD_TARBALL}"
cd "${NAMD_DIR}"

wget "${NAMD_URL}/libraries/${FFTW_TARBALL}"
wget "${NAMD_URL}/libraries/${TCL_TARBALL}"
wget "${NAMD_URL}/libraries/${TCL_THREADED_TARBALL}"

tar -xvf "${CHARM_TARFILE}"
tar -xzvf "${FFTW_TARBALL}"
tar -xzvf "${TCL_TARBALL}"
tar -xzvf "${TCL_THREADED_TARBALL}"

mv "${OS}-${ARCH}" fftw
mv "${TCL_NAME}${TCL_VERSION}-${OS}-${ARCH}" tcl
mv "${TCL_NAME}${TCL_VERSION}-${OS}-${ARCH}-threaded" tcl-threaded

cd "${CHARM_DIR}"
./build charm++ "${CHARM_ARCH}" "${CHARM_OPTIONS}" --with-production

cd ../
./config "${NAMD_ARCH}" --charm-arch "${CHARM_ARCH}-${CHARM_OPTIONS}"
cd "${NAMD_ARCH}"
make
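Since this is an 'mpi' comm build, the resulting namd2 would typically be launched through the MPI launcher rather than charmrun, for example (the core count and input file below are placeholders):

```shell
# Run the freshly built binary from the NAMD_ARCH build directory
mpirun -np 48 ./namd2 input.namd
```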





Haseeb Mahmud

Feb 16, 2018, 12:31:47 PM2/16/18
to singu...@lbl.gov
Hello,

I am able to build and run the MPI version of NAMD fine on my cluster outside a container. I would just like to run it within a container as well, as a test with Singularity. Are there any other dependencies/libraries I should be adding to my container besides OpenMPI 2.1.0? My HPC's network is Ethernet (TCP), not InfiniBand.

Regards,

Haseeb


Jason Stover

Feb 16, 2018, 1:56:38 PM2/16/18
to singu...@lbl.gov
Hi Haseeb,

I think what you will want to do is, when you build the image, add:

%runscript
/path_to namd_executable_in_container/namd2 ${@}

Then change your mpirun line to:

mpirun -np #P /path/to/namdimage.simg inputfile

That way you will be executing the image itself.
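Put together in the definition file, that section would look something like this; the binary path is a placeholder, and using exec with a quoted "${@}" ensures arguments containing spaces survive intact:

```shell
%runscript
    exec /path/to/namd2 "${@}"
```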

-J