OpenMP thread scaling issue, only 8 cores, not 28 (native compile vs singularity)


Perrin Meyer

Mar 21, 2021, 12:23:08 AM
to singularity
I have access to an academic HPC cluster, where I am currently running an OpenMP-based Fortran BEM code (wrapped in Julia) on a single 28-core node.

When I compile it on the cluster nodes, it scales to all 28 cores.

After a bit of work, I managed to build an Ubuntu 20 based Singularity container in which I compile the Fortran code, Julia wrappers, etc. This is on my laptop, since I don't have root access on the HPC cluster (that would be nice...).

However, when I run the same Fortran code on the same nodes, instead of 28-core scaling I'm only getting 8-core scaling. (The quadrature scheme is embarrassingly parallel, so I can tell just by looking at top: the cluster-compiled code gets 2799% CPU utilization, while the Singularity container gets 799% CPU utilization. Once it hits GMRES it's not so regular.)

OMP_NUM_THREADS is not set in either environment. I tried setting OMP_NUM_THREADS to 28, but it still only scales to 8 cores.

My laptop has 4 cores / 8 threads, which is suspicious. Why would compiling a Singularity image on my 8-thread laptop limit me to 8 threads on an HPC cluster node with 28 cores? (And I know the same Fortran code can scale to 28 cores when compiled natively.)
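For reference, here's the quick sanity check I've been running on a node (a sketch; "bem.sif" is a placeholder for my actual image name):

```shell
# Compare what the host and the container each see.
nproc                                              # host's visible CPU count
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"   # unset in both environments here
# singularity exec bem.sif nproc                   # container sees the same CPUs
```

Both report the same CPU count, which is what makes the 8-thread cap so puzzling.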

Any help would be greatly appreciated

Sincerely

Perrin Meyer 

Kandes, Martin

Mar 21, 2021, 12:44:11 AM
to singu...@lbl.gov
Perrin,

What are the host OS and CPU on the cluster where you're observing this degraded performance?

Also, is OpenBLAS involved as a dependency?

Marty


--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity...@lbl.gov.
To view this discussion on the web visit https://groups.google.com/a/lbl.gov/d/msgid/singularity/defce510-ed08-4967-9538-c0e4ee20fbf5n%40lbl.gov.

Perrin Meyer

Mar 21, 2021, 5:26:11 AM
to singu...@lbl.gov
I was testing my Singularity container on the SeaWulf cluster at SUNY Stony Brook; each node has 28 Haswell cores. It's a Penguin Computing cluster, which I think is CentOS based, but I'm not sure.

Yes, in my Singularity build I am building OpenBLAS, and the Fortran BEM code links to it. I built it with the USE_OPENMP=1 flag, which I thought was the way to make the Fortran OpenMP code play nice with OpenBLAS...

On SeaWulf I built the Fortran code (and Julia, and gmsh) using:
module load git/2.12.2
module load shared
module load gcc-stack
module load openblas

So I'm not sure how they built OpenBLAS.
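(One way to check, in case it's useful: OpenBLAS bakes its build configuration, including the thread ceiling, into the library itself as a string, so you can inspect a build without the recipe. A sketch; the library path below is hypothetical:)

```shell
# OpenBLAS embeds a config string that looks something like
# "OpenBLAS 0.3.9 ... Haswell MAX_THREADS=28"; grep the module's
# shared library for it (path is hypothetical):
strings /opt/openblas/lib/libopenblas.so | grep -i 'max_threads'
```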

This code scales to all 28 cores (at least for the embarrassingly parallel parts). 

My Singularity container was built from the def file below. 

Thank you for your help!

perrin meyer 



Bootstrap: docker
From: ubuntu:20.04

%post
        ## singularity version 3.7.0
       
        export DEBIAN_FRONTEND=noninteractive
        apt -y update
        apt -y install git
        ## from https://github.com/JuliaLang/julia/blob/master/doc/build/build.md
        apt -y install build-essential
        apt -y install libatomic1
        apt -y install python3
        apt -y install gfortran
        ## from https://stackoverflow.com/questions/44331836/apt-get-install-tzdata-noninteractive
        apt install -y --no-install-recommends tzdata
        apt -y install wget
        apt -y install m4
        apt -y install cmake
        apt -y install pkg-config
        apt -y install curl
        apt -y install perl


        ## build Julia from github source
        git clone https://github.com/JuliaLang/julia.git
        cd /julia
        git checkout v1.5.4
        ## from ERROR: Unable to find compatible target in system image.
        ## this seems to solve it
        export JULIA_CPU_TARGET='generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)'
        make -j 8
        mkdir /opt/julia
        export JULIA_DEPOT_PATH=/opt/julia
        /julia/julia -e 'using Pkg ; Pkg.add("SpecialFunctions")'
        /julia/julia -e 'using Pkg ; Pkg.add("BSON")'
        /julia/julia -e 'using Pkg ; Pkg.add("LinearAlgebra")'
        /julia/julia -e 'using Pkg ; Pkg.add("Printf")'
        /julia/julia -e 'using SpecialFunctions; using BSON ; using LinearAlgebra ; using Printf'

        ## note, no OpenGL or FLTK or even OpenCascade, so only good for mesh loading/saving...
        ## built in kernel might work
        cd /opt
        git clone https://gitlab.onelab.info/gmsh/gmsh.git
        cd gmsh/
        git checkout gmsh_4_8_0
        mkdir build/
        cd build/
        cmake -DENABLE_BUILD_DYNAMIC=1 -DCMAKE_INSTALL_PREFIX=/opt ..
        make -j 8
        make install

        cd /opt
        git clone https://github.com/xianyi/OpenBLAS.git
        cd OpenBLAS/
        git checkout v0.3.9
        ## I think this is necessary... so confusing, HPC...
        ## if LLNL is behind it, it will probably work eventually...
        ## make install PREFIX=your_installation_directory
        ## The default installation directory is /opt/OpenBLAS.
        make USE_OPENMP=1
        make install
        ## this goes to /opt/OpenBLAS by default..
        ## hack, copy to /usr/local/lib ...
        cd /opt/OpenBLAS
        cp *.so /usr/local/lib/

        ## fmm3d
        cd /opt
        git clone https://github.com/flatironinstitute/FMM3D.git
        cd FMM3D/
        make lib
        cd lib
        ## hack... I have a complicated relationship with LD...
        ## I'm SETTING ENVIRONMENT VARIABLE, WHAT MORE DO YOU WANT link time different than runtime...
        cp *.so /usr/local/lib/
       
        ## fmm3dbie
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lib:/opt/OpenBLAS:/opt/FMM3D/lib:/usr/local/lib
        cd /opt
        git clone https://gitlab.com/fastalgorithms/fmm3dbie.git
        cd fmm3dbie/
        cp make.inc.linux.gnu.openblas make.inc
        make lib
        cd lib
        ## hack to make linker happy...
        cp *.so /usr/local/lib/
       

%environment
        export JULIA_DEPOT_PATH=/opt/julia:$HOME/.juliasingularity:
        export PATH=$PATH:/julia:/opt/bin
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lib:/opt/OpenBLAS:/opt/FMM3D/lib:/usr/local/lib

 


Kandes, Martin

Mar 21, 2021, 2:35:13 PM
to singu...@lbl.gov
Perrin,

I think I've run into this problem before myself: if you build your Singularity container on another system with a lower core count than the one you expect to run it on, then you need to make sure to set the maximum number of OpenBLAS threads explicitly at build time. For example, this is my make line for OpenBLAS [1]; the full definition file this line comes from is at [2]. I set OPENBLAS_MAX_NUM_THREADS=256 since our latest system has 128-core nodes (256 threads when hyperthreaded).

Marty

[1]

make FC='gfortran' BINARY=64 USE_OPENMP="${OPENBLAS_USE_OPENMP}" DYNAMIC_ARCH=1 NUM_THREADS="${OPENBLAS_MAX_NUM_THREADS}"

[2]
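(For clarity, a sketch of how the two knobs interact; the image name and script below are hypothetical. NUM_THREADS at build time only sets the ceiling; the actual per-run thread count is still chosen by environment variable at launch, and an OpenMP build of OpenBLAS honors OMP_NUM_THREADS.)

```shell
# NUM_THREADS=256 at build time raises the ceiling; the actual count
# is picked per run ("bem.sif" and "run_bem.jl" are placeholders):
export OMP_NUM_THREADS=28
singularity exec bem.sif julia run_bem.jl
```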



Perrin Meyer

Mar 21, 2021, 6:17:17 PM
to singu...@lbl.gov
Thanks,

What is OPENBLAS_USE_OPENMP set to? 
I'll try 1 for now, see if that works, then try 0... 

Is OpenBLAS smart enough not to use four times as many threads as cores, or should I set the thread limit for each node I run on? Gosh, I'll have to spin up a Singularity container to spin up Singularity containers. Computing has become so abstract, yet it's still bash and gcc command-line options...
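(One per-node pinning approach, as a sketch; this assumes the container's OpenBLAS was rebuilt with a NUM_THREADS ceiling at or above the node's core count:)

```shell
# Pin the OpenMP thread count to whatever the current node exposes,
# so the same container adapts from an 8-thread laptop to a 28-core node.
cores=$(nproc)
export OMP_NUM_THREADS="$cores"
echo "using $OMP_NUM_THREADS OpenMP threads"
```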

Also, I heard that Intel was opening up the licensing for their Fortran compilers and MKL; it's called Intel oneAPI or something marketing-like. In theory that might mean open source projects can use the Intel compilers / MKL more easily (the people at NYU say this particular code runs 20% faster with the Intel compilers). Any hints on Singularity and the Intel compilers / MKL?

Thanks again,

perrin meyer




Perrin Meyer

Mar 21, 2021, 6:20:01 PM
to singu...@lbl.gov
Oops, I didn't notice the link to the def file.



Perrin Meyer

Mar 22, 2021, 12:34:07 AM
to singularity, Perrin Meyer
Thanks! I confirmed that my rebuilt Singularity container, with the OpenBLAS thread maximum set to 28, scales to all 28 cores (on one 28-core node of the SeaWulf cluster). I'm running more detailed timings overnight.

I might get inspired and try to download all 30 GB(!) of the newly rebranded Intel oneAPI(tm) compilers and MKL and build a Singularity container with everything built with the Intel compilers and linked to MKL (but then I might also have to get some actual work done...).

The "Base" toolkit alone is 24 GB total!

The "HPC" toolkit has the "classic" Fortran compiler and MKL, plus the new Fortran compiler with OpenMP GPU offload support!

Thanks again

perrin meyer