OpenMP thread scaling issue, only 8 cores, not 28 (native compile vs singularity)


Perrin Meyer

Mar 21, 2021, 12:23:08 AM
to singularity
I have access to an academic HPC cluster, where I am currently running an OpenMP-based Fortran BEM code (wrapped in Julia) on a single 28-core node.

When I compile it on the cluster nodes, it scales to all 28 cores.

After a bit of work, I managed to build an Ubuntu 20 based Singularity container in which I compile the Fortran code, Julia wrappers, etc. This is on my laptop, since I don't have root access on the HPC cluster (that would be nice...).

However, when I run the same Fortran code on the same nodes, instead of 28-core scaling I'm only getting 8-core scaling. (The quadrature scheme is embarrassingly parallel, so I can tell just by looking at top: the cluster-compiled code gets 2799% CPU utilization, while the Singularity container gets 799% CPU utilization. Once it hits GMRES it's not so regular.)

OMP_NUM_THREADS is not set in either environment. I tried setting OMP_NUM_THREADS to 28, but it still only scales to 8 cores.

My laptop has 4 cores / 8 threads, which is suspicious. Why would compiling a Singularity image on my 8-thread laptop limit me to 8 threads on an HPC cluster node with 28 cores? (And I know the same Fortran code can scale to 28 cores when compiled natively.)
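For reference, here's the quick sanity check I've been running on a node (a sketch; "bem.sif" is a placeholder for my actual image name):

```shell
# Compare what the host and the container each see.
nproc                                              # host's visible CPU count
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"   # unset in both environments here
# singularity exec bem.sif nproc                   # container sees the same CPUs
```

Both report the same CPU count, which is what makes the 8-thread cap so puzzling.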

Any help would be greatly appreciated

Sincerely

Perrin Meyer 

Kandes, Martin

Mar 21, 2021, 12:44:11 AM
to singu...@lbl.gov
Perrin,

What are the host OS and CPU on the cluster where you're observing this degraded performance?

Also, is OpenBLAS involved as a dependency?

Marty


--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity...@lbl.gov.
To view this discussion on the web visit https://groups.google.com/a/lbl.gov/d/msgid/singularity/defce510-ed08-4967-9538-c0e4ee20fbf5n%40lbl.gov.

Perrin Meyer

Mar 21, 2021, 5:26:11 AM
to singu...@lbl.gov
I was testing my Singularity container on the SeaWulf cluster at SUNY Stony Brook; each node has 28 Haswell cores. It's a Penguin Computing cluster, which I think is CentOS based, but I'm not sure.

Yes, in my Singularity build I am building OpenBLAS, and the Fortran BEM code links to it. I built it with the USE_OPENMP=1 flag, which I thought was the way to make the Fortran OpenMP code play nice with OpenBLAS...

On SeaWulf I built the Fortran code (and Julia, and gmsh) using:
module load git/2.12.2
module load shared
module load gcc-stack
module load openblas

So I'm not sure how they built OpenBLAS.
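(One way to check, in case it's useful: OpenBLAS bakes its build configuration, including the thread ceiling, into the library itself as a string, so you can inspect a build without the recipe. A sketch; the library path below is hypothetical:)

```shell
# OpenBLAS embeds a config string that looks something like
# "OpenBLAS 0.3.9 ... Haswell MAX_THREADS=28"; grep the module's
# shared library for it (path is hypothetical):
strings /opt/openblas/lib/libopenblas.so | grep -i 'max_threads'
```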

This code scales to all 28 cores (at least for the embarrassingly parallel parts). 

My Singularity container was built from the def file below. 

Thank you for your help!

perrin meyer 



Bootstrap: docker
From: ubuntu:20.04

%post
        ## singularity version 3.7.0
       
        export DEBIAN_FRONTEND=noninteractive
        apt -y update
        apt -y install git
        ## from https://github.com/JuliaLang/julia/blob/master/doc/build/build.md
        apt -y install build-essential
        apt -y install libatomic1
        apt -y install python3
        apt -y install gfortran
        ## from https://stackoverflow.com/questions/44331836/apt-get-install-tzdata-noninteractive
        apt install -y --no-install-recommends tzdata
        apt -y install wget
        apt -y install m4
        apt -y install cmake
        apt -y install pkg-config
        apt -y install curl
        apt -y install perl


        ## build Julia from github source
        git clone https://github.com/JuliaLang/julia.git
        cd /julia
        git checkout v1.5.4
        ## from ERROR: Unable to find compatible target in system image.
        ## this seems to solve it
        export JULIA_CPU_TARGET='generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)'
        make -j 8
        mkdir /opt/julia
        export JULIA_DEPOT_PATH=/opt/julia
        /julia/julia -e 'using Pkg ; Pkg.add("SpecialFunctions")'
        /julia/julia -e 'using Pkg ; Pkg.add("BSON")'
        /julia/julia -e 'using Pkg ; Pkg.add("LinearAlgebra")'
        /julia/julia -e 'using Pkg ; Pkg.add("Printf")'
        /julia/julia -e 'using SpecialFunctions; using BSON ; using LinearAlgebra ; using Printf'

        ## note, no OpenGL or FLTK or even OpenCascade, so only good for mesh loading/saving...
        ## built in kernel might work
        cd /opt
        git clone https://gitlab.onelab.info/gmsh/gmsh.git
        cd gmsh/
        git checkout gmsh_4_8_0
        mkdir build/
        cd build/
        cmake -DENABLE_BUILD_DYNAMIC=1 -DCMAKE_INSTALL_PREFIX=/opt ..
        make -j 8
        make install

        cd /opt
        git clone https://github.com/xianyi/OpenBLAS.git
        cd OpenBLAS/
        git checkout v0.3.9
        ## I think this is necessary... so confusing, HPC...
        ## if LLNL is behind it, it will probably work eventually...
        ## make install PREFIX=your_installation_directory
        ## The default installation directory is /opt/OpenBLAS.
        make USE_OPENMP=1
        make install
        ## this goes to /opt/OpenBLAS by default..
        ## hack, copy to /usr/local/lib ...
        cd /opt/OpenBLAS
        cp *.so /usr/local/lib/

        ## fmm3d
        cd /opt
        git clone https://github.com/flatironinstitute/FMM3D.git
        cd FMM3D/
        make lib
        cd lib
        ## hack... I have a complicated relationship with LD...
        ## I'm SETTING ENVIRONMENT VARIABLE, WHAT MORE DO YOU WANT link time different than runtime...
        cp *.so /usr/local/lib/
       
        ## fmm3dbie
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lib:/opt/OpenBLAS:/opt/FMM3D/lib:/usr/local/lib
        cd /opt
        git clone https://gitlab.com/fastalgorithms/fmm3dbie.git
        cd fmm3dbie/
        cp make.inc.linux.gnu.openblas make.inc
        make lib
        cd lib
        ## hack to make linker happy...
        cp *.so /usr/local/lib/
       

%environment
        export JULIA_DEPOT_PATH=/opt/julia:$HOME/.juliasingularity:
        export PATH=$PATH:/julia:/opt/bin
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lib:/opt/OpenBLAS:/opt/FMM3D/lib:/usr/local/lib

 


Kandes, Martin

Mar 21, 2021, 2:35:13 PM
to singu...@lbl.gov
Perrin,

I think I've run into this problem before myself: if you build your Singularity container on another system with a lower core count than the one you expect to run it on, then you need to make sure to set the maximum number of OpenBLAS threads explicitly at build time. For example, this is my make line for OpenBLAS [1]; the full definition file this line comes from is at [2]. I set OPENBLAS_MAX_NUM_THREADS=256 since our latest system has 128-core nodes (256 threads when hyperthreaded).

Marty

[1]

make FC='gfortran' BINARY=64 USE_OPENMP="${OPENBLAS_USE_OPENMP}" DYNAMIC_ARCH=1 NUM_THREADS="${OPENBLAS_MAX_NUM_THREADS}"

[2]
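(For clarity, a sketch of how the two knobs interact; the image name and script below are hypothetical. NUM_THREADS at build time only sets the ceiling; the actual per-run thread count is still chosen by environment variable at launch, and an OpenMP build of OpenBLAS honors OMP_NUM_THREADS.)

```shell
# NUM_THREADS=256 at build time raises the ceiling; the actual count
# is picked per run ("bem.sif" and "run_bem.jl" are placeholders):
export OMP_NUM_THREADS=28
singularity exec bem.sif julia run_bem.jl
```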



Perrin Meyer

Mar 21, 2021, 6:17:17 PM
to singu...@lbl.gov
Thanks,

What is OPENBLAS_USE_OPENMP set to? 
I'll try 1 for now, see if that works, then try 0... 

Is OpenBLAS smart enough not to use four times as many threads as cores, or should I set the thread limit for each node I run on? Gosh, I'll have to spin up a Singularity container to spin up Singularity containers. Computing has become so abstract, yet it's still bash and gcc command-line options...
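(One per-node pinning approach, as a sketch; this assumes the container's OpenBLAS was rebuilt with a NUM_THREADS ceiling at or above the node's core count:)

```shell
# Pin the OpenMP thread count to whatever the current node exposes,
# so the same container adapts from an 8-thread laptop to a 28-core node.
cores=$(nproc)
export OMP_NUM_THREADS="$cores"
echo "using $OMP_NUM_THREADS OpenMP threads"
```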

Also, I heard that Intel was opening up the licensing for their Fortran compilers and MKL; it's called Intel oneAPI or something marketing-like. In theory that might mean open source projects can use the Intel compilers / MKL more easily (the people at NYU say this particular code runs 20% faster with the Intel compilers). Any hints on Singularity and the Intel compilers / MKL?

Thanks again,

perrin meyer




Perrin Meyer

Mar 21, 2021, 6:20:01 PM
to singu...@lbl.gov
Oops, I didn't notice the link to the def file.



Perrin Meyer

Mar 22, 2021, 12:34:07 AM
to singularity, Perrin Meyer
Thanks! I confirmed that my rebuilt Singularity container, with the OpenBLAS thread maximum set to 28, scales to all 28 cores (on one 28-core node of the SeaWulf cluster). I'm running more detailed timings overnight.

I might get inspired and try to download all 30 GB(!) of the newly rebranded Intel oneAPI(tm) compilers and MKL and build a Singularity container with everything built with the Intel compilers and linked to MKL (but then I might also have to get some actual work done...).

The "Base" toolkit alone is 24 GB total!

The "HPC" toolkit has the "classic" Fortran compiler and MKL, plus the new Fortran compiler with OpenMP GPU offload support!

Thanks again

perrin meyer