I have access to an academic HPC cluster. I am currently running an OpenMP-based Fortran BEM code (wrapped in Julia) on a single 28-core node.
When I compile it natively on the cluster nodes, it scales to all 28 cores.
After a bit of work, I managed to build a Ubuntu 20 based Singularity container, in which I compile the Fortran code, Julia wrappers, etc. I build the container on my laptop, since I don't have root access on the HPC cluster (that would be nice...).
However, when I run the same Fortran code from the container on the same nodes, instead of 28-core scaling I only get 8-core scaling. (The quadrature scheme is embarrassingly parallel, so I can tell just by looking at top: the cluster-compiled code gets 2799% CPU utilization, while the Singularity container gets 799%. Once it hits GMRES it's not so regular.)
OMP_NUM_THREADS is not set in either environment. I tried setting OMP_NUM_THREADS=28 explicitly, but it still only scales to 8 cores.
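For reference, here is the kind of check I can run both natively and via `singularity exec` on a cluster node (a sketch; GNU libgomp sizes its default thread team from the CPUs the process is actually allowed to use, so `nproc` is the number worth comparing between the two environments):

```shell
# Diagnostic sketch: compare these natively vs. inside the container
# on the same node to see whether the container sees fewer usable CPUs.
nproc                                     # CPUs usable by this process (affinity-aware)
getconf _NPROCESSORS_ONLN                 # all online CPUs, ignoring affinity
grep Cpus_allowed_list /proc/self/status  # raw affinity mask for this shell
```

If `nproc` reports 8 inside the container but 28 natively, the limit is coming from CPU affinity on the containerized process rather than from the binary itself.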
My laptop has 4 cores / 8 threads, which is suspicious. Why would building a Singularity image on my 8-thread laptop limit me to 8 threads on an HPC cluster node with 28 cores? (And I know from native compilation that the same Fortran code scales to 28 cores.)
Any help would be greatly appreciated.
Sincerely,
Perrin Meyer