Wall clock time increases with the number of cores used


Gerard Lods

Jul 23, 2025, 12:37:33 AM
to pflotran-users
Good morning,
==========================================
1) Problem description
We have a parallel performance problem with PFLOTRAN: the wall clock time increases with the number of cores used. We observed it with the input file .../pflotran/shortcourse/exercises/1D_Calcite/calcite_flow_and_tran.in, using several different discretizations (GRID NXYZ).

Note that the outputs are identical to those of the serial run.
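To put a number on "wall clock time increases with the number of cores", the serial time T_1 and the time T_p on p cores can be compared through the speedup S_p = T_1 / T_p and the parallel efficiency E_p = S_p / p. A minimal sketch in C; the function names are hypothetical and the timings have to come from your own runs:

```c
#include <stdio.h>

/* Speedup of a run on p cores relative to the serial run: S_p = T_1 / T_p.
 * A value below 1.0 means the parallel run is slower than the serial one. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Parallel efficiency: E_p = S_p / p (1.0 would be ideal linear scaling). */
double efficiency(double t_serial, double t_parallel, int n_cores)
{
    return speedup(t_serial, t_parallel) / (double)n_cores;
}

/* Print one row of a scaling table from measured wall clock times. */
void print_scaling_row(double t_serial, double t_parallel, int n_cores)
{
    printf("cores=%d  time=%.3f s  speedup=%.2f  efficiency=%.0f%%\n",
           n_cores, t_parallel, speedup(t_serial, t_parallel),
           100.0 * efficiency(t_serial, t_parallel, n_cores));
}
```

In these terms, the situation described above is speedup(t1, tp) < 1.0: the run on p cores takes longer than the serial run.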

  
==========================================
2) Linux configuration

The installation followed the procedure at https://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install. The regression tests ran correctly.

The mpirun we used (the one found on $PATH) resolves as /usr/bin/mpirun -> /etc/alternatives/mpirun -> /usr/bin/mpirun.openmpi -> /usr/bin/orterun:
$ ($PATH)mpirun --version   
mpirun (Open MPI) 4.1.6

Here is some other information:

$ .../petsc/arch-linux-c-opt/bin/mpirun --version
HYDRA build details:
    Version:                                 4.2.1
    Release Date:                            Wed Apr 17 15:30:02 CDT 2024
    CC:                              gcc   -fPIC -Wno-lto-type-mismatch -Wno-stringop-overflow -O3
    Configure options:                       '--disable-option-checking' '--prefix=/home/u_pflotran/petsc/arch-linux-c-opt' 'MAKE=/usr/bin/gmake' '--libdir=/home/u_pflotran/petsc/arch-linux-c-opt/lib' 'CC=gcc' 'CFLAGS=-fPIC -Wno-lto-type-mismatch -Wno-stringop-overflow -O3 -g -O2' 'AR=/usr/bin/ar' 'ARFLAGS=cr' 'CXX=g++' 'CXXFLAGS=-Wno-lto-type-mismatch -Wno-psabi -O3 -std=gnu++20 -fPIC -g -O2' 'FFLAGS=-fPIC -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -O3 -Wno-unused-function -fallow-argument-mismatch -fallow-argument-mismatch -g -O2' 'FC=gfortran' 'F77=gfortran' 'FCFLAGS=-fPIC -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -O3 -Wno-unused-function -fallow-argument-mismatch -fallow-argument-mismatch -g -O2' '--enable-shared' '--with-pm=hydra' '--disable-java' '--with-hwloc=embedded' '--with-device=ch3:nemesis' '--enable-g=meminit,dbg' 'PYTHON=/usr/bin/python3' '--disable-maintainer-mode' '--disable-dependency-tracking' '--cache-file=/dev/null' '--srcdir=.' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -I/home/u_pflotran/petsc/arch-linux-c-opt/externalpackages/mpich-4.2.1/src/mpl/include -I/home/u_pflotran/petsc/arch-linux-c-opt/externalpackages/mpich-4.2.1/modules/json-c -I/home/u_pflotran/petsc/arch-linux-c-opt/externalpackages/mpich-4.2.1/modules/hwloc/include -D_REENTRANT -I/home/u_pflotran/petsc/arch-linux-c-opt/externalpackages/mpich-4.2.1/src/mpi/romio/include -I/home/u_pflotran/petsc/arch-linux-c-opt/externalpackages/mpich-4.2.1/src/pmi/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select
$ cat /etc/lsb-release
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=22.1
DISTRIB_CODENAME=xia
DISTRIB_DESCRIPTION="Linux Mint 22.1 Xia"
$ cat /etc/os-release
NAME="Linux Mint"
VERSION="22.1 (Xia)"
ID=linuxmint
ID_LIKE="ubuntu debian"
PRETTY_NAME="Linux Mint 22.1"
VERSION_ID="22.1"
HOME_URL="https://www.linuxmint.com/"
SUPPORT_URL="https://forums.linuxmint.com/"
BUG_REPORT_URL="http://linuxmint-troubleshooting-guide.readthedocs.io/en/latest/"
PRIVACY_POLICY_URL="https://www.linuxmint.com/"
VERSION_CODENAME=xia
UBUNTU_CODENAME=noble
$ uname -a
Linux pflocalcul 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Copyright (C) 2023 Free Software Foundation, Inc.
$ pflotran -version
Petsc Release Version 3.21.5, unknown
       The PETSc Team
    petsc...@mcs.anl.gov
https://petsc.org/
See https://petsc.org/release/changes for recent updates.
See https://petsc.org/release/faq for problems.
See https://petsc.org/release/manualpages for help.
Libraries linked from /home/u_pflotran/petsc/arch-linux-c-opt/lib
$ cd $PFLOTRAN_DIR/src/pflotran
$ make clean
$ make pflotran >& make.log   
Please find the file make.log attached.

========================================== 
3) Test of mpirun

We verified that mpirun works correctly by computing pi by numerical integration (trapezium method) over n = 2000000000 intervals. The results show that:
   
- Distribution of calculations to the cores is ok.
- Calculation of pi is ok.
- Wall clock time decreases with increasing number of cores.
$ mpirun -n 3 pi_parallel
---------------------------------------------
Number of iterations  = 2000000000
Number of cores       = 3
Core 1 : Number of iterations = 666666667 (fraction = 1/3.000000)
Core 0 : Number of iterations = 666666667 (fraction = 1/3.000000)
Core 2 : Number of iterations = 666666666 (fraction = 1/3.000000)
pi is approximately 3.1415926535898153, Error is 0.0000000000000222
wall clock time = 3.332112E+00 s
---------------------------------------------
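The test above can be sketched as follows. This is not the attached pi_parallel.c (which is not reproduced in the thread) but a serial stand-in, assuming the integrand 4/(1+x^2) on [0, 1] and a block split in which the first n % size cores receive one extra interval; that split reproduces the per-core counts printed above (666666667, 666666667, 666666666). In an MPI build, each rank would call partial_trapezium on its own block and the partial sums would be combined with MPI_Reduce; all helper names here are hypothetical.

```c
/* Number of intervals owned by `rank` when n intervals are split over
 * `size` cores: the first n % size ranks get one extra interval. */
long long local_count(long long n, int size, int rank)
{
    long long base = n / size;
    long long rem = n % size;
    return base + (rank < rem ? 1 : 0);
}

/* Index of the first interval owned by `rank`. */
long long local_start(long long n, int size, int rank)
{
    long long base = n / size;
    long long rem = n % size;
    return rank * base + (rank < rem ? rank : rem);
}

/* Integrand: the integral of 4/(1+x^2) over [0,1] equals pi. */
static double integrand(double x)
{
    return 4.0 / (1.0 + x * x);
}

/* Composite trapezium rule over this rank's intervals [start, start+count). */
double partial_trapezium(long long n, long long start, long long count)
{
    double h = 1.0 / (double)n;
    double sum = 0.0;
    for (long long i = start; i < start + count; i++) {
        double a = (double)i * h;
        double b = (double)(i + 1) * h;
        sum += 0.5 * h * (integrand(a) + integrand(b));
    }
    return sum;
}

/* Serial stand-in for the MPI program: loop over the "ranks" and add up
 * their partial sums, which is what MPI_Reduce with MPI_SUM would do. */
double pi_by_trapezium(long long n, int size)
{
    double pi = 0.0;
    for (int rank = 0; rank < size; rank++) {
        pi += partial_trapezium(n, local_start(n, size, rank),
                                local_count(n, size, rank));
    }
    return pi;
}
```

Because every interval is independent and only one number per core is exchanged at the end, this kind of test scales almost perfectly; it exercises the launcher but not the neighbour communication that a PDE solver performs at every iteration.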

   
Please find attached the C source files pi_serial.c and pi_parallel.c.
Note that #include <mpi.h> did not work, so we used:
    #include "/usr/lib/x86_64-linux-gnu/openmpi/include/mpi.h"
----------------------------------------------------------------------
Best regards,
Gerard 
pi_parallel.c
make.log
pi_serial.c

Hammond, Glenn E

Jul 23, 2025, 12:19:37 PM
to pflotra...@googlegroups.com
Gerard,

As noted in the study published in Water Resources Research (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2012WR013483), PFLOTRAN exhibits superior strong scaling performance when the number of degrees of freedom (DOF) per process (core) exceeds 10,000. Given that your problem scenarios involve relatively few DOFs—on the order of thousands—optimal scaling behavior is unlikely under these conditions.
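A quick way to apply this guideline before launching a run is to estimate the DOF per process from the grid size. A minimal sketch, assuming a structured NX × NY × NZ grid (as set with GRID NXYZ) and a fixed number of unknowns per cell, which depends on the process models in the input deck and is therefore left as a parameter; the 10,000 threshold is the figure quoted above, and the function names are hypothetical:

```c
/* Degrees of freedom (DOF) per process for an nx*ny*nz grid with
 * `dof_per_cell` unknowns per cell, split over n_procs processes. */
double dof_per_process(long long nx, long long ny, long long nz,
                       int dof_per_cell, int n_procs)
{
    long long total = nx * ny * nz * (long long)dof_per_cell;
    return (double)total / (double)n_procs;
}

/* Largest process count that keeps at least `min_dof` DOF per process.
 * Returns 1 when the whole problem is below the threshold, i.e. the
 * problem is too small to benefit from running in parallel. */
int max_procs_for_min_dof(long long nx, long long ny, long long nz,
                          int dof_per_cell, long long min_dof)
{
    long long total = nx * ny * nz * (long long)dof_per_cell;
    long long p = total / min_dof;
    return p < 1 ? 1 : (int)p;
}
```

For example, max_procs_for_min_dof(nx, ny, nz, dof_per_cell, 10000) returning 1 means that, by this rule of thumb, adding cores is not expected to reduce the wall clock time.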

I recommend running the short course regional_doublet problem with a refined grid to potentially achieve better performance. Furthermore, it is worth noting that a standard Linux desktop motherboard is not designed for efficient communication, either in terms of latency or bandwidth. As the number of processes increases, memory contention further reduces performance. The speedups reported in the 2014 Water Resources Research paper are specifically tied to executions on supercomputers that incorporate high-performance communication interconnects.

Glenn

From: 'Gerard Lods' via pflotran-users <pflotra...@googlegroups.com>
Date: Tuesday, July 22, 2025 at 9:37 PM
To: pflotran-users <pflotra...@googlegroups.com>
Subject: [pflotran-users: 8506] Wall clock time increases with the number of cores used

To view this discussion visit https://groups.google.com/d/msgid/pflotran-users/08b83890-5097-421d-ab6b-77bce3498ebbn%40googlegroups.com.

Gerard Lods

Jul 28, 2025, 12:49:41 AM
to pflotran-users
Glenn,

Thank you for clarifying this.

Best,
Gerard
