Running impi_intel_linux compiled FDS with multiple processes using PBS


Ruben

Jan 26, 2024, 5:18:01 AM
to FDS and Smokeview Discussions
Hi!

Currently, I am trying to run an Intel oneAPI compiled version of FDS 6.7.7 on a Linux cluster. The problem I have is that the Bash script does not let me assign more processes (or threads), but instead just runs FDS multiple times. For the sake of clarity I will first include the Bash script that I am using and then provide further information. I apologize for the rather long question; I believe everything I included might be relevant.

Any help is much appreciated.


The script (removed unnecessary long paths to directories):
#!/bin/bash

# Job Name
#PBS -N M14_opt0

# Amount of nodes and processors per node. Amount of processors is up to amount of meshes.
#PBS -l nodes=1:ppn=12

# Maximum time after which the simulation is automatically stopped.
#PBS -l walltime=00:01:00

# Amount of simultaneous processes. (Same as amount of processes)
export OMP_NUM_TREADS=12
export I_MPI_PIN_DOMAIN=omp

# Load OpenMPI
module load devtoolset/8 intel/oneapi_2022u1 mpi/openmpi-1.8.8-intel

# Removes stack size limit
ulimit -s unlimited

# Navigate to the FDS file you want to run
cd "<path>"

# Run the fds file
# np gives number of processes (here 12), second term navigates to FDS application, and third is the name of your .fds file
mpiexec -np 12 <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 2m_M14.fds


The error log of the run returns the following:
Starting FDS ...

 MPI Process      0 started on n11-73

 Reading FDS input file ...

(this pair of messages repeats 12 times; every launched copy reports itself as MPI Process 0)

ERROR: MATL_ID, vegetation, on SURF, foliage, does not exist (CHID: 2mM14_cat)

ERROR: FDS was improperly set-up - FDS stopped (CHID: 2mM14_cat)

 Fire Dynamics Simulator

 Current Date     : January 26, 2024  10:44:00
 Revision         : FDS6.7.7-0-gfe0d4ef-HEAD
 Revision Date    : Thu Nov 18 17:10:22 2021 -0500
 Compiler         : Intel ifort 2021.2.0
 Compilation Date : Jan 25, 2024 14:27:11

 MPI Enabled;    Number of MPI Processes:       1
 OpenMP Enabled; Number of OpenMP Threads:      1

 MPI version: 3.1
 MPI library version: Intel(R) MPI Library 2021.2 for Linux* OS


 Job TITLE        : Concatenated : Validation based on the 2m, 14 % trees from Mell et al. (2009)
 Job ID string    : 2mM14_cat


I suspect the FDS input file errors are an artifact of the incorrect setup of the Bash script, as the FDS input file runs fine using a single process and node.

_____________________
Background information:

For the compiled FDS we load
module load devtoolset/8 intel/oneapi_2022u1 mpi/openmpi-1.8.8-intel
instead of
module load mpi/openmpi-x86_64

The reason for this is that the latter returned the error:
/Build/impi_intel_linux_64/fds_impi_intel_linux_64: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
I assumed this was a result of compiling FDS using Intel OneAPI.

The script that we use to submit a job for the pre-compiled FDS:
#!/bin/bash

# Job Name
#PBS -N M14_2m_old

# Amount of nodes and processors per node. Amount of processors is up to amount of meshes. (generally, try to use one node, and then the amount of processors you need)
#PBS -l nodes=1:ppn=12

# Maximum time after which the simulation is automatically stopped. 
#PBS -l walltime=500:00:00

# Amount of simultaneous processes. (Same as amount of processes)
export OMP_NUM_TREADS=12
export I_MPI_PIN_DOMAIN=omp

# Load OpenMPI (program that distributes tasks between processors)
module load mpi/openmpi-x86_64

# Direct to the folder where FDS is installed
# export MODULEPATH=[direct to fds module]
export MODULEPATH=$<path>/FDS6.7.7_SMV6.7.18/bin/modules:$MODULEPATH

# Activate FDS
module load FDS6

# Navigate to the FDS file you want to run
cd <path>

# Run the fds file
# np gives number of processes (here 12), second term navigates to FDS application, and third is the name of your .fds file
mpiexec -np 12 <path>/FDS6.7.7_SMV6.7.18/bin/fds 2m_M14.fds


Works as intended with log:
Starting FDS ...

 MPI Process      0 started on n11-73
 MPI Process      8 started on n11-73
 MPI Process      4 started on n11-73
 MPI Process      1 started on n11-73
 MPI Process      3 started on n11-73
 MPI Process      5 started on n11-73
 MPI Process      6 started on n11-73
 MPI Process      7 started on n11-73
 MPI Process      9 started on n11-73
 MPI Process     10 started on n11-73
 MPI Process     11 started on n11-73
 MPI Process      2 started on n11-73

 Reading FDS input file ...


 Fire Dynamics Simulator

 Current Date     : January 25, 2024  09:32:42
 Revision         : FDS6.7.7-0-gfe0d4ef38-release
 Revision Date    : Thu Nov 18 17:10:22 2021 -0500
 Compiler         : Intel ifort 2021.4.0
 Compilation Date : Nov 19, 2021 08:15:06

 MPI Enabled;    Number of MPI Processes:      12
 OpenMP Enabled; Number of OpenMP Threads:      1

 MPI version: 3.1
 MPI library version: Intel(R) MPI Library 2021.4 for Linux* OS

Randy McDermott

Jan 26, 2024, 2:49:25 PM
to fds...@googlegroups.com
Others are better at this stuff than me... but my bet is that you have a conflict here:

# Activate FDS
module load FDS6

Look at that module (it is in the FDS/FDS6/bin/modules directory) and you will see that you are overwriting the openmpi libraries you are exporting before you load that module.
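A quick way to check this is to display what the FDS6 module actually changes. This is a generic Environment Modules command; the module name and file path below are the ones mentioned in this thread and will differ per cluster.

```shell
# Show the environment changes the FDS6 module makes; look for
# PATH / LD_LIBRARY_PATH entries that shadow the MPI loaded earlier.
module show FDS6 2>&1 | grep -E 'PATH|LD_LIBRARY_PATH'

# Or read the module file directly (location as described in this thread):
cat "$HOME/FDS/FDS6/bin/modules/FDS6"
```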

--
You received this message because you are subscribed to the Google Groups "FDS and Smokeview Discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fds-smv+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fds-smv/cbd7fd25-9139-4064-9d25-92dc337b460en%40googlegroups.com.

Ruben

Jan 26, 2024, 4:03:04 PM
to FDS and Smokeview Discussions
Dear Randy,

Thank you for your response.

Perhaps my original post was confusing. The problem I described has to do with the first Bash script, not the second. In this script, the module FDS6 is never loaded, as I am not using the pre-compiled version of FDS. I included the second Bash script to show that for the pre-compiled FDS version, the process distribution does work.

Kind regards

On Friday, January 26, 2024 at 20:49:25 UTC+1, Randy McDermott wrote:

Kevin McGrattan

Jan 26, 2024, 4:43:51 PM
to fds...@googlegroups.com
Why are you loading Intel oneAPI and Open MPI libraries at the same time? oneAPI has its own MPI library.
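A sketch of what that could look like: load only the oneAPI toolchain and let its own Intel MPI launch the job. The module names are the ones quoted in this thread and are cluster-specific assumptions.

```shell
#!/bin/bash
#PBS -N M14_opt0
#PBS -l nodes=1:ppn=12
#PBS -l walltime=00:01:00

# Load ONLY the oneAPI toolchain; do not mix in Open MPI modules,
# since the impi-compiled FDS expects Intel MPI's runtime.
module load devtoolset/8 intel/oneapi_2022u1

export OMP_NUM_THREADS=1
ulimit -s unlimited

cd "<path>"
mpiexec -np 12 <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 2m_M14.fds
```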

o...@aquacoustics.biz

Jan 27, 2024, 1:52:30 AM
to FDS and Smokeview Discussions
Your first script also has a syntax error: export OMP_NUM_TREADS=12

Ruben

Jan 27, 2024, 5:01:35 AM
to FDS and Smokeview Discussions
@aquacoustics.biz Thank you for pointing out the error in export OMP_NUM_TREADS=12. I assume you mean the typo TREADS -> THREADS. Unfortunately, fixing it did not resolve the problem. Interestingly, the second script worked as intended while containing the same typo.

@Kevin The first two modules in  module load devtoolset/8 intel/oneapi_2022u1 mpi/openmpi-1.8.8-intel are prerequisite modules for   mpi/openmpi-1.8.8-intel (I don't know if this is specific to the HPC cluster I'm using...)

I first tried to run the script with module load devtoolset/8 intel/oneapi_2022u1 , but that did not work either. Below is the error log.
(this error remained the same after fixing the typo)

Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1139)..............:
MPIDI_OFI_mpi_init_hook(1728): OFI get address vector map failed
(identical Abort messages follow for ranks 1 through 11)

On Saturday, January 27, 2024 at 07:52:30 UTC+1, o...@aquacoustics.biz wrote:

o...@aquacoustics.biz

Jan 27, 2024, 5:46:19 AM
to FDS and Smokeview Discussions
Compare the mpiexec lines in the two scripts.  You're running a different fds file in a different directory.


mpiexec -np 12 <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 2m_M14.fds

mpiexec -np 12 <path>/FDS6.7.7_SMV6.7.18/bin/fds 2m_M14.fds

So use the second mpiexec line in the first script (and bash from the 2m_M14.fds directory).

I suspect that  <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64  may not exist on all nodes in the cluster.

If you are going to use OpenMP, then 12 threads is not an efficient use of cores and, depending on your hardware, may over-subscribe your nodes.

Ruben

Jan 27, 2024, 7:39:13 AM
to FDS and Smokeview Discussions
As a result of your comment, I realized that I did not understand the script properly. For some reason, I thought that the number of threads had to equal the number of processes on one node. However, all I am currently trying to do is use 12 processors on one node to compute the simulation for 12 meshes. The number of threads is simply 1 in this case.

Regarding your comment about the different directories: this is done on purpose. I am using different directories for the pre-compiled version and compiled version of FDS. The .fds file is in both directories.

I think it is useful to rephrase my original problem, having adjusted the script that I am running. The simulation consists of 12 meshes, and I want to run it on one node using 12 processors. Moreover, I want to run it using a custom Intel oneAPI compiled FDS version (although I have not made any adjustments to the source code yet). As there are, to my knowledge, multiple MPI libraries/modules available on the Linux cluster, I have tried several of them.

The script that I am using looks as follows:

#!/bin/bash

# Job Name
#PBS -N M14_opt0

# Amount of nodes and processors per node. Amount of processors is up to amount of meshes.
#PBS -l nodes=1:ppn=12

# Maximum time after which the simulation is automatically stopped.
#PBS -l walltime=00:01:00

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

module load mpi/openmpi-x86_64

# Navigate to the FDS file you want to run
cd "<path>"

# Run the fds file
mpiexec -np 12  <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 2m_M14.fds


The module mpi/openmpi-x86_64 gives the error log:

<path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
(this message is repeated for each process)
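One way to confirm which shared libraries the executable cannot resolve under the currently loaded modules is ldd, a generic Linux check (the binary path is the redacted one from this thread):

```shell
# Any library listed as "not found" is missing from LD_LIBRARY_PATH;
# here libiomp5.so should appear when the oneAPI runtime is not loaded.
ldd <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 | grep 'not found'
```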

________________________________________________
The modules devtoolset/8 intel/oneapi_2022u1 (devtoolset/8 is a prerequisite module) return the same "OFI get address vector map failed" error shown in my previous message.
________________________________________________
Lastly, using devtoolset/8 intel/oneapi_2022u1 mpi/openmpi-1.8.8-intel (like the original script) we get:

 Reading FDS input file ...


ERROR: The number of MPI processes, 1, exceeds the number of meshes, 0 (CHID: 2mM14_cat)


ERROR: FDS was improperly set-up - FDS stopped (CHID: 2mM14_cat)

 Starting FDS ...

 MPI Process      0 started on n11-73

 Reading FDS input file ...

(this block repeats for the remaining launched copies, all reporting MPI Process 0)

 Fire Dynamics Simulator

 Current Date     : January 27, 2024  13:35:17

 Revision         : FDS6.7.7-0-gfe0d4ef-HEAD
 Revision Date    : Thu Nov 18 17:10:22 2021 -0500
 Compiler         : Intel ifort 2021.2.0
 Compilation Date : Jan 25, 2024 14:27:11

 MPI Enabled;    Number of MPI Processes:       1
 OpenMP Enabled; Number of OpenMP Threads:      1

 MPI version: 3.1
 MPI library version: Intel(R) MPI Library 2021.2 for Linux* OS


(fds info repeated)
On Saturday, January 27, 2024 at 11:46:19 UTC+1, o...@aquacoustics.biz wrote:

Tim O'Brien

Jan 27, 2024, 4:49:28 PM
to fds...@googlegroups.com

You have a problem with the mpi library paths.  Kevin alluded to this with his comment about two versions of mpi.

FDS bundle runtime libraries are (by default) in ~/FDS/FDS6/bin/mpi. The Intel OneAPI libraries are (by default) in /opt/intel/oneapi/… The libraries are different.
libiomp5.so does not exist in the FDS bundle, so you need to adjust the MPI library path to suit your OneAPI environment. Refer to the OneAPI installation instructions, which specifically discuss setting environment variables. There is a script called setvars (or similar) in the OneAPI directories for this purpose.
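For example, assuming the default install location /opt/intel/oneapi (adjust if your cluster installs it elsewhere), the job script could set up the environment like this instead of loading MPI modules; a sketch, not tested on this particular cluster:

```shell
# Initialize compilers, Intel MPI, and runtime libraries
# (including libiomp5.so) in one step.
source /opt/intel/oneapi/setvars.sh

# Then launch the impi-compiled FDS with Intel MPI's mpiexec.
mpiexec -np 12 <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 2m_M14.fds
```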


Kevin McGrattan

Jan 27, 2024, 5:08:05 PM
to fds...@googlegroups.com
Here's a thought. If all else fails, you should be able to compile the code using the Intel oneAPI Fortran compiler and MPI libraries, and then just replace the fds executable in the FDS/FDS6/bin directory with the one you compiled. Then you just run like you would if you were running the compiled executable that you download.
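A minimal sketch of that swap, assuming the downloaded bundle lives under $HOME/FDS/FDS6 (back up the original executable first):

```shell
FDS_BIN="$HOME/FDS/FDS6/bin"               # bundled install location (assumed)
cp "$FDS_BIN/fds" "$FDS_BIN/fds.bundled"   # keep the original executable
cp <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 "$FDS_BIN/fds"
chmod +x "$FDS_BIN/fds"
```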

o...@aquacoustics.biz

Jan 28, 2024, 6:08:09 AM
to FDS and Smokeview Discussions
This works (and I use it routinely) but with one caveat.  The version of MPI under OneAPI must be consistent on all nodes.  If it isn't then you can anticipate errors of the general form:

Abort(1614991) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(143)........:
MPID_Init(1310)..............:
MPIDI_OFI_mpi_init_hook(1953):
MPIDU_bc_table_create(320)...: Missing hostname or invalid host/port description in business card

Ruben

Jan 29, 2024, 2:48:21 AM
to FDS and Smokeview Discussions
Thank you all for your responses, they have been very helpful!

I was completely stuck on this, but I'm sure I can get it working now using one of the two suggestions.

Kind regards,

Ruben

On Sunday, January 28, 2024 at 12:08:09 UTC+1, o...@aquacoustics.biz wrote:

Ruben

Jan 29, 2024, 5:15:46 AM
to FDS and Smokeview Discussions
In case anyone else runs into a similar issue, I thought I would describe what I did to fix it without changing the environment paths myself.

First, I used Kevin's idea of copying the executable to the pre-compiled FDS/FDS6/bin directory. This works as intended, but make sure to also copy the other files from the compiled FDS build directory (e.g. firemodels/fds/Build/impi_intel_linux_64/); otherwise it returns the error:
execvp error on file <path to executable> (Permission denied)

Then, I figured out that you can also copy the FDS/FDS6/bin/modules/FDS6 file into a newly created 'modules' directory alongside the custom-compiled FDS (e.g. /firemodels/fds/modules). I adjusted the Bash script to read this file, which then automatically loads the modules used by the pre-compiled FDS. The script reads the file via:

export MODULEPATH=$<path>/firemodels/fds/modules:$MODULEPATH


# Activate FDS
module load FDS6


For me, the second solution is preferred because it allows me to separate different compiled versions of FDS.
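For reference, a sketch of the resulting submission script (paths redacted as elsewhere in this thread; the module file location and MPI runtime it loads are cluster-specific):

```shell
#!/bin/bash
#PBS -N M14_opt0
#PBS -l nodes=1:ppn=12
#PBS -l walltime=500:00:00

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp
ulimit -s unlimited

# Point the module system at the copied FDS6 module file,
# then load it to pick up the runtime paths of the pre-compiled bundle.
export MODULEPATH=<path>/firemodels/fds/modules:$MODULEPATH
module load FDS6

cd "<path>"
mpiexec -np 12 <path>/firemodels/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 2m_M14.fds
```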

Thanks again for all the help.
On Monday, January 29, 2024 at 08:48:21 UTC+1, Ruben wrote: