[FDS-SMV Developer Blog] FDS Parallel Processing using MPI (Message Passing Interface)


Kevin McGrattan

unread,
Jul 9, 2014, 1:38:26 PM7/9/14
to fds...@googlegroups.com
The latest release of FDS (6.1.0) runs with OpenMP by default; that is, the code uses multiple cores/processors of a single computer to process a single mesh. In other words, OpenMP is a "shared memory, multi-processor" form of parallel processing. We are now looking at the MPI version of FDS, in which we use multiple computers to process multiple meshes -- "distributed memory, multi-processor" parallel processing. For Linux and OS X, we use Open MPI, an open-source, free set of MPI libraries. For Windows, we have been using MPICH2 (MPI-2), a similar set of libraries distributed by Argonne National Laboratory.

We've recently learned that the MPICH team is dropping support for Windows, and that a team from Microsoft has developed MS-MPI, a set of libraries similar to MPICH, to take its place. We have experimented with MS-MPI recently, but we discovered that it is designed for use on a Windows HPC server, essentially a dedicated cluster of computers running a special OS specifically for high-performance computing. MPICH had the advantage of running on an ordinary office network, which we recognize is probably a common configuration for small engineering firms.

With support for MPICH on Windows going away, and MS-MPI not quite what we are looking for, we are searching for another alternative. There are two that we know of. First, Open MPI can, in theory, be ported to Windows, but we discovered that this involves installing Cygwin, essentially a unix/linux emulation layer for MS Windows, which proved quite onerous. Second, there is Intel MPI. At NIST, we use Intel compilers for both FDS and Smokeview releases, and Intel sells its own MPI libraries as an add-on to its existing compilers. It would also allow us to distribute the run-time libraries needed for you to be able to run our compiled version of FDS. We are currently testing Intel MPI, and while we do, if any of you have experience with it, or comments on MPI in general, we'd like to hear from you. You can post your comments via the FDS-SMV Discussion Group under this thread.
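
For anyone unfamiliar with the distinction, here is a minimal Fortran sketch of the distributed-memory model -- illustrative only, not actual FDS code. Each MPI process (rank) learns its identity at startup; in FDS, each rank would advance the solution on its own mesh and exchange boundary data with its neighbors every time step:

program mpi_mesh_sketch
   use mpi
   implicit none
   integer :: ierr, my_rank, n_ranks
   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, n_ranks, ierr)
   ! In a real FDS run, rank my_rank would own mesh my_rank+1
   ! and trade boundary values with adjacent meshes here.
   write(*,'(A,I3,A,I3)') 'This is mesh ', my_rank+1, ' of ', n_ranks
   call MPI_FINALIZE(ierr)
end program mpi_mesh_sketch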



--
Posted By Kevin McGrattan to FDS-SMV Developer Blog at 7/09/2014 01:38:00 PM

Johan Borgman

unread,
Jul 10, 2014, 12:31:07 PM7/10/14
to fds...@googlegroups.com

I see the logic in not going for MS-MPI support, since it is limited to the Windows Server OS. But since I'm currently considering Windows HPC, I am curious whether or not your experiments with MS-MPI were successful.

Kevin

unread,
Jul 10, 2014, 2:01:36 PM7/10/14
to fds...@googlegroups.com
We did succeed in compiling FDS with the MS-MPI libraries. But we don't have an HPC cluster, so we found it very difficult to run an FDS job across our office network. We had to do quite a bit of work, most of which I could probably not reproduce, to get the machines to recognize each other. Most troublesome was having to run smpd (the daemon that controls message passing) as a user process in the foreground, as opposed to a background system process. The person at MS who was helping us said that this mode of operation is typically used for debugging and development, not regular processing. He told us that the HPC cluster configuration is the target of MS-MPI, not a regular office network. I would guess that if you got an HPC cluster, most of what we did manually would be done automatically as part of the installation. That is why these "clusters" (be they Windows, Linux or Mac) are popular.

Matt

unread,
Jul 11, 2014, 4:51:30 AM7/11/14
to fds...@googlegroups.com
Hi Kevin,

Have you considered using the graphics card? I know it won't help with memory or large jobs across a network, but it may speed up single-PC jobs.

I note the CUDA reference CFD projects on their site -- they might be worth investigating. I looked into it a while back, but it wasn't for me.


Matt.

Lukas A.

unread,
Jul 11, 2014, 5:38:55 AM7/11/14
to fds...@googlegroups.com
Hey Kevin,

I have been using Intel MPI, in parallel with other MPI implementations, to compile and run FDS. There have been some issues in combination with FDS (mainly the MPI_ALLGATHERV call) which caused some trouble. However, we reported the issue and Intel fixed it.
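
For context, MPI_ALLGATHERV is the collective in which every rank contributes a block of data (possibly of a different length per rank) and every rank receives the full concatenated result. A minimal Fortran sketch of the pattern -- not FDS's actual call, just an illustration:

program allgatherv_sketch
   use mpi
   implicit none
   integer :: ierr, rank, nprocs, i
   integer, allocatable :: counts(:), displs(:)
   real,    allocatable :: sendbuf(:), recvbuf(:)
   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
   allocate(sendbuf(rank+1), counts(nprocs), displs(nprocs))
   sendbuf = real(rank)                ! rank r contributes r+1 values
   do i = 1, nprocs
      counts(i) = i                    ! block length of rank i-1
      displs(i) = (i-1)*i/2            ! offset of rank i-1's block
   end do
   allocate(recvbuf(sum(counts)))
   call MPI_ALLGATHERV(sendbuf, rank+1, MPI_REAL,         &
                       recvbuf, counts, displs, MPI_REAL, &
                       MPI_COMM_WORLD, ierr)
   call MPI_FINALIZE(ierr)
end program allgatherv_sketch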

Anyway, as Intel MPI is commercial, not all users will be able to use it on their systems without buying it. This is only an issue if you provide precompiled software and the users do not want to compile and link it themselves. General scientific (HPC) software is commonly compiled on each system to match the environment (i.e. the MPI installation). The reason is that the (MPI) libraries linked into the binary executable might not match the (MPI) runtime system on the target computer. For example:

The values of some constants, like MPI_COMM_WORLD and others, are not consistent across MPI implementations. Here are a few lines from the mpi.h header files of MPICH and Open MPI:

MPICH:
#define MPI_PROC_NULL (-1)
#define MPI_ANY_SOURCE (-2)
#define MPI_ROOT (-3)

OpenMPI:
#define MPI_ANY_SOURCE -1
#define MPI_PROC_NULL -2
#define MPI_ROOT -4

This can in general trigger issues if an application is compiled and linked against MPI implementation A and run in the runtime environment of MPI implementation B.
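
A quick way to see this on your own system is to print the constants: their integer values are baked in at compile time by whichever mpi module or mpif.h you compile against, so running the binary under a different MPI runtime misinterprets them. A small Fortran check, assuming you build it with the compiler wrapper (commonly mpif90) of the implementation in question:

program mpi_constants_check
   use mpi
   implicit none
   integer :: ierr
   call MPI_INIT(ierr)
   ! These values differ between MPICH and Open MPI (see above),
   ! which is why a binary is tied to the MPI it was built with.
   write(*,*) 'MPI_PROC_NULL  =', MPI_PROC_NULL
   write(*,*) 'MPI_ANY_SOURCE =', MPI_ANY_SOURCE
   write(*,*) 'MPI_ROOT       =', MPI_ROOT
   call MPI_FINALIZE(ierr)
end program mpi_constants_check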

In the end, I would encourage everyone who runs a cluster to compile their own executables. If the source code follows the MPI standard, it should compile and link with all MPI implementations (if they fully support the MPI standard).

Best,
Lukas

Kevin

unread,
Jul 11, 2014, 10:12:40 AM7/11/14
to fds...@googlegroups.com
There has been some discussion of CUDA on our discussion group, but I am a bit suspicious of these claims of orders-of-magnitude speed-up. We have found that OpenMP gives us about a factor of 2 speed-up on a single machine, and the bottleneck is not the number of cores but rather the speed of getting information in and out of RAM. I would have to study more closely the nature of the algorithms in these CUDA CFD programs. The 3-D loops in FDS can require significant amounts of information from RAM, and I don't see how GPUs can get information more rapidly than the cores of the CPU.
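
To illustrate the point about RAM, here is a hedged Fortran sketch -- not an actual FDS loop -- of the kind of 3-D stencil update where each point requires only a few arithmetic operations per value loaded from memory. Adding threads to such a loop saturates the memory bus long before it saturates the cores:

subroutine stencil_sweep(u, unew, nx, ny, nz, c)
   implicit none
   integer, intent(in)  :: nx, ny, nz
   real,    intent(in)  :: u(nx,ny,nz), c
   real,    intent(out) :: unew(nx,ny,nz)
   integer :: i, j, k
   ! About 7 loads and ~9 flops per interior point: bandwidth-bound.
   !$OMP PARALLEL DO PRIVATE(i,j)
   do k = 2, nz-1
      do j = 2, ny-1
         do i = 2, nx-1
            unew(i,j,k) = u(i,j,k) + c*( u(i+1,j,k) + u(i-1,j,k)  &
                        + u(i,j+1,k) + u(i,j-1,k)                 &
                        + u(i,j,k+1) + u(i,j,k-1) - 6.0*u(i,j,k) )
         end do
      end do
   end do
   !$OMP END PARALLEL DO
end subroutine stencil_sweep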

Kevin

unread,
Jul 11, 2014, 10:24:54 AM7/11/14
to fds...@googlegroups.com
Thanks for the info. I agree that it is best to compile and link to libraries in a consistent way, so we will continue to conform to all Fortran and MPI standards. We will probably continue to use Open MPI in building our Linux and OS X release binaries.

The issue for us now is Windows. We could continue to use MPICH, but it would remain MPI-2. We see that MPI-3 is beginning to spread, and we want to take advantage of any advances in MPI. All other options that we know of for Windows do not seem practical except Intel MPI. We will buy the Intel MPI libraries, but we would then release free redistributable files as part of the FDS package. That is, we would add smpd.exe, mpiexec.smpd.exe and impi.dll to the bin directory where we store the FDS binaries. We are working now to build into our installation script all of the necessary environment variables, launch commands, and path settings so that the end user does not have to do this. We suspect that many do not use MPICH2 because of the need to do a separate installation and set variables themselves; it was a struggle for me and a very clever student to get MPICH2 to work across our network.

As for the fact that Intel MPI is commercial -- we already use Intel compilers and distribute the resulting binaries, while some FDS users compile with free compilers like GNU Fortran and PG Fortran. We'll continue with this way of doing things.


Kevin

unread,
Jul 24, 2014, 3:18:10 PM7/24/14
to fds...@googlegroups.com
We are looking for volunteers to try a test version of FDS using the Intel MPI libraries for Windows. The test package can be found here:

https://drive.google.com/folderview?id=0B_wB1pJL2bFQcURod1UyZTJUaEE&usp=sharing

The installation procedure is the same as that of the current release of FDS.

A few things. First, we have only tested this version on a Windows domain network; that is, a network where user accounts are centrally managed. It is OK if this network has a firewall; the installation script will set up the necessary exceptions to allow FDS to run in parallel across the network even with the firewall in place. Second, the test installation will install FDS 6.1.1, but the old MPICH executable (fds_mpi.exe) will be overwritten. So anyone who has MPICH working and wants to continue using MPICH until we officially release should avoid the test installation because it will not only overwrite the executable, it will also install the Intel variant of mpiexec. Finally, there is no need to install the MPI package yourself. Everything you need to run should be in the test package. At least, this is what we want to test.

Volunteers should post to this thread whether they have succeeded or failed to run a simple test case. Do not spend too much time fussing with this -- if the installation procedure is not simple then we have to rethink it. We have found that the old MPICH procedure was fairly difficult, and we think we have a much simpler process now.

Kevin

unread,
Jul 24, 2014, 3:21:34 PM7/24/14
to fds...@googlegroups.com
One last thing, and most importantly, notes on running the test version can be found here:

https://code.google.com/p/fds-smv/wiki/Running_FDS_MPI_on_Windows

Follow the instructions under "Intel MPI."

Antony

unread,
Jul 27, 2014, 8:46:39 PM7/27/14
to fds...@googlegroups.com
Hi Kevin,

I installed this binary and ran it with the localonly option successfully.

On Friday, July 25, 2014 at 3:21:34 AM UTC+8, Kevin wrote:

Kevin

unread,
Jul 28, 2014, 8:56:36 AM7/28/14
to fds...@googlegroups.com
Thanks for checking. Can you now check to see if you can run a multi-mesh job on two computers?

Andrew

unread,
Jul 29, 2014, 6:53:22 AM7/29/14
to fds...@googlegroups.com
Hi

I have just checked this on a single dual-processor machine running Windows 7. I found that the install was very simple, apart from the fact that when I entered the command mpiexec it would sometimes resort to the MPICH version. Once I edited the environment variables to remove all references to the MPICH2 folder, everything worked fine.

Also, when checking the processor graphs, the Intel MPI version has fixed Issue 2120, in that for my 8-thread test job the first 4 cores of each processor were fully loaded, with all other cores idle.

Andrew

Kevin

unread,
Jul 29, 2014, 8:59:51 AM7/29/14
to fds...@googlegroups.com
I am surprised that the MPICH2 version of mpiexec took precedence. We designed the install script so that the path to FDS6/bin would take precedence over all other paths. Could it be that you did not open a new command prompt in which the new paths would take effect? It is too late now, but I would have suggested that you type "where mpiexec" to determine which version is listed first. That is the one that gets executed.

In any case, it's good that you have it working on a single machine. But the real challenge is to test it on a group of computers on the network. Do you have other computers to test on?

Andrew

unread,
Jul 29, 2014, 9:31:16 AM7/29/14
to fds...@googlegroups.com
This is what "where mpiexec" currently generates. I think the issue might have been that, when I removed the MPICH2 environment variable, it was at the front:

C:\Program Files\fds\FDS6\bin\mpiexec.exe
C:\Program Files\MPICH2\bin\mpiexec.exe

We have a Windows HPC cluster; however, as the two versions can't be run together, the whole cluster would need to be changed over. Does version 6.1.1 just incorporate the MPI changes, or are there physics changes as well?

Kevin

unread,
Jul 29, 2014, 10:32:16 AM7/29/14
to fds...@googlegroups.com
Yes, by changing the path order, you have given the Intel MPI mpiexec precedence.

Version 6.1.1 is just a maintenance release of 6.1. There are no changes in physics. Also, the test bundle has the new MPI. The "official" FDS release still uses MPICH2. We want to release FDS 6.2.0 with Intel MPI.

Lukas A.

unread,
Aug 1, 2014, 3:05:50 AM8/1/14
to fds...@googlegroups.com
Kevin,

you are _only_ looking for Windows + IntelMPI testing volunteers, right?

Best,
Lukas

Kevin

unread,
Aug 1, 2014, 8:01:35 AM8/1/14
to fds...@googlegroups.com
Yes. We have Intel MPI installed on our Linux cluster too, and we plan to compare it to Open MPI. But for the moment I want to focus on Windows domain networks. Open MPI is still supported on Linux and OS X, but it is more difficult to use on Windows because it requires one to install Cygwin.

Ed

unread,
Aug 5, 2014, 9:42:54 AM8/5/14
to fds...@googlegroups.com
On using OpenMP and the bottleneck due to RAM: I'm currently trying to find a way to quantify how many cores I should dedicate to a model before the advantage is lost. I'm experimenting with the same model and different numbers of cores; for example, I doubled the core count from 4 to 8 on a Xeon E5 (v2) and am only seeing about a 10% improvement.

I appreciate this is simplistic, but I'm currently thinking of this in terms of the speed of the memory against the combined speed of the cores, which suggests 6 cores is the sweet spot. I'd appreciate any ideas for a better way to quantify this.
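
One rough way to put a number on it, assuming an Amdahl-style model where s is the fraction of the run limited by memory bandwidth (or otherwise not parallelizable): run time on n cores is roughly T(n) = T1*(s + (1-s)/n). The observation that going from 4 to 8 cores gives only ~10% improvement, i.e. T(4)/T(8) = 1.1, implies s is about 0.53, so the best possible speed-up 1/s would be just under 2 -- consistent with Kevin's factor-of-2 figure above. This is a back-of-the-envelope estimate, not a measured property of FDS.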

PS - Thank you for OpenMP. It reduced the need to split the model into meshes, removing one area of concern on large models (mesh interfaces).

Many thanks.

Kevin

unread,
Aug 5, 2014, 1:01:46 PM8/5/14
to fds...@googlegroups.com
We find that there are diminishing returns after 4 cores. We have a Linux cluster where each node (which has a single bank of RAM) has two sockets (CPUs), and each socket has 4 cores. We find that a good way to use this machine is to put one MPI process on each socket, supported by its four cores. In our next release, we are going to combine MPI and OpenMP to enable users to use both together.
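
A minimal Fortran sketch of that hybrid pattern -- one MPI rank per socket, each using several OpenMP threads. This is illustrative only; the FDS implementation details may differ. MPI_THREAD_FUNNELED requests that only the master thread make MPI calls, the usual arrangement when OpenMP loops are nested inside an MPI time step:

program hybrid_sketch
   use mpi
   use omp_lib
   implicit none
   integer :: ierr, provided, rank
   call MPI_INIT_THREAD(MPI_THREAD_FUNNELED, provided, ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   !$OMP PARALLEL
   ! e.g. launched with one rank per socket and OMP_NUM_THREADS=4
   write(*,'(A,I3,A,I3)') 'rank ', rank, ', thread ', omp_get_thread_num()
   !$OMP END PARALLEL
   call MPI_FINALIZE(ierr)
end program hybrid_sketch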

Kumar

unread,
Aug 26, 2014, 12:49:09 AM8/26/14
to fds...@googlegroups.com
I am trying to run a case already completed in FDS 5.5.3, now with FDS 6.1.

Comparing the performance of FDS 6.0 and FDS 6.1, I thought I would use 6.1, which detects and uses all the threads. However, the performance is far slower than the serial run I did with FDS 5.5.3.

In FDS 5.5.3, the 1600 s simulation completed within 20 hours:

       Total CPU: 20.34 hr, Total time: 1600.02 s, Time step: 0.02022 s

In FDS 6.1, with the same mesh and the same problem, the simulation only reached 110 s in 20 hours:

Total CPU:       20.93 hr,  Total time:     110.06 s. Time step: 0.536E-03 s

The time step is far smaller in FDS 6.1.

Attaching the performance and machine details snapshots.

[Attachment: Untitled.png]

Kevin

unread,
Aug 26, 2014, 9:26:28 AM8/26/14
to fds...@googlegroups.com
If there is a significantly different time step, it is possible that there is a difference in the way the fire physics are being modeled. I cannot comment further on your case. Can you submit a simple example to the Issue Tracker? Include the FDS 5 input file and the FDS 6 input file, and make the two as close as possible.