Running FDS in Parallel Mode vs Serial Mode


lawrence

Oct 18, 2009, 12:21:15 AM
to FDS and Smokeview Discussions
Hi,

Currently I can run FDS in parallel mode either through PyroSim or by
launching FDS_MPI at the command prompt. I have noticed that the
PyroSim parallel mode is able to max out (100%) the CPU, so when I
want to reduce the CFD run time I use the PyroSim parallel run option.
If I use FDS_MPI, CPU usage averages around 50%.

Lately, for a particular project, I have noticed a significant
difference between the parallel and serial run results (both done on a
64-bit OS); the parallel run gives a significantly better visibility
result. The CFD model represents a 4 MW fire event in a basement
provided with a combination of natural and mechanical ventilation.

To kick-start my investigation, I would appreciate some feedback on
the following:
Are there other methods I could employ to see which CFD run (serial or
parallel) is actually giving me the more reasonable result?


1. The PyroSim parallel FDS version information (I tried to replace it
with the latest individual 64-bit executables from the NIST download
page, but there were error statements in the PyroSim GUI, even though
FDS could still run, albeit more slowly):
Compilation Date : Fri, 17 Oct 2008
Version : 5.2.3 Parallel
SVN Revision No. : 2514
Are these release versions stable, or are there documented issues
related to them?

2. I have attached the .out file of this run (I am re-running this
model; I have refined the meshes so that there are 8 meshes to utilize
the CPU's 8 cores, and the cell sizes have been adjusted so that the
ratio between adjacent meshes, and within the same mesh, is 1 or 2).
Do these max/min divergence and CFL figures look normal?

Time Step 42200 October 18, 2009 12:14:14
----------------------------------------------
Mesh 1, Cycle 42200
CPU/step: 1.997 s, Total CPU: 24.02 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.99E-01 at ( 27, 40, 13)
Max divergence: 0.12E-03 at ( 20, 39, 14)
Min divergence: -0.71E-04 at ( 28, 40, 14)
Mesh 2, Cycle 42200
CPU/step: 1.030 s, Total CPU: 12.19 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.99E-01 at ( 27, 0, 13)
Max divergence: 0.57E-02 at ( 16, 30, 17)
Min divergence: -0.11E-01 at ( 17, 30, 18)
Radiation Loss to Boundaries: 1.925 kW
Mesh 3, Cycle 42200
CPU/step: 2.261 s, Total CPU: 26.91 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.12E+00 at ( 0, 37, 14)
Max divergence: 0.70E+00 at ( 14, 36, 12)
Min divergence: -0.39E+00 at ( 2, 54, 13)
Total Heat Release Rate: 0.042 kW
Radiation Loss to Boundaries: 59.382 kW
Mesh 4, Cycle 42200
CPU/step: 1.180 s, Total CPU: 13.93 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.24E+00 at ( 60,151, 12)
Max divergence: 0.70E-01 at ( 61,154, 11)
Min divergence: -0.61E-01 at ( 51,155, 15)
Radiation Loss to Boundaries: 5.692 kW
Mesh 5, Cycle 42200
CPU/step: 2.560 s, Total CPU: 30.22 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.24E+00 at ( 34,146, 12)
Max divergence: 0.19E-01 at ( 72, 64, 13)
Min divergence: -0.19E-01 at ( 72, 64, 14)
Radiation Loss to Boundaries: 1.539 kW
Mesh 6, Cycle 42200
CPU/step: 2.985 s, Total CPU: 35.30 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.50E+00 at ( 72,285, 13)
Max divergence: 0.12E+00 at (155, 58, 12)
Min divergence: -0.11E+00 at (156, 89, 12)
Radiation Loss to Boundaries: 17.808 kW
Mesh 7, Cycle 42200
CPU/step: 2.930 s, Total CPU: 34.60 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.41E+00 at (133,287, 13)
Max divergence: 0.64E+01 at (136, 13, 7)
Min divergence: -0.13E+02 at (130, 43, 9)
Total Heat Release Rate: 3947.258 kW
Radiation Loss to Boundaries: 1497.730 kW
Mesh 8, Cycle 42200
CPU/step: 2.650 s, Total CPU: 31.34 hr
Time step: 0.00636 s, Total time: 259.48 s
Max CFL number: 0.94E+00 at (616, 25, 0)
Max divergence: 0.28E+01 at (584, 16, 4)
Min divergence: -0.22E+01 at (630, 6, 3)
Total Heat Release Rate: 0.027 kW
Radiation Loss to Boundaries: 10.664 kW
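As a quick way to triage output like the excerpt above, the .out file can be scanned for the largest reported divergence. A minimal sketch, assuming GNU sort is available; `sample.out` here is a stand-in built from a few lines of the excerpt, and you would point grep at your real CHID.out instead:

```shell
# Build a small stand-in file from the excerpt above (use your real
# CHID.out in practice).
cat > sample.out <<'EOF'
Max divergence:  0.12E-03 at ( 20, 39, 14)
Max divergence:  0.70E+00 at ( 14, 36, 12)
Max divergence:  0.64E+01 at (136, 13,  7)
EOF
# sort -g handles general numeric values, including E-notation, so the
# last line after sorting is the largest divergence reported.
grep 'Max divergence' sample.out | sort -g -k3 | tail -1
# → Max divergence:  0.64E+01 at (136, 13,  7)
```

In the excerpt, the largest values sit in mesh 7 (where the 4 MW fire is), which is the mesh to look at first.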

Thank You
Best Regards
Lawrence

Kevin

Oct 18, 2009, 3:40:19 PM
to FDS and Smokeview Discussions
This is a very confusing list of issues.

1. I do not know what the "Pyrosim parallel mode" is. My guess is that
the difference in your parallel calculations is the use of individual
processors.

2. FDS 5.2.3 is old. We're up to FDS 5.4.1. I suggest that if you
raise issues, you should be using the latest version.

3. Raw output from the .out file is not particularly useful in
diagnosing cases.

If you feel that there is a bug in the software, use the Issue Tracker
to report it, one issue at a time. From your post above, I think you
are drawing conclusions that are not correct.

lawrence

Oct 18, 2009, 10:35:55 PM
to FDS and Smokeview Discussions
Hi Kevin,

Thanks for the input. Taking your advice to use the latest version, I
will stop running FDS through PyroSim. I will then check whether the
CFD results from the parallel and serial runs are in good agreement.

Thank You
Lawrence

charlie.thornton

Oct 19, 2009, 9:47:10 AM
to FDS and Smokeview Discussions
Alternatively, you could install the latest version of PyroSim. FDS
5.4.1 is bundled with the current version.

Also, PyroSim's "Run Parallel FDS..." doesn't do anything particularly
magical to run FDS. You might want to check your config file if
you're not getting enough CPU utilization when running from the
command line.

lawrence

Oct 20, 2009, 8:06:03 AM
to FDS and Smokeview Discussions
Hi Charlie,

*Hits head* I don't know why I didn't think of that... updating
PyroSim to the latest release.

I did not use a config.txt file; I typed <mpiexec -n 8 fds5_mpi.exe
xxx.fds> at the command prompt to execute the parallel run. In fact, I
am running two of them now (after I "killed" the FDS parallel runs
that were running in PyroSim). With PyroSim a run takes about 7 days
to complete, while with the mpiexec method I estimate it will take
about 10 days.
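For reference, the single-machine MPICH2 launch being described is a sketch along these lines; `job.fds` is a placeholder name, and in FDS 5 the count given to -n is typically one MPI process per mesh:

```shell
# Launch an 8-mesh FDS 5 job as 8 MPI processes on one machine.
# fds5_mpi.exe must be on the PATH; job.fds is a placeholder input file.
mpiexec -n 8 fds5_mpi.exe job.fds
```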

When I run FDS in parallel I am actually still running it on one
desktop, trying to use all the CPU cores, rather than across desktops.
Does this have something to do with the OpenMP version currently under
development for 64-bit, versus MPI?

Thank You
Lawrence






Kevin

Oct 20, 2009, 9:10:41 AM
to FDS and Smokeview Discussions
OpenMP and MPI are two completely different ways of running FDS in
parallel. Even the letters "MP" do not stand for the same thing. 32 vs
64 bit is not relevant in this context. Let's just discuss MPI. If you
run fds5_mpi on one machine, regardless of the number of cores, you
will experience slowdown because each of your processes has to get
information out of the same bank of memory (RAM). So there is a
bottleneck. MPI works best when you distribute the job across multiple
machines, each of which has its own RAM and there is less of a
bottleneck getting information in and out of RAM. When I run MPI jobs,
I typically assign two meshes (two processes) to each machine because
each machine in our computing cluster has two individual processors.
These processors might have multiple cores, but we have found that
when you load up all the cores, there is a non-negligible slowdown.
One process per processor has a very minor slowdown because each
processor still has to get information in and out of RAM.
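The layout Kevin describes, two processes per machine across a cluster, might be sketched with an MPICH2 machinefile like this (the host names are hypothetical):

```shell
# Hypothetical hosts file: four machines, two MPI processes (meshes) each.
cat > hosts.txt <<'EOF'
node1:2
node2:2
node3:2
node4:2
EOF
# Distribute an 8-mesh job across the four machines, so each process
# has less contention for its machine's RAM.
mpiexec -machinefile hosts.txt -n 8 fds5_mpi job.fds
```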

lawrence

Oct 20, 2009, 10:50:52 AM
to FDS and Smokeview Discussions
Hi Kevin,

Thank you for helping me distinguish between OpenMP and MPI. I had
tried to clear up this confusion before but was not successful until
now.

I have tried to link my two desktops via a LAN crossover cable, but
each machine refuses to let the other into its OS/folders. Both
desktops run Vista Ultimate 64-bit. I have turned on sharing etc., but
there is a limited-network-connectivity issue; until I solve this
problem, I am left trying to use a single PC's multiple cores to speed
things up.

So far the runs have shown the following approximate CFD run time
trend:

1. FDS command prompt serial run on Intel i7-920, 8 GB RAM (3.8 GHz) -
A hours.
2. FDS PyroSim parallel run on i7-920 - 0.5A hours.
3. FDS PyroSim parallel run on Intel dual W5580, 24 GB RAM - 0.35A hours.
4. FDS command prompt parallel run (MPI) on W5580, 24 GB RAM - 0.5A
hours.

So for long CFD runs I usually use option 3, but for now, until I
update PyroSim, I am using option 4.

Thank You
Lawrence

charlie.thornton

Oct 20, 2009, 11:41:19 AM
to FDS and Smokeview Discussions
When running from the command line, you might want to experiment with
the "-channel ssm" mpiexec option. (e.g. mpiexec -channel ssm -n 5
fds5_mpi bigfire.fds) In our experiments with MPI runs on just one
machine this option was faster than the default. It /may/ explain the
difference between #3 and #4. I guess I might need to take back my
"nothing particularly magical" comment from before :)

lawrence

Oct 20, 2009, 12:17:19 PM
to FDS and Smokeview Discussions
Hi Charlie,

Whoa! Your tip has solved the "mystery": all the cores went to 100%!

Thanks, Charlie!

Now I can proceed with the comparison between the parallel and serial
runs to establish whether the results are more or less the same. This
is an important tool, as during rush jobs I use the MPI method to
obtain results in roughly a quarter of the usual time. But after
observing this difference in results between the serial and parallel
runs on my recent job, I have stopped using the parallel mode until I
am sure it is a one-off issue, or I understand what happened.

I hope to post this issue on the Issue Tracker once the results are
out, in a week or more (since I am running two models on the same PC
now).

Thank you once more.
Lawrence



BatGirl

Nov 14, 2009, 5:30:16 PM
to FDS and Smokeview Discussions
Is "-channel ssm" an option using MPICH2 (I can't find it in its help
list...)?

charlie.thornton

Nov 16, 2009, 10:48:29 AM
to FDS and Smokeview Discussions
Try the MPICH2 Windows Developers Guide (section 9.2) at:

http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs

The guide indicates that the shared memory channel (shm) would be
faster on jobs run entirely on one machine, but our tests showed that
the sockets + shared memory (ssm) channel actually worked faster so
that's what we went with for PyroSim. If you do some tests and learn
more about the relationship between channels, hardware arrangement,
and performance, I'd love to hear about it. I do like the sound of
this "nemesis" channel...
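For anyone experimenting along these lines, the channel is just a per-launch mpiexec flag, so comparing them is straightforward (file names below are placeholders):

```shell
# Same job, two different MPICH2 channels; compare CPU utilization
# and wall-clock time between the two runs.
mpiexec -channel shm -n 4 fds5_mpi job.fds   # shared memory only
mpiexec -channel ssm -n 4 fds5_mpi job.fds   # sockets + shared memory
```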

- Charlie

BatGirl

Nov 23, 2009, 11:01:18 AM
to FDS and Smokeview Discussions
WOW! Like night and day. Without "-channel ssm", with the MPI version
running my huge model, I get about 80% usage on processors 0-2 and
about 60% on processor 3 (btw, this is a quad processor on a single
machine...).

BUT with "-channel ssm", all four are at 100%!

Yes, I can see it being 'faster', since it actually uses all the
computer's processing power.

RAM usage seems to be about the same...