Efficiency of MPI calculation


tatsuya.sak...@gmail.com

Oct 28, 2014, 6:21:19 PM
to fds...@googlegroups.com

FDS 6.1.2 was run on an SL6 machine with two input files, one in serial mode and one in MPI mode. The only difference between the input files was the mesh regions defined for the MPI mode. The HRR results are in good agreement, but the simulation times are enormously different.

The serial calculation, using the input file “cylinder_serial_4.fds” and one processor, had a wall clock time of around 68,000 s.

On the other hand, the MPI calculation, using the input file “cylinder_MPI_1.fds” and two processors (4 cores), had a computational time of around 250,000 s, which is nearly four times as long as the serial calculation!

I would like to know how the MESH lines in the input file for the MPI calculation can be improved to take advantage of MPI in terms of computational time.
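For context, the usual approach is to split the single mesh into several meshes of roughly equal cell count, one per MPI process. A hypothetical sketch (the IJK and XB values below are illustrative only, not taken from the attached files — here a domain is split into four equal slabs along x):

```
&MESH IJK=25,40,40, XB=0.0,0.5, 0.0,0.8, 0.0,0.8 /
&MESH IJK=25,40,40, XB=0.5,1.0, 0.0,0.8, 0.0,0.8 /
&MESH IJK=25,40,40, XB=1.0,1.5, 0.0,0.8, 0.0,0.8 /
&MESH IJK=25,40,40, XB=1.5,2.0, 0.0,0.8, 0.0,0.8 /
```

Each mesh is then handled by its own MPI process, and because the meshes have the same number of cells the work is balanced across the processes.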

cylinder_serial_4.fds
cylinder_MPI_1.fds

Kevin

Oct 29, 2014, 9:25:23 AM
to fds...@googlegroups.com
I just ran the two input files for 10 s. The 1 mesh (serial) case required 1188 s (wall clock time) and the 4 mesh (parallel) case required 398 s. Ideally, the 4 mesh case should run in 1/4 the time of the 1 mesh case, but there are inefficiencies related to MPI communication, and in my case there is competition among the 4 processes that share the same node. I ran the jobs on our linux cluster, but the 4 mesh case ran on one node. I could speed things up a bit by separating the MPI processes onto different nodes.

In your case, something seems to be very wrong. Could you post the .out files for both cases? I would like to look at the timing summary at the end.

tatsuya.sak...@gmail.com

Oct 29, 2014, 7:21:34 PM
to fds...@googlegroups.com
Kevin,
 
Thank you for your quick response.
The .out files from the serial run (a4_splitz_serial.out) and the MPI run (a4_splitz_MPI.out) are attached to this post.

On Wednesday, October 29, 2014 at 8:25:23 AM UTC-5, Kevin wrote:
a4_splitz_serial.out
a4_splitz_MPI.out

Kevin

Oct 30, 2014, 8:16:11 AM
to fds...@googlegroups.com
Your 1 mesh serial job took 68236 s (wall clock time). Your 4 mesh, 2 MPI process parallel job took 40243 s. Your "serial" job was actually run with 4 OpenMP threads, which probably sped up the 1 mesh case by a factor of 2.

Where did you get the times that you report in your original post?

tatsuya.sak...@gmail.com

Oct 31, 2014, 1:22:35 PM
to fds...@googlegroups.com
Kevin,

The time reported in my previous post was the total computational time, which I obtained from the end of the .out file.
Anyway, I will investigate in more depth, taking your last post into account.

I appreciate your support very much.


On Thursday, October 30, 2014 at 7:16:11 AM UTC-5, Kevin wrote:

Kevin

Oct 31, 2014, 1:46:58 PM
to fds...@googlegroups.com
Where in the .out file do you see 250,000 seconds?

Chris Salter

Nov 15, 2014, 1:54:31 PM
to fds...@googlegroups.com
It should be noted that the MPI implementation should perform faster if the meshes are of roughly equal size, so that the load is balanced across the processes.

I wrote a letter to Fire Technology recently — basic comparisons show that an MPI model can run significantly faster. For example, the benchmark file was able to run in about 2000 seconds, compared to about 7000 seconds for the same model with OpenMP.

Kevin

Nov 15, 2014, 2:13:13 PM
to fds...@googlegroups.com
MPI and OpenMP are two very different means of parallelizing. We have found that at best, OpenMP can speed up the processing of a single mesh by about a factor of 2, even with 4 to 6 cores devoted to it. MPI can potentially cut the CPU time by as many times as you have cores or machines, but you have to divide the FDS mesh into multiple meshes.
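As a sketch of how the two approaches differ at launch time (executable names and paths depend on your installation; `job.fds` is a placeholder, and the MPI case assumes the input file defines 4 meshes):

```
# OpenMP: a single FDS process uses several threads on one mesh
export OMP_NUM_THREADS=4
fds job.fds

# MPI: one process per mesh; disable OpenMP threading to avoid
# oversubscribing the cores
export OMP_NUM_THREADS=1
mpiexec -n 4 fds job.fds
```

Note that the two can also be combined, but on a single workstation the total number of threads times processes should not exceed the available cores.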

Chris Salter

Nov 15, 2014, 2:57:44 PM
to fds...@googlegroups.com
That's understandable and it's what I'd seen as well.

Having the option of both is definitely worthwhile, especially as OpenMP seems far more stable than in the previous version (FDS 5).
