Multiple models running on MPI going slow


Harry Wall

Feb 7, 2017, 9:05:23 PM
to FDS and Smokeview Discussions
We have just converted one of our CFD-dedicated machines to Linux so that we can run multiple models at once; however, the models have been running slower than expected. The machine's specifications are shown in the attached image.

Originally we ran this machine on Windows with the number of MPI processes set to the number of meshes, and it was quite fast: it could run a 1.5M-cell (26-mesh) model in 40 hours.

However, now that we are running four models at once and using all 48 cores on the CFD machine, the computing speed has dropped substantially (to 8-14 s/hour). Based on my limited understanding of MPI and computing, I was under the impression that if each mesh is given one core, then using all 48 cores shouldn't slow the system down, provided there is enough memory for each core. Currently the system is only using 10-20% of the available memory, so I assume memory isn't the problem. Additionally, quite a few of the cores appear to be running at only 50%, as shown in the second attached image.

Based on the fact that it slows down when they are all on MPI (see additional information below), I thought the cause could be one of the following:
  • System is overloaded
  • My understanding of MPI is incorrect and it cannot complete 48 processes at once efficiently
  • For some unknown reason some of the cores are only working at 50% capacity
  • Some cores are working on multiple models therefore slowing them down
I have been reading quite a bit about MPI and OMP to try to find a solution, but I am not making much progress. If anyone has suggestions for getting back to the original speed (40-50 s/hour), it would be greatly appreciated.

Cheers,
Harry




Additional information:

Before running them with MPI, we were running them with OMP = number of meshes, based on advice from someone. When I switched one of the 9-mesh models over to MPI while the other three were still running on OMP, its speed increased from 12 s/hour to 48 s/hour. However, when I switched them all over to MPI, the speed dropped to 8-14 s/hour per model.


Code to start model:

OMP_NUM_THREADS=1
mpirun -np [number of meshes] fds [model name].fds
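
One thing worth checking with commands like these: in a POSIX shell, a bare assignment on its own line sets a shell variable but does not pass it to child processes unless the variable is already exported. The sketch below only demonstrates that shell behaviour, using `sh -c` as a stand-in child process; it does not run FDS, and whether `mpirun` then forwards the variable to every rank depends on the MPI implementation in use.

```shell
# Start from a clean state so the variable is not already in the environment.
unset OMP_NUM_THREADS

# Bare assignment: visible in this shell only, not in child processes.
OMP_NUM_THREADS=1
child_sees=$(sh -c 'echo "${OMP_NUM_THREADS:-unset}"')
echo "without export, child sees: $child_sees"

# Exported assignment: inherited by child processes started from this shell.
export OMP_NUM_THREADS=1
child_sees_exported=$(sh -c 'echo "${OMP_NUM_THREADS:-unset}"')
echo "with export, child sees:    $child_sees_exported"
```

If the variable was already exported earlier in the session (for example in a login script), the bare assignment would update the exported value and both forms would behave the same.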

The four models are:
9 mesh = 550k cells
9 mesh = 550k cells
12 mesh = 600k cells
16 mesh = 1.5M cells

cfd1.PNG
cfd1-1.PNG

Kevin

Feb 8, 2017, 8:47:39 AM
to FDS and Smokeview Discussions
Here is a good article explaining some of the hardware

https://kb.iu.edu/d/avfb

From your description, your machine has 4 sockets, with 6 cores per socket. That's 24 cores. Each core supports 2 threads, but I think that this only matters if you are using OpenMP. I would limit your usage to 24 FDS processes, not 48. Experiment with OMP_NUM_THREADS=2, but that might not help much.

Even if you limit yourself to 24 processes, do not expect them to be as fast as one because each MPI process is competing in accessing memory. I believe that with 4 sockets, you only have 4 "pipelines" to RAM.
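
To make the arithmetic concrete, here is a small sketch that adds up the MPI processes requested by the four models listed earlier in the thread and compares the total with the core counts described above. The hardware figures are the ones quoted in this reply (4 sockets, 6 cores per socket, 2 threads per core), not values read from the machine.

```shell
# Mesh counts of the four models from the original post (one MPI process per mesh).
meshes="9 9 12 16"

# Hardware figures as described in this thread (assumed, not queried from the system).
sockets=4
cores_per_socket=6
threads_per_core=2

# Sum the MPI processes requested across all four models.
total_procs=0
for m in $meshes; do
    total_procs=$((total_procs + m))
done

physical_cores=$((sockets * cores_per_socket))
logical_cpus=$((physical_cores * threads_per_core))

echo "MPI processes requested: $total_procs"
echo "Physical cores:          $physical_cores"
echo "Logical CPUs:            $logical_cpus"

# Flag the case where processes outnumber physical cores.
if [ "$total_procs" -gt "$physical_cores" ]; then
    echo "More MPI processes than physical cores"
fi
```

With these numbers the four models ask for 46 processes, which fits within the 48 logical CPUs but is nearly twice the 24 physical cores.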

Harry Wall

Feb 12, 2017, 11:37:49 PM
to FDS and Smokeview Discussions
Thanks Kevin,

That's quite helpful. Based on your advice and further reading, I've run a few sets of tests to check the effect of each variable (number of threads, number of MPI processes, mesh size and stack size), and it seems that the fastest way is to use just MPI rather than trying to use multiple threads at the same time. Models are back to running 1800 s in 30-50 hours, which is great.

Thanks for your help,
Harry

Kevin

Feb 13, 2017, 8:21:58 AM
to FDS and Smokeview Discussions
Yes, I generally use multiple OpenMP threads when I have extra cores sitting idle (we have a cluster with hundreds of cores). But you will see a performance decrease if you break up the domain into too many meshes. That's something you might want to check by running a case for about 100 time steps with an increasing number of meshes.
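
A rough sketch of how such a check could be scripted is shown here. The input file names (`case_4mesh.fds` and so on) and the mesh counts 4, 8 and 16 are hypothetical; each file is assumed to be the same case split into that many meshes and shortened so that it runs only about 100 time steps. By default the script only prints the commands; set `DRY_RUN=0` to actually run and time them.

```shell
# One OpenMP thread per MPI process, exported so child processes inherit it.
export OMP_NUM_THREADS=1

# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs and times them.
DRY_RUN=${DRY_RUN:-1}
commands=""

for n in 4 8 16; do
    # Hypothetical input file: the same case divided into $n meshes.
    cmd="mpirun -np $n fds case_${n}mesh.fds"
    commands="$commands$cmd
"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $cmd"
    else
        start=$(date +%s)
        $cmd
        end=$(date +%s)
        echo "$n meshes: $((end - start)) s wall clock"
    fi
done
```

Comparing the wall-clock times for the same short run gives a quick indication of the point at which adding meshes stops helping on a given machine.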