PyFR performing worse with more cores


Liam Doult

Dec 7, 2016, 10:46:35 AM
to PyFR Mailing List
Hi, 

I am running PyFR 1.5.0 on two nodes, each with dual Xeon E5-2630 v4 processors (10 cores per CPU). As I scale mpirun -np up to 40, the performance drops drastically.

CPU utilisation also appears to be capped at 50% of the maximum.

Any insight would be appreciated.

Regards
Liam

Freddie Witherden

Dec 7, 2016, 12:41:24 PM
to pyfrmai...@googlegroups.com
Hi Liam,
The performance and scalability of PyFR are heavily determined by the
type of problem you are running. If the mesh has relatively few
elements it is not surprising that performance begins to regress as
the number of ranks is increased. Additionally, for some cases PyFR
can be bound by memory bandwidth, and so once you have enough ranks
inside a node to saturate the memory bus no improvement should be
expected from adding further ranks.

Further, you do not state which compiler/BLAS library you are using or
what the interconnect between the nodes is. If the latter is Gigabit
Ethernet then poor scalability is not unexpected.
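
For reference, with the OpenMP backend the BLAS library is picked up
from the configuration file; as a rough sketch, with the path below
being just a placeholder for whatever BLAS is installed on your system:

  [backend-openmp]
  cblas = /usr/lib64/libopenblas.so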

Moreover, PyFR is a hybrid MPI/OpenMP code and so you are almost always
better off with one MPI rank per NUMA zone.
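
As a rough sketch, assuming Open MPI and one NUMA zone per socket,
something along the lines of

  export OMP_NUM_THREADS=10
  mpirun -np 4 -x OMP_NUM_THREADS --map-by ppr:1:socket \
      --bind-to socket pyfr run -b openmp mesh.pyfrm config.ini

with one rank per socket across your two nodes is usually a better
starting point than 40 single-threaded ranks; the mesh and
configuration file names here are just placeholders.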

Regards, Freddie.


Liam Doult

Dec 7, 2016, 3:20:56 PM
to PyFR Mailing List
Hi,

Looking deeper into the issue, it appears that running mpirun starts 8 processes but only 4 of them actually execute.

This is with mpirun -np 8. It occurs on both nodes, so it appears to start 16 processes instead of 8. This does not happen with any of our other mpirun benchmarks.

We are running CentOS 7, MPI 1.10, and the standard GCC for this run.
We are executing the cube_hex case.

Perhaps you can add some insight into why this is happening.

Regards
Liam

Freddie Witherden

Dec 7, 2016, 5:13:32 PM
to pyfrmai...@googlegroups.com
Hi Liam,

On 07/12/16 12:20, Liam Doult wrote:
> Hi,
>
> Looking deeper into the issue, it appears that running mpirun starts
> 8 processes but only 4 of them actually execute.
>
> This is with mpirun -np 8. It occurs on both nodes, so it appears to
> start 16 processes instead of 8. This does not happen with any of our
> other mpirun benchmarks.
>
> We are running CentOS 7, MPI 1.10, and the standard GCC for this run.
> We are executing the cube_hex case.
>
> Perhaps you can add some insight into why this is happening.

One thing you need to make sure of is that mpi4py is linked against the
same distribution of MPI that is providing mpirun. Otherwise strange
things can happen.
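
A quick way to check, assuming your MPI is recent enough to provide
MPI_Get_library_version, is to compare

  python -c 'from mpi4py import MPI; print(MPI.Get_library_version())'

against the output of mpirun --version.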

Regards, Freddie.
