Hi Liam,
The performance and scalability of PyFR are heavily determined by the
type of problem you are running. If the mesh has relatively few
elements it is not surprising that the performance begins to regress as
the number of ranks is increased, since each rank is then left with too
little work to offset the communication overhead. Additionally, for some
cases PyFR can be bound by memory bandwidth, and so once you have enough
ranks inside a node to saturate the memory bus no improvement should be
expected from adding additional ranks.
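
To make the saturation argument concrete, here is a toy model of a purely bandwidth-bound run. It is only a sketch: the function name and the figures (100 GB/s per node, 20 GB/s demanded per rank) are made up for illustration and are not measurements of PyFR or of any particular machine.

```python
def bandwidth_bound_speedup(ranks, per_rank_gbs, node_gbs):
    """Toy model: speedup over one rank when the memory bus caps throughput."""
    if ranks < 1:
        raise ValueError("ranks must be at least 1")
    # Each rank could consume per_rank_gbs, but the node supplies at most node_gbs.
    achieved = min(ranks * per_rank_gbs, node_gbs)
    # Baseline is a single rank, which may itself be capped by the bus.
    return achieved / min(per_rank_gbs, node_gbs)


# Hypothetical node: 100 GB/s of memory bandwidth, 20 GB/s demanded per rank.
for n in (1, 2, 4, 8, 16):
    print(n, bandwidth_bound_speedup(n, per_rank_gbs=20.0, node_gbs=100.0))
```

With these assumed numbers the speedup stops growing once five ranks share the node; any further ranks only divide the same bandwidth between them.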
Further, you do not state which compiler and BLAS library you are using
or what the interconnect between the nodes is. If the interconnect is
gigabit Ethernet then poor scalability is not unexpected.
Moreover, PyFR is a hybrid MPI/OpenMP code and so you are almost always
better off with one MPI rank per NUMA zone.
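
As a rough sketch of what one rank per NUMA zone means in numbers, the helper below works out the rank and thread counts. The function name and the example machine (4 nodes, 2 NUMA zones and 32 cores per node) are hypothetical; substitute the values for your own cluster.

```python
def hybrid_layout(nodes, numa_zones_per_node, cores_per_node):
    """Return (mpi_ranks, threads_per_rank) for one MPI rank per NUMA zone."""
    if min(nodes, numa_zones_per_node, cores_per_node) < 1:
        raise ValueError("all arguments must be positive")
    if cores_per_node % numa_zones_per_node:
        raise ValueError("cores do not divide evenly between NUMA zones")
    # One MPI rank for every NUMA zone in the job.
    mpi_ranks = nodes * numa_zones_per_node
    # The OpenMP threads of each rank fill the cores of its own zone.
    threads_per_rank = cores_per_node // numa_zones_per_node
    return mpi_ranks, threads_per_rank


# Hypothetical cluster: 4 nodes, each with 2 NUMA zones and 32 cores.
ranks, threads = hybrid_layout(nodes=4, numa_zones_per_node=2, cores_per_node=32)
print(f"MPI ranks: {ranks}, OMP_NUM_THREADS={threads}")
```

The thread count would normally be passed through the standard OMP_NUM_THREADS environment variable. How ranks are pinned to NUMA zones depends on your MPI launcher and is not shown here.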
Regards, Freddie.