Hi Junting,
On 06/01/2020 12:34, Junting Chen wrote:
> As far as I know, when using multiple GPUs, I had to select local-rank
> for device-id and cuda-aware for mpi-type. When exactly should I be
> using round-robin and local-rank? And when should I be using standard
> or cuda-aware?
If the GPUs in your system are in compute exclusive mode then
round-robin is probably what you want. Otherwise, opt for local-rank.
So long as each rank gets its own GPU there should be no impact on
performance.
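As a point of reference, the option lives in the [backend-cuda] section
of the configuration file; a minimal sketch (the value shown is just the
non-exclusive-mode choice discussed above):

    [backend-cuda]
    device-id = local-rank

with the solver then launched as one MPI rank per GPU.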
In terms of mpi-type, this depends heavily on the hardware you're
running on and the MPI library you're using. If your MPI library is
CUDA-aware, then setting mpi-type = cuda-aware can improve performance.
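If you do have a CUDA-aware MPI build, the corresponding sketch would
be:

    [backend-cuda]
    device-id = local-rank
    mpi-type = cuda-aware

otherwise it is safest to leave mpi-type at standard.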
> How would you select GiMMiK cutoff? How does it affect accuracy /
> performance?
Some experimentation is needed here, as the optimal value depends on
the element types you're using, whether anti-aliasing is enabled, and
the hardware that you are running on.
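For concreteness, the cutoff is the gimmik-max-nnz option in the
backend section; a sketch with an illustrative value:

    [backend-cuda]
    gimmik-max-nnz = 512

Operator matrices with more non-zeros than this are dispatched to the
library GEMM (cuBLAS for the CUDA backend) rather than to GiMMiK, so
timing a few short runs at different values is the practical way to
pick it.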
> I believe block-1d and block-2d are determined by the GPU's
> specification. I am not very familiar with CUDA. Could someone please
> elaborate a bit? For example, I am running PyFR with two Tesla K80s in
> parallel; what's the block size for 1d and 2d pointwise kernels?
You should seldom need to modify either of these two values. On some
pathological meshes reducing block-1d can improve performance, but not
by a lot.
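Should you wish to experiment, both are plain integers in
[backend-cuda] which set the CUDA thread block sizes used by the
pointwise kernels; a sketch with illustrative values:

    [backend-cuda]
    block-1d = 64
    block-2d = 128

They affect only how the pointwise kernels are scheduled on the device,
not the accuracy of the solution.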
Regards, Freddie.