Cuda backend parameters


Junting Chen

Jan 6, 2020, 12:34:40 PM
to PyFR Mailing List
Hello,

I am wondering if someone can provide a bit more description of these parameters and how to set them for the best performance.

As far as I know, when using multiple GPUs I had to select local-rank for device-id and cuda-aware for mpi-type. When exactly should I be using round-robin versus local-rank? And when should I be using standard versus cuda-aware?

How would you select the GiMMiK cutoff? How does it affect accuracy/performance?

I believe block-1d and block-2d are determined by the GPU's specification. I am not very familiar with CUDA, so could someone please elaborate a bit? For example, I am running PyFR with two Tesla K80s in parallel; what block sizes should I use for the 1D and 2D pointwise kernels?


From the documentation, the [backend-cuda] section parameterises the CUDA backend with:

  1. device-id — method for selecting which device(s) to run on:

    int | round-robin | local-rank

  2. gimmik-max-nnz — cutoff for GiMMiK in terms of the number of non-zero entries in a constant matrix:

    int

  3. mpi-type — type of MPI library that is being used:

    standard | cuda-aware

  4. block-1d — block size for one dimensional pointwise kernels:

    int

  5. block-2d — block size for two dimensional pointwise kernels:

    int, int
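
For context, here is roughly what my [backend-cuda] section currently looks like (these values are my own guesses rather than recommendations):

    [backend-cuda]
    device-id = local-rank
    mpi-type = cuda-aware
    gimmik-max-nnz = 512
    block-1d = 64
    block-2d = 128, 2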


Thanks a lot!

Junting Chen

Freddie Witherden

Jan 6, 2020, 5:18:29 PM
to pyfrmai...@googlegroups.com
Hi Junting,

On 06/01/2020 12:34, Junting Chen wrote:
> As far as I know, when using multiple GPUs I had to select local-rank
> for device-id and cuda-aware for mpi-type. When exactly should I be
> using round-robin versus local-rank? And when should I be using
> standard versus cuda-aware?

If the GPUs in your system are in compute exclusive mode then
round-robin is probably what you want. Otherwise, opt for local-rank.
So long as each rank gets its own GPU there should be no impact on
performance.
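
For example, with local-rank each rank on a node takes the GPU whose index matches its node-local rank, so with two GPUs per node you would launch two ranks per node, something like the following (mesh.pyfrm and config.ini being placeholders, and the exact launcher flags depending on your MPI library):

    [backend-cuda]
    device-id = local-rank

    mpirun -n 2 pyfr run -b cuda mesh.pyfrm config.ini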

In terms of the mpi-type this depends heavily on the hardware you're
running on and the MPI library you're using. If your MPI library is
CUDA aware then setting mpi-type = cuda-aware can improve performance.
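
For example, if your MPI library is Open MPI (an assumption on my part) you can check whether it was built with CUDA support via:

    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

and, if it reports true, set mpi-type = cuda-aware in [backend-cuda].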

> How would you select the GiMMiK cutoff? How does it affect accuracy/
> performance?

Some experimentation is needed here, as the optimal value depends on the
element types you're using, whether anti-aliasing is enabled, and the
GPU that you are running on. It has no effect on accuracy: constant
matrices with up to gimmik-max-nnz non-zeros get bespoke
GiMMiK-generated kernels, while anything larger falls back to cuBLAS,
and both compute the same product up to floating point rounding.

> I believe block-1d and block-2d are determined by the GPU's
> specification. I am not very familiar with CUDA, so could someone
> please elaborate a bit? For example, I am running PyFR with two Tesla
> K80s in parallel; what block sizes should I use for the 1D and 2D
> pointwise kernels?

You should seldom need to modify either of these two values. On some
pathological meshes reducing block-1d can improve performance, but not
by a lot.
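
Both have sensible defaults if you omit them. Should you wish to set them explicitly it looks like the following, where the numbers are purely illustrative rather than values tuned for the K80:

    [backend-cuda]
    block-1d = 64
    block-2d = 128, 2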

Regards, Freddie.


Junting Chen

Jan 7, 2020, 12:40:48 PM
to PyFR Mailing List
Thanks Freddie,

So when starting a run, do you usually play with the GiMMiK cutoff a bit to find the optimal value? Does it influence performance significantly enough to be worth the effort of tuning? What is the sensible range for this value, and is a power of two (the example uses 512) somehow beneficial?

Junting

Freddie Witherden

Jan 8, 2020, 9:32:05 AM
to pyfrmai...@googlegroups.com
Hi Junting,

On 07/01/2020 12:40, Junting Chen wrote:
> Thanks Freddie,
>
> So when starting a run, do you usually play with the GiMMiK cutoff a
> bit to find the optimal value? Does it influence performance
> significantly enough to be worth the effort of tuning? What is the
> sensible range for this value, and is a power of two (the example
> uses 512) somehow beneficial?

The values I would try are 0 (disables GiMMiK), 512 (the default), and
8192. There is nothing special about the number being a power of two.
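
As a sketch, one way to compare them is a short shell loop along these lines, where base.ini, mesh.pyfrm, and the rank count are placeholders for your own setup:

    # run a short case at each cutoff and compare wall-clock times;
    # assumes base.ini already contains a gimmik-max-nnz line to rewrite
    for nnz in 0 512 8192; do
        sed "s/^gimmik-max-nnz.*/gimmik-max-nnz = $nnz/" base.ini > gimmik-$nnz.ini
        time mpirun -n 2 pyfr run -b cuda mesh.pyfrm gimmik-$nnz.ini
    done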

Regards, Freddie.
