AMGCL with OpenCL - AMD vs. NVIDIA Consumer GPUs F64 performance


C B

May 22, 2021, 10:11:56 PM
to amgcl
Hello Denis and Everyone,

What is your opinion on f64 performance with low-end GPUs, or, let's say, the best value for the cost?

I tried to find f64 performance and I found that many NVIDIA GPUs that have great f32 performance have very low f64 performance,
whereas AMD's GPUs seem to have better relative f64 performance on average.

https://arrayfire.com/explaining-fp64-performance-on-gpus/   AMD GPUs perform fairly well for FP64 compared to FP32. Most AMD cards (including consumer/gaming series) will give between 1:3 and 1:8 FP32 performance for FP64.

https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622

A very expensive RTX 3090 has a 1:64 FP64:FP32 ratio, with FP64 = 560 GFLOPS, whereas

a very inexpensive Radeon RX 5600M (laptop) has a 1:16 ratio, with FP64 = 360 GFLOPS!

But I suppose it is not just FP64. I guess OpenCL vs. CUDA is also an important factor; what is your advice in this regard?

Last month NVIDIA released a new OpenCL 3.0 driver. Has anyone tried it to compare OpenCL vs. CUDA performance on the same GPU?

I would like to try this Radeon GPU with AMGCL, but first I need OpenCL. On my Windows computer, amgcl-master/cmake/opencl/FindOpenCL.cmake is not finding the OpenCL installations; I have at least two, one from Intel and the other from NVIDIA. I browsed FindOpenCL.cmake and it seems to have been customized for AMD's SDK, which is now discontinued. Is this the case?

Thanks for your help / recommendations !
Cheers,

Denis Demidov

May 23, 2021, 1:49:08 AM
to amgcl
On Sunday, May 23, 2021 at 5:11:56 AM UTC+3 cebau...@gmail.com wrote:
Hello Denis and Everyone,

What is your opinion on f64 performance with low-end GPUs, or, let's say, the best value for the cost?

I would say that for amgcl memory bandwidth is more important than double-precision arithmetic,
since the algorithms are memory-bound, not compute-bound.
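
A rough roofline-style sketch of why this holds, under stated assumptions: it treats the solver's dominant kernel as a CSR sparse matrix-vector product in double precision with 4-byte column indices, and it uses a single hypothetical bandwidth figure for both GPUs (the thread quotes no bandwidth numbers). Only the FP64 peaks of 560 and 360 GFLOPS come from the discussion above; the function and constant names are illustrative.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: the lower of the compute ceiling and the bandwidth ceiling."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Per nonzero, a CSR matrix-vector product does 2 flops (multiply + add) and
# reads at least 8 bytes (double value) + 4 bytes (assumed 32-bit column index).
# Vector and row-pointer traffic is ignored, so this is an optimistic intensity.
SPMV_FLOPS_PER_BYTE = 2 / 12

# Hypothetical memory bandwidth, NOT a spec of either card.
ASSUMED_BANDWIDTH_GB_S = 500.0

# FP64 peaks as quoted in the thread.
for name, fp64_peak in [("RTX 3090", 560.0), ("RX 5600M", 360.0)]:
    estimate = attainable_gflops(fp64_peak, ASSUMED_BANDWIDTH_GB_S, SPMV_FLOPS_PER_BYTE)
    print(f"{name}: FP64 peak {fp64_peak:.0f} GFLOPS, "
          f"bandwidth-limited estimate {estimate:.1f} GFLOPS")
```

With the same assumed bandwidth, both cards land on the same estimate, far below either FP64 peak, which is why the FP64:FP32 ratio matters less here than the memory bandwidth.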
 

I tried to find f64 performance and I found that many NVIDIA GPUs that have great f32 performance have very low f64 performance,
whereas AMD's GPUs seem to have better relative f64 performance on average.

https://arrayfire.com/explaining-fp64-performance-on-gpus/   AMD GPUs perform fairly well for FP64 compared to FP32. Most AMD cards (including consumer/gaming series) will give between 1:3 and 1:8 FP32 performance for FP64.

https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622

A very expensive RTX 3090 has a 1:64 FP64:FP32 ratio, with FP64 = 560 GFLOPS, whereas

a very inexpensive Radeon RX 5600M (laptop) has a 1:16 ratio, with FP64 = 360 GFLOPS!

But I suppose it is not just FP64. I guess OpenCL vs. CUDA is also an important factor; what is your advice in this regard?

Last month NVIDIA released a new OpenCL 3.0 driver. Has anyone tried it to compare OpenCL vs. CUDA performance on the same GPU?


On my GPU, OpenCL performs on par with CUDA, and the OpenCL backend in amgcl is even slightly faster than the CUDA one on the same NVIDIA GPU.
 

I would like to try this Radeon GPU with AMGCL, but first I need OpenCL. On my Windows computer, amgcl-master/cmake/opencl/FindOpenCL.cmake is not finding the OpenCL installations; I have at least two, one from Intel and the other from NVIDIA. I browsed FindOpenCL.cmake and it seems to have been customized for AMD's SDK, which is now discontinued. Is this the case?


The copy of FindOpenCL.cmake in amgcl is only used with ancient versions of CMake:

https://github.com/ddemidov/amgcl/blob/61d219699005743338a41768c41bca0a8678d24e/CMakeLists.txt#L18-L21

Newer versions of CMake ship FindOpenCL as part of the CMake distribution.
I don't have access to a Windows machine, but you should be able to set the CMake variables manually (they should be in the "advanced" section):
OpenCL_INCLUDE_DIR (the directory that contains CL/opencl.h) and OpenCL_LIBRARY (the path to the OpenCL library; with MSVC this would be the import library OpenCL.lib rather than OpenCL.dll).
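
As an illustration only, the two variables can also be passed on the command line with `-D`. The sketch below assembles such a command; the helper name and both paths are hypothetical placeholders for wherever the OpenCL headers and library are installed, and pointing `OpenCL_LIBRARY` at an import library (`OpenCL.lib`) is an assumption about what an MSVC linker expects.

```python
from pathlib import PureWindowsPath


def cmake_opencl_args(include_dir, library):
    """Build -D options for the cache variables read by CMake's FindOpenCL module.

    Backslashes are converted to forward slashes, which CMake accepts on Windows.
    """
    return [
        f"-DOpenCL_INCLUDE_DIR={PureWindowsPath(include_dir).as_posix()}",
        f"-DOpenCL_LIBRARY={PureWindowsPath(library).as_posix()}",
    ]


# Hypothetical locations; replace with the real paths on your machine.
args = cmake_opencl_args(r"C:\OpenCL-SDK\include", r"C:\OpenCL-SDK\lib\OpenCL.lib")
print("cmake " + " ".join(args) + " ..")
```

The same two values can instead be typed into the corresponding entries of the CMake GUI.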

C B

May 23, 2021, 2:04:02 PM
to Denis Demidov, amgcl
Denis,
Thank you so much for your insights !
Then if we are dealing mainly with memory-bound applications, are there any specific metrics that we need to look at when selecting a GPU ?
If we want to buy a GPU, what are the most important parameters for HPC ?
I guess this may also depend on the memory bandwidth when communicating between CPU and GPU ...

Thanks again for your insights :)
Cheers,



Denis Demidov

May 23, 2021, 2:20:46 PM
to amgcl
On Sunday, May 23, 2021 at 9:04:02 PM UTC+3 cebau...@gmail.com wrote:
Denis,
Thank you so much for your insights !
Then if we are dealing mainly with memory-bound applications, are there any specific metrics that we need to look at when selecting a GPU ?
If we want to buy a GPU, what are the most important parameters for HPC ?
I guess this may also depend on the memory bandwidth when communicating between CPU and GPU ...

If the problems you are going to solve are mostly memory-bound, then I would look at GPUs with higher memory bandwidth.
Double-precision performance should also help, but professional GPUs with unrestricted double-precision arithmetic are going to be an order of magnitude more expensive,
so that is really up to your budget. I cannot help you with specific models, as I haven't been following the market for some time.
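
When comparing spec sheets, the theoretical peak memory bandwidth can be worked out from the per-pin memory data rate and the bus width. A minimal sketch of that arithmetic follows; the function name and the two example configurations are hypothetical and do not describe any card mentioned in this thread.

```python
def peak_memory_bandwidth_gb_s(data_rate_gbit_s_per_pin, bus_width_bits):
    """Theoretical peak bandwidth in GB/s: per-pin data rate times bus width, over 8 bits per byte."""
    return data_rate_gbit_s_per_pin * bus_width_bits / 8


# Hypothetical configurations, for illustration only.
for label, rate, width in [("card A", 14.0, 256), ("card B", 12.0, 192)]:
    bandwidth = peak_memory_bandwidth_gb_s(rate, width)
    print(f"{label}: {rate:g} Gbit/s per pin x {width}-bit bus = {bandwidth:g} GB/s")
```

Measured bandwidth is normally lower than this theoretical figure, so it is best used for ranking candidates rather than predicting solver speed.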

C B

May 23, 2021, 3:29:43 PM
to Denis Demidov, amgcl
Denis,
Thank you very much for your comments.

I found this link showing how to measure memory bandwidth with the utilities of the standard CUDA toolkit

Thanks again,
Cheers
