I am trying to find a solid justification for the statement "PyFR has a good utilization of GPU acceleration technology", or a paper I can cite for it. I understand that memory transfer between the CPU and the GPU is one of the major bottlenecks in GPU computing.
The 2014 paper "PyFR: An open source framework for solving advection-diffusion..." said that GEMM was optimized for large square matrices, whereas the constant operator matrices in PyFR are small and square and the state matrices are short and fat. Has this been improved since then?
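
To make the shapes in question concrete, here is a minimal NumPy sketch of a small square operator multiplying a short, fat state matrix. The sizes and variable names are made up for illustration and are not taken from PyFR; the flops-per-byte figure is only a rough estimate that assumes each matrix is read or written exactly once.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not taken from PyFR).
n_pts = 27         # rows and columns of the small, square constant operator
n_cols = 100_000   # columns of the short, fat state matrix

rng = np.random.default_rng(0)
op = rng.standard_normal((n_pts, n_pts))      # small and square
state = rng.standard_normal((n_pts, n_cols))  # short and fat

# The GEMM in question: (n_pts x n_pts) @ (n_pts x n_cols)
out = op @ state

# Rough arithmetic intensity, assuming each matrix moves through memory once.
flops = 2 * n_pts * n_pts * n_cols
bytes_moved = out.itemsize * (op.size + state.size + out.size)
intensity = flops / bytes_moved

print(out.shape, f"{intensity:.2f} flops/byte")
```

With shapes like these the operator is tiny compared with the state matrix, which is why a GEMM tuned for large square matrices may not be a good fit.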