Hi,
I see that the V100 and A100 have at least 80 SMs, so I tried changing the second-level panel size (the recpnb variable in sgetrf_panel_native.cpp) in the hope of getting better performance. It seems I have misunderstood something, though, because it actually runs slower.
Why is the value of recpnb 32? (I remember that in a paper Ahmad mentioned it is because most GPUs have this number of SMs.)
Why does using a larger value not improve performance? I think I have free SMs, because the GPU does nothing else during the panel factorization, so the other 48 SMs should be free.
Best regards,
Aran