Half precision solvers


Ben Burnett

Oct 21, 2021, 5:49:58 PM
to MAGMA User

Hello,

I am a student working with a group doing research on mixed precision numerical methods for solving differential equations, and I am wondering if there is a gesv routine that runs entirely in half precision.

We have seen the mixed precision iterative refinement schemes provided by MAGMA that use half precision internally but still achieve double precision accuracy, as well as the similar NVIDIA cusolverDn<T1><T2>gesv solvers. However, we are interested in a solver that performs the entire solve at half precision, without needing to recover double precision accuracy. Could I potentially copy one of the existing single/double precision routines and rebuild MAGMA with it?

In case it helps, here is a brief description of what we are attempting. As an example, one of the methods we are working with has three stages. The first stage is an implicit step that solves a linear system and is done entirely at low precision (we are hoping to do this on a GPU so we can use half precision). The second stage is an explicit post-processing step done at higher precision that, in theory, inexpensively corrects for the use of low precision in the implicit step. The final stage is the update stage, also done at high precision, which advances the iterative solve of the differential equation.
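To make the structure concrete, here is a rough sketch of one step in C using MAGMA. This is only illustrative: the low precision stage uses magma_sgesv_gpu as a single precision stand-in for the half precision solve we are asking about, and correct_high_precision / update_solution are hypothetical placeholders for our stages 2 and 3, not real routines.

    #include <magma_v2.h>

    /* Hypothetical placeholders for the explicit correction and the final update
       (stages 2 and 3), both done in double precision on the host. */
    void correct_high_precision(magma_int_t n, const double *x_low, double *x_corr);
    void update_solution(magma_int_t n, double dt, const double *x_corr, double *u);

    /* One step of the scheme. dA/dB hold the low precision system on the GPU;
       u is the high precision state on the host. Assumes magma_init() was called. */
    void time_step(magma_queue_t queue, magma_int_t n,
                   magmaFloat_ptr dA, magma_int_t ldda,
                   magmaFloat_ptr dB, magma_int_t lddb,
                   double dt, double *u)
    {
        magma_int_t *ipiv = NULL, info = 0;
        magma_imalloc_cpu(&ipiv, n);

        /* Stage 1: implicit step - solve the linear system entirely at low precision
           (single precision here; an FP16 equivalent is what we are asking about). */
        magma_sgesv_gpu(n, 1, dA, ldda, ipiv, dB, lddb, &info);

        /* Copy the low precision solution back and promote it to double. */
        float  *x_low = NULL;
        double *x_dbl = NULL, *x_corr = NULL;
        magma_smalloc_cpu(&x_low, n);
        magma_dmalloc_cpu(&x_dbl, n);
        magma_dmalloc_cpu(&x_corr, n);
        magma_sgetvector(n, dB, 1, x_low, 1, queue);
        for (magma_int_t i = 0; i < n; ++i)
            x_dbl[i] = (double) x_low[i];

        /* Stage 2: explicit post-processing at higher precision (placeholder). */
        correct_high_precision(n, x_dbl, x_corr);

        /* Stage 3: high precision update that advances the time stepping (placeholder). */
        update_solution(n, dt, x_corr, u);

        magma_free_cpu(ipiv);
        magma_free_cpu(x_low);
        magma_free_cpu(x_dbl);
        magma_free_cpu(x_corr);
    }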

Thank you in advance for any advice you can offer,
Ben

Stanimire Tomov

Oct 21, 2021, 6:15:44 PM
to Ben Burnett, MAGMA User
Ben,
Probably you can use the magma_xhsgetrf_gpu routine - it takes the matrix in single precision
and factorizes it using some FP16 and some FP32 arithmetic internally, so it is not entirely FP16.
In terms of speed, though, it behaves like an entirely FP16 routine, and in terms of accuracy it is better than
an FP16 routine and worse than FP32.
Does this sound like something that could be used in your case?
Entirely FP16 codes are a challenge, as they require a lot of BLAS routines to be available
in FP16 besides hgemm, and cuBLAS does not provide them.
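For reference, pairing the factorization with FP32 forward/back solves would look roughly like the
sketch below. This is only a sketch: it assumes magma_xhsgetrf_gpu takes the same argument list as
magma_sgetrf_gpu (please check the actual prototype in the MAGMA headers, as the mixed precision
routines may take extra mode arguments), and the buffer names are placeholders.

    #include <magma_v2.h>

    /* Factor in mixed FP16/FP32, then solve the triangular systems in FP32.
       dA (n x n) and dB (n x nrhs) are single precision arrays already on the GPU. */
    magma_int_t factor_and_solve(magma_int_t n, magma_int_t nrhs,
                                 magmaFloat_ptr dA, magma_int_t ldda,
                                 magmaFloat_ptr dB, magma_int_t lddb)
    {
        magma_int_t *ipiv = NULL, info = 0;
        magma_imalloc_cpu(&ipiv, n);

        /* LU factorization of the FP32 matrix, using FP16 internally
           (assumed argument list - mirrors magma_sgetrf_gpu). */
        magma_xhsgetrf_gpu(n, n, dA, ldda, ipiv, &info);

        if (info == 0) {
            /* Forward/back substitution entirely in FP32. */
            magma_sgetrs_gpu(MagmaNoTrans, n, nrhs, dA, ldda, ipiv, dB, lddb, &info);
        }

        magma_free_cpu(ipiv);
        return info;
    }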
Stan



Ben Burnett

Oct 22, 2021, 9:25:30 AM
to MAGMA User, to...@icl.utk.edu, MAGMA User, Ben Burnett
Hi Stan,

Potentially, yes, I think this could work. Would the accuracy comparison still apply after the forward/back solves? In other words, would the relationship xhsgetrf+sgetrs < sgetrf+sgetrs < dgetrf+dgetrs still hold with regard to accuracy?

Thanks for the advice!
Ben

Stanimire Tomov

Oct 25, 2021, 3:24:31 AM
to Ben Burnett, MAGMA User, to...@icl.utk.edu
Hi Ben,

Yes, from what I understand you could order the accuracies that way.
I can try to comment more if you provide more detail on the 3 steps, e.g., is it easy to write out the formulas?
(E.g., step one looks like you solve (1) A_h x_1 = b_h, where A_h and b_h are presumably in FP16; after that
 there is an explicit update like x_2 = x_1 + something, so if you are improving (1), would the term "something" be some
 residual, maybe preconditioned for (1), similar to iterative refinement, and maybe done in single precision, etc.?)
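Written out, my guess of the scheme would be something like the following (just my reading of the above,
with M a hypothetical preconditioner):

    A_h x_1 = b_h                                   (1)    A_h, b_h in FP16
    x_2 = x_1 + c,   c \approx M^{-1} (b - A x_1)          correction from the residual of (1), e.g. in FP32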

Thanks,
Stan

Nima Sahraneshin

Mar 11, 2022, 4:39:28 PM
to Ben Burnett, MAGMA User
Hi,

I have implemented a completely FP16 version of "magma_zgetrf_gpu" for my research, but you shouldn't expect to see a large speedup with a completely FP16 version.

Nima
