MAGMA routines with support for multiple GPUs


Danesh Daroui

Dec 18, 2025, 5:38:41 PM
to MAGMA User
Hi all,
I am looking for MAGMA equivalents of routines such as LU factorization and Gaussian elimination that support multiple GPUs. In the MAGMA documentation I see this note:

docs/documentation.txt:_mgpu        |  magma_dgetrf_mgpu  |  hybrid CPU/multiple-GPU routine where the matrix is distributed across multiple GPUs' device memories.

The problem is that I cannot find where "magma_dgetrf_mgpu" is implemented anywhere in the code. My branch is completely synced with the latest master on GitHub. Thanks for your help.
Regards,
Danesh

Natalie Beams

Dec 18, 2025, 5:41:05 PM
to MAGMA User, danesh...@gmail.com
Hi Danesh,

Have you built MAGMA yet? The file `src/dgetrf_mgpu.cpp`, which has `magma_dgetrf_mgpu`, should get generated from `src/zgetrf_mgpu.cpp` when you build the library.
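
After the build, the generated double-precision prototype should show up in `include/magma_d.h`. From memory it looks roughly like the sketch below; please double-check the header in your own build for the authoritative signature:

    magma_int_t
    magma_dgetrf_mgpu(
        magma_int_t ngpu,
        magma_int_t m, magma_int_t n,
        magmaDouble_ptr d_lA[], magma_int_t ldda,
        magma_int_t *ipiv,
        magma_int_t *info );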

-- Natalie

Danesh Daroui

Dec 18, 2025, 6:22:24 PM
to Natalie Beams, MAGMA User
Hi Natalie,
Thanks a lot for your prompt response. I re-built MAGMA and the mgpu routines do exist now. I did some quick research and also asked ChatGPT, and the information I got was that multi-GPU support is quite tricky, has to be done using MPI, and will be deprecated in future versions of MAGMA. It also proposed using SLATE, but I really want to use MAGMA because of its robustness and excellent performance. I have used ScaLAPACK before and know that working with MPI takes some effort. I wanted to ask whether the MAGMA routines already use MPI for multiple GPUs but hide it from the user, or whether they use a different approach for solving across more than one GPU? Does MAGMA tile and distribute the matrix, or does it only support task parallelism and not data parallelism? I hope ChatGPT hallucinated about MAGMA! ;)
Regards,
Danesh

Natalie Beams

Dec 18, 2025, 6:54:58 PM
to MAGMA User, danesh...@gmail.com, MAGMA User, Natalie Beams
Hi Danesh,

MAGMA does not (and does not plan to) use MPI for anything. The multi-GPU routines are not for distributed matrices. SLATE, which uses MPI, can be thought of as the GPU-enabled replacement for ScaLAPACK. If you have code that uses ScaLAPACK, it should be fairly simple to switch to using SLATE instead. However, if you don't need to distribute your matrices because they fit in GPU memory without being split up, then you can keep using MAGMA.

-- Natalie

Danesh Daroui

Dec 19, 2025, 1:02:23 PM
to Natalie Beams, MAGMA User
Hi Natalie,
I think by "distributed" you mean across multiple nodes; otherwise, the mgpu routines in MAGMA should support multiple GPUs running on one node, right? MPI might be needed for multiple nodes, which is not the case here. I see in the source code that the distribution of matrices across the GPUs has to be done manually, which is fine, and the rest is done automatically. Is this correct?
Regards,
Danesh

Sent from Gmail Mobile

Natalie Beams

Dec 19, 2025, 1:16:42 PM
to MAGMA User, danesh...@gmail.com, MAGMA User, Natalie Beams
Hi Danesh,

By "distributed", I meant "using multiple MPI processes which own parts of the data" -- whether on one node or multiple nodes. But in the MAGMA documentation, we also use "distributed" for these multi-GPU routines, so I should have been more specific. Also, I thought you needed enough memory to fit the matrix on one GPU for *getrf_mgpu, but looking at the code again, I see I was mistaken about that. It doesn't have to fit, but you need to split it up yourself, or use `magma_*setmatrix_1D_col_bcyclic` to take a matrix in host memory and distribute it among the GPUs in the correct manner.

-- Natalie

Danesh Daroui

Dec 19, 2025, 6:00:05 PM
to Natalie Beams, MAGMA User
Hi Natalie,
Many thanks for your support. This was very helpful. I will use the _mgpu routines in my case, where multiple GPUs are used on a single node. I also hope the getri function will be implemented with multi-GPU support in MAGMA in the future.
Regards,
Danesh
