MAGMA crashes when I use any GPU other than GPU0


Danesh Daroui

Jan 2, 2026, 11:57:12 PM
to MAGMA User
Hi all,

In my code I used to set the GPU device to 0, since my machine always had a single GPU. Now I want to test the code on a machine with more than one GPU, and I started with a platform where the code runs in a container that provides up to 5 GPUs. There, when I set the GPU device to e.g. 1 or 2 with 3 GPUs available, MAGMA always crashes and the following errors are shown:

** On entry to cusparseCreate(): CUDA context cannot be initialized

 ** On entry to cusparseSetStream() parameter number 1 (handle) had an illegal value: NULL pointer

** On entry to cusparseCreate(): CUDA context cannot be initialized

 ** On entry to cusparseSetStream() parameter number 1 (handle) had an illegal value: NULL pointer

 ** On entry to cusparseCreate(): CUDA context cannot be initialized

 ** On entry to cusparseSetStream() parameter number 1 (handle) had an illegal value: NULL pointer

 ** On entry to cusparseCreate(): CUDA context cannot be initialized

 ** On entry to cusparseSetStream() parameter number 1 (handle) had an illegal value: NULL pointer

Segmentation fault (core dumped)


The crash happens exactly at the first call to a MAGMA routine. I set the CUDA device this way:

cudaError_t e = cudaSetDevice(m_default_gpu);
if (e != cudaSuccess) {
  cerr << "cudaSetDevice(" << m_default_gpu << ") failed: "
    << cudaGetErrorString(e) << "\n";
  return 1;
}

e = cudaFree(0);  // Forces context creation NOW.
if (e != cudaSuccess) {
  cerr << "Context init failed on GPU " << m_default_gpu << ": "
    << cudaGetErrorString(e) << "\n";
  return 1;
}

Does anybody know what the source of the problem could be? Does MAGMA have a dedicated routine for setting the GPU device that should be used instead of cudaSetDevice? If my code is correct, I am also wondering whether the problem lies in how a container abstracts GPU devices, so that the same code might work correctly on a machine with real physical GPUs. Thanks in advance.

Regards,

Danesh


Ahmad Abdelfattah

Jan 3, 2026, 12:06:11 AM
to Danesh Daroui, MAGMA User
Hi Danesh, 

Perhaps a minimal reproducer would be helpful to track down the issue. Do you call “magma_init” and “magma_finalize” at the beginning/end of your code? 

MAGMA supports multi-GPU environments. There is a “magma_setdevice” to select a specific GPU. 

If you have the MAGMA testers compiled inside the container, try running any tester ( something like “./testing_sgemm -N 100” ), and observe the environment information printed at the beginning to see if MAGMA sees all GPUs. 
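If the testers are not built inside the container, a few lines of code can perform a similar check. The following is only a minimal sketch: it assumes MAGMA 2.x with the magma_v2.h header, and the cap of 16 devices is an arbitrary choice for illustration.

```cpp
#include <cstdio>
#include <magma_v2.h>

int main() {
    // magma_init() must come before any other MAGMA call.
    magma_init();

    // Ask MAGMA which devices it can see. The array size (16) is an
    // arbitrary upper bound picked for this sketch.
    const magma_int_t max_devices = 16;
    magma_device_t devices[max_devices];
    magma_int_t num_devices = 0;
    magma_getdevices(devices, max_devices, &num_devices);
    std::printf("MAGMA sees %lld device(s)\n", (long long) num_devices);

    // Prints the MAGMA/CUDA version and device summary.
    magma_print_environment();

    magma_finalize();
    return 0;
}
```

If the reported device count is lower than the number of GPUs the container is supposed to expose, the problem is likely in the container setup rather than in the application code.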

Ahmad


Danesh Daroui

Jan 5, 2026, 1:23:46 PM
to Ahmad Abdelfattah, MAGMA User
Hi Ahmad,
Thank you for your reply. I found the bug, and it was actually on my side; MAGMA works fine. I used to set the device with CUDA, and now I use MAGMA's routine instead, which is almost the same. The actual bug was that, after setting the device, I always created a queue on GPU0, which made the code crash whenever another device, e.g. GPU1, was chosen. It is fixed now and works fine.
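For anyone hitting the same errors, the fix amounts to passing the selected device to the queue creation call as well. This is a rough sketch rather than my actual code: it assumes MAGMA 2.x, and taking the device index from the command line is just for illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <magma_v2.h>

int main(int argc, char** argv) {
    magma_init();

    // Find out how many devices MAGMA can see (16 is an arbitrary cap).
    magma_device_t devices[16];
    magma_int_t num_devices = 0;
    magma_getdevices(devices, 16, &num_devices);

    // Hypothetical way of choosing the device: first command-line argument.
    magma_device_t dev = (argc > 1) ? std::atoi(argv[1]) : 0;
    if (dev < 0 || dev >= num_devices) {
        std::fprintf(stderr, "device %d not available (%lld visible)\n",
                     (int) dev, (long long) num_devices);
        magma_finalize();
        return 1;
    }

    magma_setdevice(dev);

    // The queue must be created on the same device that was selected.
    // The bug described above was equivalent to passing 0 here.
    magma_queue_t queue = nullptr;
    magma_queue_create(dev, &queue);

    // ... MAGMA calls that use `queue` go here ...

    magma_queue_destroy(queue);
    magma_finalize();
    return 0;
}
```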
Regards,
Danesh

Danesh Daroui

Jan 5, 2026, 1:23:49 PM
to Ahmad Abdelfattah, MAGMA User
Hi again Ahmad,
I have a question regarding the pattern I follow when mixing single-GPU and mgpu routines in MAGMA. First, I set the first available device for single-GPU MAGMA calls. Then I use all available GPUs (including the one I had chosen) when calling MAGMA mgpu routines. Is that correct? I have seen in MAGMA's mgpu implementations that the current device is read and then set again afterwards, so, as far as I can tell, MAGMA essentially restores whatever device was current before the multi-GPU call.
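To make the pattern concrete, here is a simplified sketch of what I mean. It assumes MAGMA 2.x; the per-device queue setup and the save/restore of the current device are my own illustration, not code copied from MAGMA, and the actual single-GPU and mgpu calls are left as comments.

```cpp
#include <cstdio>
#include <vector>
#include <magma_v2.h>

int main() {
    magma_init();

    // Number of GPUs MAGMA can see (16 is an arbitrary cap).
    magma_device_t devices[16];
    magma_int_t ngpu = 0;
    magma_getdevices(devices, 16, &ngpu);
    if (ngpu < 1) {
        std::fprintf(stderr, "no GPU visible to MAGMA\n");
        magma_finalize();
        return 1;
    }

    // Single-GPU part: use the first available device.
    magma_setdevice(devices[0]);
    // ... single-GPU MAGMA calls ...

    // Multi-GPU part: remember the current device, create one queue per
    // device, then restore the device that was current before.
    magma_device_t saved = 0;
    magma_getdevice(&saved);

    std::vector<magma_queue_t> queues(ngpu, nullptr);
    for (magma_int_t d = 0; d < ngpu; ++d) {
        magma_setdevice(devices[d]);
        magma_queue_create(devices[d], &queues[d]);
    }
    magma_setdevice(saved);

    // ... call an mgpu routine with ngpu and the per-device queues ...

    for (magma_int_t d = 0; d < ngpu; ++d) {
        magma_queue_destroy(queues[d]);
    }
    magma_finalize();
    return 0;
}
```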
Regards,
Danesh

Ahmad Abdelfattah

Jan 7, 2026, 6:20:34 AM
to Danesh Daroui, MAGMA User
Hi Danesh, 

Yes, this sounds correct to me. 

Ahmad