testing_zgetrf uses only 1 CPU thread

spyros Liolis

Jan 27, 2025, 10:40:34 AM

to MAGMA User

Hi all,

First of all, congratulations to all the MAGMA developers. It is a great tool: with minimal changes, our codes run much faster.

I have compiled MAGMA with MKL support, and everything works fine.

When I run ./testing_zgetrf -l (for LAPACK), I get results for both the GPU and the CPU.

In this test, the LAPACK function uses all the threads (32 threads on the Intel i9-13900 CPU), and GPU usage is also at 100%.

When I run the test without the -l option, GPU usage is 100% but the CPU uses only 1 thread, even with a large problem size such as ./testing_zgetrf -n 15000.

MAGMA is supposed to run in hybrid mode, and if I understand correctly, it should use more CPU cores.

Why does it use only 1 thread? Should I set an environment variable?

Thanks

Regards,

Spyros.

Mark Gates

Jan 27, 2025, 11:59:23 AM

to spyros Liolis, MAGMA User

For certain calls, MAGMA is hybrid. (Some calls are native, GPU-only, which is often faster than hybrid on today's GPUs.) For getrf, MAGMA runs the panel on the CPU, and the trailing matrix updates on the GPU. In this case, it doesn't do anything for CPU parallelism; it relies on the underlying LAPACK and BLAS library (e.g., Intel MKL or OpenBLAS) for CPU parallelism. Because the panel is not very wide (nb = 32), the LAPACK library may decide there is not enough work for multiple threads.
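To make the panel/trailing-matrix split concrete, here is a minimal sketch of a right-looking blocked LU in plain Python. It is not MAGMA's code: pivoting is omitted for brevity (getrf uses partial pivoting), the function names are hypothetical, and exact fractions stand in for double-complex arithmetic. The comments mark which step corresponds to the CPU panel and which to the GPU update in the hybrid scheme described above.

```python
from fractions import Fraction


def make_matrix(n):
    """Hypothetical test matrix: Hilbert matrix plus n*I, so every pivot is nonzero."""
    return [[Fraction(1, i + j + 1) + (n if i == j else 0) for j in range(n)]
            for i in range(n)]


def blocked_lu(a, nb):
    """In-place right-looking blocked LU WITHOUT pivoting (illustration only).

    On return, `a` holds L (unit diagonal, not stored) below the diagonal
    and U on and above the diagonal.
    """
    n = len(a)
    for k in range(0, n, nb):
        end = k + min(nb, n - k)

        # 1) Panel factorization: columns k..end-1, rows k..n-1.
        #    In the hybrid scheme described above, this is the CPU step,
        #    done through the LAPACK/BLAS library.
        for j in range(k, end):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, end):
                    a[i][c] -= a[i][j] * a[j][c]

        # 2) Row-block solve: U12 = inv(L11) * A12 (unit lower triangular).
        #    Grouped with the update in this sketch.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    a[i][c] -= a[i][j] * a[j][c]

        # 3) Trailing matrix update: A22 -= L21 * U12.
        #    This matrix-matrix product is the step that runs on the GPU.
        for i in range(end, n):
            for c in range(end, n):
                s = 0
                for j in range(k, end):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
    return a


def multiply_lu(lu):
    """Rebuild L*U from the packed factors to check the factorization."""
    n = len(lu)
    out = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for c in range(n):
            for j in range(min(i, c) + 1):
                l_ij = Fraction(1) if j == i else lu[i][j]
                out[i][c] += l_ij * lu[j][c]
    return out
```

With nb much smaller than n, step 1 touches only a thin strip of the matrix in each iteration, while step 3 touches almost everything that remains.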

You could experiment with larger block sizes in magma_get_zgetrf_nb() in

magma/control/get_nb.cpp
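As a rough guide for that experiment, the sketch below estimates what fraction of the total LU arithmetic falls in the panel factorizations for a given nb. It assumes the standard leading-order flop count for LU of an m-by-n matrix with m >= n, namely m*n^2 - n^3/3 (real arithmetic; the ratio is approximately the same for complex). The function names are hypothetical, and this measures share of work, not share of run time.

```python
def getrf_flops(m, n):
    """Leading-order flop count for LU of an m-by-n matrix, assuming m >= n."""
    return m * n * n - n ** 3 / 3


def panel_share(n, nb):
    """Fraction of the total LU flops spent in the panel factorizations."""
    panel = 0.0
    for k in range(0, n, nb):
        kb = min(nb, n - k)                # width of this panel
        panel += getrf_flops(n - k, kb)    # panel is (n - k) rows by kb columns
    return panel / getrf_flops(n, n)


# Problem size from the question above; block sizes are illustrative.
for nb in (32, 64, 128, 256, 512):
    print(f"nb = {nb:4d}  panel share = {100 * panel_share(15000, nb):.2f}%")
```

For n = 15000 and nb = 32, the panel share is well under 1% of the total flops, and it grows roughly in proportion to nb. A larger nb therefore gives the CPU library more work per panel, at the cost of moving work off the GPU.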

For many LAPACK and BLAS libraries, $OMP_NUM_THREADS controls the number of threads that are used.

Mark

spyros Liolis

Jan 29, 2025, 10:27:23 AM

to MAGMA User, mga...@icl.utk.edu, MAGMA User, spyros Liolis

OK. I will experiment with a larger block size and let you know.

Thanks,

Spyros.

spyros Liolis

Jan 29, 2025, 10:28:46 AM

to MAGMA User, spyros Liolis

I would also add that my system runs Rocky Linux 8.
