Hi Gianluca,
Can you provide some more specifics?
- What kind of machine are you running on (what CPU, GPU, # cores)?
- What CPU BLAS/LAPACK library are you using?
- What is your target matrix size?
- Do you need eigenvalues only, or also left eigenvectors, right eigenvectors, or both?
- The output of MAGMA's testing_zgeev would be helpful (see below).
A matrix size of n = 1000 is pretty small for MAGMA. I wouldn't expect much acceleration at that size; depending on the CPUs and GPUs, a slow-down compared to LAPACK is not surprising.
Note that the non-symmetric eigenvalue problem is memory-bound, so it won't get close to the peak flop/s of the GPU. Instead, MAGMA's gain here comes from the higher bandwidth of GPU memory compared to CPU memory.
Example with one Volta V100-SXM2-32GB GPU and 10 CPU cores (Intel E5-2698 v4 @ 2.20GHz):
# Eigenvalues only, no vectors case (-LN -RN).
magma/testing> export OMP_NUM_THREADS=10
magma/testing> ./testing_zgeev -n 100:900:100 -n 1000:20000:1000 --lapack -LN -RN
% MAGMA 2.9.0 svn 32-bit magma_int_t, 64-bit pointer.
% Compiled for CUDA architectures 70
% CUDA runtime 11080, driver 12070. OpenMP threads 10. MKL 2024.0.2, MKL threads 10.
% device 0: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32494.1 MiB memory, capability 7.0
% Mon Mar 10 16:24:27 2025
% Usage: ./testing_zgeev [options] [-h|--help]
% jobvl = No vectors, jobvr = No vectors, ngpu = 1
%     N   CPU Time (sec)   GPU Time (sec)   |W_magma - W_lapack| / |W_lapack|
%============================================================================
    100             0.02             0.02   1.68e-15   ok
    200             0.05             0.05   2.52e-15   ok
    300             0.09             0.09   2.84e-15   ok
    400             0.15             0.15   2.96e-15   ok
    500             0.24             0.23   2.98e-15   ok
    600             0.47             0.41   3.49e-15   ok
    700             0.59             0.56   3.83e-15   ok
    800             0.72             0.66   3.77e-15   ok
    900             0.82             0.75   3.52e-15   ok
   1000             0.99             0.85   3.54e-15   ok
   2000             3.40             2.81   4.38e-15   ok
   3000            10.05             7.98   5.21e-15   ok
   4000            19.73            13.12   5.41e-15   ok
   5000            35.53            19.15   5.53e-15   ok
   6000            63.72            33.08   6.75e-15   ok
   7000            91.87            43.48   6.70e-15   ok
   8000           135.70            58.77   6.76e-15   ok
# Left eigenvectors, no right eigenvectors (-LV -RN).
magma/testing> ./testing_zgeev -n 100:900:100 -n 1000:20000:1000 --lapack -LV -RN
% MAGMA 2.9.0 svn 32-bit magma_int_t, 64-bit pointer.
% Compiled for CUDA architectures 70
% CUDA runtime 11080, driver 12070. OpenMP threads 10. MKL 2024.0.2, MKL threads 10.
% device 0: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32494.1 MiB memory, capability 7.0
% Mon Mar 10 16:34:41 2025
% Usage: ./testing_zgeev [options] [-h|--help]
% jobvl = Vectors needed, jobvr = No vectors, ngpu = 1
%     N   CPU Time (sec)   GPU Time (sec)   |W_magma - W_lapack| / |W_lapack|
%============================================================================
    100             0.02             0.03   1.82e-15   ok
    200             0.07             0.07   2.60e-15   ok
    300             0.11             0.12   2.84e-15   ok
    400             0.19             0.18   2.97e-15   ok
    500             0.30             0.27   3.13e-15   ok
    600             0.52             0.48   3.70e-15   ok
    700             0.66             0.61   3.68e-15   ok
    800             0.87             0.76   3.77e-15   ok
    900             1.02             0.91   3.63e-15   ok
   1000             1.15             1.08   3.47e-15   ok
   2000             4.69             4.15   4.24e-15   ok
   3000            14.57            11.40   5.28e-15   ok
   4000            28.43            20.98   5.54e-15   ok
   5000            50.63            33.22   5.48e-15   ok
   6000            85.10            54.21   6.79e-15   ok
   7000           125.27            72.77   6.72e-15   ok
   8000           182.26           102.08   6.77e-15   ok
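To read the two runs above as speedups, here is a small Python sketch. It is not part of MAGMA, and the names no_vectors, left_vectors, and speedups are just illustrative. It divides the CPU (LAPACK) time by the GPU (MAGMA) time for a few rows copied from the tables.

```python
# Rows copied from the two testing_zgeev runs above: (N, CPU sec, GPU sec).
# Only a subset of sizes is included to keep the example short.
no_vectors = [        # -LN -RN
    (500, 0.24, 0.23),
    (1000, 0.99, 0.85),
    (2000, 3.40, 2.81),
    (4000, 19.73, 13.12),
    (8000, 135.70, 58.77),
]
left_vectors = [      # -LV -RN
    (500, 0.30, 0.27),
    (1000, 1.15, 1.08),
    (2000, 4.69, 4.15),
    (4000, 28.43, 20.98),
    (8000, 182.26, 102.08),
]


def speedups(rows):
    """Return (N, CPU time / GPU time) for each row."""
    return [(n, cpu / gpu) for n, cpu, gpu in rows]


for label, rows in (("-LN -RN", no_vectors), ("-LV -RN", left_vectors)):
    print(label)
    for n, s in speedups(rows):
        print(f"  N = {n:5d}   speedup = {s:.2f}x")
```

On these numbers the ratio at N = 1000 stays below 1.2x in both cases, and at N = 8000 it is roughly 2.3x for eigenvalues only and roughly 1.8x with left eigenvectors. The timings are rounded to 0.01 s, so the ratios for the smallest sizes are not very meaningful.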
Mark