Info about magmaf_zgeev_gpu

Gianluca Frazzei

Mar 10, 2025, 9:49:38 AM
to MAGMA User
As the title says, I was wondering whether such a function is under construction, whether there is any information about problems encountered in creating it, why it doesn't exist, and so on.

Thanks a lot to you all.

Mark Gates

Mar 10, 2025, 10:03:08 AM
to Gianluca Frazzei, MAGMA User
Hi Gianluca,

There are magmaf_zgeev (in Fortran) and magma_zgeev (in C), which are hybrid CPU–GPU routines that take the matrix in CPU memory. There is no version that takes the matrix in GPU memory. A significant portion of the work is still done on the CPU, so there did not seem to be a large benefit to having a GPU interface.

Mark

Gianluca Frazzei

Mar 10, 2025, 12:42:08 PM
to MAGMA User, mga...@icl.utk.edu, MAGMA User, Gianluca Frazzei
I am still a newbie at all of this, so I'll ask this instead: I am interested in diagonalizing non-Hermitian Hamiltonians, but on my machine I get stuck at system sizes on the order of 10^3. I found that the magmaf version of zgeev actually takes more computing time to obtain the eigenstates and eigenvalues than the standard LAPACK one, which really confused me. I was wondering if you knew why this could be the case, and whether you knew of any way to leverage the MAGMA library to speed up my computations.
Thanks a lot

Mark Gates

Mar 10, 2025, 1:03:42 PM
to Gianluca Frazzei, MAGMA User
Hi Gianluca,

Can you provide some more specifics?
  • What kind of machine are you running on (what CPU, GPU, # cores)?
  • What CPU BLAS/LAPACK library are you using?
  • What is your target matrix size?
  • Do you need eigenvalues only, left eigenvectors, right eigenvectors, or both?
  • The output of MAGMA's testing_zgeev would be helpful (see below).
Matrix size n = 1000 is pretty small for MAGMA. I wouldn't expect MAGMA to accelerate a problem of that size; seeing a slow-down compared to LAPACK is not surprising, depending on the CPUs and GPUs.

Note that the non-symmetric eigenvalue problem is memory bound, so it won't get close to the peak flop/s performance of GPUs. In this case, MAGMA is taking advantage of the higher bandwidth of GPU memory vs. CPU memory.

Example with 1 Volta V100-SXM2-32GB GPU and 10 CPU cores, Intel E5-2698 v4 @ 2.20GHz.

# Eigenvalues only, no vectors case (-LN -RN).
magma/testing> export OMP_NUM_THREADS=10
magma/testing> ./testing_zgeev -n 100:900:100 -n 1000:20000:1000 --lapack -LN -RN
% MAGMA 2.9.0 svn 32-bit magma_int_t, 64-bit pointer.
% Compiled for CUDA architectures 70
% CUDA runtime 11080, driver 12070. OpenMP threads 10. MKL 2024.0.2, MKL threads 10.
% device 0: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32494.1 MiB memory, capability 7.0
% Mon Mar 10 16:24:27 2025
% Usage: ./testing_zgeev [options] [-h|--help]

% jobvl = No vectors, jobvr = No vectors, ngpu = 1
%   N   CPU Time (sec)   GPU Time (sec)   |W_magma - W_lapack| / |W_lapack|
%==========================================================================
  100      0.02             0.02          1.68e-15   ok
  200      0.05             0.05          2.52e-15   ok
  300      0.09             0.09          2.84e-15   ok
  400      0.15             0.15          2.96e-15   ok
  500      0.24             0.23          2.98e-15   ok
  600      0.47             0.41          3.49e-15   ok
  700      0.59             0.56          3.83e-15   ok
  800      0.72             0.66          3.77e-15   ok
  900      0.82             0.75          3.52e-15   ok
 1000      0.99             0.85          3.54e-15   ok
 2000      3.40             2.81          4.38e-15   ok
 3000     10.05             7.98          5.21e-15   ok
 4000     19.73            13.12          5.41e-15   ok
 5000     35.53            19.15          5.53e-15   ok
 6000     63.72            33.08          6.75e-15   ok
 7000     91.87            43.48          6.70e-15   ok
 8000    135.70            58.77          6.76e-15   ok
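
To read the table above as speedups, a minimal Python sketch follows. It assumes the "CPU Time" column is the LAPACK run and the "GPU Time" column is the MAGMA run, as the error-column header suggests; the names `timings`, `speedup`, and `speedups` are illustrative, and the numbers are copied from the output above for a subset of sizes.

```python
# Timings in seconds copied from the eigenvalues-only (-LN -RN) table above.
# Assumption: "CPU Time" is the LAPACK run, "GPU Time" is the MAGMA run.
timings = {
    1000: (0.99, 0.85),
    2000: (3.40, 2.81),
    3000: (10.05, 7.98),
    4000: (19.73, 13.12),
    5000: (35.53, 19.15),
    6000: (63.72, 33.08),
    7000: (91.87, 43.48),
    8000: (135.70, 58.77),
}


def speedup(cpu_time, gpu_time):
    """Return the ratio of LAPACK time to MAGMA time (>1 means MAGMA is faster)."""
    return cpu_time / gpu_time


# Map each matrix size N to its speedup.
speedups = {n: speedup(cpu, gpu) for n, (cpu, gpu) in timings.items()}

for n, s in speedups.items():
    print(f"N = {n:5d}   LAPACK/MAGMA = {s:.2f}x")
```

On this machine the ratio stays close to 1 around N = 1000 and grows with N, which matches the remark that N = 1000 is small for MAGMA.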


# Left eigenvectors, no right eigvec (-LV -RN).
magma/testing> ./testing_zgeev -n 100:900:100 -n 1000:20000:1000 --lapack -LV -RN
% MAGMA 2.9.0 svn 32-bit magma_int_t, 64-bit pointer.
% Compiled for CUDA architectures 70
% CUDA runtime 11080, driver 12070. OpenMP threads 10. MKL 2024.0.2, MKL threads 10.
% device 0: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32494.1 MiB memory, capability 7.0
% Mon Mar 10 16:34:41 2025
% Usage: ./testing_zgeev [options] [-h|--help]

% jobvl = Vectors needed, jobvr = No vectors, ngpu = 1
%   N   CPU Time (sec)   GPU Time (sec)   |W_magma - W_lapack| / |W_lapack|
%==========================================================================
  100      0.02             0.03          1.82e-15   ok
  200      0.07             0.07          2.60e-15   ok
  300      0.11             0.12          2.84e-15   ok
  400      0.19             0.18          2.97e-15   ok
  500      0.30             0.27          3.13e-15   ok
  600      0.52             0.48          3.70e-15   ok
  700      0.66             0.61          3.68e-15   ok
  800      0.87             0.76          3.77e-15   ok
  900      1.02             0.91          3.63e-15   ok
 1000      1.15             1.08          3.47e-15   ok
 2000      4.69             4.15          4.24e-15   ok
 3000     14.57            11.40          5.28e-15   ok
 4000     28.43            20.98          5.54e-15   ok
 5000     50.63            33.22          5.48e-15   ok
 6000     85.10            54.21          6.79e-15   ok
 7000    125.27            72.77          6.72e-15   ok
 8000    182.26           102.08          6.77e-15   ok
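
To compare the two runs, the sketch below computes, for a few sizes, the LAPACK/MAGMA ratio with left eigenvectors and how much longer the MAGMA run takes when left eigenvectors are requested. It is only arithmetic on the numbers printed in the two tables; the names `magma_no_vec`, `lapack_left_vec`, `magma_left_vec`, and `summarize` are illustrative, and it assumes "CPU Time" is LAPACK and "GPU Time" is MAGMA.

```python
# Times in seconds copied from the two tables above (subset of sizes).
# Assumption: "CPU Time" is the LAPACK run, "GPU Time" is the MAGMA run.
magma_no_vec = {1000: 0.85, 2000: 2.81, 4000: 13.12, 8000: 58.77}       # -LN -RN
lapack_left_vec = {1000: 1.15, 2000: 4.69, 4000: 28.43, 8000: 182.26}   # -LV -RN
magma_left_vec = {1000: 1.08, 2000: 4.15, 4000: 20.98, 8000: 102.08}    # -LV -RN


def summarize(n):
    """Return (speedup with left vectors, cost of left vectors in MAGMA) for size n."""
    speedup_lv = lapack_left_vec[n] / magma_left_vec[n]
    vector_cost = magma_left_vec[n] / magma_no_vec[n]
    return speedup_lv, vector_cost


results = {n: summarize(n) for n in magma_no_vec}

for n, (speedup_lv, vector_cost) in results.items():
    print(f"N = {n:5d}   LAPACK/MAGMA (-LV) = {speedup_lv:.2f}x   "
          f"MAGMA -LV / -LN = {vector_cost:.2f}x")
```

Requesting eigenvectors makes both libraries slower, so whether vectors are needed affects how much benefit to expect at a given matrix size.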



Mark
