advice on finding eigenvalues and eigenvectors of large matrices

Tom Carroll

Sep 18, 2021, 2:47:03 PM
to MAGMA User
Hello,

I'm a new user of MAGMA and new to the GPU game in general (but a long-time user of ScaLAPACK). I'm using a GPU node with the following:
  • AMD EPYC 7452 32-Core
  • 256 GB system memory
  • 4 x NVIDIA A100 GPUs with 40 GB of memory each
I've been running some tests and, based on the results, I'm planning to upgrade the system memory from 256 GB to 512 GB. Before I make the investment, I'm hoping that someone can confirm that I'm thinking about this correctly.

I'd like to be able to find the eigenvalues and eigenvectors of a double-precision, real symmetric (Hermitian) matrix that is about 110,000 x 110,000. The following test at 90,000 x 90,000 is just about the biggest matrix I can solve:

$ testing/testing_dsyevd -N 90000 -JV --ngpu 4
% MAGMA 2.6.1  64-bit magma_int_t, 64-bit pointer.
Compiled with CUDA support for 8.0
% CUDA runtime 11030, driver 11030. OpenMP threads 32. 
% device 0: NVIDIA A100-PCIE-40GB, 1410.0 MHz clock, 40536.2 MiB memory, capability 8.0
% device 1: NVIDIA A100-PCIE-40GB, 1410.0 MHz clock, 40536.2 MiB memory, capability 8.0
% device 2: NVIDIA A100-PCIE-40GB, 1410.0 MHz clock, 40536.2 MiB memory, capability 8.0
% device 3: NVIDIA A100-PCIE-40GB, 1410.0 MHz clock, 40536.2 MiB memory, capability 8.0
% Fri Sep 17 17:01:09 2021
% Usage: testing/testing_dsyevd [options] [-h|--help]

% jobz = Vectors needed, uplo = Lower, ngpu = 4
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%======================================================================
90000      ---            831.9553           ---           ---         ---      ok

That test used about 250 GB of system memory (all of it!) and topped out at about 18 GB per GPU. I therefore expect my larger matrix to use about (110/90)^2 x 250 = 375 GB of system memory and (110/90)^2 x 18 = 27 GB of memory on each GPU, so upgrading to 512 GB of system memory should be sufficient.
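
In case it helps anyone searching later: as far as I can tell the tester exercises the CPU-interface multi-GPU routine magma_dsyevd_m, and calling it from my own code would look roughly like the sketch below. This is only my best guess at the calling sequence (workspace query with lwork = liwork = -1, then the real call), so please check magma_d.h and testing_dsyevd.cpp before trusting it.

// Rough sketch (untested): eigenvalues and eigenvectors of a real symmetric
// n x n matrix on ngpu GPUs via the CPU-interface routine magma_dsyevd_m.
// Argument order assumed from the MAGMA 2.6.x headers -- please verify.
#include <stdio.h>
#include "magma_v2.h"

int main( void )
{
    magma_init();

    magma_int_t ngpu = 4, n = 90000, lda = n, info = 0;
    double *A, *w;                          // A is overwritten by the eigenvectors
    magma_dmalloc_cpu( &A, (size_t) lda * n );
    magma_dmalloc_cpu( &w, n );             // eigenvalues, ascending
    // ... fill A with the symmetric matrix ...

    // Workspace query: lwork = liwork = -1 returns the optimal sizes.
    double lwork_opt;
    magma_int_t liwork_opt;
    magma_dsyevd_m( ngpu, MagmaVec, MagmaLower, n, A, lda, w,
                    &lwork_opt, -1, &liwork_opt, -1, &info );

    // Note: lwork is roughly 2*n*n, which overflows 32-bit integers at this
    // size -- hence the ILP64 build (64-bit magma_int_t and 64-bit BLAS ints).
    magma_int_t lwork = (magma_int_t) lwork_opt, liwork = liwork_opt;
    double *work;
    magma_int_t *iwork;
    magma_dmalloc_cpu( &work, lwork );
    magma_imalloc_cpu( &iwork, liwork );

    magma_dsyevd_m( ngpu, MagmaVec, MagmaLower, n, A, lda, w,
                    work, lwork, iwork, liwork, &info );
    printf( "info = %lld, smallest eigenvalue = %g\n", (long long) info, w[0] );

    magma_free_cpu( work );  magma_free_cpu( iwork );
    magma_free_cpu( A );     magma_free_cpu( w );
    magma_finalize();
    return 0;
}

(With jobz = MagmaVec the eigenvectors overwrite A on exit, so besides the n x n matrix the big host allocation is that work array of roughly 2*n^2 doubles, which is presumably where most of the memory goes.)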

Thanks for any advice!!

(And also thanks to the developers -- MAGMA seems fantastic so far!) 

Cheers,
tom

PS: In case anyone finds this helpful, the trickiest part of the install was getting things set up correctly for ILP64. I built OpenBLAS with

make USE_OPENMP=1 INTERFACE64=1 LIBNAMESUFFIX=64

Then I built MAGMA with the attached make.inc.
[attachment: make.inc]
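
(For anyone wondering why ILP64 is unavoidable at this scale: a 110,000 x 110,000 matrix has more elements than a 32-bit integer can count, so leading dimensions, linear indices, and workspace sizes all need 64-bit integers. A trivial, MAGMA-independent check:)

// Why 64-bit integers are needed: element counts and workspace sizes at
// N = 110000 exceed the 32-bit integer range.
#include <stdio.h>
#include <stdint.h>

int main( void )
{
    int64_t n = 110000;
    printf( "n*n       = %lld\n", (long long)(n * n) );       // 12,100,000,000
    printf( "INT32_MAX = %lld\n", (long long) INT32_MAX );    //  2,147,483,647
    // The dsyevd workspace (about 2*n*n doubles) is even larger, so BLAS,
    // LAPACK, and MAGMA all have to be built with 64-bit integers (ILP64).
    return 0;
}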

Stanimire Tomov

Sep 19, 2021, 9:15:48 PM
to Tom Carroll, MAGMA User
Tom,

Thank you for your feedback on MAGMA!
We are glad you find it useful.
Thank you also for the ILP64 example of compiling with OpenBLAS - we currently ship an example only for MKL, in make.inc.mkl-icc-ilp64.

I managed to reproduce your case for the 90K matrix. The matrix itself takes 64 GB, plus some auxiliary space. When I query the system I get
-bash-4.2$ free -tg
              total        used        free      shared  buff/cache   available
Mem:           2015          93        1735         181         186        1738
Swap:             3           0           3
Total:         2019          93        1739

so we use about 93 GB, but total - free = 280 GB. I guess the overhead here is a little larger than on your system. Anyway, your question is what happens when we try to run on a 110K matrix.
What I get for this case is

-bash-4.2$ nvidia-smi 
Sat Sep 18 17:31:48 2021    
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    234931      C   ./testing_dsyevd                25573MiB |
|    1   N/A  N/A    234931      C   ./testing_dsyevd                25555MiB |
|    2   N/A  N/A    234931      C   ./testing_dsyevd                25555MiB |
|    3   N/A  N/A    234931      C   ./testing_dsyevd                25555MiB |
+-----------------------------------------------------------------------------+

so each GPU used about 25 GB.

-bash-4.2$ free -tg
              total        used        free      shared  buff/cache   available
Mem:           2015         130        1608         270         275        1612
Swap:             3           0           3
Total:         2019         130        1612

so 130 GB is used for user data, and total - free = 2019 - 1612 = 407 GB.

This run is on a DGX system with 8 GPUs, each with 80 GB, and the host has 2019 GB of memory.
-bash-4.2$ ./testing_dsyevd -N 110000 -JV --ngpu 4
% MAGMA 2.6.1 svn 64-bit magma_int_t, 64-bit pointer.
Compiled with CUDA support for 8.0
% CUDA runtime 11000, driver 11000. OpenMP threads 64. MKL 2019.0.3, MKL threads 64. 
% device 0: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 1: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 2: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 3: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 4: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 5: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 6: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% device 7: A100-SXM-80GB, 1410.0 MHz clock, 81252.2 MiB memory, capability 8.0
% Sat Sep 18 17:27:05 2021
% Usage: ./testing_dsyevd [options] [-h|--help]

% jobz = Vectors needed, uplo = Lower, ngpu = 4
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%============================================================================
110000      ---            932.1178           ---           ---         ---      ok
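
As a rough sanity check on these numbers: the dense matrix alone is 8*N^2 bytes, and the LAPACK dsyevd workspace for jobz = V is at least 1 + 6*N + 2*N^2 doubles, so about 3*N^2 doubles of host storage is a lower bound before MAGMA's pinned and other auxiliary buffers. A quick estimate (only a lower bound, not what MAGMA actually allocates):

// Lower-bound host storage for dsyevd with eigenvectors (jobz = V):
//   matrix: 8 * N^2 bytes,  work: 8 * (1 + 6N + 2*N^2) bytes (LAPACK minimum).
#include <stdio.h>

int main( void )
{
    long long sizes[2] = { 90000, 110000 };
    for (int i = 0; i < 2; ++i) {
        double N         = (double) sizes[i];
        double matrix_gb = 8.0 * N * N               / 1e9;
        double work_gb   = 8.0 * (1 + 6*N + 2.0*N*N) / 1e9;
        printf( "N = %6lld: matrix ~ %3.0f GB, work ~ %3.0f GB, total >= %3.0f GB\n",
                sizes[i], matrix_gb, work_gb, matrix_gb + work_gb );
    }
    // N =  90000: matrix ~  65 GB, work ~ 130 GB, total >= 194 GB
    // N = 110000: matrix ~  97 GB, work ~ 194 GB, total >= 290 GB
    return 0;
}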

Stan



Mark Gates

Sep 20, 2021, 11:05:36 AM
to Stanimire Tomov, Tom Carroll, MAGMA User
Hi Tom,

This isn't ready yet, but SLATE will provide a distributed, GPU-accelerated eigenvalue solver. The advantage with SLATE is that it uses a more efficient tiled representation of the matrix, so it should use about half the memory that MAGMA uses, and it can also run across multiple nodes.

Within MAGMA, you may find that the two-stage eigenvalue solver, testing_dsyevdx_2stage, is significantly faster. The "x" version also lets you compute just a subset of the eigenvectors.
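
If you end up calling it from your own code rather than through the tester, the call is roughly the fragment below. Treat the argument order as an assumption and check magma_d.h and testing_dsyevdx_2stage.cpp in your installation; that tester also shows how to size the work arrays.

// Sketch only (not verified against your build): the 1000 smallest eigenpairs
// of a real symmetric n x n matrix on 4 GPUs. A, w, work/lwork, iwork/liwork
// are allocated as for the one-stage driver (see testing_dsyevdx_2stage.cpp).
magma_int_t il = 1, iu = 1000;    // indices of the wanted eigenvalues (ascending)
magma_int_t mout = 0, info = 0;   // mout = number of eigenvalues actually found
double vl = 0, vu = 0;            // only used when range = MagmaRangeV

magma_dsyevdx_2stage_m( /* ngpu */ 4, MagmaVec, MagmaRangeI, MagmaLower, n,
                        A, lda, vl, vu, il, iu, &mout, w,
                        work, lwork, iwork, liwork, &info );
// On exit with info == 0, w[0..mout-1] holds the eigenvalues and the leading
// mout columns of A hold the corresponding eigenvectors.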

Mark

Tom Carroll

Sep 20, 2021, 4:50:14 PM
to MAGMA User, mga...@icl.utk.edu, Tom Carroll, MAGMA User, to...@icl.utk.edu
Hi Stan and Mark,

Thanks for your replies! That is all very helpful. I feel confident moving ahead with the upgrade.

And, yes, I've been using ScaLAPACK for many years so I am looking forward to SLATE -- as soon as it can do eigenvectors!

Cheers,
tom