Partitioning data and allocating specific GPU memory


Hyesun

Jul 20, 2021, 10:16:58 AM
to MAGMA User
Hi,

I am a beginner with MAGMA.

Following the test program "testing_dsyevd.cpp", I can run large N on multiple GPUs. But when the matrix is too big to fit in CPU memory, I can't execute the dsyevd routine.

To work around this, I think partitioning the input data could be a solution. Do you have any example code for this? Or could you tell me how to manage the partitioned data and execute the routine on multiple GPUs?

I can use the functions "magma_dsetmatrix" and "magma_dgetmatrix" to transfer data between CPU and GPU. Can I allocate memory on a specific GPU? After transferring data to the GPUs, is it right to call "magma_dsyevd_m( abs_ngpu, ... )" to execute on multiple GPUs?

Best regards,
Hyesun

Mark Gates

Jul 22, 2021, 2:32:51 PM
to Hyesun, MAGMA User
On Tue, Jul 20, 2021 at 10:17 AM Hyesun <angeli...@gmail.com> wrote:
Following the test program "testing_dsyevd.cpp", I can run large N on multiple GPUs. But when the matrix is too big to fit in CPU memory, I can't execute the dsyevd routine.

Do you mean GPU device memory? If it doesn't fit in CPU host memory, you would need an out-of-core algorithm that saves part of it on disk. There has been research on these, but nothing in MAGMA. Usually we assume the CPU memory is bigger than the combined GPU memory, so we can have a copy of the matrix in CPU memory, as well as in GPU memory.


To work around this, I think partitioning the input data could be a solution. Do you have any example code for this? Or could you tell me how to manage the partitioned data and execute the routine on multiple GPUs?

dsyevd_m takes the matrix in CPU host memory, then internally distributes it to multiple GPUs. It should be able to solve a matrix larger than will fit in one GPU, but the matrix needs to fit in the CPU host memory.

 
I can use the functions "magma_dsetmatrix" and "magma_dgetmatrix" to transfer data between CPU and GPU. Can I allocate memory on a specific GPU? After transferring data to the GPUs, is it right to call "magma_dsyevd_m( abs_ngpu, ... )" to execute on multiple GPUs?

Yes, magma_d{set,get}matrix transfer data between CPU <=> GPU, and you can allocate GPU memory using magma_dmalloc. However, for dsyevd_m, all matrices are passed in CPU host memory, so there is no need to allocate GPU memory or transfer data to the GPU; the routine handles all of that internally.
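In outline, the calling sequence looks something like the following rough, untested sketch (eigenvalues only, no error checking; testing_dsyevd.cpp shows the full sequence):

#include "magma_v2.h"

// Rough sketch: eigenvalues only (no vectors), lower triangle, 8 GPUs.
// The matrix A lives entirely in CPU host memory; dsyevd_m distributes
// it to the GPUs internally.
magma_init();
magma_int_t n = 50000, lda = n, info;
double *A, *w, *work;
magma_int_t *iwork;
magma_dmalloc_cpu( &A, (size_t) lda * n );   // whole matrix, in host memory
magma_dmalloc_cpu( &w, n );                  // eigenvalues

// Workspace query (LAPACK convention: lwork = liwork = -1 returns optimal sizes).
double lwork_opt;
magma_int_t liwork_opt;
magma_dsyevd_m( 8, MagmaNoVec, MagmaLower, n, A, lda, w,
                &lwork_opt, -1, &liwork_opt, -1, &info );
magma_dmalloc_cpu( &work, (size_t) lwork_opt );
magma_imalloc_cpu( &iwork, liwork_opt );

// ... fill A with your data ...
magma_dsyevd_m( 8, MagmaNoVec, MagmaLower, n, A, lda, w,
                work, (magma_int_t) lwork_opt, iwork, liwork_opt, &info );
// Eigenvalues are now in w; check that info == 0.
magma_finalize();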

Mark




--
Innovative Computing Laboratory
University of Tennessee, Knoxville

Hyesun

Jul 29, 2021, 1:07:04 PM
to MAGMA User, mga...@icl.utk.edu, Hyesun
Thanks a lot :)
I wanted to execute the dsyevd_m routine on a 50,000 x 50,000 matrix.
However, it can't allocate memory on the CPU host. That's why I wanted to allocate GPU memory and distribute the input manually.
I don't understand why it returns that error, since I have 512 GB of memory in the DGX server. Could you tell me why?

./testing_dsyevd --ngpu 8 -n 50000
% MAGMA 2.6.0  32-bit magma_int_t, 64-bit pointer.
Compiled with CUDA support for 6.0
% CUDA runtime 11000, driver 11000. OpenMP threads 80. MKL 2021.0.3, MKL threads 40.
% device 0: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 1: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 2: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 3: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 4: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 5: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 6: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 7: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% Tue Jul 27 15:30:28 2021
% Usage: ./testing_dsyevd [options] [-h|--help]

% jobz = No vectors, uplo = Lower, ngpu = 8
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%============================================================================
Error: magma_dmalloc_cpu( &h_A, N*lda )
failed at testing/testing_dsyevd.cpp:151: error -112: cannot allocate memory on CPU host

Best Regards,
Hyesun

On Friday, July 23, 2021 at 3:32:51 AM UTC+9, mga...@icl.utk.edu wrote:

Stanimire Tomov

Jul 29, 2021, 1:16:46 PM
to Hyesun, MAGMA User, mga...@icl.utk.edu
Looks like you have compiled MAGMA for 32-bit integers, so the integer range is too small for matrices that large. You can use make.inc-examples/make.inc.mkl-gcc-ilp64
as a make.inc example of how to compile for 64-bit integers; that should fix the problem.
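For reference, the arithmetic behind the error: the tester allocates N*lda = 50000 * 50000 = 2,500,000,000 elements, which exceeds the 32-bit signed integer maximum of 2,147,483,647, so the product overflows before the allocation is even attempted. The data itself, about 20 GB in double precision, easily fits in your 512 GB. A small standalone illustration (not MAGMA code):

#include <stdint.h>
#include <stdio.h>

int main( void )
{
    int64_t n = 50000;
    int64_t count64 = n * n;              // 2500000000: fine with 64-bit magma_int_t (ILP64)
    int32_t count32 = (int32_t) count64;  // wraps to -1794967296 on typical systems
    printf( "64-bit count: %lld\n", (long long) count64 );
    printf( "32-bit count: %d\n", count32 );
    return 0;
}

The overflowed count makes the allocation request nonsensical, which presumably is what surfaces as error -112 (cannot allocate memory on CPU host).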
Stan


Hyesun

Aug 1, 2021, 11:11:01 PM
to MAGMA User, to...@icl.utk.edu, mga...@icl.utk.edu, Hyesun
Thank you. That solved it on our DGX server.
We have another server with an AMD CPU and NVIDIA GPUs.
On that server, I compiled MAGMA using make.inc-examples/make.inc.openblas.

Hyesun 

On Friday, July 30, 2021 at 2:16:46 AM UTC+9, to...@icl.utk.edu wrote: