dgetrf_gpu and memory problem

nima sahraneshin

Aug 22, 2020, 8:08:12 AM

to MAGMA User

Hi,

I am running MAGMA on a Volta GPU, but I seem to have a problem with "dgetrf_gpu". To narrow it down, I checked the GPU memory before and after these lines of code:

if ( m == n ) {
    // square matrix: transpose in place
    dAT = dA;
    lddat = ldda;
    magmablas_dtranspose_inplace( m, dAT(0,0), lddat, queues[0] );
}
else {
    // rectangular matrix: transpose into a separate N-by-M buffer
    lddat = maxn;  // N-by-M
    if (MAGMA_SUCCESS != magma_dmalloc( &dAT, lddat*maxm )) {
        *info = MAGMA_ERR_DEVICE_ALLOC;
        printf("line 191\n");
        goto cleanup;
    }
    magmablas_dtranspose( m, n, dA(0,0), ldda, dAT(0,0), lddat, queues[0] );
}

magma_queue_sync( queues[0] );  // finish transpose

The result is:

Before:
GPU0 memory: free=29520384, total=33290752
GPU1 memory: free=31187456, total=33290752

After:
GPU0 memory: free=0, total=0
GPU1 memory: free=0, total=0

I am using "cudaMemGetInfo( &free, &total )" for the check. Do you have any idea what is happening here?
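A total of 0 cannot be a genuine reading for a 32 GB device, so one plausible explanation (not confirmed in this thread) is that the cudaMemGetInfo call itself returned an error, for example because an earlier CUDA error had left the context unusable, in which case its outputs cannot be trusted. The sketch below shows the habit of checking the returned status before printing the values. `mem_info_is_valid` is a hypothetical helper, not part of CUDA or MAGMA; it takes the status as a plain int so it compiles without the CUDA toolkit.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a memory reading can be trusted, 0 if not.
   'status' is the value returned by the query (cudaSuccess is 0 for
   cudaMemGetInfo); 'free_bytes' and 'total_bytes' are its outputs. */
int mem_info_is_valid(int status, size_t free_bytes, size_t total_bytes)
{
    if (status != 0)
        return 0;   /* the call failed, so the outputs are meaningless */
    if (total_bytes == 0)
        return 0;   /* no working device reports zero total memory */
    return free_bytes <= total_bytes;
}
```

With the CUDA runtime it would be used as `cudaError_t err = cudaMemGetInfo(&free_b, &total_b);` followed by `mem_info_is_valid((int) err, free_b, total_b)`, printing `cudaGetErrorString(err)` whenever the check fails.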

Ahmad Abdelfattah

Aug 22, 2020, 2:43:09 PM

to nima sahraneshin, MAGMA User

Can you please send us the MAGMA output you are getting when running dgetrf_gpu? What sizes are you testing for? Does the factorization pass or fail?

Thanks,

Ahmad


Nima Sahraneshin

Aug 23, 2020, 10:06:45 AM

to Ahmad Abdelfattah, MAGMA User

Thanks for your reply.

> Can you please send us the MAGMA output you are getting when running dgetrf_gpu?

The last argument, info, is equal to -112.

> What sizes are you testing for?

Sizes from 512 up to larger dimensions. The problem appears in hybrid mode; I think pinned memory cannot be allocated after the matrix has been transposed.

I am not sure, but I think the problem comes from "magmablas_dtranspose_inplace", although I don't understand how the total GPU memory can be 0 after that function.

> Does the factorization pass or fail?

Fail.
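For reference, the square case calls magmablas_dtranspose_inplace to transpose an m-by-m column-major matrix (leading dimension lddat) inside its own buffer. The CPU sketch below, with the hypothetical name `transpose_inplace_colmajor`, computes the same result by swapping each entry above the diagonal with its mirror below; it does not describe how MAGMA's GPU kernel is implemented, but it illustrates that the operation itself needs no extra memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor mirroring MAGMA's dAT(i,j) notation: element (i,j)
   of a column-major matrix with leading dimension lda is a[i + j*lda]. */
#define A_AT(a, lda, i, j) ((a)[(size_t)(i) + (size_t)(j) * (size_t)(lda)])

/* Transposes an n-by-n column-major matrix in place by swapping each entry
   above the diagonal with its mirror below the diagonal. Padding rows
   (n <= row < lda) are never read or written. */
void transpose_inplace_colmajor(int n, double *a, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < j; ++i) {
            double tmp = A_AT(a, lda, i, j);
            A_AT(a, lda, i, j) = A_AT(a, lda, j, i);
            A_AT(a, lda, j, i) = tmp;
        }
    }
}
```

For example, with lda = 3 the 2-by-2 matrix stored as {1, 2, pad, 3, 4, pad} becomes {1, 3, pad, 2, 4, pad}; the padding entries are left untouched.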

Ahmad Abdelfattah

Aug 23, 2020, 10:45:00 AM

to Nima Sahraneshin, MAGMA User

The error code -112 is MAGMA_ERR_HOST_ALLOC (defined in magma_types.h under include/), which means that MAGMA could not allocate memory on the CPU side. So the routine exits without proceeding further, and no factorization is performed.

Can you please copy and paste the full MAGMA output from your terminal? MAGMA usually prints some useful information about the environment it is running on.

Thanks,

Ahmad
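The info value returned by getrf-style routines mixes several kinds of outcome. The sketch below decodes it under two assumptions: the LAPACK convention (info = -i means argument i was invalid; info = k > 0 means U(k,k) is exactly zero), and that MAGMA-specific error codes such as -112 lie at or below -100. `classify_getrf_info` and its enum are hypothetical; MAGMA's own magma_strerror() is the authoritative way to turn a code into a message.

```c
/* Hypothetical categories for the 'info' output of an LU routine. */
enum info_kind {
    INFO_SUCCESS,          /* info == 0 */
    INFO_SINGULAR_PIVOT,   /* info = k > 0: U(k,k) is exactly zero (1-based) */
    INFO_ILLEGAL_ARGUMENT, /* info = -i with -100 < info < 0: argument i invalid */
    INFO_LIBRARY_ERROR     /* info <= -100: assumed MAGMA error code, e.g. -112 */
};

/* Assumption: MAGMA error codes start at -100, as -112 (MAGMA_ERR_HOST_ALLOC) does. */
enum info_kind classify_getrf_info(int info)
{
    if (info == 0)
        return INFO_SUCCESS;
    if (info > 0)
        return INFO_SINGULAR_PIVOT;
    if (info > -100)
        return INFO_ILLEGAL_ARGUMENT;
    return INFO_LIBRARY_ERROR;
}
```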

Nima Sahraneshin

Aug 23, 2020, 11:18:02 AM

to Ahmad Abdelfattah, MAGMA User

Thanks Ahmad.

Is there a debug mode in MAGMA? I am not seeing any more output from MAGMA, but here is the information I have:

CUDA Version 10.0.130
OpenBLAS-0.3.10
gcc (Ubuntu 7.5.0-3ubuntu1~19.10) 7.5.0
Intel(R) Xeon(R) Gold 6126 CPU
Tesla V100
NVIDIA-SMI 450.57

RAM 62GB

But if the problem is on the host side, why am I seeing 0 GPU memory?
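A rough size estimate may help reconcile the two symptoms. Assuming (this is not checked against the MAGMA source here) that the hybrid path requests a pinned host panel buffer of about maxm by nb doubles, with maxm equal to m rounded up to a multiple of 32 and nb the block size MAGMA chooses, the request is tiny compared with 62 GB of RAM. That would suggest the pinned allocation failed for a reason other than running out of RAM; since pinned host memory is allocated through the CUDA runtime, a broken CUDA state could plausibly make both that allocation and cudaMemGetInfo fail. This is a hypothesis, not a diagnosis from the thread; `round_up` and `panel_workspace_bytes` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of y (y > 0). */
static size_t round_up(size_t x, size_t y)
{
    assert(y > 0);
    return ((x + y - 1) / y) * y;
}

/* Rough size in bytes of an assumed maxm-by-nb double-precision panel buffer.
   The layout is an assumption made for this estimate, not taken from MAGMA. */
size_t panel_workspace_bytes(size_t m, size_t nb)
{
    size_t maxm = round_up(m, 32);
    return maxm * nb * sizeof(double);
}
```

For m = 512 and a hypothetical nb = 128, panel_workspace_bytes(512, 128) returns 524288 bytes (512 KiB).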

Ahmad Abdelfattah

Aug 23, 2020, 11:36:30 AM

to Nima Sahraneshin, MAGMA User

Okay, I thought you were using the MAGMA default tester. Can you please run testing_dgetrf_gpu (under testing/) for the same sizes you mentioned? That tester shows the correct setup for calling the dgetrf routine. If it fails as well, please send me the output you are getting.

You can run something like:

./testing_dgetrf_gpu -N 512 --version 1   # hybrid mode (CPU and GPU)

./testing_dgetrf_gpu -N 512 --version 3   # native mode (GPU only)

First, I'm trying to find out whether the error comes from MAGMA's routine or from the code calling it.

Thanks,

Ahmad

Nima Sahraneshin

Aug 23, 2020, 11:46:50 AM

to Ahmad Abdelfattah, MAGMA User

Okay. Here is the output:

./testing_dgetrf_gpu -N 512 --version 1

% MAGMA 2.5.3 compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10000, driver 11000. OpenMP threads 24.
% device 0: Tesla V100-PCIE-32GB, 1380.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 1: Tesla V100S-PCIE-32GB, 1597.0 MHz clock, 32510.5 MiB memory, capability 7.0
% Sun Aug 23 17:40:14 2020
% Usage: ./testing_dgetrf_gpu [options] [-h|--help]

% version 1
% M N CPU Gflop/s (sec) GPU Gflop/s (sec) |PA-LU|/(N*|A|)
%========================================================================
magma_dgetrf_gpu returned error 1: function-specific error, see documentation.

512 512 --- ( --- ) 0.09 ( 0.97) ---

./testing_dgetrf_gpu -N 512 --version 3

% MAGMA 2.5.3 compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10000, driver 11000. OpenMP threads 24.
% device 0: Tesla V100-PCIE-32GB, 1380.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 1: Tesla V100S-PCIE-32GB, 1597.0 MHz clock, 32510.5 MiB memory, capability 7.0
% Sun Aug 23 17:40:49 2020
% Usage: ./testing_dgetrf_gpu [options] [-h|--help]

% version 3
% M N CPU Gflop/s (sec) GPU Gflop/s (sec) |PA-LU|/(N*|A|)
%========================================================================
512 512 --- ( --- ) 0.10 ( 0.89) ---
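The last column of the tester output, |PA-LU|/(N*|A|), is the residual reported when the tester's -c check is enabled; it shows as --- here, presumably because these runs were made without -c. A minimal CPU sketch of that quantity follows, assuming the 1-norm (the tester may use a different norm), LAPACK-style 1-based pivots in ipiv, and L and U packed into one matrix as getrf returns them (the unit diagonal of L is not stored). `lu_residual` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Residual |PA - LU|_1 / (n * |A|_1) for an n-by-n column-major matrix a
   (leading dimension lda) and its packed factors lu (leading dimension ldlu),
   with 1-based pivots ipiv. Returns -1.0 if the scratch allocation fails. */
double lu_residual(int n, const double *a, int lda,
                   const double *lu, int ldlu, const int *ipiv)
{
    assert(n > 0 && lda >= n && ldlu >= n);
    double *pa = malloc((size_t)n * n * sizeof *pa);
    if (pa == NULL)
        return -1.0;

    /* Copy A, then apply the row interchanges in order to form PA. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            pa[i + j * n] = a[i + j * lda];
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;
        if (p != i)
            for (int j = 0; j < n; ++j) {
                double t = pa[i + j * n];
                pa[i + j * n] = pa[p + j * n];
                pa[p + j * n] = t;
            }
    }

    /* The 1-norm is the largest column sum of absolute values. */
    double norm_r = 0.0, norm_a = 0.0;
    for (int j = 0; j < n; ++j) {
        double col_r = 0.0, col_a = 0.0;
        for (int i = 0; i < n; ++i) {
            /* (LU)(i,j) = sum over k <= min(i,j) of L(i,k) U(k,j), with L(i,i) = 1. */
            double s = 0.0;
            int kmax = i < j ? i : j;
            for (int k = 0; k <= kmax; ++k) {
                double l = (k == i) ? 1.0 : lu[i + k * ldlu];
                s += l * lu[k + j * ldlu];
            }
            col_r += fabs(pa[i + j * n] - s);
            col_a += fabs(a[i + j * lda]);
        }
        if (col_r > norm_r) norm_r = col_r;
        if (col_a > norm_a) norm_a = col_a;
    }
    free(pa);
    return norm_a > 0.0 ? norm_r / (n * norm_a) : norm_r;
}
```

A correct factorization gives a value near machine precision; a corrupted factor (for example a wrong U entry) gives a visibly larger value.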

Ahmad Abdelfattah

Aug 23, 2020, 12:03:14 PM

to Nima Sahraneshin, MAGMA User

It looks like the native factorization is working. You can run both versions with the '-c' option to check the factorization.

For the hybrid factorization, a positive error code means that the factorization algorithm has encountered a singularity in the matrix (see the documentation in src/zgetrf_gpu.cpp). I suspect that something is wrong on the CPU side, because the hybrid mode performs the panel factorization on the CPU.

The good news is that you are not getting the -112 error code, so you should compare your code against the default testing code.

I have tested both versions on my side, and both work fine. The default tester also lets you test a full CPU factorization (using OpenBLAS in your case). You can run with the options -l -c to test the CPU factorization and check the result.

Ahmad
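In the LAPACK convention that getrf routines follow, info = k > 0 means the k-th pivot U(k,k) is exactly zero: the factorization runs to completion, but U is singular. The textbook unblocked LU with partial pivoting below shows where such a value comes from; `lu_partial_pivot` is a hypothetical name, and this is a generic CPU sketch, not MAGMA's or OpenBLAS's panel code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Unblocked LU with partial pivoting on an n-by-n column-major matrix a
   (leading dimension lda). On return a holds L (unit diagonal, not stored)
   and U, ipiv holds 1-based pivot rows, and the return value follows the
   LAPACK convention: 0 on success, k > 0 if U(k,k) is exactly zero. */
int lu_partial_pivot(int n, double *a, int lda, int *ipiv)
{
    assert(n >= 0 && lda >= n);
    int info = 0;
    for (int j = 0; j < n; ++j) {
        /* Pick the row with the largest |a(i,j)| on or below the diagonal. */
        int p = j;
        for (int i = j + 1; i < n; ++i)
            if (fabs(a[i + (size_t)j * lda]) > fabs(a[p + (size_t)j * lda]))
                p = i;
        ipiv[j] = p + 1;

        if (a[p + (size_t)j * lda] == 0.0) {
            /* The whole column below the diagonal is zero: record the first
               such pivot; there is nothing to eliminate in this column. */
            if (info == 0)
                info = j + 1;
            continue;
        }
        if (p != j)
            for (int k = 0; k < n; ++k) {
                double t = a[j + (size_t)k * lda];
                a[j + (size_t)k * lda] = a[p + (size_t)k * lda];
                a[p + (size_t)k * lda] = t;
            }
        /* Compute the multipliers, then update the trailing submatrix. */
        double pivot = a[j + (size_t)j * lda];
        for (int i = j + 1; i < n; ++i)
            a[i + (size_t)j * lda] /= pivot;
        for (int k = j + 1; k < n; ++k)
            for (int i = j + 1; i < n; ++i)
                a[i + (size_t)k * lda] -= a[i + (size_t)j * lda] * a[j + (size_t)k * lda];
    }
    return info;
}
```

For the singular matrix [[1, 2], [2, 4]] (stored column-major as {1, 2, 2, 4}) the first elimination step zeroes the second row completely, so the function returns 2, the same kind of positive code that the hybrid run reported as error 1.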
