Problems running diag driver

Otto Kohulák

Nov 24, 2021, 10:37:05 AM
to MAGMA User
Dear MAGMA community,

I am trying to diagonalize a general matrix with the magmaf_dgeev driver. However, I have encountered a strange status: the error status is -113. According to the documentation [1], negative status values should go only as far as the number of arguments of dgeev, which is 13.

I am using gcc 11.2.0 with CUDA 11.5 and MAGMA 2.6.1 (I know about the issues with CUDA 11, but they should affect only sparse algebra, right?)

My sample program is attached. Its output is:


################
 Getting correct size for workspace
 Status:           0
 Allocating:     19800000
 Status:        -113
################

Can you help me please?

Best, 

Otto.


sanity_diag.F90

Stanimire Tomov

Nov 24, 2021, 12:17:05 PM
to Otto Kohulák, MAGMA User
Hi Otto,

I don’t see anything wrong with the program and actually ran it on my system to try to reproduce the problem.
At first I thought the Fortran initialization 
  A_magma = 0
  A_magma(1,2) = 0.5
  A_magma(2,1) = 0.2
may not be good. Would this indeed initialize a 3x3 matrix?

Anyway, even with this initialization, I get
tomov$ ./sanity_diag 
 Getting correct size for workspace
 Status:           0
 Allocating:          198
 Status:           0

I had to change magma_init to magmaf_init and magma_finalize to magmaf_finalize.
If the initialization and the naming do not fix it, I would say there could be 
a problem with the CPU LAPACK that you are using. For a 3x3 matrix the routine
does not actually engage the GPU and calls the CPU LAPACK routine directly.

Another difference is that your output says 
 Allocating:     19800000
Why are there so many zeros after the 198? The lwork for n=3 should be just 198.

Thanks,
Stan 

Otto Kohulák

Nov 24, 2021, 2:49:26 PM
to Stanimire Tomov, MAGMA User
Dear Stanimire,

first of all, thank you for your answer.

  • The initialization of the matrix is incorrect. While writing the e-mail I was reducing the sample code to a minimal size that still reproduces the error, and I accidentally removed the part that puts ones on the diagonal. I started with a small 3x3 problem, where it is easy to check by eye whether the results are correct.
  • The trailing zeros on lwork are an error. Of course, it is 198.
  • In the beginning, I was not aware that MAGMA has a Fortran interface, so I wrote my own, which maps magma_init_ -> magma_init. I have already moved to the native magmaf_* interface.
  • The valuable information for me is that it works on your computer, which means something is broken on my system. I was able to run other routines such as magma(blas)_dgemm, magma_dgetrf, and magma_dgetrs, and they produced correct results, so my installation cannot be completely wrong. However, I don't have the test suite, because I installed MAGMA through Spack and could not find the tests in the Spack install location. I installed it with the CUDA and OpenBLAS backends:

ma...@2.6.1%g...@11.2.0+cuda+fortran~ipo~rocm+shared amdgpu_target=none build_type=RelWithDebInfo cuda_arch=80 arch=linux-debian11-zen
    ^cm...@3.21.4%g...@11.2.0~doc+ncurses+openssl+ownlibs~qt build_type=Release arch=linux-debian11-zen
        ^ncurses@6.2%g...@11.2.0~symlinks+termlib abi=none arch=linux-debian11-zen
            ^pkg...@1.8.0%g...@11.2.0 arch=linux-debian11-zen
        ^ope...@1.1.1l%g...@11.2.0~docs certs=system arch=linux-debian11-zen
            ^pe...@5.34.0%g...@11.2.0+cpanm+shared+threads arch=linux-debian11-zen
                ^berke...@18.1.40%g...@11.2.0+cxx~docs+stl patches=b231fcc4d5cff05e5c3a4814f6a5af0e9a966428dc2176540d2c05aff41de522 arch=linux-debian11-zen
                ^bz...@1.0.8%g...@11.2.0~debug~pic+shared arch=linux-debian11-zen
                    ^diffutils@3.8%g...@11.2.0 arch=linux-debian11-zen
                        ^libi...@1.16%g...@11.2.0 libs=shared,static arch=linux-debian11-zen
                ^gd...@1.19%g...@11.2.0 arch=linux-debian11-zen
                    ^readline@8.1%g...@11.2.0 arch=linux-debian11-zen
                ^zl...@1.2.11%g...@11.2.0+optimize+pic+shared arch=linux-debian11-zen
    ^cu...@11.5.0%g...@11.2.0~dev arch=linux-debian11-zen
        ^lib...@2.9.12%g...@11.2.0~python arch=linux-debian11-zen
            ^x...@5.2.5%g...@11.2.0~pic libs=shared,static arch=linux-debian11-zen
    ^open...@0.3.18%g...@11.2.0~bignuma~consistent_fpcsr~ilp64+locking+pic+shared threads=none arch=linux-debian11-zen

I also tried to install it from source, but I ran into some installation issues. I started with the CMake build system, which didn't work out of the box: it complained about a missing CMake.src file, which I was able to produce with make, but then other problems emerged, with complaints about missing sources for targets. Is there documentation for MAGMA's CMake build system? On the website, I found information about the Makefile and make.inc.

Anyway, I will try to track down the problem on my PC. Maybe I can try MAGMA with MKL.

Thank you again.

Otto

On Wed, Nov 24, 2021 at 18:17, Stanimire Tomov <to...@icl.utk.edu> wrote:

Mark Gates

Jan 7, 2022, 8:56:13 PM
to Otto Kohulák, MAGMA User
Hi Otto,

Just to clarify for the record, the -113 error is a MAGMA allocation error on the GPU device:

magma> grep -- -113 include/*.h
include/magma_types.h:#define MAGMA_ERR_DEVICE_ALLOC     -113     ///< could not malloc GPU device memory

Sorry, these additional MAGMA error codes are not documented in each routine.

Mark
--
Innovative Computing Laboratory
University of Tennessee, Knoxville