Re: large index sgemm MI250X

Ahmad Abdelfattah

Mar 21, 2022, 8:58:02 AM
to Ramakrishnan Kannan, MAGMA User, Lu...@ornl.gov, sa...@ornl.gov, to...@icl.utk.edu
Hi Ramki, 

I think you need to build MAGMA with 64-bit integer support. The tester output shows that MAGMA was compiled with 32-bit integers, which can cause failures when allocating large chunks of memory (e.g., when the number of bytes exceeds the 32-bit int limit).

There should be no limit on the sizes for the SGEMM kernel. 

Ahmad
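
For reference, a back-of-the-envelope check of the two sizes in this thread against the 32-bit limit. This is only a sketch: the helper names (element_count, byte_count, fits_in_int32) are made up for illustration and are not MAGMA functions, and 4-byte floats are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of elements in an m-by-n matrix, computed in 64-bit arithmetic.
inline std::int64_t element_count(std::int64_t m, std::int64_t n) {
    assert(m >= 0 && n >= 0);
    return m * n;
}

// Bytes needed for an m-by-n single-precision matrix (assumes 4-byte float).
inline std::int64_t byte_count(std::int64_t m, std::int64_t n) {
    return element_count(m, n) * static_cast<std::int64_t>(sizeof(float));
}

// True if v can be stored in a signed 32-bit integer (upper bound only,
// since the counts here are never negative).
inline bool fits_in_int32(std::int64_t v) {
    return v <= std::numeric_limits<std::int32_t>::max();
}

// Sizes from the thread (output matrix C is M-by-N):
//   element_count(32768, 32768) == 1073741824   -> fits in int32
//   element_count(49152, 49152) == 2415919104   -> does NOT fit in int32
//   byte_count(32768, 32768)    == 4294967296   -> does NOT fit in int32
```

The byte count of the working 32768 case is already past the 32-bit limit, while the element count only crosses it between 32768 and 49152. That matches the observed threshold, but it is not proof of where the fault actually occurs.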

On Mar 21, 2022, at 8:27 AM, Ramakrishnan Kannan <ramakrish...@gmail.com> wrote:

Dear All,

testing_sgemm works for output matrices up to M=32768, N=32768. When I try larger sizes, I get the following error.

:0:rocdevice.cpp            :2603: 408817871536 us: 44300: [tid:0x7f6b70ba8700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b

HIPBLAS can handle these large matrix sizes for sgemm.

1. What is the maximum output matrix size that MAGMA can handle?
2. How do I run MAGMA with large matrix sizes?

For your reference, I am pasting the entire console output.

Faulty run with large matrix size.
ramki@login2:~/magma/testing>./testing_sgemm -N 49152,49152,1024
% MAGMA 2.6.1  32-bit magma_int_t, 64-bit pointer.
% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Mon Mar 21 08:18:15 2022
% Usage: ./testing_sgemm [options] [-h|--help]

% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.

% transA = No transpose, transB = No transpose
%   M     N     K   MAGMA Gflop/s (ms)  HIP Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  HIP error
%========================================================================================================
:0:rocdevice.cpp            :2603: 408817871536 us: 44300: [tid:0x7f6b70ba8700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b

Aborted (core dumped)

Correct run
ramki@login2:~/magma/testing>./testing_sgemm -N 32768,32768,1024
% MAGMA 2.6.1  32-bit magma_int_t, 64-bit pointer.
% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Mon Mar 21 08:19:16 2022
% Usage: ./testing_sgemm [options] [-h|--help]

% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.

% transA = No transpose, transB = No transpose
%   M     N     K   MAGMA Gflop/s (ms)  HIP Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  HIP error
%========================================================================================================
32768 32768  1024   6330.29 ( 347.38)     347.26 (6332.52)     ---   (  ---  )    1.55e-09        ---    ok

Regards,
Ramki

Ramakrishnan Kannan

Mar 21, 2022, 9:21:59 AM
to MAGMA User, Lu...@ornl.gov, sa...@ornl.gov, ah...@icl.utk.edu, to...@icl.utk.edu

Ramakrishnan Kannan

Mar 21, 2022, 2:49:15 PM
to MAGMA User, Ramakrishnan Kannan, Lu...@ornl.gov, sa...@ornl.gov, ah...@icl.utk.edu, to...@icl.utk.edu
Ahmad,

When I compile with the -DMAGMA_ILP64 flag in make.inc, I get the following error. magma_getdevice is being called with a magma_int_t * argument, which internally translates to long long *. Is there another way to enable 64-bit integers?

control/magma_f77.cpp:61:5: error: no matching function for call to 'magma_getdevice'
    magma_getdevice( dev );
    ^~~~~~~~~~~~~~~
./include/magma_auxiliary.h:230:1: note: candidate function not viable: no known conversion from 'magma_int_t *' (aka 'long long *') to 'magma_device_t *' (aka 'int *') for 1st argument
magma_getdevice( magma_device_t* dev );

Ramakrishnan Kannan

Mar 21, 2022, 3:26:25 PM
to MAGMA User, Ramakrishnan Kannan, Lu...@ornl.gov, sa...@ornl.gov, ah...@icl.utk.edu, to...@icl.utk.edu
We see that in magma_types.h, under HIP, magma_device_t is typedef-ed as "int" instead of magma_int_t. We were able to compile the code with 64-bit integers, but we still get the error with testing_sgemm.

ramki@login2:~/snapshot/magma/testing>./testing_sgemm
% MAGMA 2.6.1 svn 64-bit magma_int_t, 64-bit pointer.

% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Mon Mar 21 14:57:54 2022

% Usage: ./testing_sgemm [options] [-h|--help]

% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.

% transA = No transpose, transB = No transpose
%   M     N     K   MAGMA Gflop/s (ms)  HIP Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  HIP error
%========================================================================================================
 1088  1088  1088   1349.97 (   1.91)       0.37 (6974.16)     ---   (  ---  )        -nan        ---    failed
 2112  2112  2112   7653.14 (   2.46)    29443.48 (   0.64)     ---   (  ---  )        -nan        ---    failed
 3136  3136  3136   8300.59 (   7.43)    31763.40 (   1.94)     ---   (  ---  )        -nan        ---    failed
 4160  4160  4160   8574.45 (  16.79)    30021.22 (   4.80)     ---   (  ---  )        -nan        ---    failed
 5184  5184  5184   8722.12 (  31.94)    40628.95 (   6.86)     ---   (  ---  )        -nan        ---    failed
 6208  6208  6208   8677.33 (  55.14)    41418.80 (  11.55)     ---   (  ---  )        -nan        ---    failed
 7232  7232  7232   8668.83 (  87.27)    39725.61 (  19.04)     ---   (  ---  )        -nan        ---    failed
 8256  8256  8256   8846.19 ( 127.23)    42316.52 (  26.60)     ---   (  ---  )        -nan        ---    failed
 9280  9280  9280   8828.66 ( 181.04)    42433.85 (  37.67)     ---   (  ---  )        -nan        ---    failed
10304 10304 10304   8776.33 ( 249.31)    42661.96 (  51.29)     ---   (  ---  )        -nan        ---    failed
ramki@login2:~/snapshot/magma/testing>./testing_sgemm -N 49152,49152,1024
% MAGMA 2.6.1 svn 64-bit magma_int_t, 64-bit pointer.

% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Mon Mar 21 14:58:41 2022

% Usage: ./testing_sgemm [options] [-h|--help]

% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.

% transA = No transpose, transB = No transpose
%   M     N     K   MAGMA Gflop/s (ms)  HIP Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  HIP error
%========================================================================================================
:0:rocdevice.cpp            :2603: 432845757079 us: 22909: [tid:0x7f5bbae53700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b
Aborted

Ed D'Azevedo

Mar 23, 2022, 12:23:56 AM
to Ramakrishnan Kannan, MAGMA User, Lu, Hao, sa...@ornl.gov, ah...@icl.utk.edu, Stanimire Tomov
Maybe the driver is using 32-bit arithmetic? Perhaps consider using 64-bit variables and static casts for "1D" index calculations?

Just a thought.
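
A minimal sketch of the static-cast idea for a column-major "1D" offset. The function names are hypothetical and this is not MAGMA's kernel code. The second helper shows what a two's-complement 32-bit integer would hold for the same offset; it is derived from the 64-bit value so that the example itself never performs a signed overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Column-major offset of element (i, j) with leading dimension ld.
// Widening j BEFORE the multiply makes the product a 64-bit operation.
inline std::int64_t offset64(int i, int j, int ld) {
    assert(i >= 0 && j >= 0 && ld > 0);
    return static_cast<std::int64_t>(j) * ld + i;
}

// Value a two's-complement 32-bit integer would hold for the same offset.
// Computed from the 64-bit result to avoid undefined signed overflow.
inline std::int64_t offset_if_truncated_to_32bit(int i, int j, int ld) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    const std::int64_t low = offset64(i, j, ld) % two32;  // low 32 bits
    const std::int64_t int32_max = std::numeric_limits<std::int32_t>::max();
    return (low > int32_max) ? low - two32 : low;         // reinterpret as signed
}

// Last element of a 49152 x 49152 matrix with ld = 49152:
//   offset64(49151, 49151, 49152)                     ==  2415919103
//   offset_if_truncated_to_32bit(49151, 49151, 49152) == -1879048193
```

A negative offset like the second value would point outside the allocation, which is the kind of access a memory fault reports. Whether this is what happens inside the failing kernel is an assumption.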



Stanimire Tomov

Mar 23, 2022, 1:08:19 AM
to Ramakrishnan Kannan, MAGMA User, Lu...@ornl.gov, sa...@ornl.gov, ah...@icl.utk.edu
Hi Ramki,

I think that once you change the type, MAGMA itself is fine, but the problem will still appear, and it will come from LAPACK.
If LAPACK is not compiled with 64-bit integers, the norm computation will be wrong, along with the initialization, etc.
(When 64-bit integers are passed by address and the library expects 32-bit integers, the computation goes wrong.)
MKL, for example, is very good about this and always provides LAPACK with 64-bit integers, but I am not sure
about the other vendor libraries; e.g., for libsci I couldn't find a 64-bit integer version on Spock or Crusher.
We can check on that and follow up. Are you using libsci? (If not, which CPU BLAS and LAPACK are you using?)

Thanks,
Stan
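
To illustrate the by-address mismatch described above, here is a small sketch. read_as_int32 is a hypothetical stand-in for a library routine built with 32-bit integers, and the "pivots" array is just example data; neither comes from LAPACK or MAGMA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a routine compiled with 32-bit integers:
// it interprets the memory at `raw` as `count` 32-bit integers.
inline std::vector<std::int32_t> read_as_int32(const void* raw, std::size_t count) {
    std::vector<std::int32_t> out(count);
    std::memcpy(out.data(), raw, count * sizeof(std::int32_t));
    return out;
}

// A caller built with 64-bit integers passes its array by address.
// The callee consumes only the first 16 bytes, i.e. two of the four values.
inline std::vector<std::int32_t> mismatched_call() {
    const std::int64_t pivots[4] = {1, 2, 3, 4};
    assert(sizeof(pivots) == 32);
    return read_as_int32(pivots, 4);
}

// On a little-endian machine such as x86-64 the callee sees {1, 0, 2, 0}.
```

The exact values depend on byte order, but in either case the integer array the library sees differs from the one the caller intended.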

Stanimire Tomov

Mar 23, 2022, 1:10:35 AM
to Ed D'Azevedo, Ramakrishnan Kannan, MAGMA User, Lu, Hao, sa...@ornl.gov, ah...@icl.utk.edu
Ed,
Yes, something like this will help fix the problem if we cannot find
a vendor library with 64-bit integers. SLATE actually always uses
32-bit integers, but computations there are done on tiles (and tiles are
handled separately), so they never compute an index exceeding the
32-bit integer range.
Stan
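
A rough sketch of why tile-local indexing stays within 32 bits. The Tile struct below is hypothetical (it is not SLATE's actual class) and assumes each tile is stored as its own column-major block reached through a 64-bit pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile descriptor: the 64-bit pointer locates the tile,
// so only offsets INSIDE the tile are ever computed with 32-bit ints.
struct Tile {
    float* data;  // address of the tile's first element
    int mb;       // rows in the tile
    int nb;       // columns in the tile
    int ld;       // leading dimension of the tile's storage
};

// Largest column-major offset used inside one tile.
inline int largest_local_offset(const Tile& t) {
    assert(t.mb > 0 && t.nb > 0 && t.ld >= t.mb);
    return (t.nb - 1) * t.ld + (t.mb - 1);
}

// For a 1024 x 1024 tile with ld = 1024 the largest offset is 1048575,
// no matter how large the full matrix is.
```

With 1024 x 1024 tiles, a 49152 x 49152 matrix becomes a 48 x 48 grid of tiles, and every in-tile offset stays far below the 32-bit limit.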

Ramakrishnan Kannan

Mar 23, 2022, 11:05:51 AM
to MAGMA User, to...@icl.utk.edu, MAGMA User, Lu...@ornl.gov, sa...@ornl.gov, ah...@icl.utk.edu, Ramakrishnan Kannan
For CPU BLAS/LAPACK, we are using the OpenBLAS available on Crusher.

The memory is allocated correctly; the failure occurs only in the MAGMA SGEMM call. The same application works fine for large matrices when compiled with HIPBLAS. When I compile MAGMA locally with 64-bit integers, I still see the problem. The following is the error from testing_sgemm compiled with 64-bit integers.

ramki@login2:~/snapshot/magma/testing>./testing_sgemm -N 49152,49152,1024
% MAGMA 2.6.1 svn 64-bit magma_int_t, 64-bit pointer. 
% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Wed Mar 23 09:04:31 2022

% Usage: ./testing_sgemm [options] [-h|--help]

% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.

% transA = No transpose, transB = No transpose
%   M     N     K   MAGMA Gflop/s (ms)  HIP Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  HIP error
%========================================================================================================
:0:rocdevice.cpp            :2603: 584396106830 us: 75101: [tid:0x7f6a653cb700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b
Aborted

ramki@login2:~/snapshot/magma/testing>ldd ./testing_sgemm
        linux-vdso.so.1 (0x00007fffd63d2000)
        libgcc_s.so.1 => /opt/cray/pe/gcc/11.2.0/snos/lib64/libgcc_s.so.1 (0x00007fe815399000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe815179000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fe814e38000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fe814c30000)
        libamdhip64.so.5 => /opt/rocm-5.0.2/lib/libamdhip64.so.5 (0x00007fe813dcd000)
        libmagma.so => /autofs/nccs-svm1_home1/ramki/snapshot/magma/lib/libmagma.so (0x00007fe81087a000)
        libopenblas.so.0 => /usr/lib64/libopenblas.so.0 (0x00007fe80e906000)
        libhipblas.so.0 => /opt/rocm-5.0.2/lib/libhipblas.so.0 (0x00007fe80e6a4000)
        libhipsparse.so.0 => /opt/rocm-5.0.2/lib/libhipsparse.so.0 (0x00007fe80e46e000)
        libstdc++.so.6 => /opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6 (0x00007fe80e05c000)
        libomp.so => /opt/rocm-5.0.0/llvm/bin/../lib/libomp.so (0x00007fe8156ce000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fe80dc87000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe8155b2000)
        libdl.so.2 => /opt/rocm-5.0.2/lib/../../../lib64/libdl.so.2 (0x00007fe80da83000)
        libamd_comgr.so.2 => /opt/rocm-5.0.2/lib64/libamd_comgr.so.2 (0x00007fe80655c000)
        libhsa-runtime64.so.1 => /opt/rocm-5.0.2/lib/libhsa-runtime64.so.1 (0x00007fe806093000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007fe805e87000)
        libgfortran.so.4 => /usr/lib64/libgfortran.so.4 (0x00007fe805ab3000)
        librocsolver.so.0 => /opt/rocm-5.0.2/lib/librocsolver.so.0 (0x00007fe7d1eca000)
        librocblas.so.0 => /opt/rocm-5.0.2/lib/librocblas.so.0 (0x00007fe7c09b8000)
        librocsparse.so.0 => /opt/rocm-5.0.2/lib/librocsparse.so.0 (0x00007fe7a71df000)
        libz.so.1 => /opt/rocm-5.0.2/lib64/../../../lib64/libz.so.1 (0x00007fe7a6fc8000)
        libtinfo.so.6 => /opt/rocm-5.0.2/lib64/../../../lib64/libtinfo.so.6 (0x00007fe7a6d9a000)
        libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007fe7a6b82000)
        libdrm.so.2 => /opt/amdgpu/lib64/libdrm.so.2 (0x00007fe7a696d000)
        libdrm_amdgpu.so.1 => /opt/amdgpu/lib64/libdrm_amdgpu.so.1 (0x00007fe7a6761000)
        libquadmath.so.0 => /opt/cray/pe/gcc/11.2.0/snos/lib64/libquadmath.so.0 (0x00007fe7a651a000)