Dear All,
testing_sgemm runs correctly up to an output matrix of size M=32768, N=32768. When I try larger sizes, I get the following error:
:0:rocdevice.cpp :2603: 408817871536 us: 44300: [tid:0x7f6b70ba8700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b
hipBLAS can handle large matrix sizes for sgemm.
1. What is the maximum output matrix size that MAGMA can handle?
2. How can I run MAGMA with larger matrix sizes?
For your reference, the full console output of both runs is pasted below.
Failing run with the larger matrix size:
ramki@login2:~/magma/testing>./testing_sgemm -N 49152,49152,1024
% MAGMA 2.6.1 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Mon Mar 21 08:18:15 2022
% Usage: ./testing_sgemm [options] [-h|--help]
% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.
% transA = No transpose, transB = No transpose
% M N K MAGMA Gflop/s (ms) HIP Gflop/s (ms) CPU Gflop/s (ms) MAGMA error HIP error
%========================================================================================================
:0:rocdevice.cpp :2603: 408817871536 us: 44300: [tid:0x7f6b70ba8700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b
Aborted (core dumped)
Correct run:
ramki@login2:~/magma/testing>./testing_sgemm -N 32768,32768,1024
% MAGMA 2.6.1 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 50013601, driver 50013601. OpenMP threads 128.
% device 0: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% device 1: , 1700.0 MHz clock, 65520.0 MiB memory, capability 9.0
% Mon Mar 21 08:19:16 2022
% Usage: ./testing_sgemm [options] [-h|--help]
% If running lapack (option --lapack), MAGMA and HIP error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.
% transA = No transpose, transB = No transpose
% M N K MAGMA Gflop/s (ms) HIP Gflop/s (ms) CPU Gflop/s (ms) MAGMA error HIP error
%========================================================================================================
32768 32768 1024 6330.29 ( 347.38) 347.26 (6332.52) --- ( --- ) 1.55e-09 --- ok
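One observation that may be relevant: the banner above reports a 32-bit magma_int_t. The output matrix for M=N=49152 has more elements than a signed 32-bit integer can represent, while M=N=32768 still fits. I have not confirmed that this is the cause of the fault; the short Python sketch below only checks the arithmetic. The helper names are my own and are not part of MAGMA.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def element_count(rows: int, cols: int) -> int:
    """Exact number of elements in a rows x cols matrix (Python ints do not overflow)."""
    return rows * cols


def fits_in_int32(rows: int, cols: int) -> bool:
    """True if the element count is representable as a signed 32-bit integer."""
    return element_count(rows, cols) <= INT32_MAX


def wrapped_int32(value: int) -> int:
    """The value after two's-complement wraparound to 32 bits."""
    return (value + 2**31) % 2**32 - 2**31


for n in (32768, 49152):
    count = element_count(n, n)
    size_gib = count * 4 / 2**30  # single precision: 4 bytes per element
    print(n, count, fits_in_int32(n, n), wrapped_int32(count), size_gib)
```

For n=32768 the count is 1073741824, which fits. For n=49152 the count is 2415919104, which exceeds 2147483647 and would wrap to -1879048192 in a 32-bit index. The 49152 x 49152 output matrix alone needs 9 GiB in single precision, well below the 65520 MiB each device reports, so running out of device memory does not look like the explanation.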
Regards,
Ramki