I'm using MAGMA from Fortran with OpenACC (PGI compiler). So far, only the arrays of matrices live on the device. I have the following wrapper for the MAGMA v2 interface (adapted from the example in the fortran subdirectory of the source distribution):
INTERFACE
   SUBROUTINE magmablas_zgemm_batched( opA, opB, m, n, k, alpha, dptr_A, ldda, dptr_B, lddb, beta, dptr_C, lddc, batchCount, queue ) BIND(C, NAME='magmablas_zgemm_batched')
      USE ISO_C_BINDING
      ! batchCount must be declared here too, since it appears both as a
      ! by-value argument and in the DIMENSION of the pointer arrays
      INTEGER(C_INT), VALUE :: opA, opB, m, n, k, ldda, lddb, lddc, batchCount
      COMPLEX(C_DOUBLE_COMPLEX), VALUE :: alpha, beta
      TYPE(C_PTR), DIMENSION(batchCount) :: dptr_A, dptr_B, dptr_C
      TYPE(C_PTR), VALUE :: queue
   END SUBROUTINE
END INTERFACE
opA = magma_trans_const( 'N' )
opB = magma_trans_const( 'N' )
!$ACC DATA PRESENT(matA, matB, matC)
!$ACC HOST_DATA USE_DEVICE(matA, matB, matC)
DO ibatch = 1, batchCount
   dptr_A(ibatch) = C_LOC( matA(1,1,ibatch) )
   dptr_B(ibatch) = C_LOC( matB(1,1,ibatch) )
   dptr_C(ibatch) = C_LOC( matC(1,1,ibatch) )
END DO
CALL magmablas_zgemm_batched( opA, opB, m, n, k, alpha, dptr_A, ldda, dptr_B, lddb, beta, dptr_C, lddc, batchCount, queue )
!$ACC END HOST_DATA
!$ACC END DATA
With managed memory (-ta=tesla:managed) the code executes successfully.
Without managed memory, running it under cuda-gdb gives CUDA_EXCEPTION_14 (Warp Illegal Address), which suggests that one of the arguments needs to be on the device instead of on the host?
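My guess is that the culprit is the pointer arrays dptr_A/dptr_B/dptr_C themselves: they are built on the host, but I'm assuming the batched routine dereferences them on the device, so the call would need to pass their device copies. Something like this untested sketch is what I had in mind (building the pointer table on the host, pushing it to the device with an UPDATE, then exposing the device addresses through a second HOST_DATA region):

! Untested sketch: stage the host-built pointer arrays on the device
! and hand MAGMA their device addresses rather than the host arrays.
!$ACC DATA PRESENT(matA, matB, matC) CREATE(dptr_A, dptr_B, dptr_C)
!$ACC HOST_DATA USE_DEVICE(matA, matB, matC)
! Inside HOST_DATA, C_LOC returns device addresses of the matrices;
! the assignments below still execute on the host.
DO ibatch = 1, batchCount
   dptr_A(ibatch) = C_LOC( matA(1,1,ibatch) )
   dptr_B(ibatch) = C_LOC( matB(1,1,ibatch) )
   dptr_C(ibatch) = C_LOC( matC(1,1,ibatch) )
END DO
!$ACC END HOST_DATA
! Copy the filled pointer tables to their device mirrors
!$ACC UPDATE DEVICE(dptr_A, dptr_B, dptr_C)
! Pass the device addresses of the pointer tables to MAGMA
!$ACC HOST_DATA USE_DEVICE(dptr_A, dptr_B, dptr_C)
CALL magmablas_zgemm_batched( opA, opB, m, n, k, alpha, dptr_A, ldda, dptr_B, lddb, beta, dptr_C, lddc, batchCount, queue )
!$ACC END HOST_DATA
!$ACC END DATA

Is that the right direction, or am I misreading what the batched interface expects?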
Thanks,
Wileam Y. Phan