Error Handling in MAGMA

Aran Nokan

Mar 7, 2022, 9:04:34 AM
to MAGMA User
Hi,

How can I find the errors in MAGMA? I have edited a function in MAGMA and the compilation process is fine, but during execution I am seeing memory problems.

I am using the below lines of code to check the errors, but it is not enough. I want to detect the function that contains the error.

cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
    printf("CUDA Error: %s\n", cudaGetErrorString(err));
}
 
Now I am trying to use a Macro for that purpose.

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true) {
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort)
            exit(code);
    }
}

But it seems that the data type is not the same as the MAGMA function data type.

gpuErrchk(MAGMA_FUNC());

error: invalid conversion from ‘magma_int_t’ {aka ‘int’} to ‘cudaError_t’ {aka ‘cudaError’} [-fpermissive]
 
Could you please give me some ideas? Or is there any documentation on MAGMA error handling?

Best regards,
Aran

Simplice

Mar 7, 2022, 9:55:47 AM
to MAGMA User, noka...@gmail.com
Hi Aran,

It seems that you are mixing CUDA and MAGMA error types. Most MAGMA routines return an error code of type magma_int_t.

I use the code below to check errors:

magma_int_t err = magma_func();
if (err != MAGMA_SUCCESS) {
    printf("ERROR: MAGMA check failed, error: %d\n", err);
    exit(1);
}

Regards,
Simplice

Aran Nokan

Mar 7, 2022, 10:59:00 AM
to Simplice, MAGMA User
Thanks Simplice,

So what would you do for a function that returns void?
e.g.

void magmablas_strsm(
    magma_side_t side, magma_uplo_t uplo, magma_trans_t transA, magma_diag_t diag,
    magma_int_t m, magma_int_t n,
    float alpha,
    magmaFloat_const_ptr dA, magma_int_t ldda,
    magmaFloat_ptr       dB, magma_int_t lddb,
    magma_queue_t queue )

Simplice

Mar 7, 2022, 12:33:21 PM
to MAGMA User, noka...@gmail.com, MAGMA User

MAGMA doesn't return error codes from all of its functions; maybe the MAGMA team can give you more details about that. But errors are checked internally in each function, and the routine magma_xerbla( __func__, info ) is always called in case of an error.

It is therefore important to check the return values of the utility functions (magma_init, magma_malloc, magma_dmalloc, ...) before calling the desired MAGMA or magmablas routines.

Regards,
Simplice

Stanimire Tomov

Mar 7, 2022, 2:10:44 PM
to Simplice, MAGMA User, noka...@gmail.com
Yes, I confirm what Simplice said about the MAGMA errors.
A further detail is that MAGMA has a macro (see interface_cuda/error.h)

#define check_error( err ) \
        magma_xerror( err, __func__, __FILE__, __LINE__ )

that can be called like this

cudaError_t err = cuda_func();
check_error( err );

or, if it is a MAGMA function, as Simplice was saying,

magma_int_t err = magma_func();
check_error( err );

Note that we have overloaded the check, so it works based on the type of err: either
CUDA or MAGMA, etc.

To have this in effect, make sure your make.inc does not have the -DNDEBUG option (the checks are compiled out when NDEBUG is defined).
We have put the check in a few places like memory allocations, streams, etc.
If you want checks in your own files that you build against MAGMA,
you can add more of them, e.g., after every function call.

If you want to modify your macro, probably you can also overload gpuAssert, e.g.,
inline void gpuAssert(magma_int_t code, const char *file, int line, bool abort=true)
{
   check_error( code );
}

Stan
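
To make the overloading idea concrete, here is a minimal sketch of a gpuErrchk macro that accepts both CUDA and MAGMA return types. It is not MAGMA's own implementation. The enum and typedef at the top are stand-ins, an assumption made only so the sketch compiles without CUDA or MAGMA installed; in a real build you would include the real headers instead and print cudaGetErrorString(code) or magma_strerror(code). Returning bool instead of void is also a choice made for this sketch.

```cpp
#include <cstdio>
#include <cstdlib>

// Stand-ins so this sketch compiles without CUDA or MAGMA installed.
// In a real build, delete this section and include <cuda_runtime.h>
// and "magma_v2.h" instead.
enum cudaError_t { cudaSuccess = 0, cudaErrorMemoryAllocation = 2 };
typedef int magma_int_t;
const magma_int_t MAGMA_SUCCESS = 0;
const magma_int_t MAGMA_ERR_DEVICE_ALLOC = -113;

// Overload for CUDA runtime calls. Returns true on success.
inline bool gpuAssert(cudaError_t code, const char *file, int line, bool abort = true) {
    if (code == cudaSuccess) return true;
    // With the real headers, cudaGetErrorString(code) gives a readable message.
    std::fprintf(stderr, "CUDA error %d at %s:%d\n", static_cast<int>(code), file, line);
    if (abort) std::exit(EXIT_FAILURE);
    return false;
}

// Overload for MAGMA routines that return magma_int_t. Returns true on success.
inline bool gpuAssert(magma_int_t code, const char *file, int line, bool abort = true) {
    if (code == MAGMA_SUCCESS) return true;
    // With the real headers, magma_strerror(code) gives a readable message.
    std::fprintf(stderr, "MAGMA error %d at %s:%d\n", static_cast<int>(code), file, line);
    if (abort) std::exit(EXIT_FAILURE);
    return false;
}

// The compiler picks the overload from the type of the argument.
#define gpuErrchk(ans) gpuAssert((ans), __FILE__, __LINE__)
```

With this in place, gpuErrchk(MAGMA_FUNC()) resolves to the magma_int_t overload and no longer triggers the invalid-conversion error, while gpuErrchk on a CUDA runtime call still resolves to the cudaError_t overload. For a routine that returns void, one option is to call it and then check gpuErrchk(cudaGetLastError()).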



