MAGMA - CUBLAS error: invalid value (7) after calling magma_dsetmatrix

henry wasker

Oct 14, 2021, 10:13:26 AM
to MAGMA User
Hello !

I have the following function, which inverts a matrix:

    void matrix_inverse_magma(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    
      // Index for loop and arrays
      int i, j, ip, idx;
    
      // Start magma part
      magma_init (); // initialize Magma
      magma_queue_t queue=NULL;
      magma_int_t dev=0;
      magma_queue_create(dev ,&queue );
      double gpu_time , *dwork; // dwork - workspace
      magma_int_t ldwork; // size of dwork
      magma_int_t *piv, info; // piv - array of indices of interchanged rows
      magma_int_t m = F_matrix.size(); // a - mxm matrix
      cout << "m = " << m << endl;
      printf("m = %d\n", m);
      magma_int_t mm=m*m; // size of a, r, c
      double *a; // a- mxm matrix on the host
      double *d_a; // d_a - mxm matrix a on the device
      double *d_r; // d_r - mxm matrix r on the device
      double *d_c; // d_c - mxm matrix c on the device
     
      magma_int_t ione = 1;
      magma_int_t ISEED [4] = { 0,0,0,1 }; // seed
      magma_int_t err;
      const double alpha = 1.0; // alpha =1
      const double beta = 0.0; // beta=0
      ldwork = m * magma_get_dgetri_nb( m ); // optimal block size
      // allocate matrices
      err = magma_dmalloc_cpu( &a , mm ); // host memory for a
      err = magma_dmalloc( &d_a , mm ); // device memory for a
      err = magma_dmalloc( &d_r , mm ); // device memory for r
      err = magma_dmalloc( &d_c , mm ); // device memory for c
      err = magma_dmalloc( &dwork , ldwork );// dev. mem. for ldwork
      piv=( magma_int_t *) malloc(m*sizeof(magma_int_t ));// host mem.
    
      // Convert matrix to *a double pointer
      for (i = 0; i<m; i++){
        for (j = 0; j<m; j++){
          idx = i*m + j;
          a[idx] = F_matrix[i][j];
        }
      }
    
     
      // TEST working : generate random matrix a // for piv
      //lapackf77_dlarnv (&ione ,ISEED ,&mm ,a); // randomize a
    
    
      magma_dsetmatrix( m, m, a, m, d_a, m, queue); // copy a -> d_a
      magmablas_dlacpy(MagmaFull, m, m, d_a , m, d_r, m, queue);//d_a ->d_r
    
      // find the inverse matrix: d_a*X=I using the LU factorization
      // with partial pivoting and row interchanges computed by
      // magma_dgetrf_gpu; row i is interchanged with row piv(i);
      // d_a -mxm matrix; d_a is overwritten by the inverse
    
      gpu_time = magma_sync_wtime(NULL);
      magma_dgetrf_gpu( m, m, d_a, m, piv, &info);
      magma_dgetri_gpu(m, d_a, m, piv, dwork, ldwork, &info);
      gpu_time = magma_sync_wtime(NULL)-gpu_time;
    
      magma_dgemm(MagmaNoTrans ,MagmaNoTrans ,m, m, m, alpha, d_a ,m,
      d_r ,m,beta ,d_c ,m,queue); // multiply a^-1*a
      printf("magma_dgetrf_gpu + magma_dgetri_gpu time: %7.5f sec.\
      \n",gpu_time );
      magma_dgetmatrix( m, m, d_c , m, a, m, queue); // copy d_c ->a
      printf("upper left corner of a^-1*a:\n");
      magma_dprint( 4, 4, a, m ); // part of a^-1*a
    
      // Save Final matrix
      for (i = 0; i<m; i++){
        for (j = 0; j<m; j++){
          idx = i*m + j;
          F_output[i][j] = a[idx];
        }
      }
    
      magma_free_cpu(a); // free host memory allocated with magma_dmalloc_cpu
      free(piv); // free host memory
      magma_free(d_a); // free device memory
      magma_free(d_r); // free device memory
      magma_free(d_c); // free device memory
      magma_free(dwork); // free device workspace
      magma_queue_destroy(queue); // destroy queue
      magma_finalize ();
      // End magma part
    
    }

At run time, I get an error at the following call (project_magma.cpp:192):

    magma_dsetmatrix( m, m, a, m, d_a, m, queue); // copy a -> d_a

**Error:**

    CUBLAS error: invalid value (7) in matrix_inverse_magma at project_magma.cpp:192
    On entry to magmablas_dlacpy, parameter 5 had an illegal value (info = -5)
    On entry to magma_dgetrf_gpu_expert, parameter 4 had an illegal value (info = -4)
    On entry to magma_dgetri_gpu, parameter 3 had an illegal value (info = -3)
     ** On entry to DGEMM  parameter number 8 had an illegal value
    magma_dgetrf_gpu + magma_dgetri_gpu time: 0.00001 sec.
    CUBLAS error: invalid value (7) in matrix_inverse_magma at project_magma.cpp:209
    upper left corner of a^-1*a:
    On entry to magma_dprint, parameter 4 had an illegal value (info = -4)

**Important remark:** If I fill the `a` array with random values using:

    lapackf77_dlarnv (&ione ,ISEED ,&mm ,a); // randomize a

the code works fine and the correct inverse is computed.

If instead I fill the `a` array from `F_matrix`:

      // Convert matrix to *a double pointer
      for (i = 0; i<m; i++){
        for (j = 0; j<m; j++){
          idx = i*m + j;
          a[idx] = F_matrix[i][j];
        }
      }

Then I get the errors shown above.

I don't understand where this error comes from, since the version that uses the LAPACKE routines gives me no problems.

How can I fix this error?

Stanimire Tomov

Oct 14, 2021, 10:37:36 AM
to henry wasker, MAGMA User
Hi,
The code looks good.
It sounds like accessing F_matrix is what leads to these problems.
Did you check that each F_matrix[i] has at least m elements?
Also, what happens if you put a random value here:

      // Convert matrix to *a double pointer
      for (i = 0; i<m; i++){
        for (j = 0; j<m; j++){
          idx = i*m + j;
          a[idx] = rand() / (double)(RAND_MAX);
        }
      }

If this works, then F_matrix is most probably the source of the problem.
Stan
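
The size check suggested above can be sketched as a small helper. This is only an illustration: `is_square_nonempty` is a hypothetical name, not a MAGMA function, and it is stricter than the suggestion (it requires each row to have exactly m entries and also rejects m = 0).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MAGMA): returns true only if F has
// m > 0 rows and every row has exactly m entries.
bool is_square_nonempty(const std::vector<std::vector<double>>& F) {
    const std::size_t m = F.size();
    if (m == 0) {
        return false;  // empty input: m = 0, so lda = m would be 0 downstream
    }
    for (const auto& row : F) {
        if (row.size() != m) {
            return false;  // ragged or non-square input
        }
    }
    return true;
}
```

Calling it at the top of the inversion function and printing a message when it returns false would show directly whether the input matrix is the culprit.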



henry wasker

Oct 14, 2021, 11:28:05 AM
to MAGMA User, to...@icl.utk.edu, MAGMA User, henry wasker
Thanks for your quick answer.

What is difficult to understand is that the function matrix_inverse_lapack below works fine. It is the Intel oneAPI version
of matrix_inverse_magma, which I was using before porting to MAGMA. It uses the LAPACKE functions
"LAPACKE_dgetrf" and "LAPACKE_dgetri". Maybe I compiled MAGMA incorrectly...

    // Passing matrices by reference
    void matrix_inverse_lapack(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {

      // Index for loop and arrays
      int i, j, ip, idx;

      // Size of F_matrix
      int N = F_matrix.size();

      int *IPIV = new int[N];

      // Statement of main array to inverse
      double *arr = new double[N*N];

      // Statement of returned matrix of size N
      //vector<vector<double> > F_final(N, vector<double>(N));

      // Output Diagonal block
      double *diag = new double[N];

      for (i = 0; i<N; i++){
        for (j = 0; j<N; j++){
          idx = i*N + j;
          arr[idx] = F_matrix[i][j];
        }
      }

      // LAPACKE routines
      int info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);
      int info2 = LAPACKE_dgetri(LAPACK_ROW_MAJOR, N, arr, N, IPIV);

      // Output
      for (i = 0; i<N; i++){
        for (j = 0; j<N; j++){
          idx = i*N + j;
          F_output[i][j] = arr[idx];
        }
      }

      delete[] IPIV;
      delete[] arr;
      delete[] diag; // was allocated above but never released
    }

This is very puzzling: it seems I have done everything correctly, but some detail is breaking things.

Any help is welcome, Regards
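
A side note on storage order, since the LAPACKE calls above pass LAPACK_ROW_MAJOR while MAGMA routines follow the column-major LAPACK convention: filling `a[i*m + j]` hands MAGMA the transpose of the matrix. For a plain inversion this is harmless, because the inverse of the transpose is the transpose of the inverse, and copying back with the same indexing undoes it. An explicit column-major copy makes the intent clearer. A minimal sketch, assuming a hypothetical helper name `to_column_major`:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a square matrix into a column-major buffer with
// leading dimension lda = max(1, m), so element (i, j) lives at a[i + j*lda].
std::vector<double> to_column_major(const std::vector<std::vector<double>>& F) {
    const std::size_t m = F.size();
    const std::size_t lda = std::max<std::size_t>(1, m);
    std::vector<double> a(lda * m, 0.0);
    for (std::size_t j = 0; j < m; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // at() throws std::out_of_range if row i has fewer than m entries
            a[i + j * lda] = F[i].at(j);
        }
    }
    return a;
}
```

For an empty input the returned buffer is empty, which makes the m = 0 case easy to detect before any library call.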

henry wasker

Oct 14, 2021, 3:41:53 PM
to MAGMA User, henry wasker, to...@icl.utk.edu, MAGMA User
Problem solved. It came from an extra, unnecessary copy from GPU to CPU.
It is enough to do:

      magma_dgetmatrix( m, m, d_a , m, a, m, queue); // copy d_a ->a

and remove the multiplication a^-1*a:

      magma_dgemm(MagmaNoTrans ,MagmaNoTrans ,m, m, m, alpha, d_a ,m,
      d_r ,m,beta ,d_c ,m,queue); // multiply a^-1*a
      printf("magma_dgetrf_gpu + magma_dgetri_gpu time: %7.5f sec.\
      \n",gpu_time );

Thanks all !

henry wasker

Oct 14, 2021, 3:42:03 PM
to MAGMA User, henry wasker, to...@icl.utk.edu, MAGMA User
I think I am on the right track.

If I print the matrix size in the oneAPI LAPACKE version of the function, "matrix_inverse_lapack":

    // Passing matrices by reference
    void matrix_inverse_lapack(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {

      // Index for loop and arrays
      int i, j, ip, idx;

      // Size of F_matrix
      int N = F_matrix.size();
      cout << "m = " << N << endl;
      ...
      ...
    }

Then, at run time, when "m = 0", I get no warnings from the LAPACKE routines "LAPACKE_dgetrf" and "LAPACKE_dgetri".
This would mean that no inversion is performed, that is, the LAPACKE routines return without doing anything.

With MAGMA the behavior seems different: when "m = 0", the inversion appears to be attempted anyway, which produces
these error messages at run time:

    m = 0
    CUBLAS error: invalid value (7) in matrix_inverse_magma at XSAF_C_magma.cpp:193

    On entry to magmablas_dlacpy, parameter 5 had an illegal value (info = -5)
    On entry to magma_dgetrf_gpu_expert, parameter 4 had an illegal value (info = -4)
    On entry to magma_dgetri_gpu, parameter 3 had an illegal value (info = -3)
     ** On entry to DGEMM  parameter number 8 had an illegal value
    magma_dgetrf_gpu + magma_dgetri_gpu time: 0.00001 sec.
    CUBLAS error: invalid value (7) in matrix_inverse_magma at XSAF_C_magma.cpp:210

    upper left corner of a^-1*a:
    On entry to magma_dprint, parameter 4 had an illegal value (info = -4)

whereas for the following matrix I have no problem, since "m" is nonzero:

    Computing high l's covariance matrix
    m = 1188
    m = 1188
    magma_dgetrf_gpu + magma_dgetri_gpu time: 0.02796 sec.

    upper left corner of a^-1*a:
    [
       1.0000   0.0000   0.0000  -0.0000
       0.0000   1.0000  -0.0000   0.0000
      -0.0000   0.0000   1.0000   0.0000
      -0.0000   0.0000  -0.0000   1.0000
    ];

Is there a way to reproduce the LAPACKE behavior for "m = 0" with the MAGMA functions "magma_dgetrf_gpu" and "magma_dgetri_gpu"?

PS: by the way, it is strange that the LAPACKE inversion functions don't generate error messages when "m = 0". I have to dig deeper in the debugging.

Regards.
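
One way to get the same "do nothing" behavior for m = 0 with any backend is to return before any allocation or library call. The following is a minimal sketch, not a MAGMA feature: `invert_if_nonempty` and the `Matrix` alias are hypothetical names, and `backend` stands for a routine with the signature used in this thread, such as matrix_inverse_magma.

```cpp
#include <functional>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical wrapper: calls `backend` only when the input has at least one row.
// Returns true if the backend was called, false if the input was empty.
bool invert_if_nonempty(const Matrix& F, Matrix& F_output,
                        const std::function<void(const Matrix&, Matrix&)>& backend) {
    if (F.empty()) {
        // Nothing to invert: leave F_output untouched and skip every
        // allocation, copy and GPU call that would see m = 0.
        return false;
    }
    backend(F, F_output);
    return true;
}
```

Usage would look like `invert_if_nonempty(F_matrix, F_output, matrix_inverse_magma);`. The same guard could equally be written as an `if (m == 0) return;` at the top of the function, before `magma_init`.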

On Thursday, October 14, 2021 at 17:28:05 UTC+2, henry wasker wrote:

Stanimire Tomov

Oct 14, 2021, 3:59:15 PM
to MAGMA User, henry....@gmail.com, Stanimire Tomov, MAGMA User
It's good that you found the problem and are moving ahead!
Regarding size 0, I checked the code: we explicitly check at the beginning, as LAPACK does, whether n == 0 and return. See, e.g., lines 104 to 106 in dgetri_gpu.cpp:

    /* Quick return if possible */
    if ( n == 0 )
        return *info;

If there is a problem, it is probably due to memory corruption somewhere else.
Also, MAGMA has testers in the testing directory for almost all routines. If I test, for example, dgetri_gpu with n=0, I get:

    Stans-MacBook-Pro:testing tomov$ ./testing_dgetri_gpu -n 0
    % MAGMA 2.6.1 svn 32-bit magma_int_t, 64-bit pointer.
    Compiled with CUDA support for 3.0
    % CUDA runtime 7000, driver 7050. MAGMA not compiled with OpenMP.
    % device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
    % Thu Oct 14 15:56:15 2021
    % Usage: ./testing_dgetri_gpu [options] [-h|--help]

    %   N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||I - A*A^{-1}||_1 / (N*cond(A))
    %===============================================================================
    ** On entry to DGETRI, parameter number  3 had an illegal value
    lapackf77_dgetri returned error -3: invalid argument.

    On entry to magma_dgetrf_gpu_expert, parameter 4 had an illegal value (info = -4)
    magma_dgetrf_gpu returned error -4: invalid argument.

    On entry to magma_dgetri_gpu, parameter 3 had an illegal value (info = -3)
    magma_dgetri_gpu returned error -3: invalid argument.
        0     ---   (  ---  )      0.00 (   0.00)

What happened here, I think, is that we set lda to n, whereas the documentation requires lda >= max(1, n).
So the errors reported are for lda, and the argument checks evidently run before the n == 0 quick return. LAPACK dgetri detects this, as does magma_dgetri_gpu.

Stan
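
The lda rule above can be illustrated with a small stand-alone check. This is only a sketch that mirrors the documented conditions on N and LDA for dgetri (the workspace checks are left out). The names `check_getri_dims` and `min_leading_dim` are hypothetical and are not part of LAPACK or MAGMA.

```cpp
#include <algorithm>

// Illustrative mirror of the documented N and LDA checks for dgetri.
// Returns 0 if the arguments are acceptable, otherwise -k, where k is the
// 1-based position of the first offending parameter in dgetri(n, A, lda, ...).
int check_getri_dims(int n, int lda) {
    if (n < 0) {
        return -1;  // parameter 1 (n) is illegal
    }
    if (lda < std::max(1, n)) {
        return -3;  // parameter 3 (lda) is illegal, e.g. n = 0 with lda = 0
    }
    return 0;
}

// Smallest leading dimension that is valid for an n-by-n matrix.
int min_leading_dim(int n) {
    return std::max(1, n);
}
```

With n = 0 and lda = n the check yields -3, matching the messages in the thread. Passing `min_leading_dim(n)` as the leading dimension avoids the error, after which the n == 0 quick return applies.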