Hi all,
I am trying to call getri for matrix inversion in MAGMA. I know that explicit matrix inversion is not recommended, but there is a formulation in EM analysis where we cannot really avoid it. :) In my code, I create a queue, copy the matrix to the device's memory, and then call getrf and getri as the testing code does. This is how I make the calls:
magma_queue_t queue = nullptr;
magma_int_t dev;
magmaFloat_ptr dA, dwork;
magma_int_t ldda = magma_roundup(nCells, 32);
magma_int_t info, *ipiv;
// Single-precision workspace size, to match the sgetri call below.
magma_int_t ldwork = nCells * magma_get_sgetri_nb(nCells);
// Create a device queue for the existing GPU.
magma_getdevice(&dev);
magma_queue_create(dev, &queue);
magma_smalloc(&dA, nc);
magma_smalloc(&dwork, ldwork);
magma_imalloc_cpu(&ipiv, nCells);
magma_ssetmatrix(nCells, nCells, MKL_invRL, nCells, dA, ldda, queue);
magma_sgetrf_gpu(nCells, nCells, dA, ldda, ipiv, &info);
magma_sgetri_gpu(nCells, dA, ldda, ipiv, dwork, ldwork, &info);
magma_sgetmatrix(nCells, nCells, dA, ldda, MKL_invRL, nCells, queue);
if (info != 0) {
    printf("magma_dgetrf_gpu returned error %lld: %s.\n", (long long)info, magma_strerror(info));
}
magma_free(dA);
magma_free(dwork);
magma_free_cpu(ipiv);
// Release the queue created above.
magma_queue_destroy(queue);
Now when I execute my code, I get the following error:
magma_dgetrf_gpu returned error 1: function-specific error, see documentation.
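As a reference point for reading this message: the getrf and getri routines follow the LAPACK convention for info, where 0 means success, a negative value -i flags the i-th argument, and a positive value i means U(i,i) is exactly zero. The helper below is a hypothetical, host-only sketch of that convention (describe_lapack_info is not a MAGMA function, and MAGMA's own library-specific negative error codes are not distinguished here):

```cpp
#include <string>

// Hypothetical helper (not part of MAGMA): decode a LAPACK-style info value
// as returned by getrf/getri.
//   info == 0 : success
//   info <  0 : argument number -info had an illegal value
//   info >  0 : U(info,info) is exactly zero, so the matrix is singular
inline std::string describe_lapack_info(long long info) {
    if (info == 0) {
        return "success";
    }
    if (info < 0) {
        return "argument " + std::to_string(-info) + " had an illegal value";
    }
    const std::string i = std::to_string(info);
    return "U(" + i + "," + i + ") is exactly zero; the matrix is singular";
}
```

Under this convention, an info value of 1 would point at a zero in the first diagonal entry of U, which is also what one might see if the device matrix does not contain the expected data.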
Moreover, when I only set the matrix on the device, read it back, and compare it with the original, the comparison fails, which suggests that the data is not being transferred to the GPU correctly. I may have made a mistake in creating the queue or somewhere else.
Could you please help me with this?
My other question: assuming that my code works, it will only use device #0. How can I use all GPU devices when running my code on a cluster with several GPUs?
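The round-trip check described above can be sketched on the host alone, which helps separate indexing mistakes from GPU transfer problems. This is a minimal sketch, not MAGMA code: round_up, copy_matrix, and round_trip_ok are hypothetical names, copy_matrix only stands in for a set/get matrix call, and the matrices are assumed to be column-major. The point it illustrates is that a buffer with padded leading dimension ldda must hold ldda * n elements, not n * n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `multiple` (same idea as magma_roundup).
inline int round_up(int n, int multiple) {
    return ((n + multiple - 1) / multiple) * multiple;
}

// Copy an m-by-n column-major matrix from src (leading dimension ld_src)
// to dst (leading dimension ld_dst). Host-only stand-in for a set/get call.
inline void copy_matrix(int m, int n,
                        const float* src, int ld_src,
                        float* dst, int ld_dst) {
    assert(ld_src >= m && ld_dst >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            dst[i + static_cast<std::size_t>(j) * ld_dst] =
                src[i + static_cast<std::size_t>(j) * ld_src];
        }
    }
}

// Returns true if an n-by-n matrix survives the trip
// host (lda = n) -> padded buffer (ldda = round_up(n, 32)) -> host.
inline bool round_trip_ok(int n) {
    const int ldda = round_up(n, 32);
    const std::size_t count = static_cast<std::size_t>(n) * n;

    std::vector<float> host(count);
    std::vector<float> back(count, -1.0f);
    // The padded buffer needs ldda * n elements, not n * n.
    std::vector<float> padded(static_cast<std::size_t>(ldda) * n, 0.0f);

    for (std::size_t k = 0; k < count; ++k) {
        host[k] = static_cast<float>(k);
    }

    copy_matrix(n, n, host.data(), n, padded.data(), ldda);
    copy_matrix(n, n, padded.data(), ldda, back.data(), n);
    return host == back;
}
```

Calling round_trip_ok(100) exercises the case ldda = 128 > n = 100. If the same check passes on the host but fails on the device, the allocation size and the leading dimensions passed to the transfer calls are the first things to compare.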
Regards,
Dan