dgetrf_batched and dgeqrf_batched device memory issue at higher batch sizes

James Neuhaus

Aug 28, 2020, 10:57:50 AM
to MAGMA User

Hello,

I am trying to write PGI Fortran interfaces for dgetrf_batched and dgeqrf_batched, ultimately to run on Summit. On both my local machine and Summit, my test code generates the following error when the batch size exceeds a certain number (e.g. 128 for dgeqrf):

line 57: cudaLaunchKernel returned status 700: an illegal memory access was encountered
Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

From cuda-gdb:

warning: Cuda API error detected: cudaMemset returned (0x1)


CUDA Exception: Warp Illegal Address

Thread 1 "dqmc" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 10, block (0,0,240), thread (0,0,0), device 0, sm 0, warp 6, lane 0]
0x000000001966b9f0 in dlaset_full_kernel_batched(int, int, double, double, double**, int)<<<(1,1,1000),(64,1,1)>>> ()


This is consistent across PGI 20.1 and the PGI 19 compilers on Summit with MAGMA 2.5.1, as well as my local machine (which is running a V100) using PGI 19.10 and MAGMA 2.5.3, which leads me to believe the issue is with my interface code.

Here's the relevant code for dgeqrf_batched:

   interface
      subroutine magma_dgeqrf_batchedF( &
         m, n, dA_array, lda, dtau_array, info_array, batchcount, queue) &
         bind(C, name="magma_dgeqrf_batched")
         use iso_c_binding
         integer(c_int), value  :: m, n, lda, batchcount
         type(c_ptr), value  :: dA_array    !! double_complex**
         type(c_ptr), value  :: dtau_array  !! int**
         type(c_ptr), value  :: info_array  !! int*
         type(c_ptr), value  :: queue
      end subroutine
   end interface

  subroutine magma2_dgeqrf_batched( batch_size, m,n, A, ld1, Tau,info)
      use iso_c_binding
      use magma2_common
      implicit none
      integer  :: m,n, ld1, batch_size, i, istat
      integer(c_int) :: cm, cn, cld1, cbatch_size
      double precision, target  :: A(0:ld1-1,0: n-1, 1:batch_size)
      double precision, target  :: Tau(0:m-1,1:batch_size)
      integer, intent(out) :: info(1:batch_size)
      type(c_ptr), dimension(1:batch_size) :: dptr_A, dptr_Tau
      type(c_ptr) :: iptr_info

      cm = m
      cn = n
      cld1 = ld1
      cbatch_size = batch_size

      !$acc data present(Tau,A,info) pcreate(dptr_A,dptr_Tau,iptr_info)
      !$acc host_data use_device(A,Tau,info)
      !$acc kernels loop private(i)
      do i = 1, batch_size
        dptr_A(i) = c_loc(A(LBOUND(A,1),LBOUND(A,2), i))
        dptr_Tau(i) = c_loc(Tau(LBOUND(Tau,1),i))
      enddo
      !$acc end kernels
      !$acc serial
      iptr_info = c_loc(info(0))
      !$acc end serial
      !$acc end host_data
      !$acc host_data use_device(A,Tau,info,dptr_A,dptr_Tau,iptr_info)
      call magma_dgeqrf_batchedF(cm,cn,c_loc(dptr_A),cld1,c_loc(dptr_tau),c_loc(iptr_info),cbatch_size,magma_queue)
      call magma_queue_sync(magma_queue)
      !$acc end host_data
      !$acc end data
   end subroutine magma2_dgeqrf_batched


Any help would be greatly appreciated.

Thanks,
James

James Neuhaus

Aug 29, 2020, 2:18:18 PM
to MAGMA User, James Neuhaus
Here is some additional information, taken from part of the cuda-memcheck output. If you would like to see the whole file, please let me know.

========= CUDA-MEMCHECK
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemset.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib64/libcuda.so [0x463c2c]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so [0x5eb0c4]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_batched + 0x130) [0x339000]
=========     Host Frame:./dqmc [0x4710]
=========     Host Frame:./dqmc [0x12e38]
=========     Host Frame:./dqmc [0x2b54]
=========     Host Frame:/lib64/libc.so.6 [0x25200]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xc4) [0x253f4]
=========
========= Invalid __global__ write of size 8
=========     at 0x00000670 in dlaset_full_kernel_batched(int, int, double, double, double**, int)
=========     by thread (31,0,0) in block (0,0,0)
=========     Address 0x2000000000f8 is out of bounds
=========     Device Frame:dlaset_full_kernel_batched(int, int, double, double, double**, int) (dlaset_full_kernel_batched(int, int, double, double, double**, int) : 0x670)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib64/libcuda.so (cuLaunchKernel + 0x26c) [0x2f0aec]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so [0x5914a8]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so [0x5ef3c0]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (_Z52__device_stub__Z26dlaset_full_kernel_batchediiddPPdiiiddPPdi + 0xe8) [0x479928]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (_Z26dlaset_full_kernel_batchediiddPPdi + 0x18) [0x479968]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magmablas_dlaset_batched + 0x11c) [0x479d4c]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_expert_batched + 0x4d4) [0x33a9d4]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_batched + 0x274) [0x339144]
=========     Host Frame:./dqmc [0x4710]
=========     Host Frame:./dqmc [0x12e38]
=========     Host Frame:./dqmc [0x2b54]
=========     Host Frame:/lib64/libc.so.6 [0x25200]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xc4) [0x253f4]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaFree.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib64/libcuda.so [0x463c2c]
=========     Host Frame:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-20.1-mkkhy2lfycqu757gtk5bwzaofoyihmz5/linuxpower/2020/cuda/10.1/lib64/libcublasLt.so.10 [0x288934]
=========     Host Frame:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-20.1-mkkhy2lfycqu757gtk5bwzaofoyihmz5/linuxpower/2020/cuda/10.1/lib64/libcublasLt.so.10 (cublasLtCtxInit + 0x28) [0x57948]
=========     Host Frame:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-20.1-mkkhy2lfycqu757gtk5bwzaofoyihmz5/linuxpower/2020/cuda/10.1/lib64/libcublas.so.10 (cublasCreate_v2 + 0xe4) [0x149fa4]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_queue_create_internal + 0x98) [0x138d18]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_expert_batched + 0x538) [0x33aa38]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_batched + 0x274) [0x339144]
=========     Host Frame:./dqmc [0x4710]
=========     Host Frame:./dqmc [0x12e38]
=========     Host Frame:./dqmc [0x2b54]
=========     Host Frame:/lib64/libc.so.6 [0x25200]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xc4) [0x253f4]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaFree.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib64/libcuda.so [0x463c2c]
=========     Host Frame:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-20.1-mkkhy2lfycqu757gtk5bwzaofoyihmz5/linuxpower/2020/cuda/10.1/lib64/libcusparse.so.10 [0x591114]
=========     Host Frame:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-20.1-mkkhy2lfycqu757gtk5bwzaofoyihmz5/linuxpower/2020/cuda/10.1/lib64/libcusparse.so.10 (cusparseCreate + 0x30) [0xb9cf0]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_queue_create_internal + 0xc8) [0x138d48]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_expert_batched + 0x538) [0x33aa38]
=========     Host Frame:/ccs/home/jneuhaus/scratch/cph102/magmatest/lib/libmagma.so (magma_dgeqrf_batched + 0x274) [0x339144]
=========     Host Frame:./dqmc [0x4710]
=========     Host Frame:./dqmc [0x12e38]
=========     Host Frame:./dqmc [0x2b54]
=========     Host Frame:/lib64/libc.so.6 [0x25200]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xc4) [0x253f4]

Mark Gates

Aug 31, 2020, 12:24:28 PM
to James Neuhaus, MAGMA User
A few observations below.

On Fri, Aug 28, 2020 at 10:57 AM 'James Neuhaus' via MAGMA User <magma...@icl.utk.edu> wrote:

> Here's the relevant code for dgeqrf_batched:
>
>    interface
>       subroutine magma_dgeqrf_batchedF( &
>          m, n, dA_array, lda, dtau_array, info_array, batchcount, queue) &
>          bind(C, name="magma_dgeqrf_batched")
>          use iso_c_binding
>          integer(c_int), value  :: m, n, lda, batchcount
>          type(c_ptr), value  :: dA_array    !! double_complex**
>          type(c_ptr), value  :: dtau_array  !! int**

These comments are wrong: for dgeqrf, dA_array is double** and dtau_array is double**. But that shouldn't affect the code.

>          type(c_ptr), value  :: info_array  !! int*
>          type(c_ptr), value  :: queue
>       end subroutine
>    end interface
>
>   subroutine magma2_dgeqrf_batched( batch_size, m,n, A, ld1, Tau,info)
>       use iso_c_binding
>       use magma2_common
>       implicit none
>       integer  :: m,n, ld1, batch_size, i, istat
>       integer(c_int) :: cm, cn, cld1, cbatch_size
>       double precision, target  :: A(0:ld1-1,0: n-1, 1:batch_size)
>       double precision, target  :: Tau(0:m-1,1:batch_size)

Tau is of size min( m, n ). So this code is assuming m ≥ n.

>       integer, intent(out) :: info(1:batch_size)
>       type(c_ptr), dimension(1:batch_size) :: dptr_A, dptr_Tau
>       type(c_ptr) :: iptr_info
>
>       cm = m
>       cn = n
>       cld1 = ld1
>       cbatch_size = batch_size
>
>       !$acc data present(Tau,A,info) pcreate(dptr_A,dptr_Tau,iptr_info)
>       !$acc host_data use_device(A,Tau,info)
>       !$acc kernels loop private(i)
>       do i = 1, batch_size
>         dptr_A(i) = c_loc(A(LBOUND(A,1),LBOUND(A,2), i))
>         dptr_Tau(i) = c_loc(Tau(LBOUND(Tau,1),i))
>       enddo
>       !$acc end kernels
>       !$acc serial
>       iptr_info = c_loc(info(0))

Should that be info(1)? I find it difficult to follow since the code seems to mix 1-based and 0-based indexing.

>       !$acc end serial
>       !$acc end host_data
>       !$acc host_data use_device(A,Tau,info,dptr_A,dptr_Tau,iptr_info)
>       call magma_dgeqrf_batchedF(cm,cn,c_loc(dptr_A),cld1,c_loc(dptr_tau),c_loc(iptr_info),cbatch_size,magma_queue)
>       call magma_queue_sync(magma_queue)

I don't see where magma_queue comes from. It seems that should be passed into the routine.

James Neuhaus

Aug 31, 2020, 1:11:34 PM
to MAGMA User, mga...@icl.utk.edu, MAGMA User, James Neuhaus
Thanks for the reply.

> These comments are wrong: for dgeqrf, dA_array is double** and dtau_array is double**. But that shouldn't affect the code.

The interface code was copied and edited from a subroutine in the reference magma2_zfortran.F90, and I missed updating the comments.

> Tau is of size min( m, n ). So this code is assuming m ≥ n.

Actually m <= n here. The code that calls this subroutine guarantees it, so min(m,n) = m.

> Should that be info(1)? I find it difficult to follow since the code seems to mix 1-based and 0-based indexing.

Again, sorry: I was handed a few thousand lines of code that did not use consistent indexing. I've fixed this in the code and still get the crashes.

> I don't see where magma_queue comes from. It seems that should be passed into the routine.

I'll post more of the code for clarity, as well as a tar of all of it. magma_queue is a c_ptr which is initialized in a separate routine.
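
For readers following the Tau and indexing points above, here is a minimal host-only sketch (no MAGMA or OpenACC involved). The module and function names are made up for illustration and are not part of MAGMA. It sizes Tau as min(m,n) and picks the first element of info through LBOUND instead of a hard-coded 0 or 1:

```fortran
module batched_qr_sizes
   ! Hypothetical helper module for illustration only; not part of MAGMA.
   implicit none
contains

   ! Number of Householder scalars per matrix in an m-by-n QR: min(m, n).
   pure integer function tau_len(m, n)
      integer, intent(in) :: m, n
      tau_len = min(m, n)
   end function tau_len

   ! First element of info, whatever lower bound the caller uses.
   ! The dummy is explicit-shape with lower bound lo, mirroring the
   ! mix of 0-based and 1-based declarations discussed in the thread.
   integer function first_info(info, lo, batch_size)
      integer, intent(in) :: lo, batch_size
      integer, intent(in) :: info(lo:lo+batch_size-1)
      first_info = info(lbound(info, 1))
   end function first_info

end module batched_qr_sizes
```

With the sizes from the test program (m = nn = 100, n = 2*nn = 200), tau_len(100, 200) is 100, which matches the Tau(0:m-1, 1:batch_size) declaration.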

 

James Neuhaus

Aug 31, 2020, 1:12:02 PM
to MAGMA User, James Neuhaus, mga...@icl.utk.edu, MAGMA User
#include "magma2_common.f90"
#include "magma2.f90"

program main
   use iso_c_binding
   use magma2_common
   use magma2
   use cudafor
   implicit none

   integer, parameter :: nn = 100
   integer, parameter :: nwalker = 300

   double precision, dimension(0:nn-1,0:(2*nn-1),1:nwalker) :: dataArr
   double precision, dimension(0:nn-1,1:nwalker) :: tau
   integer(c_int), dimension(1:nwalker) :: info
   integer i, j,k, walker

   dataArr = 0.0d0

   do k = 1, nwalker
   do i=0, nn-1
     dataArr(i,i+nn,k) = 1.0d0
     do j = i, nn-1
       dataArr(i,j,k) = i+(1.0/dble(nn) *j)+100*k
       dataArr(j,i,k) = dataArr(i,j,k)
     enddo
   enddo
   enddo

   !$acc enter data copyin(dataArr) create(info,tau)
   call init_magma()
   call magma2_dgeqrf_batched(nwalker,nn,2*nn,dataArr,nn,tau,info)
   call finalize_magma()
   stop
end program main

James Neuhaus

Aug 31, 2020, 1:12:16 PM
to MAGMA User, James Neuhaus, mga...@icl.utk.edu, MAGMA User
module magma2

   use iso_c_binding
   use magma2_common
   implicit none

   type(c_ptr), private :: magma_queue
   integer(c_int), private :: magma_device

   interface
      subroutine magma_dgetrf_batchedF( &
         m, n, dA_array, lda, ipiv_array, info_array, batchcount, queue) &
         bind(C, name="magma_dgetrf_batched")

         use iso_c_binding
         integer(c_int), value  :: m, n, lda, batchcount
         type(c_ptr), value  :: dA_array    !! double_complex**
         type(c_ptr), value  :: ipiv_array  !! int**

         type(c_ptr), value  :: info_array  !! int*
         type(c_ptr), value  :: queue
      end subroutine

      subroutine magma_dgeqrf_batchedF( &
         m, n, dA_array, lda, dtau_array, info_array, batchcount, queue) &
         bind(C, name="magma_dgeqrf_batched")
         use iso_c_binding
         integer(c_int), value  :: m, n, lda, batchcount
         type(c_ptr), value  :: dA_array    !! double_complex**
         type(c_ptr), value  :: dtau_array  !! int**
         type(c_ptr), value  :: info_array  !! int*
         type(c_ptr), value  :: queue
      end subroutine
      !! -------------------------------------------------------------------------
      !! initialize
      subroutine magma_init() &
         bind(C, name="magma_init")
         use iso_c_binding
      end subroutine

      subroutine magma_finalize() &
         bind(C, name="magma_finalize")
         use iso_c_binding
      end subroutine

      subroutine magma_get_device(dev) &
         bind(C, name="magma_getdevice")
         use iso_c_binding
         integer(c_int), target :: dev
      end subroutine

      subroutine magma_set_device(dev) &
         bind(C, name="magma_setdevice")
         use iso_c_binding
         integer(c_int), value :: dev
      end subroutine

      integer(c_size_t) function magma_mem_size(queue) &
         bind(C, name="magma_mem_size")
         use iso_c_binding
         type(c_ptr), value :: queue
      end function

      subroutine magma_queue_create_internal(dev, queue_ptr, func, file, line) &
         bind(C, name="magma_queue_create_internal")
         use iso_c_binding
         integer(c_int), value :: dev
         type(c_ptr), target :: queue_ptr  !! queue_t*
         character(c_char) :: func, file
         integer(c_int), value :: line
      end subroutine

      subroutine magma_queue_destroy_internal(queue, func, file, line) &
         bind(C, name="magma_queue_destroy_internal")
         use iso_c_binding
         type(c_ptr), value :: queue  !! queue_t
         character(c_char) :: func, file
         integer(c_int), value :: line
      end subroutine

      subroutine magma_queue_sync_internal(queue, func, file, line) &
         bind(C, name="magma_queue_sync_internal")
         use iso_c_binding
         type(c_ptr), value :: queue  !! queue_t
         character(c_char) :: func, file
         integer(c_int), value :: line
      end subroutine

      integer(c_int) function magma_queue_get_device(queue) &
         bind(C, name="magma_queue_get_device")
         use iso_c_binding
         type(c_ptr), value :: queue  !! queue_t
      end function

   end interface

contains

   subroutine magma_queue_create(dev, queue_ptr)
      use iso_c_binding
      integer(c_int), value :: dev
      type(c_ptr), target :: queue_ptr  !! queue_t*

      call magma_queue_create_internal( &
         dev, queue_ptr, &
         "magma_queue_create"//c_null_char, &
         __FILE__//c_null_char, &
         __LINE__)
   end subroutine

   subroutine magma_queue_destroy(queue)
      use iso_c_binding
      type(c_ptr), value :: queue  !! queue_t

      call magma_queue_destroy_internal( &
         queue, &
         "magma_queue_destroy"//c_null_char, &
         __FILE__//c_null_char, &
         __LINE__)
   end subroutine

   subroutine magma_queue_sync(queue)
      use iso_c_binding
      type(c_ptr), value :: queue  !! queue_t

      call magma_queue_sync_internal( &
         queue, &
         "magma_queue_sync"//c_null_char, &
         __FILE__//c_null_char, &
         __LINE__)
   end subroutine


   subroutine init_magma()
     implicit none
     call magma_init()
     call magma_get_device(magma_device)
     call magma_queue_create(magma_device,magma_queue)
   end subroutine init_magma

   subroutine finalize_magma()
     call magma_finalize()
   end subroutine finalize_magma

   subroutine magma_dgetrf_batched( batch_size, n, A, ld1, ipvt,info)

      use iso_c_binding
      use magma2_common
      implicit none
      integer  :: n, ld1, batch_size, i, istat
      integer(c_int), target :: cn, cld1, cbatch_size
      double precision(c_double), target  :: A(0:ld1 - 1, 0:n - 1, 1:batch_size)
      integer(c_int), target :: info(1:batch_size)
      integer(c_int), target :: ipvt(0:n - 1,1:batch_size)
      type(c_ptr), dimension(batch_size) :: dptr_A, iptr_ipvt
      type(c_ptr) :: iptr_info


      cn = n
      cld1 = ld1
      cbatch_size = batch_size
      !$acc data present(ipvt, info, A) pcreate(dptr_A,iptr_ipvt,iptr_info)
      !$acc host_data use_device(A,ipvt,info)

      !$acc kernels loop private(i)
      do i = 1, batch_size
        dptr_A(i) = c_loc(A(0,0, i))
        iptr_ipvt(i) = c_loc(ipvt(0,i))

      enddo
      !$acc end kernels
      !$acc serial
      iptr_info = c_loc(info(LBOUND(info,1)))

      !$acc end serial
      !$acc end host_data
      !$acc host_data use_device(A,ipvt,info,dptr_A,iptr_ipvt,iptr_info)
      call magma_dgetrf_batchedF(cn,cn,c_loc(dptr_A),cld1,c_loc(iptr_ipvt),c_loc(iptr_info),cbatch_size,magma_queue)
      call magma_queue_sync(magma_queue)

      !$acc end host_data
      !$acc end data

   end subroutine magma_dgetrf_batched


   subroutine magma2_dgeqrf_batched( batch_size, m,n, A, ld1, Tau,info)
      use iso_c_binding
      use magma2_common
      implicit none
      integer  :: m,n, ld1, batch_size, i, istat
      integer(c_int) :: cm, cn, cld1, cbatch_size
      double precision(c_double), target  :: A(0:ld1-1,0: n-1, 1:batch_size)
      double precision(c_double), target  :: Tau(0:m-1,1:batch_size)
      integer(c_int), target :: info(1:batch_size)

      type(c_ptr), dimension(1:batch_size) :: dptr_A, dptr_Tau
      type(c_ptr) :: iptr_info

      cm = m
      cn = n
      cld1 = ld1
      cbatch_size = batch_size

      !$acc data present(Tau,A,info) pcreate(dptr_A,dptr_Tau,iptr_info)
      !$acc host_data use_device(A,Tau,info)
      !$acc kernels loop private(i)
      do i = 1, batch_size
        dptr_A(i) = c_loc(A(LBOUND(A,1),LBOUND(A,2), i))
        dptr_Tau(i) = c_loc(Tau(LBOUND(Tau,1),i))
      enddo
      !$acc end kernels
      !$acc serial
      iptr_info = c_loc(info(LBOUND(info,1)))

      !$acc end serial
      !$acc end host_data
      !$acc host_data use_device(A,Tau,info,dptr_A,dptr_Tau,iptr_info)
      call magma_dgeqrf_batchedF(cm,cn,c_loc(dptr_A),cld1,c_loc(dptr_tau),c_loc(iptr_info),cbatch_size,magma_queue)
      call magma_queue_sync(magma_queue)
      !$acc end host_data
      !$acc end data

   end subroutine magma2_dgeqrf_batched
end module
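
One detail in the module above that may be worth checking: the interface comments describe info_array as int*, while both wrappers pass c_loc(iptr_info), where iptr_info already holds c_loc(info(...)). The host-only sketch below uses plain iso_c_binding with no MAGMA or OpenACC. zero_ints is a hypothetical stand-in for a C routine that writes n integers through an int* argument, and it shows what such a routine does with the pointer it receives. Whether this is related to the crash is not established in this thread.

```fortran
module pointer_levels
   ! Hypothetical illustration only; zero_ints is not a MAGMA routine.
   use iso_c_binding
   implicit none
contains

   ! Stand-in for a C routine taking `int *info_array`:
   ! it treats the address it is given as the start of n integers.
   subroutine zero_ints(info_array, n)
      type(c_ptr), value :: info_array
      integer, intent(in) :: n
      integer(c_int), pointer :: p(:)

      ! View the raw address as an integer array of length n and clear it.
      call c_f_pointer(info_array, p, [n])
      p = 0
   end subroutine zero_ints

end module pointer_levels
```

Calling zero_ints(c_loc(info), n) clears the n integers of info. Passing the address of a c_ptr variable that itself holds c_loc(info) would instead make the same routine write n integers over the pointer variable, which only has room for one pointer.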

James Neuhaus

Aug 31, 2020, 1:15:34 PM
to MAGMA User, James Neuhaus, mga...@icl.utk.edu, MAGMA User
A tar of all code:
magma2.tgz

arif ali

Jan 26, 2021, 12:35:57 PM
to MAGMA User, mga...@icl.utk.edu