dgetf2_nopiv_internal_batched kernels

aran nokan

Oct 3, 2021, 7:26:01 PM

to MAGMA User

Hi,

I have a question about the dgetf2_nopiv_internal_batched kernels. I do not understand why ntcol and the shared memory size are computed like this:

const magma_int_t ntcol = (m1 > 32) ? 1 : (2 * (32/m1));
magma_int_t shmem = ntcol * magma_ceilpow2(n) * sizeof(double);
magma_int_t gridx = magma_ceildiv(batchCount, ntcol);
dim3 threads(m1, ntcol, 1);
dim3 grid(gridx, 1, 1);

Why is ntcol = 1 when m1 > 32, and why is it computed differently for small m1?
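
To make the question concrete, here is a minimal host-only C++ sketch of the launch arithmetic as I read it from the snippet above. The helpers ceildiv and ceilpow2 are my own stand-ins for magma_ceildiv and magma_ceilpow2 (I am assuming they round a division up and round up to the next power of two), and LaunchConfig, launch_config and print_launch_config are hypothetical names, not MAGMA API.

```cpp
#include <cstddef>
#include <cstdio>

// Stand-in for magma_ceildiv (assumption: integer division rounded up).
constexpr int ceildiv(int a, int b) { return (a + b - 1) / b; }

// Stand-in for magma_ceilpow2 (assumption: smallest power of two >= n).
constexpr int ceilpow2(int n, int p = 1) {
    return (p >= n) ? p : ceilpow2(n, 2 * p);
}

// Hypothetical container for the values computed in the snippet above.
struct LaunchConfig {
    int ntcol;               // threads.y in the snippet
    int threads_per_block;   // m1 * ntcol
    int gridx;               // number of thread blocks
    std::size_t shmem_bytes; // dynamic shared memory per block
};

// Mirrors the arithmetic from the snippet; assumes m1 >= 1 and n >= 1.
inline LaunchConfig launch_config(int m1, int n, int batchCount) {
    const int ntcol = (m1 > 32) ? 1 : (2 * (32 / m1));
    LaunchConfig cfg;
    cfg.ntcol = ntcol;
    cfg.threads_per_block = m1 * ntcol;
    cfg.gridx = ceildiv(batchCount, ntcol);
    cfg.shmem_bytes =
        static_cast<std::size_t>(ntcol) * ceilpow2(n) * sizeof(double);
    return cfg;
}

// Prints one configuration so different m1 values can be compared.
inline void print_launch_config(int m1, int n, int batchCount) {
    const LaunchConfig cfg = launch_config(m1, n, batchCount);
    std::printf("m1=%d n=%d: ntcol=%d threads=%d gridx=%d shmem=%zu bytes\n",
                m1, n, cfg.ntcol, cfg.threads_per_block, cfg.gridx,
                cfg.shmem_bytes);
}
```

With this reading, m1 = 8 gives ntcol = 8 (64 threads per block), m1 = 32 gives ntcol = 2 (again 64 threads), and m1 = 64 gives ntcol = 1 (64 threads). So my guess is that ntcol counts how many small matrices share one thread block, chosen to keep roughly two warps busy, but I would appreciate confirmation.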

I also have a question about the template instantiation, for example:

case n: dgetf2_nopiv_batched_kernel< n, magma_ceilpow2( n)><<<grid, threads, shmem, queue->cuda_stream()>>>(m1, dA_array, ai, aj, ldda, info_array, gbstep, batchCount); break;

Are the template arguments <n, magma_ceilpow2(n)> used to size a register buffer? If so, why are both n and magma_ceilpow2(n) passed?
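
For reference, this is how I currently understand the dispatch pattern, written as a small host-only C++ sketch rather than the real kernel. Everything here is hypothetical: instantiate stands in for the kernel, DISPATCH_CASE and dispatch stand in for the switch around the line above, and ceilpow2 is my own replacement for magma_ceilpow2 (assumed to round up to the next power of two). The only thing I am sure of is that template arguments must be compile-time constants, which is why a runtime n has to go through a switch.

```cpp
#include <cstdio>

// Stand-in for magma_ceilpow2 (assumption: smallest power of two >= n).
constexpr int ceilpow2(int n, int p = 1) {
    return (p >= n) ? p : ceilpow2(n, 2 * p);
}

// Hypothetical result type so the chosen instantiation can be inspected.
struct Sizes {
    int n;
    int npow2;
};

// Hypothetical stand-in for the kernel template. Because N is a
// compile-time constant, a local array of exactly N doubles can be declared.
template <int N, int NPOW2>
Sizes instantiate() {
    static_assert(NPOW2 >= N && (NPOW2 & (NPOW2 - 1)) == 0,
                  "NPOW2 must be a power of two that is >= N");
    double rA[N] = {0.0};  // fixed-size local buffer, size known at compile time
    (void)rA;              // unused here; only its declaration matters
    return Sizes{N, NPOW2};
}

// Each case turns one runtime value into a pair of compile-time arguments.
#define DISPATCH_CASE(n) \
    case n: return instantiate<n, ceilpow2(n)>();

// Maps a runtime n in 1..8 to the matching instantiation; {-1, -1} otherwise.
Sizes dispatch(int n) {
    switch (n) {
        DISPATCH_CASE(1)
        DISPATCH_CASE(2)
        DISPATCH_CASE(3)
        DISPATCH_CASE(4)
        DISPATCH_CASE(5)
        DISPATCH_CASE(6)
        DISPATCH_CASE(7)
        DISPATCH_CASE(8)
        default: return Sizes{-1, -1};
    }
}
```

In this sketch dispatch(5) selects instantiate<5, 8>. What I cannot tell from the source is what the second argument is used for inside the real kernel.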