dgetf2_nopiv_internal_batched kernels


aran nokan

Oct 3, 2021, 7:26:01 PM
to MAGMA User

I have a question about the dgetf2_nopiv_internal_batched kernels. I do not understand why ntcol and the shared memory size are computed like this:

    const magma_int_t ntcol = (m1 > 32) ? 1 : (2 * (32/m1));
    magma_int_t shmem = ntcol * magma_ceilpow2(n) * sizeof(double);
    magma_int_t gridx = magma_ceildiv(batchCount, ntcol);
    dim3 threads(m1, ntcol, 1);
    dim3 grid(gridx, 1, 1);
Why is ntcol = 1 when m1 > 32, and why is it computed differently for small m1?
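
To make the arithmetic concrete, here is a minimal host-only C++ sketch of how I read that launch configuration. The names `ceildiv`, `ceilpow2`, `ntcol_for`, `LaunchConfig` and `make_config` are hypothetical stand-ins, not MAGMA functions, and I am assuming that magma_ceildiv rounds the quotient up and magma_ceilpow2 returns the next power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for magma_ceildiv / magma_ceilpow2 (assumed semantics).
constexpr int ceildiv(int a, int b) { return (a + b - 1) / b; }
constexpr int ceilpow2(int n, int p = 1) { return p >= n ? p : ceilpow2(n, 2 * p); }

// Same expression as in the post.
constexpr int ntcol_for(int m1) { return (m1 > 32) ? 1 : (2 * (32 / m1)); }

static_assert(ntcol_for(8) == 8, "8 rows: 8 thread columns");
static_assert(ntcol_for(32) == 2, "32 rows: 2 thread columns");
static_assert(ntcol_for(33) == 1, "more than 32 rows: 1 thread column");

// Hypothetical struct that only records the computed launch parameters.
struct LaunchConfig {
    int ntcol;          // threads.y
    int threads_x;      // threads.x = m1
    int gridx;          // grid.x
    std::size_t shmem;  // dynamic shared memory in bytes
};

LaunchConfig make_config(int m1, int n, int batchCount) {
    LaunchConfig c;
    c.ntcol = ntcol_for(m1);
    c.threads_x = m1;
    c.gridx = ceildiv(batchCount, c.ntcol);
    c.shmem = static_cast<std::size_t>(c.ntcol) * ceilpow2(n) * sizeof(double);
    return c;
}

// Print the configuration for a few panel heights.
void print_table(int n, int batchCount) {
    const int heights[] = {4, 8, 16, 20, 32, 33, 64};
    for (int m1 : heights) {
        const LaunchConfig c = make_config(m1, n, batchCount);
        std::printf("m1=%2d ntcol=%2d threads=%3d gridx=%4d shmem=%zu\n",
                    m1, c.ntcol, c.threads_x * c.ntcol, c.gridx, c.shmem);
    }
}
```

Calling `print_table(8, 1000)` lists the values for a batch of 1000 matrices. By this arithmetic, m1 * ntcol never exceeds 64 when m1 <= 32 (for example 8 x 8, 16 x 4, 32 x 2), and gridx shrinks by the same factor ntcol, so it looks as if each thread block is meant to cover ntcol matrices of the batch. For m1 > 32 there is one matrix per block. I would like to confirm whether that reading is correct.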

I also have a question about the template instantiation in cases like this:

        case  n: dgetf2_nopiv_batched_kernel< n, magma_ceilpow2( n)><<<grid, threads, shmem, queue->cuda_stream()>>>(m1, dA_array, ai, aj, ldda, info_array, gbstep, batchCount); break;

Are the template arguments < n, magma_ceilpow2( n)> used to size a register buffer? If so, why are both n and magma_ceilpow2( n) passed?
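
To show what I mean by "register buffer", here is a small host-only C++ sketch of the pattern as I understand it. It is not the MAGMA kernel: `row_sum_fixed`, `row_sum_dispatch` and `ceilpow2` are hypothetical names, and the power-of-two tree reduction is only one possible reason for passing a rounded-up size, which is my assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for magma_ceilpow2 (assumed: next power of two >= n).
constexpr int ceilpow2(int n, int p = 1) { return p >= n ? p : ceilpow2(n, 2 * p); }

// N and NPOW2 are compile-time constants, so both arrays have fixed sizes
// and every loop below has a trip count the compiler knows in advance.
template <int N, int NPOW2>
double row_sum_fixed(const double* row) {
    static_assert(NPOW2 >= N, "padded size must cover N");

    double rA[N];  // fixed-size buffer; in a kernel this could live in registers
    for (int j = 0; j < N; ++j) rA[j] = row[j];

    double buf[NPOW2] = {0.0};  // zero-padded up to a power of two
    for (int j = 0; j < N; ++j) buf[j] = rA[j];

    // Tree reduction: halving the stride only works cleanly for a power of two.
    for (int stride = NPOW2 / 2; stride > 0; stride /= 2)
        for (int i = 0; i < stride; ++i) buf[i] += buf[i + stride];
    return buf[0];
}

// Runtime n selects a compile-time instantiation, like the `case n:` lines.
double row_sum_dispatch(int n, const double* row) {
    switch (n) {
        case 1: return row_sum_fixed<1, ceilpow2(1)>(row);
        case 2: return row_sum_fixed<2, ceilpow2(2)>(row);
        case 3: return row_sum_fixed<3, ceilpow2(3)>(row);
        case 4: return row_sum_fixed<4, ceilpow2(4)>(row);
        default: return 0.0;  // sizes beyond 4 are not instantiated in this sketch
    }
}
```

For example, `row_sum_dispatch(3, row)` runs the `<3, 4>` instantiation. If the real kernel follows this pattern, the first argument would fix the register array length and the second a padded size, but I could not confirm that from the source.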
