Subset of Columns of a Given Matrix on GPU

27 views
Skip to first unread message

Yeonjun Jeong

unread,
Dec 7, 2024, 9:02:36 AM12/7/24
to MAGMA User
Hi everyone,

Given a matrix dA (dimension l×m) and a matrix dB (dimension l×n) that consists of a subset of the columns of dA (m > n), is there a way to create dC = [dB dx] or replace dB = [dB_{1}, dB_{2}, ..., dB_{n-1} dx] where dx is a column of dA but not an element of the columns of dB? The order of the columns of dB can be different than they appear in dA. All values in dA, dB, and dC are in double precision. I would like to do this operation on the GPU, so that data do not move between the GPU and the CPU.

Best,
Yeonjun


Mark Gates

unread,
Dec 7, 2024, 1:42:03 PM12/7/24
to Yeonjun Jeong, MAGMA User
Hi Yeonjun,

What is your final goal in extracting dB from dA? Are you using dB in a matrix-vector multiply or a factorization or some other operation? Depending on the use case, there may be simpler methods that don't ever extract dB from dA.

If you're just appending one column dx to matrix dB, then magma_zcopy can copy dx, assuming there's space in dB to hold it. See use below.

For extracting dB from dA, I don't think there's a ready-made routine in MAGMA, but one shouldn't be too difficult for you to write.

A _slow_ version would just use magma_zcopy to copy columns of A to B. Something like (untested!):

// UNTESTED!
// dA is m x n, dB is m x k, columns is k vector with indices to extract.
// T is any data type.
extract( int m, int n, int k, T* dA, int lda, T* dB, int ldb, int* columns, queue )
{
    assert( k <= n );
    for (int j = 0; j < k; ++j) {
        int j2 = columns[ j ];
        assert( 0 <= j2 && j2 < n );
        magma_zcopy( m, dA[ j2*lda ], 1, dB[ j*ldb ], 1, queue );
    }
}

A much faster version would be to make a modified copy of magmablas/zlacpy.cu that does the same thing. Something like (again, untested!):

// UNTESTED!
// Divides matrix into ceil( m/BLK_X ) x ceil( n/BLK_Y ) blocks.
// Each block has BLK_X threads.
// Each thread loops across one row, updating BLK_Y entries.
// Cf. magmablas/zlacpy.cu zlacpy_full_device.
static __device__
void extract_device(
    int m, int n, int k,
    const magmaDoubleComplex *dA, int ldda,
    magmaDoubleComplex       *dB, int lddb,
    int* columns )
{
    int i = blockIdx.x*BLK_X + threadIdx.x;
    int jj = blockIdx.y*BLK_Y;
    // do only rows inside matrix B.
    if (i < m) {
        // Thread copies row i, columns [ jj, ..., min( k, jj + BLK_Y ) ].
        dA += i;
        dB += i + jj*lddb;
        columns += jj;
        for (int j=0; j < BLK_Y && jj + j < k; ++j) {
            int j2 = columns[ j ];
            dB[ j*lddb ] = dA[ j2*ldda ];
        }
    }
}


Needs a __global__ function (cf. zlacpy_full_kernel) and the CPU driver (cf. magmablas_zlacpy) to call the above device routine.

Mark

Mark Gates

unread,
Dec 9, 2024, 1:52:55 PM12/9/24
to Yeonjun Jeong, MAGMA User
On Mon, Dec 9, 2024 at 1:10 PM Yeonjun Jeong <yeonjun...@gmail.com> wrote:

Do I have it right that the two functions 'extract' and 'extract_device' are for copying the columns of dA to dB according to the column indices stored in 'columns'?

Yes. Just sketches of how that can be done. Note `extract` assumes that the columns vector is in CPU host memory, while `extract_device` assumes the columns vector is in GPU device memory, so it would be better to call it dcolumns.

Mark

Yeonjun Jeong

unread,
Dec 9, 2024, 1:53:06 PM12/9/24
to MAGMA User, mga...@icl.utk.edu, MAGMA User, Yeonjun Jeong
Hi Mark,

Thank you for the answer. I was looking to perform matrix-vector multiplications and matrix-matrix multiplications (and an inverse of the resulting matrix) for hundreds (potentially thousands) of dB where only dx (a column) differs. magma_zcopy is what I was looking for to copy a vector into a dB. I imagine copying a vector for each dB (hence hundreds-thousands of copy calls) shouldn't slow things down too much, but I will also consider using dA directly (which will require writing my own routine).

Do I have it right that the two functions 'extract' and 'extract_device' are for copying the columns of dA to dB according to the column indices stored in 'columns'?

Best,
Yeonjun

Yeonjun Jeong

unread,
Dec 9, 2024, 2:46:05 PM12/9/24
to MAGMA User, mga...@icl.utk.edu, MAGMA User, Yeonjun Jeong
Thank makes sense. Thank you.

Yeonjun

Reply all
Reply to author
Forward
0 new messages