Hi Robert,
Absolutely, you can set elements on the GPU. I don't have any user-level example code, but I've added that to our to-do list. The closest code we have is the slate::set function, which sets the diagonal of a matrix to one constant (say, 1 for the identity) and the off-diagonal entries to another constant (say, 0). It has both CPU and GPU implementations. Look at:
slate/src/set.cc
which calls
slate/src/internal/internal_geset.cc
which, for the GPU, builds batches of tiles and calls slate::device::batch::geset on each batch. In this case, all tiles within a batch share the same properties (dimensions, diagonal constant, and off-diagonal constant). The CUDA kernel is defined in:
which has both single-tile and batch versions. The ROCm kernel is auto-generated from the CUDA version as:
Does parallelizing across the tiles in a batch work for you? Or how would GPU parallelization work in your case?
In the future, we could simplify this process. We recently added a slate::set overload that takes a lambda function to compute entry (i, j) of a matrix on the CPU. I could envision a similar slate::set function that takes a GPU batch function, basically doing the job of set.cc and internal_geset.cc for you.
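As a rough sketch of how a lambda-based set can work on a tiled matrix, assuming a simple layout: the TiledMatrix type and set_entries function below are hypothetical and are not SLATE's API. The lambda receives global indices (i, j), and each tile is filled independently, so a GPU version could batch the tiles the same way internal_geset.cc does, with the lambda replaced by a device-side functor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tiled matrix (not SLATE's Matrix class): m-by-n, split into
// nb-by-nb tiles (edge tiles may be smaller), each stored column-major.
struct TiledMatrix {
    int64_t m, n, nb;
    std::vector<std::vector<double>> tiles;  // tile (ti, tj) at ti + tj*mt()

    TiledMatrix(int64_t m_, int64_t n_, int64_t nb_) : m(m_), n(n_), nb(nb_) {
        for (int64_t tj = 0; tj < nt(); ++tj)
            for (int64_t ti = 0; ti < mt(); ++ti)
                tiles.emplace_back(tile_mb(ti) * tile_nb(tj), 0.0);
    }
    int64_t mt() const { return (m + nb - 1) / nb; }  // number of block rows
    int64_t nt() const { return (n + nb - 1) / nb; }  // number of block cols
    int64_t tile_mb(int64_t ti) const { return std::min(nb, m - ti * nb); }
    int64_t tile_nb(int64_t tj) const { return std::min(nb, n - tj * nb); }

    // Element access by global indices, for checking results.
    double& at(int64_t i, int64_t j) {
        assert(0 <= i && i < m && 0 <= j && j < n);
        int64_t ti = i / nb, tj = j / nb;
        return tiles[ti + tj * mt()][(i % nb) + (j % nb) * tile_mb(ti)];
    }
};

// Fill every entry with f(i, j), where (i, j) are global indices.
// Tiles are independent, so each (ti, tj) iteration could become one
// entry in a GPU batch.
template <typename Func>
void set_entries(Func&& f, TiledMatrix& A)
{
    for (int64_t tj = 0; tj < A.nt(); ++tj) {
        for (int64_t ti = 0; ti < A.mt(); ++ti) {
            std::vector<double>& T = A.tiles[ti + tj * A.mt()];
            int64_t mb = A.tile_mb(ti), tnb = A.tile_nb(tj);
            for (int64_t jj = 0; jj < tnb; ++jj)
                for (int64_t ii = 0; ii < mb; ++ii)
                    T[ii + jj * mb] = f(ti * A.nb + ii, tj * A.nb + jj);
        }
    }
}
```

For example, set_entries([](int64_t i, int64_t j) { return i == j ? 1.0 : 0.0; }, A) would produce the identity, the same result as the constant-based set with diagonal 1 and off-diagonal 0.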
Mark
Interim Director, Innovative Computing Laboratory (ICL)
Research Assistant Professor, University of Tennessee, Knoxville