Basic usage: tile_node->existsOn(device) failed; what am I doing wrong?


Robert Knop

Aug 1, 2023, 11:31:59 AM8/1/23
to SLATE User
I'm trying a very basic test. I can get things to work when running on the CPU, but I can't get things to work when running on the GPU.

First, I'm sure that the device is being seen; this line:

  blas::Queue q{};
  std::cout << "slate::gpu_aware_mpi() = " << slate::gpu_aware_mpi()
            << ", blas::get_device_count() = " << blas::get_device_count()
            << ", blas::Queue::device() = " << q.device()
            << std::endl;

Gives the expected output:

slate::gpu_aware_mpi() = 0, blas::get_device_count() = 1, blas::Queue::device() = 0

The lines:

  auto slatetarget = slate::Target::Devices;
  // auto slatetarget = slate::Target::Host;                      
  slate::Matrix<double> A( n, n, blocksize, nblockrows, nblockcols, MPI_COMM_WORLD );
  A.insertLocalTiles( slatetarget );

run fine. But then, when I get to the following (with nblockrows = 1, since I'm running with just a single process, mpirun -np 1):

  for ( int ti = 0 ; ti < nblockrows ; ++ti ) {
    for ( int tj = 0 ; tj < nblockrows ; ++tj ) {
      if ( A.tileIsLocal( ti, tj ) ) {
          auto Atile = A.at( ti, tj );

it dies with:
SLATE ERROR: Error check 'tile_node->existsOn(device)' failed in at at /usr/local/include/slate/internal/MatrixStorage.hh:399

If I set slatetarget = slate::Target::Host, everything works; I don't get an error, and the code runs all the way through.

Have I skipped a necessary initialization step somewhere for using the device?  What am I doing wrong?

-Rob

Robert Knop

Aug 1, 2023, 12:25:55 PM8/1/23
to SLATE User, Robert Knop
OK, I figured out my problem: I needed to give the device number (or slate::HostNum, which is -1, for the host) as the third argument of the A.at() call.

Plus, I had to be much more careful about copying between host and device memory.

-Rob

Mark Gates

Aug 1, 2023, 1:43:16 PM8/1/23
to Robert Knop, SLATE User
On Tue, Aug 1, 2023 at 12:25 PM Robert Knop <rak...@lbl.gov> wrote:
OK, I figured out my problem; I needed to give the device number (or slate::HostNum (which is -1) for the host) as the third argument of the A.at() call.

Correct. For multi-GPU systems (i.e., each MPI rank has multiple GPUs), you should check the device. Something like:

      if (A.tileIsLocal( ti, tj ) && A.tileDevice( ti, tj ) == device) {
          auto Atile = A.at( ti, tj, device );

or

      if (A.tileIsLocal( ti, tj )) {
          int device = A.tileDevice( ti, tj );
          auto Atile = A.at( ti, tj, device );
      }

We should add something like tileIsLocalOnDevice( ti, tj, device ) for the first case.

But another problem I see is here:
  slate::Matrix<double> A( n, n, blocksize, nblockrows, nblockcols, MPI_COMM_WORLD );
and
  for ( int ti = 0 ; ti < nblockrows ; ++ti ) {
    for ( int tj = 0 ; tj < nblockrows ; ++tj ) {

This seems to be using nblockrows (and nblockcols) for two different purposes: as the MPI process grid dimensions in the constructor, and as the loop bounds over tiles.

The Matrix constructor takes A( m, n, blocksize, p, q, comm ). Here p-by-q are the MPI process grid dimensions; e.g., for 6 MPI ranks, a 1x6, 2x3, 3x2, or 6x1 grid.

The number of block rows and cols is different. It's computed as ceil( m / nb ) block rows and ceil( n / nb ) block cols. E.g., a 900-by-1700 matrix with blocksize 200 has 5 block rows and 9 block columns, independent of the MPI process grid. You can access those counts using A.mt() and A.nt() for tiles in the m (rows) and n (cols) directions, respectively.

Mark

--
Innovative Computing Laboratory
University of Tennessee, Knoxville