Basic usage: tile_node->existsOn(device) failed; what am I doing wrong?


Robert Knop

Aug 1, 2023, 11:31:59 AM8/1/23
to SLATE User
I'm trying a very basic test. I can get things to work when running on the CPU, but I can't get things to work when running on the GPU.

First, I'm sure that the device is being seen; this line:

  blas::Queue q{};
  std::cout << "slate::gpu_aware_mpi() = " << slate::gpu_aware_mpi()
            << ", blas::get_device_count() = " << blas::get_device_count()
            << ", blas::Queue::device() = " << q.device()
            << std::endl;

Gives the expected output:

slate::gpu_aware_mpi() = 0, blas::get_device_count() = 1, blas::Queue::device() = 0

The lines:

  auto slatetarget = slate::Target::Devices;
  // auto slatetarget = slate::Target::Host;                      
  slate::Matrix<double> A( n, n, blocksize, nblockrows, nblockcols, MPI_COMM_WORLD );
  A.insertLocalTiles( slatetarget );

run fine. But then, when I get to the following (with nblockrows = 1, since I'm running with just a single process, mpirun -np 1):

  for ( int ti = 0 ; ti < nblockrows ; ++ti ) {
    for ( int tj = 0 ; tj < nblockrows ; ++tj ) {
      if ( A.tileIsLocal( ti, tj ) ) {
          auto Atile = A.at( ti, tj );

it dies with:
SLATE ERROR: Error check 'tile_node->existsOn(device)' failed in at at /usr/local/include/slate/internal/MatrixStorage.hh:399

If I set slatetarget = slate::Target::Host, everything works; I don't get an error, and the code runs all the way through.

Have I skipped a necessary initialization step somewhere for using the device?  What am I doing wrong?

-Rob

Robert Knop

Aug 1, 2023, 12:25:55 PM8/1/23
to SLATE User, Robert Knop
OK, I figured out my problem: I needed to give the device number (or slate::HostNum, which is -1, for the host) as the third argument of the A.at() call.

Plus, I had to be much more careful about copying between host and device memory.

-Rob

Mark Gates

Aug 1, 2023, 1:43:16 PM8/1/23
to Robert Knop, SLATE User
On Tue, Aug 1, 2023 at 12:25 PM Robert Knop <rak...@lbl.gov> wrote:
OK, I figured out my problem; I needed to give the device number (or slate::HostNum (which is -1) for the host) as the third argument of the A.at() call.

Correct. For multi-GPU systems (i.e., each MPI rank has multiple GPUs), you should check the device. Something like:

      if (A.tileIsLocal( ti, tj ) && A.tileDevice( ti, tj ) == device) {
          auto Atile = A.at( ti, tj, device );

or

      if (A.tileIsLocal( ti, tj )) {
          int device = A.tileDevice( ti, tj );
          auto Atile = A.at( ti, tj, device );
      }

We should add something like tileIsLocalOnDevice( ti, tj, device ) for the first case.

But another problem I see is here:
  slate::Matrix<double> A( n, n, blocksize, nblockrows, nblockcols, MPI_COMM_WORLD );
and
  for ( int ti = 0 ; ti < nblockrows ; ++ti ) {
    for ( int tj = 0 ; tj < nblockrows ; ++tj ) {

This seems to be using nblockrows (and nblockcols) for two different purposes: as the MPI process grid dimensions in the constructor, and as the loop bounds over tiles.

The Matrix constructor takes A( m, n, blocksize, p, q, comm ). Here p-by-q are the MPI process grid dimensions; e.g., for 6 MPI ranks, a 1x6, 2x3, 3x2, or 6x1 grid.

The number of block rows and cols is different. It's computed as ceil( m / nb ) block rows and ceil( n / nb ) block cols. E.g., a 900-by-1700 matrix with blocksize 200 has 5 block rows and 9 block columns, independent of the MPI process grid. You can access those counts using A.mt() and A.nt() for tiles in the m (rows) and n (cols) directions, respectively.

Mark

--
Innovative Computing Laboratory
University of Tennessee, Knoxville