I'm trying a very basic test. It works when running on the CPU, but I can't get it to work when running on the GPU.
First, I'm sure that the device is being seen. This code:
blas::Queue q{};
std::cout << "slate::gpu_aware_mpi() = " << slate::gpu_aware_mpi()
          << ", blas::get_device_count() = " << blas::get_device_count()
          << ", blas::Queue::device() = " << q.device()
          << std::endl;
Gives the expected output:
slate::gpu_aware_mpi() = 0, blas::get_device_count() = 1, blas::Queue::device() = 0
The lines:
auto slatetarget = slate::Target::Devices;
// auto slatetarget = slate::Target::Host;
slate::Matrix<double> A( n, n, blocksize, nblockrows, nblockcols, MPI_COMM_WORLD );
A.insertLocalTiles( slatetarget );
run. But then, with nblockrows=1 (I'm running with just a single process, mpirun -np 1), when I get to:
for ( int ti = 0 ; ti < nblockrows ; ++ti ) {
    for ( int tj = 0 ; tj < nblockrows ; ++tj ) {
        if ( A.tileIsLocal( ti, tj ) ) {
            auto Atile = A.at( ti, tj );
it dies with:
SLATE ERROR: Error check 'tile_node->existsOn(device)' failed in at at /usr/local/include/slate/internal/MatrixStorage.hh:399
If I set slatetarget = slate::Target::Host, everything works; I don't get an error, and the code runs all the way through.
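My best guess at what the error check means, written as a toy model: if tile copies are tracked per device, and the accessor looks on the host unless told otherwise, then a tile inserted only on the device would fail exactly this kind of existsOn check. This is not SLATE code. ToyTileStorage, kHostNum, and the default-to-host behaviour are all assumptions made up for illustration, and I have not confirmed that SLATE works this way.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <utility>

// Toy model only: none of these names are SLATE's API.
constexpr int kHostNum = -1;  // hypothetical id meaning "host memory"

class ToyTileStorage {
public:
    // Record that tile (i, j) has a copy on `device`.
    void insert(int i, int j, int device) { copies_[{i, j}].insert(device); }

    // True if tile (i, j) has a copy on `device`.
    bool existsOn(int i, int j, int device) const {
        auto it = copies_.find({i, j});
        return it != copies_.end() && it->second.count(device) > 0;
    }

    // Accessor that defaults to the host (an assumption) and refuses to
    // return a copy that was never created there. Returns the device id
    // as a stand-in for the tile itself.
    int at(int i, int j, int device = kHostNum) const {
        if (!existsOn(i, j, device))
            throw std::runtime_error("tile does not exist on requested device");
        return device;
    }

private:
    std::map<std::pair<int, int>, std::set<int>> copies_;
};

// Returns true if the default (host) lookup fails for a device-only tile.
bool hostLookupFailsForDeviceOnlyTile() {
    ToyTileStorage storage;
    storage.insert(0, 0, /*device=*/0);   // like inserting on the GPU only
    assert(storage.existsOn(0, 0, 0));    // the device copy is there
    try {
        storage.at(0, 0);                 // default lookup goes to the host
    } catch (const std::runtime_error&) {
        return true;                      // same shape of failure as above
    }
    return false;
}
```

If this picture is right, the tile exists on device 0 but not on the host, which would also explain why everything works with slate::Target::Host.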
Have I skipped a necessary initialization step somewhere for using the device? What am I doing wrong?
-Rob