Happy New Year to you too!
Regarding the questions,
1) magma_setdevice selects the device that subsequent computations will use.
If you set it to zero, every GPU command after that is executed on device zero
(and device 1 is not used, as you point out).
magma_getdevice returns the currently selected device.
2) Both GPUs have to be listed in the build target (GPU_TARGET) so that MAGMA is compiled with flags for both. If this was done
in the original build, you don’t have to recompile: with both included, the code will run on either GPU.
You can check this, for example, with the "--dev device_number" option in the testing routines, e.g., on my laptop:
Stans-MacBook-Pro:testing tomov$ ./testing_dgemm -n 2000 -c --niter 2 --dev 0
% MAGMA 2.5.4 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7000, driver 7050. MAGMA not compiled with OpenMP.
% device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
% Thu Dec 31 17:46:01 2020
% Usage: ./testing_dgemm [options] [-h|--help]
% If running lapack (option --lapack), MAGMA and cuBLAS error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to cuBLAS result.
% transA = No transpose, transB = No transpose
%    M     N     K   MAGMA Gflop/s (ms)   cuBLAS Gflop/s (ms)   CPU Gflop/s (ms)   MAGMA error   cuBLAS error
  2000  2000  2000      22.42 ( 713.75)       27.84 ( 574.64)      ---   ( --- )      0.00e+00        ---       ok
  2000  2000  2000      24.96 ( 641.00)       27.93 ( 572.85)      ---   ( --- )      0.00e+00        ---       ok
Although I am sure it will run when you give a specific device number, it would be interesting to see whether both GPUs can be used
at the same time when they are different. You can check this with the multi-GPU codes, e.g.,
./testing_sgetrf_gpu --ngpu 2 -n 10000 -c --niter 2 -l
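
To illustrate how a --dev value ends up selecting a device, here is a minimal sketch in C. The names pick_device and use_device and the USE_MAGMA macro are hypothetical, made up for this example; this is not the option parser used in the MAGMA testing routines. The MAGMA part is only compiled when USE_MAGMA is defined, and it assumes magma_init() has already been called.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_MAGMA   /* hypothetical macro: define it when building against MAGMA */
#include "magma_v2.h"
#endif

/* Hypothetical helper (not part of MAGMA): return the index given after
 * "--dev" in argv. If the option is absent, return 0 when at least one
 * device exists. Return -1 if the value is missing, is not a number, or is
 * outside [0, ndevices). */
int pick_device(int argc, char **argv, int ndevices)
{
    assert(argv != NULL);
    for (int i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "--dev") == 0) {
            if (i + 1 >= argc)
                return -1;                     /* "--dev" with no value */
            char *end = NULL;
            long dev = strtol(argv[i + 1], &end, 10);
            if (end == argv[i + 1] || *end != '\0' || dev < 0 || dev >= ndevices)
                return -1;                     /* not a valid device index */
            return (int)dev;
        }
    }
    return ndevices > 0 ? 0 : -1;              /* default to device 0 */
}

#ifdef USE_MAGMA
/* Make `dev` the current device, then read the setting back.
 * Returns 1 if MAGMA reports `dev` as the current device, 0 otherwise. */
int use_device(int dev)
{
    magma_device_t current = -1;
    magma_setdevice(dev);
    magma_getdevice(&current);
    return current == dev;
}
#endif
```

In a real program you would call magma_init() first, pass the result of pick_device to use_device, and call magma_finalize() at the end. The device count passed as ndevices is left to the caller.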
3) We don’t do anything unusual in the parsing, so I don’t see a reason why it would be slow.
I just tested and did not notice any slowdown. The slowdown could have other causes, such as initialization
of GPU, etc. How did you determine it is slow and do you see slowdown only when you add specific