magma_getdevice result vs nvtop, mixed GPU architectures, option parsing

aran nokan

Dec 31, 2020, 2:27:28 PM
to MAGMA User

Happy 2021 to all of you :)

I have 3 questions.

1) I am using magma_getdevice and magma_setdevice to select the device that I want to use for computation. I select device 0, but nvtop shows device 1 under load.

2) The machine now has GPUs of two different architectures. Should I install a new MAGMA from scratch, or will it work fine in mixed mode?

3) I have seen in the testing directory that I can parse some arguments, but this part takes a really long time to execute when parsing a simple input like -N 512 or --matrix rand. What is the problem?


Stanimire Tomov

Dec 31, 2020, 5:56:21 PM
to aran nokan, MAGMA User
Hi Aran,

Happy New Year to you too!

Regarding the questions,
1) magma_setdevice sets which device will be used in subsequent computations.
   If you set it to zero, every GPU command after that will be executed on device zero
   (and device 1 will not be used, as you point out).
   magma_getdevice returns the currently set device.
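
One possible explanation for the mismatch with nvtop, offered as an assumption and not as a diagnosis of this machine: by default the CUDA runtime enumerates devices fastest-first, while NVML-based monitors typically list them in PCI bus order, so index 0 may refer to different cards in the two tools when the GPUs differ. The sketch below builds an environment with `CUDA_DEVICE_ORDER=PCI_BUS_ID` (a documented CUDA environment variable) for launching a tester. The helper names `env_for_device_order` and `run_tester` are hypothetical.

```python
import os
import subprocess


def env_for_device_order(base_env=None, order="PCI_BUS_ID"):
    """Return a copy of the environment with CUDA_DEVICE_ORDER set.

    With PCI_BUS_ID, CUDA numbers devices by PCI bus position instead of
    the default fastest-first ordering.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = order
    return env


def run_tester(cmd, env):
    """Run a command (for example a MAGMA tester) under the given environment."""
    return subprocess.run(cmd, env=env, check=False).returncode


# Building the environment has no side effects on the current process.
env = env_for_device_order({"PATH": "/usr/bin"})
print(env["CUDA_DEVICE_ORDER"])
```

A call such as `run_tester(["./testing_dgemm", "-n", "2000", "--dev", "0"], env_for_device_order())` could then be checked against nvtop to see whether the loaded device now matches.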

2) Both GPU architectures have to be in the build target so that compile flags are generated for both.
    If this was done originally, you don’t have to recompile, and the code will run on either GPU.

    You can check, for example, with the “--dev device_number” option in the testing routines, e.g., on my laptop

    Stans-MacBook-Pro:testing tomov$ ./testing_dgemm -n 2000 -c --niter 2 --dev 0
% MAGMA 2.5.4 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7000, driver 7050. MAGMA not compiled with OpenMP. 
% device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
% Thu Dec 31 17:46:01 2020
% Usage: ./testing_dgemm [options] [-h|--help]

% If running lapack (option --lapack), MAGMA and cuBLAS error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to cuBLAS result.

% transA = No transpose, transB = No transpose
%   M     N     K   MAGMA Gflop/s (ms)  cuBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  cuBLAS error
 2000  2000  2000     22.42 ( 713.75)      27.84 ( 574.64)     ---   (  ---  )    0.00e+00        ---    ok
 2000  2000  2000     24.96 ( 641.00)      27.93 ( 572.85)     ---   (  ---  )    0.00e+00        ---    ok
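
As a reading aid for the table above, the Gflop/s column can be reproduced from the time column, assuming the usual operation count of 2·M·N·K floating-point operations for a matrix multiply. The function name `gemm_gflops` is hypothetical; this is a minimal sketch of the arithmetic, not the tester's own code.

```python
def gemm_gflops(m, n, k, ms):
    """Gflop/s for an m x k by k x n matrix multiply that took `ms` milliseconds.

    Assumes 2*m*n*k floating-point operations (one multiply and one add
    per inner-product term).
    """
    flops = 2.0 * m * n * k
    seconds = ms * 1e-3
    return flops / seconds / 1e9


# MAGMA timings from the two runs in the table above.
for ms in (713.75, 641.00):
    print(f"{gemm_gflops(2000, 2000, 2000, ms):.2f}")
```

For the two MAGMA timings this gives 22.42 and 24.96, matching the reported values.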

Although I am sure it will run when you give a specific device number, it would be interesting to see whether both can be used at the same time
when they are different. You can check this with the multi-GPU codes, e.g.,

./testing_sgetrf_gpu --ngpu 2 -n 10000 -c --niter 2 -l

3) We don’t do anything unusual in the parsing, so I don’t see a reason for it to be slow.
   I just tested and didn’t notice any slowdown. It could be slow for other reasons, e.g., initialization
   of the GPU. How did you determine that it is slow, and do you see the slowdown only when you add specific
   parsing options?
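
One way to answer that question is to time the same tester with and without the extra options and compare wall-clock times. The sketch below is a hypothetical helper (`time_command` is not part of MAGMA); it measures total process time, so it cannot by itself separate argument parsing from GPU initialization.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of `cmd`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Baseline: cost of starting a trivial process on this machine.
baseline = time_command([sys.executable, "-c", "pass"])
print(f"process start-up baseline: {baseline:.3f} s")
```

Comparing, say, `time_command(["./testing_dgemm", "-N", "512"])` with the same command plus `--matrix rand` would show whether the slowdown depends on the options. If both runs are equally slow, the cost is more likely in start-up work such as GPU initialization.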

Best regards,

