dx -> dy using magma_dcopyvector yields zero values for dy

Yeonjun Jeong

Feb 10, 2025, 10:24:04 AM
to MAGMA User
Dear MAGMA team,

I wrote the attached Fortran test code, which uses 'magma_dcopyvector' for a device-to-device vector copy dx -> dy. However, the values of dy, copied back to the host and printed, differ from dx and are always zero. I wrote my own wrappers for this routine (at the bottom of the files 'magma2_zfortran.F90' and 'magma2_common.F90'), so perhaps they are the issue? Your help would be appreciated.

Best regards,
Yeonjun Jeong
magma2_zfortran.F90
output.test
magma2_common.F90
Makefile.test
test_magma.F90

Andrew Cunningham

Feb 11, 2025, 10:04:21 AM
to Yeonjun Jeong, MAGMA User
Hi Yeonjun,
Why are you calling magma_queue_create with 1? Do you have multiple GPUs in your system?

Try magma_queue_create(0, &queue)


Andrew





Yeonjun Jeong

Feb 11, 2025, 12:07:05 PM
to MAGMA User, AndrewC, MAGMA User, Yeonjun Jeong
Hi Andrew,

My system has four A100 GPUs. The values of dy still seem to be zero after changing the device ID to 0 (or 1, 2, 3). Setting the 'MAGMA_NUM_GPUS' environment variable to 1 or 4 does not resolve the issue. I am able to use other routines such as 'magma_setmatrix', 'magma_dgetrf', 'magma_dgetrf_batched', and 'magma_dgemv_batched' using a single GPU or four GPUs.

Best,
Yeonjun

Andrew Cunningham

Feb 11, 2025, 12:07:14 PM
to Yeonjun Jeong, MAGMA User
  • You don’t say whether anything changed when you tried magma_queue_create(0, &queue).
  • Have you tried running the testing_ C++ examples included with MAGMA that exercise those routines?
  • I converted your code to C++ (thanks, ChatGPT!) and it works perfectly on my single-GPU system. Try the C++ code below.
  • If the C++ code works, there may be a subtle issue with the .F90 interface.
  • Remember that MAGMA in many areas is a relatively thin interface over the underlying CUDA, so saying “MAGMA is broken” is like saying “CUDA is broken”.



#include <iostream>
#include <magma_v2.h>
#include <cstdlib>

void test_dcopyvector(int n, magma_queue_t queue);

int main() {
    std::cout << "--------------- init" << std::endl;
    magma_init();
    
    magma_queue_t queue;
    std::cout << "--------------- create queue" << std::endl;
    magma_queue_create(0, &queue);
    
    test_dcopyvector(3, queue);
    
    std::cout << "--------------- destroy queue" << std::endl;
    magma_queue_destroy(queue);
    
    std::cout << "--------------- finalize" << std::endl;
    magma_finalize();
    std::cout << "done" << std::endl;
    
    return 0;
}

void test_dcopyvector(int n, magma_queue_t queue) {
    double *x = new double[n];
    double *r = new double[n];
    double *s = new double[n];
    magmaDouble_ptr dx, dy;
    
    magma_malloc((void**)&dx, n * sizeof(double));
    magma_malloc((void**)&dy, n * sizeof(double));
    
    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<double>(rand()) / RAND_MAX;
    }
    
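    // Upload x, copy on the device, then download both device vectors
    magma_dsetvector(n, x, 1, dx, 1, queue);    // host x    -> device dx
    magma_dcopyvector(n, dx, 1, dy, 1, queue);  // device dx -> device dy
    magma_dgetvector(n, dx, 1, r, 1, queue);    // device dx -> host r
    magma_dgetvector(n, dy, 1, s, 1, queue);    // device dy -> host s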
    
    for (int i = 0; i < n; ++i) {
        std::cout << "    x(" << i << ") == " << x[i] << ", r(" << i << ") == " << r[i] 
                  << ", error == " << r[i] - x[i] << std::endl;
        std::cout << "    x(" << i << ") == " << x[i] << ", s(" << i << ") == " << s[i] 
                  << ", error == " << s[i] - x[i] << std::endl;
        std::cout << std::endl;
    }
    
    magma_free(dx);
    magma_free(dy);
    
    delete[] x;
    delete[] r;
    delete[] s;
}
--------------- init
--------------- create queue
    x(0) == 0.00125126, r(0) == 0.00125126, error == 0
    x(0) == 0.00125126, s(0) == 0.00125126, error == 0

    x(1) == 0.563585, r(1) == 0.563585, error == 0
    x(1) == 0.563585, s(1) == 0.563585, error == 0

    x(2) == 0.193304, r(2) == 0.193304, error == 0
    x(2) == 0.193304, s(2) == 0.193304, error == 0

--------------- destroy queue
--------------- finalize
done

Andrew Cunningham

Feb 11, 2025, 12:30:14 PM
to Yeonjun Jeong, MAGMA User
Sorry, I misread your response! You see no change when using device 0.

Andrew Cunningham

Feb 11, 2025, 1:23:37 PM
to Yeonjun Jeong, MAGMA User
Hi Yeonjun

I see your problem.

You do not have the .F90 interfaces for those routines, so your code ends up calling the “C” routines directly. That will not work because of the pass-by-value/pass-by-reference difference: the C routines expect most arguments by value, while Fortran passes arguments by reference by default.

You need to use the magmaf_ versions, for example magmaf_dcopyvector.

They are found in control/magmablas_dfortran.F90.

Andrew

On Feb 11, 2025, at 8:45 AM, Yeonjun Jeong <yeonjun...@gmail.com> wrote:

Andrew Cunningham

Feb 12, 2025, 7:16:47 AM
to Yeonjun Jeong, MAGMA User
Hi Yeonjun,
I got your Fortran example working properly.

I did it all in Visual Studio using the Intel Fortran Compiler (IFX), but you will get the idea. It should be easier on Linux.

I believe none of these gyrations should normally be necessary if you build MAGMA with Fortran enabled.

I am not able to do that on Windows, as there are issues with the CMake files and the Intel Fortran compilers that prevent CMake from working in that configuration.

So I had to manually glue things together to make a relatively minimal example.

  • Create a “C” library using the xxxf77.cpp files found in magma-2.9.0/control
    • I used the minimal set for this example
    • Windows required the UPCASE #define
    • Windows requires the .CPP files to be in a separate project from the Fortran files, built as a separate static library
  • Modify your test program to call the needed magmaf_ functions found in the magmablas_dfortran.F90 module
    • I created a simple module “test_functions.f90” that includes only the needed functions.
  • Add in the F90 files from magma-2.9.0/fortran


 call magmaf_dsetvector( n, x, 1, dx, 1, queue ) 
 call magmaf_dcopyvector( n, dx, 1, dy, 1, queue ) 
 call magmaf_dgetvector( n, dx, 1, r, 1, queue ) 
 call magmaf_dgetvector( n, dy, 1, s, 1, queue )


 --------------- init
 --------------- create queue
    x(1) ==   0.3920868194323862E-06, r(1) ==   0.3920868194323862E-06, error ==   0.0000000000000000E+00
    x(1) ==   0.3920868194323862E-06, s(1) ==   0.3920868194323862E-06, error ==   0.0000000000000000E+00

    x(2) ==   0.2548044275764261E-01, r(2) ==   0.2548044275764261E-01, error ==   0.0000000000000000E+00
    x(2) ==   0.2548044275764261E-01, s(2) ==   0.2548044275764261E-01, error ==   0.0000000000000000E+00

    x(3) ==   0.3525161612610669E+00, r(3) ==   0.3525161612610669E+00, error ==   0.0000000000000000E+00
    x(3) ==   0.3525161612610669E+00, s(3) ==   0.3525161612610669E+00, error ==   0.0000000000000000E+00

test_magma.F90
test_functions.f90
Screenshot 2025-02-11 173319.png

Yeonjun Jeong

Feb 12, 2025, 1:03:47 PM
to MAGMA User, AndrewC, MAGMA User, Yeonjun Jeong
Hi Andrew,

I was able to get the correct results using the 'magmaf' subroutines. I didn't know that there are Fortran wrappers included in the MAGMA distribution. Thank you for the help!

Best,
Yeonjun

Andrew Cunningham

Feb 12, 2025, 1:03:54 PM
to Yeonjun Jeong, MAGMA User
Great! I used MAGMA for some crusty legacy Fortran 77 code a couple of years ago and had great success getting a speedup from the GPU.