dx -> dy using magma_dcopyvector yields zero values for dy

Yeonjun Jeong

Feb 10, 2025, 10:24:04 AM

to MAGMA User

Dear MAGMA team,

I wrote the attached Fortran test program, which uses 'magma_dcopyvector' to perform a device-to-device vector copy dx -> dy. When I copy dy back to the host and print it, the values differ from dx and are always zero. I wrote my own wrappers for this routine (at the bottom of the files 'magma2_zfortran.F90' and 'magma2_common.F90'), so maybe they are the problem? Your help would be appreciated.

Best regards,

Yeonjun Jeong

magma2_zfortran.F90

output.test

magma2_common.F90

Makefile.test

test_magma.F90

Andrew Cunningham

Feb 11, 2025, 10:04:21 AM

to Yeonjun Jeong, MAGMA User

Hi Yeonjun,

Why are you calling magma_queue_create with device 1? Do you have multiple GPUs in your system?

Try magma_queue_create(0, &queue)

Andrew


Yeonjun Jeong

Feb 11, 2025, 12:07:05 PM

to MAGMA User, AndrewC, MAGMA User, Yeonjun Jeong

Hi Andrew,

My system has four A100 GPUs. The values of dy are still zero after changing the device ID to 0 (or to 1, 2, or 3), and setting the 'MAGMA_NUM_GPUS' environment variable to 1 or 4 does not resolve the issue. I am able to use other routines such as 'magma_setmatrix', 'magma_dgetrf', 'magma_dgetrf_batched', and 'magma_dgemv_batched' with either a single GPU or four GPUs.

Best,

Yeonjun

Andrew Cunningham

Feb 11, 2025, 12:07:14 PM

to Yeonjun Jeong, MAGMA User

You don’t say whether anything changed when you tried magma_queue_create(0, &queue).
Have you tried running any of the testing_ C++ examples included with MAGMA that exercise those routines?
I converted your code to C++ (thanks, ChatGPT!) and it works perfectly on my single-GPU system. Try the C++ code below.
If the C++ code works, there may be a subtle issue with the .F90 interface.
Remember that in many areas MAGMA is a relatively thin interface over the underlying CUDA, so saying “MAGMA is broken” is like saying “CUDA is broken”.

#include <iostream>
#include <magma_v2.h>
#include <cstdlib>

void test_dcopyvector(int n, magma_queue_t queue);

int main() {
    std::cout << "--------------- init" << std::endl;
    magma_init();

    // Create a queue on device 0 (the first GPU).
    magma_queue_t queue;
    std::cout << "--------------- create queue" << std::endl;
    magma_queue_create(0, &queue);

    test_dcopyvector(3, queue);

    std::cout << "--------------- destroy queue" << std::endl;
    magma_queue_destroy(queue);

    std::cout << "--------------- finalize" << std::endl;
    magma_finalize();

    std::cout << "done" << std::endl;
    return 0;
}

// Copies a random host vector x to the device (dx), copies dx -> dy on the
// device, then reads dx and dy back to the host and compares both with x.
void test_dcopyvector(int n, magma_queue_t queue) {
    double *x = new double[n];  // original host data
    double *r = new double[n];  // host copy of dx
    double *s = new double[n];  // host copy of dy

    magmaDouble_ptr dx, dy;
    magma_malloc((void**)&dx, n * sizeof(double));
    magma_malloc((void**)&dy, n * sizeof(double));

    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<double>(rand()) / RAND_MAX;
    }

    magma_dsetvector(n, x, 1, dx, 1, queue);    // host x    -> device dx
    magma_dcopyvector(n, dx, 1, dy, 1, queue);  // device dx -> device dy
    magma_dgetvector(n, dx, 1, r, 1, queue);    // device dx -> host r
    magma_dgetvector(n, dy, 1, s, 1, queue);    // device dy -> host s

    // Both r and s should match x exactly (error == 0).
    for (int i = 0; i < n; ++i) {
        std::cout << " x(" << i << ") == " << x[i] << ", r(" << i << ") == " << r[i]
                  << ", error == " << r[i] - x[i] << std::endl;
        std::cout << " x(" << i << ") == " << x[i] << ", s(" << i << ") == " << s[i]
                  << ", error == " << s[i] - x[i] << std::endl;
        std::cout << std::endl;
    }

    magma_free(dx);
    magma_free(dy);
    delete[] x;
    delete[] r;
    delete[] s;
}

--------------- init
--------------- create queue
x(0) == 0.00125126, r(0) == 0.00125126, error == 0
x(0) == 0.00125126, s(0) == 0.00125126, error == 0

x(1) == 0.563585, r(1) == 0.563585, error == 0
x(1) == 0.563585, s(1) == 0.563585, error == 0

x(2) == 0.193304, r(2) == 0.193304, error == 0
x(2) == 0.193304, s(2) == 0.193304, error == 0

--------------- destroy queue
--------------- finalize
done

Andrew Cunningham

Feb 11, 2025, 12:30:14 PM

to Yeonjun Jeong, MAGMA User

Sorry, I misread your response! You see no change when using device 0.

Andrew Cunningham

Feb 11, 2025, 1:23:37 PM

to Yeonjun Jeong, MAGMA User

Hi Yeonjun

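I see your problem.

You do not have the .F90 interfaces for those routines, so your code ends up calling the “C” routines directly. That will not work because of the pass-by-value/pass-by-reference difference: the C routines take scalar arguments such as n and incx by value, while Fortran passes arguments by reference by default.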
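To make the pass-by-value/pass-by-reference point concrete, here is a minimal host-only C++ sketch (no MAGMA or GPU needed). The names c_dcopy and c_dcopy_f are hypothetical: c_dcopy stands in for a C routine that, like the C magma_dcopyvector, takes its scalar arguments by value, and c_dcopy_f plays the role of a magmaf_-style wrapper that receives every argument by reference, the way Fortran passes them by default, and dereferences the scalars before forwarding. It illustrates the calling convention only; it is not MAGMA's actual wrapper code.

```cpp
#include <cassert>

// Hypothetical C routine: scalars (n, incx, incy) are passed BY VALUE,
// as in the C interface of magma_dcopyvector. Host-only strided copy;
// positive increments only.
extern "C" void c_dcopy(int n, const double* x, int incx, double* y, int incy) {
    for (int i = 0; i < n; ++i) {
        y[i * incy] = x[i * incx];
    }
}

// Hypothetical Fortran-callable wrapper: Fortran (without BIND(C) and the
// VALUE attribute) passes every argument BY REFERENCE, so the scalars
// arrive as pointers and must be dereferenced before calling c_dcopy.
extern "C" void c_dcopy_f(const int* n, const double* x, const int* incx,
                          double* y, const int* incy) {
    assert(n != nullptr && incx != nullptr && incy != nullptr);
    c_dcopy(*n, x, *incx, y, *incy);
}

// Simulates what a Fortran "call c_dcopy_f(n, x, 1, y, 1)" does:
// the compiler passes the address of every argument, including constants.
void demo_fortran_style_call(double* y) {
    const double x[3] = {1.0, 2.0, 3.0};
    const int n = 3;
    const int inc = 1;
    c_dcopy_f(&n, x, &inc, y, &inc);
}
```

If Fortran code called c_dcopy directly instead, n and the increments would receive addresses rather than values, so the callee would read nonsense sizes and strides. That is the kind of mismatch described above.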

You need to use the magmaf_ versions, for example magmaf_dcopyvector.

They are found in control/magmablas_dfortran.F90.

Andrew

On Feb 11, 2025, at 8:45 AM, Yeonjun Jeong <yeonjun...@gmail.com> wrote:

Andrew Cunningham

Feb 12, 2025, 7:16:47 AM

to Yeonjun Jeong, MAGMA User

Hi Yeonjun,

I got your Fortran example working properly.

I did it all in Visual Studio with the Intel Fortran Compiler (IFX), but you will get the idea; it should be easier on Linux.

I believe all these gyrations should normally be unnecessary if you build MAGMA with Fortran enabled.

I am not able to do that on Windows, because issues between the CMake files and the Intel Fortran compilers prevent CMake from working in that configuration.

So I had to glue things together manually to make a relatively minimal example.

1. Create a “C” library from the xxxf77.cpp files found in magma-2.9.0/control.
   - I used the minimal set needed for this example.
   - Windows required the UPCASE #define.
   - Windows requires the .cpp files to be in a separate project from the Fortran files, built as a separate static library.

2. Modify your test program to call the needed magmaf_ functions found in the magmablas_dfortran.F90 module.
   - I created a simple module, “test_functions.f90”, that includes only the needed functions.

3. Add in the F90 files from magma-2.9.0/fortran.
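The UPCASE define matters because the Fortran compiler decides which symbol name a call such as "call magmaf_dcopyvector(...)" links against. Common conventions are lowercase with a trailing underscore (the gfortran default), plain lowercase, or all uppercase (the Intel Fortran default on Windows). The sketch below shows one way C/C++ glue code can select the exported name with a macro. FORTRAN_NAME, the ADD_/NOCHANGE/UPCASE switches, and demo_dcopy are assumptions modeled on this pattern; MAGMA's own headers may spell things differently.

```cpp
#include <cassert>
#include <cstring>

// Select the symbol-naming convention of the Fortran compiler in use.
// Default (ADD_ or no switch defined): lowercase plus trailing underscore.
#if defined(UPCASE)
  #define FORTRAN_NAME(lcname, UCNAME) UCNAME
#elif defined(NOCHANGE)
  #define FORTRAN_NAME(lcname, UCNAME) lcname
#else
  #define FORTRAN_NAME(lcname, UCNAME) lcname##_
#endif

// Turn the resolved name into a string literal so it can be inspected.
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

// Hypothetical Fortran-callable entry point; Fortran passes n by reference.
extern "C" void FORTRAN_NAME(demo_dcopy, DEMO_DCOPY)(const int* n,
                                                     const double* x,
                                                     double* y) {
    assert(n != nullptr);
    for (int i = 0; i < *n; ++i) {
        y[i] = x[i];
    }
}

// The symbol name a Fortran "call demo_dcopy(n, x, y)" would link against.
const char* demo_dcopy_symbol() {
    return STRINGIFY(FORTRAN_NAME(demo_dcopy, DEMO_DCOPY));
}
```

Built without any switch, this exports demo_dcopy_; built with -DUPCASE, the same source exports DEMO_DCOPY, which is the name a compiler using the uppercase convention would look for.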

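The relevant calls in the modified test program:

call magmaf_dsetvector( n, x, 1, dx, 1, queue )    ! host x    -> device dx
call magmaf_dcopyvector( n, dx, 1, dy, 1, queue )  ! device dx -> device dy
call magmaf_dgetvector( n, dx, 1, r, 1, queue )    ! device dx -> host r
call magmaf_dgetvector( n, dy, 1, s, 1, queue )    ! device dy -> host s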

--------------- init
--------------- create queue
x(1) == 0.3920868194323862E-06, r(1) == 0.3920868194323862E-06, error == 0.0000000000000000E+00
x(1) == 0.3920868194323862E-06, s(1) == 0.3920868194323862E-06, error == 0.0000000000000000E+00

x(2) == 0.2548044275764261E-01, r(2) == 0.2548044275764261E-01, error == 0.0000000000000000E+00
x(2) == 0.2548044275764261E-01, s(2) == 0.2548044275764261E-01, error == 0.0000000000000000E+00

x(3) == 0.3525161612610669E+00, r(3) == 0.3525161612610669E+00, error == 0.0000000000000000E+00
x(3) == 0.3525161612610669E+00, s(3) == 0.3525161612610669E+00, error == 0.0000000000000000E+00

test_magma.F90

test_functions.f90

Screenshot 2025-02-11 173319.png

Yeonjun Jeong

Feb 12, 2025, 1:03:47 PM

to MAGMA User, AndrewC, MAGMA User, Yeonjun Jeong

Hi Andrew,

I was able to get the correct results using the 'magmaf' subroutines. I didn't know that Fortran wrappers are included in the MAGMA distribution. Thank you for the help!

Best,

Yeonjun

Andrew Cunningham

Feb 12, 2025, 1:03:54 PM

to Yeonjun Jeong, MAGMA User

Great! A couple of years ago I used MAGMA with some crusty legacy Fortran 77 code and had great success getting a speedup from the GPU.
