Wrong result from QR decomposition when cols > 32 with magma_dgeqrf_gpu and magma_dgeqrf3_gpu


Gabi Teodoru

May 12, 2025, 9:26:33 AM
to MAGMA User
Hi! I've been using the GPU QR decomposition, and it has been working perfectly on smaller matrices, but once I go past 32 columns, I start getting a lower triangular matrix back.

magma_dgeqrf_gpu2 still works fine even above that size, though it doesn't produce the taus, and I need those to solve the system.

My setup is pretty vanilla, given:
magmaDouble_ptr d_A;
hsize_t rows; 
hsize_t colsA; // rows > colsA

I run:
hsize_t k = std::min(rows, colsA);
// Allocate tau on the host
double *tau = new double[k];
magmaDouble_ptr dT;
magma_int_t nb = magma_get_dgeqrf_nb(rows, colsA);
// Use the exact formula from the documentation
magma_int_t dT_size = (2*k + (colsA + 31)/32*32)*nb;
magma_dmalloc(&dT, dT_size);
magma_int_t info;
magma_dgeqrf_gpu(rows, colsA, d_A, rows, tau, dT, &info); // has a bug for colsA > 32

I'm on Magma 2.9.0, NVIDIA GeForce RTX 4060 Laptop GPU, CUDA 12.0

Am I doing something wrong? Or is this a known bug? Or is something more complicated going on?

Many thanks! :)

Nima Sahraneshin

May 12, 2025, 9:35:00 AM
to Gabi Teodoru, MAGMA User
Hi,

I don't have that version, but reducing the default internal block/panel size to 32 and recompiling might be the quickest way to resolve the issue.

Best regards,
Nima



Natalie Beams

May 12, 2025, 9:50:43 AM
to MAGMA User, unix...@gmail.com, MAGMA User, gabit...@gmail.com
I don't think there is a bug in the routine -- you can try out MAGMA's testing_dgeqrf_gpu test (the `--version 3` option will test magma_dgeqrf3_gpu) and see how it compares to your code. I don't see any errors reported by the test for columns >= 32, with rows > cols.

What do you mean by "getting a lower triangular matrix as a return"? The dA matrix is supposed to contain (different) information in both the upper and lower parts upon exit. Are you not seeing the upper part?

It might help if you provided your entire test code so we can try to reproduce what you are seeing.

-- Natalie
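
For reference, running that tester for the problem size in question might look like the lines below, assuming MAGMA's usual tester options (`-N m,n` for a rectangular size, `-c` to check the result, and `--version` as mentioned above); the exact path depends on where MAGMA was built:

./testing/testing_dgeqrf_gpu -N 100,33 -c --version 1
./testing/testing_dgeqrf_gpu -N 100,33 -c --version 3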


Gabi Teodoru

May 12, 2025, 4:13:52 PM
to MAGMA User, Natalie Beams, unix...@gmail.com, MAGMA User, Gabi Teodoru
Hi Natalie,

Apologies if I replied twice (I first clicked "reply to author", then couldn't see any confirmation that I had posted, so I'm reposting with reply-to-all; I'm new to Google Groups, sorry!).

Thanks for your reply. I'm attaching the code, which I've compiled with

g++ -std=c++17 -O3 -Wall -DADD_ -I/usr/local/magma/include -I/usr/include/x86_64-linux-gnu/openblas-pthread -I/usr/include/hdf5/serial  -I/usr/local/cuda/include -c dbg.cpp -o dbg.o

g++ -std=c++17 -O3 -Wall -DADD_ -o dbg dbg.o -I/usr/local/magma/include -I/usr/include/x86_64-linux-gnu/openblas-pthread -I/usr/include/hdf5/serial  -I/usr/local/cuda/include -L/usr/local/magma/lib -L/usr/lib/x86_64-linux-gnu/openblas-pthread -L/usr/local/cuda/lib64 -lmagma -lnvJitLink -lopenblas -lpthread -lm -L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lcudart


Running this program generates a random matrix of the given dimensions, runs magma_dgeqrf_gpu, and outputs both matrices in .h5 format:

./dbg 100 33
MAGMA using device: 0
Device: NVIDIA GeForce RTX 4060 Laptop GPU
Saving result to A.h5
Matrix A: 100 x 33
Calling magma_dgeqrf_gpu with dimensions 100 x 33
Saving result to qr.h5


I then used this Python code to load it and compare it to the LAPACK version:

import os
import h5py
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import lapack
from pathlib import Path

root = Path(r'\\wsl.localhost\Ubuntu\home\ ... ') # set file path here

def loadHdf5Matrix(filePath, datasetName='matrix'):
    # Copy to temp location if on WSL path
    if '\\wsl' in str(filePath):
        import shutil
        tempPath = Path(os.environ['TEMP']) / filePath.name
        shutil.copy2(filePath, tempPath)
        useFilePath = tempPath
    else:
        useFilePath = filePath
   
    # Open the file
    with h5py.File(useFilePath, 'r') as f:
        # Load the entire dataset into memory
        data = np.array(f[datasetName])
    return data

def visualizeMatrix(matrix, title=None, cmap='viridis'):
    plt.figure(figsize=(10, 8))
    plt.imshow(matrix, cmap=cmap)
    plt.colorbar(label='Value')
    if title:
        plt.title(title)
    plt.tight_layout()
    plt.show()

for i in 'A,qr'.split(','):
    globals()[i] = loadHdf5Matrix(root/(i+'.h5'))
R, tau, work, info = lapack.dgeqrf(A)
visualizeMatrix(qr[:qr.shape[1]])
print(np.allclose(qr, R))


This consistently fails for 33 or more columns, but succeeds for 32 or fewer. Looking at the top square part of the output matrix, for 32 columns:

[attached image: 32cols.png]

while for 35 columns:

[attached image: 35cols.png]

(So at first I thought the result was lower triangular, but in fact it's lower triangular, plus upper triangular in the columns past the first 32.)

I also tried to run the test you suggested, but when I checked my MAGMA installation at /usr/local/magma, I found it only contains the include and lib directories:

$ ls /usr/local/magma
include  lib


Do you think you could help me from these alone? Or do you think I should try to get a more complete installation of MAGMA and run that test program?

Thanks! :)

[attachment: dbg.cpp]

Natalie Beams

May 12, 2025, 4:27:25 PM
to MAGMA User, gabit...@gmail.com, Natalie Beams, unix...@gmail.com, MAGMA User

Thanks, I will look at your code.

Regarding the MAGMA tests:
Did you build MAGMA yourself? If so, you should be able to go to the directory where you did the `make` command and do 
`make testing/testing_dgeqrf_gpu` to build the test (or, swap the test name to build any other test you want) -- building all the tests 
would take quite a bit longer, there are a lot of them, but you can do that too (`make test`).

-- Natalie


Gabi Teodoru

May 12, 2025, 5:21:39 PM
to MAGMA User, Gabi Teodoru, unix...@gmail.com, MAGMA User

Hi Natalie,

Thanks again for your help with this issue! I've made some progress investigating the problem: I had tried to compile MAGMA some time ago and the build failed midway, but some of it was still compiled, and I managed to find that test program.

I ran the testing_dgeqrf_gpu program, which passed all tests, even with 100×33 matrices, using the -c flag and --version 1.

I've tried to make my code match the testing code as closely as possible (a rough sketch of these adjustments follows the list below) by:

  • Aligning the matrix dimensions (ldda = magma_roundup(rows, 32))
  • Using the exact same dT size calculation formula
  • Initializing dT with zeros using magmablas_dlaset
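
In code form, those three adjustments might look roughly like the snippet below. This is a sketch only, not the actual contents of dbg.cpp; it assumes the variable names from the first message (rows, colsA, d_A, tau) and an already-created magma_queue_t named queue.

magma_int_t m = rows, n = colsA;
magma_int_t ldda = magma_roundup(m, 32);                   // pad the leading dimension of d_A
magma_int_t nb   = magma_get_dgeqrf_nb(m, n);
magma_int_t k    = std::min<magma_int_t>(m, n);
magma_int_t dT_size = (2*k + magma_roundup(n, 32)) * nb;   // same dT size formula as before
magmaDouble_ptr dT;
magma_dmalloc(&dT, dT_size);
magmablas_dlaset(MagmaFull, dT_size, 1, 0.0, 0.0, dT, dT_size, queue);  // zero-initialize dT
magma_int_t info;
magma_dgeqrf_gpu(m, n, d_A, ldda, tau, dT, &info);         // d_A must be allocated/copied with ldda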

Despite these changes, the issue persists in my code. Perhaps the difference is in how it was compiled? I tried to compile just the test script, but I can't get the magma_opts to work properly.

Given that the issue occurs exactly at the 32-column boundary, it seems like it might be related to a GPU block size or memory alignment issue that's triggered under certain conditions.

Thanks! :)




Natalie Beams

May 12, 2025, 6:28:46 PM
to MAGMA User, gabit...@gmail.com, Natalie Beams, unix...@gmail.com
Ah, I think I see the source of confusion now. It looks like MAGMA's comment documentation for versions 1 and 3 is incorrect (copied from the LAPACK-compliant version, version 2, it seems).

Compare the descriptions in the `testing_dgeqrf_gpu` test, when calling the different versions:

For version 1:
// stores dT, V blocks have zeros, R blocks inverted & stored in dT

For version 3 (magma_zgeqrf3_gpu):
// stores dT, V blocks have zeros, R blocks stored in dT

Only version 2 has the same output as LAPACK in terms of the R matrix (despite what the comments in src/zgeqrf_gpu.cpp and src/zgeqrf3_gpu.cpp say).

Later in the test, we see this for version 3:

if ( opts.version == 3 ) {
    // copy diagonal blocks of R back to A
    for( int i=0; i < min_mn-nb; i += nb ) {
        magma_int_t ib = min( min_mn-i, nb );
        magmablas_zlacpy( MagmaUpper, ib, ib, &dT[min_mn*nb + i*nb], nb, &d_A[ i + i*ldda ], ldda, opts.queue );
    }
}

Then we get the A output (copied into h_R here) to be what we expect. Version 1 is meant to be used to solve a system Ax = b, not to generate the Q and R matrices directly.
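
(As an aside, since the original goal was solving a system: below is a minimal sketch of that path using the version-1 output directly via MAGMA's magma_dgeqrs_gpu least-squares driver, so the R blocks never need to be copied back. It assumes the variable names from the first message, a host right-hand side array b, and a magma_queue_t queue; the call signature and workspace bound should be checked against your MAGMA headers before use.)

// Minimal sketch, not from the thread: solve min ||A x - b|| with the
// version-1 output (d_A, tau, dT); d_A is assumed stored with leading
// dimension rows, as in the first snippet.
magma_int_t m = rows, n = colsA, nrhs = 1, info;
magmaDouble_ptr dB;
magma_dmalloc(&dB, m * nrhs);
magma_dsetmatrix(m, nrhs, b, m, dB, m, queue);            // b: host right-hand side (hypothetical name)
magma_int_t nb    = magma_get_dgeqrf_nb(m, n);
magma_int_t lwork = (m - n + nb)*(nrhs + nb) + nrhs*nb;   // documented lower bound (please verify)
std::vector<double> hwork(lwork);
magma_dgeqrs_gpu(m, n, nrhs, d_A, m, tau, dT,
                 dB, m, hwork.data(), lwork, &info);
// On exit, the first n entries of dB hold the solution x.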

Sorry for the confusion.  If you have a GitHub account, feel free to open an issue about the documentation/confusion (https://github.com/icl-utk-edu/magma). 
If you are not a GitHub user, I can open an issue.


-- Natalie





Natalie Beams

May 12, 2025, 6:36:23 PM
to MAGMA User, Natalie Beams, gabit...@gmail.com, unix...@gmail.com
Whoops, copied too quickly -- I got the code for copying the blocks from testing_zgeqrf_gpu, not the double version. So the call should be magmablas_dlacpy, not magmablas_zlacpy. Then d_A has the expected format, and it's all copied to h_R.


Gabi Teodoru

May 13, 2025, 12:18:52 AM
to MAGMA User, Natalie Beams, unix...@gmail.com
Thank you so much!

Yes, I've switched to version 3 and put

for( int i=0; i < k-nb; i += nb ) {
    magma_int_t ib = std::min( k-i, nb );
    magmablas_dlacpy( MagmaUpper, ib, ib, &dT[k*nb + i*nb], nb, &d_A[ i + i*ldda ], ldda, queue );
}

into my code and now the results are the same as LAPACK's.

I have created an issue at: https://github.com/icl-utk-edu/magma/issues/49

Thanks again! :)