QR Decompositions

8 views

Skip to first unread message

Kevin Ingles

unread,

Jun 21, 2025, 12:16:44 AMJun 21

to SLATE User

I am trying to just setup a simple script to make sure I understand how LAPACK++ expects matrix memory layout, and have encountered a frustrating and confusing bug:

As an example, I use the matrix given on the QR Factorization Wikipedia page:

#include <lapack.hh>

int main() {

// Row major
float a[] = { 12.0f, -51.0f, 4.0f, 6.0f, 167.0, -68.0f, -4.0f, 24.0f, -41.0f};
// Column major

// float a[] = { 12.0f, 6.0f, -4.0f, -51.0f, 167.0f, 24.0f, 4.0f, -68.0f, -41.0f };
float t[3];
int64_t return_code = lapack::gelq2(3, 3, a, 3, t);
std::cout << return_code << "\n";
for (int i = 0; i < 3; ++i)
{
for (int j = 0; j < 3; ++j)
std::cout << "\t" << a[i * 3 + j] << " ";
std::cout << "\n";
}
return return_code;
}

Row major returns garbage, while column major shows the correct value _below_ the diagonal and differs by an overall minus sign:

-14 0.230769 -0.153846
-21 -175 0.0555556
14 70 -35

Could someone explain to me what I am doing incorrectly?

Mark Gates

unread,

Jun 21, 2025, 1:17:58 AMJun 21

to Kevin Ingles, SLATE User

LAPACK, hence LAPACK++, is column-major. In most cases, the only way we could support row-major is by physically transposing the matrix in memory. QR in particular is much faster in col-major than row-major, since row-major has strided access. This is unlike BLAS++, which can support both col-major and row-major with close to zero overhead.

QR is unique up to signs. LAPACK is picking a different sign than shown on Wikipedia. There are numerical stability reasons for choosing one sign over the other, although there are algorithms that always make the diagonal of R positive (geqrfp, p for positive).

Note you are printing it out as if row-major, so the output is transposed. It should be

a[ i + j*lda ]

with lda = 3 in this case.

I noticed you are using gelq2, which is a Level 2 BLAS implementation (read: slow) of LQ, instead of QR. LQ is basically the transpose of QR. It's recommended to use geqrf for QR, which is a Level 3 BLAS implementation (read: fast).

Mark

Interim Director, Innovative Computing Laboratory (ICL)

Research Assistant Professor, University of Tennessee, Knoxville

https://icl.utk.edu/~mgates3/

Reply all

Reply to author

Forward

0 new messages