QR Decompositions

5 views
Skip to first unread message

Kevin Ingles

unread,
Jun 21, 2025, 12:16:44 AMJun 21
to SLATE User
I am trying to just setup a simple script to make sure I understand how LAPACK++ expects matrix memory layout, and have encountered a frustrating and confusing bug:

As an example, I use the matrix given on the QR Factorization Wikipedia page:

#include <lapack.hh>

int main() {
   // Row major
    float a[] = { 12.0f, -51.0f, 4.0f, 6.0f, 167.0, -68.0f, -4.0f, 24.0f, -41.0f};
    // Column major
    // float a[] = { 12.0f, 6.0f, -4.0f, -51.0f, 167.0f, 24.0f, 4.0f, -68.0f, -41.0f };
    float t[3];
    int64_t return_code = lapack::gelq2(3, 3, a, 3, t);
    std::cout << return_code << "\n";
    for (int i = 0; i < 3; ++i)
    {
        for (int j = 0; j < 3; ++j)
            std::cout << "\t" << a[i * 3 + j] << " ";
         std::cout << "\n";
    }
    return return_code;
}

Row major returns garbage, while column major shows the correct value _below_ the diagonal and differs by an overall minus sign:

        -14     0.230769        -0.153846
        -21     -175    0.0555556
        14      70      -35

Could someone explain to me what I am doing incorrectly?

Mark Gates

unread,
Jun 21, 2025, 1:17:58 AMJun 21
to Kevin Ingles, SLATE User
LAPACK, hence LAPACK++, is column-major. In most cases, the only way we could support row-major is by physically transposing the matrix in memory. QR in particular is much faster in col-major than row-major, since row-major has strided access. This is unlike BLAS++, which can support both col-major and row-major with close to zero overhead.

QR is unique up to signs. LAPACK is picking a different sign than shown on Wikipedia. There are numerical stability reasons for choosing one sign over the other, although there are algorithms that always make the diagonal of R positive (geqrfp, p for positive).

Note you are printing it out as if row-major, so the output is transposed. It should be
    a[ i + j*lda ]
with lda = 3 in this case.

I noticed you are using gelq2, which is a Level 2 BLAS implementation (read: slow) of LQ, instead of QR. LQ is basically the transpose of QR. It's recommended to use geqrf for QR, which is a Level 3 BLAS implementation (read: fast).

Mark

Interim Director, Innovative Computing Laboratory (ICL)
Research Assistant Professor, University of Tennessee, Knoxville

Reply all
Reply to author
Forward
0 new messages