gels precision?

bcsj

Jan 29, 2024, 8:55:47 AM
to SLATE User
I've been running some small test least squares problems to track down a discrepancy we're seeing between results from the SLATE-based code and our original Python code.

As a test I solved the following problem, where A is a 10x10 matrix with the values 0, ..., 99 ordered along the rows, i.e. the first row is 0, 1, 2, ..., 9; the next is 10, 11, ..., 19; etc.

The rhs B is a vector (10x1 matrix) with the values 0, 1, ..., 9.

I added regularization by attaching a 10x10 diagonal block to A, making it 20x10, and a zero vector to B (now a 20x1 matrix). The diagonal of the attached block is the fourth root of machine epsilon (eps = 2.220446049250313e-16, diagonal element = 0.0001220703125).
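(In fact eps = 2^-52, so its fourth root is exactly 2^-13 = 0.0001220703125.)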

Solving this in MATLAB or Python (they agree), I get the solution vector

0.0345454545435952
0.0290909090894518
0.0236363636353082
0.0181818181811647
0.0127272727270212
0.00727272727287767
0.00181818181873411
0.00363636363540941
0.00909090908955286
-0.0145454545436963


But using SLATE (with slate::Matrix<double>) I get

0.034536
0.0291014
0.0236314
0.0181721
0.0127565
0.00726051
0.00182334
-0.00364526
-0.00909073
-0.0145451


This is a pretty significant error; I would have expected a much smaller difference.

I can only assume that I must be missing something, or doing something wrong somehow?

bcsj

Jan 30, 2024, 1:56:29 AM
to SLATE User, bcsj
Ah, somehow I accidentally dropped 2 negative signs from the last 3 lines of the MATLAB/NumPy result. Here are the correct values.

0.0345454545435952
0.0290909090894518
0.0236363636353082
0.0181818181811647
0.0127272727270212
0.00727272727287767
0.00181818181873411
-0.00363636363540941
-0.00909090908955286
-0.0145454545436963

bcsj

Jan 30, 2024, 4:27:26 AM
to SLATE User, bcsj
As a follow-up:

The results above were based on an earlier release of SLATE. Yesterday I pulled the 2023-11-05 SLATE release, recompiled, and ran the code with the new version, which gives more promising results.

Here is what I get with the SLATE 2023-11-05 release least squares for the same problem as above. The results now differ from the MATLAB/NumPy values only in the last few decimals, on the order of 1e-14 to 1e-12, which is much more reasonable. It really makes me wonder what was going on in the earlier version.

 0.0345454545431762
 0.0290909090931238
 0.0236363636354853
 0.0181818181785512
 0.0127272727202263
 0.0072727272769715
 0.0018181818209852
-0.0036363636355044
-0.0090909090896817
-0.0145454545438391

Mark Gates

Jan 30, 2024, 5:32:45 PM
to bcsj, SLATE User
Thanks for the error report. I don't know what the original problem may have been. Do you know what version of SLATE exhibited the error?

The usual accuracy check we use for least squares problems is the orthogonality of the residual to the column span of the matrix. That is,

    R = B - A X
    err = norm( R^H A ) / ( max( m, n, nrhs ) * norm( A ) * norm( B ) )

We use the 1-norm. See LAPACK Working Note 41, section "Tests for the Least Squares Driver Routines".
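
In SLATE that check can be assembled from the public routines (slate::copy, slate::multiply, slate::norm, conj_transpose). A rough, untested sketch; the helper name and the A0/B0/X arguments are just for illustration, assuming copies A0 and B0 of the original data were kept:

// Rough sketch, not our actual tester code. Needs <algorithm> for std::max.
// A0 is the saved m x n matrix, B0 the saved m x nrhs rhs, and X the
// n x nrhs solution block, e.g. X = B.slice( 0, n-1, 0, nrhs-1 ) after gels.
// nb, p, q are the same tile size and process grid used for the problem.
double gels_error( slate::Matrix<double>& A0, slate::Matrix<double>& B0,
                   slate::Matrix<double>& X,
                   int64_t nb, int64_t p, int64_t q )
{
    int64_t m = A0.m(), n = A0.n(), nrhs = B0.n();

    // R = B0 - A0 X
    slate::Matrix<double> R( m, nrhs, nb, p, q, MPI_COMM_WORLD );
    R.insertLocalTiles();
    slate::copy( B0, R );
    slate::multiply( -1.0, A0, X, 1.0, R );

    // C = R^H A0, an nrhs x n matrix
    slate::Matrix<double> C( nrhs, n, nb, p, q, MPI_COMM_WORLD );
    C.insertLocalTiles();
    auto RH = conj_transpose( R );
    slate::multiply( 1.0, RH, A0, 0.0, C );

    // err = norm( R^H A ) / ( max( m, n, nrhs ) * norm( A ) * norm( B ) )
    return slate::norm( slate::Norm::One, C )
           / ( std::max( { m, n, nrhs } )
               * slate::norm( slate::Norm::One, A0 )
               * slate::norm( slate::Norm::One, B0 ) );
}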

Using this error measure indeed shows that the original solution you obtained with SLATE is inaccurate:

python error 4.02e-17
matlab error 8.06e-18
slate1 error 2.12e-07  # inaccurate
slate2 error 2.90e-17

The condition number of this matrix is 4.7e6, so you shouldn't expect a terribly small forward error in x: I would expect to lose about 6 digits of accuracy. So solutions should agree in the first 10 digits or so, but could legitimately disagree in the last 6 or so.
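Back-of-the-envelope: the relative forward error can be roughly as large as cond(A) * eps ≈ 4.7e6 * 2.2e-16 ≈ 1e-9, which is where the lose-about-6-digits estimate comes from.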

If you could share your test code and what version of SLATE exhibited the error, that would greatly simplify reproducing the error, if still needed.

Mark

bcsj

Jan 31, 2024, 2:59:00 AM
to SLATE User, mga...@icl.utk.edu, SLATE User, bcsj
Hi Mark,

Thank you for following up.

I realized that I hadn't been using a "stable release" previously. I checked the first version I was using, and it appears to be a pull at this commit: 
The lapackpp version was this hash: 4206bbf

For the new version, I made sure I compiled the 2023-11-05 release-tagged version of both.

I don't know if it matters, but for completeness I should mention that I switched some environment modules on LUMI for compiling: from PrgEnv-cray/8.3.3 to PrgEnv-cray/8.4.0, and from LUMI/23.03 to LUMI/23.09.

My test code is below.


#include <cassert>
#include <cmath>
#include <iostream>
#include <iomanip>
#include <limits>
#include <vector>

#include <slate/slate.hh>
#include <slateUtils.hh>

// Fill the matrix with globally row-major sequential values:
// global entry (i, j) gets the value i * n + j.
template <typename scalar_t>
void populate(slate::Matrix<scalar_t> &A) {
    int64_t n = A.n();
    int64_t mt = A.mt();
    int64_t nt = A.nt();
    int64_t nn = 0, mm = 0, nb, mb;
    for (int64_t j = 0; j < nt; j++) {
        nb = A.tileNb(j);
        mm = 0;
        for (int64_t i = 0; i < mt; i++) {
            mb = A.tileMb(i);  // height of block row i
            if (A.tileIsLocal(i, j)) {
                A.tileGetForWriting(i, j, slate::LayoutConvert::RowMajor);
                auto tile = A(i, j);
                auto tiledata = tile.data();
                int64_t stride = tile.stride();
                for (int64_t ii = 0; ii < mb; ii++) {
                    for (int64_t jj = 0; jj < nb; jj++) {
                        int64_t kk = jj + ii * stride;
                        tiledata[kk] = (mm + ii) * n + nn + jj;
                    }
                }
            }
            mm += mb;
        }
        nn += nb;
    }
    A.tileUpdateAllOrigin();
    A.releaseWorkspace();
}


int main(int argc, char** argv) {

    // Initialize MPI, require MPI_THREAD_MULTIPLE support
    int err=0, mpi_provided=0;
    err = MPI_Init_thread( &argc, &argv, MPI_THREAD_MULTIPLE, &mpi_provided );
    assert( err == 0 && mpi_provided == MPI_THREAD_MULTIPLE );

    int64_t m = 10;
    int64_t n = m;
    int64_t nb = 4;
    int64_t p = 2;
    int64_t q = 2;

    double eps = std::numeric_limits<double>::epsilon();
    double beta = std::sqrt(eps);

    slate::Matrix<double> A(m + n, n, nb, p, q, MPI_COMM_WORLD );
    A.insertLocalTiles();

    auto H = A.slice(0, m-1, 0, n-1);
    auto R = A.slice(m, m+n-1, 0, n-1);
    populate(A);

    double gamma = std::sqrt(m * beta / n);
    std::vector<double> diag(n, gamma);
    slateUtils::matrix::diagonal(R, diag); // Sets diag as the diagonal, 0 elsewhere

    slate::Matrix<double> B(m+n, 1, nb, p, q, MPI_COMM_WORLD );
    B.insertLocalTiles();

    auto Y = B.slice(0, m-1, 0, 0);
    auto YR = B.slice(m, m+n-1, 0, 0);
    populate(Y);
    slateUtils::matrix::zero(YR); // Sets all elements to zero

    slateUtils::matrix::csv::write(A, "outA0.csv", ',', false); // Writes matrix to .csv file
    slateUtils::matrix::csv::write(B, "outB0.csv", ',', false);
   
    slate::gels<double>(A, B);

    slateUtils::matrix::csv::write(A, "outA1.csv", ',', false);
    slateUtils::matrix::csv::write(B, "outB1.csv", ',', false);

    MPI_Finalize();
    return 0;
}

bcsj

Feb 9, 2024, 6:25:29 AM
to SLATE User, mga...@icl.utk.edu, SLATE User, bcsj
Hi Mark,

Is there a good way to do this residual check after slate::gels has run? The function modifies the system matrix A and the rhs B in the process; do I need to keep copies of them for that, or can it be done with the modified information that remains?

Mark Gates

Feb 9, 2024, 9:31:24 AM
to bcsj, SLATE User
In our tester, we keep copies of A and B to compute residuals.

I will have to see whether we can compute the residuals internally. In LAPACK dgels:
 if TRANS = 'N' and m >= n, rows 1 to n of B contain the least
 squares solution vectors; the residual sum of squares for the
 solution in each column is given by the sum of squares of
 elements N+1 to M in that column;
I haven't looked at how that is computed, or whether the same can be computed easily in SLATE. It would provide a nice internal consistency check for us.
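
If slate::gels turned out to leave B in the same state as dgels (again, I haven't checked), the residual norm could be read straight off the trailing rows of B. A purely hypothetical sketch, with names of my own invention:

// Hypothetical: only valid if slate::gels leaves the residual components
// in rows n+1..m of B the way LAPACK dgels does, which is unverified.
auto Rtail = B.slice( n, m-1, 0, B.n()-1 );             // trailing m-n rows of B
double resid = slate::norm( slate::Norm::Fro, Rtail );  // sqrt of the residual sum of squares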

Mark


Interim Director, Innovative Computing Laboratory (ICL)
Research Assistant Professor, University of Tennessee, Knoxville