Dealing with errors in C-based LAPACK libraries

Brendan Tracey

Aug 11, 2015, 2:08:11 PM
to gonum-dev
I found a bug in the LAPACK implementation in one of the LU-related routines [0]. The bug is also present in OpenBLAS, and I'm not sure of the correct way to handle it.

On the one hand, not all C-based implementations necessarily have this bug, so it seems wrong to change the function's behavior just to work around it. On the other hand, the current failure only prints a warning and the function otherwise proceeds as normal, so it is easy to get an incorrect result. It seems dangerous to allow this behavior. Furthermore, do other implementations of row-major LAPACK exist? If all C-based implementations have the same wrong behavior, then we can just work around it.

Thoughts? If the fix lands quickly enough then it doesn't matter, but even then there will likely be many users who have not updated their LAPACK library.

[0] https://github.com/xianyi/OpenBLAS/issues/615

Dan Kortschak

Aug 11, 2015, 5:22:01 PM
to Brendan Tracey, gonum-dev
Mark the cgo routine with a BUG comment explaining the issue.
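
For illustration, such a comment might look like the sketch below. godoc collects top-level comments of the form `BUG(who)` into a "Bugs" section of the package documentation. The function `luFactorize`, the `uid` placeholder, and the `errBadShape` value are hypothetical names chosen for this example, not gonum's actual code, and the call into C is elided.

```go
package main

import "errors"

// errBadShape is a hypothetical error value used only in this sketch.
var errBadShape = errors.New("lapack: bad matrix shape")

// BUG(uid): With some C LAPACK implementations the LU-based routines may
// return incorrect results (reported for m < n), printing only a warning.
// See https://github.com/xianyi/OpenBLAS/issues/615.

// luFactorize is a hypothetical stand-in for a cgo-backed LU routine
// operating on an m×n row-major matrix stored in a.
// Only the argument checking is shown here.
func luFactorize(m, n int, a []float64) error {
	if m < 0 || n < 0 || len(a) < m*n {
		return errBadShape
	}
	// The call into the C library would go here.
	return nil
}
```

The comment does not change behaviour; it only makes the known problem visible to anyone reading the package documentation.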

Brendan Tracey

Aug 11, 2015, 5:34:24 PM
to Dan Kortschak, gonum-dev
But how does that propagate to mat64, for example? We’ve been moving to LAPACK-based implementations. Do we document that Solve for m < n does not work on some clapack implementations? At that point the bug isn’t in mat64, but that’s where the user is likely to encounter it.

Dan Kortschak

Aug 11, 2015, 5:44:30 PM
to Brendan Tracey, gonum-dev
Yes, put a BUG comment there too, explaining that the behaviour depends on the backing implementation and that for some cgo implementations it may be wrong. When it's fixed upstream, we remove the comment and add documentation to reinforce the notion that the backing libraries may be wrong and that we only guarantee correct behaviour when the backing BLAS and LAPACK are correct (and point users to testing their code first with the native implementation).