First of all, thank you for the work and the initiative. Lack of
standard C++ API for these two libraries is a real concern.
A/ Some comments
I do agree with the report and the chosen approach. In peculiar I
think one must stick to basic function template to define the API and
do not introduce too much C++ "Meta-Programming" at this stage.
1/ Blaze library I regret that the Blaze library (
https://bitbucket.org/blaze-lib/blaze)
is not cited in "2.3/ Programming Language C++". IMHO Blaze
performance and code quality would worth a citation. The Expression
Templates approach is also well detailed in two companion publications
(cited at the end of the html page).
2/ On non-const reference of a temporaryPage 16 of the report I read "Because C++ can’t take a non-const
reference of a temporary, the output submatrix of each call must be a
local variable". I think this is misleading. Maybe this was true when
the Elemental lib was created, but this is no longer the case. Now
with C++11 rvalue reference
(
http://en.cppreference.com/w/cpp/language/reference)
you can write code like:
class Matrix {
 public:     Matrix view();
};
El::Herk (..., Matrix&& matrix) { ... }
//-------------
Matrix m;
El::Herk(...,m.view())
without having to use a local variable to store m.view(), as you wrote
in the report.
3/ Error managementConcerning the design, IMHO I think that there is still room for improvement.
First I think there is a benefit of splitting "errors" into two categories.
See
https://isocpp.org/wiki/faq/exceptions#how-exceptions and
https://isocpp.org/wiki/faq/exceptions#why-not-exceptions for instance
3.1/ Error related to caller direct responsibility:
For instance calling a routine with a negative size. I think this kind of errors must result in pre-condition/assertion failure. The caller has to fix its code!
3.2/ Run-time numerical error/integer overflow error...In that cases, exceptions make more sense. For instance If a matrix is too ill-conditioned, throw an exception.
However I think the proposed implementation (p 33) can be problematic to process on the
caller side.
Take for instance:
throw_if_(n,std::numeric_limits < blas_int >::max ());And imagine a user calling Lapack using your API:
try {
 lapack_subroutine(...)
}
catch(const Error& e) {
// Now how to identify the problem?
if(e.what()=="std::numeric_limits < blas_int >::max()") {
 // ...
}
} I think this is problematic at least two reasons:
a/ The string "
std::numeric_limits < blas_int >::max()" can change.
For instance if you use automatic code formatting tool, maybe the
returned string will be "
std::numeric_limits<blas_int>::max()"
b/ If you have error like a matrix element is NaN or a "pivot is too
small" this would be nice to be able to retrieve which row/column is
concerned. With the current implementation this extra information is
lost. One need a mechanism to store this information in the returned
exception. (see
http://www.boost.org/doc/libs/1_64_0/libs/exception/doc/boost-exception.htmlfor instance).
B/ Fix minor problems of the current Blas/Lapack libs?The main reason of my post is that, as a Blas/Lapack C++ user, I am
always annoyed with some problems I will try to give some examples below. Some
of them are minor and maybe this effort is an opportunity to fix them.
1/ About NoTrans, Trans, TrancConj... (p 27)
1.1/ NoTrans, Trans, TrancConj are not closed for composition operator
"Conjugation" is missing and this is a barrier to develop general code
involving operator composition. For instance, I want to be able to
write:
Conj = Trans( TransConj )
Also BLAS does not define operations like:
y=a.conjugate(M).x+b.y
1.2/ Better names, for instance "NoTrans"="Identity"?
IMHO NoTrans, as you mention it, is not a good name. Why not simply use "Identity" and take the opportunity now to define:
"Op" with Identity, Transpose, TransConjugate, Conjugate readable names?
IMHO this would improve code readability.
In summary maybe define:
enum class Op : char { Identity = 'N', Transpose = 'T', TransConjugate = 'C', Conjugate = 'char_to_define_' };
For "Conjugate" operation which is not currently supported you
can throw an "not_implemented_yet" exception.
2/ There are some inconsistencies in the Blas API
2.1/ For instance, consider
- TRMV computes x <- op(A).x
- TRMM computes X <- alpha.op(A).X
We have a presence/absence of the "alpha" scalar if X is a vector or a matrix. Why not simply define the same:
x <- alpha.op(A).X
pattern for both cases?
2.2/ I also would love to have a general subroutine of the form:
C <- alpha.Op(A).B + beta.C
It exists for A=generic, symmetric/hermitian matrix but not for triangular ones.
However when A is triangular, we are not always interested by B in-place matrix product (the TRMM subroutines)
This lack of API uniformity is still a problem when we try to write generic code.
3/ Vocabulary: n and m
In Europe n is the number of rows and m the number of columns. I think in USA this is the reverse.
The US convention (m = # row, n # col) is certainly more logical, for instance in M_{i,j} of a m x n matrix, the letters pair (i,j) & (m,n) are following the same alphabetical order.
 However this is a little bit disturbing for an European reader. Maybe using I_size and J_size for row & column size removes this problem and every one is happy.
4/ Concerning Lapack/Lapacke
I was disappointed by Lapacke. It pretends to also manage RowMajor
format, but under the hood, it simply copies the matrix into the
ColMajor format (as you wrote in your report). This has a performance
impact and this is not really fair.
Personally when I code something using "Lapack", I stick to the
ColMajor mode for this reason, even if I use the Lapack_e_ interface.
Do you think that with time we will have a real RowMajor mode support for Lapack (without
temporary copies)? Is it one of your goal?
C/ Evolution of Blas/Lapack? Blis/Flame like approaches.As a user, I really like the approach taken by Blis
(
https://github.com/flame/blis). It supports "Conjugate" operator (my
1.1/ point) and general "stride" in both directions.
In this context a matrix is defined with two "strides" (not only with one "leading dimension"):
double *data,
int I_dim, int J_dim,
int I_stride, int I_stride
With that, you have generic in-place transposition and generic 2D
views. I think this is an incredible advantage in some applications
involving general tensors or multidimensional datasets.
I understand that this is out of scope and not supported by
Blas/Lapack.
Do you think you will go in this direction?
If so, the RowMaj and ColMajor enum is not relevant anymore:
ColMajor <=>Â (I_stride==1)RowMajor <=> (J_stride==1)
Maybe it is time to define a generic interface:
void gemv(
         const Op transA,
         const int64_t A_I_size, const int64_t A_J_size,
         const T alpha,
      const T const* A, const int64 A_I_stride, const int64 A_J_stride,
         const T const* x, const int64 x_stride,
         const T beta,
         T const* y, const int64 y_stride)
and return a "not_implemented_yet" exception for general strides (id (col_stride!=1)&&(row_stride!=1))
   Â
Again thank you for the work and for encouraging such open discussions.
Good luck & success for this project!
Best,
Vincent Picaud