It seems that, by default, DenseMatrix * DenseVector allocates a new vector for the result. In almost all iterative optimizers, though, the workspace memory bundled in the State could be reused instead. Today we are using DenseMatrix, but the same problem will show up for CSCMatrix as well.
Is it possible to have an in-place alternative that writes into a preallocated result?
Even if I use OpMulMatrix.Impl2[M, T, T] and write res := mult(ata, x), the output of gemv is still allocated as a temporary and then copied into res. I already have the memory for res allocated in the State, so what I want is something like mult(ata, x, res).
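To make the requested semantics concrete, here is a minimal plain-Scala sketch (no Breeze or BLAS dependency) of a gemv-style multiply that writes into a caller-owned output, following the BLAS convention y := alpha * A * x + beta * y. The name `gemvInto` is hypothetical, and the matrix is assumed to be stored column-major in a flat `Array[Double]`, the layout Breeze's DenseMatrix uses by default.

```scala
object InPlaceGemv {
  /** Hypothetical helper: y := alpha * A * x + beta * y, written into y with no allocation.
    * A is rows x cols, stored column-major in `a` (element (i, j) at a(i + j * rows)). */
  def gemvInto(a: Array[Double], rows: Int, cols: Int,
               x: Array[Double], y: Array[Double],
               alpha: Double = 1.0, beta: Double = 0.0): Unit = {
    require(a.length == rows * cols, "matrix data size mismatch")
    require(x.length == cols && y.length == rows, "vector size mismatch")
    // Scale the existing contents of y by beta first; beta = 0 simply overwrites y.
    var i = 0
    while (i < rows) {
      y(i) = if (beta == 0.0) 0.0 else beta * y(i)
      i += 1
    }
    // Accumulate column by column, which walks `a` contiguously in column-major order.
    var j = 0
    while (j < cols) {
      val s = alpha * x(j)
      val off = j * rows
      var r = 0
      while (r < rows) {
        y(r) += a(off + r) * s
        r += 1
      }
      j += 1
    }
  }

  def main(args: Array[String]): Unit = {
    // 2 x 2 matrix [[1, 2], [3, 4]] in column-major order.
    val ata = Array(1.0, 3.0, 2.0, 4.0)
    val x = Array(1.0, 1.0)
    val res = new Array[Double](2) // allocated once, e.g. kept in the optimizer State
    gemvInto(ata, 2, 2, x, res)
    println(res.mkString(", ")) // prints 3.0, 7.0
  }
}
```

Because `res` is allocated once and only overwritten, calling this inside an optimizer loop produces no per-iteration garbage, which is the behavior mult(ata, x, res) would need.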
I see that something like this is done for axpy through InPlaceImpl3, but that is only defined for vector operations of the form y := a * x + y:
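```scala
// Breeze's axpy: updates y in place to y + a * x by delegating to scaleAdd.InPlaceImpl3
def axpy[A, X, Y](a: A, x: X, y: Y)(implicit axpy: scaleAdd.InPlaceImpl3[Y, A, X]): Unit = { axpy(y, a, x) }
```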
We need equivalent in-place operators for matrices.
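As a hypothetical sketch of what a matrix analogue of the InPlaceImpl3 pattern might look like: a type class whose instance writes m * x into a caller-owned `out` instead of returning a new vector, plus an axpy-style entry point that resolves it implicitly. `Dense`, `MulInto`, and `mult` are made-up names for illustration, not Breeze API, and the code is plain Scala with no Breeze dependency.

```scala
object InPlaceMulSketch {
  // Hypothetical minimal dense matrix with column-major storage.
  final case class Dense(rows: Int, cols: Int, data: Array[Double])

  // Hypothetical type class, similar in spirit to Breeze's InPlaceImpl3:
  // the result of m * x is written into `out`, which the caller owns.
  trait MulInto[M, V] {
    def apply(m: M, x: V, out: V): Unit
  }

  object MulInto {
    implicit val denseMulInto: MulInto[Dense, Array[Double]] =
      new MulInto[Dense, Array[Double]] {
        def apply(m: Dense, x: Array[Double], out: Array[Double]): Unit = {
          require(x.length == m.cols && out.length == m.rows, "dimension mismatch")
          java.util.Arrays.fill(out, 0.0) // overwrite whatever was in the workspace
          var j = 0
          while (j < m.cols) {
            val xj = x(j)
            var i = 0
            while (i < m.rows) {
              out(i) += m.data(i + j * m.rows) * xj
              i += 1
            }
            j += 1
          }
        }
      }
  }

  // Call site mirroring axpy(a, x, y): the instance is found in MulInto's companion.
  def mult[M, V](m: M, x: V, out: V)(implicit impl: MulInto[M, V]): Unit = impl(m, x, out)
}
```

With one instance per (matrix type, vector type) pair, a CSCMatrix version could be added later without changing call sites.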
Should I list them and open an issue? For now I think I need in-place versions of these five:
1. dgemv: dense matrix-vector multiply
2. dgetrf: LU factorization, cached in the State
3. dpotrf: Cholesky factorization, cached in the State
4. dgetrs: triangular solves using the cached LU factors
5. dpotrs: triangular solves using the cached Cholesky factor
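For the Cholesky pair (dpotrf/dpotrs), the in-place contract can be sketched in plain Scala as follows, assuming a column-major n x n symmetric positive-definite matrix. The factor overwrites the lower triangle of the input, and the solve overwrites the right-hand side, so neither step allocates. `factorInPlace` and `solveInPlace` are hypothetical names, and this is a textbook loop, not a LAPACK binding.

```scala
object InPlaceCholesky {
  /** dpotrf-style (lower): overwrite the lower triangle of the n x n column-major SPD matrix `a`
    * with L such that A = L * L^T. The strict upper triangle is left untouched. */
  def factorInPlace(a: Array[Double], n: Int): Unit = {
    require(a.length == n * n, "matrix data size mismatch")
    var j = 0
    while (j < n) {
      // Diagonal: L(j,j) = sqrt(A(j,j) - sum_{k<j} L(j,k)^2)
      var d = a(j + j * n)
      var k = 0
      while (k < j) { val l = a(j + k * n); d -= l * l; k += 1 }
      require(d > 0.0, s"matrix is not positive definite (pivot $j)")
      val ljj = math.sqrt(d)
      a(j + j * n) = ljj
      // Below the diagonal: L(i,j) = (A(i,j) - sum_{k<j} L(i,k) * L(j,k)) / L(j,j)
      var i = j + 1
      while (i < n) {
        var s = a(i + j * n)
        k = 0
        while (k < j) { s -= a(i + k * n) * a(j + k * n); k += 1 }
        a(i + j * n) = s / ljj
        i += 1
      }
      j += 1
    }
  }

  /** dpotrs-style: given L from factorInPlace, overwrite `b` with the solution of A x = b
    * via a forward solve (L y = b) followed by a back solve (L^T x = y). */
  def solveInPlace(l: Array[Double], n: Int, b: Array[Double]): Unit = {
    require(l.length == n * n && b.length == n, "dimension mismatch")
    // Forward substitution with L.
    var i = 0
    while (i < n) {
      var s = b(i)
      var k = 0
      while (k < i) { s -= l(i + k * n) * b(k); k += 1 }
      b(i) = s / l(i + i * n)
      i += 1
    }
    // Back substitution with L^T, where L^T(i, k) = L(k, i) = l(k + i * n).
    i = n - 1
    while (i >= 0) {
      var s = b(i)
      var k = i + 1
      while (k < n) { s -= l(k + i * n) * b(k); k += 1 }
      b(i) = s / l(i + i * n)
      i -= 1
    }
  }
}
```

The factor can stay in the State across iterations, so repeated solves reuse it. The LU pair (dgetrf/dgetrs) follows the same pattern, except that the State would also need a preallocated integer array for the pivot indices, which LAPACK's dgetrf writes into its IPIV argument.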