Re: Structured linear algebra etc

Aki Vehtari

Aug 12, 2016, 3:56:30 AM8/12/16
to stan-dev, Rob Trangucci, vdo...@gmail.com
This discussion started on Twitter, moved to email, but let's continue on stan-dev.

I also noticed you have been working on speeding up the gradient
computation of regular matrix-matrix multiplications. How about
gradients of vector-matrix multiplications? How about gradients of
unary functions of matrices (e.g., the elementwise operations exp(A) or
A.^2), which currently produce a big expression tree? These changes
would make user-defined vectorized covariance functions and neural
networks (which Bob just tested) faster.
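
To make the expression-tree point concrete, here is the kind of
user-defined vectorized covariance function I have in mind (a sketch;
cov_se is a made-up name and D is a precomputed matrix of pairwise
distances):

    functions {
      // hypothetical vectorized covariance; even this one-liner adds an
      // autodiff node per matrix entry for exp(), .*, and each scaling
      matrix cov_se(matrix D, real alpha, real rho) {
        return square(alpha) * exp(-0.5 * (D .* D) / square(rho));
      }
    }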

Do you have a specific application where Toeplitz is applicable? I
have found it rare to be able to use the Toeplitz form, although we
have one case in GPstuff, too.

Dense SPD matrices are common for my GP models. Matlab doesn't have a specific
data type for symmetric matrices, but checks symmetry on the fly and,
based on that, chooses an appropriate algorithm from BLAS/LAPACK. But
since Stan uses strong typing, it would be natural to have new matrix
types, too. For my GP models the speedup would come mostly from
computing the gradients with matrix derivatives instead of elementwise
autodiff. Taking the symmetry into account would help a bit more.

For Markov random fields, support for sparse precision matrix
computations similar to what INLA does would be a huge speed-up (again
assuming a small expression tree). There are tons of useful MRF models,
which would be nice to have in Stan.

For GPs with compact support covariance functions, support for sparse
covariance matrices might be useful, too.

I can also talk with you on a Google Hangout, but the current Stan meeting
time is not good for me.

Aki

On 10.08.2016 23:12, Rob Trangucci wrote:
> Ha, yes I should've guessed that Twitter wouldn't be the best medium to discuss
> the Stan development roadmap for structured linear algebra. I cc'd Vince Dorie,
> who's working on this stuff as well.
>
> Thus far, we've just started exploring specializing the Cholesky decomposition
> for Toeplitz SPD matrices, i.e.
>
> 1. cholesky_decompose(A) where A is SPD Toeplitz
>
> and
>
> 2. eigen_decompose(A) where A is SPD
>
> The easy (as in not requiring matrix calculus) speed-up for 1 comes from a
> specialized SPD Toeplitz Cholesky decomposition. A secondary, but harder (as in
> requiring matrix calculus), speed-up would be a specialized gradient
> calculation of cholesky_decompose(A) w.r.t. an SPD Toeplitz A. Do you know of any
> implementation of d cholesky_decompose(A) / dA for structured A?
>
> The speed-up for 2 comes from just writing out the explicit gradients d D / d A
> and d U / d A, where A = U * D * U^(-1) and A is SPD. Currently we're
> relying on naively autodiffing through Eigen's implementation of the
> eigendecomposition. This would immediately make the methods outlined here:
> http://sethrf.com/files/fast-hierarchical-GPs.pdf much speedier (I know this is
> a GP rather than a Gaussian random field).
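>
> If I have the standard first-order perturbation result right (assuming
> distinct eigenvalues, and noting U^(-1) = U^T for SPD A), the gradients
> in question are
>
>   d\lambda_k = u_k^\top \, dA \, u_k, \qquad
>   dU = U \, \big( F \circ (U^\top \, dA \, U) \big), \quad
>   F_{ij} = 1 / (\lambda_j - \lambda_i) \ (i \ne j), \quad F_{ii} = 0.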
>
> The next stage would be to specialize the eigendecomposition for SPD Toeplitz.
>
> This is the current roadmap for structured linear algebra as it stands. 1 and 2
> will likely happen in parallel, with Vince doing the heavy lifting on 1 and me
> doing the heavy lifting on 2. We haven't actually written code for either yet,
> though my guess is that 2 won't be too hard, whereas specializing any of
> the decompositions for specific structured matrices requires adding new matrix
> types to Stan, which won't be quite as easy. We're also stymied by the fact
> that Eigen doesn't have specialized solvers for structured matrices.
>
>
> Rob
>
> On Wed, Aug 10, 2016 at 3:46 PM, Aki Vehtari <Aki.V...@aalto.fi> wrote:
>
> Hi Rob,
>
> Based on your tweet
> "Cholesky factor of SPD Toeplitz mats and eigen decompositions of SPD mats!
> @avehtari what else we should focus on?"
>
> and the git action, you've started working on structured linear algebra. I
> haven't been able to join the Stan meetings, so could you tell me what you
> have already discussed, what you are already doing, what you are
> planning, and how you think these would affect speed? Then it's easier
> for me to say where to focus and what might be missing :)
>
> Aki
>
>

Bob Carpenter

Aug 12, 2016, 5:59:07 AM8/12/16
to stan...@googlegroups.com

> On Aug 12, 2016, at 9:56 AM, Aki Vehtari <Aki.V...@aalto.fi> wrote:
>
> This discussion started on Twitter, moved to email, but let's continue on stan-dev.
>
> I also noticed you have been working on speeding up the gradient
> computation of regular matrix-matrix multiplications. How about
> gradients of vector-matrix multiplications?

That'll be next, I'm sure. Not nearly as much latitude
for speedups there, though.

> How about gradients of
> unary functions of matrices (e.g., the elementwise operations exp(A) or
> A.^2), which currently produce a big expression tree?

These have to at least create nodes in the expression graph
for exp(a[n]) for each n. We could probably eliminate the
virtual function calls with some fancy footwork. That's not how
the current vectorization framework works, and I haven't thought
about how to generalize it, but it's a possibility.

> These changes
> would make user-defined vectorized covariance functions and neural
> networks (which Bob just tested) faster.

If we really cared about fast neural networks, we could just
code up the layers as custom functions rather than building them
up out of tanh(alpha * beta) [alpha a vector, beta a matrix, tanh
vectorized].
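
As a sketch of the difference (all names made up; I've made alpha a
row_vector so the multiply type-checks):

    // today: generic vectorized ops, one expression-graph node per element
    row_vector[H] hidden;
    hidden = tanh(alpha * beta);   // alpha : row_vector[K], beta : matrix[K, H]

    // hypothetical fused layer computing the whole thing, plus its
    // adjoint, as a single precomputed-gradient node:
    // hidden = tanh_layer(alpha, beta);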

> Do you have a specific application where Toeplitz is applicable? I
> have found it rare to be able to use the Toeplitz form, although we
> have one case in GPstuff, too.

I thought the use case was for time-series GPs. For that,
we'd want to represent a symmetric K x K Toeplitz matrix as
a K-vector (its first row).
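
A sketch of expanding that representation to the dense matrix (the
helper name sym_toeplitz is made up):

    functions {
      // hypothetical: build the dense symmetric Toeplitz matrix from its
      // first row r, using T[i, j] = r[|i - j| + 1]
      matrix sym_toeplitz(vector r) {
        matrix[rows(r), rows(r)] T;
        for (i in 1:rows(r)) {
          for (j in 1:rows(r)) {
            T[i, j] = r[abs(i - j) + 1];
          }
        }
        return T;
      }
    }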

> Dense SPD matrices are common for my GP models. Matlab doesn't have a specific
> data type for symmetric matrices, but checks symmetry on the fly and,
> based on that, chooses an appropriate algorithm from BLAS/LAPACK. But
> since Stan uses strong typing, it would be natural to have new matrix
> types, too.

cov_matrix is the SPD matrix type in Stan. A type in Stan
is a combination of a base type (int, real, vector, row_vector, matrix)
and a non-negative integer representing the number of array dimensions.
Declarations like cov_matrix (for symmetric, positive-definite matrices [I
just submitted the pull request to fix symmetry]) only determine how
parameters are transformed; for other variable types they only provide
error checking. Local variables can't be declared with constraints.
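
Concretely, the asymmetry looks like this (a minimal sketch):

    parameters {
      cov_matrix[K] Sigma;   // transformed from an unconstrained vector,
                             // so SPD-ness holds by construction
    }
    model {
      matrix[K, K] A;        // locals can't carry constraints, so any
                             // function needing A to be SPD must check it
    }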

So what we've done is leave it up to the user and write functions
that apply to specific types of matrices. Ben's been complaining about
this from the get-go, because all those operations check the validity
of their input and can thus slow down processing. We also aren't doing
things like saving the factorization of a matrix once we've computed
it.

Changing any of this would be possible, but a very substantial undertaking.

> For my GP models the speedup would come mostly from computing
> the gradients with matrix derivatives instead of elementwise
> autodiff. Taking the symmetry into account would help a bit more.

Almost all of our matrix operations use matrix derivatives now.

> For Markov random fields, support for sparse precision matrix
> computations similar to what INLA does would be a huge speed-up (again
> assuming a small expression tree).

I don't know what this is. Is there a reference somewhere I'll
be able to understand? (Assume I was a mediocre student in undergrad
linear algebra, at best.)

> There are tons of useful MRF models,
> which would be nice to have in Stan.
>
> For GPs with compact support covariance functions, support for sparse
> covariance matrices might be useful, too.
>
> I can also talk with you on a Google Hangout, but the current Stan meeting time is not good for me.
>
> Aki

P.S. No need to CC Rob or Vince---they should both be on the stan-dev
list.

Aki Vehtari

Aug 12, 2016, 9:15:23 AM8/12/16
to stan development mailing list
Thanks, Bob, for the answers.

On Friday, August 12, 2016 at 12:59:07 PM UTC+3, Bob Carpenter wrote:
> If we really cared about fast neural networks, we could just
> code up the layers as custom functions rather than building them
> up out of tanh(alpha * beta) [alpha a vector, beta a matrix, tanh
> vectorized].

I guess it's the same for GP covariance functions?

> > Do you have a specific application where Toeplitz is applicable? I
> > have found it rare to be able to use the Toeplitz form, although we
> > have one case in GPstuff, too.
>
> I thought the use case was for time-series GPs. For that,
> we'd want to represent a symmetric K x K Toeplitz matrix as
> a K-vector (its first row).

Yes, if the observation times are regular and there is no hierarchical model for several time series.

> > For Markov random fields, support for sparse precision matrix
> > computations similar to what INLA does would be a huge speed-up (again
> > assuming a small expression tree).
>
> I don't know what this is. Is there a reference somewhere I'll
> be able to understand (assume I was a mediocre student in undergrad
> linear algebra at best).

GMRFs naturally produce sparse precision matrices with lots of zeros.
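
For example (a minimal sketch; x is the latent field and tau its
precision, and an intrinsic model like this also needs a sum-to-zero
constraint on x), a first-order random-walk GMRF has a tridiagonal
precision matrix, so its log density never needs a dense Q:

    model {
      // intrinsic RW1 prior: the precision matrix has rank N - 1,
      // hence the 0.5 * (N - 1) * log(tau) normalizing term
      target += 0.5 * (N - 1) * log(tau)
                - 0.5 * tau * dot_self(x[2:N] - x[1:(N - 1)]);
    }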

Here's a good intro to sparse matrix linear algebra (although it doesn't cover the latest algorithmic developments):

https://www.researchgate.net/profile/Robert_Schreiber2/publication/2332186_Sparse_Matrices_In_Matlab_Design_And_Implementation/links/0c9605229051e76e9c000000.pdf

The following two references connect INLA, GMRFs, and sparse matrix computations:

Section 2.1 in
http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2008.00700.x/abstract

Section 2 in
http://www.sciencedirect.com/science/article/pii/S0378375807000845


These papers also discuss deterministic posterior approximations, but you can ignore that (for now).

Aki

Andrew Gelman

Aug 12, 2016, 4:10:20 PM8/12/16
to stan...@googlegroups.com, Rob Trangucci, vdo...@gmail.com
Twitter sux


Bob Carpenter

Aug 13, 2016, 7:22:07 AM8/13/16
to stan...@googlegroups.com
Thanks, Aki.

Lol that Andrew doesn't like Twitter given that he has 9K followers
and generates as many tweets as blog posts:

https://twitter.com/StatModeling

- Bob

Rob Trangucci

Aug 15, 2016, 11:17:19 AM8/15/16
to stan-dev
Sorry to reply out of turn, but I wanted to address your points too.


Rob

On Fri, Aug 12, 2016 at 3:56 AM, Aki Vehtari <Aki.V...@aalto.fi> wrote:
> This discussion started on Twitter, moved to email, but let's continue on stan-dev.

> I also noticed you have been working on speeding up the gradient
> computation of regular matrix-matrix multiplications. How about
> gradients of vector-matrix multiplications?

These receive about a 2X speed boost with the new matrix multiply specialization.

 
> How about gradients of
> unary functions of matrices (e.g., the elementwise operations exp(A) or
> A.^2), which currently produce a big expression tree? These changes
> would make user-defined vectorized covariance functions and neural
> networks (which Bob just tested) faster.

This is certainly doable without too much work using Marcus' framework for matrix functions with precomputed gradients. It would be best to add covariance functions that include precomputed gradients, using Eigen for the internal matrix arithmetic. We'll be doing that shortly for cov_exp_quad.
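
If I've done the calculus right, the partials we'd precompute for
$k(x, x') = \alpha^2 \exp(-d^2 / (2\rho^2))$, with $d = |x - x'|$, are just

    \frac{\partial k}{\partial \alpha} = \frac{2k}{\alpha}, \qquad
    \frac{\partial k}{\partial \rho} = \frac{d^2}{\rho^3} \, k .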
 

> Do you have a specific application where Toeplitz is applicable? I
> have found it rare to be able to use the Toeplitz form, although we
> have one case in GPstuff, too.

My use cases are all of the sort where I have a time series with missing observations (say, polling data observed on irregular days), and I want to put a GP prior on the time-series residuals. I use the multivariate normal NCP trick to generate the full draw from the GP prior as if I had observed all the time points, but then use only the observed time steps in the likelihood (which is often Poisson or binomial). It seems like this would also work for spatial GPs. It can also be a good way to parameterize ARMA-type priors on latent variables.
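
Here's a minimal sketch of that pattern in Stan (the names, like obs_idx,
are made up, and I'm assuming a covariance matrix K over all T regular
time points has been built in transformed data or earlier):

    data {
      int<lower=1> T;            // number of regular time points
      int<lower=1> N_obs;        // number observed
      int obs_idx[N_obs];        // which time points were observed
      int y[N_obs];              // e.g. Poisson counts
    }
    parameters {
      vector[T] z;
    }
    transformed parameters {
      vector[T] f;               // full GP draw over all T time points
      f = cholesky_decompose(K) * z;   // K : matrix[T, T], built elsewhere
    }
    model {
      z ~ normal(0, 1);
      y ~ poisson_log(f[obs_idx]);     // likelihood sees only observed steps
    }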
 

> Dense SPD matrices are common for my GP models. Matlab doesn't have a specific
> data type for symmetric matrices, but checks symmetry on the fly and,
> based on that, chooses an appropriate algorithm from BLAS/LAPACK. But
> since Stan uses strong typing, it would be natural to have new matrix
> types, too. For my GP models the speedup would come mostly from
> computing the gradients with matrix derivatives instead of elementwise
> autodiff. Taking the symmetry into account would help a bit more.

We currently do take the symmetry into account for cholesky_decompose, which computes its gradient efficiently. I don't think symmetry is taken into account when the covariance matrix is fed into the multivariate normal, which internally uses a specialized LDLT decomposition with precomputed gradients.
 

> For Markov random fields, support for sparse precision matrix
> computations similar to what INLA does would be a huge speed-up (again
> assuming a small expression tree). There are tons of useful MRF models,
> which would be nice to have in Stan.

I agree; I think we need to add a SparseMatrix type to the language. This would be a relatively large project, given that we'd need to add new I/O functions for reading and writing sparse matrices. Luckily, Eigen supports sparse solvers and sparse addition and multiplication, so we'd probably benefit from using Eigen to do the computation under the hood for precomputed gradients.
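
In the meantime, part of this is expressible with the existing CSR
helpers, if I remember the signature right (a sketch; w, v, and u are
the values, column indices, and row-start indices of a sparse N x N
precision matrix Q):

    model {
      // quadratic form x' Q x without ever forming a dense Q;
      // a first-class SparseMatrix type would hide this bookkeeping
      target += -0.5 * dot_product(x, csr_matrix_times_vector(N, N, w, v, u, x));
    }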
 

> For GPs with compact support covariance functions, support for sparse
> covariance matrices might be useful, too.

> I can also talk with you on a Google Hangout, but the current Stan meeting time is not good for me.

> Aki

Do you have time later this week to talk on Google Hangouts? Say Tuesday?

I picked Toeplitz matrices because I had a use case, but it sounds like adding an SPD type to Stan that does more than cov_matrix would be useful. We could potentially specialize addition and multiplication for SPD types as well.

Bob Carpenter

Aug 15, 2016, 11:21:09 AM8/15/16
to stan...@googlegroups.com
Adding new types, like sparse matrix, is relatively easy for
the language itself. What's hard is the I/O, because it
has to be pushed through var_context, the writers, and then
all the interfaces. I have sparse matrices prioritized for
after ragged arrays, but could be convinced otherwise. Once
I do one and work through all the new I/O, the second should
be relatively easy.

- Bob