something for Stan that would be very useful for EP


Andrew Gelman

Apr 17, 2015, 2:10:11 PM
to stan...@googlegroups.com
Hi, I’ve been thinking a lot about EP lately, and I realized that there’s something that’s currently possible but not easy to do in Stan, and that I think might be worth implementing so users can do it in a more automatic and natural way.

What I want is to be able to put a multivariate normal, or maybe multivariate t, prior on all the parameters of the model, all on the unconstrained scale and all concatenated into a long vector. Doing this now is possible, but it requires doing the transformations and concatenation “by hand” in the model. Since Stan is working with this big concatenated object anyway, it would be convenient to be able to work with it directly.
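For concreteness, here is a rough Python sketch of the “by hand” step: map each parameter to the unconstrained scale and concatenate the results into one long vector. It assumes only two constraint types (a lower bound of zero, unconstrained with a log, and the interval (0, 1), unconstrained with a logit); the function names and the parameters `mu`, `sigma`, and `p` are hypothetical, and a real Stan model has many more transform types.

```python
import math

def unconstrain_positive(x):
    """Map x > 0 to the real line (a lower bound of zero uses the log)."""
    return math.log(x)

def unconstrain_unit_interval(x):
    """Map 0 < x < 1 to the real line via the logit."""
    return math.log(x / (1.0 - x))

def concatenate_unconstrained(params):
    """Concatenate parameters into one flat list on the unconstrained scale.

    params: dict mapping name -> (list of values, constraint), where constraint
    is 'none', 'positive', or 'unit'. The dict's order fixes the layout.
    """
    transforms = {
        "none": lambda v: v,
        "positive": unconstrain_positive,
        "unit": unconstrain_unit_interval,
    }
    flat = []
    for values, constraint in params.values():
        f = transforms[constraint]
        flat.extend(f(v) for v in values)
    return flat

# Hypothetical model with a location vector, a scale, and a probability.
params = {
    "mu": ([0.5, -1.0], "none"),
    "sigma": ([1.0], "positive"),
    "p": ([0.5], "unit"),
}
print(concatenate_unconstrained(params))  # [0.5, -1.0, 0.0, 0.0]
```

The layout (which entries belong to which parameter) has to be tracked separately, which is the bookkeeping Andrew would like Stan to handle.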

The other thing I want to be able to do is to work with this concatenated vector in the postprocessing, as for EP we will need to compute its mean and covariance matrix.

Oh, and one more thing, while we’re on the topic: Sometimes we’re doing importance weighting and we want to compute weighted means, weighted variances, and weighted covariance matrices. These are just the usual formulas, with the difference that, instead of taking averages, we take weighted averages with prespecified weights. When all the weights are equal, it reduces to the simple mean, var, and cov matrix, with the minor difference that it’s N, not N-1, in the denominator of the var and cov (which is just the way it is; it’s not worth bothering to try to get this adjustment in).
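The weighted versions are straightforward. A minimal pure-Python sketch (function names hypothetical) that divides by the sum of the weights, so the weights need not be normalized, and with equal weights it gives the plain mean and the N-denominator variance and covariance:

```python
def weighted_mean(draws, weights):
    """Weighted mean of each coordinate; draws is a list of equal-length lists."""
    total = sum(weights)
    dim = len(draws[0])
    return [sum(w * x[j] for x, w in zip(draws, weights)) / total for j in range(dim)]

def weighted_cov(draws, weights):
    """Weighted covariance matrix with the sum of the weights as denominator.

    With equal weights this is the ordinary covariance with N (not N - 1)
    in the denominator.
    """
    total = sum(weights)
    m = weighted_mean(draws, weights)
    dim = len(m)
    cov = [[0.0] * dim for _ in range(dim)]
    for x, w in zip(draws, weights):
        d = [x[j] - m[j] for j in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += w * d[a] * d[b]
    return [[c / total for c in row] for row in cov]

def weighted_var(draws, weights):
    """Weighted variances are the diagonal of the weighted covariance matrix."""
    cov = weighted_cov(draws, weights)
    return [cov[j][j] for j in range(len(cov))]

draws = [[0.0, 1.0], [2.0, 3.0]]
print(weighted_mean(draws, [3.0, 1.0]))  # [0.5, 1.5]
print(weighted_cov(draws, [3.0, 1.0]))   # [[0.75, 0.75], [0.75, 0.75]]
```

For EP the `draws` would be rows of the concatenated unconstrained vector, and the weights the importance weights.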

A

P.S. I’d be happy to also put this on Github if you tell me where it goes, but I thought this would be worth sharing with the list in any case.

Bob Carpenter

Apr 18, 2015, 12:20:54 AM
to stan...@googlegroups.com
Presumably you want to add some kind of likelihood, too?
Will that just be distributions for data so that there's no
Jacobian adjustment required for the constraining transform?

If so, this probably won't be so hard --- we replace the default
uniform on the unconstrained params with a multivariate normal or
Student-t and then turn off the Jacobian.
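In log-density terms, the proposal replaces the implicit flat density on the unconstrained vector u with a multivariate normal and adds no log-Jacobian term, because the prior is already stated on the unconstrained scale. A rough Python sketch of that target, using a hand-rolled Cholesky factorization; `constrain` and `log_likelihood` are hypothetical user-supplied callables, not Stan functions.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def multi_normal_lpdf(u, mu, Sigma):
    """log N(u | mu, Sigma) via a Cholesky factor and forward substitution."""
    n = len(u)
    L = cholesky(Sigma)
    # Solve L y = (u - mu); then (u - mu)' Sigma^{-1} (u - mu) = y'y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (u[i] - mu[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))

def target_lpdf(u, mu, Sigma, constrain, log_likelihood):
    """Prior on the unconstrained u plus the likelihood of the constrained params.

    No log-Jacobian term is added: the prior density is stated directly on u.
    """
    return multi_normal_lpdf(u, mu, Sigma) + log_likelihood(constrain(u))

# Hypothetical one-parameter model: sigma = exp(u[0]), one observation y ~ normal(0, sigma).
def constrain(u):
    return [math.exp(u[0])]

def log_likelihood(theta):
    sigma, y = theta[0], 1.5
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * (y / sigma) ** 2

print(target_lpdf([0.0], [0.0], [[1.0]], constrain, log_likelihood))
```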

The only trick would be specifying the prior parameters. And of
course plumbing this through the sampler.

We're also going to be exposing the transforms, so another way to
go would be to have this, which is what you have to do now:

data {
  int<lower=0> N;
  vector[N] mu_theta_raw;
  cov_matrix[N] Sigma_theta_raw;
}
parameters {
  vector[N] theta_raw;
}
transformed parameters {
  // ... define params theta in terms of theta_raw using transforms ...
}
model {
  theta_raw ~ multi_normal(mu_theta_raw, Sigma_theta_raw);

  // ... likelihood ...
}

But it's still going to be ugly because of slicing theta_raw into
segments.
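The slicing itself is mechanical bookkeeping; in Stan it is done with index ranges such as theta_raw[2:3]. As a sketch, here is the same bookkeeping in Python with a hypothetical layout of (name, size) pairs that records the order in which parameters were concatenated:

```python
def slice_segments(flat, layout):
    """Split a flat vector into named segments.

    layout: list of (name, size) pairs in concatenation order.
    Raises ValueError if the sizes do not add up to len(flat).
    """
    total = sum(size for _, size in layout)
    if total != len(flat):
        raise ValueError(f"layout covers {total} entries but vector has {len(flat)}")
    segments, start = {}, 0
    for name, size in layout:
        segments[name] = flat[start:start + size]
        start += size
    return segments

theta_raw = [0.3, -1.2, 0.7, 2.0]
layout = [("alpha", 1), ("beta", 2), ("log_sigma", 1)]  # hypothetical names
print(slice_segments(theta_raw, layout))
# {'alpha': [0.3], 'beta': [-1.2, 0.7], 'log_sigma': [2.0]}
```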

- Bob
> --
> You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

Ben Goodrich

Apr 18, 2015, 11:53:34 AM
to stan...@googlegroups.com
On Saturday, April 18, 2015 at 12:20:54 AM UTC-4, Bob Carpenter wrote:
> If so, this probably won't be so hard --- we replace the default
> uniform on the unconstrained params with a multivariate normal or
> Student-t and then turn off the Jacobian.

I think Alp is essentially doing this now, albeit perhaps not in the most direct way. I'm doing this by hand for a conference in a couple of weeks:
  1. z is a K-vector with iid standard normal priors and L is a Cholesky factor of a correlation matrix with an lkj_corr_cholesky(1) prior
  2. x = L * z is a multivariate normal transformed parameter with marginal mean zero and unit variance but correlation matrix L * L'
  3. p_k = Phi(x[k]) is a marginally standard uniform transformed parameter
  4. theta_k = F_k^{-1}(p_k) is a transformed parameter with CDF F_k()

This is basically a Gaussian copula with marginal distributions that are given by your substantive prior beliefs about each theta_k.
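The four steps can be sketched in Python for a single draw. This sketch holds L fixed (in the model it gets the lkj_corr_cholesky(1) prior) and uses two hypothetical marginals, exponential(1) and uniform(0, 10), whose inverse CDFs have closed forms; `statistics.NormalDist` supplies Phi.

```python
import math
from statistics import NormalDist

def copula_transform(z, L, inverse_cdfs):
    """Map iid standard normal z to theta with the given marginals.

    x = L z, p_k = Phi(x_k), theta_k = F_k^{-1}(p_k). L is assumed to be the
    Cholesky factor of a correlation matrix (unit-norm rows), so each x_k is
    marginally standard normal and each p_k marginally uniform.
    """
    K = len(z)
    x = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(K)]
    phi = NormalDist()  # standard normal
    p = [phi.cdf(xk) for xk in x]
    return [F_inv(pk) for F_inv, pk in zip(inverse_cdfs, p)]

# Hypothetical marginals with closed-form inverse CDFs.
def exp_inv_cdf(p):
    return -math.log1p(-p)  # exponential with rate 1

def unif_inv_cdf(p):
    return 10.0 * p  # uniform on (0, 10)

rho = 0.5
L = [[1.0, 0.0], [rho, math.sqrt(1.0 - rho ** 2)]]  # Cholesky of [[1, rho], [rho, 1]]
print(copula_transform([0.0, 0.0], L, [exp_inv_cdf, unif_inv_cdf]))
```

At z = 0 each p_k is 0.5, so each theta_k lands at the median of its marginal (log 2 and 5 here).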

But if the covariance matrix of the multivariate normal is unknown, then doing this multivariate Matt trick (the non-centered parameterization) is probably a better default. Also, I need to implement more inverse CDFs.
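The multivariate Matt trick samples a standard normal z and builds the correlated parameter deterministically. A minimal Python sketch of that map, with hypothetical argument names; in a Stan model z would get a standard normal prior and tau and L_Omega their own priors.

```python
def non_centered_mvn(z, mu, tau, L_Omega):
    """x = mu + diag(tau) * L_Omega * z  (multivariate non-centered parameterization).

    z: iid standard normal vector (the sampled parameter).
    mu, tau: location vector and positive scale vector.
    L_Omega: Cholesky factor of the correlation matrix Omega.
    Then x ~ MVN(mu, diag(tau) * Omega * diag(tau)).
    """
    K = len(z)
    Lz = [sum(L_Omega[i][j] * z[j] for j in range(i + 1)) for i in range(K)]
    return [mu[i] + tau[i] * Lz[i] for i in range(K)]

print(non_centered_mvn([1.0, 2.0], [0.0, 10.0], [2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]))  # [2.0, 16.0]
```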

Ben

Alp Kucukelbir

Apr 18, 2015, 12:57:36 PM
to stan...@googlegroups.com
Ben is spot on. The fullrank ADVI algorithm is closely related to Gaussian EP. At a high level, here is the difference:

ADVI: min KL (q_fullrank_gaussian || posterior)

EP: min local-KL (posterior || q_fullrank_gaussian)

where "local KL" means that we only treat one component of the posterior at a time (via the tilted distribution).
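One way to see the local KL step: for a Gaussian q, minimizing KL(p || q) over q amounts to matching the mean and variance of p, and in EP p is the tilted distribution (the cavity Gaussian times one likelihood factor). The 1-D Python sketch below computes those tilted moments by brute-force grid quadrature; the function name and grid settings are assumptions, and practical EP code would use a better integrator and then update the site approximation.

```python
import math

def tilted_moments(cavity_mean, cavity_var, log_factor, n_grid=20001, width=10.0):
    """Mean and variance of the 1-D tilted distribution cavity(theta) * factor(theta).

    Uses an evenly spaced grid over cavity_mean +/- width * cavity_sd.
    The Gaussian N(mean, var) returned here minimizes KL(tilted || q).
    """
    sd = math.sqrt(cavity_var)
    lo = cavity_mean - width * sd
    step = 2.0 * width * sd / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    # Unnormalized log density of the tilted distribution on the grid.
    logs = [-0.5 * (t - cavity_mean) ** 2 / cavity_var + log_factor(t) for t in grid]
    peak = max(logs)
    dens = [math.exp(v - peak) for v in logs]  # subtract the max for stability
    Z = sum(dens)
    mean = sum(t * d for t, d in zip(grid, dens)) / Z
    var = sum((t - mean) ** 2 * d for t, d in zip(grid, dens)) / Z
    return mean, var

# Cavity N(0, 1) times a Gaussian factor centered at 2: exact answer is N(1, 0.5).
print(tilted_moments(0.0, 1.0, lambda t: -0.5 * (t - 2.0) ** 2))
```

With a non-Gaussian factor (say a logistic likelihood term) the same function returns the moment-matched Gaussian, which is the step ADVI's reverse-KL objective does not perform.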

That being said, I think Andrew is asking for something different. In ADVI we automatically construct a fullrank Gaussian in the unconstrained parameter space. But we don't use it as a "prior". Actually now that I re-read it, I'm not quite sure I understand what Andrew is asking for.

Cheers
Alp

Ben Goodrich

Apr 18, 2015, 1:57:46 PM
to stan...@googlegroups.com

I think you are both asking for an easier way to do the same transformations, just for different reasons.

Ben

Alp Kucukelbir

Apr 18, 2015, 2:11:28 PM
to stan...@googlegroups.com
i wasn't aware i was asking for anything :)

the transformations i need for ADVI are all coded in, so i'm good!

cheers
alp

Andrew Gelman

Apr 18, 2015, 4:57:25 PM
to stan...@googlegroups.com
Hi, we can talk. The EP distribution looks like a prior on the vector of parameters in the model. It’s not a distribution for data, as that would have no effect on the posterior (at least, if I understand what you’re saying). And, yes, exactly, if the transformed parameters are exposed within Stan, that would be great. Regarding the segments: yes, for various purposes we’d need to access these, but if we’re careful we can do a lot of basic EP stuff on the transformed space and not need to touch the separate parameters.
A