Article or proper explanation of the "Matt trick" / non-centered parameterization

Jökull Snæbjarnarson

Apr 15, 2016, 8:06:41 AM
to Stan users mailing list
Hi guys,
I wish to better understand "the Matt trick" / non-centered parameterization and how it makes the sampler explore hierarchical effects more effectively. I'm looking for a thorough explanation of the Matt trick: not only how it's implemented, but how it works, when, and why.

Could any of you point out some good articles on this? 

best regards, 
Jökull Snæbjarnarson

Dustin Tran

Apr 15, 2016, 8:15:14 AM
to stan-...@googlegroups.com
Hi, I think Michael B has a good article on this https://arxiv.org/abs/1312.0906

Dustin
--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
To post to this group, send email to stan-...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Betancourt

Apr 15, 2016, 9:36:07 AM
to stan-...@googlegroups.com
My paper focuses on Hamiltonian Monte Carlo; for a general
discussion of the auxiliary parameterization, see the references
therein by Gareth Roberts and Omiros Papaspiliopoulos.

Guido Biele

Apr 15, 2016, 12:30:21 PM
to Stan users mailing list
I think the section on "Divergent Transitions and Mitigation Strategies"
in this case study (by Bob) provides a good intuitive explanation of how
the non-centered parameterization helps avoid divergent iterations and
improves efficiency by allowing larger step sizes.

Cheers - Guido

Bob Carpenter

Apr 15, 2016, 12:42:29 PM
to stan-...@googlegroups.com
There's also the extreme example of reparameterizing
Radford Neal's funnel example (hierarchical model with no data).
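To make that concrete, here is a minimal simulation sketch (plain Python/NumPy rather than Stan, purely illustrative) of the funnel and its non-centered version:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Neal's funnel: y ~ normal(0, 3), x | y ~ normal(0, exp(y / 2)).
y = rng.normal(0.0, 3.0, size=n)
x_centered = rng.normal(0.0, np.exp(y / 2.0), size=n)

# Non-centered version: sample an auxiliary standard normal x_raw
# and rescale it deterministically afterwards.
x_raw = rng.normal(0.0, 1.0, size=n)
x_noncentered = np.exp(y / 2.0) * x_raw

# x has the same marginal distribution either way, but (y, x_raw) are
# independent normals, so a sampler exploring (y, x_raw) never sees the
# funnel geometry that forces tiny step sizes near the neck.
print(np.corrcoef(y, x_raw)[0, 1])  # near zero
```

The Stan implementation is the same idea: declare x_raw as a parameter with a standard normal prior and define x = exp(y / 2) * x_raw in transformed parameters.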

Betancourt and Girolami's is the definitive source.

- Bob

John Hall

Feb 1, 2017, 3:49:08 PM
to Stan users mailing list
I found this paper quite informative. In particular, I had been playing around with a model that had drastically different n_eff depending on whether I used the centered or the non-centered parameterization (centered was better; the original data set was about 120x500, i.e., 120 groups with about 500 data points each). I had been working under the assumption that it was always better to use the non-centered parameterization in cases like this, which is why it took me so long to try it the other way.

One part of the paper I wasn't entirely clear on was Figure 8. Am I correct that this was created by adjusting the value of alpha in the transformed data section of the "Generate One-Way Normal Pseudo-data" program (and then running the other models and gathering the statistics)? If so, it looks like the centered parameterization would be preferred if you had standardized the data, and it also becomes obvious why the non-centered parameterization didn't work well in my case: the standard deviation of my data was closer to 0.08, well inside the region where the centered parameterization produced more effective samples per unit time.

I would be curious how important other factors are in driving the outperformance of one versus the other. For instance, the number of groups vs. the amount of data or the standard deviation within the groups vs. the standard deviation of all the data.

Jonah Gabry

Feb 1, 2017, 4:15:02 PM
to Stan users mailing list
On Wednesday, February 1, 2017 at 3:49:08 PM UTC-5, John Hall wrote:
For instance, the number of groups vs. the amount of data or the standard deviation within the groups vs. the standard deviation of all the data. 

It's not so much the amount of data but rather how informative the data is about the parameters. You can have a small dataset that is informative enough to really pin down parameters and you can have large datasets where the data doesn't provide too much information about the parameters. If you want to see this in action in a simple example then you can play around with the eight schools example. If you leave the number of data points at 8 but modify y (or sigma) to make the data more or less informative about the parameters then which parameterization to use will depend on how you scale y (or sigma). In all cases the amount of data remains the same. 

Jonah

Michael Betancourt

Feb 1, 2017, 5:51:48 PM
to stan-...@googlegroups.com
These concerns are best analyzed empirically using Stan’s diagnostics.

John Hall

Feb 1, 2017, 6:43:16 PM
to stan-...@googlegroups.com
Diagnostics are good, but I'm also trying to develop some sort of intuition surrounding the issue. Thus, Jonah's advice to play around with sigma in the 8 schools problem was a good one.

For instance, if I multiply sigma by 100, then the non-centered has significantly more n_eff than centered. And the reverse if I divide by 100. This is similar to the figure from the paper in that the lower standard deviation favors the centered approach.

The degree of pooling seems to matter quite a bit. In this case, looking at something like the average sigma vs. the mean value of tau gave a better indication of whether the centered or the non-centered would be better than just looking at the average of the sigmas. But this probably only works because each group has one member. More generally, I would refer to Table 18.4 on page 394 of Gelman and Hill: the more pooling there is, the more reason to use the non-centered parameterization.
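A quick way to see the same effect numerically: in the normal-normal model the per-group shrinkage factor sigma_j^2 / (sigma_j^2 + tau^2) says how strongly group j is pooled. Here is a sketch using the eight-schools standard errors and a hypothetical tau = 10 (the value is made up purely for illustration):

```python
import numpy as np

# Eight-schools standard errors.
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])
tau = 10.0  # hypothetical between-school scale, for illustration only

def shrinkage(sigma, tau):
    # Fraction of the distance each group estimate is pulled toward the
    # population mean: near 1 means the data say little about that group
    # (non-centered tends to win), near 0 means the data pin the effect
    # down (centered tends to win).
    return sigma**2 / (sigma**2 + tau**2)

for scale in (0.01, 1.0, 100.0):
    print(scale, round(shrinkage(scale * sigma, tau).mean(), 3))
```

Multiplying sigma by 100 drives the average shrinkage toward 1 (non-centered territory), and dividing by 100 drives it toward 0 (centered territory), matching what I saw.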


On a somewhat related topic, Stan's documentation could do a little better on page 325, where the heading "Non-Centered Parameterization" is followed by this text:

When there is a lot of data, such a hierarchical model can be made much more efficient
by shifting the data’s correlation with the parameters to the hyperparameters. Similar
to the funnel example, this will be much more efficient in terms of effective sample
size when there is not much data (see (Betancourt and Girolami, 2013)).

which is not exactly the easiest to follow, and honestly I think there is a typo here. The paragraph before makes it pretty clear to use the centered parameterization when there is a lot of data (which I wish I had read before), yet this one starts out with "when there is a lot of data," which is the opposite of the use case for the non-centered parameterization. I would rewrite it as:

When there is not much data, a non-centered parameterization can be much more efficient in terms of effective sample size, by shifting the data's correlation with the parameters to the hyperparameters (see Betancourt and Girolami, 2013).
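The mechanics here are just an affine change of variables: theta ~ normal(mu, tau) is distributionally identical to theta = mu + tau * theta_raw with theta_raw ~ normal(0, 1), but in the second form the sampled variable is a priori independent of the hyperparameters. A simulation sketch (the values of mu and tau are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, tau, n = 1.5, 2.0, 200_000  # hypothetical hyperparameter values

# Centered: draw each effect directly from its hierarchical prior.
theta_centered = rng.normal(mu, tau, size=n)

# Non-centered: draw a standard normal, then shift and scale it, so the
# sampled quantity does not change shape as mu and tau move.
theta_raw = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + tau * theta_raw

print(theta_centered.mean(), theta_noncentered.mean())  # both near mu
print(theta_centered.std(), theta_noncentered.std())    # both near tau
```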


Jonah Sol Gabry

Feb 1, 2017, 7:12:03 PM
to stan-...@googlegroups.com
You're right about that paragraph in the manual. Thanks for catching that. I can add this to the GitHub issue for edits to the manual for the next release. 

Jonah


John Hall

Feb 2, 2017, 11:19:15 AM
to stan-...@googlegroups.com
While you're at it, you might consider adding a reference to Gelman and Hill, chapter 21, as well. It has formula (21.14) (and some R code below it) for calculating a lambda parameter that measures the amount of pooling at any level. It would be a useful diagnostic for whether one should use the non-centered parameterization in a hierarchical model (and would probably be a useful summary statistic to look at in general anyway).


John Hall

Feb 2, 2017, 1:29:12 PM
to stan-...@googlegroups.com
This is a function I wrote to implement Gelman and Hill's formula (21.14) with rstan. The higher lambda is, the more pooling there is and the greater the benefit from the non-centered parameterization in hierarchical models.

lambda <- function(fit, ...) {
    # Pooling factor per parameter, following Gelman & Hill (2007),
    # formula (21.14): 1 - Var(per-group posterior means) / mean(per-draw
    # variance across groups). Values near 1 indicate heavy pooling.
    extracted_fit <- rstan::extract(fit, permuted = TRUE, ...)
    result <- rep(NA_real_, length(extracted_fit))
    names(result) <- names(extracted_fit)
    for (i in seq_along(extracted_fit)) {
        draws <- extracted_fit[[i]]
        # Only defined for vector parameters, i.e. 2-d arrays of
        # iterations x groups; skip scalars and higher-dimensional arrays.
        if (length(dim(draws)) != 2) next
        e <- draws - mean(draws)
        result[i] <- 1 - var(apply(e, 2, mean)) / mean(apply(e, 1, var))
    }
    return(result)
}
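For anyone not using rstan, the same computation on a plain array of draws (rows = iterations, columns = groups) looks like this; the data here are synthetic, just to show the two extremes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, J = 4000, 8

def pooling_lambda(draws):
    # Gelman & Hill formula (21.14): one minus the variance of the
    # per-group posterior means over the average within-draw variance
    # across groups.
    e = draws - draws.mean()
    return 1.0 - np.var(e.mean(axis=0), ddof=1) / np.mean(np.var(e, axis=1, ddof=1))

# Well-separated group effects: little pooling, lambda near 0.
spread_out = rng.normal(0.0, 5.0, size=J) + rng.normal(0.0, 1.0, size=(n_draws, J))
# Nearly identical group effects: heavy pooling, lambda near 1.
pooled = rng.normal(0.0, 0.1, size=J) + rng.normal(0.0, 1.0, size=(n_draws, J))

print(pooling_lambda(spread_out), pooling_lambda(pooled))
```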

Jonah Sol Gabry

Feb 2, 2017, 3:49:20 PM
to stan-...@googlegroups.com
Cool, thanks. Your suggestions are now officially on the record:


(Scroll down to the bottom)


Bob Carpenter

Feb 2, 2017, 5:07:06 PM
to stan-...@googlegroups.com
The timestamp is a direct link on the site, so you can copy it:

https://github.com/stan-dev/stan/issues/2122#issuecomment-276830373

- Bob

Jonah Sol Gabry

Feb 2, 2017, 5:19:06 PM
to stan-...@googlegroups.com
Cool. I did not know that!


John Hall

Feb 2, 2017, 11:18:20 PM
to stan-...@googlegroups.com
Awesome!