What are best practices for pairs() plots with big multilevel models?

Jonathan Gilligan

Jul 22, 2015, 12:26:45 PM

to Stan users mailing list

There have been some great discussions here about how important it is to use pairs() plots to check for sampling problems (bottlenecks, correlations, etc.), but I would love advice on working with big multilevel models. These have a small number of top-level hyperparameters characterizing the hyperpriors, but at the lower levels they have one or more priors (each with one or more parameters) for each of a large number of groups, so pairs() plots of all combinations of parameters would require too many panes to be useful or practical.

Even for a small example like 8 schools, plotting all possible pairs (2 hyperparameters and 16 school-level parameters) would require more than 300 panes. In the larger cases that are common in multilevel modeling, one can have thousands of group-level parameters, implying millions of panes in an exhaustive pairs() plot.

When I'm developing models, I've found it very useful to do pairs() plots of all combinations of hyperparameters to check for bad parameterizations, but I have ignored correlations among the group-level parameters because I don't know what to do with so many dimensions.

It would be very helpful to me to know how other people think about this kind of thing and what they do.
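The pane counts above follow from simple arithmetic: with n parameters, a full pairs() grid has n x n panes, and n(n-1)/2 of them are distinct unordered pairs. A small Python sketch reproduces the numbers (the function name is made up for illustration and is not part of rstan):

```python
def pairs_panes(n_params):
    """Count panes for an exhaustive pairs() plot of n_params parameters.

    Returns (total, distinct): total is the n x n grid including the
    diagonal; distinct is the number of unordered off-diagonal pairs,
    i.e. n * (n - 1) / 2.
    """
    total = n_params * n_params
    distinct = n_params * (n_params - 1) // 2
    return total, distinct


# 8 schools as described in the post: 2 hyperparameters + 16 school-level parameters
print(pairs_panes(2 + 16))  # (324, 153)

# Hypothetical larger model: 2 hyperparameters + 2000 group-level parameters
print(pairs_panes(2 + 2000))  # (4008004, 2003001)
```

Either way the count grows quadratically in the number of parameters, which is why exhaustive plots stop being practical so quickly.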

Ben Goodrich

Jul 22, 2015, 1:06:02 PM

to Stan users mailing list, jonathan...@gmail.com

On Wednesday, July 22, 2015 at 12:26:45 PM UTC-4, Jonathan Gilligan wrote:

> There have been some great discussions here about how important it is to use pairs() plots to check for sampling problems (bottlenecks, correlations, etc.), but I would love advice about how to work with big multilevel models where there are a small number of top-level hyperparameters characterizing hyperpriors, but at lower levels have one or more priors (each with one or more parameters) for each of a large number of groups, so pairs() plots of all combinations of parameters would require too many panes to be useful or practical.

I would say to always include lp__, the top-level hyperparameters, and any variance or standard deviation that goes into the likelihood. If that does not point to the source of your problem, then maybe do additional pairs plots with lp__ and a batch of lower-level parameters. High correlations are not a big deal per se, but can be problematic when combined with changing variances (i.e., when the spread of one parameter depends on the value of another).

Ben
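The ordering suggested above can be sketched as a small helper that produces the lists of parameter names to plot, one list per pairs() call. This is a Python illustration under assumptions: the function name, the example parameter names (mu, tau, sigma_y, alpha[j]) and the batch size of 4 are all hypothetical. In rstan, each list could then be passed as the pars argument of pairs().

```python
def pairs_batches(param_names, hyper, likelihood_sds, batch_size=4):
    """Split parameter names into lists for successive pairs() plots.

    Batch 1: lp__, the top-level hyperparameters, and any standard
    deviations that enter the likelihood. Later batches: lp__ plus
    batch_size lower-level parameters at a time.
    """
    first = ["lp__"] + list(hyper) + list(likelihood_sds)
    batches = [first]
    # Everything not already in the first batch is a lower-level parameter
    rest = [p for p in param_names if p not in first]
    for i in range(0, len(rest), batch_size):
        batches.append(["lp__"] + rest[i:i + batch_size])
    return batches


# Hypothetical model: mu and tau are hyperparameters, sigma_y is the
# residual standard deviation in the likelihood, alpha[j] are group effects.
names = ["lp__", "mu", "tau", "sigma_y"] + [f"alpha[{j}]" for j in range(1, 11)]
for b in pairs_batches(names, hyper=["mu", "tau"], likelihood_sds=["sigma_y"]):
    print(b)
```

Keeping lp__ in every batch means each plot can show how the lower-level parameters relate to the log density, as well as to each other.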

Krzysztof Sakrejda

Jul 22, 2015, 2:29:05 PM

to Stan users mailing list, jonathan...@gmail.com

When choosing which lower-level parameters to plot, it helps to divide them into groups based on the amount of data available.

Krzysztof
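One way to act on this suggestion is to bin groups by their number of observations and pick a few from each bin, so that both data-poor and data-rich groups show up in the plots. A Python sketch, assuming per-group observation counts are available; the function names, tier names, cut points (5 and 20) and k = 2 are illustrative choices, not recommendations from the thread:

```python
def tier_groups(n_obs, edges=(5, 20)):
    """Assign each group to a tier by how much data it has.

    n_obs maps group id -> number of observations. The cut points in
    `edges` are arbitrary: 'sparse' is n < edges[0], 'moderate' is
    edges[0] <= n < edges[1], and 'rich' is n >= edges[1].
    Groups within a tier are ordered from fewest to most observations.
    """
    tiers = {"sparse": [], "moderate": [], "rich": []}
    for group, n in sorted(n_obs.items(), key=lambda kv: kv[1]):
        if n < edges[0]:
            tiers["sparse"].append(group)
        elif n < edges[1]:
            tiers["moderate"].append(group)
        else:
            tiers["rich"].append(group)
    return tiers


def pick_for_pairs(tiers, k=2):
    """Take up to k groups per tier (here simply the first k, i.e. the
    ones with the fewest observations) to include in pairs() plots."""
    return {tier: groups[:k] for tier, groups in tiers.items()}


# Hypothetical data: number of observations in each of seven groups
n_obs = {"a": 2, "b": 3, "c": 8, "d": 15, "e": 40, "f": 100, "g": 1}
tiers = tier_groups(n_obs)
print(tiers)
print(pick_for_pairs(tiers))
```

The selected group ids would then be mapped to the corresponding parameter names (e.g. alpha[j]) and plotted alongside lp__.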

Bob Carpenter

Jul 22, 2015, 2:32:30 PM

to stan-...@googlegroups.com, jonathan...@gmail.com

I love all the great advice on this list. That's totally going in
the manual as soon as we add a debugging advice chapter:

https://github.com/stan-dev/stan/issues/1238#issuecomment-123818846

- Bob
