Rhat on a single chain


Ben Lambert

Oct 29, 2015, 4:33:44 AM
to Stan users mailing list
Hi,

I have been running a stan model with a single chain using:

fit <- stan(file = aStanModel, data = data, iter = 2000, chains = 1, seed=1)

And I still get values for Rhat in the output. However, I was wondering how this is calculated given that there is only 1 chain?

Best,

Ben

Michael Betancourt

Oct 29, 2015, 7:53:51 AM
to stan-...@googlegroups.com
Stan uses split R-hat, which takes N chains, splits each of them in half, and then performs the R-hat calculation on the resulting 2N chains.  So even if you just run one chain you can compute a valid R-hat using the two halves.
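For concreteness, here is a minimal sketch of the split-half calculation described above, using the standard Gelman-Rubin formula (pure Python for illustration, not Stan's actual internals; the function name is made up):

```python
import statistics

def split_rhat(draws):
    """Split a single chain in half and compute R-hat on the two halves."""
    n = len(draws) // 2
    halves = [draws[:n], draws[n:2 * n]]
    means = [statistics.fmean(h) for h in halves]
    # Within-chain variance W: average of the per-half sample variances.
    w = statistics.fmean(statistics.variance(h) for h in halves)
    # Between-chain variance B: n times the sample variance of the half means.
    b = n * statistics.variance(means)
    # Pooled estimate of the posterior variance, then R-hat.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

A stationary, well-mixed chain gives values near 1; a trending or stuck chain, whose two halves have different means, pushes R-hat well above 1.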

Everything is discussed in the manual.

Ben Lambert

Oct 29, 2015, 11:00:24 AM
to Stan users mailing list
Great - thanks.

Bob Carpenter

Oct 29, 2015, 1:21:45 PM
to stan-...@googlegroups.com
We strongly recommend running more than one chain to
diagnose multimodality and slow convergence or stuck
chains that look like they've converged but haven't.

- Bob

Ben Lambert

Oct 29, 2015, 2:20:00 PM
to Stan users mailing list
Ok, that explains some of the differences in diagnostics that I am seeing when using 1 chain vs 16. Up until now I was using just 1, but I wanted to ramp up the number of iterations, so I moved to 16 (run in parallel).

Before, Rhat, NDivergent, etc. all looked fine, but when I moved to many chains this was no longer the case. Does this imply that previously chains were getting stuck in an area of the parameter space?

Best,

Ben

Bob Carpenter

Oct 29, 2015, 2:26:47 PM
to stan-...@googlegroups.com

> On Oct 29, 2015, at 2:20 PM, Ben Lambert <ben.c....@googlemail.com> wrote:
>
> Ok, that explains some of the differences in diagnostics that I am seeing when using 1 chain vs 16. Up until now I was using just 1, but I wanted to ramp up the number of iterations, so I moved to 16 (run in parallel).
>
> Before, Rhat, NDivergent, etc. all looked fine, but when I moved to many chains this was no longer the case. Does this imply that previously chains were getting stuck in an area of the parameter space?

Yes. If each of the chains for a parameter
converges to the same mean and sd, our n_eff
estimate reduces to doing them independently across
chains and adding them together. Otherwise, it's
conservative as we believe it should be to reflect
that the chains aren't mixing well, so the means
and variances shouldn't be trusted.

There are complete definitions in the manual in
the chapter on MCMC.

- Bob
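As a rough single-chain illustration of the quantity involved (not Stan's actual implementation, which uses cross-chain variance terms and more careful autocovariance estimates as defined in the manual; the function names here are made up):

```python
import random
import statistics

def autocov(x, lag, mean):
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n

def ess(x):
    """n_eff = n / (1 + 2 * sum of autocorrelations), summed while
    consecutive pairs of autocorrelations stay positive (Geyer's rule)."""
    n = len(x)
    mean = statistics.fmean(x)
    c0 = autocov(x, 0, mean)
    tau = 1.0  # integrated autocorrelation time
    lag = 1
    while lag + 1 < n:
        pair = (autocov(x, lag, mean) + autocov(x, lag + 1, mean)) / c0
        if pair < 0:
            break
        tau += 2 * pair
        lag += 2
    return n / tau
```

When chains agree in mean and sd, per-chain values like this can simply be added, which is the pooling described above; when they disagree, the extra between-chain spread makes the pooled estimate conservative.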


Andrew Gelman

Oct 29, 2015, 4:23:27 PM
to stan-...@googlegroups.com
I wouldn’t quite call it a “valid” R-hat, but you can compute it.

Michael Betancourt

Oct 29, 2015, 6:14:41 PM
to stan-...@googlegroups.com
Modulo the adaptation parameters all being the same —
see if the average acceptance probability and step size
are similar across chains. If they are then outlying chains
would indicate small pathological neighborhoods that you
weren’t seeing before. Otherwise the adaptation isn’t
converging to a consistent answer and you might have
to tweak warmup before judging any pathological regions.

Jonah

Oct 29, 2015, 7:02:56 PM
to Stan users mailing list
And always run multiple chains. In parallel there's really no cost to running several (preferably >= 4) rather than 1. Even if you can't parallelize for whatever reason, I'd argue that the (scientific) cost of running them sequentially (even for really slow models) is infinitely smaller than the cost of unknowingly using invalid estimates.

Of course this doesn't apply if you're doing it for educational purposes (e.g. to show that multiple chains are required in order to identify certain pathologies when doing MCMC).

Jan Tünnermann

Oct 29, 2015, 7:16:24 PM
to stan-...@googlegroups.com
It would be a cool feature: Stan would monitor the effective
sample size during sampling (at least sporadically). Whenever the sample
is sufficiently large, a new chain is started. The user would specify
the number of threads and the desired overall sample size. Stan then
would do something like this:

c1 |-----|-------|-----|-------|--------|
c2 |---|---------|-------|---|--------|
c3 |-----|-------|-----|-------|------|
c4 |-----|-----|---------|---------|---|

E.g., ~4 serial chains in 4 threads

While currently it does something like this:

c1 |---------------------------------|
c2 |---------------------------------|
c3 |---------------------------------|
c4 |---------------------------------|

The overall sample size in this case is similar, but the across-chain
diagnostics are weaker (it probably trades for more samples per chain
than really needed).

Each of the chains in the hypothetical first case would go through the
whole adaptation and warmup business, of course.
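That repeated warmup is the main cost of the scheme; a back-of-the-envelope sketch with made-up numbers (hypothetical accounting, not anything Stan computes) makes the trade-off concrete:

```python
def iterations_per_thread(target_draws, warmup, chains_per_thread):
    """Total iterations one thread spends producing `target_draws`
    post-warmup draws, split across serial chains that each pay
    their own warmup (illustrative only, not a Stan API)."""
    return chains_per_thread * warmup + target_draws

# With 1000 warmup iterations and 4000 post-warmup draws wanted per thread:
one_long_chain = iterations_per_thread(4000, 1000, 1)     # 5000 iterations
four_serial_chains = iterations_per_thread(4000, 1000, 4)  # 8000 iterations
```

The extra warmup iterations are the price of the additional across-chain diagnostics; whether that trade is worth it is exactly the question above.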

Best regards
Jan



Bob Carpenter

Oct 29, 2015, 9:14:14 PM
to stan-...@googlegroups.com
Thanks for the suggestions.

We've been over this ground quite a bit over the past
few years, but don't have an organized discussion log on the topic.

Right now, we're running multi-process, not multi-threaded.

Right now, there's zero interprocess communication required.

Any of the portable thread libraries is beyond what we can
cope with, at least with current staffing.

It's difficult to monitor effective sample size on the fly
efficiently.

The next thing to do on this research project would be to
define what exactly you mean by monitoring ESS during sampling
and how it interacts with warmup and how it'd fit into our
existing architecture. A lot of this can be mocked up by
setting seeds, so you can see what would happen if warmup
continued, or if sampling continued.

Perhaps a more interesting angle is figuring out how to usefully
combine cross-chain adaptation information.

- Bob