If you want to know a PSRF for a particular margin of interest, then I think you have to use the original scale. If you are using the Rhats to decide if the chains have converged to the posterior distribution, then maybe transformations help in some cases but it is not so hard to fool it. One way is to draw from a posterior that has standard normal margins and a sufficiently evil copula density, such as the bivariate Clayton
parameters {
real z[2];
real<lower=0> delta;
}
model {
real theta;
real p[2];
theta <- delta - 1;
z ~ normal(0,1);
delta ~ normal(0,1);
p[1] <- Phi(z[1]);
p[2] <- Phi(z[2]);
increment_log_prob( log1p(theta) -
(1 + theta) * log(p[1] * p[2]) -
(1 + 2 * theta) / theta * log(pow(p[1], -theta) + pow(p[2], -theta) - 1) );
}
With 8 chains, 6000 iterations, and a seed of 12345, you get small Rhats on the original scale
> Clayton
Inference for Stan model: Clayton.
8 chains, each with iter=6000; warmup=3000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=24000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
z[1] 0.06 0.03 1.00 -1.93 -0.62 0.05 0.73 1.98 1490 1.01
z[2] -0.01 0.03 1.02 -2.01 -0.71 -0.02 0.71 2.02 1359 1.00
delta 0.88 0.02 0.58 0.10 0.43 0.76 1.22 2.28 1003 1.01
lp__ -1.74 0.02 1.16 -4.83 -2.23 -1.43 -0.91 -0.37 2941 1.00
and on the rank-normalized scale
> rank_rhats(Clayton)
Inference for the input samples (8 chains: each with iter=3000; warmup=0):
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
z[1] 0 0.03 1 -1.96 -0.67 0 0.67 1.96 1537 1.01
z[2] 0 0.03 1 -1.96 -0.67 0 0.67 1.96 1371 1.00
delta 0 0.05 1 -1.96 -0.67 0 0.67 1.96 482 1.02
lp__ 0 0.03 1 -1.96 -0.67 0 0.67 1.96 1193 1.00
but the chains do not converge, in part because they get stuck
which Mike's preferred method would catch because n_divergent__ is usually not zero and my preferred method would catch because with a thin interval of 6

you reject the null hypothesis that the split chains all have the same distribution
> convergence_test(Clayton, thin = 6, R = 199)
[1] "Multivariate loss = 1.005"
disco(x = posterior, factors = data.frame(chain_id), R = R)
Distance Components: index 1.00
Source Df Sum Dist Mean Dist F-ratio p-value
chain_id 15 49.03718 3.26915 2.734 0.005
Within 3984 4764.27077 1.19585
Total 3999 4813.30795
So, I am still searching for a case
https://groups.google.com/d/msg/stan-dev/lMy2CZJD9fk/JtjzFw3AUx0Jas to why using Rhats to decide the question of whether the chains have converged to the stationary distribution is better than what I am favoring.
Ben