R_hat (convergence statistic)

Øystein Sandvik

Aug 7, 2012, 9:58:38 AM
to hddm-...@googlegroups.com
Hi
I have compared the R_hat (Gelman-Rubin) convergence statistic of kabuki (https://github.com/hddm-devs/kabuki) and pymc. It looks like both have errors, so I also did my own implementation and compared all three on a few runs.
After correcting the errors I found and taking the square root of the return value of the pymc implementation, I got the same results for all three implementations.
Kabuki seems to have the number of chains and samples mixed up (see below). The pymc implementation matches the one in kabuki except for one part: if "V = s2 + B_over_n/m" is skipped, all three implementations give equal results.
Does anyone have any experience using these?

Below is the code I used for comparing the different implementations. The comments are mostly mine, but the actual calculations are from the source code (except from the expressions marked # CORRECTED).

##########################################

# kabuki/analyze.py, line 285 ("def R_hat")
def R_hat_kabuki(samples):
    #n, num_chains = samples.shape # n=num_samples
    num_chains, n = samples.shape # CORRECTED
    chain_means = np.mean(samples, axis=1)
    # Calculate between-sequence variance
    between_var = n * np.var(chain_means, ddof=1)
    chain_var = np.var(samples, axis=1, ddof=1)
    within_var = np.mean(chain_var) # OK (=pymc)
    marg_post_var = ((n-1.)/n) * within_var + (1./n) * between_var # = pymc s2
    R_hat_sqrt = np.sqrt(marg_post_var/within_var)
    return R_hat_sqrt

# pymc/diagnostics.py, line 450 ("def gelman_rubin")
def R_hat_pymc(x):
    if np.shape(x) < (2,):
        raise ValueError('Gelman-Rubin diagnostic requires multiple chains.')
    try:
        m,n = np.shape(x)
    except ValueError:
        return [gelman_rubin(np.transpose(y)) for y in np.transpose(x)]
    # Calculate between-chain variance
    B_over_n = np.sum((np.mean(x,1) - np.mean(x))**2)/(m-1)
    # Calculate within-chain variances
    W = np.sum([(x[i] - xbar)**2 for i,xbar in enumerate(np.mean(x,1))]) / (m*(n-1)) # OK (=kabuki)
    # (over) estimate of variance
    s2 = W*(n-1)/n + B_over_n # = marg_post_var
    # Pooled posterior variance estimate
    V = s2 + B_over_n/m
    V = s2 # CORRECTED
    # Calculate PSRF
    R = V/W
    return R

def R_hat_my_imp(samples):
    m, n = numpy.shape(samples) # m = chains, n = samples
    # Chain variance
    chain_var = numpy.var(samples, axis=1, ddof=1) # degrees of freedom = n-ddof
    # Within-chain variance (mean of variances of each chain)
    W = 1./m * numpy.sum(chain_var)
    # Chain means
    chain_means = numpy.mean(samples, axis=1)
    #mean_of_means = numpy.mean(chain_means) # all chains have same length
    # Variance of chain means
    chain_means_var = numpy.var(chain_means, ddof=1)
    # Between-chain variance
    B = n * chain_means_var
    # Weighted average of within and between variance
    #(marginal posterior variance)
    Var_hat = (float(n-1)/n)*W + B/n
    # Potential scale reduction factor
    R_hat = numpy.sqrt(Var_hat / W)
    return R_hat
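As a quick sanity check of the corrected formula, here is a toy example of my own (not taken from either package): chains sampling the same distribution should give an R_hat near 1, while chains stuck at different offsets should not.

```python
import numpy as np

def R_hat(samples):
    # samples: (num_chains, num_samples), using the corrected convention
    m, n = samples.shape
    W = np.mean(np.var(samples, axis=1, ddof=1))      # within-chain variance
    B = n * np.var(np.mean(samples, axis=1), ddof=1)  # between-chain variance
    var_hat = (n - 1.) / n * W + B / n                # marginal posterior variance
    return np.sqrt(var_hat / W)

rng = np.random.RandomState(0)
converged = rng.normal(size=(4, 1000))     # four chains of the same target
stuck = converged + np.arange(4)[:, None]  # chains offset from one another
print(R_hat(converged))  # close to 1
print(R_hat(stuck))      # well above 1
```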

##########################################

Does my implementation of the Gelman-Rubin statistic look right?
Does anyone know what the extra B_over_n/m that is added to the estimated variance in the pymc implementation is? Is it from a slightly different version of Gelman-Rubin, or is it simply a coding/implementation error? (I will also post this to the pymc forum.)
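For reference, my reading of the original Gelman-Rubin formulation (please correct me if I am wrong) is that it adds a term for the sampling variability of the pooled mean, which would make the extra term deliberate rather than a bug:

```latex
\hat{\sigma}^2_+ = \frac{n-1}{n}\,W + \frac{B}{n},
\qquad
\hat{V} = \hat{\sigma}^2_+ + \frac{B}{mn},
\qquad
\hat{R} = \sqrt{\hat{V}/W}
```

Dropping the B/(mn) term gives the simpler sqrt(sigma^2_+/W) that all three implementations agree on after my corrections.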

Thanks,
Øystein

Thomas Wiecki

Aug 7, 2012, 10:08:40 AM
to hddm-...@googlegroups.com
Hi Øystein,

Thanks a lot for fixing that. I was always suspicious of my implementation and meant to get back to it. I think I will just scratch it and forward the call to the pymc gelman-rubin (no reason for code duplication) after the B_over_n/m issue is resolved. However, I do find your version the most readable of the three (with mine coming in second ;)).

Thomas

Imri Sofer

Aug 7, 2012, 10:24:29 AM
to hddm-...@googlegroups.com
we already replaced our implementation in the development version with the pymc implementation.
so you'll see it in the next update.

Thomas Wiecki

Aug 8, 2012, 12:43:50 PM
to hddm-...@googlegroups.com
Øystein,

Were you able to run multiple chains in HDDM that could then be subjected to the Gelman-Rubin diagnostic? Is there some code that could be reused?

Thanks,
Thomas


guido biele

Aug 8, 2012, 1:51:26 PM
to hddm-...@googlegroups.com
Hi,
apologies for the long email.

i am answering because øystein and I are working together and I took care of the parallelization of the chains (implemented with a shell script).
øystein: please correct or complete my description if necessary.

we are running the models on a large grid (Linux) which uses slurm as a scheduler. therefore we send different chains to different cores, save the traces and model specification, and load this data again once all chains are done to calculate convergence. (this works, with the exception of the difficulty of loading large databases)

we can email you the scripts and you can see if anything of it is of use.

the key to the approach of saving and loading the traces is to keep traces, model specification and data together. at the moment we are doing this in a not very elegant manner, but we have discussed saving a completed model as a pickle with three objects (if this is the right terminology): a string with the model specification, the data array, and the trace database (and maybe some trace stats, see the ps below). combined with a "load model" command in hddm this would be an easy way to save a complete model.

cheers-guido

ps: writing this made me think of a solution to the problem we have with loading large (>100mb) databases: we could write a trace_stats command for hddm which calculates variances of the traces as needed for R_hat and stores this together with info about the sample size and the model. with this information saved, one doesn't need the complete trace database to calculate R_hat.
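a rough sketch of what i mean (the names trace_stats / R_hat_from_stats are hypothetical, not an existing hddm command): R_hat only needs each chain's length, mean and variance, so those three numbers per chain are enough to save:

```python
import numpy as np

def trace_stats(trace):
    # per-chain summary that is sufficient for R_hat
    trace = np.asarray(trace)
    return {'n': trace.size, 'mean': trace.mean(), 'var': trace.var(ddof=1)}

def R_hat_from_stats(stats):
    # recompute R_hat from saved summaries (assumes equal-length chains)
    n = stats[0]['n']
    W = np.mean([s['var'] for s in stats])              # within-chain variance
    B = n * np.var([s['mean'] for s in stats], ddof=1)  # between-chain variance
    var_hat = (n - 1.) / n * W + B / n
    return np.sqrt(var_hat / W)

# instead of pickling the full trace database, one could pickle
# [trace_stats(chain) for chain in chains]
```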

pps: an alternative way to submit jobs to a grid would be to send a job to a node with several cores and to do the parallelization from within python. the disadvantage of this approach is that a job will only be started once such a node is available. the approach we use requires more coding outside python (there is a slurm module for python, but we haven't used it yet), but allows starting a job as soon as a single core is available.

Thomas Wiecki

Aug 8, 2012, 2:25:10 PM
to hddm-...@googlegroups.com
On Wed, Aug 8, 2012 at 1:51 PM, guido biele <g.p....@psykologi.uio.no> wrote:
Hi,
apologies for the long email.

i am answering because øystein and I are working together and I took care of the parallelization of the chains (implemented with a shell script).
øystein: please correct or complete my description if necessary.

we are running the models on a large grid (Linux) which uses slurm as a scheduler. therefore we send different chains to different cores, save the traces and model specification, and load this data again once all chains are done to calculate convergence. (this works, with the exception of the difficulty of loading large databases)

we can email you the scripts and you can see if anything of it is of use.

I'm especially interested in how you combine the different chains into one. Or does your Rhat script just work with multiple models without combining them?

the key to the approach of saving and loading the traces is to keep traces, model specification and data together. at the moment we are doing this in a not very elegant manner, but we have discussed saving a completed model as a pickle with three objects (if this is the right terminology): a string with the model specification, the data array, and the trace database (and maybe some trace stats, see the ps below). combined with a "load model" command in hddm this would be an easy way to save a complete model.

I always wanted to code that but haven't gotten around to it. So if you want to give that a try, that'd be great and I'll happily include it. It shouldn't be too hard, like you already said.

cheers-guido

ps: writing this made me think of a solution to the problem we have with loading large (>100mb) databases: we could write a trace_stats command for hddm which calculates variances of the traces as needed for R_hat and stores this together with info about the sample size and the model. with this information saved, one doesn't need the complete trace database to calculate R_hat.

That's actually a great idea. The main reason to run multiple chains is to compute the R_hat so that'd be a nice optimization.
 

pps: an alternative way to submit jobs to a grid would be to send a job to a node with several cores and to do the parallelization from within python. the disadvantage of this approach is that a job will only be started once such a node is available. the approach we use requires more coding outside python (there is a slurm module for python, but we haven't used it yet), but allows starting a job as soon as a single core is available.

The experiments.py interface I referred to earlier already does that (inside of python), but requires MPI, which slurm can deal with.

Guido Biele

Aug 8, 2012, 3:02:56 PM
to hddm-...@googlegroups.com


On Wednesday, August 8, 2012 8:25:10 PM UTC+2, Thomas wrote:
On Wed, Aug 8, 2012 at 1:51 PM, guido biele <g.p....@psykologi.uio.no> wrote:
Hi,
apologies for the long email.

i am answering because øystein and I are working together and I took care of the parallelization of the chains (implemented with a shell script).
øystein: please correct or complete my description if necessary.

we are running the models on a large grid (Linux) which uses slurm as a scheduler. therefore we send different chains to different cores, save the traces and model specification, and load this data again once all chains are done to calculate convergence. (this works, with the exception of the difficulty of loading large databases)

we can email you the scripts and you can see if anything of it is of use.

I'm especially interested in how you combine the different chains into one. Or does your Rhat script just work with multiple models without combining them?
I am not sure what you mean by combining models. Rhat is naturally computed across chains. here is Øystein's code:

##########################################

# Function for calculating R_hat (potential scale reduction factor)
# Gelman-Rubin (new implementation)

def R_hat(samples):
    # Number of chains (m) and number of samples (n)
    m, n = numpy.shape(samples)
    # Chain variance
    chain_var = numpy.var(samples, axis=1, ddof=1) # degrees of freedom = n-ddof
    # Within-chain variance (mean of variances of each chain)
    W = 1./m * numpy.sum(chain_var)
    # Chain means
    chain_means = numpy.mean(samples, axis=1)
    # Variance of chain means
    chain_means_var = numpy.var(chain_means, ddof=1)
    # Between-chain variance
    B = n * chain_means_var
    # Weighted average of within and between variance
    # (marginal posterior variance)
    Var_hat = (float(n-1)/n)*W + B/n
    # Potential scale reduction factor
    R_hat = numpy.sqrt(Var_hat / W)
    return R_hat

##########################################

# Function for checking convergence
# adapted from kabuki.analyze.test_chain_convergance()

def test_chain_convergence(models, params, implementation):
    R_hat_param = {}

    print 'calculating convergence statistic for ',
    for param in params:
        print param,
        t = models[0].mc.trace(param)
        num_samples = t.length()
        num_chains = len(models)
        samples = numpy.empty((num_chains, num_samples))
        for i, model in enumerate(models):
            trace = model.mc.trace(param).gettrace()
            samples[i, :] = trace
        R_hat_param[param] = R_hat(samples)

    return R_hat_param

##########################################

one thing I haven't had time to look up is if/how the statistics of the different chains should be combined if they have all converged to the same solution. is this maybe what you mean by combining?


the key to the approach of saving and loading the traces is to keep traces, model specification and data together. at the moment we are doing this in a not very elegant manner, but we have discussed saving a completed model as a pickle with three objects (if this is the right terminology): a string with the model specification, the data array, and the trace database (and maybe some trace stats, see the ps below). combined with a "load model" command in hddm this would be an easy way to save a complete model.

I always wanted to code that but haven't gotten around to it. So if you want to give that a try, that'd be great and I'll happily include it. It shouldn't be too hard, like you already said.

cheers-guido

ps: writing this made me think of a solution to the problem we have with loading large (>100mb) databases: we could write a trace_stats command for hddm which calculates variances of the traces as needed for R_hat and stores this together with info about the sample size and the model. with this information saved, one doesn't need the complete trace database to calculate R_hat.

That's actually a great idea. The main reason to run multiple chains is to compute the R_hat so that'd be a nice optimization.
then we will write a short command that saves model specification, data, info and statistics for R_hat (and also a histogram) into one pickle.
 

pps: an alternative way to submit jobs to a grid would be to send a job to a node with several cores and to do the parallelization from within python. the disadvantage of this approach is that a job will only be started once such a node is available. the approach we use requires more coding outside python (there is a slurm module for python, but we haven't used it yet), but allows starting a job as soon as a single core is available.

The experiments.py interface I referred to earlier already does that (inside of python), but requires MPI, which slurm can deal with.
I didn't look at this in great detail, to be honest. the reason is that I also use the grid for other stuff, so it was quickest for me to do this with a shell script.

Thomas Wiecki

Aug 8, 2012, 6:11:56 PM
to hddm-...@googlegroups.com
On Wed, Aug 8, 2012 at 3:02 PM, Guido Biele <g.p....@psykologi.uio.no> wrote:

one thing I haven't had time to look up is if/how the statistics of the different chains should be combined if they have all converged to the same solution. is this maybe what you mean by combining?

I think if the Rhat holds you can just concatenate them.
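Something like this (assuming one equal-length trace array per chain):

```python
import numpy as np

# four chains of the same parameter, one trace array each
chains = [np.random.RandomState(i).normal(size=1000) for i in range(4)]
pooled = np.concatenate(chains)  # one pooled trace for posterior summaries
print(pooled.shape)  # (4000,)
```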
 


the key to the approach of saving and loading the traces is to keep traces, model specification and data together. at the moment we are doing this in a not very elegant manner, but we have discussed saving a completed model as a pickle with three objects (if this is the right terminology): a string with the model specification, the data array, and the trace database (and maybe some trace stats, see the ps below). combined with a "load model" command in hddm this would be an easy way to save a complete model.

I always wanted to code that but haven't gotten around to it. So if you want to give that a try, that'd be great and I'll happily include it. It shouldn't be too hard, like you already said.

cheers-guido

ps: writing this made me think of a solution to the problem we have with loading large (>100mb) databases: we could write a trace_stats command for hddm which calculates variances of the traces as needed for R_hat and stores this together with info about the sample size and the model. with this information saved, one doesn't need the complete trace database to calculate R_hat.

That's actually a great idea. The main reason to run multiple chains is to compute the R_hat so that'd be a nice optimization.
then we will write a short command that saves model specification, data, info and statistics for R_hat (and also a histogram) into one pickle.

Sounds great. Although I'm a little unsure if you want to do this on the 0.2 version, since it will be obsolete soon. I think it shouldn't be too hard to recode for 0.3, but you might want to consider working on the feature/hnode branch. I am using it in production and it performs much better. Just have to work out some rough edges.

Also, ideally you overload the __pickle__() method (or something associated) so that one could simply say pickle.dump(model, "model_file") and model = pickle.load("model_file") and the model will get recreated and the associated chains loaded. 
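A rough sketch of the idea with the actual pickle hooks (__getstate__ / __setstate__; the class and attribute names here are made up for illustration, not the real hddm API):

```python
import pickle
import numpy as np

class SavedModel(object):
    def __init__(self, spec, data, traces):
        self.spec = spec      # model specification string
        self.data = data      # data array
        self.traces = traces  # dict: parameter name -> trace array

    def __getstate__(self):
        # only plain objects go into the pickle, so loading the model
        # back never needs the original trace database
        return {'spec': self.spec, 'data': self.data, 'traces': self.traces}

    def __setstate__(self, state):
        self.__dict__.update(state)

model = SavedModel('v ~ stimulus', np.zeros(3), {'a': np.ones(5)})
restored = pickle.loads(pickle.dumps(model))
print(restored.spec)  # v ~ stimulus
```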


I do consider this more advanced python stuff, so no worries if you don't want to go there. The code is likely to be easily adapted in any case.