Hi Remy,
Thanks for the question and the very clear reproducible code. It was puzzling, but I think I know what is going on.
First I checked the time spent in each sampler, using
cmcmcRmodel$run(niter, time = TRUE)
cmcmcRmodel$samplerTimes
The second sampler accounts for essentially all of the difference in computing time (about 8 seconds of difference on my machine, out of 25 or 34 seconds total in the two cases, so quite noticeable).
From
confRModel$printSamplers()
we see that the second sampler is for eps. This sampler has the expensive computations of M as dependencies.
Yet the raw computational cost of the pieces of M is not itself counter-intuitive.
system.time(for(i in 1:100) cRmodel$calculate("M")) # Loop 100 times for some averaging of costs
This is nearly the same in the two cases, but slightly *faster* when you've used 2 sub-matrices instead of 10, which makes sense and is the opposite of the weird MCMC result. Also note that there is some R overhead when nimble evaluates cRmodel$calculate("M"): determining which nodes need calculation and setting up a loop over them happens in R, which may account for some of the slowdown for 10. But the difference is 2.1 seconds (case 10) vs. 2.0 seconds (case 2) on my machine, so nowhere near the magnitude of the MCMC difference.
So this suggests it is not raw computation time of the M pieces that is the issue.
I think what is happening is that invalid proposals (outside the support of the prior) in the block sampler are immediately rejected without wasting time computing dependencies, and this is much more likely to happen with 10 elements of eps than with 2.

To explain more: your eps[1:q] is a vector node, so by default the MCMC configuration samples it as a block using an adaptive random-walk Metropolis-Hastings sampler. The prior is your dUnifVector, which makes every element uniform from 1 to 3. The RW-MH block sampler makes multivariate normal proposals, whose scale and correlation are adapted over many iterations and may be quite poor (arbitrary) choices at the start. Like many of our samplers, after it makes a proposal it first checks the log probability of the prior. If that is -Inf (meaning the proposal is in a region of zero prior probability), it immediately rejects the proposal, and that iteration of that sampler is done. When this occurs, the dependency computations are *not* done. In your case, every time an invalid proposal is made, the cost of computing M (all of its pieces) is skipped because the proposal will be rejected regardless.

A block proposal in 10 dimensions with a uniform prior is much more likely to have at least one element fall outside the valid range (1 to 3), and thus to have the whole vector proposal immediately rejected. I also noticed from the posterior that elements of eps can spend many iterations up against a boundary value, making it likely that a proposal will land on the invalid side of that boundary. In other words, the case of 10 goes faster because it more often makes wasted proposals for eps[1:10] that are rejected without incurring the cost of computing M (and then W, lambda and Z).
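To see how quickly the dimension matters, here is a minimal sketch (plain Python, not NIMBLE code; the per-element probability p is a made-up illustration, not measured from your model) of the chance that a block proposal is immediately rejected:

```python
# If each element of a block proposal independently falls outside the
# prior's support (here, outside [1, 3]) with probability p, then the
# whole d-dimensional proposal is invalid -- and immediately rejected,
# skipping the expensive computation of M -- whenever *any* element is
# out of bounds.

def prob_immediate_rejection(p, d):
    """Probability that at least one of d elements is out of bounds."""
    return 1 - (1 - p) ** d

# Illustrative per-element probability of crossing the boundary:
p = 0.2
print(prob_immediate_rejection(p, 2))   # 2-element block:  about 0.36
print(prob_immediate_rejection(p, 10))  # 10-element block: about 0.89
```

So even a modest per-element chance of straying outside [1, 3] makes the 10-dimensional block skip the dependency computations most of the time, which is exactly the pattern in the timing results above.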
I'm not sure of the full context here, but some ideas are: you could set up eps as scalar nodes so they are sampled one by one; and/or you could set up eps on a different scale, like logit(eps), so it does not have the hard boundaries from the uniform. Or you may have just wanted to make sense of what nimble is doing.
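To illustrate the reparameterization idea: a scaled inverse-logit maps an unconstrained real value into the open interval (1, 3), so a sampler working on the unconstrained scale can never propose outside the support. A sketch in plain Python (the function name is just for illustration; in a model you would write the equivalent transform in your NIMBLE code):

```python
import math

def eps_from_unconstrained(x):
    """Map an unconstrained real x into (1, 3): 1 + 2 * inverse-logit(x)."""
    return 1 + 2 / (1 + math.exp(-x))

print(eps_from_unconstrained(0.0))    # midpoint of the interval: 2.0
print(eps_from_unconstrained(-10.0))  # close to the lower boundary, never 1
print(eps_from_unconstrained(10.0))   # close to the upper boundary, never 3
```

On this scale a value can sit arbitrarily close to a boundary without any proposal being invalid, so no iterations are spent on immediate rejections.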
The cost of the actual multivariate normal draw and adaptation inside the RW-MH block sampler in this case is very small compared to the model computations.
I hope that is clear!
Perry