Difference between "conjugate sampler" and "RW_dirichlet sampler"?


PierGianLuca

Jun 22, 2025, 5:18:36 PM
to nimble-users
Dear Nimble devs and users,

I was wondering what the difference is between the "conjugate sampler" and the "RW_dirichlet sampler" for a node that follows a Dirichlet distribution.

In more detail (simplifying the original model):

A. If I write a model as follows:

W[1:n] ~ ddirch(alpha = alphas[1:n])

WW[1:n] <- W[1:n] + 1e-100 # to avoid zero-elements

for(d in 1:m){
    K[d] ~ dcat(prob = WW[1:n])
    data[d] ~ dnorm(mean = mean[K[d]], var = 1)
}


Then I see that W is assigned an RW_dirichlet sampler:
----
categorical sampler (11)
[...]
- K[] (10 elements)
RW_dirichlet sampler (1)
- W[1:64]
----



B. If I write the model as follows:

W[1:n] ~ ddirch(alpha = alphas[1:n])

for(d in 1:m){
    K[d] ~ dcat(prob = W[1:n])
    data[d] ~ dnorm(mean = mean[K[d]], var = 1)
}

Then I see that W is assigned a conjugate sampler:
----
conjugate sampler (1025)
[...]
- W[1:64]
categorical sampler (11)
[...]
- K[] (10 elements)
----


But in either case, shouldn't the posterior for W follow a Dirichlet distribution? How does the "conjugate sampler" for W in case B differ from the RW_dirichlet sampler in case A?

Thank you for your help!
Best regards,
Luca

Daniel Turek

Jun 24, 2025, 6:35:27 PM
to PierGianLuca, nimble-users
Luca, these are good questions.

The "conjugate sampler" does conjugate updates to the elements of W[], using (in your case) the dirichlet prior - categorical likelihood conjugate relationship.  This is a specific case of the more general dirichlet prior - multinomial likelihood conjugate relationship.  In this case, yes, the exact posterior of W[] is dirichlet, and we can sample it directly.

The RW_dirichlet sampler does Metropolis-Hastings updates, one element at a time and on a logarithmic scale, to the elements of W[], normalizing the elements of W[] to sum to 1 along the way.  This applies when W[] does not fall exactly into a conjugate relationship.  I just did a quick pencil and paper check, and I believe that when W[] follows a dirichlet prior, and you add a constant to all elements of W[] (in your case the very small constant 1e-100) and then use that sum as the probability parameter for a categorical distribution, the posterior is no longer exactly dirichlet.  I think you could also verify this by writing down the pdf/pmf of the dirichlet and categorical distributions.
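For intuition, here is a minimal conceptual sketch of this style of update in plain R, using the gamma representation of the dirichlet (W = theta / sum(theta), with independent gamma variates theta).  To be clear, this is only an illustration of the idea, not nimble's actual RW_dirichlet code; rw_dirichlet_step, logLik, and scale are hypothetical names:

## Conceptual sketch only -- not nimble's actual RW_dirichlet implementation.
## Gamma representation: if theta[i] ~ dgamma(alpha[i], 1) independently,
## then W = theta / sum(theta) ~ ddirch(alpha).
## 'logLik' is a hypothetical function giving the log-likelihood of the
## dependent nodes as a function of the probability vector W.
rw_dirichlet_step <- function(theta, alpha, logLik, scale = 0.5) {
    for(i in seq_along(theta)) {
        thetaProp <- theta
        thetaProp[i] <- theta[i] * exp(rnorm(1, 0, scale))   # random walk on the log scale
        logMHR <- dgamma(thetaProp[i], shape = alpha[i], log = TRUE) -
                  dgamma(theta[i],     shape = alpha[i], log = TRUE) +
                  logLik(thetaProp / sum(thetaProp)) -
                  logLik(theta     / sum(theta)) +
                  log(thetaProp[i]) - log(theta[i])          # Jacobian of the log transform
        if(log(runif(1)) < logMHR) theta <- thetaProp        # Metropolis-Hastings accept/reject
    }
    theta   # W = theta / sum(theta) is the updated, normalized dirichlet value
}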

That said, you shouldn't need to add the constant 1e-100 to the elements of W[]: any W[i] = 0 is a boundary case, and it should never arise during MCMC updates.  Just make sure to initialize all elements of W[] to positive values that sum to 1 (see the example below), and I think you'll be ok.  Does this make sense?  Please let me know if I've read over your code or your questions too quickly.
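For example, a uniform initialization of W (showing just the W part of the inits list, with n as in your model):

inits <- list(W = rep(1/n, n))   # positive elements summing to 1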

Daniel



Chris Paciorek

Jun 24, 2025, 8:32:50 PM
to Daniel Turek, PierGianLuca, nimble-users
Hi Daniel,

I was talking with Luca about his code in a separate issue (the warnings when logProb values are Inf), and he reminded me that he had discussed this modeling context on the list some time ago. The issue is that W[i] = 0 can be caused by numerical underflow, so it does indeed happen, unfortunately. I think at the time I may have looked at this and not figured out a solution for the underflow happening in the conjugate sampler, since there's not really a way to intervene and set the 0 values to some small epsilon. In Luca's case the underflow leads to an Inf log probability when sampling parameters of which W is a dependency. So it's all something of a numerical hairball, so to speak.
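For what it's worth, the underflow is easy to reproduce in plain R, since a dirichlet draw is just a vector of gamma draws, normalized (the shape value below is only illustrative):

set.seed(1)
g <- rgamma(64, shape = 0.001, rate = 1)  # very small shape: many draws underflow to exactly 0
W <- g / sum(g)
sum(W == 0)   # typically positive: elements of the dirichlet draw that are exactly zero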

-chris

Daniel Turek

Jun 24, 2025, 9:51:34 PM
to paci...@stat.berkeley.edu, PierGianLuca, nimble-users
Thanks for following up, Chris.  I hope we can figure this one out.

PierGianLuca

Jun 26, 2025, 4:27:51 AM
to Daniel Turek, paci...@stat.berkeley.edu, nimble-users
Hi Chris and Daniel,

Daniel: "pencil and paper check [...] the posterior is no longer exactly dirichlet". – Absolutely right! Sorry for writing without thinking properly first.

The point here is that by adding something as small as 1e-100, I expect the model to be "almost conjugate". As Chris kindly mentioned, this is related to zero W-element values occurring because of underflow.

Let me report some small findings about this, and submit a request. For context: the initial problem, two years ago, was that zero values in W's elements were throwing off the categorical samplers of child or parent nodes, making them produce nonsensical samples. Because of this I switched to a slice sampler for the categorical nodes, which has seemed to give reasonable results.

– Now, two years later, I've tried the categorical sampler again. It behaves correctly now, even when zero values appear in a Dirichlet-distributed child or parent node. So one of the latest versions of Nimble fixed this. Thank you so much! 🚀

– I've also applied the correction Daniel suggested two years ago: add 1e-100 or so to W, to make sure it has no zero elements. From runs on very different datasets, I can say that the results are practically identical, within Monte Carlo variability, to those obtained without the correction. By "results" I mean not only MC estimates but also the various Monte Carlo traces, standard deviations, ESS, and so on.

– Daniel's correction has the advantage of making the whole procedure more sound, and no warnings about infinite log-probabilities come from the sampler. But it has the disadvantage of being slightly slower, roughly 5% to 10% from a quick look. I don't know whether that's because of the extra "+1e-100" deterministic node, or because of the RW sampler that then gets used, or both.


So I was wondering whether it would make sense theoretically, and if so whether it would be possible, to have a non-default option of adding a small number like 1e-100 to W internally, *within* the Dirichlet sampler itself, in order to avoid underflow. That way the conjugate sampler could still be used and the log-probability warnings avoided.


Anyway, nothing urgent from my side. Thank you again for the continuous improvements and fixes!

Cheers,
Luca





Daniel Turek

Jun 28, 2025, 9:49:51 PM
to PierGianLuca, paci...@stat.berkeley.edu, nimble-users
Luca, what you said makes sense.  I just sketched out a very quick sampler, which performs a conjugate update for a dirichlet-distributed node and then adds an arbitrary constant to all elements of the new node value.  The constant is called "eps"; you can specify whatever value you want for it (see the example in the code below), and if you don't provide a value the default is 1e-100.

This sampler is fairly simplistic and specialized, in that it only operates on a single node, that node must have a dirichlet prior distribution, and all dependencies of that node must have categorical distributions.  Also, a heads up, the sampler is not clever enough to track how the target dirichlet node is actually used in the distributions of the dependent categorical nodes - it assumes that it's used directly as the probability parameter for each dependent categorical node.  So you could easily violate this assumption, which would make the results incorrect.

An example of using this sampler (called conjugate_dirch_plus_eps) on a trivial model is given below.  Please try it out, and let me know what you find.  If you specify "eps = 0", then a constant of 0 is added, and you should get results identical to those from the conjugate sampler.  And when eps > 0, this should prevent the numeric underflow problems you were having before.

Cheers,
Daniel

library(nimble)

conjugate_dirch_plus_eps <- nimbleFunction(
    contains = sampler_BASE,
    setup = function(model, mvSaved, target, control) {
        ## control list extraction
        eps <- extractControlElement(control, 'eps', 1e-100)
        ## node list generation
        target <- model$expandNodeNames(target)
        calcNodes <- model$getDependencies(target)
        depNodes <- model$getDependencies(target, self = FALSE)
        ## numeric value generation
        d <- length(model[[target]])
        Ndeps <- length(depNodes)
        ## checks
        if(length(target) > 1) stop('conjugate_dirch_plus_eps only operates on a single node')
        if(model$getDistribution(target) != 'ddirch') stop('conjugate_dirch_plus_eps target node must have ddirch distribution')
        depDists <- sapply(depNodes, function(x) model$getDistribution(x))
        if(!all(depDists == 'dcat')) stop('conjugate_dirch_plus_eps all dependencies should have dcat distributions')
    },
    run = function() {
        ## conjugate update: start from the prior alpha values, then add 1
        ## for the observed category of each dependent categorical node
        posterior_alpha <- model$getParam(target, 'alpha')
        for(i in 1:Ndeps) {
            depValue <- model$getParam(depNodes[i], 'value')
            posterior_alpha[depValue] <- posterior_alpha[depValue] + 1
        }
        ## draw directly from the conjugate dirichlet posterior,
        ## then shift all elements by eps to rule out exact zeros
        newTargetValue <- rdirch(1, posterior_alpha)
        newTargetValue <- newTargetValue + eps
        model[[target]] <<- newTargetValue
        ## recompute the log probabilities and sync the saved copy of the model
        model$calculate(calcNodes)
        nimCopy(from = model, to = mvSaved, row = 1, nodes = calcNodes, logProb = TRUE)
    },
    methods = list(
        reset = function() {}
    )
)

code <- nimbleCode({
    x[1:3] ~ ddirch(a[1:3])
    y ~ dcat(x[1:3])
    y2 ~ dcat(x[1:3])
})

constants <- list(a = c(1, 1, 1))
data <- list(y = 1, y2 = 3)
inits <- list(x = c(1/3, 1/3, 1/3))

Rmodel <- nimbleModel(code, constants, data, inits)

conf <- configureMCMC(Rmodel)

## set value of eps here (default value is 1e-100)
conf$replaceSampler('x', conjugate_dirch_plus_eps, eps = 0.001)

Rmcmc <- buildMCMC(conf)

## compile, run, etc.....
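For completeness, the remaining standard steps might look like this (ordinary nimble workflow; the iteration counts are arbitrary):

Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
samples <- runMCMC(Cmcmc, niter = 10000, nburnin = 1000)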



PierGianLuca

Jun 30, 2025, 12:51:21 AM
to Daniel Turek, paci...@stat.berkeley.edu, nimble-users
Daniel, this is fantastic, thank you so much for your interest and time!

I'll test it right away and report back!

Cheers,
Luca

PierGianLuca

Jul 30, 2025, 5:51:59 AM
to nimble-users, paci...@stat.berkeley.edu, Daniel Turek
Hi Daniel,

I thought I had replied to your email about the "conjugate_dirch_plus_eps" sampler – but I hadn't! Sorry for the delay.

Quick report: the alternative Dirichlet sampler works great. I've compared results, across several datasets, obtained using (a) your new sampler, (b) the original conjugate one, which leads to the "infinite-log" warnings, and (c) the non-conjugate one where an "epsilon" is added to the Dirichlet output in a deterministic node.

Results from all three samplers seem to always agree within Monte Carlo variability (the Monte Carlo chains are never exactly the same, since the numbers produced by the samplers differ slightly). The Monte Carlo computation with the new sampler is the fastest of the three options. I'm including three sets of example traces (two chains each) as pdfs; the pdf names should be self-explanatory.

To minimize the "epsilon correction" as much as possible, I've changed the `eps` default value to `.Machine$double.xmin` (which on my machines is 2.22507e-308) rather than 1e-100. No "infinity-log" warnings appear with this smaller value, and again I didn't see any concrete differences between the two choices.

Thank you again for the sampler! 🙏 🚀 Let me know if you want me to run any particular tests.

Cheers,
Luca
MCtraces-Danielsampler.pdf
MCtraces-conj_sampler_infinitywarns.pdf
MCtraces-non_conj_sampler.pdf

Daniel Turek

Jul 30, 2025, 7:39:12 AM
to PierGianLuca, nimble-users, paci...@stat.berkeley.edu
Fantastic, thanks for the update, Luca.  I'm glad it's all working now.

Cheers!
Daniel
