Cautionary note for using categorical or multinomial distributions


Michael Schaub

Aug 14, 2025, 10:59:24 AM
to hmec...@googlegroups.com

Hi all,

 

I have always believed, perhaps naively, that the categorical and multinomial distributions in Nimble and JAGS do not work when the cell probabilities do not sum to one. However, this is not the case: they do run in that situation, but the estimated parameters differ from those obtained when the cell probabilities sum to one. I have summarised these findings in the attached cautionary note. Many of you are probably already aware of this issue, in which case you can ignore this post. For those who are unfamiliar with it, however, the findings may be useful.

 

Kind regards

 

Michael

 

 

PD Dr. Michael Schaub
Head of the Population Biology Unit
Lecturer, Universität Bern
Tel. +41 41 462 97 66
michael...@vogelwarte.ch
www.vogelwarte.ch

Check out our new book about ‘Integrated Population Models’ at www.vogelwarte.ch/ipm

 


Swiss Ornithological Institute | Seerose 1 | CH-6204 Sempach | Switzerland


Welcome to the new visitor centre in Sempach! http://www.vogelwarte.ch/de/besuch/

 

Cautionary note on the use of the categorical and multinomial distributions in JAGS and Nimble.pdf

Perry de Valpine

Aug 14, 2025, 2:43:10 PM
to Michael Schaub, hmec...@googlegroups.com
Hi Michael,

Thanks for sharing this. It is quite interesting. As you explain at the end, nimble (and I assume JAGS) are working correctly and as documented, and the issues you raise are rather more about different ways to parameterize models and set up priors (sometimes unintentionally ending up with informative priors on the parameters of interest). I would like to add a few comments in case it helps.

The first case where you illustrate perhaps surprising results is when you have three parameters, p[1], p[2], p[3], each with a prior dunif(0,1), and you get surprising estimates of p[1], p[2], p[3] given data. However, when you post-process the posterior you obtain better results. This makes a lot of sense. One way to probe this is to simulate from the prior and look at the induced prior on the parameters of interest.
# Simulate from the prior
p1 <- runif(10000, 0, 1)
p2 <- runif(10000, 0, 1)
p3 <- runif(10000, 0, 1)
# look at the induced prior of the parameter of interest
hist(p1 / (p1+p2+p3))
# Clearly this is a weird and informative prior.
In your case the results after transforming the posterior to the parameters of interest (e.g. p1/(p1+p2+p3)) looked more reasonable, because a lot of data outweighed the informative prior. My point is that seeing how a prior on the parameters in the model induces a quite different prior on the parameter(s) of interest can be a flag to be careful, and one can do this check before looking at any posteriors.

There are really only two free parameters among the three proportions (in compositional data analysis they say the parameters "live on the simplex", meaning the constraint to sum to 1).
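If one wants a flat prior directly on the simplex, the standard choice is Dirichlet(1,1,1) (ddirch in nimble and JAGS). A quick stdlib-Python sketch of what that prior looks like, using the usual Gamma-normalization construction (the helper name rdirichlet is mine, not a library function):

```python
import random

random.seed(42)

def rdirichlet(alpha):
    # A Dirichlet draw: normalize independent Gamma(alpha_i, 1) draws.
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

# Dirichlet(1, 1, 1) is the uniform distribution on the 3-simplex.
draws = [rdirichlet([1.0, 1.0, 1.0]) for _ in range(20000)]

# Every draw sums to 1 by construction, and each marginal has mean 1/3,
# unlike the independent dunif(0,1) construction above.
mean_p1 = sum(d[0] for d in draws) / len(draws)
print(round(mean_p1, 2))  # close to 1/3
```

Unlike three independent dunif(0,1) priors, every draw here respects the sum-to-one constraint, so no post-processing is needed.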

I also wanted to note (since you pointed out that it may not be clear whether nimble and/or JAGS normalize the probabilities) that in nimble, if one is not sure what is being calculated, one can check in several ways. To check whether dmulti normalizes the probabilities to sum to 1, one can do:

pA <- c(.1, .3, .6)
pB <- c(10, 30, 60)
x <- c(2, 5, 5)
n <- 12
# 1. Call function directly
nimble::dmulti(x, size=n, prob = pA, log=TRUE)
nimble::dmulti(x, size=n, prob = pB, log=TRUE)
Generally nimble's distribution functions called from R use the same internal code as the compiled model. The fact that these two calls return the same value shows that pB is being normalized.
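The same normalization can also be checked outside nimble by writing down the multinomial log-density by hand. A small Python sketch (the helper multinomial_logpmf is my own, not nimble code) that normalizes the weights first, mirroring what the two dmulti calls above demonstrate:

```python
import math

def multinomial_logpmf(x, n, prob):
    # Normalize the weights first, as nimble's dmulti does internally.
    total = sum(prob)
    p = [w / total for w in prob]
    # log multinomial coefficient: log(n!) - sum(log(x_i!))
    coef = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    return coef + sum(xi * math.log(pi) for xi, pi in zip(x, p))

x = [2, 5, 5]
n = 12
pA = [0.1, 0.3, 0.6]
pB = [10, 30, 60]   # the same weights scaled by 100

# Because the weights are normalized, both calls give the same log-density.
print(multinomial_logpmf(x, n, pA))
print(multinomial_logpmf(x, n, pB))
```

Any positive rescaling of the weight vector cancels in the normalization, which is why the two values agree.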

# 2. Put it in an uncompiled model and call it
m <- nimbleModel(nimbleCode({x[1:3] ~ dmulti(size=n, prob=p[1:3])}))
m$x <- x
m$n <- n
m$p <- pA
m$calculate("x")
m$p <- pB
m$calculate("x")

# 3. Put it in a compiled model and call it
cm <- compileNimble(m)
cm$p <- pA
cm$calculate("x")
cm$p <- pB
cm$calculate("x")

One reason the normalization is helpful is that one can end up with numerical glitches otherwise. Another is that one can parameterize more flexibly (as you've done) and then post-process to parameters of interest. However, if one wants it to work differently, one can write a new version of dmulti (e.g. taking only length(p)-1 parameters and computing the last one) as a nimbleFunction.

By the way, negative values will not be valid, because even after normalization one would take the log of a negative number:
pC <- c(-1, 30, 60)
nimble::dmulti(x, size=n, prob = pC, log=TRUE) # returns NaN from log of a negative
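The reason is easy to see by hand: normalizing a weight vector rescales it but cannot change the sign of any entry, so a negative weight still yields a negative "probability" whose log is undefined. A stdlib-Python illustration (Python's math.log raises an error where nimble returns NaN):

```python
import math

pC = [-1, 30, 60]
total = sum(pC)                 # 89: rescaling keeps the sign of each entry
p = [w / total for w in pC]
print(p[0])                     # still negative after normalization

# Taking the log of a negative normalized weight is undefined:
try:
    math.log(p[0])
except ValueError as e:
    print("undefined:", e)
```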

Thanks again for raising this issue.

Best wishes,
Perry


--
*** Three hierarchical modeling email lists ***
(1) unmarked: for questions specific to the R package unmarked
(2) SCR: for design and Bayesian or non-bayesian analysis of spatial capture-recapture
(3) HMecology (this list): for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2016, 2021) and Schaub & Kéry (2022)