Multinomial nodes in RJ-MCMC

Nels Johnson

unread,

Jul 27, 2021, 11:55:04 AM7/27/21

to nimble-users

My problem is as follows: I am doing spatial logistic regression. I have two classes of covariates, say, % cover and % bare ground in a radius r around each point in space. I compute these percentages for many values of r = 1, 2,..., R. This means I also have x11, x12,...,x1R and x21, x22, ..., x2R. I want only one of these x1's and x2's values in my model at a given time. Possibly, my model should not contain neither of the x1 or x2 values. I think this is a good candidate problem for using RJ-MCMC. However, I am not sure if nimble will let me implement what I would like. Here are two example codes illustrating what I would like the RJ model to look like:

logitCode <- nimbleCode({

beta0 ~ dnorm(0,sd = 5)

z1 ~ dmulti(size=1,prob=rep(1/R,R)) ## indicator variable for each coefficient

z2 ~ dmulti(size=1,prob=rep(1/R,R)) ## indicator variable for each coefficient

for(j in 1:R) {

beta1[j] ~ dnorm(0, sd = 2.5 )

beta2[j] ~ dnorm(0, sd = 2.5 )

zbeta1[j] <- z1[j] * beta1[j] ## indicator * beta

zbeta2[j] <- z2[j] * beta2[j] ## indicator * beta

}

for(j in 1:n) {

logit(p[j]) <- beta0 + inprod(X1[j, 1:R], zbeta1[1:R]) +

inprod(X2[j, 1:R], zbeta2[1:R])

y[j] ~ dbern(p[j])

}

})

logitCode2 <- nimbleCode({

phi ~ dunif(0,1)

beta0 ~ dnorm(0,sd = 5)

z1 ~ dmulti(size=1,prob=rep(1/R,R)) ## indicator variable for each coefficient

z2 ~ dmulti(size=1,prob=rep(1/R,R)) ## indicator variable for each coefficient

l1 ~ dbern(phi) ## indicator variable for each coefficient

l2 ~ dbern(phi) ## indicator variable for each coefficient

for(j in 1:R) {

beta1[j] ~ dnorm(0, sd = 2.5 )

beta2[j] ~ dnorm(0, sd = 2.5 )

zbeta1[j] <- z1[j] * l1 * beta1[j] ## indicator * beta

zbeta2[j] <- z2[j] * l2 * beta2[j] ## indicator * beta

}

for(j in 1:n) {

logit(p[j]) <- beta0 + inprod(X1[j, 1:R], zbeta1[1:R]) +

inprod(X2[j, 1:R], zbeta2[1:R])

y[j] ~ dbern(p[j])

}

})

What are all's thoughts on this?

Perry de Valpine

unread,

Jul 28, 2021, 2:18:08 PM7/28/21

to Nels Johnson, nimble-users

Hi Nels,

Thanks for making a new thread on this and also the two code ideas. This is a really interesting topic and we are hearing on-list and by other channels of interests in more complicated RJMCMC needs.

You are right that the RJ scheme already implemented is fairly limited. Each indicator can be paired with exactly one coefficient. The sampler for the indicator will use RJ logic to jointly propose making the coefficient 0 (if turning indicator to 0) or proposing a new indicator value (if turning indicator to 1). The regular sampler for the coefficient (what is used when it is "on") is placed inside of a toggle function, which simply skips it if it is off.

For background, the two issues addressed by this in general are wasted computation and coefficient mixing. If one simply codes indicators into the model without using the RJ samplers, two potentially undesirable things will happen. A coefficient will be sampled even when its indicator is 0, which represents wasted computation. And the coefficient will follow its prior when its indicator is 0, which means it might wander around for many iterations before reaching suitable values for the indicator to successfully mix to 1 again. In practice this has sometimes made people place informative priors on coefficients, essentially for implementation reasons rather than statistical/scientific reasons. That could be why you have informative priors on beta[j] and beta2[j].

In your first example, it looks to me like it will work correctly but suffers from both issues. The beta1[] and beta2[] elements for which z1[] and z2[], respectively, are 0, will continue to be mixed, wasting computation and following their priors. Possibly that is why you gave them informative priors. Also I am not seeing clearly whether this scheme allows you to have x1 or x2 not in the model at all. That could be covered if you add an extra column to X1 and X2 with zeros, so if that column is the currently selected one, in effect there is no X1 or X2. So aside from efficiency and having your hand forced to use informative priors (and I don't know if either are concerns for you in this project), it seems feasible.

In your second example, I think the I1 and I2 serve to turn the entire X1 or X2 on or off, respectively. But they don't satisfy a one-to-one indicator-coefficient pairing, so our basic RJ system won't work for them. So again it makes sense and could work unless informative priors or efficiency are concerns.

Here is another idea for setting it up with existing tools. And below I'll comment on extending the RJMCMC system. I am kind of assuming your informative priors were a concession to the implementation, so I will try to support uninformative priors.

z1 ~ dmulti(size=1,prob=rep(1/R,R))

z2 ~ dmulti(size=1,prob=rep(1/R,R))

z1i <- which(z1[1:R] == 1) # There was a minor bug discovered for compilation of "which" recently,

z2i <- which(z2[1:R] == 1) # so if this doesn't work, it could be done easily in a nimbleFunction instead

beta1_shared ~ dnorm(0, sd = 1000) # uninformative

beta2_shared ~ dnorm(0, sd = 1000) # ditto

l1 ~ dbern(phi) # (I'm not sure how well informed phi will be by two cases.)

l2 ~ dbern(phi)

for(j in 1:R) {

beta1[j] ~ dnorm(0, sd = 2.5 )

beta2[j] ~ dnorm(0, sd = 2.5 )

}

for(j in 1:n) {

logit(p[j]) <- beta0 + I1 * X1[j, z1i] * (beta1_shared + beta1[j]) + I2 * X2[j, z2i] * (beta2_shared + beta2[j])

y[j] ~ dbern(p[j])

}

(

or if there are not terms in the logit-scale linear combination that would be recalculated only for some j but not others, you could instead of the last for loop write

logit(p[1:n]) <- beta0 + I1 * X1[1:n, z1i] * (beta1_shared + beta1[j]) + I2 * X2[1:n, z2i] * (beta2_shared + beta2[j])

for(j in 1:n) {

y[j] ~ dbern(p[j])

}

)

Then you could do RJ with I1 indicating for beta1_shared and I2 indicating for beta2_shared. This would not save the wasted computation of updating the beta1[j] and beta2[j] elements. The idea would be that maybe the coefficients for the different r values are not wildly different, so they can share a coefficient with an uninformative prior and then have their own adjustments with informative priors. The latter will follow their priors when turned "off". There would also be wasted computation when z1 or z2 is updated if I1 or I2 is off (0), respectively. The beta1_shared and beta1[1:R] (ditto for beta2) are over-parameterized, but possibly in a way that will give reasonable interpretation (or post-processing) due to the informative priors. If the coefficient posteriors for different r values really overlap, you could possibly even remove the beta1[1:R] (ditto for beta2).

You might be able to put together variations on these ideas.

If you want to take this further, you could consider copying and modifying the RJMCMC code in the package. Although we've heard interest in this kind of thing from others as well, right now none of the core development team feels we can take time to do this ourselves, but we can support you and answer questions (on list or off list) if you want to get into it. It's all done from R in nimbleFunctions.

I hope that helps!

-Perry

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/e299fa41-e1a4-49ae-be37-8e57f0e2fd69n%40googlegroups.com.

Nels Johnson

unread,

Jul 28, 2021, 6:16:47 PM7/28/21

to nimble-users

Thanks, Perry, for your suggestions. This does clarify nimble's current capability with respect with RJ-MCMC. I think given our current deadline I am going to go with a different approach, but I will definitely watch this topic for updates. If I make any progress extending nimble's tools I will certainly let you know and ask if I need help.

-Nels

Reply all

Reply to author

Forward