RJMCMC with indicator variables

112 views
Skip to first unread message

Dave Hudson

unread,
Jun 21, 2021, 10:44:05 AM6/21/21
to nimble-users
Hi team,

I am experimenting with the RJMCMC features in Nimble and trying to run it with some microsatellite data. I have a simplified model running with indicators on 5 different markers and it runs fine which is great but is there anyway of constraining the model so that only one or two of the indicators/variables can be switched on at any one time. Basically I am trying to check if any one of the microsatellite markers is better than an in breeding coefficient... so I only want each one to be tested one at a time against the inbreeding coefficient. I could do this 5 different times with each marker variable in turn running against the inbreeding coefficient but if I have more markers (which I do!) this is going to be quite costly in time!
I hope that makes sense?!
Many thanks
Dave

Chris Paciorek

unread,
Jun 21, 2021, 3:58:47 PM6/21/21
to Dave Hudson, nimble-users
hi Dave,

I think you could do this by introducing a constraint into your model.  For a simpler case that you have two indicators, you could simply specify that, say, ind1 * ind2 == 0 to ensure they are not both on. For your "one or two" out of five, you could presumably do something like "sum(ind[1:5]) <= 2". 

See Section 5.2.7.3 for how to specify constraints.

That said, there will be some inefficiency in that the RJMCMC will make some proposals that violate the constraint and do some unnecessary computation before the constraint part of the model rules out the proposal. It might be possible to rework the RJMCMC samplers to never make a proposal that violates the constraint, though I don't know if that would save non-negligible time. I think you would basically need to pass the relevant indicators in as part of the 'control' argument of the reworked samplers. 

-Chris

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/bed2cab8-3280-49d0-9889-865f3179579fn%40googlegroups.com.

Daniel Turek

unread,
Jun 25, 2021, 8:43:15 AM6/25/21
to Dave Hudson, nimble-users, Chris Paciorek
Dave, I wanted to add a few comments here, in case it helps.  The use of constraints may or may not work well, as Chris said for reasons of efficiency.  The current RJ system adds/removes variables one-at-a-time, and therefore there could be many "proposed additions" which involve model likelihood calculations, but ultimately end up in a guaranteed rejection because they violate the constraint (of including too many marker variables).  Furthermore, this would scale progressively worse, in the presence of even more possible marker variables.

An alternate approach that came to mind for me, thinking about your problem of "selecting 1 among 5 markers to include in the model" (and later, perhaps more) would be to reframe the RJ where it contains a single indicator variable, with a categorical prior, which determines which marker variable is included in the model.  Then, the selection of "which single marker" is encompassed into a single RJ move, first updating the categorical indicator of "which marker to include", and then updating the value of the (single) marker which is currently in the model.  I think this would be much more efficient, and hopefully would also scale well.  The downside, here, is that it would involve re-working some of NIMBLE's RJ sampler functions, to operate instead on a categorical indicator (a modified version of the sampler_RJ_indicator function), and also a modified version of the sampler_RJ_toggled function, to only update the value of the "current" marker.  You can find code for these, as they are used in the current RJ system, at:


Let us know if this sounds appealing, or you decide to go down this path.  It sounds like an interesting problem you're working on.

Cheers,
Daniel



Dave Hudson

unread,
Jun 25, 2021, 10:28:41 AM6/25/21
to nimble-users
Thanks so much Daniel and Chris for the input.

The method that Chris suggested actually worked quite well for what I was after and also in a number of other situations with simple constraints which I have to admit was very cool! 

I have now gone a little further with the attached model, so I was trying to manipulate a situation where only z[4] (the interaction term) could be switched on if both z[1] and z[3] (the main effects) were also switched on so included in the model.  The attached configuration in the screenshot really didnt work but by making the constraint <= rather than == seems to have made all the difference... just now running a model with the second constraint included also.

I think I get the gist of your second option Daniel... I will see if I can get something working!!

Many thanks again!

Dave
Screenshot 2021-06-25 152155.jpg

Nels Johnson

unread,
Jul 26, 2021, 6:48:07 PM7/26/21
to nimble-users
I have a similar problem to this. Also, I am a brand new nimble user who was attracted specifically by RJ-MCMC. We are doing a spatial regression where spatial covariates are local averages at a point over some radius. We have many potential radiuses we could use for each covariate type, but we would only like to use one radius for each covariate type. What that means is I would like to have  z ~ dmulti instead of z ~ dbern for all radiuses of the same covariate type. Ideally I would like to have z1, z2, ..., zp ~ dmulti (one for each covariate type) and then another variable l1, l2, ..., lp ~ dbern which kicks the z's in and out of the model. Is this possible in nimble? I have written nimble code that works the way I would hope it would, but that doesn't give sensible answers. Should I make this its own thread?

Perry de Valpine

unread,
Jul 27, 2021, 11:23:23 AM7/27/21
to Nels Johnson, nimble-users
Hi Nels,

This sounds interesting.  Yes, could you please make a new thread (that helps us stay organized) and show the code you have?  I think what you are saying is that you have several explanatory variables for variable selection.  And for each of those variables, you have alternative versions based on different spatial scales (averaging radiuses), of which you want one and only one.  So a variable could be in or out of the model, and if it is in the model it should be one and only one of the possible versions.  Is that right?  And you want RJ to work on the first aspect of the problem, but not the second, which you will do with dmulti (or possibly dcat)?  It would be helpful to see your idea in code.

-Perry


Reply all
Reply to author
Forward
0 new messages