Sampling categorical variables with RJMCMC


Daniel Eacker

Oct 12, 2021, 3:58:35 PM10/12/21
to nimble-users
Dear nimble users,


I'm trying to use reversible jump MCMC (RJMCMC) to conduct model selection for a Bayesian spatial capture-recapture model that includes categorical variables (each with > 2 levels). However, I can't seem to find even a simple linear regression example that shows how to deal with a categorical variable.

For example, using indicator variables for a categorical variable with 3 levels requires two indicator (dummy) variables and two additional coefficients in a linear regression model. However, I'd like to estimate the inclusion probability for the variable as a whole (in vs. out of the model) rather than separate inclusion probabilities for each of its indicator variables.
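
(As a small illustration of what I mean, not part of my actual code: in R, a 3-level factor expands into two dummy columns, so "dropping the variable" means dropping both dummies, and hence both coefficients, at once.)

f <- factor(c("a", "b", "c", "b"))
model.matrix(~ f)  # intercept plus the two dummy columns fb and fc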

Does anyone have any idea how to handle this properly in nimble? I thought of using the dconstraint() distribution in some way to restrict the algorithm so that z2 and z3 are always equal (both dummies in or out of the model together), but this doesn't seem correct.

The examples provided only deal with continuous variables and not a categorical factor. I see that this presents some technical challenges for Bayesian model selection, but I was just curious whether you've come across this issue and found a solution.

I've created some basic simulation code (some of which I've borrowed) that I've been using to try to figure out the problem:

install.packages("MuMIn")
library(MuMIn) # for frequentist model selection comparison
library(nimble)

## Linear regression with intercept and two covariates (one is categorical with 3 levels)
code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 100)
  beta1 ~ dnorm(0, sd = 100)
  beta2 ~ dnorm(0, sd = 100)
  beta3 ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 100)
  z1 ~ dbern(psi) ## indicator variable associated with beta1
  z2 ~ dbern(psi) ## indicator variable associated with beta2
  z3 ~ dbern(psi) ## indicator variable associated with beta3
  psi ~ dbeta(1, 1) ## hyperprior on inclusion probability
  for(i in 1:N) {
    Ypred[i] <- beta0 + beta1 * z1 * x1[i] + beta2 * z2 * x2[i] + beta3 * z3 * x3[i]
    Y[i] ~ dnorm(Ypred[i], sd = sigma)
  }
})

## simulate some data
set.seed(100)
N <- 100
x1 <- runif(N, -1, 1)
x2 <- as.factor(apply(rmultinom(N, 1, prob=c(1/3,1/3,1/3)),2,function(x) which(x==1)))   
x2.mat = factor2ind(x2, "1")
Y <- rnorm(N, 1 + 0.10 * x1 + 0 *x2.mat[,1] + -0.05 * x2.mat[,2], sd = 1)

summary(lm(Y~x1 + x2))


mod1 = lm(Y~x1 + x2, na.action="na.fail")
dredge(mod1)


# , constraint=1
RJexampleModel <- nimbleModel(code, constants = list(N = N),
                              data = list(Y = Y, x1 = x1, x2 = x2.mat[,1], x3 = x2.mat[,2]),
                              inits = list(beta0 = 0, beta1 = 0, beta2 = 0, beta3 = 0,
                                           sigma = sd(Y), z1 = 1, z2 = 1, z3 = 1, psi = 0.5))

RJexampleConf <- configureMCMC(RJexampleModel)

RJexampleConf$addMonitors('z1', 'z2', 'z3')

configureRJ(conf = RJexampleConf,
            targetNodes = c("beta1", "beta2", "beta3"),
            indicatorNodes = c('z1', 'z2', 'z3'),
            control = list(mean = c(0, 0, 0), scale = 2))

mcmcRJ <- buildMCMC(RJexampleConf)

cRJexampleModel <- compileNimble(RJexampleModel)

CMCMCRJ <- compileNimble(mcmcRJ, project = RJexampleModel, resetFunctions = TRUE)

set.seed(100)
system.time(samplesRJ <- runMCMC(CMCMCRJ, niter = 100000, nburnin = 10000, thin = 10))

Thanks,

Dan



Daniel Eacker

Oct 12, 2021, 5:41:34 PM10/12/21
to nimble-users
Also, the factor2ind() function is in the Survout library.

devtools::install_github("SophiaJia/Survout")
library(Survout)

Daniel Eacker

Oct 14, 2021, 1:34:34 PM10/14/21
to nimble-users
Well, I think I've got a solution to the problem I was facing: determining whether a categorical variable as a whole should be included in the model or not (as opposed to getting posterior inclusion probabilities for the individual indicator variables representing the categorical variable in the model). Perhaps someone more knowledgeable than me could chime in if they think there's something incorrect with the approach.

I used a coding trick introduced by Ntzoufras (2002) to deal with categorical variables in a Gibbs Variable Selection (GVS) approach. The method uses a vector (i.e., pos) and nested indexing to map each coefficient to the indicator variable for its parent variable. For a categorical variable, this causes all of its dummy coefficients to be removed together when z = 0 for that variable, which is what I was looking to achieve:

rjmcmc_data$pos = c(1,2,2,2,2,3,4,4,4,5) # make indexing to deal with categorical variables

  for(i in 1:nvars){
    z[i] ~ dbern(0.5)
  }
  for(i in 1:(ncoefs-1)){
    z.temp[i] <- z[pos[i]]
  }
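
For illustration only (a hypothetical z draw, not output from the fitted model), the nested indexing works like this:

pos <- c(1,2,2,2,2,3,4,4,4,5)
z <- c(1, 0, 1, 1, 0)  # say hsi (variable 2) and slope (variable 5) are currently excluded
z[pos]                 # 1 0 0 0 0 1 1 1 1 0 -> all four hsi dummies drop out as a block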


In the simulation, I fit the model with a frequentist approach and use AIC for comparison with the Bayesian model selection inference. I then use the RJMCMC algorithm provided in nimble and also compare it with the Gibbs Variable Selection method, which is similar but uses pseudopriors to sample the coefficients when z = 0. Of course, one has to consider the priors used in any Bayesian model or model selection routine, so I tested a couple of different priors. For a logistic regression model, a normal(0, 0.5) prior on the beta coefficients (parameterized as mean and precision) is a vague prior when passed through the logit link function.
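
As a quick sanity check (a sketch I added for illustration, not part of the simulation code below), you can look at what that prior implies on the probability scale:

# precision 0.5 corresponds to sd = sqrt(1/0.5) ~ 1.41 on the logit scale
set.seed(1)
beta_draws <- rnorm(1e4, mean = 0, sd = sqrt(1/0.5))
hist(plogis(beta_draws), breaks = 50,
     main = "Implied prior on plogis(beta)", xlab = "plogis(beta)")  # roughly flat over (0, 1)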

Anyhow, I was pleased that the Bayesian model selection procedures using this coding adjustment produced results that were very consistent with the frequentist inference and very similar between RJMCMC and GVS. Also, without the position vector, the model space explodes because every dummy variable can enter or leave separately: with 5 variables under selection there are 2^5 = 32 candidate models, whereas treating the 10 dummies individually gives 2^10 = 1024.


Here is the code to 1) simulate the data, 2) fit the frequentist models, 3) fit RJMCMC with the position vector for categorical variables, 4) fit RJMCMC with no position vector and a latent inclusion probability psi, 5) fit RJMCMC with psi fixed at 0.5, and 6) fit GVS with the position vector included:



# 1) simulate multiple linear regression problem and prepare it for the 4 model selection tests
library(Survout) # for factor2ind function

# set random seed generator
set.seed(500)

# simple function to standardize variables
std2=function(x){
  (x - mean(x,na.rm=TRUE))/(2*sd(x,na.rm=TRUE))
}

# number of observations
N=100

# create variables (both continuous and categorical)
elev = std2(rlnorm(N, log(500), sd=1))
hsi.var = as.factor(apply(rmultinom(N, 1, prob=rep(1/5,5)),2,function(x) which(x==1)))
hsi.ind = factor2ind(hsi.var, "1") # as indicator variables
site.var = as.factor(apply(rmultinom(N, 1, prob=rep(1/4,4)),2,function(x) which(x==1)))
site.ind = factor2ind(site.var, "1") # as indicator variables
sess.var = as.factor(rbinom(N, 1, 0.5))
sess.ind = factor2ind(sess.var, "1") # as indicator variable
slope = std2(runif(N, 0, 90))

# true betas
beta = c(-1,0.75,0.2,-0.5,0.3,0.6)

# generate predictions from true model = ~ elev + site + slope
set.seed(100)
y = rbinom(N, 1, plogis(beta[1] + elev*beta[2] + site.ind[,1]*beta[3] + site.ind[,2]*beta[4] + site.ind[,3]*beta[5] + slope*beta[6]))



##### TEST MODEL SELECTION METHODS AGAINST FREQUENTIST INFERENCE#############################################################################
# 2) Frequentist (fit full model and then dredge)

# make datasets for frequentist and Bayesian models
freq_data = data.frame(y=y,elev=elev, hsi = hsi.var, site = site.var, sess = sess.var, slope = slope)

#inspect dataset
head(freq_data)

fit1 = glm(y ~ elev + hsi + sess + site  + slope, na.action="na.fail", family="binomial",data = freq_data)
summary(fit1)
library(MuMIn)
(dredge.fit = dredge(fit1))
(top.model = get.models(dredge.fit, subset = 1)[[1]])
drop1(fit1)

####################################################################################################################################
# 3) RJMCMC in nimble (test 1 - modified prior probability of inclusion estimated for each variable)
library(nimble)

# define an index vector so that all dummies of a categorical variable share the same inclusion probability
rjmcmc_data = list(y=y, elev=elev, hsi2 = hsi.ind[,1], hsi3 = hsi.ind[,2], hsi4 = hsi.ind[,3], hsi5 = hsi.ind[,4], site2 = site.ind[,1], site3 = site.ind[,2], site4 = site.ind[,3], sess = sess.ind[,1], slope = slope)

rjmcmc_data$pos = c(1,2,2,2,2,3,4,4,4,5) # make indexing to deal with categorical variables
rjmcmc_data$prior_prec = 0.5 # define prior precision
rjmcmc_data$ncoefs = 11 # number of coefficients including intercept (we'll force the intercept to be included)
rjmcmc_data$nvars = max(rjmcmc_data$pos, na.rm=TRUE) # number of variables under selection

# define model code
rjmcmc_code <- nimbleCode({
  # priors
    beta[1] ~ T(dnorm(0, 0.5),-10,10)
  for(i in 2:ncoefs){
    beta[i] ~ T(dnorm(0, prior_prec),-10,10)
  }  
  #psi ~ dunif(0,1) ## hyperprior on inclusion probability
  for(i in 1:nvars){
    z[i] ~ dbern(0.5)
  }
  for(i in 1:(ncoefs-1)){
    z.temp[i] <- z[pos[i]]
  }
  # likelihood
  for(i in 1:N) {
    logit(p[i]) <- beta[1] + beta[2]*z.temp[1]*elev[i] + beta[3]*z.temp[2]*hsi2[i] + beta[4]*z.temp[3]*hsi3[i] + beta[5]*z.temp[4]*hsi4[i] + beta[6]*z.temp[5]*hsi5[i] + beta[7]*z.temp[6]*sess[i] + beta[8]*z.temp[7]*site2[i] + beta[9]*z.temp[8]*site3[i] + beta[10]*z.temp[9]*site4[i] + beta[11]*z.temp[10]*slope[i]
    y[i] ~ dbern(p[i])
  }
})

RJexampleModel <- nimbleModel(rjmcmc_code, constants = list(N = length(rjmcmc_data$y),pos = rjmcmc_data$pos,nvars=rjmcmc_data$nvars,ncoefs=rjmcmc_data$ncoefs,prior_prec=rjmcmc_data$prior_prec),
                              data = rjmcmc_data[1:11],
                              inits = list(beta=as.numeric(summary(fit1)$coefficients[,1]),z=rbinom(rjmcmc_data$nvars,1,0.5))) # ,psi=runif(1,0.25,0.5)
 
RJexampleModel$calculate()                                          

RJexampleConf <- configureMCMC(RJexampleModel)

configureRJ(conf = RJexampleConf,
            targetNodes = c("beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","beta[10]","beta[11]"),
            indicatorNodes = c('z.temp[1]', 'z.temp[2]','z.temp[3]','z.temp[4]','z.temp[5]','z.temp[6]','z.temp[7]','z.temp[8]','z.temp[9]','z.temp[10]'),
            control = list(mean = rep(0, rjmcmc_data$ncoefs-1), scale = 2))

RJexampleConf$printSamplers(c("z.temp[4]","beta[3]"))

if(("z.temp" %in% RJexampleConf$monitors)==FALSE){
RJexampleConf$addMonitors("z.temp")
}
if(("z" %in% RJexampleConf$monitors)==FALSE){
RJexampleConf$addMonitors("z")

}
 
mcmcRJ <- buildMCMC(RJexampleConf)

cRJexampleModel <- compileNimble(RJexampleModel)

CMCMCRJ <- compileNimble(mcmcRJ, project = RJexampleModel, resetFunctions = TRUE)

set.seed(100)
system.time(samplesRJ  <- runMCMC(CMCMCRJ, niter = 100000, nburnin = 10000, thin = 10))

round(apply(samplesRJ,2,mean),3)
(d1=data.frame(var=c("elev","hsi","sess","site","slope"),pr = round(as.numeric(apply(samplesRJ[,c("z[1]","z[2]","z[3]","z[4]","z[5]")], 2, function(x) sum(x)/length(x))),3)))
png("Indicator_RJMCMC3a.png",width = 8, height = 5, units = "in", res = 300)
barplot(pr~var,data=d1,ylim=c(0,1),ylab = "Inclusion probability",xlab="Coefficient")
dev.off()
round(as.numeric(apply(samplesRJ[,c("z.temp[1]","z.temp[2]","z.temp[3]","z.temp[4]","z.temp[5]","z.temp[6]","z.temp[7]","z.temp[8]","z.temp[9]","z.temp[10]")], 2, function(x) sum(x)/length(x))),3)
g = samplesRJ[,c("z[1]","z[2]","z[3]","z[4]","z[5]")]
g2 = paste(g[,1],g[,2],g[,3],g[,4],g[,5],sep="")
sort(round(table(g2)/length(g2), 3), decreasing=TRUE)




# 4) RJMCMC in nimble (test 2 - unmodified RJMCMC with psi as a latent variable and no "z" variable; just "z.temp")

# define model code
rjmcmc_code <- nimbleCode({
  # priors
  beta[1] ~ T(dnorm(0, 0.5),-10,10)
  for(i in 2:ncoefs){
    beta[i] ~ T(dnorm(0, prior_prec),-10,10)
  }   
  psi ~ dunif(0,1) ## hyperprior on inclusion probability
  for(i in 1:(ncoefs-1)){
    z.temp[i] ~ dbern(psi)
  }
  # likelihood
  for(i in 1:N) {
    logit(p[i]) <- beta[1] + beta[2]*z.temp[1]*elev[i] + beta[3]*z.temp[2]*hsi2[i] + beta[4]*z.temp[3]*hsi3[i] + beta[5]*z.temp[4]*hsi4[i] + beta[6]*z.temp[5]*hsi5[i] + beta[7]*z.temp[6]*sess[i] + beta[8]*z.temp[7]*site2[i] + beta[9]*z.temp[8]*site3[i] + beta[10]*z.temp[9]*site4[i] + beta[11]*z.temp[10]*slope[i]
    y[i] ~ dbern(p[i])
  }
})

RJexampleModel <- nimbleModel(rjmcmc_code, constants = list(N = length(rjmcmc_data$y),pos = rjmcmc_data$pos,nvars=rjmcmc_data$nvars,ncoefs=rjmcmc_data$ncoefs,prior_prec=rjmcmc_data$prior_prec),
                              data = rjmcmc_data[1:11],
                              inits = list(beta=as.numeric(summary(fit1)$coefficients[,1]),z.temp=rbinom(rjmcmc_data$ncoefs-1,1,0.5),psi=runif(1,0.25,0.5))) # ,

RJexampleModel$calculate()                                          

RJexampleConf <- configureMCMC(RJexampleModel)

configureRJ(conf = RJexampleConf,
            targetNodes = c("beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","beta[10]","beta[11]"),
            indicatorNodes = c('z.temp[1]', 'z.temp[2]','z.temp[3]','z.temp[4]','z.temp[5]','z.temp[6]','z.temp[7]','z.temp[8]','z.temp[9]','z.temp[10]'),
            control = list(mean = rep(0, rjmcmc_data$ncoefs-1), scale = 2))

RJexampleConf$printSamplers(c("z.temp[4]","beta[3]"))

if(("z.temp" %in% RJexampleConf$monitors)==FALSE){
  RJexampleConf$addMonitors("z.temp")

}

mcmcRJ <- buildMCMC(RJexampleConf)

cRJexampleModel <- compileNimble(RJexampleModel)

CMCMCRJ <- compileNimble(mcmcRJ, project = RJexampleModel, resetFunctions = TRUE)

set.seed(100)
system.time(samplesRJ  <- runMCMC(CMCMCRJ, niter = 100000, nburnin = 10000, thin = 10))

# with indicator variables
round(apply(samplesRJ,2,mean),3)
(d1=data.frame(var=c("elev","hsi2","hsi3","hsi4","hsi5","sess","site2","site3","site4","slope"),pr = round(as.numeric(apply(samplesRJ[,c("z.temp[1]","z.temp[2]","z.temp[3]","z.temp[4]","z.temp[5]","z.temp[6]","z.temp[7]","z.temp[8]","z.temp[9]","z.temp[10]")], 2, function(x) sum(x)/length(x))),3)))
png("Indicator_RJMCMC3b.png",width = 8, height = 5, units = "in", res = 300)
barplot(pr~var,data=d1,ylim=c(0,1),ylab = "Inclusion probability",xlab="Coefficient")
dev.off()
g = samplesRJ[,c("z.temp[1]","z.temp[2]","z.temp[3]","z.temp[4]","z.temp[5]","z.temp[6]","z.temp[7]","z.temp[8]","z.temp[9]","z.temp[10]")]
g2 = paste(g[,1],g[,2],g[,3],g[,4],g[,5],g[,6],g[,7],g[,8],g[,9],g[,10],sep="")
sort(round(table(g2)/length(g2), 3), decreasing=TRUE)[1:20]


# 5) RJMCMC in nimble (test 3 - unmodified RJMCMC with no indicator variables and priorProb = 0.5; no "z" or "z.temp")

# define model code
rjmcmc_code <- nimbleCode({
  # priors
  beta[1] ~ T(dnorm(0, 0.5),-10,10)
  for(i in 2:ncoefs){
    beta[i] ~ T(dnorm(0, prior_prec),-10,10)
  }   
  # likelihood
  for(i in 1:N) {
    logit(p[i]) <- beta[1] + beta[2]*elev[i] + beta[3]*hsi2[i] + beta[4]*hsi3[i] + beta[5]*hsi4[i] + beta[6]*hsi5[i] + beta[7]*sess[i] + beta[8]*site2[i] + beta[9]*site3[i] + beta[10]*site4[i] + beta[11]*slope[i]
    y[i] ~ dbern(p[i])
  }
})

RJexampleModel <- nimbleModel(rjmcmc_code, constants = list(N = length(rjmcmc_data$y),pos = rjmcmc_data$pos,nvars=rjmcmc_data$nvars,ncoefs=rjmcmc_data$ncoefs,prior_prec=rjmcmc_data$prior_prec),
                              data = rjmcmc_data[1:11],inits = list(beta=as.numeric(summary(fit1)$coefficients[,1])))
RJexampleModel$calculate()                                          

RJexampleConf <- configureMCMC(RJexampleModel)

configureRJ(conf = RJexampleConf,
            targetNodes = c("beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","beta[10]","beta[11]"),
            priorProb = 0.5,
            control = list(mean = rep(0, rjmcmc_data$ncoefs-1), scale = 2))


mcmcRJ <- buildMCMC(RJexampleConf)

cRJexampleModel <- compileNimble(RJexampleModel)

CMCMCRJ <- compileNimble(mcmcRJ, project = RJexampleModel, resetFunctions = TRUE)

set.seed(100)
system.time(samplesRJ  <- runMCMC(CMCMCRJ, niter = 100000, nburnin = 10000, thin = 10))

# with no indicator variables
betaCols <- grep("beta\\[", colnames(samplesRJ))
posterior_inclusion_proportions <- colMeans(apply(samplesRJ[, betaCols],
                                                  2, function(x) x != 0))
(d1=data.frame(var=c("elev","hsi2","hsi3","hsi4","hsi5","sess","site2","site3","site4","slope"),pr=posterior_inclusion_proportions[2:11]))
png("Indicator_RJMCMC3c.png",width = 8, height = 5, units = "in", res = 300)
barplot(pr~var,data=d1,ylim=c(0,1),ylab = "Inclusion probability",xlab="Coefficient")
dev.off()

### END TEST RJMCMC ##################################################################################



### GIBBS VARIABLE SELECTION #########################################################################
# 6) Gibbs variable selection

# put data into a list
gvs_data = list(y=y, elev=elev, hsi2 = hsi.ind[,1], hsi3 = hsi.ind[,2], hsi4 = hsi.ind[,3], hsi5 = hsi.ind[,4], site2 = site.ind[,1], site3 = site.ind[,2], site4 = site.ind[,3], sess = sess.ind[,1], slope = slope)
gvs_data$prior_prec = 0.5 # define prior precision
gvs_data$pos = c(1,2,2,2,2,3,4,4,4,5) # make indexing to deal with categorical variables
gvs_data$ncoefs = 11 # number of coefficients including intercept (we'll force the intercept to be included)
gvs_data$nvars = max(gvs_data$pos, na.rm=TRUE) # number of variables under selection
gvs_data$nmodels = 2^gvs_data$nvars

# get pseudopriors from the frequentist fit
gvs_data$mu.beta.ps = as.numeric(summary(fit1)$coefficients[2:nrow(summary(fit1)$coefficients),1])
gvs_data$tau.beta.ps = 1/(as.numeric(summary(fit1)$coefficients[2:nrow(summary(fit1)$coefficients),2])^2)

# define model code
gvs_code <- nimbleCode({
  # priors
  beta[1] ~ T(dnorm(0, 0.5),-10,10)
  for(i in 2:ncoefs){
    beta[i] ~ T(dnorm(beta_mu_prior[i-1], beta_tau_prior[i-1]),-10,10)
  }  
  # pseudopriors
  for(i in 1:(ncoefs-1)){
    beta_mu_prior[i] <- (1-z.temp[i]) * mu.beta.ps[i]
    beta_tau_prior[i] <- (1-z.temp[i]) * tau.beta.ps[i] + z.temp[i] * prior_prec
  }
  # Model selection code
  for(i in 1:nvars){
    z[i] ~ dbern(0.5) # prior probability for variable inclusion
  }
  for(i in 1:(ncoefs-1)){
    z.temp[i] <- z[pos[i]]
  }
  # likelihood
  for(i in 1:N) {
    logit(p[i]) <- beta[1] + beta[2]*z.temp[1]*elev[i] + beta[3]*z.temp[2]*hsi2[i] + beta[4]*z.temp[3]*hsi3[i] + beta[5]*z.temp[4]*hsi4[i] + beta[6]*z.temp[5]*hsi5[i] + beta[7]*z.temp[6]*sess[i] + beta[8]*z.temp[7]*site2[i] + beta[9]*z.temp[8]*site3[i] + beta[10]*z.temp[9]*site4[i] + beta[11]*z.temp[10]*slope[i]
    y[i] ~ dbern(p[i])
  }

})

RJexampleModel <- nimbleModel(gvs_code, constants = list(N = length(gvs_data$y),pos = gvs_data$pos,nvars=gvs_data$nvars,ncoefs=gvs_data$ncoefs,prior_prec=gvs_data$prior_prec,
                                                         mu.beta.ps=gvs_data$mu.beta.ps, tau.beta.ps=gvs_data$tau.beta.ps),                   
                              data = gvs_data[1:11],inits = list(beta=as.numeric(summary(fit1)$coefficients[,1]),z=rbinom(gvs_data$nvars,1,0.5)))

RJexampleModel$calculate()                                          

RJexampleConf <- configureMCMC(RJexampleModel, monitors=c("z.temp","z","beta"))


mcmcRJ <- buildMCMC(RJexampleConf)

cRJexampleModel <- compileNimble(RJexampleModel)

CMCMCRJ <- compileNimble(mcmcRJ, project = RJexampleModel, resetFunctions = TRUE)

set.seed(100)
system.time(samplesRJ  <- runMCMC(CMCMCRJ, niter = 100000, nburnin = 10000, thin = 10))

round(apply(samplesRJ,2,mean),3)
(d1=data.frame(var=c("elev","hsi","sess","site","slope"),pr = round(as.numeric(apply(samplesRJ[,c("z[1]","z[2]","z[3]","z[4]","z[5]")], 2, function(x) sum(x)/length(x))),3)))
png("Indicator_GVS4.png",width = 8, height = 5, units = "in", res = 300)
barplot(pr~var,data=d1,ylim=c(0,1),ylab = "Inclusion probability",xlab="Coefficient")
dev.off()
round(as.numeric(apply(samplesRJ[,c("z.temp[1]","z.temp[2]","z.temp[3]","z.temp[4]","z.temp[5]","z.temp[6]","z.temp[7]","z.temp[8]","z.temp[9]","z.temp[10]")], 2, function(x) sum(x)/length(x))),3)
g = samplesRJ[,c("z[1]","z[2]","z[3]","z[4]","z[5]")]
g2 = paste(g[,1],g[,2],g[,3],g[,4],g[,5],sep="")
sort(round(table(g2)/length(g2), 3), decreasing=TRUE)
cbind(round(apply(samplesRJ,2,mean),3)[1:11],as.numeric(summary(fit1)$coefficients[,1]))

############# END GIBBS VARIABLE SELECTION #################################################


Thanks,

Dan

Sally paganin

Oct 21, 2021, 4:14:24 PM10/21/21
to Daniel Eacker, nimble-users
Hi Daniel, 

Many thanks for keeping up with the updates, and sorry for the late reply; unfortunately this ended up in the spam folder and it took a bit to get up to speed with the problem.

I haven't looked carefully into the Gibbs Variable Selection (GVS) approach and the Ntzoufras (2002) paper, but I have some input on the nimble implementation. Consider the model written with indicator variables. When using configureRJ, nimble assigns two custom samplers, one to the regression coefficients and one to the indicator variables: for the coefficients it uses a modified version of the original sampler, which performs updates only when the target node is currently in the model (toggled sampler), while the indicator variables get a reversible jump sampler that transitions the variable in/out of the model (RJ sampler).

In your code example (3), what happens is that, since z.temp is deterministic, configureRJ assigns the toggled sampler to the betas but retains the ordinary binary sampler for the z's. In fact, nimble should throw a warning that it cannot assign an RJ sampler to z.temp. This means that when z[i] is 0, the corresponding beta(s) just stay where they are during sampling. The MCMC runs, but it's not technically a legitimate RJMCMC scheme, since when one proposes to set z[i] to 1 there is no accompanying proposal for the beta(s) from the RJ proposal distribution; rather, each beta is whatever it was the last time z[i] was 1.
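
As a quick check (a sketch, assuming the RJexampleConf object from your example (3) is still in the session), printSamplers shows what actually got assigned; the betas under selection get sampler_RJ_toggled, while z[1] through z[5] keep the default binary sampler and nothing is assigned to z.temp:

RJexampleConf$printSamplers(c("beta[2]", "z[1]", "z.temp[1]"))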


Given this situation, the nimble-dev team will think about whether we should prevent the toggled sampler from being assigned when the special RJ sampler for the indicator cannot be assigned.


Going back to the RJMCMC problem, you might be able to modify the reversible jump code to account for multiple nodes in the proposal distribution (see the file MCMC_RJ.R in our source code at https://github.com/nimble-dev/nimble/blob/devel/packages/nimble/R/MCMC_RJ.R).


If you decide to try, I'd suggest modifying only sampler_RJ_indicator (at line 58) to account for multiple nodes, using simple independent normal proposals for each of the beta(s) associated with one variable. Here are some pointers, using for example the hsi variable with indicator z[2] and nodes beta[3], ..., beta[6]:
  • setup code: control$targetNode will be a vector of nodes (e.g. beta[3], .., beta[6]) instead of one node.
  • run code: this is where the sampler makes the proposal to insert/remove the coefficients. We need to account for multiple coefficients when making the proposal to add or remove the coefficient nodes. For example, the proposal to add a node is made in lines 76 - 77, and would become
proposalCoef <- numeric(length(coefNodes))
logProbForwardProposal <- 0
for(l in 1:length(coefNodes)){
  proposalCoef[l] <- rnorm(1, proposalMean, proposalScale)
  logProbForwardProposal <- logProbForwardProposal + dnorm(proposalCoef[l], proposalMean, proposalScale, log = TRUE)
}
  • Then you may skip the configureRJ function and assign your samplers manually; it should look something like this:

nodeControl  = list(mean = 0, scale = 1)
nodeControl$targetNode <- c("beta[3]", "beta[4]","beta[5]", "beta[6]")

conf$removeSamplers("z[2]")
conf$addSampler(type = my_sampler_RJ_indicator,
                  target = "z[2]",
                  control = nodeControl)

## Add the toggled sampler for the coefficient node (used when it is in the model)
currentConf <- conf$getSamplers("beta[3]")
conf$removeSamplers("beta[3]")
conf$addSampler(type = sampler_RJ_toggled,
                  target = "beta[3]",
                  control = list(samplerType = currentConf[[1]]))


Of course, happy to follow up on this in case you decide to give it a try!

Best, 

Sally 


Daniel Eacker

Oct 25, 2021, 1:29:19 PM10/25/21
to nimble-users
Hi Sally,

I really appreciate you getting back to me, and all the great work by the development team that went into building and improving NIMBLE. The software has been a game changer for running Bayesian models, and I've been able to greatly speed up the run time and improve convergence for the spatial capture-recapture models I've been working on. I'm currently trying to get past the model selection hurdle, which is why I wanted help resolving the issue of selection for factor variables.

Anyhow, I've had a chance to test out your suggestions, and after a few problems getting the "sampler_RJ_indicator" function to deal with a factor variable that is represented by two or more indicator variables in the model, I think I've got it to work!


I've noticed that the model-averaged estimates seem correct now; I think they were kind of funky in my earlier tries. Also, I didn't use the pos vector here but just used the same indicator variable ("z[3]") for both beta[3] and beta[4]. This seems to work just fine, but maybe you can take a look at the example when you have time (I tried to attach the files, but it was taking forever and didn't seem like it was going to work, so the code is below).

Thanks again for the help!

Sincerely,

Dan

# my_sampler_RJ_indicator #######################################
 my_sampler_RJ_indicator <- nimbleFunction(
    name = 'my_sampler_RJ_indicator',
    contains = sampler_BASE,
    setup = function(model, mvSaved, target, control) {
      ## note: target is the indicator variable,
      ## control$targetNode is the variable conditionally in the model
      ## control list extraction
      coefNode      <- control$targetNode    # vector of individual coefficient nodes, e.g. c("beta[3]", "beta[4]")
      coefNodes     <- control$targetNode_c  # same nodes as a single concatenated expression, e.g. "beta[3:4]"
      proposalScale <- control$scale
      proposalMean  <- control$mean
      ## node list generation
      calcNodes <- model$getDependencies(c(coefNode, target))
      calcNodesReduced <- model$getDependencies(target)
    },
    run = function() {
      currentIndicator <- model[[target]]
      if(currentIndicator == 0) {   ## propose addition of coefNode
        currentLogProb <- model$getLogProb(calcNodesReduced)
        proposalCoef <- numeric(length(coefNode))
        logProbForwardProposal <- 0
        for(l in 1:length(coefNode)){

          proposalCoef[l] <- rnorm(1, proposalMean, proposalScale)
          logProbForwardProposal <- logProbForwardProposal + dnorm(proposalCoef[l], proposalMean, proposalScale, log = TRUE)
        }
        model[[coefNodes]] <<- proposalCoef
        model[[target]] <<- 1
        proposalLogProb <- model$calculate(calcNodes)
        logAcceptanceProb <- proposalLogProb - currentLogProb - logProbForwardProposal
      } else {                      ## propose removal of coefNode
        currentLogProb <- model$getLogProb(calcNodes)
        currentCoef <-  model[[coefNodes]]
        logProbReverseProposal<- 0
        for(l in 1:length(coefNode)){
          logProbReverseProposal <- logProbReverseProposal + dnorm(currentCoef[l], proposalMean, proposalScale, log = TRUE)
        }
        model[[coefNodes]] <<- 0
        model[[target]] <<- 0
        model$calculate(calcNodes)
        logAcceptanceProb <- model$getLogProb(calcNodesReduced) - currentLogProb + logProbReverseProposal
      }
      accept <- decide(logAcceptanceProb)
      if(accept) { copy(from = model, to = mvSaved, row = 1, nodes = calcNodes, logProb = TRUE)
      } else     { copy(from = mvSaved, to = model, row = 1, nodes = calcNodes, logProb = TRUE) }
    },
    methods = list(
      reset = function() { }
    )
  )
#######################################################

## example RJMCMC #######################################

require(nimble)


code <- nimbleCode({
      beta0 ~ dnorm(0, sd=100)
  for(i in 1:4){
      beta[i] ~ dnorm(0, sd=100)
   }
      sigma ~ dunif(0, 4)
      for(i in 1:3){
        z[i] ~ dbern(psi)
      }
      psi ~ dbeta(1,1) ## hyperprior on inclusion probability
      for(i in 1:N) {
        Ypred[i] <- beta0 + beta[1]*x1[i]*z[1] + beta[2]*x2[i]*z[2] + beta[3]*x3[i]*z[3] +  beta[4]*x4[i]*z[3]

        Y[i] ~ dnorm(Ypred[i], sd = sigma)
      }
    })
    
    ## simulate some data
    set.seed(1)
    N <- 200

    # simple function to standardize variables
    std2=function(x){(x - mean(x,na.rm=TRUE))/(2*sd(x,na.rm=TRUE))}
    
    x1 <- std2(runif(N, -1, 1))
    x2 <- std2(runif(N, -1, 1)) ## this covariate has no effect
    x34 = rmultinom(N, 1, c(1/3,1/3,1/3))
    x34_fact = as.factor(apply(x34, 2, function(x) which(x==1))) # factor for frequentist approach
    x3 <- ifelse(apply(x34, 2, function(x) which(x==1))==2,1,0)
    x4 <- ifelse(apply(x34, 2, function(x) which(x==1))==3,1,0)
    trueCoefs = c(1,-1.2,0,1.35,1.2)
    Y <- rnorm(N, trueCoefs[1] + trueCoefs[2]*x1 + trueCoefs[3]*x2 + trueCoefs[4]*x3 + trueCoefs[5]*x4, sd = 1)
    summary(lm(Y~x1 + x2 + x34_fact, na.action = "na.fail"))
    require(MuMIn)
    dredge(lm(Y~x1 + x2 + x34_fact, na.action = "na.fail"))
    ## build the model
    rIndicatorModel <- nimbleModel(code, constants = list(N = N),
                                   data = list(Y = Y, x1 = x1, x2 = x2, x3=x3, x4=x4),
                                   inits = list(beta0 = rnorm(1,0,0.5), beta=rnorm(4,0,0.5),sigma = sd(Y),z=rbinom(3,1,0.5),psi=runif(1,0.25,0.75)))  
    
    conf <- configureMCMC(rIndicatorModel)
    
    # load my_sampler_RJ_indicator.R, which has a slight change to the original nimbleFunction
    source("my_sampler_RJ_indicator.R")
    
    nodeControl  = list(mean = 0, scale = 2)
    # z[1] indicator variable
    nodeControl$targetNode <- c("beta[1]")
    conf$removeSamplers("z[1]")
    conf$addSampler(type = sampler_RJ_indicator,
                    target = "z[1]",

                    control = nodeControl)
    
    ## Add sampler for the coefficient variable (when it is in the model)
    currentConf <- conf$getSamplers("beta[1]")
    conf$removeSamplers("beta[1]")
    conf$addSampler(type = sampler_RJ_toggled,
                    target = "beta[1]",

                    control = list(samplerType = currentConf[[1]]))
    
    # z[2] indicator variable
    conf$removeSamplers("z[2]")
    nodeControl$targetNode <- c("beta[2]")
    conf$addSampler(type = sampler_RJ_indicator,

                    target = "z[2]",
                    control = nodeControl)
    
    ## Add sampler for the coefficient variable (when it is in the model)
    currentConf <- conf$getSamplers("beta[2]")
    conf$removeSamplers("beta[2]")
    conf$addSampler(type = sampler_RJ_toggled,
                    target = "beta[2]",

                    control = list(samplerType = currentConf[[1]]))   
    
    # z[3] indicator variable
    conf$removeSamplers("z[3]")
    nodeControl$targetNode_c <- "beta[3:4]" # supply the concatenated version of the nodes so assignment via model[[coefNodes]] works
    nodeControl$targetNode <- c("beta[3]","beta[4]")
    conf$addSampler(type = my_sampler_RJ_indicator,
                    target = "z[3]",

                    control = nodeControl)
    
    ## Add sampler for the coefficient variable (when it is in the model)
    currentConf <- conf$getSamplers(c("beta[3]"))
    conf$removeSamplers(c("beta[3]"))
    conf$addSampler(type = sampler_RJ_toggled,
                    target = c("beta[3]"),

                    control = list(samplerType = currentConf[[1]]))   
    
    currentConf <- conf$getSamplers(c("beta[4]"))
    conf$removeSamplers(c("beta[4]"))
    conf$addSampler(type = sampler_RJ_toggled,
                    target = c("beta[4]"),

                    control = list(samplerType = currentConf[[1]]))
    
    conf$addMonitors("z[1]", "z[2]", "z[3]")
    
    rIndicatorMCMC <- buildMCMC(conf)
    cIndicatorModel <- compileNimble(rIndicatorModel)
    cIndicatorMCMC <- compileNimble(rIndicatorMCMC, project = rIndicatorModel, resetFunctions = TRUE, showCompilerOutput = FALSE)
    
    set.seed(1)
    samples <- runMCMC(cIndicatorMCMC, 100000, nburnin = 10000)
    
    zCols <- grep("z\\[", colnames(samples))
    round(posterior_inclusion_prob <- colMeans(samples[, zCols]),3)

    bCols <- grep("beta\\[", colnames(samples))
    round(ma_coefs <- c(beta0=mean(samples[,"beta0"]),colMeans(samples[, bCols])),3)
    trueCoefs
#######################################################