Custom sampler for Chinese Restaurant Table Distribution

59 views
Skip to first unread message

David Christianson

unread,
Sep 18, 2025, 9:14:58 AMSep 18
to nimble-users
Hello All, I am working with a data created by a CRP but in a way the precludes use of nimble's included dCRP distribution. In my case, the number of customers that may patronize the restaurant ('M[i]') are a random variable distributed as a gamma poisson. The number of tables they occupy ('K[i]') follows a CRP conditional on M (and also alpha, using Antoniak's parameterization). By default nimble assigns slices samplers to both M and K which is probably inefficient because K is conditional on M and has an upper limit of M. Despite this, my current model model behaves as intended as longs as M does not equal zero.  However, M=0 & K=0 are certainly within the parameter space I wish to sample. The slice sampler can never entertain this possibility because whenever M=0 is proposed, K is >0 which has probability 0 (impossible to have an occupied table if there are no customers).  Likewise if K=0 is every proposed, M is >0, which is also not possible.  Initializing M[i], K[i] nodes at 0 results in the sampler sticking at 0 for the same reason.  What I think I need is sampler that makes proposals for M[i] and K[i] jointly to respect the rule that if M=0, then K=0 and K=0 only when M=0, and the constraint that K must always be less less than or equal to M (for M>0).  
My challenge is that I have no experience writing samplers and am struggling to teach myself by looking at the internal code of existing samplers.  Are there other resources or tutorials out there on writing custom samplers? In my case, would it work best to edit one of the existing samplers or start from scratch?  Might some sort of 'cheat' work here where perhaps my custom CRT distribution allows for a non-zero probability of K>0 when M=0 to unstick the existing slice sampler if M is initiated at 0?  Any suggestions or resources would be appreciated.    
Thanks in advance.

Perry de Valpine

unread,
Sep 18, 2025, 9:38:06 AMSep 18
to David Christianson, nimble-users
Hi David,
You could take a look at the repository of workshop materials at https://github.com/orgs/nimble-training/repositories. The nimble-virtual-2023 is one that I think has an example of a custom sampler, and there may be others. It does often work well to start by copying one of nimble's samplers from the source code into your own code file. Just change the "name" argument so it doesn't conflict then you should be able to start editing and using it. Remember you can run an MCMC uncompiled so you can get inside the sampler code at run-time with R debugging tools (e.g. "browser()" and "debug(mysampler)") to help follow step by step and clarify what is happening. Of course, I've just responded about software mechanics, and you will also need some confidence on the validity of any sampler you create. You could ask further about that. I hope this gets you started.
Perry

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/nimble-users/cae583c4-7a68-404b-ace9-b229ec070d96n%40googlegroups.com.

David Christianson

unread,
Sep 18, 2025, 2:05:05 PMSep 18
to nimble-users
Thanks Perry, I will start with those resources.  In my case, the slice sampler 'works' (other than it won't explore the space with 0 tables) so I am not sure whether R debugging on the uncompiled models will help, but should help to explore the behavior of my custom sampler...which I am confident will not work on the first several tries!   

David Christianson

unread,
Sep 18, 2025, 5:53:09 PMSep 18
to nimble-users
I have taken a first stab based on one of the custom samplers posted to the group (Andrew Seaton) and after looking at the github tutorial pages. I know my sampler is probably incredibly inefficient but I also know it won't work currently because M and K are indexed and I would appreciate any insight on how to reference a particular indexed node in a custom sampler.  This may be the smallest of the problems this sampler may face! Thanks again!
customMKsampler <- nimbleFunction(
  contains = sampler_BASE,
  setup = function(model, mvSaved, target, control) {
    NULL
  },
  run = function(){
    current_M <- model[['M']]
    current_K <- model[['K']]
    currentLogProb <- model$calculate('M')+model$calculate('K')
   
    # create a random step up or down by one for K and M
    Mstep<--1+rbern(1,0.5)*2
    Kstep<--1+rbern(1,0.5)*2
     
    if(current_M==0){ # unstick from 0
      proposed_K<-1
      proposed_M<-1
    }
    if (current_M + Mstep==0){ # if proposed M steps into 0, then K must also be 0
        proposed_M<-0
        proposed_K<-0
    }
    if (current_K+Kstep==0){ #if proposed K steps into 0, then M must also become 0
        proposed_M<-0
        proposed_K<-0
    }
    if ((current_K+Kstep)>(current_M+Mstep)){ # if proposed K is bigger than proposed M, then set equal
       proposed_M<-proposed_K<-current_M+Mstep
    }
     
   
    model[['M']] <<- proposed_M
    model[['K']] <<- proposed_K
    proposedLogProb <- model$calculate('M')+model$calculate('K')
   
    # Calculate acceptance probability
    logAcceptanceRatio <- proposedLogProb - currentLogProb
    if (log(runif(1)) < logAcceptanceRatio) {
      jump <- TRUE
    } else {
      jump <- FALSE
    }
   
    if(jump) copy(from = model, to = mvSaved, row = 1,
                  nodes = c("M","K"), logProb = TRUE)
    else     copy(from = mvSaved, to = model, row = 1,
                  nodes = c("M","K"), logProb = TRUE)
  },
  methods = list(reset = function(){})
)


Chris Paciorek

unread,
Sep 21, 2025, 9:43:56 PMSep 21
to David Christianson, nimble-users
Hi David,
If the value of `target` is, say, `c('M','K')` you can do things like this in setup code:

mNodes <- model$expandNodeNames(target[1])
kNodes <- model$expandNodeNames(target[2])
p <- length(mNodes)

Then in the run code you can refer to `mNodes[i]`, `kNodes[i]`.

-chris

Perry de Valpine

unread,
Sep 22, 2025, 5:25:44 AMSep 22
to paci...@stat.berkeley.edu, David Christianson, nimble-users
Also, in case you haven't had a chance to check it out, Section 15.5 of our User Manual has an example of writing a new sampler. The roles of setup code and run code there might be a good illustration.
-Perry


Message has been deleted

Perry de Valpine

unread,
Sep 24, 2025, 1:03:29 PM (13 days ago) Sep 24
to David Christianson, nimble-users
Hi David,
Please try the code below. This shows the separation between setup code and run code that you want, and uses the input arguments to the setup code when the sampler is created. You can put a browser() statement in the setup code if you want to catch it in action and look at its argument values. modell[[node]] works for a single node, not a vector of them. You could use values(model, target) <- values to do it that way, but I just split the two nodes involved into an mNode and kNode. I hope this gets you going and the programming sections of the User Manual at r-nimble.org can help you build on this.
Perry

##-----------------------------
## custom simultaneous sampler of M (total customsers) and K (occupied tables)

customMKsampler <- nimbleFunction(
  contains = sampler_BASE,
  setup = function(model, mvSaved, target, control) {
    calcNodes <- model$getDependencies(target)
    mNode<-model$expandNodeNames(target[1]) # mNode <- target[1] is probably fine. Using model$expandNodeNames is extra careful
    kNode<-model$expandNodeNames(target[2]) # kNode <- target[2] is probably fine. Ditto.
    cat("Setting up a customMKsampler (setup code runs in R)\n")
    cat("mNode is ", mNode, "\n")
    cat("kNode is ", kNode, "\n")
  },
  run = function(){
    # initial model logProb
    model_lp_initial <- model$getLogProb(calcNodes)

    # current values for number of customers (M) and occupied tables (K) at the target restaurant
    #Mi <- (target)[1]
    #Ki <- (target)[2]
    current_M <- model[[mNode]]
    current_K <- model[[kNode]]
    ## mNode<-model$getNodeNames(target[1])
    ## kNode<-model$getNodeNames(target[2])


    # create a random step up or down by one for K and M
    Mstep<--1+rbinom(1,1,0.5)*2
    Kstep<--1+rbinom(1,1,0.5)*2

    if(model[[mNode]]==0){ # unstick from 0
      proposed_K<-1
      proposed_M<-1
    }
    if (model[[mNode]] + Mstep==0){ # if proposed M steps into 0, then K must also be 0
      proposed_M<-0
      proposed_K<-0
    }
    if (model[[kNode]]+Kstep==0){ #if proposed K steps into 0, then M must also become 0
      proposed_M<-0
      proposed_K<-0
    }
    if ((model[[kNode]]+Kstep)>(model[[mNode]]+Mstep)){ # if proposed K is bigger than proposed M, then set equal
      proposed_M<-proposed_K<-current_M+Mstep
    }

    model[[mNode]] <<- proposed_M #c(proposed_M,proposed_K)
    model[[kNode]] <<- proposed_K

    # proposal model logProb
    model_lp_proposed <- model$calculate(calcNodes)

    # log-Metropolis-Hastings ratio
    log_MH_ratio <- model_lp_proposed - model_lp_initial

    # Metropolis-Hastings step: determine whether or
    # not to accept the newly proposed value
    u <- runif(1, 0, 1)
    if(u < exp(log_MH_ratio)) jump <- TRUE
    else                      jump <- FALSE

    # keep the model and mvSaved objects consistent

    if(jump) copy(from = model, to = mvSaved, row = 1,
                  nodes = calcNodes, logProb = TRUE)

    else     copy(from = mvSaved, to = model, row = 1,
                  nodes = calcNodes, logProb = TRUE)


  },
  methods = list(reset = function(){})
)

On Wed, Sep 24, 2025 at 6:06 PM David Christianson <daveal...@gmail.com> wrote:
Thanks Perry and Chris. Those were both helpful suggestions that I have a tried to incorporate into the reprex below. I don't seem to be referencing the individual M and K nodes correctly within the target block c("M[i]","K[i]"). I have tried a couple of approaches but get error messages such as  " Problem accessing node or variable ... " or similar messages.  

library(nimble)

rm(list = ls())
##------------------
# simulate restaurant customers and occupied tables
set.seed(123456)
S<-100
M <- K <- m <- vector(length=S) # containers for total customers and total occup tables
r <- 5
p <- 0.5
alpha <- 5
beta0 <-2
beta1 <--1
beta2<-1
covar1<-rnorm(S)
covar2<-rnorm(S)
lambda<-exp(beta0+beta1*covar1+beta2*covar2)
rlambda <- r/lambda # rate of the dgamma, scale would be mu/r
glambda <- rgamma(S, shape=r, rate=rlambda) # shape, rate parameterization

# populate M and K at each restaurant
for(i in 1:S){
  M[i] <- rpois(1,glambda[i])
  m[i] <- rbinom(1,M[i],p)
  if(M[i]==0){
    K[i]<-0
  }
  else{
  z<-rCRP(1,size=M[i],conc=alpha)
  K[i]<-max(z) # largest table assignment is total occupied tables
  }
}

# poke holes in data
#M<-replace(M,sample(S,20),NA)
#K<-replace(K,sample(S,20),NA)

##-------------------
#helper functions and tables
## two-term log-sum-exp
logspace_add <- function(logx, logy) {
  if (is.infinite(logx)) return(logy)
  if (is.infinite(logy)) return(logx)
  if (logy > logx) {
    tmp <- logx
    logx <- logy
    logy <- tmp
    }
  logx + log1p(exp(logy - logx))
}

## unsigned Stirling numbers,first kind
## trianglular matrix with rows and columns  from 0 to Nmax
build_logStirling1 <- function(Nmax) {
  out <- matrix(-Inf, nrow = Nmax + 1, ncol = Nmax + 1)
  # special case
  out[1, 1] <- 0
  ## n = 1 is a special case: strirling number
  out[2, 2] <- 0
  ## populate matrix
  for (n in 2:Nmax) {                  # real n ≥ 2
    ## ---- k = 1 column:  S₁(n,1) = (n-1)!  ---------------------
    out[n + 1, 2] <- lgamma(n)        
   
    ## ---- k ≥ 2 -----------------------------------------------
    for (k in 2:n) {
      a <- out[n,k]           # (n-1, k-1)
      b <- log(n - 1) + out[n,k + 1]  # (n-1)*first stirling of (n-1, k)
      out[n + 1, k + 1] <- logspace_add(a, b)
    }
  }
  out
}

## build
digits <- 3*max(M,na.rm=T) + 1 # upper limit on potential M
# Precompute logStirling numbers of 1st kind up to potential max M
logStirling1 <- build_logStirling1(digits)

## check
print(round(exp(logStirling1[1:5, 1:5])))

# make nimble data and constants lists
constants <- list(S=S,
                  digits = digits,
                  logStirling1=logStirling1,
                  covar1=covar1,
                  covar2=covar2
                  )
str(constants)
data <- list(ones_M = rep(1,S),
             m=m)
str(data)

##---------------------------------
## nimble model
code <- nimbleCode({
  # Priors\  
  beta0 ~ dnorm(0,sd=10)
  beta1 ~ dnorm(0,sd=10)
  beta2 ~ dnorm(0,sd=10)
  alpha ~ dbeta(0.01,0.01)
  r ~ dbeta(0.01,0.01)
  p <- 0.5
  # site loop for things that don't vary over time
  for(s in 1:S){
    log(lambda[s])<- beta0  + beta1 *covar1[s] + beta2 * covar2[s]
    rlambda[s] <- r/lambda[s] # rate of the dgamma, scale would be mu/r
    glambda[s] ~ dgamma(shape=r, rate=rlambda[s]) # shape, rate parameterization,
    #problambda[s] <- r/(r+(lambda[s]*site_area[s]))
    M[s] ~ dpois(glambda[s])
    m[s] ~ dbinom(size=M[s],prob=p)
    ones_M[s] ~ dconstraint(M[s]>=K[s] & M[s]<digits)  
    K[s] ~ dAntoniakCRP(M=M[s], alpha=alpha,logStirling1[1:digits,1:digits])
  }
})

##------------housekeeping
deregisterDistributions(c("dAntoniakCRP"))

##-----------------------
# Antoniak (CRP table-count) pmf
dAntoniakCRP <- nimbleFunction(
  run = function(x = integer(0), M = integer(0), alpha = double(0), logStirling1=double(2),log = integer(0, default=0)){
    returnType(double(0))
    # Special case: empty site
    if (M == 0) {
      if (x == 0) {             # P(K=0 | M=0) = 1
        if (log) return(0.0) else return(1.0)
      } else {
        if (log) return(-Inf) else return(0.0)
      }
    }
    if(x<1|x>M) { if(log) return(-Inf) else return(0.0) }
    # log S(M,x) at [M+1,x+1]
    logS <- logStirling1[M+1, x+1]
    logp<- logS + x*log(alpha) + lgamma(alpha) - lgamma(alpha+M)
    if(log) return(logp) else return(exp(logp))
  }
)

# registering necessary to use slice sampler on K
registerDistributions(list(
  dAntoniakCRP = list(
    BUGSdist = "dAntoniakCRP(M, alpha, logStirling1)",
    types = c('value = integer(0)','M=integer(0)','alpha = double(0)', 'logStirling1=double(2)'),
    discrete=T
  )
))

##-----------------------------
## custom simultaneous sampler of M (total customsers) and K (occupied tables)

customMKsampler <- nimbleFunction(
  contains = sampler_BASE,
  setup = function(model, mvSaved, target, control) {
    calcNodes <- model$getDependencies(target)
  },
  run = function(){
    # initial model logProb
    model_lp_initial <- getLogProb(model, calcNodes)
   
    # current values for number of customers (M) and occupied tables (K) at the target restaurant
    #Mi <- (target)[1]
    #Ki <- (target)[2]
    #current_M <- model[[Mi]]
    #current_K <- model[[Ki]]
    mNode<-model$getNodeNames(target[1])
    kNode<-model$getNodeNames(target[2])

   
    # create a random step up or down by one for K and M
    Mstep<--1+rbinom(1,1,0.5)*2
    Kstep<--1+rbinom(1,1,0.5)*2
   
    if(model[[mNode]]==0){ # unstick from 0
      proposed_K<-1
      proposed_M<-1
    }
    if (model[[mNode]] + Mstep==0){ # if proposed M steps into 0, then K must also be 0
      proposed_M<-0
      proposed_K<-0
    }
    if (model[[kNode]]+Kstep==0){ #if proposed K steps into 0, then M must also become 0
      proposed_M<-0
      proposed_K<-0
    }
    if ((model[[kNode]]+Kstep)>(model[[mNode]]+Mstep)){ # if proposed K is bigger than proposed M, then set equal
      proposed_M<-proposed_K<-current_M+Mstep
    }
   
    model[[target]] <<- c(proposed_M,proposed_K)
   
    # proposal model logProb
    model_lp_proposed <- model$calculate(calcNodes)
   
    # log-Metropolis-Hastings ratio
    log_MH_ratio <- model_lp_proposed - model_lp_initial
   
    # Metropolis-Hastings step: determine whether or
    # not to accept the newly proposed value
    u <- runif(1, 0, 1)
    if(u < exp(log_MH_ratio)) jump <- TRUE
    else                      jump <- FALSE
   
    # keep the model and mvSaved objects consistent

    if(jump) copy(from = model, to = mvSaved, row = 1,
                  nodes = calcNodes, logProb = TRUE)

    else     copy(from = mvSaved, to = model, row = 1,
                  nodes = calcNodes, logProb = TRUE)

   
  },
  methods = list(reset = function(){})
)



##--------------------------
## initial values
## missing values for missing K (occupied tables) must be less than or equal to corresponding non-missing M (number of customers), unless M=1 or M=0 in which case K=1 and K=0, respectively.

Minits<-replace(replace(M,is.na(M),K[is.na(M)]),!is.na(M),NA) #replace missing M with K
Kinits<-replace(replace(K,is.na(K),M[is.na(K)]),!is.na(K),NA) # replace missing K with M

#handle cases where both M and K are missing
bothmissing <- is.na(M)&is.na(K)
Kinits[bothmissing]<-Minits[bothmissing]<-1

inits<-list(r = 5,
            alpha = 5,
            beta0 =2,
            beta1 =-1,
            beta2=1,
            M=data$m+1,
            K=data$m+1)

##--------------------
## build

model <- nimbleModel(code, constants, data, inits)
#model$initializeInfo()

#
conf <- configureMCMC(model, monitors = c(
  "r","alpha","beta0","beta1","beta2",
  "M","K"
))


cmodel<- compileNimble(model,resetFunctions = T)

conf <- configureMCMC(cmodel, monitors=c(
  "r","alpha","beta0","beta1","beta2",
  "M","K"
))
conf$printSamplers()


# replace independent slice sampler for each M and K pair
for (i in 1:constants$S) {
  conf$removeSamplers(paste0("M[", i, "]"))
  conf$removeSamplers(paste0("K[", i, "]"))
  conf$addSampler(
    target  = c(paste0("M[", i, "]"), paste0("K[", i, "]")),
    type    = "customMKsampler"
  )
}
conf$printSamplers()
cmcmc<- buildMCMC(conf)

cmcmc<- compileNimble(cmcmc, project=cmodel,showCompilerOutput = T)
samples <- runMCMC(cmcmc, niter=11000, nburnin=1000,thin=5)


library(MCMCvis)
MCMCsummary(samples, params =
              "r","alpha","beta0","beta1","beta2",
            "M","K")

David Christianson

unread,
Sep 24, 2025, 1:05:08 PM (13 days ago) Sep 24
to nimble-users
Thanks Perry and Chris, those suggestions were very helpful and I was able to get a reprex run, but I don't seem to be getting any sampling of M or K, unless M or K is 0). Recall that M is number of customers and K is number of occupied tables in the restaurant.
test_MKsampler.R

David Christianson

unread,
Sep 24, 2025, 1:12:04 PM (13 days ago) Sep 24
to nimble-users
Hi Perry, We seem to have just cross-posted! I found the issue with the setup code immediately after posting (despite working on it for several hours prior to posting). Please look at the script in my latest post if you can -regarding the lack of sampling in M or K.  Thanks again for your time.

David Christianson

unread,
Sep 24, 2025, 2:22:16 PM (13 days ago) Sep 24
to nimble-users
I can see the problem now - my sampler was focussed on handling the special cases of 0 customers or 0 tables and tables>customers but wasn't generating proposals for any other case.  Fixed that, but I am still interested in suggestions for efficiency improvements.

customMKsampler <- nimbleFunction(
  contains = sampler_BASE,
  setup = function(model, mvSaved, target, control) {
    calcNodes <- model$getDependencies(target)
    mNode<-model$expandNodeNames(target[1]) #per PdV, mNode <- target[1] is probably fine.
    kNode<-model$expandNodeNames(target[2])

  },
  run = function(){
    # initial model logProb
    model_lp_initial <- getLogProb(model, calcNodes)


    # create a random step up or down by one for K and M
    Mstep<--1+rbinom(1,1,0.5)*2
    Kstep<--1+rbinom(1,1,0.5)*2
   
    proposed_M <- model[[mNode]]+Mstep
    proposed_K <- model[[kNode]]+Kstep
   
    if(model[[mNode]]==0|model[[kNode]]==0){ # unstick from 0
      proposed_M<-1
      proposed_K<-1
    }
    if (proposed_M==0|proposed_K==0){ # if proposed M steps into 0, then K must also be 0
      proposed_M<-0
      proposed_K<-0
    }
    if (proposed_K>proposed_M){ # if proposed K is bigger than proposed M, then set equal
      proposed_K<-proposed_M
    }
   
    model[[mNode]] <<- proposed_M

    model[[kNode]] <<- proposed_K
   
    # proposal model logProb
    model_lp_proposed <- model$calculate(calcNodes)
   
    # log-Metropolis-Hastings ratio
    log_MH_ratio <- model_lp_proposed - model_lp_initial
   
    # Metropolis-Hastings step: determine whether or
    # not to accept the newly proposed value
    u <- runif(1, 0, 1)
    if(u < exp(log_MH_ratio)) jump <- TRUE
    else                      jump <- FALSE
   

    # keep the model and mvSaved objects consistent
    if(jump) copy(from = model, to = mvSaved, row = 1,
                  nodes = calcNodes, logProb = TRUE)

    else     copy(from = mvSaved, to = model, row = 1,
                  nodes = calcNodes, logProb = TRUE)

   
  },
  methods = list(reset = function(){})
)

David Christianson

unread,
Sep 24, 2025, 2:23:21 PM (13 days ago) Sep 24
to nimble-users
working example attached
test_MKsampler_working.R
Reply all
Reply to author
Forward
0 new messages