Align within-cluster parameters with cluster weights

Meng Qiu

Jun 19, 2022, 9:01:41 PM
to nimble-users
Hi all,

Hope you all had a great weekend. 

I am wondering if there is an easier way in nimble to extract cluster weights and within-cluster parameters that are aligned. Take the following simple nonparametric mixture of products of Bernoullis as an example: 

# Nimble model code
bcode <- nimbleCode({
  for(i in 1:N) {
    for(j in 1:J) {
      y[i, j] ~ dbern(ip[z[i], j])
    }
  }
  for(i in 1:N) {
    for(j in 1:J) {
      ip[i, j] ~ dbeta(shape1 = 2, shape2 = 5)
    }
  }
  z[1:N] ~ dCRP(conc = alpha, size = N)
  alpha ~ dgamma(shape = 2, rate = 2)
})


For the second iteration, I can extract the cluster sizes and within-cluster parameter estimates as follows:

table(zSamples[2, ]) # cluster sizes
  1   2   3   4   5
 31  18 363  74  14


For each iteration, the within-cluster parameter estimates can be reorganized into a J (number of variables) by N matrix, where column k holds the parameters of cluster k and N is the maximum possible number of clusters:

ip.m <- matrix(ipSamples[2, ], nrow = J, ncol = N, byrow = TRUE) # J = 5 in this case

Then, for the first variable:

table(ip.m[1, zSamples[2, ]])
0.03230 0.29045 0.37660 0.51149 0.87088
     74      18      14      31     363


For the second variable:

table(ip.m[2, zSamples[2, ]])
0.08561 0.51955 0.56262 0.61106 0.88537
     74      14      18      31     363


The two sets of parameter estimates are not aligned when they are extracted with the table function, because table() orders its output by parameter value rather than by cluster label. The ideal output I expect is as follows:

  1   2   3   4   5
 31  18 363  74  14

0.51149 0.29045 0.87088 0.03230 0.37660
     31      18     363      74      14

0.61106 0.56262 0.88537 0.08561 0.51955
     31      18     363      74      14

My way of solving this issue is cumbersome. And when it comes to a more complicated model, say one where the variables are polytomous with different numbers of categories, I do not know how to solve it:

# Nimble model
pcode <- nimbleCode({
  for(i in 1:N) {
    for(j in 1:J) {
      y[i, j] ~ dcat(ip[z[i], j, 1:C[j]])
    }
  }
  for(i in 1:N) {
    for(j in 1:J) {
      ip[i, j, 1:C[j]] ~ ddirch(beta[1:C[j]])
    }
  }
  z[1:N] ~ dCRP(conc = alpha, size = N)
  alpha ~ dgamma(shape = 2, rate = 2)
})


The code for the binary case is attached. Thank you very much.


Best,
Chris

example code.R

Chris Paciorek

Jun 22, 2022, 1:47:54 PM
to Meng Qiu, nimble-users
Hi Chris,

I might not be fully understanding your question. Can't you do something like this:

table(zSamples[2, ]) 

ids <- sort(unique(zSamples[2, ]))

# for each occupied cluster, extract the parameters
sapply(ids, function(i) ip.m[, i])

-Chris
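
The suggestion above can be tried on toy data. The following is a minimal sketch, not output from the attached script: the values of N, J, z, and ip.m are made-up stand-ins for one MCMC iteration, and the data frame layout is only one possible way to present the result. It assumes, as in the original post, that ip.m is a J by N matrix whose column k holds the parameters of cluster k.

```r
# Toy stand-ins for one MCMC iteration (assumed values, not real nimble output)
set.seed(1)
N <- 10   # number of observations (also the maximum number of clusters)
J <- 3    # number of variables

# Cluster labels for one iteration; labels may have gaps (here 1, 2 and 5)
z <- c(1, 1, 5, 2, 5, 5, 1, 2, 5, 5)

# J x N matrix of within-cluster parameters: column k holds cluster k's values
ip.m <- matrix(runif(J * N), nrow = J, ncol = N)

# Occupied cluster labels, in increasing order
ids <- sort(unique(z))

# Cluster sizes, in the same order as ids
sizes <- as.vector(table(factor(z, levels = ids)))

# One row per occupied cluster: label, size, then the J parameters
aligned <- data.frame(cluster = ids, size = sizes,
                      t(ip.m[, ids, drop = FALSE]))
colnames(aligned)[-(1:2)] <- paste0("ip", seq_len(J))
print(aligned)
```

Because the sizes and the parameter columns are both indexed by ids, every row refers to the same cluster, which is the alignment asked for in the original post. The drop = FALSE argument keeps the result a matrix even when only one cluster is occupied.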

Meng Qiu

Jun 22, 2022, 2:57:07 PM
to Chris Paciorek, nimble-users
Thanks a lot, Chris! I made it too complicated. 

Also, if I understand correctly, NIMBLE does not automatically fill gaps in the cluster labels. Does NIMBLE have a built-in function to do so? Before running "ids <- sort(unique(zSamples[2, ]))", I run the following code to fill the gaps manually:

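# One row per posterior iteration, one column per observation
z.nogap <- matrix(0, nrow = nMCMC.post, ncol = N)
for (iter in 1:nMCMC.post) {
  # occupied cluster labels in this iteration, in increasing order
  temp <- sort(unique(as.numeric(zSamples[iter, ])))
  # match() is vectorized, so no inner loop over observations is needed
  z.nogap[iter, ] <- match(zSamples[iter, ], temp)
}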



Best,
Chris

--

Meng (Chris) Qiu

Graduate Student, Quantitative Psychology

Statistical Methods for Real Data Lab|University of Notre Dame

Chris Paciorek

Jun 22, 2022, 4:44:02 PM
to Meng Qiu, nimble-users
Hi Chris,

For reasons of efficiency, we do not fill the gaps when doing MCMC in this situation. We don't have any code in nimble to fill the gaps, so it's up to you. Personally, I would just manipulate the output using the existing IDs assigned by nimble and then, at the very end of processing, relabel things so there are no gaps.

(Side note: when opening a new cluster, we do assign that cluster the lowest possible id value, so gaps are partially filled in some sense.)
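
Relabeling at the very end could look something like the sketch below. It is only an illustration: the zSamples matrix holds made-up labels, and relabel is a hypothetical helper, not a nimble function. Any parameter extraction that indexes by the original labels would need to happen before this step.

```r
# Toy matrix of cluster labels (assumed values):
# rows are MCMC iterations, columns are observations
zSamples <- rbind(c(1, 1, 5, 2, 5),
                  c(3, 3, 3, 7, 7),
                  c(2, 4, 4, 4, 9))

# Hypothetical helper: map the occupied labels of one iteration to 1..K,
# preserving their order
relabel <- function(z) match(z, sort(unique(z)))

# Apply row by row; apply() returns each result as a column, so transpose back
z.nogap <- t(apply(zSamples, 1, relabel))
print(z.nogap)
```

In the first toy iteration the occupied labels 1, 2, and 5 become 1, 2, and 3, so the relabeled row reads 1 1 3 2 3.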

Meng Qiu

Jun 22, 2022, 7:43:39 PM
to Chris Paciorek, nimble-users
Thanks for the suggestions, Chris. I appreciate it.


Best,
Chris