Question about Logit Stick-Breaking Process


Cody Chen

Aug 23, 2025, 9:15:52 PM
to nimble-users
Hi all,

I fit a Dependent Dirichlet Process Mixture (DDPM) model using the logit stick-breaking process (LSBP; see Model 1) so that the mixture weights (w) can depend on covariates. The model runs properly in nimble, but I have two questions:
  1. I'm not sure how to interpret the regression coefficients in the logistic regression part, because in the LSBP the covariates affect the mixture weights only indirectly, through the stick-breaking proportions (v). In a traditional finite mixture regression model (FMRM; see Model 2), covariates predict the mixture weights directly, and their regression coefficients can be interpreted as log odds.
  2. If the DDPM model and the FMRM identify the same number of components, shouldn't they yield very similar component profiles? To check this, I ran both models on the same dataset, but the two sets of profiles looked quite different.
The nimble code and the dataset are attached below. I'm sorry this isn't strictly a nimble question, but I would greatly appreciate any thoughts you could share. Thank you very much.

model1 <- nimbleCode({
  # Model likelihood
  for(i in 1:N) {
    for(j in 1:J) {
      y[i,j] ~ dbern(prob = ip[z[i], j])
    }
    z[i] ~ dcat(prob = w[i,1:H])
    # logit stick-breaking process
    for(h in 1:(H-1)) {
      logit(v[i,h]) <- b0[h] + b1[h]*x1[i] + b2[h]*x2[i] + b3[h]*x3[i]
    }
    w[i,1:H] <- stick_breaking(v[i,1:(H-1)]) # stick-breaking weights
  }
  # Prior distributions
  # prior on item parameters
  for(h in 1:H) {
    for(j in 1:J) {
      ip[h, j] ~ dbeta(shape1 = 1, shape2 = 1)
    }
  }
  # prior on regression coefficients
  for(h in 1:(H-1)) {
    b0[h] ~ dnorm(mean = 0, tau = 1E-3)
    b1[h] ~ dnorm(mean = 0, tau = 1E-3)
    b2[h] ~ dnorm(mean = 0, tau = 1E-3)
    b3[h] ~ dnorm(mean = 0, tau = 1E-3)
  }
})

model2 <- nimbleCode({
  # Model likelihood
  for (i in 1:N) {
    # cluster assignment
    z[i] ~ dcat(prob = w[i, 1:K])
    # binary outcomes for J items
    for (j in 1:J) {
      y[i, j] ~ dbern(prob = ip[z[i], j])
    }
    # mixture weights via a multinomial logit (softmax) link
    for (k in 1:K) {
      eta[i, k] <- b0[k] + b1[k] * x1[i] + b2[k] * x2[i] + b3[k] * x3[i]
    }
    # softmax transformation
    denom[i] <- sum(exp(eta[i, 1:K]))
    for (k in 1:K) {
      w[i, k] <- exp(eta[i, k]) / denom[i]
    }
  }
  # Prior distributions
  for(k in 1:K) {
    # beta prior for item response probabilities
    for(j in 1:J) {
      ip[k, j] ~ dbeta(shape1 = 1, shape2 = 1)
    }
    # normal prior for regression coefficients
    b0[k] ~ dnorm(mean = 0, tau = 1E-3)
    b1[k] ~ dnorm(mean = 0, tau = 1E-3)
    b2[k] ~ dnorm(mean = 0, tau = 1E-3)
    b3[k] ~ dnorm(mean = 0, tau = 1E-3)
  }
})
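For reference, here is a stripped-down sketch of how Model 1 can be set up and run in nimble (the truncation level H = 5, the initial values, and the MCMC settings below are made up for illustration; N, J, y, x1, x2, x3 come from the attached dataset, and the full setup is in the attached scripts):

library(nimble)
constants <- list(N = N, J = J, H = 5, x1 = x1, x2 = x2, x3 = x3)
data      <- list(y = y)
inits     <- list(z  = sample(1:5, N, replace = TRUE),
                  b0 = rnorm(4), b1 = rnorm(4), b2 = rnorm(4), b3 = rnorm(4),
                  ip = matrix(0.5, nrow = 5, ncol = J))
m     <- nimbleModel(model1, constants = constants, data = data, inits = inits)
cm    <- compileNimble(m)
conf  <- configureMCMC(m, monitors = c("b0", "b1", "b2", "b3", "ip", "z"))
mcmc  <- buildMCMC(conf)
cmcmc <- compileNimble(mcmc, project = m)
samples <- runMCMC(cmcmc, niter = 20000, nburnin = 10000, thin = 10)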
Attachments: dataset.csv, Model1_LSBP.R, Model2_FMRM.R

Chris Paciorek

Aug 29, 2025, 3:11:38 PM
to Cody Chen, nimble-users
Hi Cody,

As you say, this is not a nimble question per se; I have a bit of familiarity with these models but not a lot, so I'll make a few comments that may be obvious/too vague.

Regarding the covariate interpretation: since the stick-breaking is relative to the previous sticks, I guess you could work out the absolute, rather than relative, effect of a covariate by some manipulation that accounts for how the covariate affects the amount broken off for the first stick, then its relative effect in breaking off the second stick, and so forth. You might be able to quantify the effect of a "one unit" change in a covariate in that fashion. I'm not sure whether there's any work on this in the statistical literature - there might be, and a chatbot or Google Scholar query might point you in a useful direction.
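Something along these lines (a rough sketch of the manipulation, not tested code; it assumes `samples` is a matrix of posterior draws, e.g. from runMCMC, with columns named "b0[1]", "b1[1]", and so on, and that H matches the truncation level used in the model):

expit <- function(x) 1 / (1 + exp(-x))

# mixture weights implied by one posterior draw at covariate values (x1, x2, x3)
weights_at <- function(draw, x1, x2, x3, H) {
  v <- numeric(H - 1)
  for (h in 1:(H - 1)) {
    v[h] <- expit(draw[paste0("b0[", h, "]")] +
                  draw[paste0("b1[", h, "]")] * x1 +
                  draw[paste0("b2[", h, "]")] * x2 +
                  draw[paste0("b3[", h, "]")] * x3)
  }
  # stick-breaking: w_h = v_h * prod_{l<h}(1 - v_l); the last weight is the leftover stick
  c(v, 1) * cumprod(c(1, 1 - v))
}

# posterior distribution of the change in each weight for a one-unit increase
# in x1, holding x2 and x3 at zero (their means, if centered)
H <- 5
delta_w <- t(apply(samples, 1, function(d)
  weights_at(d, x1 = 1, x2 = 0, x3 = 0, H) - weights_at(d, x1 = 0, x2 = 0, x3 = 0, H)))
colMeans(delta_w)  # posterior mean change in each mixture weight

That gives covariate effects on the absolute weight scale rather than on the stick-by-stick log-odds scale.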

I don't have any insight into your second question, though it is an interesting one. I might be concerned about label switching making it hard to "align" the components across the two fits. And given that the covariates affect the relative weights in model 1 but the absolute weights in model 2, that difference could influence the fitting enough to change the profiles.
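For the alignment part, one simple thing to try (a sketch; it assumes `prof1` and `prof2` hold the posterior mean item-response profiles from the two fits as components-by-items matrices with the same number of rows) is to match components by minimizing the total squared distance between profiles before comparing them:

library(clue)  # for solve_LSAP(), the Hungarian assignment algorithm

# cost[h, k] = squared distance between profile h of model 1 and profile k of model 2
cost <- sapply(seq_len(nrow(prof2)), function(k)
  sapply(seq_len(nrow(prof1)), function(h) sum((prof1[h, ] - prof2[k, ])^2)))
perm <- as.integer(solve_LSAP(cost))   # perm[h] = component of model 2 matched to component h
prof2_aligned <- prof2[perm, , drop = FALSE]
round(prof1 - prof2_aligned, 2)        # differences after aligning the labels

If the profiles still look quite different after that kind of alignment, the discrepancy is probably not just label switching.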

-chris


Cody Chen

Aug 29, 2025, 3:47:57 PM
to paci...@stat.berkeley.edu, nimble-users
Hi Chris,

Thank you very much for your thoughts! Although I didn't find a clear interpretation in the literature, your approach of linking covariates to the stick-breaking proportions makes sense to me. Regarding the second question, I agree that many other factors could affect the profiles. I'll dig deeper into both questions.


Best,
Cody

Sally Paganin

Sep 2, 2025, 4:35:18 PM
to Cody Chen, paci...@stat.berkeley.edu, nimble-users
Hi Cody, Chris, 

I'll chime in on this as well. I am also not an expert on these models, but have some familiarity.

Covariate interpretation. I agree with what Chris said. One way to look at it is that logistic stick-breaking can be rewritten as an ordered (continuation-ratio) logistic regression. If you know those models, the analogy helps: each coefficient governs the probability of “stopping” at a given stick, conditional on not having stopped earlier. I have seen this representation used to build computational methods (in this paper), but I don't have literature in mind that looks specifically at the interpretation of the coefficients.
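To make the analogy concrete in the notation of Model 1 (this is just the standard stick-breaking identity, nothing new):

  w[i, h] = v[i, h] * prod_{l < h} (1 - v[i, l]),
  logit(v[i, h]) = b0[h] + b1[h]*x1[i] + b2[h]*x2[i] + b3[h]*x3[i],

so v[i, h] = P(z[i] = h | z[i] >= h, covariates). Each coefficient is therefore a log-odds effect on being assigned to component h, given that components 1, ..., h-1 were not chosen.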

Component profiles. For your second question, my initial reaction is that I wouldn't expect the profiles to look similar, even if the two models estimate the same number of clusters. Besides the label-switching problem, the two models use different representations of the latent clustering: in the finite mixture regression (FMRM) the parametrization of the weights is symmetric across components, while the stick-breaking uses a sequential representation. This difference in parametrization induces different priors on the weights, so if you did get similar profiles, that would be more by chance.
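To see how different the two implied priors are, here is a quick prior simulation (my own sketch, not from the attached code): with the vague dnorm(0, tau = 1E-3) priors used in both models, check which component receives the largest prior weight at x1 = x2 = x3 = 0.

set.seed(1)
H <- 5; K <- 5; S <- 1e4
sd0 <- sqrt(1 / 1e-3)                 # prior sd implied by tau = 1E-3
expit <- function(x) 1 / (1 + exp(-x))

# Model 1: logit stick-breaking prior on the weights
sb_top <- replicate(S, {
  v <- expit(rnorm(H - 1, 0, sd0))
  which.max(c(v, 1) * cumprod(c(1, 1 - v)))
})

# Model 2: softmax prior on the weights
sm_top <- replicate(S, {
  eta <- rnorm(K, 0, sd0)
  which.max(exp(eta) / sum(exp(eta)))
})

table(sb_top) / S   # strongly favors the low-index components
table(sm_top) / S   # roughly uniform across components

The stick-breaking prior puts most of its mass on the early sticks, while the softmax prior treats the components exchangeably, so the two fits can allocate the clusters quite differently even before the data come in.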

Good luck!


Sally 


Cody Chen

Sep 2, 2025, 5:44:24 PM
to Sally paganin, paci...@stat.berkeley.edu, nimble-users
Hi Sally,

Thank you very much for your thoughts and the reference. That's very helpful. 

Your take on the second question makes me wonder which method to trust when they yield different profiles in a real data analysis, though that is probably not a yes-or-no question.


Best,
Cody