BYM2 hurdle joint model with stacked responses

9 views
Skip to first unread message

Felipe Barletta

unread,
Oct 9, 2025, 1:02:57 PM (4 days ago) Oct 9
to r-inla-disc...@googlegroups.com
Hy my friends,

I am fitting a BYM2 hurdle joint model to hospitalization cancers in Portugal, where

it a bivariate hurdle model with spatial effects (BYM2) for two outcomes:

  • SevHigh (high severity hospitalizations count)

  • c.obitos (death hospitalization count)

Each outcome uses a two-part hurdle:

  • Binary part (is 0 if not count observed,  1 if observed): Binomial

  • Positive part (given > 0): censored Poisson with truncation at 0 (cenpoisson.I = c(0,0))

Both parts include BYM2 spatial random effects on Portuguese municipalities (278 areas).

  • Y_bin1 (binomial for SevHigh)

  • Y_bin2 (binomial for c.obitos)

  • Y_pos1 (censored Poisson for SevHigh)

  • Y_pos2 (censored Poisson for c.obitos)


Each vector has length L, with non-NAs only in its active block (size n) and NA elsewhere. Same pattern for covariates and random-effect indices.

Key pieces (abbreviated):



## --- Number of observations
n.l1 <- nrow(train)
n.l2 <- nrow(train)

## --- Auxiliary vectors
NAs.l1 <- rep(NA, n.l1)
NAs.l2 <- rep(NA, n.l2)

## --- Response variables
# Binary
Z1 <- as.integer(train$SevHigh > 0)
Z2 <- as.integer(train$c.obitos > 0)

# Positive part
Y1_pos <- ifelse(train$SevHigh > 0, train$SevHigh, NA)
Y2_pos <- ifelse(train$c.obitos > 0, train$c.obitos, NA)

Stack each outcome in vector with NA to align
Y_bin1 <- c(Z1, NAs.l2, NAs.l1, NAs.l2)       # binomial Y1
Y_bin2 <- c(NAs.l1, Z2, NAs.l1, NAs.l2)       # binomial Y2
Y_pos1 <- c(NAs.l1, NAs.l2, Y1_pos, NAs.l2)   # cenpoisson Y1
Y_pos2 <- c(NAs.l1, NAs.l2, NAs.l1, Y2_pos)   # cenpoisson Y2

# Join in list

Y.joint <- list(Y_bin1, Y_bin2, Y_pos1, Y_pos2)


Covariates are stacked with the same block structure. I also create 4 index vectors for BYM2 effects (one per stacked part), with non-NAs only in the corresponding block:

# Covariates
covariates <- list(
  inter = as.factor(c(rep("b.l1", n.l1), rep("b.l2", n.l2),
                      rep("p.l1", n.l1), rep("p.l2", n.l2))),
 
  # month
  month.l1 = c(train$mes_id, NAs.l2, train$mes_id, NAs.l2),
  month.l2 = c(NAs.l1, train$mes_id, NAs.l1, train$mes_id),

  # gender of patient
  gender.l1 = c(train$gender, NAs.l2, train$gender, NAs.l2),
  gender.l2 = c(NAs.l1, train$gender, NAs.l1, train$gender),

  # age > 60
  age60.l1 = c(train$age60, NAs.l2, train$age60, NAs.l2),
  age60.l2 = c(NAs.l1, train$age60, NAs.l1, train$age60),

  # average length of stay
  avg.los.l1 = c(train$avg.los, NAs.l2, train$avg.los, NAs.l2),
  avg.los.l2 = c(NAs.l1, train$avg.los, NAs.l1, train$avg.los),

   #Ei
  Ei.l1 = c( NAs.l2,NAs.l2,train$Ei2,NAs.l2),
  Ei.l2 = c(NAs.l1, NAs.l2,NAs.l2,train$Ei3)
)

# Unique index
unique.id <- unique(train$city_id)
n.id <- length(unique.id)
idx.l1 <- match(train$city_id, unique.id)
idx.l2 <- match(train$city_id, unique.id)

## Random effects - spatial
r.effects <- list(
  #  binomial Y1
  idx.sh.l1.1 = c(idx.l1, NAs.l2, rep(NA,n.l1), rep(NA,n.l2)),
  #  binomial Y2
  idx.sh.l2.1 = c(NAs.l1, idx.l2, rep(NA,n.l1), rep(NA,n.l2)),
  #  cenpoisson Y1
  idx.sh.l1.2 = c(rep(NA,n.l1), rep(NA,n.l2), idx.l1, NAs.l2),
  #  cenpoisson Y2
  idx.sh.l2.2 = c(rep(NA,n.l1), rep(NA,n.l2), NAs.l1, idx.l2)
)

joint.data <- c(covariates, r.effects)
joint.data$Y <- Y.joint

#Offsets for the positive parts only:
E1 <- na.omit(joint.data$Ei.l1)  # deve ter comprimento n.l1 = 26132
E2 <- na.omit(joint.data$Ei.l2)  # idem

## model
m_hurdle <- inla(
  Y ~ -1 + inter +
    gender.l1 + age60.l1 + avg.los.l1 + month.l1 +
    gender.l2 + age60.l2 + avg.los.l2 + month.l2 +
    f(idx.sh.l1.1, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior1) +
    f(idx.sh.l2.1, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior2) +
    f(idx.sh.l1.2, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior1) +
    f(idx.sh.l2.2, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior2),
 
  data = joint.data,
  family = c("binomial","binomial","cenpoisson","cenpoisson"),
  Ntrials = list(rep(1,n.l1), rep(1,n.l2), NULL, NULL),
  E = list(NULL, NULL, E1, E2),
  control.family = list(
    list(), list(),
    list(cenpoisson.I=c(0,0)),  # truncation in 0
    list(cenpoisson.I=c(0,0))
  ),
  control.compute = list(dic=TRUE, waic=TRUE, cpo=TRUE),
  control.predictor = list(compute=TRUE),
    control.inla = list(int.strategy="eb")
)

And that's the message error:
*** inla.core.safe:  The inla program failed, but will rerun in case better initial values may help. try=1/1
Error in inla.core.safe(formula = formula, family = family, contrasts = contrasts,  :
  arguments imply differing number of rows: 26132, 0
The inla program failed and the maximum number of tries has been reached.

What I’ve checked

  • Lengths: every stacked vector (responses, covariates, index vectors) 

  • Non-NAs per part: sum(!is.na(Y_bin1)) = n, same for Y_bin2, Y_pos1, Y_pos2.

  • Ntrials lengths match binomial non-NAs: length(rep(1,n)) = n.

  • Offsets E1, E2 have length n and contain no NAs.

  • Index ranges for the four BYM2 terms are within 1..278, with n_valid = n and n_levels = 278 for each stacked index.

  • Graph W.pt reports n = 278 and non-empty neighbor lists.

Given that, I suspect the error is with Ntrials and E but I'm not sure.

If any of the above are wrong, could that alone explain the 26132, 0 mismatch?

If someone has any recommendations or examples on correctly aligning Ntrials/E with stacked components, I’d be very grateful.
Thank you in advance for your time and help!


Finn Lindgren

unread,
Oct 9, 2025, 4:55:47 PM (4 days ago) Oct 9
to Felipe Barletta, r-inla-disc...@googlegroups.com
I think you're right in suspecting E/Ntrials: When using inla.stack() to organise the data for multiple observation models, E and Ntrials would be constructed to be vectors of the same length as the total predictor, so instead of

  E = list(NULL, NULL, E1, E2)
you need
  E = c(rep(1, length(Y_bin1) + length(Y_bin2)), E1, E2)
and similarly for Ntrials (the values won't be used by the parts of the models that don't involve them, so setting the to 1 is ok.
NA might also be ok, as I think that's what inla.stack() would do if you include the E/Ntrials vectors as part of the 'data' argument.

Note: Both inla.stack() and the inlabru::bru() interface would handle the data stacking for you.

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/r-inla-discussion-group/CAJz4ZjHQs2CAj0qdy8WRZAv%2B9spmZKWtLHPCPkuxJvZ25FD_Zg%40mail.gmail.com.


--
Finn Lindgren
email: finn.l...@gmail.com
Reply all
Reply to author
Forward
0 new messages