Hy my friends,
I am fitting a BYM2 hurdle joint model to hospitalization cancers in Portugal, where
it a bivariate hurdle model with spatial effects (BYM2) for two outcomes:
Each outcome uses a two-part hurdle:
-
Binary part (is 0 if not count observed, 1 if observed): Binomial
-
Positive part (given > 0): censored Poisson with truncation at 0 (cenpoisson.I = c(0,0)
)
Both parts include BYM2 spatial random effects on Portuguese municipalities (278 areas).
-
Y_bin1
(binomial for SevHigh
)
-
Y_bin2
(binomial for c.obitos
)
-
Y_pos1
(censored Poisson for SevHigh
)
-
Y_pos2
(censored Poisson for c.obitos
)
Each vector has length L
, with non-NAs only in its active block (size n
) and NA
elsewhere. Same pattern for covariates and random-effect indices.
Key pieces (abbreviated):
## --- Number of observations
n.l1 <- nrow(train)
n.l2 <- nrow(train)
## --- Auxiliary vectors
NAs.l1 <- rep(NA, n.l1)
NAs.l2 <- rep(NA, n.l2)
## --- Response variables
# Binary
Z1 <- as.integer(train$SevHigh > 0)
Z2 <- as.integer(train$c.obitos > 0)
# Positive part
Y1_pos <- ifelse(train$SevHigh > 0, train$SevHigh, NA)
Y2_pos <- ifelse(train$c.obitos > 0, train$c.obitos, NA)
# Stack each outcome in vector with NA to align
Y_bin1 <- c(Z1, NAs.l2, NAs.l1, NAs.l2) # binomial Y1
Y_bin2 <- c(NAs.l1, Z2, NAs.l1, NAs.l2) # binomial Y2
Y_pos1 <- c(NAs.l1, NAs.l2, Y1_pos, NAs.l2) # cenpoisson Y1
Y_pos2 <- c(NAs.l1, NAs.l2, NAs.l1, Y2_pos) # cenpoisson Y2
# Join in list
Y.joint <- list(Y_bin1, Y_bin2, Y_pos1, Y_pos2)
Covariates are stacked with the same block structure. I also create 4 index vectors for BYM2 effects (one per stacked part), with non-NAs only in the corresponding block:
# Covariates
covariates <- list(
inter = as.factor(c(rep("b.l1", n.l1), rep("b.l2", n.l2),
rep("p.l1", n.l1), rep("p.l2", n.l2))),
# month
month.l1 = c(train$mes_id, NAs.l2, train$mes_id, NAs.l2),
month.l2 = c(NAs.l1, train$mes_id, NAs.l1, train$mes_id),
# gender of patient
gender.l1 = c(train$gender, NAs.l2, train$gender, NAs.l2),
gender.l2 = c(NAs.l1, train$gender, NAs.l1, train$gender),
# age > 60
age60.l1 = c(train$age60, NAs.l2, train$age60, NAs.l2),
age60.l2 = c(NAs.l1, train$age60, NAs.l1, train$age60),
# average length of stay
avg.los.l1 = c(train$avg.los, NAs.l2, train$avg.los, NAs.l2),
avg.los.l2 = c(NAs.l1, train$avg.los, NAs.l1, train$avg.los),
#Ei
Ei.l1 = c( NAs.l2,NAs.l2,train$Ei2,NAs.l2),
Ei.l2 = c(NAs.l1, NAs.l2,NAs.l2,train$Ei3)
)
# Unique index
unique.id <- unique(train$city_id)
n.id <- length(
unique.id)
idx.l1 <- match(train$city_id,
unique.id)
idx.l2 <- match(train$city_id,
unique.id)
## Random effects - spatial
r.effects <- list(
# binomial Y1
idx.sh.l1.1 = c(idx.l1, NAs.l2, rep(NA,n.l1), rep(NA,n.l2)),
# binomial Y2
idx.sh.l2.1 = c(NAs.l1, idx.l2, rep(NA,n.l1), rep(NA,n.l2)),
# cenpoisson Y1
idx.sh.l1.2 = c(rep(NA,n.l1), rep(NA,n.l2), idx.l1, NAs.l2),
# cenpoisson Y2
idx.sh.l2.2 = c(rep(NA,n.l1), rep(NA,n.l2), NAs.l1, idx.l2)
)
joint.data <- c(covariates, r.effects)
joint.data$Y <- Y.joint
#Offsets for the positive parts only:
E1 <- na.omit(joint.data$Ei.l1) # deve ter comprimento n.l1 = 26132
E2 <- na.omit(joint.data$Ei.l2) # idem
## model
m_hurdle <- inla(
Y ~ -1 + inter +
gender.l1 + age60.l1 + avg.los.l1 + month.l1 +
gender.l2 + age60.l2 + avg.los.l2 + month.l2 +
f(idx.sh.l1.1, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior1) +
f(idx.sh.l2.1, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior2) +
f(idx.sh.l1.2, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior1) +
f(idx.sh.l2.2, model="bym2", graph=W.pt, scale.model=TRUE, hyper=prior2),
data = joint.data,
family = c("binomial","binomial","cenpoisson","cenpoisson"),
Ntrials = list(rep(1,n.l1), rep(1,n.l2), NULL, NULL),
E = list(NULL, NULL, E1, E2),
control.family = list(
list(), list(),
list(cenpoisson.I=c(0,0)), # truncation in 0
list(cenpoisson.I=c(0,0))
),
control.compute = list(dic=TRUE, waic=TRUE, cpo=TRUE),
control.predictor = list(compute=TRUE),
control.inla = list(int.strategy="eb")
)
And that's the message error:
*** inla.core.safe: The inla program failed, but will rerun in case better initial values may help. try=1/1
Error in inla.core.safe(formula = formula, family = family, contrasts = contrasts, :
arguments imply differing number of rows: 26132, 0
The inla program failed and the maximum number of tries has been reached.
What I’ve checked
-
Lengths: every stacked vector (responses, covariates, index vectors)
-
Non-NAs per part: sum(!is.na(Y_bin1)) = n
, same for Y_bin2
, Y_pos1
, Y_pos2
.
-
Ntrials
lengths match binomial non-NAs: length(rep(1,n)) = n
.
-
Offsets E1
, E2
have length n
and contain no NAs.
-
Index ranges for the four BYM2 terms are within 1..278
, with n_valid = n
and n_levels = 278
for each stacked index.
-
Graph W.pt
reports n = 278
and non-empty neighbor lists.
Given that, I suspect the error is with Ntrials and E but I'm not sure.
If any of the above are wrong, could that alone explain the 26132, 0
mismatch?
If someone has any recommendations or examples on correctly aligning Ntrials
/E
with stacked components, I’d be very grateful.
Thank you in advance for your time and help!