Survival fit with Gompertz distribution is not plausible

189 views
Skip to first unread message

Heinz-Werner Priess

unread,
Aug 6, 2021, 5:17:27 AM8/6/21
to R-inla discussion group
Dear INLA-group,

I try to fit treatment-data for teeth to a survival model. When using the package "flexsurv" with a Gompertz distribution the fit followes nicely the original data. However, when I repeat the fit with INLA, the fit looks different. Right now I cannot decide whether I use INLA in a wrong way or the INLA-fit is wrong because of the "experimental" status of the Gompertz-distribution. The survival-analysis with INLA follows the instructions from the book "Bayesian inference with INLA" from Virgilio Gómez-Rubio.

Here is a reproducible example:

library(survival)
library(flexsurv)
library(INLA)
library(parallel)
library(ggplot2)

# create random data
# parameters from the initial flexsurf fit on original data
#      shape       rate
# -0.2290890 -0.7998633

random_data<-data.frame(time = rgompertz(n = 1e3,
                                         shape = -0.2290890,
                                         rate = exp(-0.7998633)),
                        cens = 1)
# In my original data set, I have data for eight years. To adjust for this
# time-span all times longer than eight years will be censored
random_data[random_data$time > 8, 'time']<-8
random_data[random_data$time == 8, 'cens']<-0

# Kaplan-Meier plot of the random data
km<-survfit(Surv(time,cens)~1,
            data=random_data)
plot(km)

# fit with flexsurv
flexsurv_model<-flexsurvreg(Surv(time,cens) ~ 1,
                            data = random_data,
                            dist = 'gompertz')
plot(flexsurv_model)
coef(flexsurv_model)
#      shape       rate
# -0.2391953 -0.7735919
# visual fit and recovered parameters are very close to the original data

# fit with INLA
sinla <- inla.surv(random_data$time, random_data$cens)
inla_model <- inla(sinla ~ 1,
                   data = random_data,
                   family = "gompertzsurv")
summary(inla_model)
# the recovered parameters look different from the original ones

# plot fit
# Grid of times
time <- seq(0.01, 8, by = 0.1)
# Marginal of alpha
alpha.marg <- inla_model$marginals.fixed[["(Intercept)"]]
# Set number of cores to use
options(mc.cores = 4)
# Compute post. marg. of survival function
surv.inla <- mclapply(times, function(t){
  surv.marg <- inla.tmarginal(function(x) {exp(- exp(x) * t)}, alpha.marg)
  surv.stats <- inla.zmarginal(surv.marg, silent = TRUE)
  return(unlist(surv.stats[c("quant0.025", "mean", "quant0.975")]))
})

surv.inla <- as.data.frame(do.call(rbind, surv.inla))
surv.inla$times<-times


km_data<-data.frame(time=km$time,
                    surv=km$surv,
                    lb=km$lower,
                    ub=km$upper)
ggplot(km_data,aes(x=time))+
  geom_line(aes(y=surv))+
  geom_ribbon(aes(ymin=lb,ymax=ub),alpha=0.3)+
  geom_ribbon(data=surv.inla, aes(ymin=quant0.025,ymax=quant0.975),alpha=0.3)+
  geom_line(data=surv.inla, aes(y=mean))
# the plot also looks different from the original data

# crosscheck with new data according to the recovered parameters

random_data2<-data.frame(time = rgompertz(n = 1e3,
                                         shape = 0.004,
                                         rate = exp(-1.355)),
                        cens = 1)
# In my original data set, I have data for eight years. To adjust for this
# time-span all times longer than eight years will be censored
random_data2[random_data2$time > 8, 'time']<-8
random_data2[random_data2$time == 8, 'cens']<-0

# Kaplan-Meier plot of the random data
km2<-survfit(Surv(time,cens)~1,
            data=random_data2)
plot(km2)
km2_data<-data.frame(time=km2$time,
                    surv=km2$surv,
                    lb=km2$lower,
                    ub=km2$upper)
# plot the inla-fit togegher with the new random data
ggplot(km2_data,aes(x=time))+
  geom_line(aes(y=surv))+
  geom_ribbon(aes(ymin=lb,ymax=ub),alpha=0.3)+
  geom_ribbon(data=surv.inla, aes(ymin=quant0.025,ymax=quant0.975),alpha=0.3)+
  geom_line(data=surv.inla, aes(y=mean))

# there is a nice fit of the inla-fit with the new random-data

Helpdesk

unread,
Aug 6, 2021, 6:03:09 AM8/6/21
to Heinz-Werner Priess, R-inla discussion group
there were some errors in the code, please try now
> --
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/r-inla-discussion-group/cdcc2f50-d3df-48db-9942-5e92f38eb86dn%40googlegroups.com
> .

--
Håvard Rue
he...@r-inla.org
runme.R

Heinz-Werner Priess

unread,
Aug 6, 2021, 6:40:59 AM8/6/21
to R-inla discussion group
Thank you for the quick answer.
In line 57 I changed
surv.inla$times<-times
to
surv.inla$time<-times
to make the the plot possible.

However, I the parameters I get from the INLA-fit deviate from the parameters of the initial distribution. The result is essentially the same as my first fit.

Initial Distribution:
random_data<-data.frame(time = rgompertz(n = 1e4,
                                         shape = -0.2290890,
                                         rate = exp(-0.7998633)),
                        cens = 1)

> summary(inla_model)
Call:
   "inla(formula = sinla ~ 1, family = \"gompertzsurv\", data = random_data)"
Time used:
    Pre = 1.38, Running = 2.82, Post = 0.0724, Total = 4.28
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) -1.341 0.011     -1.363   -1.341     -1.319 -1.341   0

Model hyperparameters:
                                   mean   sd 0.025quant 0.5quant 0.975quant  mode
alpha parameter for Gompertz-surv 0.001 0.00       0.00    0.001      0.001 0.001

Marginal log-Likelihood:  -19029.95
Posterior marginals for the linear predictor and
 the fitted values are computed

When I use theese parameters to create a new Gompertz-distribution, the INLA-result and the new Distribution fit nicely.
random_data2<-data.frame(time = rgompertz(n = 1e3,
                                          shape = 0.001,
                                          rate = exp(-1.341)),
                         cens = 1)
shape = 0.001 is not the same as shape = -0.229
rate = exp(-0.7998633) is not the same as rate = exp(-1.341)

Inappropriate fit,
original data = black,
INLA-fit = red

inappropriate_fit.png

Helpdesk

unread,
Aug 6, 2021, 6:44:35 AM8/6/21
to Heinz-Werner Priess, R-inla discussion group

sure you're using the same parameterisation ?

Helpdesk

unread,
Aug 6, 2021, 6:44:46 AM8/6/21
to Heinz-Werner Priess, R-inla discussion group
inla.doc("gompertz")


On Fri, 2021-08-06 at 03:40 -0700, Heinz-Werner Priess wrote:

Helpdesk

unread,
Aug 6, 2021, 6:50:16 AM8/6/21
to Heinz-Werner Priess, R-inla discussion group

you simulate with shape = -0.229... and the INLA implementation assumes
shape > 0, so this can be the reason.

On Fri, 2021-08-06 at 03:40 -0700, Heinz-Werner Priess wrote:

Heinz-Werner Priess

unread,
Aug 6, 2021, 6:58:39 AM8/6/21
to R-inla discussion group
This sounds like a plausible reason. Using shape > 0 for random-data I have no problems to recover my parameters in the INLA-fit.
However, my original data follows a Gompertz-distribution with negative shape. Is there any possibility to allow negative shapes in the INLA-fit?

Helpdesk

unread,
Aug 6, 2021, 12:02:29 PM8/6/21
to Heinz-Werner Priess, R-inla discussion group

I'll have to check what implications this might have

Heinz-Werner Priess

unread,
Aug 9, 2021, 4:23:24 AM8/9/21
to R-inla discussion group
The possibility of negative shapes would be great! When analyzing tooth data and the data follows a Gompertz-distribution, the shape is always negative.
In the following example you can see the time to the first intervention per person. The black line is the Kaplan-Meier-curve, the red line is the fit with a Gompertz-distribution (flexsurv). The dataset contains 7e6 observations:
complete_dataset.png
Reply all
Reply to author
Forward
0 new messages