How to add censored data processing to an univariate integrated random walk with two harmonics

171 views
Skip to first unread message

Dominique Soudant

unread,
Nov 3, 2023, 5:19:52 PM11/3/23
to R-inla discussion group
Dear all,

I analyse time series data related to the marine environment. I have developed an approach using the Dynamic Linear Model (DLM) and the R dlm package. However, these time series may contain numerous measurements at either the detection or quantification limit, which fluctuate due to changing measurement techniques. I identified the Tobit approach as a way to model these censored data. Allik et al. developed a Tobit-Kalman model that is very close to the DLM formulation. However, I believe that the DLM package does not allow for such censored models. So I turned to INLA, thinking that it would be more flexible and allow me to formulate models that deal with censored data.

I started working with INLA using the book "Dynamic Time Series Models using R-INLA: An Applied Perspective". Today, the INLA model I'm able to produce is very close to the one obtained with the dlm package. However, I'm still struggling to integrate the processing of censored data. My feeling is that it's not necessarily something complicated, but that it requires a good understanding of the Bayesian approach and good skills in writing DLMs in INLA.

Can anyone help me with this?

As a starting point, please find attached a script containing a time series simulation modelled by an integrated random walk with two harmonic seasonality.

Thanks in advance
DS
06_00_TSDLM_01_AddTrigoSeason2H.r

Helpdesk (Haavard Rue)

unread,
Nov 4, 2023, 4:29:37 AM11/4/23
to Dominique Soudant, R-inla discussion group

Coding data as 'survival data' can you integrate censoring information,
but only in a probalistic sense not determinisitic. you can get that the
linear predictor is very likely > 2, say, but you cannot impose 'must be
> 2'.

like this

n <- 1000
x <- rep(NA, n)
y <- rep(NA, n)
event <- rep(NA, n)
i <- 0
low <- 1
high <- 2
while(i < n) {
xx <- rnorm(1, sd=0.2)
eta <- 1 + xx
yy <- rexp(1, rate = exp(eta))
if (yy > low) {
i <- i+1
x[i] <- xx
if (yy < high) {
y[i] <- yy
event[i] <- 1
} else {
y[i] <- high
event[i] <- 0
}
}
}

Y <- inla.surv(y, event = event, truncation = rep(low,n))

r <- inla(Y ~ 1 + x,
data = list(Y=Y, x=x),
family = "exponentialsurv",
verbose = TRUE,
control.fixed=list(prec.intercept=1, prec=1))
summary(r)




would this help?
> --
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/r-inla-discussion-group/a89e8b58-98eb-4c1e-8daf-45ce6761014dn%40googlegroups.com
> .

--
Håvard Rue
he...@r-inla.org

Dominique Soudant

unread,
Nov 6, 2023, 4:15:39 AM11/6/23
to R-inla discussion group
Dear Håvard,
Thank you for your reply. I will work on it, especially in the context of time series, observed data and latent processes.
Kind regards.
DS

Dominique Soudant

unread,
Nov 7, 2023, 5:17:17 AM11/7/23
to R-inla discussion group
Dear Håvard,

Thank you for all your help and support. It seems that I have managed to integrate a censoring process into a first-order dynamic linear time series model (TSDLM), perhaps in a quick and dirty way (cf. included script with one model with censoring process and another one without). In fact, I have a few requests, questions and thoughts: :
  • I need to fully understand what is done in this handling of censored data, equations and references : do you have any suggestions on this particular subject ? I already read the inla.doc().
  • the estimation of the model integrating the censoring process give this message :
            *** max_correction = 25.01 >= 25.00, so 'vb.correction' is aborted
            *** Please (re-)consider your model, priors, confounding, etc.
            *** You can change the emergency value (current value=25.00) by
            ***     'control.inla=list(control.vb=list(emergency=...))'
    I may change the ermergency value, but actually I' don't understand why I obtain this message, what does it meant, and what are the different corrective options and their meaning ;
  • running the script several times, I noticed that the second model (i.e. without the censoring process) does not always give the same result, including one where the mean level goes through all the observations ; this multiplication of results is very disturbing and I do not understant it ;
  • I found the lognormal.surv family, I wonder if I'll need any more family : today I'm using log, logit and square root transformation, but I think the data used with the last two transformations are actually not censored;
Thanks in advance
Kind regards
DS
TSDLM 1 censored.r

Helpdesk (Haavard Rue)

unread,
Nov 7, 2023, 11:46:06 AM11/7/23
to Dominique Soudant, R-inla discussion group
On Tue, 2023-11-07 at 02:17 -0800, Dominique Soudant wrote:
> Dear Håvard,
>
> Thank you for all your help and support. It seems that I have managed
> to integrate a censoring process into a first-order dynamic linear
> time series model (TSDLM), perhaps in a quick and dirty way (cf.
> included script with one model with censoring process and another one
> without). In fact, I have a few requests, questions and thoughts: :

>  * I need to fully understand what is done in this handling of
> censored data, equations and references : do you have any suggestions
> on this particular subject ? I already read the inla.doc().

this follows from std survial analysis on how cencoring is done. Like if
the observation is right censored, then the likelihood is 1-F(time),
where 'time' is the cencoring timd and F() is the CDF



>  * the estimation of the model integrating the censoring process give
> this message :
>            *** max_correction = 25.01 >= 25.00, so 'vb.correction' is
> aborted
>            *** Please (re-)consider your model, priors, confounding,
> etc.
>            *** You can change the emergency value (current
> value=25.00) by
>            ***     'control.inla=list(control.vb=list(emergency=...))'
>    I may change the ermergency value, but actually I' don't understand
> why I obtain this message, what does it meant, and what are the
> different corrective options and their meaning ;

This is a warning produced by the variational correction for the mean,
and that correction is to high to be meaningful hence the correction is
skipped. this usually happens when the model is not well identified or
weakly identified, and you are simply asking to much from the data. fix:
simplify or strengthen priors.



>  * running the script several times, I noticed that the second model
> (i.e. without the censoring process) does not always give the same
> result, including one where the mean level goes through all the
> observations ; this multiplication of results is very disturbing and I
> do not understant it ;

maybe you can add an offset such that the mean value is about 0, which
will help. I guess the initial value for the intercept can be off and
then this can go wrong (as also the variance is estimated)...

>  * I found the lognormal.surv family, I wonder if I'll need any more
> family : today I'm using log, logit and square root transformation,
> but I think the data used with the last two transformations are
> actually not censored;

that sounds easier to me...

Dominique Soudant

unread,
Nov 9, 2023, 8:53:53 AM11/9/23
to R-inla discussion group
Dear Håvard,

thank you very much for your answers :
  1. I think we're talking about something like this: https://en.wikipedia.org/wiki/Censoring_(statistics)#Likelihood; I haven't worked on it since university, 30 years ago, but I'll get back to it;
  2. I can surely strengthen priors ;
  3. the model is a RW1 : formula.rw1 <- y.rw1 ~ f(id.w, model = "rw1",constr = FALSE) - 1
Thank you again.
Kind regards
DS

Helpdesk (Haavard Rue)

unread,
Nov 12, 2023, 4:37:01 AM11/12/23
to Dominique Soudant, R-inla discussion group

add to the rw model, scale.model=TRUE



On Thu, 2023-11-09 at 05:53 -0800, Dominique Soudant wrote:
> Dear Håvard,
>
> thank you very much for your answers :
>    1. I think we're talking about something like this:
> https://en.wikipedia.org/wiki/Censoring_(statistics)#Likelihood; I
> haven't worked on it since university, 30 years ago, but I'll get back
> to it;
>    2. I can surely strengthen priors ;
>    3. the model is a RW1 : formula.rw1 <- y.rw1 ~ f(id.w, model =

Dominique Soudant

unread,
Nov 17, 2023, 10:16:05 AM11/17/23
to R-inla discussion group
Dear Håvard,

Thank you for your suggestion. However, one of my colleagues found the Easter egg. The formula code was written as follows :
formula.rw1 <- log(yLogNCen) ~ f(id.w

                                ,model = "rw1"
                                ,constr = FALSE
                                # ,scale.model=TRUE
                                )
                                -1

which implies ( at least in an interactive R session) that "-1" is not included in the formula. If I write :
formula.rw1 <- log(yLogNCen) ~ -1 + f(id.w

                                ,model = "rw1"
                                ,constr = FALSE
                                # ,scale.model=TRUE
                                )

everything works fine. I made the same mistake for the censored model, but after correcting it, I no longer get any warnings. I have attached the new code.

Thanks again
Kind regards.
DS
TSDLM 1 censored.r

Finn Lindgren

unread,
Nov 17, 2023, 10:49:10 AM11/17/23
to Dominique Soudant, R-inla discussion group
R terminates an expression at a line break if the expression is complete, so the reason the -1 didn’t take effect in the formula is because the “-“ was on a new line, making the formula terminate as the formula expression before it was a valid complete R statement, and the -1 became a separate statement. This would happen also in an interactive R session.
Finn
Reply all
Reply to author
Forward
0 new messages