[DLM with INLA] Interventions

54 views
Skip to first unread message

Dominique Soudant

unread,
Jul 1, 2024, 10:08:23 AMJul 1
to R-inla discussion group
Hello,
In time series analysis, let's define an "intervention" as a modification of the model parameters to take into account of exogenous information. I'm considering two types of "intervention" here: 1) dealing with an exceptional value (i.e. neither false nor dubious) also known as an outlier 2) dealing with a level change. With the dlm package, I was doing an "intervention" for an outlier by allowing the observation variance to be greater than the routine observation variance (i.e. V_outlier = (1+exp(delta))*V_routine, delta estimated by maximum likelihood). Similarly, for a level change, I was allowing the evolution variance of the mean level to be greater than the routine evolution variance of the mean level.

In INLA, let's say that Obs are the observations.

Outlier
Let's say that the outlier is the 200th observation (RW1 process). The way I found to add an intervention for such outlier is as follow :
ObsMat <- matrix(cbind(Obs,NA),n,2)
ObsMat[200,1] <- NA
ObsMat[200,2] <- Obs[200]

RW1 = ObsMat ~ 0 + f(i, model="rw1",constr=FALSE)

# Estimation
model.RW1 = inla(RW1,
  data = data.frame(i),
  family = rep("gaussian", 2), # Two gaussian noise 1) regular observations 2) outlier
  control.family = list(
    list()
    ,list()    
  ),
  control.predictor = list(compute = TRUE)
)
This induces that an observation variance is specifically estimated for the outlier and my preliminary tests show that it works rather well.

Question : is there a way to specify that this outlier variance has to be greater than the "routine" observation variance ?

Level shifts
Let's say that in addition to the previous outlier, two interventions for level shifts have  to be specified at the 400th and the 500th observations.
k <- rep(0,n)
k[400:length(Obs)] <- 1
l <- rep(0,n)
l[500:length(Obs)] <- 1
RW1 = ObsMat ~ 0 + f(i, model="rw1",constr=FALSE) + k + l

# Estimation
model.RW1 = inla(RW1,
  data = data.frame(i,k,l),
  family = rep("gaussian", 2), # Two gaussian noise 1) regular observations 2) outlier
  control.family = list(
    list()
    ,list()    
  ),
  control.predictor = list(compute = TRUE)
)
This works pretty well too. But it's not the same idea as enlarging the evolution variance of the mean level.

Question : I can live with this way of handling level change interventions, but, alternatively, I wonder if it's possible to specify a higher evolution variance of the mean level for this type of intervention ?

Thank you for all the job done around INLA.

DS

Finn Lindgren

unread,
Jul 2, 2024, 8:48:18 AMJul 2
to R-inla discussion group
Hi,

more general versions of such models can be implemented with rgeneric
and generic, but in your particular example, there are some simpler
alternatives:

Since you treat the "outliers" as having known location in the data,
you could introduce a random effect component that is added only to
those observations, in effect giving them variance variance_obs +
extra_variance_outlier, instead of what you did, which was to have two
separate models. By adding the random effect only to the outliers, you
will be guaranteed that their model variance is larger than for
non-outliers. This can be accomplished by
+f(additional_variability, model="iid")
where additional_variability has NA for all non-outliers, and indices
1, 2, 3, 4, ..., number_of_outliers, for the outliers.

Similarly, to get increased innovation variance after a specific time
point, you could add another random walk component there, constrained
to start at zero at that time, and applied only after that time (by
having NA in the input vector for earlier time points.
The combined effect would be that of an increased innovation variance
from that time (drawback of that approach is that it doesn't allow you
to estimate whether the change is a decrease or an increase).
Something like this:
+ f(extra_walk,model="rw1",constr=FALSE,extraconstr=list(A=matrix(rep(c(0,1,0),c(start_time-1,1,total_time-start_time)),
e = 0))
> --
> You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/853cb581-c2f8-43d1-b938-b00edd2da360n%40googlegroups.com.



--
Finn Lindgren
email: finn.l...@gmail.com

Dominique Soudant

unread,
Jul 3, 2024, 8:52:47 AMJul 3
to R-inla discussion group
Hi,
Thank you very much for your reply. I'll get back to you as soon as possible.
Regards,
DS

Dominique Soudant

unread,
Jul 3, 2024, 11:09:56 AMJul 3
to R-inla discussion group
Hi,
Let's consider the following data simulation :
# SSsimple is used to simulate the time series.
if(!require("SSsimple")){
  install.packages("SSsimple",repos="http://cran.irsn.fr")
  require("SSsimple")
}

# The model is :
# Y_t = F*Theta_t + epsilon_t, epsilon_t ~ N(0,V)
# Theta_t = GG*Theta_{t-1} + omega_t, omega_t ~ N(0,W)

# F and GG are structurally derived from the model described above.
# The model described above and the package dlm were used to estimate V, W, and
# Beta0 from actual data.

#set.seed(2)

# Observation "matrix"
F <- matrix(c(1),1,1)

# Evolution matrix
GG <- matrix(c(1)
,1
,1
,byrow=TRUE)

# Observation variance
V <- 0.2721225

# Evolution variance matrix
W <- matrix(c(10**-6)
,1
,1
,byrow=TRUE)

# Time 0 state "matrix"
Beta0 <- c(0.4250634)

# Simulating the time series :
sim <- SS.sim(F=GG
,Q=W
,H=F
,R=V
,600
,Beta0
)

Obs <- sim$Z

# adding an outlier
Obs[round(length(Obs)/3,0)] <- 10

# Adding level shift
Obs[(2*round(length(Obs)/3,0)):length(Obs)] <- Obs[(2*round(length(Obs)/3,0)):length(Obs)]+2
Obs[500:length(Obs)] <- Obs[500:length(Obs)]-1
If I'm right, the INLA model with the outlier as an iid noise is :
# INLA estimation of the model _________________________________________________
# Formula ______________________________________________________________________
n=length(Obs)
i <- 1:n

# outlier
j <- rep(NA,n)
j[round(length(Obs)/3,0)] <- 1

k <- rep(0,n)
k[(2*round(length(Obs)/3,0)):length(Obs)] <- 1

l <- rep(0,n)
l[500:length(Obs)] <- 1

RW1 = Obs ~ 0 + f(i, model="rw1",constr=FALSE)  +
              f(j,model="iid") +

              k +
              l

# Estimation
# loading the INLA library
# ------------------------
require(INLA)

model.RW1 = inla(RW1,
  data = data.frame(i,j,k,l),
  family = rep("gaussian", 1),
  control.family = list(
    list()

  ),
  control.predictor = list(compute = TRUE)
)
But the behaviour is not as expected : 
rm("r")
r <- model.RW1

SIM <- Obs
INLA <- r$summary.fitted.values[1:n,1]
LL <- r$summary.fitted.values[1:n,3]
UL <- r$summary.fitted.values[1:n,5]
YRange <- range(Obs,INLA,LL,UL,na.rm=TRUE)
plot(SIM,col="lightgray",ylim=YRange)
lines(INLA)
lines(LL,lty="dashed")
lines(UL,lty="dashed")
abline(v=seq(1,length(Obs),by=52),col="lightgray")
The fitted values jump up to the outlier instead of ignoring it. I think I'd already tried that way and my conclusion was that the intervention for the outlier has to be on the observation variance, not on the unobserved component.

Regards
DS

Finn Lindgren

unread,
Jul 3, 2024, 1:36:29 PMJul 3
to Dominique Soudant, R-inla discussion group
Comment on your final sentence:

The "fitted values" predictor includes the random effects, as it
includes _all_ the latent variables in the model. Since the outlier
effect is explicitly modelled by a random effect component, it will
also be part of the "fitted values" predictor, so it _should_ "jump
up" to the outlier.
Since you're using the extra component to "capture" outliers, you're
probably looking for the posterior prediction that does _not_ include
that in the expression. This is probably most easily done by posterior
sampling; that allows you to compute any functional of the model
components you like.

Put differently:
The extra component does not explicitly need to be part of the
observation variance if you spell out the _whole_ model in
mathematical form, i.e. including an extra random term for the
observation noise. It will be something like
all_except_noise + regular_observation_noise + outlier_noise
The "fitted values" you looked at are "all_except_noise +
outlier_noise", but you (in your specific model case) were only
interested in all_except_noise. The solution is to make sure you're
looking at all_except_noise, and not the "fitted values", since the
"fitted values" are not what your model interpretation is interested
in. What you want to do is simply different from the normal type of
models, so the default "fitted values" isn't the right thing for you.
Solve that by looking at the right thing (e.g. via posterior sampling
that lets you compute anything you want).
"fitted values" is only the right thing when you want "the sum of
_everything_ in the latent predictor model". In your case the extra
outlier noise component is specifically there because you're
interested in "the sum of everything _except_ the additional noise
induced by outliers", so "fitted values" isn't what you need.

Finn
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/a5089232-46e5-4c80-8747-f139a5561e7fn%40googlegroups.com.

Dominique Soudant

unread,
Jul 4, 2024, 3:16:29 AMJul 4
to R-inla discussion group
Ok I get it, thank you so much for taking the time to reply.
Regards
DS
Reply all
Reply to author
Forward
0 new messages