Using INLA in the presence of missing covariates

Mahbubeh Parsaeian

May 4, 2014, 1:03:33 AM
to r-inla-disc...@googlegroups.com

Dear all

I have a data set in which both the response variable and some of the covariates have missing values. As I read in the FAQ, INLA treats a missing covariate value as 0 (If x[i] = NA this means that x[i] is not part of the linear predictor for y[i]. For fixed effects, this is equivalent to x[i]=0.). The FAQ also mentions that, with missing covariates, one can formulate a joint model for the data and the covariates, but it does not explain how. I am a beginner R user, so I would like to know whether there is any way to deal with these missing values. I should mention that the number of missing values in these covariates is small.
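
To make the FAQ statement concrete, here is a small sketch for a model with a single fixed effect. The symbols \eta_i, \beta_0 and \beta are notation assumed here for illustration; they are not taken from the FAQ.

```latex
\eta_i = \beta_0 + \beta x_i
\qquad\Longrightarrow\qquad
\eta_i = \beta_0 \quad \text{when } x_i = \texttt{NA}
```

This is the same linear predictor that x_i = 0 would give, so the missing value is dropped from the predictor rather than imputed.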

Thanks in advance for answering this question.

Best

Mahboubeh 

Håvard Rue

May 4, 2014, 4:53:05 AM
to Mahbubeh Parsaeian, r-inla-disc...@googlegroups.com
Hi,

there is no standard way to define a model for the missing covariates,
this is part of the modelling itself. If you have any idea of how to
formulate a model for the missing covariates, we can discuss how to
implement it jointly with the basic model.

Best
H

--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Mahbubeh Parsaeian

May 4, 2014, 6:16:25 AM
to r-inla-disc...@googlegroups.com, Mahbubeh Parsaeian, hr...@r-inla.org

Thank you very much for your prompt and timely response.

I have three complete covariates (G, M, UR) and three incomplete covariates (H, U, P) with 3 (H), 14 (U) and 33 (P) missing values, respectively, out of 186 observations.

I see two options. First, drop the incomplete covariates and fit the model with the complete covariates only (the predictive power of this model is not good enough).

Second, use the additional information in the incomplete covariates to improve the prediction model.

Suppose I treat each incomplete covariate as a response. I can predict H and P reasonably well from the three complete covariates.

I don't think it is a good idea to predict the missing values of one covariate and then use the predicted values in the final model. I think it is better to find a joint model that deals with the missing values of X and Y simultaneously, as I believe the Amelia package does.

Thanks again. 

Best,

Mahboubeh

Finn Lindgren

May 4, 2014, 6:34:47 AM
to Mahbubeh Parsaeian, r-inla-disc...@googlegroups.com, hr...@r-inla.org
Hi,

I'll pull up your last paragraph to respond to:

> I don't think it is a good idea to predict the missing values of one
> covariate and then use the predicted values in the final model. I think
> it is better to find a joint model that deals with the missing values of
> X and Y simultaneously, as I believe the Amelia package does.

Yes, the CRAN Amelia package does bootstrap imputation, which is what
one needs to do in the frequentist setting, and the method is based on
the same idea needed in the Bayesian setting: construct a model for
the missing covariates and use that for inference, either via sampling
(bootstrap or Bayesian Monte Carlo) or direct calculation (INLA).

Unfortunately the Amelia documentation seems focused on the imputation
algorithm and not on the underlying models, so isn't very helpful.
It's the model for your covariate that you need to have to do this in INLA.
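
As one illustration of what "a model for the covariate" could look like (an assumption made here for concreteness, not something fixed in this thread), a continuous incomplete covariate such as H might be given a Gaussian regression on the complete covariates G, M and UR, alongside a Gaussian response model that uses H. All coefficient and variance symbols below are hypothetical notation.

```latex
\begin{aligned}
H_i &= \alpha_0 + \alpha_1 G_i + \alpha_2 M_i + \alpha_3 \mathit{UR}_i + \varepsilon_i,
  & \varepsilon_i &\sim \mathcal{N}(0, \sigma_H^2), \\
Y_i &= \beta_0 + \beta_H H_i + \beta_G G_i + \beta_M M_i + \beta_{\mathit{UR}} \mathit{UR}_i + e_i,
  & e_i &\sim \mathcal{N}(0, \sigma_Y^2).
\end{aligned}
```

For rows where H_i is missing, H_i is then an unknown quantity with the first line acting as its prior, rather than a fixed plug-in value; rows where H_i is observed inform the \alpha coefficients and \sigma_H^2.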

Finn

Mahbubeh Parsaeian

May 4, 2014, 8:58:56 AM
to r-inla-disc...@googlegroups.com, Mahbubeh Parsaeian, hr...@r-inla.org

Dear Finn

I’m not sure I understood your solution correctly. You mean:

1- Construct a model for the missing covariate. For example, predict H (the missing covariate) via a spatial model (which explains a high percentage of the variation in H and captures its significant spatial correlation).

2- Use the predicted results (the fitted values and their distribution) to draw samples of the covariate H.

3- Fit the final model based on the response variable (Y) and the simulated values of the covariate (H)?

Sorry, as mentioned, I am a beginner. Please let me know whether I have understood your solution correctly.

Thanks a lot.

Mahboubeh

Finn Lindgren

May 4, 2014, 10:04:53 AM
to Mahbubeh Parsaeian, r-inla-disc...@googlegroups.com, hr...@r-inla.org
On 4 May 2014, at 13:58, Mahbubeh Parsaeian <mparsa...@gmail.com> wrote:

> I’m not sure I understood your solution correctly. You mean:
>
> 1- Construct a model for the missing covariate. For example, predict H (the missing covariate) via a spatial model (which explains a high percentage of the variation in H and captures its significant spatial correlation).
>
> 2- Use the predicted results (the fitted values and their distribution) to draw samples of the covariate H.
>
> 3- Fit the final model based on the response variable (Y) and the simulated values of the covariate (H)?

That is one approach, yes, but there is no real need to actually _sample_ from the covariate model, and it can be very computationally costly, since the final model needs to be estimated once per resampled covariate combination (this is what you'd do when using the Amelia package, I believe).  Instead, one can do a joint estimation of the covariate model and the final model, entirely without sampling. (This typically works when the covariate is continuous; for discrete covariates it can be more difficult to avoid sampling, but that depends on the precise structure of the particular model.)
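
The resampling route in steps 1 to 3 can be sketched as follows. This is a minimal illustration in plain Python on simulated data (not the data from this thread): `g` stands in for a complete covariate, `h` for an incomplete one, and the helper `ols` and all numeric values are made up. It is deliberately simplified: the imputation draws ignore the uncertainty in the covariate-model parameters and do not condition on `y`, both of which a proper multiple-imputation procedure would handle. The results are pooled with Rubin's rules.

```python
import random
import statistics


def ols(x, y):
    """Simple linear regression y = a + b*x.

    Returns (intercept, slope, residual variance, variance of the slope).
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    return a, b, resid_var, resid_var / sxx


# Simulated data (hypothetical): 186 rows, 14 missing values in h.
rng = random.Random(1)
n = 186
g = [rng.gauss(0, 1) for _ in range(n)]
h_true = [1.0 + 0.8 * gi + rng.gauss(0, 0.5) for gi in g]
y = [2.0 + 1.5 * hi + rng.gauss(0, 1.0) for hi in h_true]
missing = set(rng.sample(range(n), 14))
h = [None if i in missing else hi for i, hi in enumerate(h_true)]

# Step 1: fit the covariate model h ~ g on the rows where h is observed.
obs = [i for i in range(n) if h[i] is not None]
a_h, b_h, s2_h, _ = ols([g[i] for i in obs], [h[i] for i in obs])

# Steps 2 and 3: draw the missing h values, refit y ~ h, repeat M times.
M = 20
slopes, variances = [], []
for _ in range(M):
    h_imp = [
        hi if hi is not None else rng.gauss(a_h + b_h * g[i], s2_h ** 0.5)
        for i, hi in enumerate(h)
    ]
    _, b, _, var_b = ols(h_imp, y)
    slopes.append(b)
    variances.append(var_b)

# Rubin's rules: combine within- and between-imputation variance.
pooled = statistics.fmean(slopes)
within = statistics.fmean(variances)
between = statistics.variance(slopes)
total_var = within + (1 + 1 / M) * between

print(f"pooled slope for h: {pooled:.3f} (std. error {total_var ** 0.5:.3f})")
```

Note that the final model is refitted M times, which is the cost mentioned above; with a model that takes minutes to fit, this multiplies quickly.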

In the Bayesian setting, missing values are nothing special, as long as there is a model for "everything that is unobserved and affects what is observed". So when the joint model for _everything_ can be written in a form that inla supports, there is no need to "impute" or resample.
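
A toy illustration of why sampling can be avoided for a continuous covariate: if a missing covariate x has a Gaussian model, x ~ N(m_x, v_x), and the response is Gaussian given x, y | x ~ N(b0 + b1*x, v_y), then x integrates out in closed form and y ~ N(b0 + b1*m_x, v_y + b1^2 * v_x). The sketch below checks that closed form against brute-force Monte Carlo. All parameter values are made up, and this is not what inla does internally (it relies on Laplace approximations for more general models); it only shows the idea of direct calculation in the simplest possible case.

```python
import math
import random


def normal_pdf(v, mean, var):
    """Density of N(mean, var) evaluated at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


# Hypothetical parameter values, chosen only for illustration.
b0, b1, v_y = 2.0, 1.5, 1.0   # response model: y | x ~ N(b0 + b1*x, v_y)
m_x, v_x = 0.4, 0.25          # covariate model for the missing x
y_obs = 3.1                   # observed response in a row where x is missing

# Direct calculation: x is integrated out analytically.
exact = normal_pdf(y_obs, b0 + b1 * m_x, v_y + b1 ** 2 * v_x)

# Sampling: average the conditional density over draws of the missing x.
rng = random.Random(0)
draws = 100_000
mc = sum(
    normal_pdf(y_obs, b0 + b1 * rng.gauss(m_x, math.sqrt(v_x)), v_y)
    for _ in range(draws)
) / draws

print(f"closed form: {exact:.4f}   Monte Carlo: {mc:.4f}")
```

The two numbers agree up to Monte Carlo error, but the closed form needs a single evaluation while the sampled version needs one per draw.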

Finn





Mahbubeh Parsaeian

May 4, 2014, 1:14:53 PM
to r-inla-disc...@googlegroups.com, Mahbubeh Parsaeian, hr...@r-inla.org

Thank you very much.
My covariates are all continuous. I think I should find a way to jointly estimate the covariate model and the response model in a form that inla supports.
I used the tutorials and worked examples to fit a spatial model for the covariate and for the response, and they really helped me to formulate my model. Is there any example or tutorial that shows how to write such a model in inla, that is, how to jointly estimate the parameters of the covariate model and the response model?

Best

Mahboubeh
