factor scores with missing variables


Michele

Sep 21, 2017, 7:57:38 AM
to lavaan
Hello everyone! 

I have a simple question about factor scores in lavaan.

After performing a CFA on a dataset, I would like to compute the predicted factor scores (using either the regression method or the Bartlett method) for each individual.
I can easily do so using the predict() or lavPredict() functions.
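A minimal sketch of the two scoring methods, assuming the HolzingerSwineford1939 example data that ships with lavaan and a hypothetical one-factor model `cog` on x1 to x6 (neither comes from the thread). For a single factor, both the regression and the Bartlett weights are positive multiples of Theta^{-1} lambda, so the two sets of scores should differ only in scale and location.

```r
library(lavaan)

# Assumed example data: complete Holzinger-Swineford indicators x1 to x6,
# with a hypothetical one-factor model mirroring the `cog` model in the thread.
HS  <- HolzingerSwineford1939[, paste0("x", 1:6)]
fit <- cfa(' cog =~ x1 + x2 + x3 + x4 + x5 + x6 ', data = HS)

fs_reg  <- lavPredict(fit, method = "regression")  # regression scores
fs_bart <- lavPredict(fit, method = "Bartlett")    # Bartlett scores

# One row per individual and one column per latent variable.
dim(fs_reg)

# With a single factor both methods weight the items proportionally to
# Theta^{-1} lambda, so the two score vectors are perfectly correlated.
cor(fs_reg[, "cog"], fs_bart[, "cog"])
```

With more than one factor the two methods generally no longer agree up to a rescaling, so the choice between them matters more.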

I've read on this forum that this can only be done for complete observations; however, when I use this function with my dataset (which includes some incomplete observations), I also get factor scores for the observations with missing data. Am I missing something or doing something wrong here? Has this function been updated to deal with missing observations? If so, what formula is used to compute the factor scores in this case?

Thank you! 

Michele 

Yves Rosseel

Nov 8, 2017, 8:49:32 AM
to Michele Giannola, lav...@googlegroups.com
The issue was the combination of missing data and newdata. This now works
in lavaan dev 0.6-1.1176.

Yves.
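A sketch of the missing data plus newdata case, assuming a lavaan version that includes this fix (the reply mentions dev 0.6-1.1176). The split of HolzingerSwineford1939 into 200 estimation cases and 101 new cases, the one-factor model, and the deleted x3 values are all hypothetical and only illustrate the call. If newdata is honoured, the second result has one row per new case, including the incomplete ones.

```r
library(lavaan)

# Assumed example data: six Holzinger-Swineford indicators, with x3 deleted
# for every 10th case (hypothetical) so both subsets contain incomplete rows.
HS <- HolzingerSwineford1939[, paste0("x", 1:6)]
HS[seq(3, 301, by = 10), "x3"] <- NA

train     <- HS[1:200, ]     # cases used to estimate the model
new_cases <- HS[201:301, ]   # cases scored with the estimated parameters

fit <- cfa(' cog =~ x1 + x2 + x3 + x4 + x5 + x6 ', data = train,
           missing = "ML")

fs_train <- lavPredict(fit)                        # estimation sample
fs_new   <- lavPredict(fit, newdata = new_cases)   # new cases

# fs_train should have 200 rows and fs_new 101 rows, one per new case.
dim(fs_train)
dim(fs_new)
```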

On 09/21/2017 03:06 PM, Michele Giannola wrote:
> Hi Yves,
> Thank you so much for your clarification!
>
> I am having another (probably very silly) issue with predict. I am
> estimating a model on a dataset and would like to use the model's
> parameters to perform the prediction on another (larger) dataset.
> However, for some reason, when I use the "newdata" argument it seems to
> be completely ignored, and the factor scores are computed on the original
> dataset.
>
> I am not sure why this is the case, and I would greatly appreciate
> any help from your side.
>
> I am copying the code below and attaching the datasets.
>
> Thank you again !
>
> Best,
> Michele
>
> library(lavaan)
>
> # Dataset used to estimate the model:
> data_factor_model <- read.csv("dataset2.csv", header = TRUE)
> # Dataset used to predict factor scores:
> data_predict <- read.csv("dataset1.csv", header = TRUE)
>
> # 1. Estimate the model on standardized indicators
> #    (as.data.frame() names the unnamed columns V1, ..., V6):
> input_FM <- as.data.frame(scale(cbind(data_factor_model$x6, data_factor_model$x1,
>                                       data_factor_model$x2, data_factor_model$x3,
>                                       data_factor_model$x4, data_factor_model$x5),
>                                 center = TRUE, scale = TRUE))
>
> my.model <- ' cog =~ V1 + V2 + V3 + V4 + V5 + V6 '
>
> fit <- cfa(my.model, data = input_FM, meanstructure = TRUE, missing = "ML")
> summary(fit, fit.measures = TRUE)
>
> # 2. Predict factor scores for the second dataset:
> input_S <- as.data.frame(scale(cbind(data_predict$x6, data_predict$x1,
>                                      data_predict$x2, data_predict$x3,
>                                      data_predict$x4, data_predict$x5),
>                                center = TRUE, scale = TRUE))
>
> cog <- lavPredict(fit, newdata = input_S, method = "Bartlett")
>
>
>
>
> 2017-09-21 13:04 GMT+01:00 Yves Rosseel <yros...@gmail.com>:
>
> Has this function been updated to deal with missing observations?
>
>
> Yes. It can now handle missing observations.
>
> If so, what formula is used to compute the factor scores in this
> case?
>
>
> Single imputation. Under the model, we first create a single imputed
> dataset (using the conditional expectation of the missing values,
> given the model, and given the observed data). Then, we compute
> factor scores as if all the data is observed.
>
> Yves.
>
>
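The single-imputation procedure described in the quoted reply can be sketched by hand. This is an illustration under stated assumptions, not lavaan's internal code: it assumes a recent lavaan, the HolzingerSwineford1939 data with some values deleted (hypothetical), and a hypothetical two-factor model. Missing values are replaced by E[y_m | y_o] under the fitted model, and regression factor scores are then computed as if the data were complete.

```r
library(lavaan)

# Assumed example data: six Holzinger-Swineford indicators, with x2 and x5
# deleted for some (hypothetical) cases so that several rows are incomplete.
HS <- HolzingerSwineford1939[, paste0("x", 1:6)]
HS[seq(1, 301, by = 12), "x2"] <- NA
HS[seq(5, 301, by = 15), "x5"] <- NA

model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6 '
fit <- cfa(model, data = HS, missing = "ML")

ov      <- lavNames(fit, "ov")               # observed variables, model order
est     <- lavInspect(fit, "est")            # lambda, theta, psi, nu, alpha
Lambda  <- unclass(est$lambda)[ov, , drop = FALSE]
Psi     <- unclass(est$psi)
alpha   <- as.vector(est$alpha)
implied <- fitted(fit)                       # model-implied cov and mean
Sigma   <- unclass(implied$cov)[ov, ov]
mu      <- unclass(implied$mean)[ov]

# Step 1: single imputation under the model. Each missing value is replaced
# by its conditional expectation given the observed values in the same row:
#   E[y_m | y_o] = mu_m + Sigma_mo Sigma_oo^{-1} (y_o - mu_o)
impute_row <- function(y) {
  m <- is.na(y)
  if (any(m)) {
    o <- !m
    y[m] <- mu[m] + Sigma[m, o, drop = FALSE] %*%
      solve(Sigma[o, o, drop = FALSE], y[o] - mu[o])
  }
  y
}
Y     <- as.matrix(HS[, ov])
Y_imp <- t(apply(Y, 1, impute_row))

# Step 2: regression factor scores on the completed data, as if fully observed:
#   eta_hat = alpha + Psi Lambda' Sigma^{-1} (y - mu)
W <- Psi %*% t(Lambda) %*% solve(Sigma)      # (factors x items) weight matrix
fs_manual <- sweep(sweep(Y_imp, 2, mu) %*% t(W), 2, alpha, "+")

# Compare with lavaan's own regression scores for the same fit.
fs_lav <- lavPredict(fit, method = "regression")
max(abs(fs_manual - unclass(fs_lav)))
```

Because regression scores are linear in y, plugging in the conditional expectations gives the same result as computing E[eta | y_o] from the observed values alone, so the final difference should be close to zero. The comparison is made for the regression method only.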

--
Yves Rosseel -- http://www.da.ugent.be
Department of Data Analysis, Ghent University
http://lavaan.org