How do I set the variance? lavaan ERROR: some variables have no values (only missings) or no variance.


Camille Williams

Apr 6, 2020, 3:33:43 AM
to lavaan
Hi, 

I want to compare the factor scores from a CFA on the complete dataset with the factor scores from a CFA in which one variable (here, x1) has no data points at all. For the latter CFA, I fix the loadings, variances, and intercepts at the values estimated from the complete dataset (see code below).

However, when I attempt to run the CFA model on the data with the empty variable (fit_fixed), I get this error: Error in lav_data_full(data = data, group = group, cluster = cluster, : lavaan ERROR: some variables have no values (only missings) or no variance

I evidently do not have any data points for x1, so how can I specify its variance?

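# Data
library(lavaan)      # cfa(), lavPredict(), HolzingerSwineford1939
library(data.table)  # setDT(), used below

HolzingerSwineford1939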

# 1. CFA on participants who completed all cognitive tests
 
G_factor_model <- ' G  =~ NA*x1 + x4 + x5 + x6 + x9
G ~~ 1*G
'
fit <- cfa(G_factor_model, data=HolzingerSwineford1939, meanstructure=TRUE)
parameterestimates(fit)
FactorScores <- lavPredict(fit, method = "ml")

# 2. CFA on participants who completed all cognitive tests except x1

HolzingerSwineford1939_no_x1 <- setDT(as.data.frame(HolzingerSwineford1939[, -7]))
HolzingerSwineford1939_no_x1$x1 <- NA #blank column

G_factor_model_fixed <- ' G  =~ 0.480*x1 + 0.990*x4 + 1.102*x5 + 0.913*x6 + 0.276*x9
# fix variance
G ~~ 1*G
x1 ~~  1.128*x1
# fix intercepts
x1 ~ 4.936 * 1
x4 ~ 3.061 * 1
x5 ~ 4.341* 1
x6 ~ 2.186 * 1
x9 ~ 5.374 * 1
'
fit_fixed <- cfa(G_factor_model_fixed, data=HolzingerSwineford1939_no_x1, meanstructure=TRUE, missing = "ml")

FactorScores_fixed <- lavPredict(fit_fixed, method = "ml")

# Compare factor scores
cor(FactorScores, FactorScores_fixed)



Nickname

Apr 7, 2020, 1:51:42 PM
to lavaan
Camille,
  I could be missing something, but I think that your question suggests Ken Bollen's definition of a latent variable as any variable in the model that is absent from the data.  Specifically, if your second data set has no x1 values, then it seems to me that if you want to retain x1 in the model, you need to model it as a latent variable.  You can specify a latent variable with no indicators using "x1 =~ 0" as shown in point 7 of the model.syntax help file.

  However, it is not obvious to me at the moment why you could not simply omit x1 from the model fit to the data with no x1 values.  I am not sure how a variable with no data impacts factor score estimation.  My guess is that the factor score estimation will ignore x1 either way because it is not part of the lambda matrix either way.  Only observed variables can be dependent variables in the lambda matrix.
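One way to look at this guess numerically is sketched below. It is only an illustration, not a definitive recipe: it takes the estimates from the full model, drops everything belonging to x1, and computes Bartlett-type scores by hand from the four remaining indicators. The object names (est, keep, scores_no_x1_fixed) are made up for this example, and the hand formula assumes a one-factor model with a diagonal residual covariance matrix.

```r
library(lavaan)

dt <- HolzingerSwineford1939
model <- 'G =~ NA*x1 + x4 + x5 + x6 + x9
          G ~~ 1*G'
fit <- cfa(model, data = dt, meanstructure = TRUE)

# Estimated model matrices from the full model (lambda, theta, nu, ...)
est  <- lavInspect(fit, "est")
keep <- c("x4", "x5", "x6", "x9")        # indicators that remain without x1

lambda <- est$lambda[keep, "G"]          # loadings, held at full-model values
theta  <- diag(est$theta)[keep]          # residual variances
nu     <- est$nu[keep, 1]                # intercepts

# Bartlett-type score: (lambda' Theta^-1 lambda)^-1 lambda' Theta^-1 (y - nu)
Y <- as.matrix(dt[, keep])
w <- lambda / theta
scores_no_x1_fixed <- as.vector(sweep(Y, 2, nu) %*% w) / sum(w * lambda)

# Scores from the full model, using all five indicators
scores_full <- lavPredict(fit, method = "Bartlett")[, 1]

cor(scores_full, scores_no_x1_fixed)
```

Because the remaining parameters are held at their full-model values, any difference between the two sets of scores comes only from the information x1 contributes, not from re-estimating the model.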

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Camille Williams

Apr 8, 2020, 5:42:06 AM
to lavaan
Hi Keith, 

This message is more of a comment/discussion. 

Update on the code. I shouldn't have to repeat the CFA model as I did previously. Since I want to apply the CFA performed on the data without missing values on data with missing values, I can do the following, which only works if x1 is not empty. (Here I run it with 100 missing values in x1). 

dt <- HolzingerSwineford1939

model <- 'G  =~ NA*x1 + x4 + x5 + x6 + x9
          G ~~ 1*G'

# Perform CFA on full dataset

fit_fiml <- cfa(model, data=dt, missing="fiml")

# Remove some values for variable x1
dtWithNA <- dt
dtWithNA[1:100, "x1"] <- NA_real_
nrow(dt)# 301

# Compute factor scores using data with missing values
FactorScores_missing <- lavPredict(fit_fiml, newdata=dtWithNA, method="ML")

# Compare factor scores
cor(FactorScores, FactorScores_missing)



Based on this article by Estabrook and Neale (2013, link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3773873/): 
"Individual’s factor score was estimated from a model that included that individual’s row of data (with missing values where appropriate) as the entire data set for that model. All factor loadings, residual variances, and manifest intercepts for that model were fixed at the values found in the initial model estimation. The only free parameter in both ML methods was the factor score estimate. [...]

When missing values were present, the FIML method proceeds by eliminating those rows and columns of the predicted covariance matrix corresponding to the position of the missing values. Elements of the predicted mean vector (or threshold matrix) are also removed, so the likelihood calculation is performed only for those values present in the data. " 
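My reading of the quoted procedure, for a single factor with uncorrelated residuals, can be sketched in base R as follows. This is a sketch under stated assumptions, not the authors' code: the loadings and intercepts are the values quoted earlier in this thread, the residual variances other than x1 = 1.128 are hypothetical placeholders, the data row is invented, and ml_score is a made-up helper name.

```r
# Fixed parameters. lambda and nu come from the first post in this thread;
# theta is HYPOTHETICAL except for x1 (1.128), which was also quoted there.
lambda <- c(x1 = 0.480, x4 = 0.990, x5 = 1.102, x6 = 0.913, x9 = 0.276)
nu     <- c(x1 = 4.936, x4 = 3.061, x5 = 4.341, x6 = 2.186, x9 = 5.374)
theta  <- c(x1 = 1.128, x4 = 0.40,  x5 = 0.45,  x6 = 0.35,  x9 = 0.95)

# ML factor score for one row: the value of the factor that maximizes the
# likelihood of the observed entries with all other parameters held fixed.
ml_score <- function(y, lambda, nu, theta) {
  obs <- !is.na(y)                  # drop the entries for missing indicators
  w   <- lambda[obs] / theta[obs]   # lambda' Theta^-1 for a diagonal Theta
  sum(w * (y[obs] - nu[obs])) / sum(w * lambda[obs])
}

# Invented data row, scored with and without x1
y_full    <- c(x1 = 5.2, x4 = 3.5, x5 = 5.0, x6 = 2.4, x9 = 5.5)
y_missing <- replace(y_full, "x1", NA)

ml_score(y_full, lambda, nu, theta)
ml_score(y_missing, lambda, nu, theta)
```

With x1 missing, its loading, intercept, and residual variance simply drop out of the calculation, so no variance for x1 has to be supplied for that row.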


Keeping x1 in the CFA model computes factor scores for individuals with missing data [1:100] that are more similar to the factor scores they would have obtained if these individuals did not have missing data in x1. 

## Two models
dt <- HolzingerSwineford1939
dtWithNA <- dt
dtWithNA[1:100, "x1"] <- NA_real_
nrow(dt) # 301
dt_no_x1 <- dt[-7]

G_factor_model <- ' G  =~ NA*x1 + x4 + x5 + x6 + x9
                    G ~~ 1*G'
G_factor_model_no_x1 <- ' G  =~ NA*x4 + x5 + x6 + x9
                          G ~~ 1*G'

fit <- cfa(G_factor_model, data=dt, missing = "ml")
fit_no_x1 <- cfa(G_factor_model_no_x1, data=dt, missing = "ml")

# Factor Scores - Complete Data Set, Data set without x1, Dataset where x1[1:100] = NA
dtWithNA <- setDT(dtWithNA)
dtWithNA[1:100, 7] # column 7 corresponds to x1
# [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# [32] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# [63] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# [94] NA NA NA NA NA NA NA

FactorScores <- lavPredict(fit, newdata = dt, method = "ml")
FactorScores_no_x1  <- lavPredict(fit_no_x1, newdata = dt_no_x1, method = "ml")
FactorScores_NA <- lavPredict(fit, newdata = dtWithNA, method = "ml")

# Keeping x1 in the CFA model allows for factor scores for individuals with missing data [1:100] that are more similar to the factor scores they would have obtained if the individual 1:100 did not have missing data.
cor(FactorScores[1:100], FactorScores_NA[1:100])
#[1] 0.9981
cor(FactorScores[1:100], FactorScores_no_x1[1:100])
#[1] 0.9975



Nickname

Apr 9, 2020, 8:41:36 AM
to lavaan
Camille,
  That helps to clarify what you want to do and everything that you wrote seems correct.  Two thoughts:

1. One feature of the factor analysis model is that because it models items from the latent variable rather than vice versa, if the model is correctly specified, then omitting an item should not affect the parameters for other items.  Factor score estimation, of course, reverses this and models the latent variable from the items.  So, depending upon your motivation, it might make sense to compare the factor score estimates based on the full model to factor score estimates based on a model that eliminates x1 and fixes all the remaining parameters to their values in the full model.

2. You could do a manual version of multiple imputation by using the full sample to fit a model predicting x1 from the other observed variables, then simulate multiple sets of imputed values for x1, then compute factor scores with the imputed data sets, then compare the estimates from the complete data to the distribution across the imputed data sets.  It would still seem okay to check the imputation by comparing the univariate distribution of x1 to the imputed x1 values so long as you do not tweak the imputation to optimize the match of the covariance matrix between complete data and imputed data.  Again, the usefulness of this comparison depends on what the original motivation was.
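A rough sketch of the manual imputation idea in point 2 might look like the following. It is a simplified version under several assumptions: the imputation model is a plain linear regression, the imputed values add residual noise only (uncertainty in the regression coefficients is ignored), and the seed, the number of imputations m, and all object names are arbitrary choices made for illustration.

```r
library(lavaan)

set.seed(2020)                        # arbitrary, for reproducibility only
dt   <- HolzingerSwineford1939
miss <- 1:100                         # rows treated as missing x1, as above
m    <- 20                            # number of imputed data sets (arbitrary)

model <- 'G =~ NA*x1 + x4 + x5 + x6 + x9
          G ~~ 1*G'
fit <- cfa(model, data = dt, meanstructure = TRUE)

# Factor scores for the same rows when x1 is actually observed
scores_complete <- lavPredict(fit, newdata = dt, method = "ML")[miss, 1]

# Imputation model: x1 predicted from the other indicators, full sample
imp_fit <- lm(x1 ~ x4 + x5 + x6 + x9, data = dt)
pred    <- predict(imp_fit, newdata = dt[miss, ])
sigma   <- summary(imp_fit)$sigma     # residual standard deviation

# One column of factor scores per imputed data set
scores_imputed <- sapply(seq_len(m), function(i) {
  d <- dt
  d$x1[miss] <- pred + rnorm(length(miss), mean = 0, sd = sigma)
  lavPredict(fit, newdata = d, method = "ML")[miss, 1]
})

# Compare the complete-data scores with the mean across imputations
summary(scores_complete - rowMeans(scores_imputed))
```

Each row of scores_imputed gives the distribution of one person's factor score across the imputed data sets, which can be compared with that person's entry in scores_complete.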

Keith