Estimation CFA of Categorical Data without "DWLS"



Cristian Santa


Jan 4, 2018, 6:02:10 PM

to lavaan

Hello,

I am trying to evaluate a theoretical CFA model. I generate the samples with the simsem package according to the following model:

library(simsem)
data <- generate(CFA.Model, 200) # CFA.Model: simsem model object (not shown); draws a multivariate normal sample, n = 200
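The diagram behind CFA.Model is not included here, so the following is only a base-R sketch of the same kind of data generation under hypothetical population values (all loadings 0.7, factor correlation 0.3; these are assumptions, not taken from the post). The population covariance matrix is Sigma = Lambda Phi Lambda' + Theta, and a multivariate normal sample is drawn from it with MASS::mvrnorm() instead of simsem.

```r
library(MASS)  # for mvrnorm()

# Hypothetical population values (assumed; the original CFA.Model is not shown)
lambda <- 0.7   # every standardized loading
phi    <- 0.3   # correlation between f1 and f2

Lambda <- matrix(0, nrow = 6, ncol = 2)
Lambda[1:3, 1] <- lambda          # y1-y3 load on f1
Lambda[4:6, 2] <- lambda          # y4-y6 load on f2
Phi   <- matrix(c(1, phi, phi, 1), nrow = 2)
Theta <- diag(1 - lambda^2, 6)    # residual variances chosen so each y has variance 1

# Model-implied population covariance matrix
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta

set.seed(1)
data <- as.data.frame(mvrnorm(n = 200, mu = rep(0, 6), Sigma = Sigma))
names(data) <- paste0("y", 1:6)
```

Because the indicators are standardized, Sigma is also the population correlation matrix, which is what the tetrachorics of the dichotomized data should recover when the cut points are population thresholds.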

Then I transform the data to categorical variables with the following function:

DichoData <- function(data, q){
  # Cut each column at its own q-th sample quantile: 0 below it, 1 at or above it
  D <- as.data.frame(lapply(data, function(x) ifelse(x < quantile(x, q), 0, 1)))
  return(D)
}
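A general R caveat behind cutting each column at its own quantile: comparing a matrix or data frame with a vector of per-column cut points, as in data < apply(data, 2, quantile, q), recycles the vector down the columns (column-major order) instead of using one value per column; data frames are recycled the same way by Ops.data.frame. A minimal base-R illustration with two toy columns, plus sweep(), which applies one statistic per column:

```r
m   <- cbind(a = c(1, 2, 3), b = c(101, 102, 103))
cut <- c(2, 102)   # intended cut point for column a, then for column b

# Recycling: m[1,1] is compared with 2, m[2,1] with 102, m[3,1] with 2, ...
naive <- m < cut
# naive:  a = TRUE TRUE FALSE,   b = TRUE FALSE FALSE   (column a is wrong)

# sweep() compares every row of column j with cut[j]
by_col <- sweep(m, 2, cut, "<")
# by_col: a = TRUE FALSE FALSE,  b = TRUE FALSE FALSE
```

Applying the comparison column by column (with lapply() or sweep()) avoids depending on how many rows and columns the data happen to have.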

Using the lavaan package, the model is specified as follows:

library(lavaan)

Model <- "
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f1 ~~ f2
"

# With original data
fit.1 <- cfa(model = Model, data = data, std.lv = TRUE)

To implement the model with categorical variables, the "order" argument is added:

# With categorical data

fit.2 <- cfa(model = Model, data = DichoData(data,0.5), std.lv = T, order = names(data))
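For comparison with the tetrachoric-matrix route, lavaan also lets you choose the estimator explicitly when it has the raw ordered data: "WLS" uses the full weight matrix, "ULS" an identity weight matrix, and "ULSMV" (like the default "WLSMV") adds robust standard errors and a mean- and variance-adjusted test statistic. A sketch on simulated binary data; the population values and the names d, fit.wls and fit.ulsmv are hypothetical.

```r
library(lavaan)
library(MASS)   # for mvrnorm()

# Hypothetical population: 2 factors, loadings 0.7, factor correlation 0.3
Lambda <- cbind(c(0.7, 0.7, 0.7, 0, 0, 0), c(0, 0, 0, 0.7, 0.7, 0.7))
Phi    <- matrix(c(1, 0.3, 0.3, 1), 2)
Sigma  <- Lambda %*% Phi %*% t(Lambda) + diag(1 - 0.7^2, 6)

set.seed(123)
ystar <- mvrnorm(500, mu = rep(0, 6), Sigma = Sigma)
d <- as.data.frame((ystar >= 0) * 1L)   # dichotomize at the population threshold 0
names(d) <- paste0("y", 1:6)

Model <- "
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f1 ~~ f2
"

# Full WLS and robust ULS, both fitted directly to the raw ordered data
fit.wls   <- cfa(Model, data = d, ordered = names(d), std.lv = TRUE, estimator = "WLS")
fit.ulsmv <- cfa(Model, data = d, ordered = names(d), std.lv = TRUE, estimator = "ULSMV")
```

Here lavaan computes the thresholds, tetrachorics and their weight matrix internally, so nothing has to be assembled by hand; parameterEstimates() on either fit shows the estimates side by side with the DWLS solution.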

However, the model is then estimated with DWLS. I would like to fit it with "ML", "WLS" or "ULS" instead. My question is: could I fit the model to the matrix of tetrachoric correlations, like this:

# Alternative
library(psych)
R.tet <- tetrachoric(DichoData(data, 0.5))$rho   # tetrachoric correlation matrix

fit.3 <- cfa(model = Model, sample.cov = R.tet,
             sample.nobs = 200, std.lv = TRUE, estimator = "ML")

fit.4 <- cfa(model = Model, sample.cov = R.tet,
             sample.nobs = 200, std.lv = TRUE, estimator = "ULS")

fit.5 <- cfa(model = Model, sample.cov = R.tet,
             sample.nobs = 200, std.lv = TRUE, estimator = "WLS")

The fit.3 and fit.4 models converge without problems, and their parameter estimates are acceptable. The fit.5 model fails because WLS needs a weight matrix.

Is it reasonable to assume that if the results are similar, then the method is acceptable?

Thanks for your attention.

Terrence Jorgensen


Jan 5, 2018, 9:13:08 AM

to lavaan

> according to the following model:

The arrows in your diagram should point in the other direction for factor loadings: the factor explains the indicators, not vice versa.

> Then I transform the data to categorical variables with the following function:
>
> DichoData <- function(data,q){
> D <- data.frame(ifelse (data < apply(data,2,quantile,q),0,1))
> return(D)
> }

That means your threshold is not a population parameter, because it changes from sample to sample.
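A sketch of the alternative being suggested: cut every variable at the same fixed population threshold (here 0, the mean of a standardized latent response), so the threshold is a known parameter rather than a sample quantile. dicho_fixed is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: 1 if the continuous value is at or above tau, else 0
dicho_fixed <- function(data, tau = 0) {
  as.data.frame(lapply(data, function(x) as.integer(x >= tau)))
}

# With standard-normal latent responses, the population proportion of 1s
# implied by threshold tau is 1 - pnorm(tau)
1 - pnorm(0)            # 0.5
1 - pnorm(qnorm(0.7))   # about 0.3 (a 70/30 split)

toy <- data.frame(y1 = c(-1, 0.5, 2), y2 = c(0.1, -0.2, 0))
dicho_fixed(toy)
# y1 = 0 1 1,  y2 = 1 0 1
```

Because tau is the same in every replication, the estimated thresholds can be compared against a known true value, just like the loadings.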

> To implement the model with categorical variables, the "order" argument is added:

That argument should be named "ordered".

> However, the model is estimated with DWLS. I would like to run the model with "ML", "WLS" or "ULS".

Then you cannot trust the SEs or test statistics from ML or ULS. DWLS is used instead of WLS because inverting the full weight matrix becomes unstable with more than a couple of variables (especially with a sample as small as 200) and eventually intractable with more than a few variables.
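To make the size argument concrete: with p binary indicators the sample statistics are p thresholds and p(p - 1)/2 tetrachoric correlations, and full WLS needs the inverse of their k x k asymptotic covariance matrix, estimated from the same N cases, whereas DWLS uses only the diagonal of that matrix when estimating parameters. A small helper (hypothetical name) that counts them:

```r
# Number of sample statistics for p binary items (p thresholds + p(p-1)/2 correlations)
# and the number of elements in the full WLS weight matrix versus its diagonal
wls_size <- function(p) {
  k <- p + p * (p - 1) / 2
  c(stats = k, full = k^2, diagonal = k)
}

wls_size(6)    # stats 21, full 441, diagonal 21
wls_size(20)   # stats 210, full 44100, diagonal 210
```

The full matrix grows with the fourth power of p, which is why inverting it with N = 200 quickly becomes unreliable.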

> The question is: Could I evaluate the model with the matrix of tetrachoric correlations? like this:

Yes, but only your point estimates would be unbiased (see the comment above about SEs and tests). You can also use lavCor() in the lavaan package instead of adding a dependency on the psych package.
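A sketch of the lavCor() route, assuming d is a data frame of 0/1 items named y1 to y6; the simulated stand-in data (hypothetical population values) are only there to make the block runnable. As noted above, the point estimates from this fit are usable, but its SEs and test statistic treat the tetrachorics as ordinary correlations.

```r
library(lavaan)

# Hypothetical stand-in data: 0/1 items y1-y6 from a two-factor model
set.seed(2018)
n  <- 300
f1 <- rnorm(n)
f2 <- 0.3 * f1 + sqrt(1 - 0.3^2) * rnorm(n)
ystar <- cbind(sapply(1:3, function(j) 0.7 * f1 + sqrt(1 - 0.7^2) * rnorm(n)),
               sapply(1:3, function(j) 0.7 * f2 + sqrt(1 - 0.7^2) * rnorm(n)))
d <- as.data.frame((ystar >= 0) * 1L)
names(d) <- paste0("y", 1:6)

Model <- "
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f1 ~~ f2
"

# Tetrachoric correlation matrix from lavaan itself (default output = "cor")
R.tet <- unclass(lavCor(d, ordered = names(d)))

# ULS fit to the tetrachoric matrix, as in fit.4
fit.uls <- cfa(Model, sample.cov = R.tet, sample.nobs = n,
               std.lv = TRUE, estimator = "ULS")
```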

> The fit.3 and fit.4 models converge without problems, and the estimates of the parameters of the model are acceptable. The fit.5 model does not work because it needs the weight matrix.

lavCor() can return a fitted lavaan model for estimating the tetrachorics (and thresholds); see the last example on the ?lavCor help page. You could then construct the weight matrix from the vcov() output of that model for the tetrachorics. I think you could omit the thresholds part of that matrix, since your input data is just the matrix of tetrachoric correlations.
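A sketch of the first step described here, on hypothetical stand-in data (regenerated so the block runs on its own): lavCor() with se = "standard" and output = "fit" returns a fitted lavaan object, and vcov() on it gives the sampling covariance of the thresholds and tetrachorics. The block keeps only the tetrachoric part, as suggested. Inverting n times that block as the weight matrix is an assumption about the scaling, and passing the result on to cfa() is not shown.

```r
library(lavaan)

# Hypothetical stand-in data: 0/1 items y1-y6 (replace with your own)
set.seed(2018)
n  <- 300
f1 <- rnorm(n)
f2 <- 0.3 * f1 + sqrt(1 - 0.3^2) * rnorm(n)
ystar <- cbind(sapply(1:3, function(j) 0.7 * f1 + sqrt(1 - 0.7^2) * rnorm(n)),
               sapply(1:3, function(j) 0.7 * f2 + sqrt(1 - 0.7^2) * rnorm(n)))
d <- as.data.frame((ystar >= 0) * 1L)
names(d) <- paste0("y", 1:6)

# Unrestricted model for thresholds + tetrachorics, with standard errors
fit.un <- lavCor(d, ordered = names(d), se = "standard", output = "fit")
V <- vcov(fit.un)   # sampling covariance of all free parameters

# Keep rows/columns labelled "a~~b" with a != b (the tetrachorics);
# threshold labels such as "y1|t1" are dropped
parts  <- strsplit(rownames(V), "~~", fixed = TRUE)
is.cor <- vapply(parts, function(p) length(p) == 2 && p[1] != p[2], logical(1))
V.cor  <- V[is.cor, is.cor]

# Candidate weight matrix (assumed scaling: inverse of n times the sampling covariance)
W <- solve(n * V.cor)
```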

> Is it reasonable to assume that if the results are similar, then the method is acceptable?

The benchmark is not consistency across estimators, but each estimator's consistency with the true population parameters. As I pointed out above, your threshold should probably be a parameter (e.g., zero) instead of a sample quantile.
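In a simulation, that benchmark is usually summarized per parameter across replications, for example as relative bias, (mean estimate - true value) / true value. A minimal sketch; the helper name is hypothetical and the estimates are made-up numbers for illustration, not results from this thread.

```r
# Relative bias of one parameter across Monte Carlo replications
relative_bias <- function(estimates, true_value) {
  (mean(estimates) - true_value) / true_value
}

# Illustrative (made-up) loading estimates from four replications; true loading 0.7
est <- c(0.66, 0.74, 0.69, 0.73)
relative_bias(est, 0.7)   # mean is 0.705, so relative bias is about 0.0071
```

Computing this separately for ML, ULS and DWLS against the same true values answers the original question more directly than comparing the estimators with each other.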

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam