Estimating Bifactor Indices (PUC, ECV, H, FD) with ordered categorical indicators (using WLSMV or ULSMV)

Manuel Torres Sahli

Jan 29, 2021, 2:02:27 PM

to lavaan

Shorter story: I'm calculating PUC, ECV, H, and FD for a bifactor model following Rodriguez, Reise, and Haviland (2016). Would the equations need changes to work with ordered categorical indicators (using WLSMV or ULSMV)? Rodriguez et al. illustrate with a Likert scale but work only with the polychoric correlation matrix. The calculations I'm currently using for continuous outcomes (using MLR) or non-mean/variance-adjusted DWLS are at the end.

Btw, I'm not aware of these indices being included in any lavaan-related package (e.g. semTools); I hope I did not miss that. In any case, would you consider adding them useful or feasible in the future?

Longer story:

I am analyzing a bifactor model with ordered categorical outcomes (5-point Likert-type scale). Following Rodriguez, Reise, and Haviland (2016), I am studying (1) the reliability of unit-weighted composite scores; (2) the use of a set of items to compute factor scores or to identify a latent variable in an SEM context; and (3) whether multidimensional (bifactor) data are “unidimensional enough” to specify a unidimensional measurement model in an SEM context.

The first aspect is dealt with by omega hierarchical, which as far as I know is adapted for categorical indicators in `semTools::reliability` using a different set of equations, from Green and Yang (2009).

For the second aspect, Rodriguez et al. recommend using the indices H and FD. For the third, they advise calculating the explained common variance of the general factor across all items (ECV) and for individual items (I-ECV), as well as the percentage of uncontaminated correlations (PUC). Except for PUC, I am not sure whether the formulas they propose are directly applicable when working with ordered categorical indicators using mean- and variance-adjusted estimators (WLSMV/ULSMV).
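As a point of reference for the item-level index, here is a minimal sketch of I-ECV (and the overall ECV) computed from a standardized loading matrix. The loading values are made up for illustration, and the layout (general factor in a column named G, zeros for loadings fixed to zero) is an assumption, not output from a fitted model.

```r
# Hypothetical standardized loadings: 6 items, a general factor G and
# two specific factors S1 and S2 (values invented for illustration).
Lambda <- cbind(
  G  = c(.7, .6, .5, .7, .6, .5),
  S1 = c(.4, .3, .5,  0,  0,  0),
  S2 = c( 0,  0,  0, .3, .4, .5)
)
rownames(Lambda) <- paste0("item", 1:6)

L2 <- Lambda^2

# I-ECV: share of each item's common variance that is due to the general factor
IECV <- L2[, "G"] / rowSums(L2)

# ECV: the same ratio pooled over all items
ECV <- sum(L2[, "G"]) / sum(L2)

round(IECV, 3)
round(ECV, 3)
```

Items whose I-ECV is close to 1 reflect mostly the general factor; the pooled ECV is the same ratio taken over all items at once.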

I'm sorry if the answers are obvious; in the case of PUC I honestly can't imagine how it would be different. I try my best to understand rigorously what I do, but given my still limited knowledge of the mathematics underlying WLSMV (or ULSMV), I preferred to ask here since most members will know better.

Many thanks in advance

Manuel

References

Rodriguez A, Reise SP, Haviland MG. Evaluating bifactor models: calculating and interpreting statistical indices. Psychol Methods. 2016;21(2):137.

Code

These are the formulas I've used when treating the indicators as continuous (MLR), taken initially from https://github.com/ddueber/BifactorIndicesCalculator:

library(lavaan)

fit <- cfa(model = bifactor.model, data = data, estimator = "WLSMV")

# Standardized solution: loadings, residual variances, factor (co)variances
fit.std <- lavInspect(fit, "std")
Lambda <- fit.std$lambda
Lambda[is.na(Lambda)] <- 0
Lambda <- as.matrix(Lambda)
Theta <- fit.std$theta
Phi <- fit.std$psi
class(Phi) <- "matrix"  # drop the lavaan matrix class

#######
# ECV #
#######
# Rodriguez et al. (2016, equation 10)
ECV_SS_C <- function(Fac, Lambda) {
  inFactor <- Lambda[, Fac] != 0  # items that load on this factor
  L2 <- Lambda^2
  # variance due to factor Fac over all common variance, for those items only
  sum(L2[, Fac] * inFactor) / sum(L2 * inFactor)
}
ECV_results <- sapply(1:ncol(Lambda), ECV_SS_C, Lambda)
names(ECV_results) <- colnames(Lambda)
ECV_results

#######
# PUC #
#######
# Rodriguez et al. (2016, no equation but example in p. 144)
numItemsOnFactor <- colSums(Lambda != 0)
# Item pairs within each factor, minus the general factor's pairs
# (assumes the general factor loads on every item)
specificCorrelationCount <- sum(sapply(numItemsOnFactor, function(x) {
  x * (x - 1)/2
})) - (nrow(Lambda) * (nrow(Lambda) - 1)/2)
1 - specificCorrelationCount/(nrow(Lambda) * (nrow(Lambda) - 1)/2)

#####
# H #
#####
# Rodriguez et al. (2016, equation 9)
1/(1 + 1/(colSums(Lambda^2/(1 - Lambda^2))))

######
# FD #
######
# Rodriguez et al. (2016, equation 8)
Psi <- diag(Theta)  # vector of residual variances
Sigma <- Lambda %*% Phi %*% t(Lambda) + diag(Psi)  # model-implied matrix
FacDet <- sqrt(diag(Phi %*% t(Lambda) %*% solve(Sigma) %*% Lambda %*% Phi))
names(FacDet) <- colnames(Lambda)
FacDet
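One way to sanity-check the H and FD calculations above without a fitted model is to run them on a small made-up loading matrix. The sketch below assumes orthogonal factors (identity Phi) and a fully standardized solution, so each residual variance is 1 minus the item's communality and the model-implied matrix has a unit diagonal. Under the latent-response formulation for ordinal indicators, that matrix would correspond to the model-implied polychoric correlation matrix. All numbers are hypothetical.

```r
# Hypothetical standardized loadings (invented for illustration)
Lambda <- cbind(
  G  = c(.7, .6, .5, .7, .6, .5),
  S1 = c(.4, .3, .5,  0,  0,  0),
  S2 = c( 0,  0,  0, .3, .4, .5)
)

# Assumption: orthogonal factors with unit variance
Phi <- diag(ncol(Lambda))
dimnames(Phi) <- list(colnames(Lambda), colnames(Lambda))

# Standardized residual variances: 1 - communality
Theta <- 1 - diag(Lambda %*% Phi %*% t(Lambda))

# Model-implied correlation matrix (unit diagonal by construction)
Sigma <- Lambda %*% Phi %*% t(Lambda) + diag(Theta)

# Construct replicability H, one value per factor
H <- 1 / (1 + 1 / colSums(Lambda^2 / (1 - Lambda^2)))

# Factor determinacy, one value per factor
FD <- sqrt(diag(Phi %*% t(Lambda) %*% solve(Sigma) %*% Lambda %*% Phi))

round(rbind(H = H, FD = FD), 3)
```

Both indices should fall strictly between 0 and 1 here; values outside that range would point to a problem in the inputs (for example a loading of 1 or a non-positive residual variance).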

Terrence Jorgensen

Feb 9, 2021, 3:19:10 PM

to lavaan

> Would the equations need changes to work with ordered categorical indicators (using WLSMV or ULSMV)? ... I'm not aware of these indices being included in any lavaan-related package (e.g. semTools).

Only the reliability() function you are already aware of, which does account for the latent-response model that is so popular with ordinal indicators.

> I hope I did not miss that. In any case, would you consider that useful or feasible in the future?

Sure, you can submit a function to semTools that returns these indices for bifactor models. Try looking through the source code of some of the other user-contributed functions, e.g., htmt(), to see what the procedure looks like, and please use roxygen comments in your source code. I rely on the roxygen2 package to automate the generation of help pages, etc. Again, looking at the source code available on GitHub will be a good guide.
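For orientation, a roxygen-documented function might look roughly like the sketch below. The function name `bifactorPUC` and its arguments are hypothetical (they are not taken from semTools), and the sketch assumes the general factor occupies a single column and loads on every item.

```r
#' Percentage of uncontaminated correlations (PUC)
#'
#' Hypothetical helper for illustration; not part of semTools.
#'
#' @param Lambda Numeric matrix of standardized loadings (items in rows,
#'   factors in columns). Zeros mark loadings fixed to zero.
#' @param general Integer index of the column holding the general factor.
#' @return A single number between 0 and 1.
#' @examples
#' Lambda <- cbind(G  = rep(.6, 6),
#'                 S1 = c(.4, .4, .4, 0, 0, 0),
#'                 S2 = c(0, 0, 0, .4, .4, .4))
#' bifactorPUC(Lambda)
#' @export
bifactorPUC <- function(Lambda, general = 1L) {
  nItems <- nrow(Lambda)
  totalPairs <- nItems * (nItems - 1) / 2

  # Item pairs that share a specific factor are "contaminated"
  specific <- Lambda[, -general, drop = FALSE]
  itemsPerSpecific <- colSums(specific != 0)
  contaminatedPairs <- sum(choose(itemsPerSpecific, 2))

  1 - contaminatedPairs / totalPairs
}

# Usage with made-up loadings: 6 items, two specific factors of 3 items each
Lambda <- cbind(G  = rep(.6, 6),
                S1 = c(.4, .4, .4, 0, 0, 0),
                S2 = c(0, 0, 0, .4, .4, .4))
bifactorPUC(Lambda)
```

The `#'` lines are what roxygen2 turns into the help page; the `@examples` block doubles as a quick check that the function runs.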

https://github.com/simsem/semTools/tree/master/semTools/R

> I try my best to understand rigorously what I do, but due to my still limited knowledge on the mathematics underlying WLSMV (or ULSMV), I preferred to ask here since most members will know better.

I don't think the estimator is important. For instance, the Green & Yang (2009) method you cited does not require DWLS or ULS to be used to obtain point estimates. You could use pairwise or marginal MLE. In any case, the formulas you are using only seem to rely on point estimates, so they should apply regardless of the estimator. Of course, what the point estimates mean can vary across estimators (e.g., PML and DWLS will return point estimates interpreted in terms of latent normal variables underlying observed ordinal variables, whereas ULS and ML will treat the numeric values of ordinal categories as though they are in fact numbers on the real line). So the indices' interpretation might be more/less valid under certain estimators than others, but their calculations should be the same.
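To make the "point estimates only" argument concrete, the quantities computed by the code in the original post can be written out as follows. This is a transcription of that code into notation, not a quotation from Rodriguez et al. Here $\lambda_{iG}$ and $\lambda_{is}$ are the standardized loadings of item $i$ on the general factor and on specific factor $s$, $\lambda_{ik}$ is the loading on any factor $k$, $n_s$ is the number of items on specific factor $s$ out of $n$ items, $\Lambda$ and $\Phi$ are the standardized loading and factor correlation matrices, and $\Theta$ is the diagonal matrix of standardized residual variances.

```latex
\begin{align*}
\mathrm{ECV} &= \frac{\sum_i \lambda_{iG}^2}
                     {\sum_i \lambda_{iG}^2 + \sum_s \sum_i \lambda_{is}^2} \\
\mathrm{PUC} &= 1 - \frac{\sum_s n_s (n_s - 1)/2}{n (n - 1)/2} \\
H_k &= \left[ 1 + \left( \sum_i \frac{\lambda_{ik}^2}{1 - \lambda_{ik}^2} \right)^{-1} \right]^{-1} \\
\mathrm{FD}_k &= \sqrt{ \left[ \Phi \Lambda^\top \Sigma^{-1} \Lambda \Phi \right]_{kk} },
  \qquad \Sigma = \Lambda \Phi \Lambda^\top + \Theta
\end{align*}
```

Every term on the right-hand side is a parameter estimate (or, for PUC, a count from the loading pattern), which is why the choice of estimator changes the inputs but not the arithmetic.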

It might be worth reaching out for guidance from the study authors, who I'm sure would be interested if someone were trying to automate their method for lavaan users.

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam