Weighted Covariance estimation using lavcor

Kai Nehler

Feb 22, 2023, 11:02:38 AM

to lavaan

Dear lavaan project team,
I am wondering how to compute a weighted covariance/correlation matrix using lavCor(). In the long run, I want to combine this with missing values, but I am already failing in the complete-data case. The lavCor() function does not have an argument for weights, but the help page mentions that any argument from lavaan() can be used (in this case sampling.weights). Here is a minimal example of my code (with deliberately excessive weights):

M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3, 0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1),
            nrow = 4, ncol = 4)

set.seed(1901)

data_small <- as.data.frame(MASS::mvrnorm(n = 30, mu = rep(0,4), Sigma = M))

weights <- c(rep(0.3, 15), rep(100000,15))

lavaan::lavCor(data_small, estimator = "ML", output = "cov", sampling.weights = weights)

lavaan::lavCor(data_small, estimator = "ML", output = "cov")

According to the help page for lavaan(), sampling.weights should name a variable in the dataset. Therefore, I included the weights as a variable (which would probably not be an option with missing values and an EM algorithm), but it does not change anything.

data_small$weights <- weights

lavaan::lavCor(data_small, estimator = "ML", output = "cov", sampling.weights = "weights")

lavaan::lavCor(data_small[,1:4], estimator = "ML", output = "cov")

I would be glad about any pointers for corrections in the code.

All the best and many thanks,
Kai

Terrence Jorgensen

Feb 23, 2023, 5:00:40 PM

to lavaan

Hmm, glad you provided the reprex. The lavCor() function estimates correlations/covariances among all the variables in object=, except the group= variable. The sampling.weights= variable should also be excluded (and passed to lavaan() instead), but it is not. I posted an issue on GitHub. Until it is resolved, you can set output = "fit" to get what you need:

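library(lavaan)  # parTable(), lavaan(), and lavInspect() are called without the lavaan:: prefix below

## first obtain the parameter table for the model of interest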

satFit <- lavaan::lavCor(data_small[,1:4], estimator = "ML", output = "fit")
PT <- parTable(satFit)

## then fit that model again, using sample weights
satFitW <- lavaan(PT, data_small, sampling.weights = "weights")

## without weights

lavInspect(satFit, "cov.ov")
   V1    V2    V3    V4
V1 0.887
V2 0.502 0.840
V3 0.541 0.836 0.916
V4 0.574 0.338 0.373 1.161

## accounting for weights
lavInspect(satFitW, "cov.ov")
   V1    V2    V3    V4
V1 1.039
V2 0.591 0.877
V3 0.610 0.903 1.030
V4 0.833 0.507 0.522 1.180
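As an independent sanity check on what a weighted ML covariance matrix looks like, here is a minimal base-R sketch using stats::cov.wt(). It is an illustration under assumptions, not lavaan's internal code: the data are a stand-in generated with rnorm() and a Cholesky factor (so the numbers will not match the output above), and the names X, w, p, S_covwt and S_manual are made up for this example. Whether lavaan's weighted saturated fit matches cov.wt() exactly is something to verify rather than take for granted.

```r
# Stand-in data with the same population covariance as in the thread
# (base R only; this is NOT the MASS::mvrnorm draw used above)
set.seed(1901)
M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3, 0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1),
            nrow = 4, ncol = 4)
X <- matrix(rnorm(30 * 4), nrow = 30) %*% chol(M)
colnames(X) <- paste0("V", 1:4)
w <- c(rep(0.3, 15), rep(100000, 15))

# cov.wt() rescales the weights to sum to 1; method = "ML" applies no
# small-sample correction to the weighted cross-products
S_covwt <- stats::cov.wt(X, wt = w, method = "ML")$cov

# The same quantity computed by hand
p  <- w / sum(w)                       # normalised weights
mu <- colSums(p * X)                   # weighted means
Xc <- sweep(X, 2, mu)                  # centre at the weighted means
S_manual <- crossprod(sqrt(p) * Xc)    # sum_i p_i * (x_i - mu)(x_i - mu)'

all.equal(S_covwt, S_manual, check.attributes = FALSE)
```

With weights this extreme, the first 15 rows contribute almost nothing, so the result is essentially the ML covariance of the last 15 rows.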

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

http://www.uva.nl/profile/t.d.jorgensen

Kai Nehler

Feb 24, 2023, 7:07:46 AM

to lavaan

Thank you for your response! That helps with fitting the model with weights using fully observed data.

As I mentioned previously, I want to combine the weights and missing values for covariance estimation. I carried on with my already simulated dataset and added some random missing values.

data_small_mis <- data_small

data_small_mis[sample(5:30,4),2] <- NA

Using lavCor(), I wanted to run an EM algorithm, which is done if missing = "ML", as described in the help page. This works as expected when using:

satFit_mis <- lavaan::lavCor(data_small_mis[,1:4], estimator = "ML", missing = "ML", output = "fit")

I tried including the weights in the way you demonstrated without missing values:

PT_mis <- parTable(satFit_mis)

satFitW_mis <- lavaan(PT_mis, data_small_mis, sampling.weights = "weights")

This returns an error message saying that the model might not be identified, which I think is rather unlikely. Could this be due to a coding problem? Or does it perhaps not make sense, from a theoretical perspective, to include weights in an EM algorithm with missing values?

Terrence Jorgensen

Feb 25, 2023, 4:57:27 AM

to lavaan

I want to combine the weights and missing values for covariance estimation

There's nothing about my suggestion that prevents that. The saturated model should be identified, unless you have some extreme case (e.g., none of the same cases have values for a pair of variables). You'll probably have to provide illustrative data if you want help with your specific nuanced case.

Kai Nehler

Mar 3, 2023, 9:50:42 AM

to lavaan

Thank you for your response! I do not have actual data; I am working with simulated data. I had overlooked that the message was only a warning, so the code does work.

But I found a different interesting behavior, still using the simulated dataset.

M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3, 0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1),
            nrow = 4, ncol = 4)

set.seed(1901)

data_small <- as.data.frame(MASS::mvrnorm(n = 30, mu = rep(0,4), Sigma = M))

weights <- c(rep(0.3, 15), rep(100000,15))

# add the weights as a column so that sampling.weights = "weights" can find them

data_small$weights <- weights

data_small_mis <- data_small

data_small_mis[sample(5:30,4),2] <- NA

satFit_mis <- lavaan::lavCor(data_small_mis[,1:4], estimator = "ML", missing = "ML", output = "fit")

# Again, this would be taking the code from the full dataset and just changing the names

PT_mis <- parTable(satFit_mis)

satFitW_mis <- lavaan(PT_mis, data_small_mis, sampling.weights = "weights")

# I also tried to add the arguments for estimation and missing handling and it returns different results.

satFitW_mis_args <- lavaan::lavaan(PT_mis, data_small_mis, estimator ="ML", missing = "ML", sampling.weights = "weights")

lavaan::lavInspect(satFitW_mis, "cov.ov")
lavaan::lavInspect(satFitW_mis_args, "cov.ov")

After debugging the lavaan() function and many of its helper functions, I would assume that passing the missing argument again (satFitW_mis_args) is the correct way to integrate the weights into the EM algorithm. But I am not completely sure and would be interested in your feedback.
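One possible explanation for the differing results, offered as a sketch to check rather than a confirmed answer: a parameter table carries the model but not the fitting options, so a call to lavaan() without missing = "ML" should fall back to the default missing-data handling, which as far as I know is listwise deletion. The code below refits both variants and inspects the stored options and the number of cases used. The object names fit_default and fit_fiml are made up for this example, and it assumes the lavaan and MASS packages are installed.

```r
library(lavaan)

# Same simulated setup as in the thread
M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3, 0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1),
            nrow = 4, ncol = 4)
set.seed(1901)
data_small <- as.data.frame(MASS::mvrnorm(n = 30, mu = rep(0, 4), Sigma = M))
data_small$weights <- c(rep(0.3, 15), rep(100000, 15))
data_small_mis <- data_small
data_small_mis[sample(5:30, 4), 2] <- NA   # 4 distinct rows lose their V2 value

# Saturated model estimated with FIML, then its parameter table
satFit_mis <- lavCor(data_small_mis[, 1:4], estimator = "ML",
                     missing = "ML", output = "fit")
PT_mis <- parTable(satFit_mis)

# Refit with weights: once with default options, once requesting FIML again
fit_default <- lavaan(PT_mis, data_small_mis, sampling.weights = "weights")
fit_fiml    <- lavaan(PT_mis, data_small_mis, estimator = "ML",
                      missing = "ML", sampling.weights = "weights")

# Which missing-data method did each fit actually use?
lavInspect(fit_default, "options")$missing
lavInspect(fit_fiml, "options")$missing

# How many cases entered each analysis? Listwise deletion would drop the
# 4 incomplete rows, leaving 26 of the 30.
lavInspect(fit_default, "nobs")
lavInspect(fit_fiml, "nobs")
```

If the first fit reports fewer cases than the second, the two covariance matrices are based on different subsets of the data, and the version with missing = "ML" is the one that combines the weights with full-information estimation.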
