Weighted Covariance estimation using lavcor

Kai Nehler

Feb 22, 2023, 11:02:38 AM
to lavaan
Dear lavaan project team,
I am wondering how to compute a weighted covariance/correlation matrix using lavCor(). In the long run, I want to combine this with missing values, but I already fail with complete data. The lavCor() function has no argument for weights, but its help page mentions that any argument of lavaan() can be passed on (in this case sampling.weights). Here is a minimal example of my code (with deliberately extreme weights):

M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3, 0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1),
                 nrow=4, ncol=4)
set.seed(1901)
data_small <- as.data.frame(MASS::mvrnorm(n = 30, mu = rep(0,4), Sigma = M))
weights <- c(rep(0.3, 15), rep(100000,15))
lavaan::lavCor(data_small, estimator = "ML", output = "cov", sampling.weights = weights)
lavaan::lavCor(data_small, estimator = "ML", output = "cov")

According to the help page for lavaan(), sampling.weights should name a variable in the dataset. Therefore, I included the weights as a variable (which would probably not be an option when handling missing values with an EM algorithm), but it does not change anything.

data_small$weights <- weights
lavaan::lavCor(data_small, estimator = "ML", output = "cov", sampling.weights = "weights")
lavaan::lavCor(data_small[,1:4], estimator = "ML", output = "cov")

I would be grateful for any pointers on correcting the code.

All the best and many thanks,
Kai

Terrence Jorgensen

Feb 23, 2023, 5:00:40 PM
to lavaan
Hmm, glad you provided the reprex.  The lavCor() function estimates correlations/covariances among all the variables in object=, except the group= variable.  The sampling.weights= variable should also be excluded (and passed to lavaan() instead), but it is not.  I posted an issue on GitHub.  Until it is resolved, you can set output = "fit" to get what you need:

## attach lavaan so that parTable(), lavaan(), and lavInspect() are found
library(lavaan)

## first obtain the parameter table for the model of interest
satFit <- lavaan::lavCor(data_small[,1:4], estimator = "ML", output = "fit")
PT <- parTable(satFit)

## then fit that model again, using sample weights
satFitW <- lavaan(PT, data_small, sampling.weights = "weights")

## without weights
lavInspect(satFit, "cov.ov")
      V1    V2    V3    V4
V1 0.887                  
V2 0.502 0.840            
V3 0.541 0.836 0.916      
V4 0.574 0.338 0.373 1.161

## accounting for weights
lavInspect(satFitW, "cov.ov")
      V1    V2    V3    V4
V1 1.039                  
V2 0.591 0.877            
V3 0.610 0.903 1.030      
V4 0.833 0.507 0.522 1.180
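
As a rough cross-check on what a weighted covariance matrix means here, the sketch below computes one by hand and with base R's stats::cov.wt(), reusing the simulated data from the first post. It is only an illustration under the assumption that an ML-type (divide by the sum of weights) estimator is the quantity of interest; it does not call lavaan and is not guaranteed to reproduce lavaan's numbers exactly. The object names X, w, p, mu_w, S_manual, and S_covwt are made up for this example.

```r
# Simulated data as in the thread (requires the MASS package)
M <- matrix(c(1, 0.7, 0.7, 0.5,
              0.7, 1, 0.95, 0.3,
              0.7, 0.95, 1, 0.3,
              0.5, 0.3, 0.3, 1), nrow = 4, ncol = 4)
set.seed(1901)
X <- MASS::mvrnorm(n = 30, mu = rep(0, 4), Sigma = M)
w <- c(rep(0.3, 15), rep(100000, 15))

# By hand: normalize the weights, center at the weighted means,
# then sum the weighted cross-products (ML-type, no bias correction)
p <- w / sum(w)
mu_w <- colSums(X * p)
Xc <- sweep(X, 2, mu_w)
S_manual <- crossprod(Xc * sqrt(p))

# Same quantity from base R
S_covwt <- stats::cov.wt(X, wt = w, method = "ML")$cov

# TRUE if both routes agree up to numerical tolerance
all.equal(S_manual, S_covwt, check.attributes = FALSE)
```

With weights this extreme, the first 15 rows contribute almost nothing, so the result is essentially the ML covariance matrix of rows 16 to 30.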

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Kai Nehler

Feb 24, 2023, 7:07:46 AM
to lavaan
Thank you for your response! That helps with fitting the model with weights using fully observed data.
As I mentioned previously, I want to combine the weights and missing values for covariance estimation. I carried on with my already simulated dataset and added some random missing values.

data_small_mis <- data_small
data_small_mis[sample(5:30,4),2] <- NA

Using lavCor(), I wanted to run an EM algorithm, which is what happens with missing = "ML" according to the help page. This works as expected when using:

satFit_mis <- lavaan::lavCor(data_small_mis[,1:4], estimator = "ML", missing = "ML", output = "fit")

I tried including the weights in the way you demonstrated without missing values:

PT_mis <- parTable(satFit_mis)
satFitW_mis <- lavaan(PT_mis, data_small_mis, sampling.weights = "weights")

This returns an error message saying that the model might not be identified, which I think is rather unlikely. Could this be due to a coding problem? Or does it perhaps not make sense, from a theoretical perspective, to include weights in an EM algorithm with missing values?

Terrence Jorgensen

Feb 25, 2023, 4:57:27 AM
to lavaan
> I want to combine the weights and missing values for covariance estimation

There's nothing about my suggestion that prevents that.  The saturated model should be identified, unless you have some extreme case (e.g., for some pair of variables, no case has observed values on both).  You'll probably have to provide illustrative data if you want help with your specific nuanced case.
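
The extreme case mentioned above can be checked directly. The following is a minimal base-R sketch, using made-up data and the hypothetical names dat, obs, and coverage, that computes the proportion of cases with both variables observed for every pair; a zero entry would mean the corresponding covariance has no data behind it. lavaan reports a comparable matrix through lavInspect(fit, "coverage").

```r
# Made-up example data: 30 cases, 4 variables, 4 missing values in column 2
set.seed(1)
dat <- as.data.frame(matrix(rnorm(30 * 4), nrow = 30))
dat[sample(5:30, 4), 2] <- NA

# Indicator matrix: 1 where a value is observed, 0 where it is missing
obs <- (!is.na(as.matrix(dat))) + 0

# Pairwise coverage: proportion of cases observed on both variables of a pair
coverage <- crossprod(obs) / nrow(dat)
coverage

# TRUE would flag a pair of variables that is never jointly observed
any(coverage == 0)
```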

Kai Nehler

Mar 3, 2023, 9:50:42 AM
to lavaan
Thank you for your response! I do not have actual data; I am working with simulated data. I had overlooked that the message in my code was only a warning, so the code does run.

But I found a different interesting behavior, still using the simulated dataset.

M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3, 0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1),
                 nrow=4, ncol=4)
set.seed(1901)
data_small <- as.data.frame(MASS::mvrnorm(n = 30, mu = rep(0,4), Sigma = M))
weights <- c(rep(0.3, 15), rep(100000,15))
data_small$weights <- weights  # sampling.weights = "weights" below needs this column
data_small_mis <- data_small
data_small_mis[sample(5:30,4),2] <- NA
satFit_mis <- lavaan::lavCor(data_small_mis[,1:4], estimator = "ML", missing = "ML", output = "fit")

# Again, this would be taking the code from the full dataset and just changing the names

PT_mis <- parTable(satFit_mis)
satFitW_mis <- lavaan(PT_mis, data_small_mis, sampling.weights = "weights")

# I also tried to add the arguments for estimation and missing handling and it returns different results.

satFitW_mis_args <- lavaan::lavaan(PT_mis, data_small_mis, estimator ="ML", missing = "ML", sampling.weights = "weights")
lavaan::lavInspect(satFitW_mis, "cov.ov")
lavaan::lavInspect(satFitW_mis_args, "cov.ov")

After stepping through the lavaan() function and many of its helper functions, I would assume that passing the missing argument again (satFitW_mis_args) is the correct way to integrate the weights into the EM algorithm. But I am not completely sure and would be interested in your feedback.
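
One way to see how the two fits differ, without stepping through the internals, is to ask each fitted object which missing-data option it resolved to and how many cases it used. The sketch below recreates the thread's simulated data and uses lavInspect() with "options" and "nobs". The names fit_default, fit_args, opt_default, opt_args, n_default, and n_args are made up here, and the sketch only assumes that both calls run as reported above; it makes no claim about which result is the right one.

```r
library(lavaan)

# Recreate the thread's simulated data, including the weights column
M <- matrix(c(1, 0.7, 0.7, 0.5, 0.7, 1, 0.95, 0.3,
              0.7, 0.95, 1, 0.3, 0.5, 0.3, 0.3, 1), nrow = 4, ncol = 4)
set.seed(1901)
data_small <- as.data.frame(MASS::mvrnorm(n = 30, mu = rep(0, 4), Sigma = M))
data_small$weights <- c(rep(0.3, 15), rep(100000, 15))
data_small_mis <- data_small
data_small_mis[sample(5:30, 4), 2] <- NA

# Saturated model and its parameter table, as earlier in the thread
satFit_mis <- lavCor(data_small_mis[, 1:4], estimator = "ML",
                     missing = "ML", output = "fit")
PT_mis <- parTable(satFit_mis)

# Refit with weights: once with lavaan() defaults, once with explicit arguments
fit_default <- lavaan(PT_mis, data_small_mis, sampling.weights = "weights")
fit_args <- lavaan(PT_mis, data_small_mis, estimator = "ML",
                   missing = "ML", sampling.weights = "weights")

# Which missing-data handling did each fit actually use?
opt_default <- lavInspect(fit_default, "options")$missing
opt_args <- lavInspect(fit_args, "options")$missing

# How many cases entered each analysis?
n_default <- lavInspect(fit_default, "nobs")
n_args <- lavInspect(fit_args, "nobs")

c(default = opt_default, args = opt_args)
c(default = n_default, args = n_args)
```

If the two fits report different missing-data options or different numbers of cases, that alone would account for differing covariance estimates.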