Measuring the difference between a non-positive-definite correlation matrix and its nearest positive definite equivalent


Simon Harmel

Jun 21, 2024, 9:18:50 PM
to lav...@googlegroups.com
Hello all,

When I encounter a non-positive definite (NPD) correlation matrix, I'm often tempted to use "Matrix::nearPD(NPD, corr=TRUE)$mat" to convert it to the nearest positive definite correlation matrix (see below).

Question 1: Is there a way to measure how much change this will introduce to the original NPD correlation matrix, and hence to determine the extent to which this strategy is justified?

Thanks,
Simon

```
# The non-positive-definite input matrix
NPD_MATRIX <- structure(c(0.58, 0.55, 0.52, 0.34, 0.56, 0.45, 0.52, 0.42, 0.55,
0.64, 0.36, 0.2, 0.4, 0.29, 0.35, 0.29, 0.52, 0.36, 0.58, 0.53,
0.58, 0.36, 0.55, 0.57, 0.34, 0.2, 0.53, 0.77, 0.52, 0.33, 0.63,
0.57, 0.56, 0.4, 0.58, 0.52, 0.32, 0.42, 0.52, 0.5, 0.45, 0.29,
0.36, 0.33, 0.42, 1, 0.4, 0.33, 0.52, 0.35, 0.55, 0.63, 0.52,
0.4, 0.62, 0.52, 0.42, 0.29, 0.57, 0.57, 0.5, 0.33, 0.52, 0.73
), dim = c(8L, 8L), dimnames = list(c("L2DA", "L2DF", "L2G",
"L2L", "L2M", "L2P", "L2R", "L2V"), c("L2DA", "L2DF", "L2G",
"L2L", "L2M", "L2P", "L2R", "L2V")))

# Nearest positive definite matrix; corr=TRUE also forces a unit diagonal
as.matrix(Matrix::nearPD(NPD_MATRIX, corr=TRUE)$mat)
```
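One way to put numbers on Question 1 is sketched below, assuming the Matrix package is installed: compare the two matrices entry by entry, split the change into its diagonal and off-diagonal parts, and look at the smallest eigenvalue before and after. `summarize_nearpd_change()` is a hypothetical helper written for this example, not part of Matrix; its Frobenius norm should agree with the `normF` element that `nearPD()` itself returns. Note that the diagonal of `NPD_MATRIX` is not all 1 (for example 0.58 for L2DA and 0.32 for L2M), so with `corr=TRUE` much of the change can be expected to land on the diagonal.

```r
# Sketch: quantify how far Matrix::nearPD() moves a matrix.
# summarize_nearpd_change() is a hypothetical helper, not part of Matrix.
summarize_nearpd_change <- function(m) {
  pd <- as.matrix(Matrix::nearPD(m, corr = TRUE)$mat)
  d <- pd - m                        # entry-wise change
  off <- upper.tri(d)                # each off-diagonal pair counted once
  min_eig <- function(x) {
    min(eigen(x, symmetric = TRUE, only.values = TRUE)$values)
  }
  list(
    pd = pd,
    frobenius = norm(d, type = "F"),                       # same quantity as nearPD()'s $normF
    relative = norm(d, type = "F") / norm(m, type = "F"),  # change relative to the input
    max_abs_offdiag = max(abs(d[off])),                    # largest change to an off-diagonal entry
    max_abs_diag = max(abs(diag(d))),                      # largest change on the diagonal
    min_eigen_before = min_eig(m),                         # negative: input is not positive semidefinite
    min_eigen_after = min_eig(pd)
  )
}

# Small example: a 3 x 3 "correlation" matrix that is not positive definite
m <- matrix(c(1.0, 0.9, 0.9,
              0.9, 1.0, 0.0,
              0.9, 0.0, 1.0), nrow = 3)
res <- summarize_nearpd_change(m)
res$min_eigen_before   # about -0.27
res$max_abs_offdiag
```

Applied to `NPD_MATRIX`, the same call separates the change on the diagonal from the change to the correlations themselves. The relative Frobenius norm is one possible yardstick, but how large a change is acceptable remains a judgment call.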

Jeremy Miles

Jun 21, 2024, 9:37:04 PM
to lav...@googlegroups.com
Can you just find the difference?

Tweak to your code:

```
pdm <- as.matrix(Matrix::nearPD(NPD_MATRIX, corr=TRUE)$mat)

round(pdm - NPD_MATRIX, 3)
     L2DA L2DF  L2G  L2L  L2M L2P  L2R  L2V
L2DA 0.42 0.00 0.00 0.00 0.00   0 0.00 0.00
L2DF 0.00 0.36 0.00 0.00 0.00   0 0.00 0.00
L2G  0.00 0.00 0.42 0.00 0.00   0 0.00 0.00
L2L  0.00 0.00 0.00 0.23 0.00   0 0.00 0.00
L2M  0.00 0.00 0.00 0.00 0.68   0 0.00 0.00
L2P  0.00 0.00 0.00 0.00 0.00   0 0.00 0.00
L2R  0.00 0.00 0.00 0.00 0.00   0 0.38 0.00
L2V  0.00 0.00 0.00 0.00 0.00   0 0.00 0.27
```


It looks like it just increases the variances (the diagonal entries); the off-diagonal entries are unchanged to three decimal places.
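The output above suggests that, for this matrix, setting the diagonal to 1 was already enough to reach a positive definite matrix, so `nearPD()` had no reason to touch the off-diagonal entries. A minimal way to check that directly is to reset the diagonal yourself and look at the smallest eigenvalue; if it is positive, the unit-diagonal matrix is already a valid correlation matrix. `unit_diag_min_eigen()` and the toy matrix are hypothetical, chosen only to illustrate the pattern.

```r
# Hypothetical helper: smallest eigenvalue after forcing a unit diagonal.
unit_diag_min_eigen <- function(m) {
  m1 <- m
  diag(m1) <- 1   # what corr = TRUE requires of the result
  min(eigen(m1, symmetric = TRUE, only.values = TRUE)$values)
}

# Toy matrix: diagonal below 1, and the matrix itself is not positive definite
m <- matrix(c(0.2, 0.5,
              0.5, 0.3), nrow = 2)
min(eigen(m, symmetric = TRUE, only.values = TRUE)$values)  # negative
unit_diag_min_eigen(m)                                       # 0.5: unit-diagonal version is PD

# nearPD() then only has to move the diagonal (by 0.8 and 0.7);
# the off-diagonal differences are zero up to rounding
pd <- as.matrix(Matrix::nearPD(m, corr = TRUE)$mat)
pd - m
```

Running `unit_diag_min_eigen(NPD_MATRIX)` would confirm or refute the same pattern for the matrix in this thread.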

This is probably a bad idea: it is a bit like adding measurement error to your data. That will make your indicators less reliable, which will decrease chi-square and RMSEA (hooray!) and decrease CFI (boo!).
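The measurement-error analogy can be made concrete. If the diagonal of the input is read as variances, the correlation it implies between variables i and j is s_ij / sqrt(s_ii * s_jj), which is what base R's `cov2cor()` computes. When the diagonal is reset to 1 and s_ij is left alone (as in the difference table above), the correlation becomes s_ij, i.e. the implied correlation multiplied by sqrt(s_ii * s_jj). That has the same form as the classical attenuation formula, with the s_ii playing the role of reliabilities. For L2DA and L2DF, for example, 0.55 / sqrt(0.58 × 0.64) ≈ 0.90 becomes 0.55. The sketch below uses only that 2 × 2 block.

```r
# 2 x 2 block of NPD_MATRIX for L2DA and L2DF (values copied from the first post)
s <- matrix(c(0.58, 0.55,
              0.55, 0.64), nrow = 2,
            dimnames = list(c("L2DA", "L2DF"), c("L2DA", "L2DF")))

# Correlation implied when the diagonal is read as variances: about 0.90
before <- cov2cor(s)["L2DA", "L2DF"]
# Correlation after the diagonal is reset to 1 and s_ij is left alone: 0.55
after <- s["L2DA", "L2DF"]

# Attenuation factor sqrt(s_ii * s_jj), as in the classical attenuation formula
atten <- sqrt(s["L2DA", "L2DA"] * s["L2DF", "L2DF"])
all.equal(after, before * atten)   # TRUE
```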

Jeremy

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CACgv6yXOWZwdw1rfy8tuJB-oAii%3D6EVOHq9xWdLxE1cPuYtKoA%40mail.gmail.com.

Simon Harmel

Jun 24, 2024, 2:30:23 PM
to lav...@googlegroups.com
Jeremy (or others),

Thanks for your input. I have no problem trying something better. But what other options are there for dealing with a non-positive definite (NPD) correlation matrix as input in SEM?

Thanks,
Simon

Terrence Jorgensen

Jun 24, 2024, 4:06:44 PM
to lavaan
> But what other options are there for dealing with a non-positive definite (NPD) correlation matrix as input in SEM?

See the ridge= description on the ?lavOptions help page.
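In general terms, ridging means adding a small positive constant to the diagonal before fitting, which raises every eigenvalue by that constant. The base-R sketch below is a generic illustration of that idea only, not lavaan's implementation; which matrix lavaan ridges, and with what constant, is what the `?lavOptions` page documents. `ridge_cor()` and the value `k = 0.3` are hypothetical. Note that rescaling back to a unit diagonal shrinks every off-diagonal entry by 1 / (1 + k), so the attenuation caveat raised earlier in the thread arguably applies here too.

```r
# Hypothetical helper: generic ridge adjustment of a correlation matrix.
# Adding k to the diagonal raises every eigenvalue by k; dividing by (1 + k)
# restores a unit diagonal and shrinks each off-diagonal entry by 1 / (1 + k).
ridge_cor <- function(r, k) {
  (r + k * diag(nrow(r))) / (1 + k)
}

r <- matrix(c(1.0, 0.9, 0.9,
              0.9, 1.0, 0.0,
              0.9, 0.0, 1.0), nrow = 3)
min(eigen(r, symmetric = TRUE, only.values = TRUE)$values)                  # about -0.27
min(eigen(ridge_cor(r, 0.3), symmetric = TRUE, only.values = TRUE)$values)  # positive
```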

Terrence D. Jorgensen    (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

Jeremy Miles

Jun 24, 2024, 4:49:44 PM
to lav...@googlegroups.com
My preference would be to work out why I've got an NPD matrix and fix that.

In my experience it's one of: 
 - pairwise deletion (which is a bad idea anyway)
 - using polychoric correlations and something has gone wrong (usually associated with small sample size)
 - I've made some sort of mistake with the data (put the same variable in twice, put in a variable that's a sum of others). 
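Some of these causes can be spotted directly from the matrix, as sketched below with simulated (hypothetical) data. A variable that is an exact sum of others makes the correlation matrix singular, so its smallest eigenvalue is zero up to rounding. A diagonal that is not all 1 means the input is not a correlation matrix at all. The latter applies to the `NPD_MATRIX` posted at the top of the thread: its diagonal ranges from 0.32 to 1, and reading it as a covariance matrix would imply, for instance, a correlation of 0.58 / sqrt(0.58 × 0.32) ≈ 1.35 between L2G and L2M. `check_cor_input()` and its tolerance are hypothetical choices for this example.

```r
# Hypothetical helper: quick checks for common causes of a non-PD input matrix.
check_cor_input <- function(r, tol = 1e-8) {
  ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
  list(
    symmetric = isSymmetric(unname(r)),
    unit_diagonal = isTRUE(all.equal(unname(diag(r)), rep(1, nrow(r)))),
    min_eigen = min(ev),
    near_singular = min(ev) < tol * max(ev)   # e.g. a variable that is a sum of others
  )
}

# Simulated data: x3 is the exact sum of x1 and x2
set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200)
dat <- data.frame(x1, x2, x3 = x1 + x2)
check_cor_input(cor(dat))              # near_singular is TRUE

# The L2G/L2M block of NPD_MATRIX (values from the first post)
lg_lm <- matrix(c(0.58, 0.58,
                  0.58, 0.32), nrow = 2)
check_cor_input(lg_lm)$unit_diagonal   # FALSE
cov2cor(lg_lm)[1, 2]                   # about 1.35, an impossible correlation
```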

Jeremy

