Reporting a polychoric correlation matrix in an article

M. C.

May 9, 2016, 4:57:56 PM

to lavaan

Hello,

I am doing a confirmatory questionnaire validation study with skewed data (n = 268), so I used lavaan's MLM estimator. My understanding is that this estimator is based on a polychoric correlation matrix. Is that correct?

In the article Reporting Practices in Confirmatory Factor Analysis: An Overview and Some Recommendations, Jackson et al. (2009) advise reporting the correlation matrix in CFA papers for the sake of replication (http://www.ncbi.nlm.nih.gov/pubmed/19271845).

Based on this example: http://rpackages.ianhowson.com/cran/lavaan/man/lavCor.html, I was able to create such a polychoric correlation matrix with the following commands (5-item scale):

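# Bin each variable into 5 equal-width intervals, coded as integers 1 to 5
SAS_DATA_ORD <- as.data.frame( lapply(SAS_DATA, cut, 5, labels=FALSE) )
# Estimate the polychoric correlations, treating every variable as ordered
lavCor(SAS_DATA_ORD, ordered=names(SAS_DATA_ORD))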

It is my understanding that this set of commands creates a matrix based on an unrestricted model. However, such a correlation matrix would be more suitable for an exploratory factor analysis (EFA) than for a confirmatory factor analysis (CFA), since CFA is based on a restricted model.

Therefore, would it be more appropriate to report the correlation matrix based on the restricted model that the CFA tests? How can I create such a polychoric correlation matrix based on my restricted model?
Is the command inspect(SAS.FIT, "sampstat")$cov appropriate for this?

I would appreciate any guidance on this subject.

Thank you,

Michael

Terrence Jorgensen

May 10, 2016, 3:36:27 AM

to lavaan

I used lavaan's MLM estimator. My understanding is that this estimator is based on a polychoric correlation matrix. Is that correct?

No, MLM is regular ML estimation. Afterward, the standard errors and chi-squared statistic are adjusted (the Satorra-Bentler correction) as a function of the excess kurtosis of the variables.
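
To illustrate the point, here is a minimal sketch that fits the same model with ML and with MLM and compares the loadings. It assumes lavaan's built-in HolzingerSwineford1939 data and its textbook three-factor model as stand-ins for SAS_DATA and the actual model; the helper get_loadings is hypothetical.

```r
library(lavaan)

# Illustrative three-factor model on lavaan's built-in example data
# (a stand-in, not the poster's SAS_DATA or model)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit_ml  <- cfa(model, data = HolzingerSwineford1939, estimator = "ML")
fit_mlm <- cfa(model, data = HolzingerSwineford1939, estimator = "MLM")

# Hypothetical helper: pull the unstandardized factor loadings from a fit
get_loadings <- function(fit) {
  pe <- parameterEstimates(fit)
  pe$est[pe$op == "=~"]
}

# The point estimates should agree, because both fits use the ML discrepancy
same_loadings <- isTRUE(all.equal(get_loadings(fit_ml),
                                  get_loadings(fit_mlm),
                                  tolerance = 1e-4))
print(same_loadings)

# MLM additionally reports a scaled chi-squared statistic
chisq_both <- fitMeasures(fit_mlm, c("chisq", "chisq.scaled"))
print(chisq_both)
```

No polychoric correlations are involved at any step; only the standard errors and the test statistic differ between the two fits.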

In the article Reporting Practices in Confirmatory Factor Analysis: An Overview and Some Recommendations, Jackson et al. (2009) advise reporting the correlation matrix in CFA papers for the sake of replication (http://www.ncbi.nlm.nih.gov/pubmed/19271845).

You should analyze the covariance matrix. Fitting the model to the correlation matrix will yield incorrect standard errors. If you report the correlation matrix, you should also report the SDs for the sake of replication. Since you are using a robust estimator, you should also report the skew and kurtosis, although I'm not sure whether those univariate statistics would be sufficient to replicate the analysis.
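
As a rough sketch of what could go into a descriptive table, the following computes the covariance matrix, the SDs, the correlation matrix, and simple moment-based skew and excess kurtosis. It again assumes the built-in HolzingerSwineford1939 items as a stand-in for the real data, and the skew/kurtosis formulas are plain moment estimators, not necessarily what any particular package reports.

```r
library(lavaan)

# Stand-in data: nine continuous items from lavaan's example dataset
items <- HolzingerSwineford1939[, paste0("x", 1:9)]

S   <- cov(items)        # sample covariance matrix (N - 1 denominator)
sds <- sqrt(diag(S))     # standard deviations
R   <- cov2cor(S)        # correlation matrix

# A reader given only R and the SDs can rebuild the covariance matrix
S_rebuilt <- cor2cov(R, sds)
max_diff  <- max(abs(unclass(S_rebuilt) - S))
print(max_diff)

# Simple moment-based univariate skew and excess kurtosis per item
moments <- t(sapply(items, function(x) {
  z <- (x - mean(x)) / sd(x)
  c(skew = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}))
print(round(moments, 2))
```

This shows why the correlation matrix plus SDs is equivalent to the covariance matrix for replication purposes, while the skew and kurtosis values remain descriptive only.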

Based on this example: http://rpackages.ianhowson.com/cran/lavaan/man/lavCor.html, I was able to create such a polychoric correlation matrix with the following commands (5-item scale):

SAS_DATA_ORD <- as.data.frame( lapply(SAS_DATA, cut, 5, labels=FALSE) )
lavCor(SAS_DATA_ORD, ordered=names(SAS_DATA_ORD))

It is my understanding that this set of commands creates a matrix based on an unrestricted model. However, such a correlation matrix would be more suitable for an exploratory factor analysis (EFA) than for a confirmatory factor analysis (CFA), since CFA is based on a restricted model.

EFA is less restricted than CFA because EFA allows all cross-loadings (or as many as possible to identify the model). But EFA still imposes some restrictions, which is why df > 0 (unless it is saturated). The "unrestricted" model you read about in the ?lavCor help page is just a saturated model, which freely estimates all bivariate correlations.

But you might not need to do this. Analyzing a polychoric correlation matrix is recommended for binary or ordinal indicators, not for indicators that are continuous or at least approximately continuous (e.g., Likert items with at least 7 categories). And if you have categorical indicators, you don't need to estimate the polychoric correlation matrix yourself. You just use the "ordered" argument to tell lavaan which variables are categorical, and it will take the appropriate steps by default.
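
A minimal sketch of the "ordered" route, using simulated 5-category items because the real data are not available here. The sample size, loadings, thresholds, variable names, and the helper make_item are all assumptions made up for the illustration.

```r
library(lavaan)

set.seed(2016)
n   <- 1000
eta <- rnorm(n)  # simulated latent factor scores

# Hypothetical helper: build one 5-category ordinal item from a latent response
make_item <- function(loading) {
  y_star <- loading * eta + rnorm(n, sd = sqrt(1 - loading^2))
  cut(y_star, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf), labels = FALSE)
}

dat <- data.frame(y1 = make_item(0.7), y2 = make_item(0.7),
                  y3 = make_item(0.6), y4 = make_item(0.8),
                  y5 = make_item(0.7))

# Declaring the items as ordered makes lavaan switch to its categorical
# defaults; no separate call to lavCor() is needed
fit_ord <- cfa('f =~ y1 + y2 + y3 + y4 + y5', data = dat,
               ordered = names(dat))

# The sample statistics stored in the fit are the polychoric correlations
polychoric <- lavInspect(fit_ord, "sampstat")$cov
print(round(polychoric, 2))
```

The matrix extracted at the end is the one a reader would need if the categorical analysis were the one being reported.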

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen

Mic Cantinotti

May 10, 2016, 9:51:19 AM

to lavaan

Thank you for your explanation.

If I understand correctly, I need to report the sample (observed) covariance matrix. Is this the right syntax to get that matrix?

inspect(SAS.FIT, 'sampstat')

In the lavaan tutorial, it is indeed stated that "If you have no full dataset, but you do have a sample covariance matrix, you can still fit your model." (http://lavaan.ugent.be/tutorial/cov.html)

My data are measured on an ordinal scale with 5 levels. I tried to use WLSMV, but my sample was not big enough for the solution to converge (WARNING: number of observation too small to compute Gamma). Therefore, my second-best choice was MLM, and I treated the scale as continuous, even if this can be criticized. I followed the guidance provided here: https://groups.google.com/forum/#!topic/lavaan/pDJV6HNN9vc (message dated 21/11/2013).

Terrence Jorgensen

May 11, 2016, 5:19:10 AM

to lavaan

I need to report the sample (observed) covariance matrix. Is this the right syntax to get that matrix?

inspect(SAS.FIT, 'sampstat')

Yes
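
One detail that may matter when reporting: a short sketch, assuming the built-in HolzingerSwineford1939 data and model in place of SAS.FIT, comparing the matrix stored in the fit with R's cov(). With lavaan's default normal likelihood, the stored matrix is expected to use N rather than N - 1 in the denominator.

```r
library(lavaan)

# Stand-in model and data (not the poster's SAS.FIT)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939, estimator = "MLM")

# lavInspect() is the current name for the extractor; inspect() also works
S_lavaan <- lavInspect(fit, "sampstat")$cov

vars <- lavNames(fit, "ov")     # observed variable names, in model order
n    <- lavInspect(fit, "nobs") # number of observations used
S_r  <- cov(HolzingerSwineford1939[, vars])

# Difference after rescaling R's N - 1 covariance to an N denominator
max_diff <- max(abs(unclass(S_lavaan) - S_r * (n - 1) / n))
print(max_diff)
```

If the difference is essentially zero, it is worth stating in the article which denominator the reported matrix uses.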

In the lavaan tutorial, it is indeed reported that "If you have no full dataset, but you do have a sample covariance matrix, you can still fit your model." (http://lavaan.ugent.be/tutorial/cov.html)

That page says nothing about robust estimation. The example there assumes data are normally distributed.
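
For comparison, a sketch of what a reader can reproduce from a published covariance matrix alone, again assuming the built-in HolzingerSwineford1939 data and model as stand-ins. It fits plain ML from raw data and from the summary statistics; the robust (MLM) corrections are not attempted from the matrix because they require the raw data.

```r
library(lavaan)

# Stand-in model and data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
vars <- paste0("x", 1:9)
S    <- cov(HolzingerSwineford1939[, vars])  # what an article would report
n    <- nrow(HolzingerSwineford1939)

# Normal-theory ML from the raw data and from the summary statistics
fit_raw <- cfa(model, data = HolzingerSwineford1939, estimator = "ML")
fit_cov <- cfa(model, sample.cov = S, sample.nobs = n, estimator = "ML")

chisq_raw <- as.numeric(fitMeasures(fit_raw, "chisq"))
chisq_cov <- as.numeric(fitMeasures(fit_cov, "chisq"))
print(c(raw = chisq_raw, from_cov = chisq_cov))
```

The two uncorrected chi-squared values should match, which is all that the covariance matrix by itself allows a reader to verify.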

My data are measured on an ordinal scale with 5 levels. I tried to use WLSMV, but my sample was not big enough for the solution to converge (WARNING: number of observation too small to compute Gamma). Therefore, my second-best choice was MLM, and I treated the scale as continuous, even if this can be criticized. I followed the guidance provided here: https://groups.google.com/forum/#!topic/lavaan/pDJV6HNN9vc (message dated 21/11/2013).

In your previous post, you were using the "cut" function to turn SAS_DATA into ordered categories. I hope you are not doing that to your actual observed data. If your actual data is continuous, leave it that way. This research suggests you can get good estimates of structural parameters (e.g., factor correlations) when using robust MLE with ordinal indicators if they have at least 5 categories. But your research is about the measurement model, and that same article shows that more than 5 categories would be necessary for good estimates of factor loadings. If you can't even run WLSMV because of your small sample and you can't get more data, then you should be aware (and make your readers aware) that the ML estimates are probably attenuated.
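
The attenuation can be illustrated with a single simulated dataset. This is only a sketch under made-up assumptions (true standardized loadings of 0.7, skewed 5-category items, n = 2000, hypothetical helpers make_item and std_loadings) and says nothing about the size of the effect in the actual data.

```r
library(lavaan)

set.seed(2016)
n   <- 2000
eta <- rnorm(n)  # simulated latent factor scores

# Hypothetical helper: skewed 5-category item with a true loading of 0.7
make_item <- function(loading = 0.7) {
  y_star <- loading * eta + rnorm(n, sd = sqrt(1 - loading^2))
  # Asymmetric thresholds pile most responses into the lowest categories
  cut(y_star, breaks = c(-Inf, 0, 0.7, 1.2, 1.7, Inf), labels = FALSE)
}
dat <- data.frame(y1 = make_item(), y2 = make_item(), y3 = make_item(),
                  y4 = make_item(), y5 = make_item())

model <- 'f =~ y1 + y2 + y3 + y4 + y5'

# Treat the items as continuous (robust ML) versus as ordered categories
fit_cont <- cfa(model, data = dat, estimator = "MLM")
fit_ord  <- cfa(model, data = dat, ordered = names(dat))

# Hypothetical helper: standardized loadings from a fit
std_loadings <- function(fit) {
  ss <- standardizedSolution(fit)
  ss$est.std[ss$op == "=~"]
}

mean_cont <- mean(std_loadings(fit_cont))
mean_ord  <- mean(std_loadings(fit_ord))
print(c(continuous = mean_cont, ordered = mean_ord))
```

In a setup like this, the standardized loadings from the continuous treatment are expected to come out lower than those from the categorical treatment, which is the attenuation readers should be warned about.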

Mic Cantinotti

May 11, 2016, 11:03:08 AM

to lavaan

Hello,

The transformation of the data to ordered categories was only based on my erroneous assumption that I needed to provide the reader with a polychoric correlation matrix. I was not aiming at transforming the data for the CFA.

Thank you for your advice and the article. It is much appreciated.
