GOF indices for a probit model with ordinal covariates in lavaan

lemos.mari...@gmail.com

unread,

Jun 28, 2021, 12:20:43 PM6/28/21

to lavaan

I am trying to compute a probit regression model where all the covariates are categorical. Here are the model specification, declaration of ordered categorical variables, and call to "sem":

model <- 'CHG.UNAFFILIATED ~ CHG.UNBELIEVER + CONFIDENCE.CHURCHES + POWER.CHURCHES + REL.RELEVANCE.TODAY + REL.REPRESENTS.PAST + NUM.UNAFFILIATED.PARENTS + ATTEND.11.12 + SEX + AGE.GROUP + DEGREE + HHCHILDR + URBRURAL'

ordered <- c("CHG.UNAFFILIATED","CHG.UNBELIEVER","CONFIDENCE.CHURCHES","POWER.CHURCHES","REL.RELEVANCE.TODAY","REL.REPRESENTS.PAST","ATTEND.11.12","SEX","AGE.GROUP","DEGREE","URBRURAL","NUM.UNAFFILIATED.PARENTS","HHCHILDR")

> fit <- sem(model, data = df, ordered = ordered, std.lv = TRUE, estimator = "WLSMV")

Warning message:

In lav_data_full(data = data, group = group, cluster = cluster, :

lavaan WARNING: exogenous variable(s) declared as ordered in data: CHG.UNBELIEVER CONFIDENCE.CHURCHES POWER.CHURCHES REL.RELEVANCE.TODAY REL.REPRESENTS.PAST NUM.UNAFFILIATED.PARENTS ATTEND.11.12 SEX AGE.GROUP DEGREE HHCHILDR URBRURAL

When compared with the results of the “glm” function with probit link the estimates regression coefficients, standard errors and p-values look OK:

> summary(fit)

lavaan 0.6-8 ended normally after 45 iterations

Estimator DWLS

Optimization method NLMINB

Number of model parameters 13

Used Total

Number of observations 928 1334

Model Test User Model:

Standard Robust

Test Statistic 0.000 0.000

Degrees of freedom 0 0

Parameter Estimates:

Standard errors Robust.sem

Information Expected

Information saturated (h1) model Unstructured

Regressions:

Estimate Std.Err z-value P(>|z|)

CHG.UNAFFILIATED ~

CHG.UNBELIEVER 0.864 0.150 5.740 0.000

CONFIDENCE.CHU 0.032 0.081 0.397 0.692

POWER.CHURCHES 0.004 0.089 0.045 0.964

REL.RELEVANCE. -0.009 0.102 -0.090 0.928

REL.REPRESENTS -0.216 0.094 -2.296 0.022

NUM.UNAFFILIAT 1.105 0.084 13.179 0.000

ATTEND.11.12 0.255 0.056 4.539 0.000

SEX 0.135 0.117 1.154 0.248

AGE.GROUP 0.456 0.105 4.340 0.000

DEGREE -0.264 0.115 -2.305 0.021

HHCHILDR -0.671 0.319 -2.106 0.035

URBRURAL -0.045 0.047 -0.940 0.347

Intercepts:

Estimate Std.Err z-value P(>|z|)

.CHG.UNAFFILIAT 0.000

Thresholds:

Estimate Std.Err z-value P(>|z|)

CHG.UNAFFILIAT 3.312 0.645 5.137 0.000

Variances:

Estimate Std.Err z-value P(>|z|)

.CHG.UNAFFILIAT 1.000

Scales y*:

Estimate Std.Err z-value P(>|z|)

CHG.UNAFFILIAT 1.000

However, the intercept and variance of the single response variable (CHG.UNAFFILIATED) are fixed to one and zero, respectively, and when I try to obtain the fit indices usually reported I obtain the following:

> round(inspect(fit, what = "fit")[indices],3)

chisq df pvalue cfi rmsea rmsea.ci.lower rmsea.ci.upper srmr

0 1 1 1 0 0 0 0

How can I specify the model so that it provides measures of the overall model misfit like the Chisq and RMSEA in this situation?

Thank you very much for your attention.

Sincerely,

Carlos M. Lemos

Pat Malone

unread,

Jun 28, 2021, 1:25:11 PM6/28/21

to lav...@googlegroups.com

Carlos,

Two things.

1) In SEM, exogenous variables are treated as interval, even if they are not continuous. That is why lavaan rejects declaring them as categorical. This is generally fine for continuous and binary variables, but not for multi-category discrete variables, whether ordinal or nominal. For those, you need to create dummy or effect codes.

2) For your other question, your model is saturated and necessarily fits perfectly. So you are getting fit measures, but they look odd because there are no degrees of freedom that would you give you basis to reject the model.

As a side note, regression procedures outside SEM almost always use saturated models. Since your model is a univariate multiple regression, your results should be parallel to those of glm (aside from differences introduced by the estimator or missing data handling).

Pat

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/5159863c-c387-4c55-8727-1576010666f0n%40googlegroups.com.

--

Patrick S. Malone, PhD

Sr Research Statistician, FAR HARBΦR

+1 803.553.4181 | pat@ | farharbor.com

This message may contain confidential information; if you are not the intended recipient please notify the sender and delete the message.

Carlos Lemos

unread,

Jun 29, 2021, 5:23:26 AM6/29/21

to lav...@googlegroups.com

Dear Pat,

Thank you very much for your attention and prompt reply.

About 1), I thought that “lavaan” was handling the ordinal covariates correctly because the program issued a warning and not an error message.

About 2), I noticed that the model is saturated, but my problem is precisely on how to overcome this situation (of course, the model is not predicting the response exactly). With “glm”, the same model is not saturated, and I get the deviance D, from which I can derive other measures of model misfit. In the “lavaan” model, the variance and intercept of the response variable are both fixed. “lavaan” works from the variance-covariance matrix, which is different from “glm”.

I want to compare this basic probit regression model with both "glm" and one or two SEM based on the same variables, to see whether the SEMs provide evidence on the plausibility of the causal hypotheses embodied in them. For this, I need a lavaan probit model that yields measures of overall misfit.

I will try your suggestion of generating dummy variables for the ordinal covariates, testing first with one or two variables, and see if this change leads to a model that is not saturated.

Once again, many thanks for your time and attention,

Carlos M. Lemos

To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAJhmz4fdiuEV%2B0PH2mFiA_WLuseH2eF9gsoegE3rRZQh7y_%3DSg%40mail.gmail.com.

Pat Malone

unread,

Jun 29, 2021, 8:40:20 AM6/29/21

to lav...@googlegroups.com

Carlos,

Since your lavaan model as specified is a generalized linear multiple regression, the only way to introduce misfit would be to constrain one or more of the regression coefficients. This seems pretty arbitrary.

And getting a deviance statistic from glm() does not mean the model is not saturated. You didn't share your glm() function call, but unless you went out of your way (such as my previous paragraph), it is saturated. If you ask for fit measures from lavaan, you'll also see -2LL, AIC, etc., even from a saturated model.

Pat

To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAApGuxpt8SnbtmGEaGh-zWVX0BU%3DRTH7idEL%2BpcUF%3DU6n6pNdA%40mail.gmail.com.

Carlos Lemos

unread,

Jun 29, 2021, 1:19:37 PM6/29/21

to lav...@googlegroups.com

Dear Pat,

Thank you very much for your time and patience in hanging on and helping me!

Here is the call to “glm” (I used probit link so that the results could be more closely compared with those by “lavaan”):

> fit <- glm(model, family = binomial(link = "probit"), data = na.omit(df))

The formula is as follows, with the same data frame and variables:

> fit$formula

[1] "CHG.UNAFFILIATED ~ CHG.UNBELIEVER + CONFIDENCE.CHURCHES + POWER.CHURCHES + REL.RELEVANCE.TODAY + REL.REPRESENTS.PAST + NUM.UNAFFILIATED.PARENTS + ATTEND.11.12 + SEX + AGE.GROUP + DEGREE + HHCHILDR + URBRURAL"

The model computed by “glm” has 28 parameters

> fit$rank

[1] 28

and is not saturated because

> fit$df.residual

[1] 703

i.e. there are a lot more data points than coefficients (parameters) to be estimated. With this, I can get several GOF indices, for example:

> with(fit,c(deviance,aic))

[1] 559.2648 615.2648

Returning to “lavaan”, I inspected the solution (also named "fit") in my first post:

> fitted(fit)

$res.cov

CHG.UN

CHG.UNAFFILIATED 1

$res.int

CHG.UNAFFILIATED

0

$res.th

CHG.UNAFFILIATED|t1

3.312

$res.slopes

CHG.UN CONFID POWER. REL.REL REL.REP NUM.UN ATTEND SEX AGE.GR DEGREE HHCHIL URBRUR

CHG.UNAFFILIATED 0.864 0.032 0.004 -0.009 -0.216 1.105 0.255 0.135 0.456 -0.264 -0.671 -0.045

$cov.x

CHG.UN CONFID POWER. REL.REL REL.REP NUM.UN ATTEND SEX AGE.GR DEGREE HHCHIL URBRUR

CHG.UNBELIEVER 0.188

CONFIDENCE.CHURCHES -0.008 0.656

POWER.CHURCHES 0.011 0.287 0.570

REL.RELEVANCE.TODAY 0.012 0.058 0.033 0.369

REL.REPRESENTS.PAST 0.006 0.150 0.129 0.132 0.442

NUM.UNAFFILIATED.PARENTS 0.058 0.004 -0.002 -0.017 -0.018 0.615

ATTEND.11.12 0.115 -0.103 -0.035 -0.063 -0.091 0.184 0.935

SEX 0.013 -0.008 -0.019 -0.007 -0.027 -0.013 0.015 0.234

AGE.GROUP 0.009 -0.031 -0.020 -0.032 -0.033 0.060 0.017 -0.011 0.282

DEGREE -0.008 -0.065 -0.024 0.007 -0.008 -0.018 0.003 0.015 0.018 0.240

HHCHILDR 0.002 0.001 0.001 -0.002 0.000 0.003 0.009 -0.001 -0.002 -0.001 0.024

URBRURAL 0.012 -0.069 -0.025 0.074 0.025 -0.085 -0.012 0.001 -0.075 0.090 -0.010 1.268

$mean.x

CHG.UNBELIEVER CONFIDENCE.CHURCHES POWER.CHURCHES REL.RELEVANCE.TODAY REL.REPRESENTS.PAST NUM.UNAFFILIATED.PARENTS

1.250 2.087 1.909 2.722 2.571 2.505

ATTEND.11.12 SEX AGE.GROUP DEGREE HHCHILDR URBRURAL

2.078 1.371 2.097 1.602 1.025 2.724

and found that the only free parameters are the threshold of the response variable and the regression coefficients. Using inspect(fit, what = “free”), I found that the whole variance-covariance is fixed. I guess that is the reason why lavaan estimates the regression coefficients, but cannot compute indices like chisq, rmsea, etc. in this situation. I will see if I can free for example the variance of the response variable, or other parameters, in a meaningful way.

Once again, many thanks for your time and help,

Carlos M. Lemos

To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAJhmz4c3BoVaUiTv%2BmNZOsaFehEgN%3DP3VazkwHdgpt0T%3DBZwMQ%40mail.gmail.com.

Pat Malone

unread,

Jun 29, 2021, 1:32:44 PM6/29/21

to lav...@googlegroups.com

You're welcome.

And if it helps, df means something completely different in the two analyses. df in SEM packages doesn't depend on N. And as I said, you can get things like AIC from a saturated model in lavaan by asking for fit measures. D and AIC are not measures of absolute fit--you can't reject a model with them. They're only useful for relative fit.

Pat

To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAApGuxoPE1wkxGiG2-wbRkzR_o9o1W4Q-sGF0%3Dv0hshggUshdw%40mail.gmail.com.

Reply all

Reply to author

Forward