I am trying to compute a probit regression model where all the covariates are categorical. Here are the model specification, declaration of ordered categorical variables, and call to "sem":
model <- 'CHG.UNAFFILIATED ~ CHG.UNBELIEVER + CONFIDENCE.CHURCHES + POWER.CHURCHES + REL.RELEVANCE.TODAY + REL.REPRESENTS.PAST + NUM.UNAFFILIATED.PARENTS + ATTEND.11.12 + SEX + AGE.GROUP + DEGREE + HHCHILDR + URBRURAL'
ordered <- c("CHG.UNAFFILIATED","CHG.UNBELIEVER","CONFIDENCE.CHURCHES","POWER.CHURCHES","REL.RELEVANCE.TODAY","REL.REPRESENTS.PAST","ATTEND.11.12","SEX","AGE.GROUP","DEGREE","URBRURAL","NUM.UNAFFILIATED.PARENTS","HHCHILDR")
> fit <- sem(model,
data = df, ordered = ordered, std.lv = TRUE, estimator = "WLSMV")
Warning message:
In lav_data_full(data = data, group = group, cluster = cluster, :
lavaan WARNING: exogenous variable(s) declared as ordered in data: CHG.UNBELIEVER CONFIDENCE.CHURCHES POWER.CHURCHES REL.RELEVANCE.TODAY REL.REPRESENTS.PAST NUM.UNAFFILIATED.PARENTS ATTEND.11.12 SEX AGE.GROUP DEGREE HHCHILDR URBRURAL
When compared with the results of the “glm” function with probit link the estimates regression coefficients, standard errors and p-values look OK:
> summary(fit)
lavaan 0.6-8 ended normally after 45 iterations
Estimator DWLS
Optimization method NLMINB
Number of model parameters 13
Used Total
Number of observations 928 1334
Model Test User Model:
Standard Robust
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Parameter Estimates:
Standard errors Robust.sem
Information Expected
Information saturated (h1) model Unstructured
Regressions:
Estimate Std.Err z-value P(>|z|)
CHG.UNAFFILIATED ~
CHG.UNBELIEVER 0.864 0.150 5.740 0.000
CONFIDENCE.CHU 0.032 0.081 0.397 0.692
POWER.CHURCHES 0.004 0.089 0.045 0.964
REL.RELEVANCE. -0.009 0.102 -0.090 0.928
REL.REPRESENTS -0.216 0.094 -2.296 0.022
NUM.UNAFFILIAT 1.105 0.084 13.179 0.000
ATTEND.11.12 0.255 0.056 4.539 0.000
SEX 0.135 0.117 1.154 0.248
AGE.GROUP 0.456 0.105 4.340 0.000
DEGREE -0.264 0.115 -2.305 0.021
HHCHILDR -0.671 0.319 -2.106 0.035
URBRURAL -0.045 0.047 -0.940 0.347
Intercepts:
Estimate Std.Err z-value P(>|z|)
.CHG.UNAFFILIAT 0.000
Thresholds:
Estimate Std.Err z-value P(>|z|)
CHG.UNAFFILIAT 3.312 0.645 5.137 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.CHG.UNAFFILIAT 1.000
Scales y*:
Estimate Std.Err z-value P(>|z|)
CHG.UNAFFILIAT 1.000
However, the intercept and variance of the single response variable (CHG.UNAFFILIATED) are fixed to one and zero, respectively, and when I try to obtain the fit indices usually reported I obtain the following:
> round(inspect(fit, what = "fit")[indices],3)
chisq df pvalue cfi rmsea rmsea.ci.lower rmsea.ci.upper srmr
0 1 1 1 0 0 0 0
How can I specify the model so that it provides measures of the overall model misfit like the Chisq and RMSEA in this situation?
Thank you very much for your attention.
Sincerely,
Carlos M. Lemos
--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/5159863c-c387-4c55-8727-1576010666f0n%40googlegroups.com.
Dear Pat,
Thank you very much for your attention and prompt reply.
About 1), I thought that “lavaan” was handling the ordinal covariates correctly because the program issued a warning and not an error message.
About 2), I noticed that the model is saturated, but my problem is precisely on how to overcome this situation (of course, the model is not predicting the response exactly). With “glm”, the same model is not saturated, and I get the deviance D, from which I can derive other measures of model misfit. In the “lavaan” model, the variance and intercept of the response variable are both fixed. “lavaan” works from the variance-covariance matrix, which is different from “glm”.
I want to compare this basic probit regression model with both "glm" and one or two SEM based on the same variables, to see whether the SEMs provide evidence on the plausibility of the causal hypotheses embodied in them. For this, I need a lavaan probit model that yields measures of overall misfit.
I will try your suggestion of generating dummy variables for the ordinal covariates, testing first with one or two variables, and see if this change leads to a model that is not saturated.
Once again, many thanks for your time and attention,
Carlos M. Lemos
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAJhmz4fdiuEV%2B0PH2mFiA_WLuseH2eF9gsoegE3rRZQh7y_%3DSg%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAApGuxpt8SnbtmGEaGh-zWVX0BU%3DRTH7idEL%2BpcUF%3DU6n6pNdA%40mail.gmail.com.
Dear Pat,
Thank you very much for your time and patience in hanging on and helping me!
Here is the call to “glm” (I used probit link so that the results could be more closely compared with those by “lavaan”):
> fit <- glm(model, family = binomial(link = "probit"), data = na.omit(df))
The formula is as follows, with the same data frame and variables:
> fit$formula
[1] "CHG.UNAFFILIATED ~ CHG.UNBELIEVER + CONFIDENCE.CHURCHES + POWER.CHURCHES + REL.RELEVANCE.TODAY + REL.REPRESENTS.PAST + NUM.UNAFFILIATED.PARENTS + ATTEND.11.12 + SEX + AGE.GROUP + DEGREE + HHCHILDR + URBRURAL"
The model computed by “glm” has 28 parameters
> fit$rank
[1] 28
and is not saturated because
> fit$df.residual
[1] 703
i.e. there are a lot more data points than coefficients (parameters) to be estimated. With this, I can get several GOF indices, for example:
> with(fit,c(deviance,aic))
[1] 559.2648 615.2648
Returning to “lavaan”, I inspected the solution (also named "fit") in my first post:
> fitted(fit)
$res.cov
CHG.UN
CHG.UNAFFILIATED 1
CHG.UNAFFILIATED
0
CHG.UNAFFILIATED|t1
3.312
$res.slopes
CHG.UN CONFID POWER. REL.REL REL.REP NUM.UN ATTEND SEX AGE.GR DEGREE HHCHIL URBRUR
CHG.UNAFFILIATED 0.864 0.032 0.004 -0.009 -0.216 1.105 0.255 0.135 0.456 -0.264 -0.671 -0.045
$cov.x
CHG.UN CONFID POWER. REL.REL REL.REP NUM.UN ATTEND SEX AGE.GR DEGREE HHCHIL URBRUR
CHG.UNBELIEVER 0.188
CONFIDENCE.CHURCHES -0.008 0.656
POWER.CHURCHES 0.011 0.287 0.570
REL.RELEVANCE.TODAY 0.012 0.058 0.033 0.369
REL.REPRESENTS.PAST 0.006 0.150 0.129 0.132 0.442
NUM.UNAFFILIATED.PARENTS 0.058 0.004 -0.002 -0.017 -0.018 0.615
ATTEND.11.12 0.115 -0.103 -0.035 -0.063 -0.091 0.184 0.935
SEX 0.013 -0.008 -0.019 -0.007 -0.027 -0.013 0.015 0.234
AGE.GROUP 0.009 -0.031 -0.020 -0.032 -0.033 0.060 0.017 -0.011 0.282
DEGREE -0.008 -0.065 -0.024 0.007 -0.008 -0.018 0.003 0.015 0.018 0.240
HHCHILDR 0.002 0.001 0.001 -0.002 0.000 0.003 0.009 -0.001 -0.002 -0.001 0.024
URBRURAL 0.012 -0.069 -0.025 0.074 0.025 -0.085 -0.012 0.001 -0.075 0.090 -0.010 1.268
$mean.x
CHG.UNBELIEVER CONFIDENCE.CHURCHES POWER.CHURCHES REL.RELEVANCE.TODAY REL.REPRESENTS.PAST NUM.UNAFFILIATED.PARENTS
1.250 2.087 1.909 2.722 2.571 2.505
ATTEND.11.12 SEX AGE.GROUP DEGREE HHCHILDR URBRURAL
2.078 1.371 2.097 1.602 1.025 2.724
and found that the only free parameters are the threshold of the response variable and the regression coefficients. Using inspect(fit, what = “free”), I found that the whole variance-covariance is fixed. I guess that is the reason why lavaan estimates the regression coefficients, but cannot compute indices like chisq, rmsea, etc. in this situation. I will see if I can free for example the variance of the response variable, or other parameters, in a meaningful way.
Once again, many thanks for your time and help,
Carlos M. Lemos
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAJhmz4c3BoVaUiTv%2BmNZOsaFehEgN%3DP3VazkwHdgpt0T%3DBZwMQ%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAApGuxoPE1wkxGiG2-wbRkzR_o9o1W4Q-sGF0%3Dv0hshggUshdw%40mail.gmail.com.