Binary endogenous variable in SEM and rsquare issue


Rosie

Apr 6, 2020, 2:58:28 PM
to lavaan
Hello,

I'm trying to run a model (a replication of a previous paper) in lavaan using the sem() function. I have four endogenous variables (three continuous and one binary), five exogenous variables, and one covariate. My sample size is N = 1172. I believe my issue has to do with how I specify the binary variable: when I run the model without it (with three endogenous variables), it works just fine. However, when I include the binary variable (coded 0/1), I get these warnings:

lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
lavaan WARNING: could not invert information matrix needed for robust test statistic



My model is as follows:

___________________________________________________________________________________________________

mymodel <- '

#endogenous variables 

endo1 =~ sds_q1 + sds_q4 +  sds_q6 + sds_q8

endo2 =~ sds_q2 + sds_q3 + sds_q5 + sde_q7

endo3 =~ dis_q1 + dis_q2 + dis_q3 + dis_q4

endo4 =~ sh_q2               #categorical variable (0/1)

#exogenous variables

exo1 =~ sresil_q1 + sresil_q2 + sresil_q5 + sresil_q9 + sresil_q11 + sresil_q13 + sresil_q19 + sresil_q20 + sresil_q25

exo2 =~ sresil_q4 +sresil_q8 + sresil_q10 + sresil_q12 + sresil_q14 + sresil_q18 + sresil_q23 + sresil_q24

exo3 =~ sresil_q3 +  sresil_q6 + sresil_q7 + sresil_q11 + sresil_q13 + sresil_q15 + sresil_q16 + sresil_q17 + sresil_q21 + sresil_q22  

exo4 =~ in_q1 + in_q2 + in_q3 + in_q4 + in_q5

exo5 =~ inq_6 + in_q7 + in_q8 + in_q9 + in_q10

#covariate
covariate =~ csds_q1 + csds_q2 + csds_q3 + csds_q4 + csds_q5 + csds_q6 + csds_q7 + csds_q8 + csds_q9 + csds_q10


#regression

endo1 ~ covariate

endo2 ~ endo1 + covariate

endo3 ~ endo2 + exo1 + exo2 + exo3 + exo4 + exo5 + covariate

endo4 ~ endo3 + covariate'


# model identification and estimation
modelfit <- sem(mymodel, data = data, ordered = "sh_q2")  #to specify categorical variable

#print results
summary(modelfit, fit.measures = TRUE, standardized = TRUE)

_________________________________________________________________________________________________


I have two questions;

1. For the model above, I get the warnings, but I also get output: all standard errors are NA, and lavaan doesn't compute the fit indices. I'd really appreciate some guidance as to where I might be going wrong with this.

2. When I run the model without the binary variable (exactly the same as above but without endo4) and print the R-square values, only two values show up; the third is NA. I get no errors, but I'm curious why this might be the case, and how I might get all three R-square values, as in the study I am replicating. It looks like this:

endo1          NA
endo2          0.848
endo3          0.695
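For reference, the R-square of a latent endogenous variable is usually defined as one minus its residual variance divided by its model-implied total variance (we assume here that lavaan follows this definition). For endo1, which is regressed only on covariate, write η₁ for endo1, ξ for covariate, β for the regression coefficient, φ = Var(ξ), and ψ₁₁ = Var(ζ₁) for the residual variance of endo1, with ζ₁ assumed uncorrelated with ξ:

$$
\eta_1 = \beta\,\xi + \zeta_1, \qquad
\operatorname{Var}(\eta_1) = \beta^2 \phi + \psi_{11}, \qquad
R^2_{\eta_1} = 1 - \frac{\psi_{11}}{\operatorname{Var}(\eta_1)} = \frac{\beta^2 \phi}{\beta^2 \phi + \psi_{11}}
$$

So an NA for endo1 points at the quantities in this ratio. Checking the estimates of β, φ and ψ₁₁ (for example, whether ψ₁₁ is estimated as negative or near zero) is a reasonable first step; this is a suggestion, not a diagnosis confirmed in the thread.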

Overall, could the issue be that I am just specifying too many endogenous variables for lavaan? Or that I'm specifying it incorrectly?

Thank you for any help!


Terrence Jorgensen

Apr 9, 2020, 7:30:50 PM
to lavaan
Have you tried just regressing the binary variable on its predictors, instead of needlessly defining a single-indicator factor?  With a single binary indicator, unexpected identification issues might be at play.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
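A minimal sketch of Terrence's suggestion, assuming the measurement part stays exactly as in the original post: the single-indicator factor endo4 is dropped, and the observed binary item sh_q2 is regressed directly on endo3 and covariate. The object name mymodel2 is hypothetical; variable names are copied verbatim from the post (including sde_q7 and inq_6, which presumably match the data, since the model without the binary variable ran).

```r
# Hypothetical revision of the original model string (name: mymodel2).
# Measurement part copied unchanged from the original post.
mymodel2 <- '
  # latent endogenous variables (continuous indicators)
  endo1 =~ sds_q1 + sds_q4 + sds_q6 + sds_q8
  endo2 =~ sds_q2 + sds_q3 + sds_q5 + sde_q7
  endo3 =~ dis_q1 + dis_q2 + dis_q3 + dis_q4

  # latent exogenous variables
  exo1 =~ sresil_q1 + sresil_q2 + sresil_q5 + sresil_q9 + sresil_q11 + sresil_q13 + sresil_q19 + sresil_q20 + sresil_q25
  exo2 =~ sresil_q4 + sresil_q8 + sresil_q10 + sresil_q12 + sresil_q14 + sresil_q18 + sresil_q23 + sresil_q24
  exo3 =~ sresil_q3 + sresil_q6 + sresil_q7 + sresil_q11 + sresil_q13 + sresil_q15 + sresil_q16 + sresil_q17 + sresil_q21 + sresil_q22
  exo4 =~ in_q1 + in_q2 + in_q3 + in_q4 + in_q5
  exo5 =~ inq_6 + in_q7 + in_q8 + in_q9 + in_q10

  # covariate
  covariate =~ csds_q1 + csds_q2 + csds_q3 + csds_q4 + csds_q5 + csds_q6 + csds_q7 + csds_q8 + csds_q9 + csds_q10

  # structural part: the binary item sh_q2 is regressed directly,
  # with no single-indicator factor wrapped around it
  endo1 ~ covariate
  endo2 ~ endo1 + covariate
  endo3 ~ endo2 + exo1 + exo2 + exo3 + exo4 + exo5 + covariate
  sh_q2 ~ endo3 + covariate
'
```

The string would be fitted with the same call as in the original post, e.g. sem(mymodel2, data = data, ordered = "sh_q2"). Declaring sh_q2 in ordered makes lavaan model it through a threshold on a latent response (and switch to the WLSMV estimator by default) rather than treating 0/1 as continuous.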

Rosie Pendrous

May 25, 2020, 3:02:30 AM
to lavaan
Hi Terrence,

Thank you very much for your response, and apologies for my delayed reply. As this is a replication study, ideally we would still use SEM, so I've rerun the model without the single-indicator factor, regressing the categorical variable directly in the structural model, and it seems to have worked just fine.

I had one more query, if possible: my categorical endogenous variable is binary (0/1, 'no'/'yes'), which Rosseel (2020) suggests lavaan can deal with. However, this variable is hugely zero-inflated (i.e. of the 1172 total cases, 852 are '0' values). Do you have any thoughts on how this could be an issue in an SEM, or know of any papers that could help? There are papers on this for logistic regression generally, but fewer on its application in SEM.

Thanks again,

Rosie

Yves Rosseel

May 26, 2020, 12:26:59 PM
to lav...@googlegroups.com
On 5/25/20 9:02 AM, Rosie Pendrous wrote:
> I did have another query if possible: my categorical endogenous variable
> is binary (either 0, 1 -- 'no', 'yes'), which Rosseel (2020) suggests
> lavaan can deal with. However, this variable is hugely zero-inflated
> (i.e. there are 1172 total cases, and 852 of them are '0' values).

That is not a problem. The term 'zero-inflated' is not used for binary
variables. It is (usually) used for count variables.

Yves.
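To illustrate why an uneven split is not a problem for a binary variable declared as ordered, here is a sketch of the threshold a probit (normal latent response) model, the approach lavaan uses for ordered variables, would place on the latent response, using the counts from Rosie's message. It assumes, for simplicity, a model with no predictors; with predictors the threshold is conditional and will differ. The point is only that a roughly 73/27 split shifts the threshold away from zero; it does not create the excess-zeros problem that zero-inflated models address for counts.

```r
# Counts reported in the thread: 1172 cases, 852 of them coded 0 ("no").
n_total <- 1172
n_zero  <- 852

p_zero <- n_zero / n_total   # share of 0s, roughly 0.73
p_one  <- 1 - p_zero         # share of 1s, roughly 0.27

# Probit view of a binary item with no predictors (assumption):
# y = 0 when the latent response y* falls below the threshold tau,
# so P(y = 0) = pnorm(tau), i.e. tau = qnorm(P(y = 0)).
tau <- qnorm(p_zero)

print(round(c(p_zero = p_zero, p_one = p_one, tau = tau), 3))
```

Here tau comes out at roughly 0.6 standard deviations above the latent mean, an unremarkable value. Under the assumption of no predictors, this should correspond to the threshold row lavaan reports for the item (sh_q2|t1); that correspondence is stated here as an expectation, not something shown in the thread.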