Help with model description and coefficient interpretation


Paula M Luz

Jan 5, 2018, 9:18:50 AM
to lavaan
Hello

I am trying to run a model as follows:

dmodel = '
g =~ a*item1 + b*item2 + c*item3 + d*item4 + e*item5 + f*item6
k =~ a*item7 + b*item8 + c*item9 + d*item10
g ~ k
binary01 ~ item11 + g
'

dfit = sem(model = dmodel,
           data = aux,
           std.lv = FALSE,
           ordered = c("binary01"))

So: items 1 through 6 are measured items that define latent variable g
items 7 through 10 are measured items that define latent variable k
k predicts g
then g + measured item11 predict a binary outcome (01 variable, ordered as needed)

The model seems to run fine but I have a few questions, in case anyone can help:

1. How to interpret the coefficients of my final regression. Ideally, I would like to use logistic regression so that I could interpret the coefficient of g and item11 as odds ratios. Can I?

Regressions:
                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  g ~                                                                    
    k                 -0.201    0.029   -6.830    0.000   -0.266   -0.266
  binary01 ~                                                      
    item11             0.174    0.035    4.927    0.000    0.174    0.201
    g                  0.320    0.045    7.174    0.000    0.293    0.287

2. What does this mean, it is given in the output?

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    binary01    1.000                               1.000    1.000


3. When I use lavaan() the model does not really run (most estimates are zero or the same value) but it does run with cfa() or sem(). And the error at the end is:

Warning message:
In cov2cor(Sigma.hat) :
  diag(.) had 0 or NA entries; non-finite result is doubtful

??

THANKS!!

Terrence Jorgensen

Jan 5, 2018, 9:49:40 AM
to lavaan
1. How to interpret the coefficients of my final regression. Ideally, I would like to use logistic regression so that I could interpret the coefficient of g and item11 as odds ratios. Can I?

No, lavaan sets link = "probit" by default, but there is an experimental link = "logit" option you could try (see ?lavOptions).  Logit transforms probabilities into log-odds, whereas probit transforms probabilities into z-scores from a standard normal distribution (e.g., a probability of 50% is associated with a z-score of 0, and probability of 84% is associated with a z-score of 1).  So you can interpret your effects as changes in units of SD for the transformed outcome variable.  
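
The two transformations described above can be tried directly in base R. This is only an illustrative sketch and does not use lavaan; the probabilities are the ones mentioned in the reply, and the variable names are made up for the example.

```r
# Probabilities mentioned in the reply above
p <- c(0.50, 0.84)

# Probit link: probability -> z-score of a standard normal distribution
z <- qnorm(p)

# Logit link: probability -> log-odds
log_odds <- qlogis(p)

# The inverse links map both scales back to the same probabilities
p_from_probit <- pnorm(z)
p_from_logit  <- plogis(log_odds)

print(round(data.frame(p, z, log_odds), 3))
```

Both links send a probability of 50% to 0; they differ in how quickly the transformed value grows as the probability moves toward 0 or 1.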

This is equivalent to assuming there is a normally distributed "latent item response" underlying the binary outcome, although that might not make sense if the variable is truly dichotomous rather than a supposedly continuous construct that is simply measured with a binary scale.  You can read more about this here:



2. What does this mean, it is given in the output?

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    binary01    1.000                               1.000    1.000

This is the SD of the latent item response, described above.

3. When I use lavaan() the model does not really run

This is probably because your model syntax does not specify all model parameters, particularly for the categorical outcome that you are unfamiliar with (e.g., a threshold).  Using the sem() function turns on some sensible defaults.  You can see what they are from your sem() output:

dfit@call


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Paula M Luz

Jan 5, 2018, 10:35:39 AM
to lavaan
Thanks so much, Terrence, very helpful indeed!

Your explanation of #1 makes sense and the papers are interesting. Unfortunately I can't assume a continuous construct that was measured on a binary scale. We are measuring a behavior that either individuals did or did not perform.

I tried the experimental link="logit" but it did not work:

Warning message:
In lav_options_set(opt) :
  lavaan WARNING: link will be set to “probit” for estimator = “DWLS”

Maybe because I ran it in sem() and not lavaan(). Does it not work in sem()? (I got the same output as without the link="logit" using sem())

In lavaan(), I tried adding the estimator="MML" as I read in the description that it would be necessary but I am getting an error:

> dfit = lavaan(model = dmodel,
+            data = aux,
+            std.lv = FALSE,
+            ordered = c("binary01"), estimator = "MML", link = "logit")
Error in lav_model_estimate(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  : 
  lavaan ERROR: initial model-implied matrix (Sigma) is not positive definite;
  check your model and/or starting parameters.
In addition: Warning message:
In lav_options_set(opt) :
  lavaan WARNING: link will be set to “probit” for estimator = “MML”

To me this seems related to your response to my question 3: I am not specifying something... could you expand on what dfit@call means? Thanks, and sorry if this is too basic; if it is, just point me to a reference, if you can spare the time.

Thanks so much

Terrence Jorgensen

Jan 5, 2018, 10:54:33 AM
to lavaan
I can't assume a continuous construct that was measured on a binary scale. We are measuring a behavior that either individuals did or did not perform.

That does not matter; you can still use the probit link.  It is just an arbitrary transformation.  Any book on categorical data analysis will tell you that results from logistic and probit regression are almost always nearly identical (e.g., see p. 70 at the link below).
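
A small simulation can make the similarity concrete. The sketch below uses base R glm() rather than lavaan, and the data-generating values (sample size, intercept, slope) are hypothetical choices for illustration only.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)

# Hypothetical data-generating model: a probit with intercept -0.3 and slope 0.5
y <- rbinom(n, size = 1, prob = pnorm(-0.3 + 0.5 * x))

# Fit the same regression with both links
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))
fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))

# Largest disagreement between the two sets of fitted probabilities
max_diff <- max(abs(fitted(fit_probit) - fitted(fit_logit)))

# The slopes are on different scales, so compare their ratio instead
slope_ratio <- unname(coef(fit_logit)["x"] / coef(fit_probit)["x"])

print(c(max_diff = max_diff, slope_ratio = slope_ratio))
```

The coefficients differ because the two links put the outcome on different scales, but the fitted probabilities, and therefore the substantive conclusions, tend to be very close.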


I tried the experimental link="logit" but it did not work:

As I said, it is only experimental, so you should just use the probit regression.

Maybe because I ran it in sem() and not lavaan(). Does it not work in sem()? (I got the same output as without the link="logit" using sem())

sem() is just a wrapper around lavaan() with certain defaults turned on (see ?lavOptions), so the same arguments apply to both.  

could you expand on what dfit@call means?

You "call" an R function.  The following is a call to the round() function, with 2 arguments:

round(pi, digits = 2)

A lavaan object saves the call to lavaan() in a "slot" that you access using the @ operator.  So dfit@call simply shows you what the lavaan() call looked like when you used the sem() function (which calls lavaan() with certain defaults set).  From the ?cfa help page:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939)
fit@call
# if you evaluate this call, you get the same result
fit2 <- lavaan(model = HS.model, data = HolzingerSwineford1939, 
               model.type = "cfa", int.ov.free = TRUE, int.lv.free = FALSE, 
               auto.fix.first = TRUE, auto.fix.single = TRUE, auto.var = TRUE, 
               auto.cov.lv.x = TRUE, auto.cov.y = TRUE, auto.th = TRUE, 
               auto.delta = TRUE)
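
The idea of a stored call that can be evaluated again can also be sketched in base R, without lavaan. Note the assumption in this example: lm() objects keep their call in a list element (`$call`) rather than in an S4 slot (`@call`), so it is an analogy rather than lavaan's own mechanism.

```r
# Capture a call without evaluating it
cl <- quote(round(pi, digits = 2))

is.call(cl)   # TRUE: this is a "call" object
as.list(cl)   # the function name followed by its arguments

# Evaluating the stored call runs it
result <- eval(cl)

# lm() stores the call that created the object in $call
fit_lm <- lm(dist ~ speed, data = cars)
print(fit_lm$call)

# Re-evaluating that call refits the same model
refit <- eval(fit_lm$call)
```

Comparing coef(fit_lm) with coef(refit) should show identical estimates, which is the same check you could make between fit and fit2 above.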

Paula M Luz

Jan 5, 2018, 2:03:29 PM
to lavaan
Hi Terrence

Thanks again for the replies, including the link to the book. I see how the probit would work, makes sense.

Thanks also for the explanation regarding the @call, I used it to get the entire coding for running lavaan() (like your example) and it ran fine.

Thanks!!