Wrong R-squared in logit second-stage moderated mediation


kleines Peh

Mar 27, 2018, 1:23:13 PM
to lavaan
Hi there

I am investigating a second-stage moderated mediation (Model 14 in the Hayes templates), so the path between the mediator (M) and the dependent variable (Y) is moderated by a variable V (which I do not mean-center).
My independent variable (X) and dependent variable (Y) are dichotomous; the mediator (M) and moderator (V) are continuous. I am trying to replicate in lavaan what I found in SPSS.
I have run into one major difference:
When I run the model in lavaan, I get an R-squared for the Y model that is through the roof (0.975 vs. 0.2725 in SPSS). I tried both link = "logit" and the default probit.
I am not sure what is going on.

My lavaan code looks like this:

# create interaction of M and V
data$IA_MV <- data$M * data$V

# values of V (for adding them to the indirect effect conditional on the moderator later)
mean(data$V)               # 69.37594
mean(data$V) - sd(data$V)  # 65.66968
mean(data$V) + sd(data$V)  # 73.0822

model14 <- ' # regressions
M ~ a1*X
Y ~ b1*M
Y ~ b2*V
Y ~ b3*IA_MV
Y ~ cdash*X

# index of moderated mediation
index := a1*b3

# indirect effects conditional on moderators: a1*(b1 + b3*V)
SDbelow := a1*b1 + a1*b3*65.66968
average := a1*b1 + a1*b3*69.37594
SDabove := a1*b1 + a1*b3*73.0822
'
# fit model
f.model14 <- sem(model = model14,
                 data = data,
                 se = "bootstrap",
                 bootstrap = 1000,
                 link = "logit")
# fit measures
summary(f.model14,
        fit.measures = TRUE,
        rsquare = TRUE)

On a side note: is there a way to add a constant to the model?

Thank you so much, I really appreciate your support!

Terrence Jorgensen

Mar 28, 2018, 7:40:50 AM
to lavaan
when I run the model in lavaan, I get a Rsquared for the Y-model that is through the roof (0.975 vs. 0.2725 in SPSS).

Which pseudo-R-squared does SPSS report for a logistic regression?  There are numerous proposed:


I also like Tjur's, not shown in the link above:


lavaan runs probit regression with DWLS estimation, which starts by estimating polychoric correlations among the variables (treating your interaction as a separate variable, by the way).  So lavaan's R-squared is the proportion of latent-response variance that is explained by the predictors.  That could be quite different than pseudo-R-squared based on proportions or on log-likelihoods of target vs. empty models. 
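
To give a feel for how far these measures can diverge, here is a minimal sketch on simulated data. It is not the poster's data and not lavaan's own computation: it uses glm() with a probit link, and the "latent" value is the McKelvey-Zavoina analogue of a latent-response R-squared. All object names (x, m, y, fit, null) are hypothetical.

```r
# Simulated data; every name below is made up for illustration.
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)
m <- rnorm(n)
y <- rbinom(n, 1, pnorm(-0.3 + 0.5 * x + 1.2 * m))

# Target model and empty (intercept-only) model
fit  <- glm(y ~ x + m, family = binomial(link = "probit"))
null <- glm(y ~ 1,     family = binomial(link = "probit"))

ll1 <- as.numeric(logLik(fit))
ll0 <- as.numeric(logLik(null))

# Pseudo-R-squared measures based on log-likelihoods
mcfadden   <- 1 - ll1 / ll0
cox_snell  <- 1 - exp(2 * (ll0 - ll1) / n)
nagelkerke <- cox_snell / (1 - exp(2 * ll0 / n))

# Latent-response R-squared (McKelvey-Zavoina): variance of the linear
# predictor over total latent variance; the probit residual variance is 1.
eta <- predict(fit, type = "link")
latent_r2 <- var(eta) / (var(eta) + 1)

round(c(McFadden = mcfadden, CoxSnell = cox_snell,
        Nagelkerke = nagelkerke, Latent = latent_r2), 3)
```

The latent-response value is on a different scale from the likelihood-based ones and is often noticeably larger, so the numbers are not directly comparable.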

I tried both link = "logit" and the default probit.
I am not sure what's going on....

Did you notice a warning saying that it switched the link back to probit?  That's the only method available, even when you request MML estimation in the latest development version:

sem('grade ~ x1 + x2', data = HolzingerSwineford1939, ordered = c("grade"), link = "logit", estimator = "MML")

Error in lav_model_gradient_mml(lavmodel = lavmodel, GLIST = GLIST, THETA = THETA[[g]],  : 
  logit link not implemented yet; use probit

Not sure under what circumstances the experimental logit link would actually work yet.

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

kleines Peh

Mar 28, 2018, 8:12:00 AM
to lav...@googlegroups.com
Thanks for your quick reply and the pointers. 

The pseudo-R-squared above is McFadden's, but Cox-Snell (.3108) and Nagelkerke (.4172) are also far below the one I get from lavaan.
I was worried that I did something wrong in my lavaan code.

I didn't get a warning message when running with the logit link. Thanks for the clarification, it seems to be ignored as the output is actually exactly the same. 



--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at https://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Terrence Jorgensen

Mar 28, 2018, 8:57:24 AM
to lavaan
I didn't get a warning message when running with the logit link. Thanks for the clarification, it seems to be ignored as the output is actually exactly the same. 

You can check whether it was ignored:

lavInspect(fit, "options")$link

kleines Peh

Apr 5, 2018, 5:48:30 AM
to lav...@googlegroups.com
Hi Terrence
I really like Tjur's pseudo-R-squared, thanks for the recommendation!
Do I understand correctly, however, that I cannot use lavPredict to get the mean of the predicted probabilities? (I am referring to the following sentence in the lavPredict description: "Note that this function can not be used to ‘predict’ values of dependent variables, given the values of independent values (in the regression sense)".)
Is there another way to get the mean of the predicted probabilities for the two events from my model?

Thanks again


Terrence Jorgensen

Apr 5, 2018, 8:26:23 AM
to lavaan
Is there another way to get the mean of the predicted probabilities for the two events from my model?

At this point, you would need to write out your regression equation and calculate the predicted probabilities yourself.  But you do not have any latent common factors in your model; it is just a path analysis.  So you can run the separate regression models in glm() and use the predict() function to save your predicted probabilities:

?predict.glm
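
A minimal sketch of both routes (predict() and writing out the equation by hand), again on simulated data. The data frame dat and its generating values are assumptions made up for illustration; only the variable names X, M, V, IA_MV, Y follow the thread. The last lines compute Tjur's coefficient as the difference between the mean predicted probabilities of the two outcome groups.

```r
# Simulated stand-in for the poster's data; all values are hypothetical.
set.seed(2)
n <- 300
dat <- data.frame(X = rbinom(n, 1, 0.5), V = rnorm(n, 70, 4))
dat$M     <- 6.5 - 0.9 * dat$X + rnorm(n)
dat$IA_MV <- dat$M * dat$V
dat$Y     <- rbinom(n, 1, pnorm(2 - 0.4 * dat$M + 0.3 * dat$X))

# Second-stage model as a stand-alone probit regression
Y.model <- glm(Y ~ X + M + V + IA_MV, data = dat,
               family = binomial(link = "probit"))

# Route 1: predicted probabilities from predict()
p_hat <- predict(Y.model, newdata = dat, type = "response")

# Route 2: write out the regression equation and apply the probit inverse link
eta      <- model.matrix(Y.model) %*% coef(Y.model)
p_manual <- pnorm(as.numeric(eta))
all.equal(unname(p_hat), p_manual)

# Tjur's coefficient: mean predicted probability among Y == 1
# minus mean predicted probability among Y == 0
tjur <- mean(p_hat[dat$Y == 1]) - mean(p_hat[dat$Y == 0])
tjur
```

With a logit link, plogis() would take the place of pnorm() in the hand calculation.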

kleines Peh

Apr 5, 2018, 8:33:19 AM
to lav...@googlegroups.com
Thank you so much for your quick and helpful reply! 


kleines Peh

May 27, 2018, 11:40:24 AM
to lavaan
Hi Terrence

I did what you suggested and found some differences in the second-stage coefficients, which result in different predicted probabilities.

Here is what I do with lavaan:
# create interaction of M and V
data$IA_MV <- data$M * data$V

model14 <- ' # regressions
M ~ 1 + a1*X
Y ~ b1*M
Y ~ b2*V
Y ~ b3*IA_MV
Y ~ 1 + cdash*X
'

# fit model
f.model14 <- sem(model = model14,
                 data = data,
                 se = "bootstrap",
                 bootstrap = 1000,
                 link = "probit")
These are the coefficients:
M~1 = 6.526; a1 = -0.877; b1 = -0.962; b2 = -0.094; b3 = 0.014; cdash = 0.123; Y~1 = 6.521

This is how I did it with the separate regression models:
M.model <- lm(M ~ X, data = data)
Y.model <- glm(Y ~ X + M + V + IA_MV, data = data, family = binomial(link = "probit"))
These are the coefficients:
M~1 = 6.526; a1 = -0.877; b1 = -4.132; b2 = -0.401; b3 = 0.060; cdash = 0.454; Y~1 = 26.075

When I now predict the choice probabilities:
lavaan: pred_lavaan <- pnorm(6.521 + 0.123*X - 0.962*M - 0.094*V + 0.014*IA_MV)
regression model: pred_regression <- predict(Y.model, data, type = "response", se.fit = TRUE)

If I binarise with .5 as the decision boundary (ifelse(pred_* < 0.499, 0, 1)), I get different choices:
                     
True choice:       42.86 % yes
lavaan choice:     91.73 % yes
regression choice: 39.85 % yes

Why are they different? Am I doing something wrong?
Thanks


Terrence Jorgensen

Jun 2, 2018, 10:10:39 AM
to lavaan
Why are they different?

Because your regression estimates are different.  I suppose they are different because in the SEM, V (and the interaction?) is related to M, but you are only controlling for its effect on Y.  