Multilevel path analysis question


blairmid

Dec 30, 2022, 1:23:54 PM
to lavaan
Hello,

I am trying to conduct a path analysis on a parallel, serially mediated model (as pictured below) using multilevel data (individuals organized into teams). I have been receiving errors when trying to run the analysis and I am hoping someone may be able to help me figure out if my code is incorrect, or if there may be another cause (e.g., my dataset not being big enough). 

This data was captured in an organization where half of the teams received an intervention (the IV), and half of the teams did not. Thus, since there is no within-group variation on the IV I assume this is a level 2 (team) variable, but we are specifically interested in level 1 (individual) effects. Other than the IV, all variables were measured at the individual level. I am planning to use these variables as observed (rather than latent, composed of the specific items we used to measure the variables).

My main questions are:
  • Is the code for the path analysis written correctly for estimating the effects among the variables?
  • Is the code for the path analysis written correctly for capturing the multilevel effects?

Here is the model:

Path1.jpg

And here is the code I have written for the path analysis:
(Note: IV = independent variable, M1 = mediator 1, M2a = mediator 2a, M2b = mediator 2b, DV1 = dependent variable 1, DV2 = dependent variable 2)

model_1 <- '

level: 1
M2a ~ k*M1
M2b ~ m*M1

DV1 ~ h1*M2a + j1*M2b 
DV2 ~ h2*M2a + j2*M2b

level: 2
M1 ~ a*IV
 
M2a ~ b1*M1 + c1*IV
M2b ~ b2*M1 + c2*IV
 
DV1 ~ d1*M2a + e1*M2b + f1*M1 + g1*IV
DV2 ~ d2*M2a + e2*M2b + f2*M1 + g2*IV
 
indirect_M2a := a*b1
indirect_M2b := a*b2
total_M2a := c1 + (a*b1)
total_M2b := c2 + (a*b2)
 
indirect_DV1_via_M2a := a*b1*d1 
indirect_DV1_via_M2b := a*b2*e1

indirect_DV2_via_M2a := a*b1*d2
indirect_DV2_via_M2b := a*b2*e2

# total effect = direct path + every indirect path from IV to the DV
total_DV1 := (a*b1*d1) + (a*b2*e1) + (c1*d1) + (c2*e1) + (a*f1) + (g1)
total_DV2 := (a*b1*d2) + (a*b2*e2) + (c1*d2) + (c2*e2) + (a*f2) + (g2)
'


fit_model_1 <- sem(model = model_1, data = data_set, cluster = "Team_ID")
summary(fit_model_1)



Here is a visual representation of the coefficients specified in the code:
Path Model 2.jpg



Any help that anyone can provide on this analysis would be greatly appreciated! Thank you in advance!!

Terrence Jorgensen

Jan 18, 2023, 10:35:00 AM
to lavaan
I don't see any obvious problems with the syntax.  Can you copy/paste the error message you mentioned?

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

blairmid

Jan 18, 2023, 12:37:11 PM
to lavaan
I have run this with two different data sets (separate data collections), and get the following errors. It looks like issues are different with each set.

On the first set of data:
Warning in lavaan::lavaan(model = model_1, data = data_set_1, cluster = "Team_ID", :
  lavaan WARNING: the optimizer warns that a solution has NOT been found!
Warning in lavaan::lavaan(model = model_1, data = data_set_1, cluster = "Team_ID", :
  lavaan WARNING: estimation of the baseline model failed.
lavaan 0.6-8 did NOT end normally after 68 iterations
** WARNING ** Estimates below are most likely unreliable

Screenshot:
Screen Shot 2023-01-18 at 9.30.52 AM.png

On the second set of data:
Warning in lav_data_full(data = data, group = group, cluster = cluster, :
  lavaan WARNING: Level-1 variable “M2b” has no variance within some clusters. The cluster ids with zero within variance are: 1102 1002 106 1
Warning in lav_data_full(data = data, group = group, cluster = cluster, :
  lavaan WARNING: Level-1 variable “DV1” has no variance within some clusters. The cluster ids with zero within variance are: 7
Warning in lav_data_full(data = data, group = group, cluster = cluster, :
  lavaan WARNING: Level-1 variable “DV2” has no variance within some clusters. The cluster ids with zero within variance are: 1101
Warning in lav_data_full(data = data, group = group, cluster = cluster, :
  lavaan WARNING: Level-1 variable “M1” has no variance within some clusters. The cluster ids with zero within variance are: 1102 4
Warning in lavaan::lavaan(model = model_2, data = data_set_2, cluster = "Team_ID", :
  lavaan WARNING: the optimizer (NLMINB) claimed the model converged, but not all elements of the gradient are (near) zero; the optimizer may not have found a local solution
  use check.gradient = FALSE to skip this check.
lavaan 0.6-8 did NOT end normally after 378 iterations
** WARNING ** Estimates below are most likely unreliable

Screenshot:
Screen Shot 2023-01-18 at 9.33.08 AM.png

Thank you for your help!

Terrence Jorgensen

Jan 19, 2023, 4:31:01 AM
to lavaan
It is hard to know why the optimizer can't find a good set of estimates to reproduce your data.  It can happen when there aren't enough data to estimate the number of parameters, but it can also happen when there is plenty of data and your model parameters just aren't sufficient to reproduce your data.  

The warning about no variance within some clusters can usually be ignored if your model converges, but it could cause nonconvergence if most/all clusters have no variance to model at the specified level of analysis (between or within).  It looks like only a few of your clusters have constant values on those variables, so I wouldn't expect it to be a problem, unless you only have a few clusters to begin with (in which case, MLSEM is not a good idea).  What is your overall N and number of clusters in each analysis?
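
The overall N, the number of clusters, and the clusters with no within-cluster variance can be checked with a few lines of base R. This is only a sketch on made-up data: `data_set`, its values, and the helper names `cluster_sizes` and `zero_var_ids` are assumptions for illustration; only `Team_ID` and `M1` follow the names used in this thread.

```r
# Toy stand-in for the real data (values are invented for illustration)
set.seed(1)
data_set <- data.frame(
  Team_ID = rep(1:5, times = c(1, 2, 4, 4, 3)),
  M1      = c(3, 2, 2, rnorm(11))
)

# Overall N, cluster sizes, and number of clusters
n_total       <- nrow(data_set)
cluster_sizes <- table(data_set$Team_ID)
n_clusters    <- length(cluster_sizes)

# Within-cluster variance of one level-1 variable;
# var() returns NA for a singleton team, which also has nothing to model
within_var   <- tapply(data_set$M1, data_set$Team_ID, var)
zero_var_ids <- names(within_var)[is.na(within_var) | within_var == 0]

cat("N =", n_total, "| clusters =", n_clusters, "\n")
print(cluster_sizes)
cat("Teams with no within-team variance on M1:", zero_var_ids, "\n")
```

Repeating the `tapply()` line for each level-1 variable shows how many teams are affected relative to the total number of teams.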

Can you get convergence if you simply fit a single-level model and specify the cluster= variable, to request cluster-robust SEs and test statistics?  That is definitely preferable to MLSEM in smaller samples (N < 1000, fewer than 100-200 clusters).
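
A minimal sketch of that suggestion, assuming the lavaan package is installed: the `level:` blocks are dropped, the level-2 equations from the original post are kept as a single-level model, and `cluster =` is passed to `sem()`. The data below are simulated purely so the example runs; the number of teams, effect sizes, and the object names `model_single` and `fit_single` are assumptions, not values from this thread.

```r
library(lavaan)

# Simulated stand-in data: 60 teams of 5, IV assigned at the team level
set.seed(123)
n_teams <- 60
n_per   <- 5
Team_ID <- rep(seq_len(n_teams), each = n_per)
IV      <- rep(rep(0:1, length.out = n_teams), each = n_per)
u       <- rep(rnorm(n_teams, sd = 0.5), each = n_per)  # shared team effect
N       <- length(Team_ID)
M1  <- 0.5 * IV + u + rnorm(N)
M2a <- 0.4 * M1 + rnorm(N)
M2b <- 0.3 * M1 + rnorm(N)
DV1 <- 0.4 * M2a + 0.3 * M2b + u + rnorm(N)
DV2 <- 0.2 * M2a + 0.5 * M2b + rnorm(N)
data_set <- data.frame(Team_ID, IV, M1, M2a, M2b, DV1, DV2)

# Single-level path model: no "level:" blocks
model_single <- '
  M1  ~ a*IV
  M2a ~ b1*M1 + c1*IV
  M2b ~ b2*M1 + c2*IV
  DV1 ~ d1*M2a + e1*M2b + f1*M1 + g1*IV
  DV2 ~ d2*M2a + e2*M2b + f2*M1 + g2*IV

  indirect_DV1_via_M2a := a*b1*d1
'

# cluster = requests cluster-robust SEs and test statistics
fit_single <- sem(model_single, data = data_set, cluster = "Team_ID")
summary(fit_single)
```

The point estimates are those of an ordinary single-level fit; only the standard errors and test statistic are adjusted for the clustering.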

blairmid

Jan 19, 2023, 11:38:14 AM
to lavaan
Thanks so much for this recommendation, I didn't realize that cluster-robust stats were an option in lavaan. This seems like a much better approach, as our data set is fairly small: the first study includes 298 individuals in 6 teams, and the second study includes 354 individuals in 85 teams. It looks like this approach resolved the issue on the second data set, as no errors appear when running this code for that data set. However, when using this method for the first set of data, the previous errors no longer appear but this new error occurs:  

Warning in lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, : lavaan WARNING: The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The smallest eigenvalue (= -3.577269e-16) is smaller than zero. This may be a symptom that the model is not identified.

Screenshot:
Screen Shot 2023-01-19 at 8.32.24 AM.png

Do you have any suggestions for how we might resolve this issue or how we might diagnose the cause of this error?

Thank you so, so much for your insight! This is immensely helpful!
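
One way to look into a warning like this is to inspect the eigenvalues of the estimated parameter covariance matrix directly. The sketch below uses simulated data with 6 teams and a reduced model, so `toy`, `model_toy`, `fit_toy`, and the `1e-8` tolerance are all assumptions for illustration. One possible reading, not confirmed in this thread, is that a cluster-robust covariance matrix built from 6 clusters cannot carry more non-negligible eigenvalues than there are clusters, which would leave it singular whenever the model has more free parameters than that.

```r
library(lavaan)

# Simulated stand-in data: only 6 teams, as in the first study
set.seed(42)
n_teams <- 6
n_per   <- 50
Team_ID <- rep(seq_len(n_teams), each = n_per)
IV      <- rep(rep(0:1, length.out = n_teams), each = n_per)
N       <- length(Team_ID)
M1  <- 0.5 * IV + rnorm(N)
M2a <- 0.4 * M1 + rnorm(N)
DV1 <- 0.4 * M2a + rnorm(N)
toy <- data.frame(Team_ID, IV, M1, M2a, DV1)

model_toy <- '
  M1  ~ IV
  M2a ~ M1 + IV
  DV1 ~ M2a + M1 + IV
'

# May emit the same vcov warning; it is a warning, so a fit is still returned
fit_toy <- sem(model_toy, data = toy, cluster = "Team_ID")

# Eigenvalues of the parameter covariance matrix
vc <- lavInspect(fit_toy, "vcov")
ev <- eigen(vc, symmetric = TRUE, only.values = TRUE)$values

n_par     <- length(ev)
n_nonzero <- sum(ev > max(ev) * 1e-8)  # tolerance is an arbitrary choice
cat("free parameters:", n_par,
    "| non-negligible eigenvalues:", n_nonzero,
    "| clusters:", n_teams, "\n")
print(signif(ev, 3))
```

A smallest eigenvalue on the order of 1e-16 is zero up to rounding error, so the matrix is better described as singular than as clearly negative.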

Terrence Jorgensen

Jan 20, 2023, 9:33:54 AM
to lavaan
> Do you have any suggestions for how we might resolve this issue or how we might diagnose the cause of this error?

It's only a warning, not an error.  With only 6 clusters, I'm not sure cluster-robust SEs are stable.  You could instead create 5 dummy codes to partial out the cluster differences as fixed effects. 
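
The 5 dummy codes can be built in base R with `model.matrix()`. This is a sketch on a made-up design (6 teams of 4, alternating treatment), and the column names `team2` to `team6` are assumptions. It also includes a rank check, because a team-level IV is constant within teams and is therefore a linear combination of the intercept and the full set of team dummies; it is worth checking this before entering both in the same regression.

```r
# Made-up design: 6 teams of 4, treatment assigned at the team level
n_teams <- 6
n_per   <- 4
data_set <- data.frame(
  Team_ID = rep(seq_len(n_teams), each = n_per),
  IV      = rep(rep(0:1, length.out = n_teams), each = n_per)
)

# Treatment-coded dummies: team 1 is the reference, so 5 columns remain
team_dummies <- model.matrix(~ factor(Team_ID), data = data_set)[, -1]
colnames(team_dummies) <- paste0("team", 2:n_teams)
data_set <- cbind(data_set, team_dummies)

# Right-hand side that could be pasted into a lavaan regression line
dummy_rhs <- paste(colnames(team_dummies), collapse = " + ")
cat("e.g.  M2a ~ b1*M1 +", dummy_rhs, "\n")

# Rank check: intercept + IV + 5 dummies is 7 columns but only rank 6,
# because IV is fully determined by team membership
X <- cbind(1, data_set$IV, team_dummies)
cat("columns:", ncol(X), "| rank:", qr(X)$rank, "\n")
```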

blairmid

Jan 23, 2023, 7:22:07 PM
to lavaan
Thanks again for your help, I really appreciate it! 

When I conduct the analysis with team ID as a categorical variable as a control, the warning no longer appears. While this is great for this set of data, it does mean that I would have to use slightly different analytic approaches across my two sets of data. Because the second data set has so many groups, I cannot use this same approach of modeling in the group ID as a control variable as I was able to do for the first set of data, but this first data set produces a warning message when I try to use cluster-robust SEs as I was able to do for the second set of data.

In thinking about my options for trying to use the same approach for both sets of my data...can I still use the output that had the warning message about the variance-covariance matrix (discussed in the previous message), or would that output not be valid/accurate to use? Are there any other solutions to the issues with this data set that might still allow for us to use the same approach across the two data sets?

As a separate question but related to the structure of this code, if I want to control for T1 measures of my variables, should T1 measures of all 6 of my variables be included in every one of the regression equations, or just in select regression equations, such as only controlling for T1 of the variable when it is the DV in the equation, or only controlling for T1 of the variable when it is one of the predictors in the regression equation?

Thank you!

Yves Rosseel

Jan 30, 2023, 9:29:01 AM
to lav...@googlegroups.com
> can I still use the output that had the warning
> message about the variance-covariance matrix (discussed in the previous
> message)

Yes, you can, but I would mention the warning in the paper.

> any other solutions to the issues with this data set that might still
> allow for us to use the same approach across the two data sets?

I would start with a much smaller model, and then gradually add more
variables to see when this warning appears.
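
A sketch of that incremental approach, assuming lavaan is installed: fit a sequence of nested models and record the warnings each one raises. The data are simulated with 6 teams only so the example runs; `toy`, `steps`, and `fit_and_collect()` are hypothetical names, and the particular order in which equations are added is just one possible choice.

```r
library(lavaan)

# Simulated stand-in data with 6 teams
set.seed(2023)
n_teams <- 6
n_per   <- 50
Team_ID <- rep(seq_len(n_teams), each = n_per)
IV      <- rep(rep(0:1, length.out = n_teams), each = n_per)
N       <- length(Team_ID)
M1  <- 0.5 * IV + rnorm(N)
M2a <- 0.4 * M1 + rnorm(N)
M2b <- 0.3 * M1 + rnorm(N)
DV1 <- 0.4 * M2a + 0.3 * M2b + rnorm(N)
toy <- data.frame(Team_ID, IV, M1, M2a, M2b, DV1)

# Models of increasing size
steps <- list(
  step1 = 'M1 ~ IV',
  step2 = 'M1  ~ IV
           M2a ~ M1 + IV
           M2b ~ M1 + IV',
  step3 = 'M1  ~ IV
           M2a ~ M1 + IV
           M2b ~ M1 + IV
           DV1 ~ M2a + M2b + M1 + IV'
)

# Fit one model and capture its warnings instead of printing them
fit_and_collect <- function(syntax) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    sem(syntax, data = toy, cluster = "Team_ID"),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = msgs)
}

results <- lapply(steps, fit_and_collect)

# Free parameters and number of warnings at each step
for (nm in names(results)) {
  cat(nm, ": free parameters =", length(coef(results[[nm]]$fit)),
      "| warnings =", length(results[[nm]]$warnings), "\n")
}
```

The first step at which the warning count becomes non-zero points to the part of the model worth examining.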

Yves.