Latent Growth Curve Model: covariates and differences between growth & sem functions

May

Jul 23, 2021, 2:23:38 AM

to lavaan

Dear Lavaan community,

I am working on building a latent growth curve model of depressive symptoms (continuous) across 4 time points (unequal spacing) during adolescence. I am testing a no-growth model, a linear model, and a quadratic model. The quadratic model had the best fit. I then needed to add multiple time-invariant and time-varying covariates to the model.

When I ran the model without covariates, the results made sense based on previous research and the means in my dataset (i.e., linear slope is significant and positive, quadratic slope is significant and negative). However, when I added the covariates, although I am getting fine fit indices, the results don't make sense anymore, both slopes are not significant and their signs are opposite. This led me to think that perhaps the way I am specifying my model is incorrect (see syntax below).

mod6 <- '
  # factor loadings
  mfq_i =~ 1*mfq10 + 1*mfq16 + 1*mfq17 + 1*mfq23
  mfq_s =~ 0*mfq10 + 0.603*mfq16 + 0.719*mfq17 + 1.224*mfq23
  mfq_q =~ 0*mfq10 + 3.636*mfq16 + 5.170*mfq17 + 14.982*mfq23

  # time-invariant covariates
  mfq_i ~ sex + spar + educ2
  mfq_s ~ sex + spar + educ2 + tannerph
  mfq_q ~ sex + spar + educ2 + tannerph

  # time-varying covariates
  mfq10 ~ fmi9 + crp9 + meds9
  mfq16 ~ fmi15 + crp15 + meds15 + ageq16 + oc15 + smok15 + drink15
  mfq17 ~ fmi17 + crp17 + meds17 + ageq17 + oc17 + smok17 + drink17
  mfq23 ~ fmi24 + crp24 + meds24 + ageq23 + smok24 + drink24
'

fit6 <- growth(mod6, data = mydata, estimator = 'MLR', missing = "FIML.x")

summary(fit6, fit.measures = TRUE, standardized = TRUE)

In this syntax I am using the growth() function and I followed the syntax from the lavaan tutorial. I noticed that using this function, it sets the means of the manifest variables to zero, but freely estimates their variances. It also estimates the covariances among the factors.

However, for the covariates, it only estimates their regressions with the factors, and does not estimate their means, variances, or the covariances between covariates. Is that a proper way to run an LGCM with covariates? Should I add these components to my model, and if so, would I need to change anything in the final fit line?

I also found separate syntax from the Penn State quantdev tutorials that I tried to follow: one for time-invariant covariates and one for a time-varying covariate (showing an example where only one covariate is included). I am not sure exactly how to combine those, and I have several critical questions.

In their example using the sem() function, they set the factor means to be zero, but then also set the manifest variances to be equal, which is different from what the growth function does. Which one is the appropriate approach?

For time invariant covariates, they included covariance between the two covariates, and freely estimated their means and variance.

However for the time-varying covariate example, they are freely estimating the means but they are imposing equality constraints on both their variance and their regressions with the factors. What is the purpose of adding these equality constraints? Does this really account for the effects of the time-varying covariates in the different waves if they are made equal? They also did not include covariances between the covariates, but they are likely correlated... Is this step unnecessary when imposing equality constraints? This is confusing to me because they did estimate the covariance between two time-invariant covariates in the other syntax.

When I tried to build my models in the sem function based on these scripts I am getting awful fit indices, and I am also getting warning messages that the cov matrix is not positive definite and that some lv variance is negative. I am not sure if this is because I am setting up my model incorrectly or if this is related to some of my variables, but I want to make sure first that I am setting up my model properly.

I would highly appreciate any input you may have. I am including the links for the tutorials I was referring to.

Thank you so much!

May

Links:

https://quantdev.ssri.psu.edu/tutorials/growth-modeling-chapter-5-linear-growth-models-time-invariant-covariates

https://quantdev.ssri.psu.edu/tutorials/growth-modeling-chapter-8-multivariate-growth-models-and-dynamic-parameters

Terrence Jorgensen

Jul 23, 2021, 6:14:02 AM

to lavaan

> when I added the covariates, although I am getting fine fit indices, the results don't make sense anymore, both slopes are not significant and their signs are opposite.

Please be clear whether by "slopes" you are referring (a) to latent slopes or (b) to the effects of covariates on the growth factors.

If (a), are you saying the intercepts or the residual variances (or both) are no longer significant? Recall that intercepts are expected values when predictors = 0, so the significance of the intercepts of growth factors could be rather arbitrary, depending on how your predictors of growth are centered. If your residual variances are not significant, perhaps your covariates are explaining a lot of the individual differences in growth factors.
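To illustrate the centering point, here is a minimal sketch. It assumes lavaan's built-in Demo.growth example data and a plain linear growth model, not the model from the question, and the names x1c, fit_raw and fit_ctr are made up for the illustration. Mean-centering a predictor leaves its effects on the growth factors unchanged but shifts the growth-factor intercepts, which is why their significance depends on where predictors = 0.

```r
library(lavaan)

dat <- Demo.growth
dat$x1c <- dat$x1 - mean(dat$x1)   # mean-centered copy of the predictor (hypothetical name)

lin_raw <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  i + s ~ x1
'
lin_ctr <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  i + s ~ x1c
'

fit_raw <- growth(lin_raw, data = dat)
fit_ctr <- growth(lin_ctr, data = dat)

# Regression effects are the same under either centering ...
coef(fit_raw)[c("i~x1", "s~x1")]
coef(fit_ctr)[c("i~x1c", "s~x1c")]

# ... but the growth-factor intercepts answer different questions:
# expected value at x1 = 0 versus expected value at the average x1
coef(fit_raw)[c("i~1", "s~1")]
coef(fit_ctr)[c("i~1", "s~1")]
```

With several predictors, the intercept of each growth factor is its expected value when all of them equal zero, so it is worth checking that zero is a meaningful value for each one.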

> In this syntax I am using the growth() function and I followed the syntax from the lavaan tutorial. I noticed that using this function, it sets the means of the manifest variables to zero

Their intercepts are zero. Their means are a function of growth factors, which are functions of predictors of growth. You can see model-implied means with lavInspect(fit, "mean.ov")
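A small sketch of that distinction, again assuming the built-in Demo.growth data and a simple linear model rather than the model in question: the indicator intercepts are fixed to zero by growth(), while the model-implied means are not zero and can be compared with the observed means.

```r
library(lavaan)

lin <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
'
fit <- growth(lin, data = Demo.growth)

# Intercepts ("~1" rows): fixed to 0 for t1-t4, freely estimated for i and s
pe <- parameterEstimates(fit)
pe[pe$op == "~1", c("lhs", "op", "est")]

# Model-implied means of the indicators, next to the observed means
implied  <- as.numeric(lavInspect(fit, "mean.ov"))
observed <- colMeans(Demo.growth[, c("t1", "t2", "t3", "t4")])
round(cbind(implied, observed), 3)
```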

> However, for the covariates, it only estimates their regressions with the factors, and does not estimate their means, variances, or the covariances between covariates.

If the predictors are exogenous, then lavaan's default setting fixed.x=TRUE will simply take their summary statistics as given, rather than treating them as random variables, so that no assumptions need to be made about their distribution (similar to OLS regression). Not a problem, unless you have missing data on exogenous variables and want to use FIML, in which case missing = "fiml.x" is an option.
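As a hedged illustration of what "taken as given" looks like, assuming the built-in Demo.growth data with its two time-invariant covariates x1 and x2: under the default fixed.x = TRUE, the covariates' variances, covariance and means still appear in the parameter table, but as fixed values rather than free parameters.

```r
library(lavaan)

tic <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  i + s ~ x1 + x2
'
fit_fixed <- growth(tic, data = Demo.growth)   # fixed.x = TRUE by default

# Confirm the option that was used
lavInspect(fit_fixed, "options")$fixed.x

# Rows describing the covariates themselves: free == 0 means "not estimated"
pt <- parTable(fit_fixed)
x_rows <- pt$lhs %in% c("x1", "x2") & pt$op %in% c("~~", "~1")
pt[x_rows, c("lhs", "op", "rhs", "free")]
```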

> I also found separate syntax from the Penn State quantdev tutorials ... using the sem() function, they set the factor means to be zero, but then also set the manifest variances to be equal, which is different from what the growth function does. Which one is the appropriate approach?

It is less restrictive to allow for heteroskedasticity of residuals. I expect they imposed that restriction simply so the results would match lme4::lmer() results, which are also in the Ch 5 tutorial. lmer() does not allow for heteroskedastic residuals, although nlme::lme() does, which they show in the Ch 8 tutorial (but it is heteroskedasticity across "grp" rather than time/"grade", because they use varIdent() to equate across time within each group).

> For time invariant covariates, they included covariance between the two covariates, and freely estimated their means and variance.

Again, if the covariates are exogenous predictors, you don't need to do that, because they will be taken as given by default (fixed.x = TRUE)

> However for the time-varying covariate example, they are freely estimating the means but they are imposing equality constraints on both their variance and their regressions with the factors. What is the purpose of adding these equality constraints?

Again, to make it equivalent to the model they fitted with nlme(). That model only had a main effect of "spring", so there was no grade*spring interaction. If you leave the regressions unrestricted across spring2 through spring8, then the effect of that covariate changes across grades, which represents moderation of spring's effect by grade. In fact, leaving them unrestricted does not even impose a linearity constraint on the form of the moderation, which is probably why they didn't open that can of worms. To achieve that in nlme(), they would have needed a dummy code for each level of spring to allow for a unique effect at each time, whereas the usual spring*grade interaction term is only a single parameter allowing for linear change in spring's effect across grade.

Basically, it is an empirical question whether your time-varying covariates have a constant effect across time. You can fit the model with and without the equality constraints, then use lavTestLRT() to test the H0 of no moderation.
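A minimal sketch of that comparison, assuming the built-in Demo.growth data (outcomes t1-t4, time-varying covariate c1-c4) rather than the variables in the question. The label b and the object names are arbitrary. Giving every regression the same label imposes the equality constraint.

```r
library(lavaan)

# Time-varying covariate effect free to differ at each occasion
tvc_free <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  t1 ~ c1
  t2 ~ c2
  t3 ~ c3
  t4 ~ c4
'

# Same model, but one shared label constrains the effect to be equal over time
tvc_equal <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  t1 ~ b*c1
  t2 ~ b*c2
  t3 ~ b*c3
  t4 ~ b*c4
'

fit_free  <- growth(tvc_free,  data = Demo.growth)
fit_equal <- growth(tvc_equal, data = Demo.growth)

# H0: the covariate's effect is constant across occasions (no moderation by time)
lrt <- lavTestLRT(fit_equal, fit_free)
lrt
```

Four free slopes versus one shared slope gives a 3-df test here. With a robust estimator such as MLR, lavTestLRT() should apply a scaled difference test rather than the naive chi-square difference.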

> When I tried to build my models in the sem function based on these scripts I am getting awful fit indices, and I am also getting warning messages that the cov matrix is not positive definite and that some lv variance is negative. I am not sure if this is because I am setting up my model incorrectly or if this is related to some of my variables, but I want to make sure first that I am setting up my model properly.

Hard to tell, but you do have MANY time-varying covariates, and each growth indicator is only regressed on a subset of them. (Similarly, your latent intercept is not regressed on tannerph.) So there are lots of restrictions in the way those covariates can be related to growth indicators. If your model fits poorly, check your lavResiduals() to see which relationships your model fails to reproduce well (e.g., model-implied means/(co)variances within 0.1 SD of observed ones). It may give you clues about which constraints are not viable.
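A rough sketch of that residual check, assuming the built-in Demo.growth data and a deliberately restrictive toy model in which each occasion is regressed only on its own covariate. The 0.1 cutoff simply mirrors the rule of thumb mentioned above and is not a formal test.

```r
library(lavaan)

tvc <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  t1 ~ c1
  t2 ~ c2
  t3 ~ c3
  t4 ~ c4
'
fit <- growth(tvc, data = Demo.growth)

# Default residual type is in a correlation (standardized) metric
res <- lavResiduals(fit)
res$cov    # residual correlations: observed minus model-implied
res$mean   # standardized mean residuals

# List variable pairs whose residual correlation exceeds 0.1 in absolute value
r <- unclass(res$cov)
big <- which(abs(r) > 0.1 & lower.tri(r), arr.ind = TRUE)
data.frame(var1 = rownames(r)[big[, "row"]],
           var2 = colnames(r)[big[, "col"]],
           resid = r[big])
```

Large residuals between an outcome and a covariate it is not regressed on would point to one of the omitted paths as a candidate constraint to relax.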

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam