Problem defining a multilevel latent growth curve model (using the cluster option in lavaan)


fenn.st...@gmail.com

Oct 6, 2018, 5:23:13 AM
to lavaan
Dear group members,

I am currently writing my master's thesis and want to fit a multilevel latent growth curve model (using the cluster option in lavaan) with pupils and schools as grouping variables. I have the following three questions:

(1) Is it possible to fit a multilevel latent growth curve model within the SEM framework in lavaan? Regarding the difference between mixed models and latent growth curve models, Curran (2003, p. 536)* wrote: "The assumption of independence of observations is highlighted in the standard estimation of the SEM in that the discrepancy function is based on a single aggregate sample covariance matrix that allows for covariance structures only at a single level of analysis; the covariance structure within any other level of nesting is assumed to be null."


(2) If yes, is my specification of the "configural / formative construct"** model, where the construct of interest is at both the within and the between level, correct?

model = "
level: within
# intercept and slope with fixed coefficients
i =~ 1*t1 + 1*t2 + 1*t3
s =~ 0*t1 + 1*t2 + 2*t3

t1 ~~ resvar*t1   
t2 ~~ resvar*t2
t3 ~~ resvar*t3

level: between
i =~ 1*t1 + 1*t2 + 1*t3
s =~ 0*t1 + 1*t2 + 2*t3

t1 ~~ 0*t1   
t2 ~~ 0*t2
t3 ~~ 0*t3

i  ~~ s
"

growth(model, data=data, likelihood = "normal", cluster = "id_school")

-> See my attached example. It also shows the equivalence of a multilevel latent growth curve model to a mixed model (for the equivalence see, for example, ***). The slope variance near 0 leads to a correlation of -1, but the example is for illustrative purposes only.


(3) Is it possible to define a “between-only construct” model**, where the construct of interest exists at the between level only? I am only interested in the school effect.

-> see my graphic attached



Thank you in advance.

Best regards

Julius Fenn



* Curran, P. J. (2003). Have multilevel models been structural equation models all along?. Multivariate Behavioral Research, 38(4), 529-569.

** see slide 79 or 75: http://users.ugent.be/~yrosseel/lavaan/zurich2017/MULTILEVEL/lavaan_multilevel_zurich2017.pdf
or page 488: Stapleton, L. M., Yang, J. S., & Hancock, G. R. (2016). Construct meaning in multilevel settings. Journal of Educational and Behavioral Statistics, 41(5), 481-520.

*** Bauer, D. J. (2003). Estimating multilevel linear models as structural equation models. Journal of Educational and Behavioral Statistics, 28(2), 135-167.
concept between only construct model.jpg
comparison HLM, SEM - multilevel structure.R

Terrence Jorgensen

Oct 11, 2018, 5:51:44 AM
to lavaan
(1) Is it possible to fit a multilevel latent growth curve model within the SEM framework in lavaan?

Yes.

(2) If yes, is my specification of the "configural / formative construct"** model, where the construct of interest is at both the within and the between level, correct?

You need to give different names to constructs at different levels (e.g., i.w and s.w at the within level, i.b and s.b at the between level).
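To illustrate the renaming, here is a minimal sketch of the model from the first post with level-specific factor names. The names iW, sW, iB, and sB are only one possible choice, the `level: 1` / `level: 2` block labels follow the lavaan tutorial, and the fitting call is left commented out because it assumes a data frame `data` with the columns t1, t2, t3, and id_school.

```r
# Two-level growth model with distinct factor names per level (sketch).
# Variable names t1-t3 and id_school are taken from the original post.
model <- "
level: 1
  # within-school (pupil-level) intercept and slope
  iW =~ 1*t1 + 1*t2 + 1*t3
  sW =~ 0*t1 + 1*t2 + 2*t3

  # residual variances constrained equal across occasions
  t1 ~~ resvar*t1
  t2 ~~ resvar*t2
  t3 ~~ resvar*t3

level: 2
  # between-school intercept and slope
  iB =~ 1*t1 + 1*t2 + 1*t3
  sB =~ 0*t1 + 1*t2 + 2*t3

  # between-level residual variances fixed to zero, as in the original post
  t1 ~~ 0*t1
  t2 ~~ 0*t2
  t3 ~~ 0*t3

  iB ~~ sB
"

# Assumes the lavaan package and a data frame `data` are available:
# fit <- lavaan::growth(model, data = data, cluster = "id_school")
# summary(fit)
```

Apart from the factor names and block labels, the specification is unchanged from the first post; whether its constraints suit the data is a separate question.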

(3) Is it possible to define a “between-only construct” model**, where the construct of interest exists at the between level only? I am only interested in the school effect.

You measured at the pupil level, so you need some kind of model for those responses.  I'm not even sure it makes sense to think of *only* the school-level change in individual-level variables.  At the very least, you might need the same model at both levels and measurement variance across levels to draw valid inferences at the school level.  Here is some important background reading.



Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam



Carolina Rojas Cordova

Oct 11, 2018, 8:08:29 AM
to lav...@googlegroups.com
Dear Terrence, 

I am just starting to use lavaan, and I would like to ask you to recommend some introductory literature for learning structural equation modeling with lavaan, or to let me know whether you offer any online classes where you teach these methods.
Thank you in advance for the information!
Best Regards

Carolina



fenn.st...@gmail.com

Oct 12, 2018, 5:33:08 AM
to lavaan
Dear Carolina,

there is a course at DataCamp (which I have not tried myself): https://www.datacamp.com/courses/structural-equation-modeling-with-lavaan-in-r
If you want to read a basic introduction, I recommend Clark's webpage (with applications in lavaan): https://m-clark.github.io/sem/sem.html


I am mainly teaching myself structural equation modeling (SEM) with books and journal articles. Books I recommend:

"Structural Equations with Latent Variables" by Bollen (1989) is great for understanding the basic principles of structural equation models (https://onlinelibrary.wiley.com/doi/book/10.1002/9781118619179)


An overview of different topics, with an implementation framework for SEM on page 7 (quite helpful for structuring the different steps in SEM): Hoyle, R. H. (Ed.). (2012). Handbook of structural equation modeling. Guilford Press.

Focusing on longitudinal applications, with examples in R / Mplus: Newsom, J. T. (2015). Longitudinal structural equation modeling: A comprehensive introduction. Routledge.

-> Personally, Bollen's book was a milestone for me in understanding SEM.


### additional:
You can teach yourself different statistical methods, from linear regression to support vector machines, with the Stanford online course "An Introduction to Statistical Learning" here: http://www-bcf.usc.edu/%7Egareth/ISL/


Best regards
Julius

fenn.st...@gmail.com

Oct 17, 2018, 12:26:08 PM
to lavaan
Dear Jorgensen,

thank you for your answer. Could you help me specify the constraints? In the context of latent growth models, what does measurement variance mean? Some kind of restriction on the latent intercept / slope? I ask because the factor loadings on the manifest variables are fixed (which is one of the differences from a mixed model).

I think a multilevel model does not work for my data (?). The only way I could avoid a negative variance for the slope at the between level is by constraining the variances of the latent intercept and slope to be equal across levels, and by restricting the error variances at the between level (highlighted green in the output). Do you know this phenomenon? To me, this seems to force the model to fit under unreasonable constraints.

lavaan 0.6-2 ended normally after 59 iterations

  Optimization method                           NLMINB
  Number of free parameters                         20
  Number of equality constraints                     6

                                                  Used       Total
  Number of observations                          1113        1355
  Number of clusters [id_schule]                    41

  Estimator                                         ML
  Model Fit Test Statistic                      78.919
  Degrees of freedom                                10
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard Errors                             Standard


Level 1 [within]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  iW =~                                               
    theta0            1.000                           
    theta1            1.000                           
    theta2            1.000                           
  sW =~                                               
    theta0            0.000                           
    theta1            1.000                           
    theta2            2.000                           

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  iW ~                                                
    geschlecht_all    0.178    0.057    3.144    0.002
    migrat_dmmy_ll    0.370    0.089    4.175    0.000
    sescen            0.009    0.002    4.944    0.000
  sW ~                                                
    geschlecht_all    0.000                           
    migrat_dmmy_ll    0.020    0.050    0.393    0.694
    sescen            0.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .iW ~~                                               
   .sW               -0.015    0.031   -0.497    0.619

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .iW      (rImn)   -0.149    0.096   -1.552    0.121
   .sW      (rSVr)    0.391    0.028   14.147    0.000
   .theta0            0.000                           
   .theta1            0.000                           
   .theta2            0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .iW      (rIsl)    0.638    0.067    9.480    0.000
   .sW      (rSsl)    0.026    0.022    1.193    0.233
   .theta0            0.786    0.064   12.233    0.000
   .theta1            0.543    0.033   16.241    0.000
   .theta2            0.430    0.048    9.016    0.000


Level 2 [id_schule]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  iB =~                                               
    theta0            1.000                           
    theta1            1.000                           
    theta2            1.000                           
  sB =~                                               
    theta0            0.000                           
    theta1            1.000                           
    theta2            2.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  iB ~~                                               
    sB               -0.142    0.049   -2.901    0.004

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
    iB      (rImn)   -0.149    0.096   -1.552    0.121
    sB      (rSVr)    0.391    0.028   14.147    0.000
   .theta0            0.000                           
   .theta1            0.000                           
   .theta2            0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    iB      (rIsl)    0.638    0.067    9.480    0.000
    sB      (rSsl)    0.026    0.022    1.193    0.233
   .theta0  (rsv2)    0.029    0.010    2.845    0.004
   .theta1  (rsv2)    0.029    0.010    2.845    0.004
   .theta2  (rsv2)    0.029    0.010    2.845    0.004


Thank you in advance for an answer.

Best regards

Julius Fenn

Terrence Jorgensen

Oct 18, 2018, 8:00:05 AM
to lavaan
In the context of latent growth models, what does measurement variance mean?

My mistake, I must have been typing too fast and thought you were working with common factors, too.

I think a multilevel model does not work for my data (?). The only way I could avoid a negative variance for the slope at the between level is by constraining the variances of the latent intercept and slope to be equal across levels, and by restricting the error variances at the between level (highlighted green in the output). Do you know this phenomenon? To me, this seems to force the model to fit under unreasonable constraints.

Indeed, those are unreasonable constraints.  I would not impose them, or at least you should compare the models to test whether they hold.  
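As a rough illustration of such a comparison, the sketch below computes a chi-square difference test for two nested models by hand in base R. The helper name `lrt_nested` is made up for this example. The constrained model's statistic (78.919 on 10 df) is taken from the output posted above, while the values for the less constrained model (60.0 on 7 df) are purely hypothetical placeholders. For two fitted lavaan objects, `lavTestLRT()` performs this kind of comparison directly.

```r
# Chi-square difference (likelihood ratio) test for two nested models (sketch).
# `lrt_nested` is a hypothetical helper written for this illustration.
lrt_nested <- function(chisq_constrained, df_constrained, chisq_free, df_free) {
  diff_chisq <- chisq_constrained - chisq_free
  diff_df <- df_constrained - df_free
  # the constrained model must have more df and cannot fit better
  stopifnot(diff_df > 0, diff_chisq >= 0)
  c(diff_chisq = diff_chisq,
    diff_df = diff_df,
    p_value = pchisq(diff_chisq, df = diff_df, lower.tail = FALSE))
}

# 78.919 / 10 come from the posted output; 60.0 / 7 are HYPOTHETICAL values
# standing in for a model without the cross-level equality constraints.
res <- lrt_nested(chisq_constrained = 78.919, df_constrained = 10,
                  chisq_free = 60.0, df_free = 7)
print(res)
```

A small p-value would indicate that the equality constraints significantly worsen fit, which would be evidence against imposing them.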

Nothing wrong with a negative variance estimate in principle, since ML estimation assumes all parameters have normal sampling distributions (i.e., unbounded in either direction).  Certainly the interpretation is difficult, since a true population variance should not be negative.  But if the between-level variance is close to zero, sampling fluctuations around that value are likely to yield negative finite-sample estimates, especially in small samples (your Level-2 N = 41, which is so small you can expect biased estimates).  Small between-level variances are quite common; I expect your ICCs are < 15% for all variables, because there is typically much more variation within than between schools.  So small between-level variances of growth factors would not be surprising, if the developmental process is largely an individual phenomenon that is only trivially affected by differences between schools.
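The point about sampling fluctuation can be sketched with a small base R simulation. Note the substitutions: it uses a simple moment (ANOVA) estimator of the between-cluster variance rather than lavaan's ML estimator, a balanced design, and made-up population values (between variance 0.01, within variance 1). The 41 clusters match the posted output, and 27 pupils per cluster is only a rough stand-in for 1113 / 41.

```r
# How often is a between-cluster variance estimate negative when the true
# value is close to zero? Sketch with an ANOVA (moment) estimator.
set.seed(2018)

n_clusters   <- 41    # number of schools in the posted output
n_per        <- 27    # ASSUMED balanced cluster size (roughly 1113 / 41)
true_between <- 0.01  # ASSUMED population between-school variance
true_within  <- 1     # ASSUMED population within-school variance

cluster <- rep(seq_len(n_clusters), each = n_per)

estimate_between <- function() {
  u <- rnorm(n_clusters, sd = sqrt(true_between))           # school effects
  y <- u[cluster] + rnorm(n_clusters * n_per, sd = sqrt(true_within))
  # between and within mean squares for a balanced one-way design
  msb <- n_per * var(as.numeric(tapply(y, cluster, mean)))
  msw <- mean(as.numeric(tapply(y, cluster, var)))
  (msb - msw) / n_per                                       # can be negative
}

est <- replicate(500, estimate_between())
share_negative <- mean(est < 0)  # proportion of negative estimates
print(share_negative)
print(mean(est))                 # close to true_between on average
```

The estimator is unbiased on average, yet a noticeable share of the replications fall below zero, which is the behaviour described above for an unbounded estimate of a near-zero variance.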

FYI, you should probably update to lavaan 0.6-3, since the new multilevel features are often being debugged and updated in the 0.6 family.  Or use the development version:

fenn.st...@gmail.com

Oct 18, 2018, 8:34:35 AM
to lavaan
Dear Jorgensen,
thank you for your detailed answer! Your assumption about the ICCs in my data is right (school ICC is even < .05, ID > .30).

Could I ask how you use ICC measures in practice? Do you prefer certain ICC measures?

To me, ICC measures seem to be only a weak indicator for justifying multilevel models.
For example, Goldstein et al. (2002)* wrote "multilevel modelling, the residual variation in a response variable is split into component parts that are attributed to various levels. [..] Such a measure [ICC] however only makes sense in simple variance components". Another example: "With multiple random factors [..] we could ask about many different ICCs [..] So [..] the only case where we can summarize the degree of clustering with a single value is the single-random-factor / random-intercept-only case. Because this is such a small proportion of real-world cases, ICCs are not that useful most of the time" (https://stats.stackexchange.com/questions/115526/intraclass-correlation-coefficient-in-mixed-model-with-random-slopes). Even measures of composite reliability** assume, for example, no covariances between residual factors and no cross-loadings.
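To make the random-slope point concrete, here is a small base R sketch for a growth model with time scores 0, 1, 2. With a random slope, the variance implied at time score t is var(i) + 2*t*cov(i,s) + t^2*var(s), so the between-level share of variance changes over time and no single ICC summarizes it. The function names and all variance components below are hypothetical, chosen only for illustration.

```r
# Between-level share of variance at each time score (sketch).
# All variance components are HYPOTHETICAL, not estimates from this thread.

# Variance implied by a random intercept and slope at time score t
implied_var_at <- function(t, var_i, var_s, cov_is) {
  var_i + 2 * t * cov_is + t^2 * var_s
}

# Proportion of total variance located at the between level at time score t
vpc_at <- function(t, between, within) {
  b <- implied_var_at(t, between$var_i, between$var_s, between$cov_is)
  w <- implied_var_at(t, within$var_i, within$var_s, within$cov_is) +
    within$resid
  b / (b + w)
}

between <- list(var_i = 0.05, var_s = 0.01, cov_is = 0)
within  <- list(var_i = 0.60, var_s = 0.05, cov_is = 0, resid = 0.50)

vpc <- sapply(c(0, 1, 2), vpc_at, between = between, within = within)
print(round(vpc, 3))
```

With these made-up values the between-level share differs at each of the three occasions, which is why a single ICC is only meaningful in the random-intercept-only case.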

Best regards
Julius Fenn





* Goldstein, H., Browne, W., & Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding statistics: statistical issues in psychology, education, and the social sciences, 1(4), 223-231.

** Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological methods, 19(1), 72.

Terrence Jorgensen

Oct 18, 2018, 8:49:11 AM
to lavaan
Could I ask how you use ICC measures in practice? Do you prefer certain ICC measures?

Only descriptively, when there is a use for them, like in my previous response when I just wanted to say that if most observed variation is at the individual level, it is reasonable to expect most latent variation to be at the same level.  All the points you quoted are valid.  I like to think of an ICC as just another form of the same family of "proportion of variance in YYY explained by XXX", like R-squared or partial-eta-squared, so it is just a matter of what XXX relative to YYY you consider informative.  ICCs are very useful in generalizability theory, quantifying reliability of measures across raters, occasions, items in a scale, etc.  