LGCM: quadratic model with several predictors and interactions

Mili Rubio

Jun 15, 2023, 10:20:03 AM
to lavaan
Dear all,

I am analysing changes in alcohol consumption and alcohol problems measured 5 times over two years. I am conducting LGCM for the first time, so I have two simple latent growth models: one for alcohol consumption and one for alcohol problems. My data suggest that there is change across time, although small. A quadratic model fits better than a linear model for both outcomes. I am also including predictors in a second model to see whether they can explain (quadratic) changes in alcohol use.
There are some things I am struggling with when I try my multivariate models:
 - When you add predictors to the model, do you regress the "i" (intercept), "s" (slope) and "q" (quadratic) terms on them?
 - For my alcohol problems model, I get a warning for negative variance when I add the predictors. I've read that a potential solution is to make the quadratic term a fixed effect, so you just allow average changes and no individual deviations. However, I've also received feedback that this is not the best solution and that it might be better to delete my quadratic term.

I am looking forward to getting some insights about my analyses. I can provide my code if it helps with clarity.

 Best wishes, Mili Rubio

Keith Markus

Jun 16, 2023, 9:59:26 AM
to lavaan
Mili,
My first thought is that in the report you will want to draw a clear demarcation at the point where planned analyses stop and post-hoc model fiddling begins.  Once you cross that line, remember that the p values are tainted by your knowledge of prior results.

My second thought is that once you enter the phase of unplanned model modification, your task shifts to explaining the result from the planned analysis.  One strategy would be to build up to the model with the negative variance slowly, adding one predictor at a time and adding one effect of the predictor at a time.  Once you obtain a negative variance or other signs of trouble, try swapping out things in the model for things not in the model to try to narrow down what is required to produce the problem.  I would be less concerned with generalizations about model specification and more with finding the issue specific to your model in your data.

One possibility is that the growth parameters (i, s and q) do not fully mediate the relationship between one or more of the predictors and the observed outcomes.  Much may depend on which variance produced the negative estimate (disturbance on q?).  I am guessing that you let the disturbances on i, s and q freely covary in the model with the predictors.  Perhaps another possibility is that you have a mixture in which one or more subgroups require the quadratic term and one or more do not.  You might explore that with some multiple group analyses based on other variables (or even categorized predictors) from the data set if your sample size allows for that.
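
In lavaan that is just a matter of adding a group argument to the fitting call. A quick sketch (yourModel and yourData are placeholders for your model syntax and data frame):

fit_mg <- growth(yourModel, data = yourData, group = "gender")

You can then compare the growth parameter estimates across the groups.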

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Mili Rubio

Jun 16, 2023, 1:46:48 PM
to lav...@googlegroups.com
Dear Keith,

I appreciate your detailed response. I see what you mean about planned and exploratory analyses and I agree it is very important. I pre-register my work. If I do anything different from the planned analyses, this will be specified in my study as a deviation from the pre-registration (and will only be done for strong reasons). 

I primarily want to make sure I am estimating the model correctly. For instance, a quadratic model seems to explain changes in alcohol consumption better than a linear one, but I am not sure how to specify this in the code when adding predictors to the model. So, to give a clear example:

My code right now looks like this:

qua_ap_gen <- '
i =~ 1*Audit_problems_w1 + 1*Audit_problems_w2 + 1*Audit_problems_w3 + 1*Audit_problems_w4 + 1*Audit_problems_w5
s =~ 0*Audit_problems_w1 + 1*Audit_problems_w2 + 2*Audit_problems_w3 + 3*Audit_problems_w4 + 4*Audit_problems_w5
q =~ 0*Audit_problems_w1 + 1*Audit_problems_w2 + 4*Audit_problems_w3 + 9*Audit_problems_w4 + 16*Audit_problems_w5

i ~ LastYearDrinkfBCr_w1 + BRS_total_w1 + Loneliness_w1 + MSPSS_scale1_w1 + MSPSS_scale2_w1 + MSPSS_scale3_w1 + gender + interaction_loneliness + interaction_brs + interaction_sigother + interaction_family + interaction_friends 
q ~ LastYearDrinkfBCr_w1 + BRS_total_w1 + Loneliness_w1 + MSPSS_scale1_w1 + MSPSS_scale2_w1 + MSPSS_scale3_w1 + gender + interaction_loneliness + interaction_brs + interaction_sigother + interaction_family + interaction_friends
'
fit_qua_ap_gen <- growth(qua_ap_gen, data = alRISCOmini, missing="fiml", estimator = "MLR")
summary(fit_qua_ap_gen, fit.measures = TRUE, standardized = TRUE)

However, my doubt is the following: in this type of model, is the "s" term also regressed on the predictors, or only the "i" and "q" terms? And what does the "s" term actually mean if the change is non-linear/quadratic?
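
To make the question concrete, I assume the extra line would simply mirror the other two, i.e. something like:

s ~ LastYearDrinkfBCr_w1 + BRS_total_w1 + Loneliness_w1 + MSPSS_scale1_w1 + MSPSS_scale2_w1 + MSPSS_scale3_w1 + gender + interaction_loneliness + interaction_brs + interaction_sigother + interaction_family + interaction_friends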

I hope my question is clear. I want to first understand this before I go into the second question.

Thanks in advance,
Mili


Keith Markus

Jun 17, 2023, 8:10:43 AM
to lavaan
Mili,
I do not believe that any of the parameters you are considering are incorrect.  It is more a matter of what is needed and what is useful.  Negative variance estimates can indicate an incorrect constraint, in your case either on the covariances between i, s and q, or on the observed variables themselves.  So, that is one thing to keep an eye on.

My strategy in this type of situation is to write a simple function that will plot growth curves with various growth parameter values, i.e., i, s and q, using the curve() function.  That way you can vary any one growth parameter alone, or two or more in combination, and see the result in the growth curve.  It can also be helpful to build out the function to accept variances (or standard deviations) for the growth parameters and the number of curves to draw.  You can then sample growth parameter values using rnorm() with the specified N, the parameter values used as the means, and the specified standard deviations.  In my experience, trial and error with this kind of visualization of the resulting curves is the most effective way to answer your question about how the various effect coefficients will impact the growth curves.
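
As a rough, untested sketch of what I mean (the parameter values in the example call are arbitrary):

plotQuadGrowth <- function(i, s, q, iSD = 0, sSD = 0, qSD = 0, n = 1, tmax = 4){
  # empty plot region; widen ylim if curves get clipped
  curve(i + s*x + q*x^2, from = 0, to = tmax, type = 'n',
        xlab = 'Time', ylab = 'Outcome',
        ylim = c(0, i + abs(s)*tmax + abs(q)*tmax^2))
  for(k in 1:n){
    # sample one person's growth parameters (sd = 0 just returns the mean)
    ik <- rnorm(1, mean = i, sd = iSD)
    sk <- rnorm(1, mean = s, sd = sSD)
    qk <- rnorm(1, mean = q, sd = qSD)
    curve(ik + sk*x + qk*x^2, from = 0, to = tmax, add = TRUE)
  }
}
# e.g., 20 curves that share i and s but vary in q
plotQuadGrowth(i = 5, s = .8, q = -.2, qSD = .2, n = 20)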

The s parameter represents the linear portion of the growth curve.  If you hold that constant and only allow the predictors to affect the q term, then everyone will have (random variations around) the same linear slope and the predictor will only impact the quadratic deviations from this linear growth.

One strategy would be to begin with one variable affecting i, then allow it to affect s, then q, then move on to the next predictor.  Another option would be to add the effects on i one at a time, then move on to s, then to q.  I would probably try the first option first because it allows you to fully explore the impact of one predictor before adding more.  You have a very complex model and it is unlikely that the reason for the negative variance estimate will just pop out at you while inspecting the full model.   Building up the model step by step offers a more systematic approach to teasing out the problem.
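
In lavaan syntax, the sequence might look like this (x1 standing in for one of your predictors, y1-y5 for your observed waves):

base <- '
  i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
  s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + 4*y5
  q =~ 0*y1 + 1*y2 + 4*y3 + 9*y4 + 16*y5
'
step1 <- paste(base, 'i ~ x1', sep = '\n')   # effect on i only
step2 <- paste(step1, 's ~ x1', sep = '\n')  # add effect on s
step3 <- paste(step2, 'q ~ x1', sep = '\n')  # add effect on q
fit1 <- growth(step1, data = yourData)       # and likewise for step2, step3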

Mili Rubio

Jun 19, 2023, 9:34:32 AM
to lav...@googlegroups.com
Dear Keith,

Thank you for your thorough reply. It is very much appreciated!
I've applied the strategy you suggested of slowly building the model by allowing one variable to affect i, then s, then q. I encounter exactly the same problem with each predictor, which might indicate a more general issue with my model.
Basically, if I regress i on the predictor everything is fine, but as soon as I also include s I get the negative variance message. If I add the q term as well, the negative variance warning disappears, BUT only if the predictors have effects on all three terms i, s, and q. This applies to each individual predictor and also to the complete model with all predictors included.

I am unsure whether this has a clear logic, such as the model being more stable with all parameters included. On the other hand, it is also possible that the warning disappears but the model is still not correct. Have you ever experienced something similar, or read/heard about this? I've found it difficult to find literature, especially any that I can relate to my case. I am wondering whether regressing i, s, and q on the predictors is a real solution (e.g., controlling for the linear portion of the curve) or is not plausible.

Concerning your advice, I haven't plotted the growth curves with various parameters yet; I will check that soon as well.

Thanks in advance for any further advice,
Mili


Keith Markus

Jun 20, 2023, 9:58:21 AM
to lavaan
Mili,
When you say everything is fine you only mean that there is no negative variance, not that the model fits well, correct?  One possibility is that the negative variance is compensating for misspecification in the model in which the predictors are not allowed to predict the quadratic term.
 
The interactive graphing that I suggested earlier will allow you to separate out the effect of each growth parameter on the shape of the growth trajectory.  (Just to clarify, all you need to do is plot the growth in the latent variable, not the observed indicators.)

Separate from that, it may be helpful to borrow a page from Aiken and West (1991) and graph the model implied growth curve for values of a chosen predictor at the mean, one SD below the mean, and one SD above the mean.  You can put all three growth curves on one graph (perhaps using different colors or line types).  This will give you a sense of how growth differs across values of the predictor holding all the other predictors constant.  It might be best to set the other predictors to their means to obtain the average growth with respect to the other predictors.  Unlike the previous graphs, all three growth parameters will change together across values of the predictor.

One other thing to consider. As an alternative to the quadratic, especially if it does not fit well, you could fit a model with only i and s but freeing some of the loadings on s, working your way back from the final time point.  As you work your way back freeing loadings on s, I suspect that the negative variance will eventually stop happening.  The manner in which that happens may offer some insight.
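
For example, a first step in that direction might look like this (y1-y5 again standing in for your observed waves; NA frees a loading):

freed <- '
  i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
  s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + NA*y5
'

with NA*y4 + NA*y5 at the next step, and so on.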

Keith

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Sage Publications, Inc.

Mili Rubio

Jun 23, 2023, 12:59:16 PM
to lav...@googlegroups.com
Hi Keith,

Thanks again, and sorry for getting back to you late. When I say "everything is fine" I mean that the warning for negative variances is not there anymore, but also that the fit is good. I didn't generally experience fit issues, mainly the situation with the negative variances. The only model that didn't fit well was the linear one, without any predictors; that's why I went on to adding the predictors to the quadratic model.

Here you can see my output for the quadratic model without predictors:

 lavaan 0.6.15 ended normally after 67 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                           437
  Number of missing patterns                        14

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                26.026      23.652
  Degrees of freedom                                 6           6
  P-value (Chi-square)                           0.000       0.001
  Scaling correction factor                                  1.100
    Yuan-Bentler correction (Mplus variant)                      

Model Test Baseline Model:

  Test statistic                              1361.347     893.829
  Degrees of freedom                                10          10
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.523

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.985       0.980
  Tucker-Lewis Index (TLI)                       0.975       0.967

                                                                 
  Robust Comparative Fit Index (CFI)                         0.987
  Robust Tucker-Lewis Index (TLI)                            0.978

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3641.038   -3641.038
  Scaling correction factor                                  1.222
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -3628.025   -3628.025
  Scaling correction factor                                  1.185
      for the MLR correction                                      
                                                                 
  Akaike (AIC)                                7310.077    7310.077
  Bayesian (BIC)                              7367.196    7367.196
  Sample-size adjusted Bayesian (SABIC)       7322.767    7322.767

Root Mean Square Error of Approximation:

  RMSEA                                          0.087       0.082
  90 Percent confidence interval - lower         0.055       0.050
  90 Percent confidence interval - upper         0.123       0.116
  P-value H_0: RMSEA <= 0.050                    0.032       0.048
  P-value H_0: RMSEA >= 0.080                    0.676       0.581
                                                                 
  Robust RMSEA                                               0.091
  90 Percent confidence interval - lower                     0.053
  90 Percent confidence interval - upper                     0.132
  P-value H_0: Robust RMSEA <= 0.050                         0.040
  P-value H_0: Robust RMSEA >= 0.080                         0.713

Standardized Root Mean Square Residual:

  SRMR                                           0.028       0.028

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  i =~                                                
    Adt_cnsmptn_w1    1.000                          
    Adt_cnsmptn_w2    1.000                          
    Adt_cnsmptn_w3    1.000                          
    Adt_cnsmptn_w4    1.000                          
    Adt_cnsmptn_w5    1.000                          
  s =~                                                
    Adt_cnsmptn_w1    0.000                          
    Adt_cnsmptn_w2    1.000                          
    Adt_cnsmptn_w3    2.000                          
    Adt_cnsmptn_w4    3.000                          
    Adt_cnsmptn_w5    4.000                          
  q =~                                                
    Adt_cnsmptn_w1    0.000                          
    Adt_cnsmptn_w2    1.000                          
    Adt_cnsmptn_w3    4.000                          
    Adt_cnsmptn_w4    9.000                          
    Adt_cnsmptn_w5   16.000                          

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  i ~~                                                
    s                -0.080    0.258   -0.311    0.756
    q                -0.010    0.052   -0.199    0.842
  s ~~                                                
    q                -0.175    0.057   -3.088    0.002

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Adt_cnsmptn_w1    0.000                          
   .Adt_cnsmptn_w2    0.000                          
   .Adt_cnsmptn_w3    0.000                          
   .Adt_cnsmptn_w4    0.000                          
   .Adt_cnsmptn_w5    0.000                          
    i                 5.443    0.107   50.963    0.000
    s                 0.761    0.074   10.280    0.000
    q                -0.199    0.017  -11.433    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Adt_cnsmptn_w1    1.718    0.303    5.662    0.000
   .Adt_cnsmptn_w2    1.115    0.131    8.501    0.000
   .Adt_cnsmptn_w3    0.868    0.134    6.484    0.000
   .Adt_cnsmptn_w4    1.068    0.128    8.317    0.000
   .Adt_cnsmptn_w5    0.372    0.261    1.423    0.155
    i                 3.465    0.367    9.441    0.000
    s                 0.740    0.254    2.908    0.004
    q                 0.049    0.014    3.437    0.001


I tried your advice to plot the different growths, linear and quadratic (I was not sure how to plot the intercept term) just by themselves. I need to see whether I would also pick a predictor to test how it affects the terms. I think you are suggesting that each predictor might affect a different kind of growth, and this might explain why I get weird warnings, right? I am thinking there is one alternative way to analyse the data that could be more appropriate and this is the piecewise model. Since a lockdown occurred during the first wave, until the second wave of data, I get this sudden increase in alcohol problems (or at least that's what we think happened) at the second time point, followed by a slow decrease or at least stability. Would you consider a potential solution for this issue using a piecewise model? In this case, we wouldn't have a nonlinear change, but more two linear segments. Maybe predictors actually relate differently to each segment and this model can explain better the situation.

See below the plots of the linear and quadratic growth.
[Attached image: Plotting Latent Growth Curve Models.png]
Let me know if you have any further insights. I am by now quite stuck and unsure how to continue, so any help is very much appreciated.

Have a nice weekend :),
Mili


Keith Markus

Jun 24, 2023, 1:30:07 PM
to lavaan
Mili,
If you test a piecewise model with a knot at wave 2, you are essentially testing the hypothesis that the acceleration visible in the growth between wave 2 and wave 5 falls within sampling error of linear growth in that region.
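
For reference, a piecewise specification with the knot at wave 2 might look something like this (y1-y5 standing in for your observed waves):

piecewise <- '
  i  =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
  s1 =~ 0*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5   # change from wave 1 to wave 2
  s2 =~ 0*y1 + 0*y2 + 1*y3 + 2*y4 + 3*y5   # linear change after wave 2
'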

The entire family of models that you are considering assumes that the growth parameters fully mediate the associations between the predictors and the observed growth variables.  The first set of graphs that I suggested would help you see how each growth parameter affects the pattern of growth (elevation, direction, deviation from linearity).  The second set (the Aiken and West style graphs) would help you see how a change in a predictor affects the pattern of growth when the other predictors are held constant.

Your model fits much better than the independence model but not within sampling error.  I would take a close look at the model residuals.  Does the chi-square result from a large number of very small residuals?  Or are there some larger ones that stand out?  If so, are they residuals between the predictors and the observed growth variables?  If so, then the specific choice of growth model may not provide the growth parameters required to fully mediate the association between the predictors and the observed growth variables.

Keith

Mili Rubio

Jun 27, 2023, 11:33:43 AM
to lav...@googlegroups.com
Dear Keith,

Thanks once more. How would you suggest taking a look at the residuals? I've requested unstandardized and standardized residuals using lavResiduals(). It seems there are some high values for some of the observed growth outcomes (for example, time 2) and between some of the observed growth outcomes and predictors (for 3 of my predictors). Would this suggest that a growth model is not advisable?

I also tried to graph the growth curve for values of a chosen predictor at the mean, one SD below the mean, and one SD above the mean. This didn't work for me; I tried to look online but still couldn't find an example showing how to do this with growth curves. Any suggestion is appreciated...

From some things I tried based on your advice, the problem might be related to my latent variables, especially when I add any predictor to the quadratic model. Maybe my outcome is too skewed and doesn't change enough for the predictors to predict anything... It doesn't seem to come from one specific predictor, because I've tried building the model separately with one predictor at a time and got the same warnings.

Thanks for all your patience and suggestions. Unfortunately, I cannot seem to find the root of the issue and, even less so, how to fix it. If you have any new insights, please don't hesitate to share them with me.
Best wishes,
Mili


Keith Markus

Jun 29, 2023, 10:10:00 AM
to lavaan
Mili,
In my experience it is helpful to first look at the observed means and then compare these to the implied means and the raw residual means.  These are easily extracted vectors of numbers that you can plot in a line graph.  For covariance residuals, I prefer the correlation metric residuals because large values of raw covariance residuals may simply reflect large variances.  You should report the model either way, but if the residuals exceed what you would consider practically significant (e.g., perhaps .01 for correlations) then you can discuss that as a caveat and caution to the reader regarding fit and what you conclude about whether the model is well specified.
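
Something along these lines, assuming fit is your fitted lavaan object:

obsMeans <- lavInspect(fit, 'sampstat')$mean  # observed means
impMeans <- lavInspect(fit, 'implied')$mean   # model-implied means
# (if covariates are in the model, subset both to the outcome variables first)
plot(obsMeans, type = 'b', pch = 16, xlab = 'Wave', ylab = 'Mean')
lines(impMeans, type = 'b', pch = 1, lty = 2)
round(obsMeans - impMeans, 3)        # raw residual means
lavResiduals(fit, type = 'cor')$cov  # correlation-metric covariance residuals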

If allowing the covariates to predict all three terms removes the negative variance, then the zero-constraints were probably misspecified and you should keep the regression weights free.  Your graph has only one obvious bend, which would be appropriate for a quadratic curve but you can explore that further by trying models with some free loadings.

Years ago I wrote code to plot implied growth curves and sadly misplaced it.  It might be useful for my SEM course, so I made a start at rewriting it.  The below is just starter code to give you a leg up.  It has not been adequately tested, and it is not as flexible or robust as it should be.  However, hopefully it is enough to get you started and you can tweak it to do what you need.

require(lavaan)

# define functions
computeGrowthParameters <- function(a, b, pred,
                                    pnames=c('i','s','q'),
                                    verbose=FALSE){
  # Compute predicted values of growth parameters
  # parameters:
  #  a = single column matrix of regression intercepts
  #  b = matrix of regression weights (rows=growth params, cols=covariates)
  #  pred = matrix of predictor values to be plotted
  #         (rows=sets of values for comparison, cols=covariates)
  # return value:
  #  matrix of values pred by parameter
  #  if verbose: list of matrices
  am <- a
  if(is.vector(a)){am <- as.matrix(a)}
  bm <- b
  pm <- pred
  ones <- matrix(1, ncol=1, nrow=dim(pm)[1]) # p by 1
  onesaConform <- identical(dim(ones)[2], dim(am)[2])
  pmbmConform <- dim(pm)[2] == dim(bm)[2]
  if(!onesaConform){warning('Number of intercepts does not match number of growth parameters')}
  if(!pmbmConform){warning('Number of covariates does not match between b and pred')}
  growthParameterValues <- NULL
  if(onesaConform & pmbmConform){
    # gp^ = a + sum(b[i]*pred[i])
    growthParameterValues <- (ones %*% t(am)) + (pm %*% t(bm)) # p by k=3
    colnames(growthParameterValues) <- pnames
  } # end if
  if(verbose){return(list(growthParameterValues,
                        am=am, bm=bm, pm=pm, ones=ones))}
  return(growthParameterValues)
} #end function

# illustration
computeGrowthParameters(a=as.matrix(1:3),
                        b=cbind(seq(-1,1,by=1), rep(1,times=3)),
                        pred=cbind(0:4,0:4),
                        verbose=TRUE)


plotGrowth <- function(a, b, li, ls, lq, pred, cex.label=.75,
                       covariate.labels=NULL, label.precision = 4){
  # Plot one growth curve for each row of pred
  # Param:
  #  a: matrix of regression intercepts (k x 1)
  #  b: matrix of regression weights (k x p)
  #  li: vector of intercept loadings
  #  ls: vector of slope loadings
  #  lq: vector of quadratic loadings
  #  pred: matrix of selected predictor values (n x p)
  #  k = number of growth parameters, p = number of covariates,
  #  n = number of sets of values to be plotted
  # Output:
  #  matrix of implied means (rows = sets of predictor values, cols = time points)
 
  # Get predicted values of growth parameter variables
  growthParameters <- computeGrowthParameters(a=a,
                                              b=b,
                                              pred=pred)
  if(is.null(growthParameters)){warning('Null Growth Parameters')}
 
  # Compute implied conditional means
  lambda <- cbind(li, ls, lq)
  predictedMeans <- growthParameters %*% t(lambda) # sum(lambda[i]*gp[i])
  dimPred <- dim(predictedMeans)
  nPredVals <- dimPred[1] # number of sets of predictor values (rows)
  nTimes <- dim(lambda)[1] # number of observed time points
 
  # Plot
  plot(x=c(1,nTimes),
       y=c(min(predictedMeans),max(predictedMeans)),
       type='n',
       xlab='Time',
       ylab='Dependent Variable')
  for(r in 1:nPredVals){
    lines(x=1:nTimes,
          y=predictedMeans[r,])
    points(x=1:nTimes,
          y=predictedMeans[r,],
          pch=15 + r)
    if(is.null(covariate.labels))
    {l <- r} else  # label each curve by its row number when no labels are given
    {
      l <- ""
      for(c in covariate.labels){
        l <- paste(l, round(pred[r,c],label.precision))
      } # end loop
    } # end if-else
    text(x=1,
         y=predictedMeans[r,1],
         labels=l,
         pos=4,
         cex=cex.label)
  } # end loop
  return(predictedMeans)
} # end function



## quadratic growth model with two time-invariant covariates
## Modified from the lavaan help file example
model.syntax <- '
  # intercept, linear, and quadratic terms with fixed coefficients
    i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
    s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
    s2 =~ 0*t1 + 1*t2 + 4*t3 + 9*t4

  # regressions
    i ~ x1 + x2
    s ~ x1 + x2
    s2 ~ x1 + x2

' # end model

sqrt(diag(var(Demo.growth))) # Standard Deviations
colMeans(Demo.growth)
fit <- growth(model.syntax, data = Demo.growth)
summary(fit)
resid(fit, 'raw') # more useful for means
resid(fit, 'cor') # more useful for covariances
lavInspect(fit, 'sampstat') # observed means
lavInspect(fit, 'implied') # implied means
lavInspect(fit, 'est')



# Draw plot
myEstimates <- lavInspect(fit, what='est')
a <- as.matrix(myEstimates$alpha[1:3,])
b <- myEstimates$beta[1:3, 4:5]
li <- myEstimates$lambda[1:4,1]
ls <- myEstimates$lambda[1:4,2]
lq <- myEstimates$lambda[1:4,3]
x1Mean <- myEstimates$alpha[4,1]
# hold x1 at its mean; vary x2 from 2 below to 2 above its mean (about .138)
pred <- cbind(rep(x1Mean,5),c(.138 - 2, .138-1, .138, 1.138, 2.138))
colnames(pred) <- c('x1','x2')
#a;b;li;ls;lq;pred
plotGrowth(a = a,
           b = b,
           li = li,
           ls = ls,
           lq = lq,
           pred = pred,
           covariate.labels=c('x1','x2'))


Keith

Mili Rubio

Jul 5, 2023, 8:25:47 AM
to lav...@googlegroups.com
Dear Keith,

Thank you for that extensive code! I will try it out and let you know how it goes :). Can I ask you something else in regard to one sentence you shared:

If allowing the covariates to predict all three terms removes the negative variance, then the zero-constraints were probably misspecified and you should keep the regression weights free.  Your graph has only one obvious bend, which would be appropriate for a quadratic curve but you can explore that further by trying models with some free loadings.

I don't 100% understand what you mean by "zero-constraints were probably misspecified" and "you should keep the regression weights free". Indeed, my best-fitting model with no negative variance is the one in which the covariates predict all three terms. The only thing I still find puzzling is that my intercept value, under Intercepts, is negative. Does that make sense? Or have you ever seen a negative intercept in LGM?

Thanks again,
Mili


Keith Markus

Jul 6, 2023, 9:33:00 AM
to lavaan
Mili,
When you omit a parameter from a structural equation model, that is equivalent to including it but fixing it to zero.  So, your models that omitted effects on one of the growth parameter variables were essentially fixing those effects to zero.
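
In lavaan syntax, for example, these two regression blocks describe the same model (x1 is a placeholder, and the growth factor definitions would sit above these lines):

mA <- 'i ~ x1
       s ~ x1'       # no effect of x1 on q specified

mB <- 'i ~ x1
       s ~ x1
       q ~ 0*x1'     # the omitted path written out as an explicit zero constraint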

I think that your second question is about an intercept parameter in the model (i.e., the effect of the constant on a variable) and not the growth intercept latent variable.  The parameter space ranges from negative infinity to positive infinity; intercepts are not bounded.  If you have large positive effects of positive-valued variables, then this will tend to increase the mean value of the variable.  The intercept term adjusts the mean value back toward the observed value, which in this case would be downward.  The fact that an intercept is negative does not necessarily imply that the variable has negative values.  The negative intercept can be counterbalanced by positive effects of variables.

One common situation in which a surprising intercept value can occur is one in which zero is outside the observed range of values for one or more predictors.  The intercept represents the expected value of the variable when all of its predictors equal zero (and thus drop out of the equation).  If zero is a fanciful value for one or more causal variables, then the intercept will be fanciful as well.  It represents the mean of the variable under conditions that do not actually occur in the data.  There is nothing wrong with that, but it can be surprising if you do not see it coming.
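
A toy regression makes the point (made-up data, nothing to do with your model):

set.seed(1)
x <- rnorm(100, mean = 5)   # zero lies well outside the observed range of x
y <- -2 + 2*x + rnorm(100)  # y itself is positive throughout
mean(y)                     # around 8
coef(lm(y ~ x))             # intercept near -2 even though y is never negative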

Mili Rubio

Jul 6, 2023, 10:00:13 AM
to lav...@googlegroups.com
Dear Keith,

Thanks a lot for your quick response! I now understand your explanation about misspecified zero constraints. Concerning the second question, you can see below the intercept value I am talking about. I interpreted this value as the growth intercept of the latent variable (but I might be wrong?). The value I find negative is the one for .i below (-2.548). This negative value only appeared in the model with added covariates, not in the simple model without covariates. Just so you know, I centered the values for better interpretation. I was not expecting this negative value and was, indeed, puzzled.

Best,
Mili


 

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Audt_prblms_w1    0.000
   .Audt_prblms_w2    0.000
   .Audt_prblms_w3    0.000
   .Audt_prblms_w4    0.000
   .Audt_prblms_w5    0.000
   .i                -2.548    1.912   -1.332    0.183
   .s                 1.886    1.371    1.376    0.169
   .q                -0.315    0.314   -1.004    0.316

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Audt_prblms_w1    2.418    0.892    2.712    0.007
   .Audt_prblms_w2    3.161    0.440    7.182    0.000
   .Audt_prblms_w3    3.109    0.461    6.747    0.000
   .Audt_prblms_w4    3.038    0.476    6.383    0.000
   .Audt_prblms_w5    2.122    0.765    2.774    0.006
   .i                 7.155    1.009    7.092    0.000
   .s                 1.714    0.573    2.990    0.003
   .q                 0.078    0.030    2.579    0.010



Keith Markus

Jul 7, 2023, 9:18:53 AM
to lavaan
Mili,
Remember that i is a variable, not a constant.  Different participants have different growth curves, as indicated by the non-zero variances for i, s and q.  The value you point to is the intercept for the i variable.

The intercept for a variable with no predictors will typically represent its mean because if you have no further relevant information, guessing the mean will minimize the summed squared errors of your guesses for individual values.  However, as I understand it, your model includes predictors of i.  In this case, to obtain the mean, you would need to plug in mean values for the predictors and compute i^ from the equation i^ = i[intercept] + b[i1](x[1]) + ... + b[in](x[n]).  I suspect that in your case the means of the predictors are positive.  If the effect coefficients are positive too, you can have a negative intercept and a positive mean.  For example, if there are two predictors with means of 5 and the effect coefficients are both .5, then an intercept of -2.5 could still give you a mean of -2.5 + (.5 * 5) + (.5 * 5) = 2.5, that is, positive 2.5.  You can find a more detailed discussion in a regression book like the one by Cohen, Cohen, West & Aiken.

There is nothing in the logic of the model that requires i to have a positive mean.  However, i is the predicted value of the observed variable at the first time point because the loadings for s and q are both zero for that time point.  So, if your observed variable has a positive range, then you might expect the mean of i to be positive for that reason.  (This paragraph assumes that there are no covariates with effects on the observed variables that are not fully mediated by the growth variables.)

If you have not done so already, you may find it helpful to plot the first 200 or so cases in a line plot with separate lines for each case, putting time on the x axis and the observed variables on the y axis.  It helps to plot different lines in different colors.  This will give you a sense of the variability in individual growth.  Variability at the first time point will give you a sense of the work that i has to do in your model.
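
A quick way to do that, using the data frame and variable names from your earlier code (adjust as needed):

waves <- paste0('Audit_problems_w', 1:5)
matplot(t(as.matrix(alRISCOmini[1:200, waves])), type = 'l', lty = 1,
        col = rainbow(200), xlab = 'Wave', ylab = 'Alcohol problems')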

Hopefully I am not missing anything.

Mili Rubio

Jul 7, 2023, 9:35:56 AM
to lav...@googlegroups.com
Hi Keith,

Thank you for your explanation regarding the negative intercept for the latent variable 'i.' I appreciate your insights. Considering the unexpected nature of this negative value, I would like to ask your opinion directly on what concerns me. In your experience, does the negative intercept indicate a potential error or misspecification in the model, or might it be a valid result that can be interpreted in the context of the predictors and their effect coefficients?

It might help to know that when I have fewer predictors the value is not negative; the negative value appears as the model grows more complex. I believe this might be related to the complexity of the model, as now all predictors combined affect the latent intercept. However, my supervisor was concerned about this negative value and, consequently, so was I. Mainly, we don't know whether this value is acceptable in these models (I understand there is no right or wrong answer) and whether it is correct to proceed with reporting the model.

Thanks in advance for all your help and patience,
Mili


Keith Markus

Jul 8, 2023, 8:51:46 AM
to lavaan
Mili,
I think that the only way you and your supervisor are going to become comfortable with the negative estimate is by understanding its interpretation in your model.

Let's try approaching it by way of an analogy.  Suppose your budget has to balance.  If you buy one thing, say bread, then you spend the price of bread times, say, 2 loaves: breadPrice*2.  To balance, you need income of the same amount, which you can think of as a negative expense (0 - breadPrice*2).  Now suppose you buy milk and butter too.  Now your payments are breadPrice*2 + milkPrice*1 + butterPrice*1, which pushes your payments higher.  As a result, in order to balance your budget, your income needs to increase by the same amount, producing a larger negative number.  The more you spend, the larger in absolute value the negative number that represents your required income and gets added in to make the whole thing balance out to zero.

In the equation predicting i, the intercept is like the income term in the above example.  One way or another, the predicted value of i has to come out to the mean value of i for the average person; that is like the constraint of a balanced budget.  As long as the covariates are positive valued and have positive effects on i, they act like the expenses in the above example.  If you have no covariates predicting i, then the intercept represents the mean of i, the value of i for the average person.  When you add covariates, these push the predicted value of i up toward larger positive numbers, like the expenses in the above example.  The intercept compensates for this, balancing that upward influence by pulling the predicted value back down.  For the average person with mean values on all the covariates, the intercept will pull the predicted value of i back down to its mean.  This explains the pattern you observed in which adding more covariates led to larger negative intercepts (if my assumptions about your model are correct).

For an implausible person with out-of-range zero values on all of the covariates, the predicted value will equal the intercept.  It is okay if this predicted value falls outside the range of what you would expect for the observed growth variables at time one, because a prediction of i equal to the intercept applies only to hypothetical cases with covariate values that fall outside their observed distributions.

To get a better feel for this, you could create a spreadsheet.  Enter 5 or 6 cases with reasonable values for the covariates.  Then use a spreadsheet formula to compute the predicted value of i for each case, using the estimated effect coefficients.  Then use the factor loading to compute the predicted time-one value of the observed growth variable.  As long as you keep the values of the covariates inside a reasonable range for your data, you should find that the computed values are also within range.  Conversely, if you are able to produce computed values that fall out of range and do not make sense, then this should only happen if you enter out-of-range covariate values.  You are exercising good practice by evaluating the plausibility of the estimates as part of your assessment of model fit.  You can think of the spreadsheet as a method for making that evaluation.  The estimated intercept value is plausible if it leads to plausible predictions.
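
If you prefer R to a spreadsheet, the same check might look like this (the effect coefficients and covariate values here are made up; substitute your estimates):

intI  <- -2.548                            # your estimated intercept for i
bI    <- c(.5, .3)                         # hypothetical effects of two covariates on i
cases <- rbind(c(4, 6), c(5, 5), c(6, 4))  # plausible covariate values for 3 cases
predI <- intI + cases %*% bI               # predicted i for each case
predI  # also the predicted wave-1 value, since the s and q loadings are 0 there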

Mili Rubio

Jul 10, 2023, 6:19:13 AM
to lav...@googlegroups.com
Thanks, Keith, I get it now with your simple explanation! I hope we will soon figure out what is happening with the model.

Best,
Mili
