Composite models in SEM lavaan for non-normal data


Trisha Gopalakrishna

Feb 4, 2025, 12:32:50 PM
to lavaan
Hello there,
I am trying to build an SEM to understand the causal relationships between a set of environmental variables and "stability" in a landscape. Stability was calculated from trend analyses of a vegetation index. The environmental variables are climatic variables, heterogeneity of the area, percentage of the area burned by fire, distance from the area to the closest anthropic land use (a proxy for anthropic land-use pressure), and water table depth. First, I drew all the relationships in PowerPoint; see the image below.
[Image: Capture.PNG]
I want to construct two composite variables: background climate conditions, from two indicators (long-term rainfall seasonality and long-term mean annual temperature), and climate pulse stressors, from two indicators (drought and heatwaves). I followed the advice from this post and this post on how to specify composite variables. Note that the two links point to different blog posts/resources on specifying composites.

My model is


comp_trial1<- '
  #Climate composite variables
  backgroundclimate_composite <~ 1*longterm_rainfallseasonality + mean_annual_temp
  backgroundclimate_composite ~~ 0*backgroundclimate_composite
  pulse_stressors <~ 1*droughtindex + heatwave
  pulse_stressors ~~ 0*pulse_stressors
 
 #regression
 stable_area ~ 1*backgroundclimate_composite + 1*pulse_stressors + water_depth +    pixel_hetero +  anthropic_dist + burnedarea
  ndvi_stability ~~ 1*ndvi_stability
 
 #remaining relationships
  pulse_stressors ~ backgroundclimate_composite
  biodiversity ~ pulse_stressors + backgroundclimate_composite + anthropic_dist
  burnedarea ~ pulse_stressors + backgroundclimate_composite
  heatwave ~~ mat #covariance correlated
'
Following this post from this Google group, in the main regression formula I set the loadings (regression coefficients) of the two climate composites equal to each other (both fixed to 1). If I include the two climate composites without this constraint, the lavaan model fails. First question: why does the model fail, and why do I need the equal loadings?

My sample size is 3700, and all variables are continuous and non-normal; see the plot of distributions and correlations between variables below.

[Image: Capture2.PNG]
Also note that, since the units of the variables are not comparable, I scaled the data so that each variable is centered on its mean. Given the non-normal distributions, I ran the SEM with bootstrapped standard errors and the Satorra-Bentler test:
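
As a side note on the scaling step: centering and standardizing are linear transformations, so they make units comparable but leave the shape of each distribution unchanged. The sketch below illustrates this with simulated data; the column names are borrowed from the thread, but the values, the use of base R `scale()`, and the `skewness` helper are assumptions for illustration, not the poster's actual preprocessing.

```r
# Simulated, skewed stand-ins for two of the thread's variables (not the real data)
set.seed(42)
raw <- data.frame(
  droughtindex = rexp(200, rate = 2),
  water_depth  = rlnorm(200, meanlog = 3, sdlog = 0.5)
)

# scale() centers each column on its mean and divides by its standard deviation
scaled <- as.data.frame(scale(raw))

# Hypothetical helper: a simple moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Means are (numerically) zero and standard deviations are one after scaling
print(round(colMeans(scaled), 10))
print(sapply(scaled, sd))

# Skewness is identical before and after: scaling does not normalize the data
print(rbind(raw = sapply(raw, skewness), scaled = sapply(scaled, skewness)))
```

This is why robust test statistics or resampling are still relevant after scaling.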

    modeltrial1 <- sem(comp_trial1, data = scaled_analyses_ndvistability,
                       se = "boot", bootstrap = 1000,
                       test = "Satorra.Bentler", fixed.x = FALSE)
    summary(modeltrial1, standardized = TRUE, fit.measures = TRUE)

The model runs, but the fit statistics are very poor; see the results image below. Second question: what is wrong with my model such that the corrected chi-square, CFI, TLI and RMSEA are so bad? How can I improve the model fit?
[Image: Capture3.PNG]
Given the fit statistics above, I have not yet checked whether the coefficients and the rest of the results make sense. Please help.
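
For readers facing a similar situation, one generic way to look into poor fit is to pull out the scaled fit measures and the largest modification indices. The sketch below is not the poster's model: it uses lavaan's built-in `HolzingerSwineford1939` data and the standard three-factor example purely to show the calls, and modification indices should be treated as hints to check against theory rather than as instructions.

```r
library(lavaan)

# Standard three-factor example shipped with lavaan (stand-in for a real model)
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Request the Satorra-Bentler scaled test statistic
fit <- cfa(hs_model, data = HolzingerSwineford1939, test = "Satorra.Bentler")

# Extract only the scaled (robust) fit measures plus SRMR
robust_fit <- fitMeasures(fit, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                                 "cfi.scaled", "tli.scaled", "rmsea.scaled",
                                 "srmr"))
print(round(robust_fit, 3))

# The five largest modification indices: candidate omitted paths or covariances
mi <- modindices(fit, sort. = TRUE, maximum.number = 5)
print(mi)
```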



Yves Rosseel

Feb 5, 2025, 3:14:07 AM
to lav...@googlegroups.com
Long story short: the current version of lavaan (or any SEM software for
that matter) does a poor job handling composites. We rely on a 'trick'
where a phantom latent variable is created, which is then the dependent
variable predicted by composite indicators. That is also why strange
tricks (like the equality constraint) are needed.

I would recommend having a look at the cSEM package, which was designed
to handle composites, and uses various PLS-style estimation methods.
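
A minimal sketch of what a cSEM specification might look like is given below. It is an illustration only: the data are simulated stand-ins for a few of the thread's variables, the construct names (`BackgroundClimate`, `PulseStressors`, `Stability`) are hypothetical, the model is a reduced version of the one in the question, and the bootstrap settings are arbitrary. The sketch assumes that observed variables entering the structural model are wrapped in single-indicator constructs.

```r
library(cSEM)

# Simulated stand-ins for the thread's variables (not the real data)
set.seed(123)
n <- 500
longterm_rainfallseasonality <- rnorm(n)
mean_annual_temp <- 0.5 * longterm_rainfallseasonality + rnorm(n)
droughtindex <- 0.3 * mean_annual_temp + rnorm(n)
heatwave <- 0.4 * droughtindex + 0.3 * mean_annual_temp + rnorm(n)
stable_area <- 0.4 * longterm_rainfallseasonality - 0.3 * droughtindex -
  0.2 * heatwave + rnorm(n)
sim <- data.frame(longterm_rainfallseasonality, mean_annual_temp,
                  droughtindex, heatwave, stable_area)

csem_model <- "
  # Structural model
  Stability      ~ BackgroundClimate + PulseStressors
  PulseStressors ~ BackgroundClimate

  # Composites (<~) built from their indicators
  BackgroundClimate <~ longterm_rainfallseasonality + mean_annual_temp
  PulseStressors    <~ droughtindex + heatwave

  # Single-indicator construct for the observed outcome
  Stability <~ stable_area
"

# Estimate with the default PLS-type weighting and bootstrap the estimates
res <- csem(.data = sim, .model = csem_model,
            .resample_method = "bootstrap", .R = 199)
summarize(res)
```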

If you are brave, you can try out the github version of lavaan (0.6-20).
In this version, we have 'native' support for composites (without
tricks). The model syntax is the same, but (for now) you need the
following options to run it:

fit <- sem(yourmodel, data = yourdata,
           optim.gradient = "numerical", composites = TRUE)

but this may change in the next weeks. No support for robust test
statistics yet though. In other words, this is work in progress, but you
can write your model simply as:

#Climate composite variables
backgroundclimate_composite <~ longterm_rainfallseasonality +
mean_annual_temp
pulse_stressors <~ droughtindex + heatwave

#regression
stable_area ~ backgroundclimate_composite + pulse_stressors +
water_depth + pixel_hetero + anthropic_dist + burnedarea

#remaining relationships
pulse_stressors ~ backgroundclimate_composite
biodiversity ~ pulse_stressors + backgroundclimate_composite +
anthropic_dist
burnedarea ~ pulse_stressors + backgroundclimate_composite
heatwave ~~ mat #covariance correlated
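
Because the options above are tied to a development build, a small guard can make a script fail early on an older install. The sketch below assumes that native composite support starts at version 0.6-20, as stated in this message; `has_native_composites` is a hypothetical helper, and the commented-out install lines assume the `remotes` package and the `yrosseel/lavaan` GitHub repository.

```r
# One way to install the development version (assumption, run manually):
# install.packages("remotes")
# remotes::install_github("yrosseel/lavaan")

# Hypothetical helper: TRUE if the given lavaan version is at least 0.6-20.
# The default argument is only evaluated when no version is supplied.
has_native_composites <- function(version = utils::packageVersion("lavaan")) {
  version >= package_version("0.6-20")
}

# Check against explicit version numbers
print(has_native_composites(package_version("0.6-19")))
print(has_native_composites(package_version("0.6-20")))
```

Calling `has_native_composites()` with no argument checks the installed lavaan version.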


Yves.

Trisha Gopalakrishna

Feb 5, 2025, 4:40:17 AM
to lavaan
Hi Yves,

Thank you very much for your reply! This is helpful. It is funny that you mention that most current SEM software does not handle composites well. OK, I will try cSEM. Do you have any thoughts on my first question, i.e. the equal loadings for the two composite variables (in this case 1*backgroundclimate_composite and 1*pulse_stressors)? I noticed that you do not include any such equal loadings in your regression formula for stable_area.

Best wishes,
Trisha

Edward Rigdon

Feb 6, 2025, 1:48:15 PM
to lav...@googlegroups.com
Yves--
     To live in such times...SmartPLS adds CFA capability and lavaan is going to include the modeling of composites.
     I have found that the Henseler-Ogasawara (H-O) specification works pretty well. With p observed variables loading on a common factor, you replace the p observed-variable residual terms with p - 1 terms. Now the number of observed and unobserved terms is the same, and you are modeling a composite, not a common factor. In simulations, at the population level, H-O produces results identical to those for PLS Mode B and for GSCA's weights-only form (both use regression weights rather than the correlation weights of PLS Mode A and GSCA with weights and loadings). The H-O specification would be really easy to automate, since it just involves specific patterns of loadings fixed to 1 or 0, or free, and fixed covariances. The method also works for modeling one group of observed variables as a composite while leaving other groups of observed variables loading on common factors.
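
One way to read the counting argument is sketched below. The notation ($c$ for the composite, $\nu$ for the $p - 1$ additional terms, $\lambda$ and $\Lambda_{\nu}$ for their loadings, $w$ for the weights) is an assumption introduced here for illustration, not taken from the thread or the slides.

```latex
% p observed variables are written as functions of 1 composite and p - 1 extra terms
x = \lambda\, c + \Lambda_{\nu}\, \nu
  = \begin{pmatrix} \lambda & \Lambda_{\nu} \end{pmatrix}
    \begin{pmatrix} c \\ \nu \end{pmatrix},
\qquad x \in \mathbb{R}^{p},\quad c \in \mathbb{R},\quad \nu \in \mathbb{R}^{p-1}.

% The loading matrix is square (p x p). If it is invertible, the unobserved terms
% are exact linear combinations of the observed variables, with no residual left over:
\begin{pmatrix} c \\ \nu \end{pmatrix}
  = \begin{pmatrix} \lambda & \Lambda_{\nu} \end{pmatrix}^{-1} x
\quad\Longrightarrow\quad
c = w^{\top} x,
\qquad w^{\top} = \text{first row of }
  \begin{pmatrix} \lambda & \Lambda_{\nu} \end{pmatrix}^{-1}.
```

In this reading, $c$ is a weighted sum of its indicators, which is what distinguishes a composite from a common factor with $p$ separate residuals.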
     I'm attaching some slides which are a subset of a presentation I made at the American Psychological Association convention in August, in Seattle. The whole deck plus syntax (using lavaan) are posted at osf.io--see the QR code on the first slide.
     Joerg Henseler and colleagues say they have since learned more about optimal starting values. It looks pretty straightforward to me, at this point, and easily automated.
--Ed Rigdon 

Brief APA Presentation August 2024.pptx