Issues specifying causal formative measurement model in lavaan


Janine

Aug 8, 2025, 11:04:26 AM
to lavaan

Dear lavaan users,

I’m working on specifying a causal-formative measurement model in lavaan and would appreciate some guidance. 

Specifically, I have a latent factor that is caused by six causal indicators and predicts two outcomes (i.e., two emitted paths). Both outcome paths are significant and are intended to satisfy the 2+ emitted paths rule for identification.

Building on Bollen & Bauldry (2011), I’m aiming to specify a causal-formative indicator model in which the latent variable has a non-zero disturbance (residual variance) term. However, lavaan consistently fixes the latent variance to zero. This effectively converts the model into a composite-formative model, which Bollen & Bauldry (2011) explicitly distinguish from causal-formative models, noting that only the composite type should have the latent variable’s residual variance fixed to zero.

I came across an earlier Google Groups post citing tutorials that were helpful, but every example I found fixes the latent variable’s residual variance to zero. Fixing the variance to zero creates a composite-formative factor, and I am specifically trying to estimate a causal-formative factor in which the latent residual variance is freely estimated.

My question is: how can I explicitly tell lavaan to estimate the latent variance (i.e., allow a non-zero residual variance for the latent construct), rather than fixing it to zero?

I’ve included an example of my model code and output below. As you’ll see, lavaan sets the variance of the latent variable (threat) to 0. When I fix the latent variable’s variance to 1 (e.g., threat ~~ 1*threat), or simply include threat ~~ threat to estimate it freely, the model becomes unidentified and throws an error. However, as far as I can tell, this model should be at least structurally identified. Could the non-significant estimates of the causal indicators give rise to empirical non-identification in this case?

I’d be grateful for any advice or examples that clarify how to correctly specify and identify a causal-formative model in lavaan. Even if this specific model of mine is non-identified for some reason, I have struggled to locate applied examples of causal-formative (estimated latent residual variance) rather than composite-formative (latent residual variance fixed to zero) latent variables.

Thank you very much,

Janine

 

#THREAT MODEL 

> mod.threat <- ' #specifying measurement model  
+            threat <~ 1*x1 + x2 + x3 + x4 + x5 + x6
+            # threat ~~ threat
+            # allow residual covariances among indicators
+            x1 ~~ x2
+            x1 ~~ x3
+            x1 ~~ x4
+            x1 ~~ x4
+            x1 ~~ x6
+            x2 ~~ x3
+            x2 ~~ x4
+            x2 ~~ x5
+            x2 ~~ x6
+            x3 ~~ x4
+            x3 ~~ x5
+            x3 ~~ x6
+            x4 ~~ x5
+            x4 ~~ x6
+            x5 ~~ x6
+            #specifying structural model portion
+            ysr_int_scaled ~ threat
+            ysr_ext_scaled ~ threat
+            '
> fit <- sem(mod.threat, data = df, estimator = 'ml')

Warning message:
lavaan->lav_data_full():  
   some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate 

> summary(fit, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-19 ended normally after 96 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        30

                                                  Used       Total
  Number of observations                           133         167

Model Test User Model:

  Test statistic                                25.564
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               325.756
  Degrees of freedom                                28
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.934
  Tucker-Lewis Index (TLI)                       0.693

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -1271.192
  Loglikelihood unrestricted model (H1)      -1258.410

  Akaike (AIC)                                2602.385
  Bayesian (BIC)                              2689.095
  Sample-size adjusted Bayesian (SABIC)       2594.201

Root Mean Square Error of Approximation:

  RMSEA                                          0.157
  90 Percent confidence interval - lower         0.097
  90 Percent confidence interval - upper         0.221
  P-value H_0: RMSEA <= 0.050                    0.003
  P-value H_0: RMSEA >= 0.080                    0.981

Standardized Root Mean Square Residual:

  SRMR                                           0.081

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Composites:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  threat <~                                                            
    x1                1.000                               0.495    0.619
    x2                0.083    0.150    0.558    0.577    0.041    0.136
    x3                0.550    0.242    2.269    0.023    0.272    0.654
    x4                0.040    0.158    0.251    0.802    0.020    0.054
    x5               -0.206    0.226   -0.912    0.362   -0.102   -0.223
    x6               -0.632    0.306   -2.067    0.039   -0.313   -0.460

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ysr_int_scaled ~                                                     
    threat            0.018    0.006    3.320    0.001    0.037    0.436
  ysr_ext_scaled ~                                                     
    threat            0.010    0.003    2.777    0.005    0.019    0.309

Covariances:
                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  x1 ~~                                                                 
    x2                 1.266    0.342    3.698    0.000    1.266    0.306
    x3                 0.417    0.237    1.764    0.078    0.417    0.139
    x4                 0.624    0.265    2.352    0.019    0.624    0.182
    x6                 0.406    0.143    2.843    0.004    0.406    0.221
  x2 ~~                                                                 
    x3                 4.609    0.786    5.866    0.000    4.609    0.579
    x4                 4.645    0.863    5.380    0.000    4.645    0.514
    x5                 3.092    0.655    4.722    0.000    3.092    0.427
    x6                 1.795    0.437    4.109    0.000    1.795    0.370
  x3 ~~                                                                 
    x4                 2.809    0.614    4.572    0.000    2.809    0.427
    x5                 2.341    0.496    4.723    0.000    2.341    0.445
    x6                 0.899    0.312    2.882    0.004    0.899    0.254
  x4 ~~                                                                 
    x5                 2.885    0.568    5.078    0.000    2.885    0.482
    x6                 1.637    0.370    4.428    0.000    1.637    0.408
  x5 ~~                                                                 
    x6                 1.599    0.305    5.238    0.000    1.599    0.497
 .ysr_int_scaled ~~                                                     
   .ysr_ext_scaled     0.001    0.000    3.466    0.001    0.001    0.315

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .ysr_int_scaled    0.006    0.001    8.155    0.000    0.006    0.810
   .ysr_ext_scaled    0.004    0.000    8.155    0.000    0.004    0.904
    x1                1.564    0.192    8.155    0.000    1.564    1.000
    x2               10.933    1.318    8.298    0.000   10.933    1.000
    x3                5.789    0.707    8.186    0.000    5.789    1.000
    x4                7.474    0.909    8.219    0.000    7.474    1.000
    x5                4.789    0.587    8.155    0.000    4.789    1.000
    x6                2.158    0.261    8.255    0.000    2.158    1.000
    threat            0.000                               0.000    0.000

Edward Rigdon

Aug 8, 2025, 3:29:47 PM
to lav...@googlegroups.com
     Given strong predictors of threat, your two emitted paths would enable estimation of threat as a common factor--the communality between the two ysr variables--IF those paths were strong. But you see that the parameter estimates are tiny--both less than 0.02. If those paths are not strong, then identification may fail. "Statistical significance" is not the criterion, and what qualifies as "strong"--no one can say generically.
     But notice the warning message saying that some observed variances are much larger than others. You should address that in order to guard against unexpected behavior. In particular, notice that the two ysr variables seem to have very small variances--much smaller than those of the six predictors of threat. Before moving forward, I strongly suggest that you rescale variables so that their variances are more similar. You can do that by multiplying or dividing by constants, such as powers of 10. You would like the variances of all your observed variables to be within a factor of 10 of each other.
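
A minimal sketch of that rescaling in base R. It uses simulated stand-in data rather than the real df: only the variable names come from this thread, the standard deviations are assumptions chosen to roughly mimic the variances in the output, and df_demo, pow, and df_rescaled are hypothetical names. Dividing a variable by 10^k divides its variance by 100^k, so powers of 10 alone bring every variance into the range 0.1 to 10; other constants can tighten the spread further.

```r
# Simulated stand-in for df; only the variable names come from the thread.
set.seed(42)
n <- 133
df_demo <- data.frame(
  x1 = rnorm(n, sd = 1.25),
  x2 = rnorm(n, sd = 3.3),
  ysr_int_scaled = rnorm(n, sd = 0.085),
  ysr_ext_scaled = rnorm(n, sd = 0.066)
)

var_before <- sapply(df_demo, var)

# Power of 10 closest (on the log scale) to each standard deviation
pow <- round(log10(sqrt(var_before)))

# Divide each column by its own power of 10
df_rescaled <- as.data.frame(Map(function(col, p) col / 10^p, df_demo, pow))
var_after <- sapply(df_rescaled, var)

# Compare variances before and after, and the remaining spread
round(rbind(before = var_before, after = var_after), 4)
max(var_after) / min(var_after)
```

Rescaling by a constant changes the unstandardized estimates by the same constant but leaves the standardized solution unchanged, so the choice of constants is a matter of numerical convenience.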
     I see that you included a line of code saying:
+            # threat ~~ threat

Currently, this is commented out. Try being explicit that you want this residual variance free:
      threat ~~ NA*threat

The "NA" says make this a free parameter. It is possible that resolving the scaling issue will make this unnecessary.
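
To see what the modifier does before fitting anything, one option is to inspect the parameter table that lavaan builds from the syntax. This is only a sketch: it uses a reflective factor (=~) with the variable names from this thread purely to illustrate the modifiers, and it says nothing about whether the resulting model is identified.

```r
library(lavaan)

# Same factor, two treatments of its variance (illustration only)
free_syntax <- '
  threat =~ ysr_int_scaled + ysr_ext_scaled
  threat ~~ NA*threat   # NA: free parameter
'
fixed_syntax <- '
  threat =~ ysr_int_scaled + ysr_ext_scaled
  threat ~~ 0*threat    # numeric constant: fixed parameter
'

pt_free  <- lavaanify(free_syntax)
pt_fixed <- lavaanify(fixed_syntax)

# A nonzero entry in the "free" column marks a freely estimated parameter;
# 0 marks a fixed one, with the fixed value stored in "ustart"
cols <- c("lhs", "op", "rhs", "free", "ustart")
pt_free[pt_free$op == "~~", cols]
pt_fixed[pt_fixed$op == "~~", cols]
```

The same check can be run on a fitted model with parTable(fit), which shows what lavaan actually did with each parameter after applying its defaults.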
     In my opinion, you don't need the composite operator <~. You could just specify regression:

threat =~ ysr_int_scaled + ysr_ext_scaled
threat ~ x1 + x2 + x3 + x4 + x5 + x6

To me, that looks like the same statistical model that you are trying to estimate, especially given that you want threat to have a nonzero residual variance.
     But others may take a different view.
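
A runnable sketch of the =~ plus regression specification suggested above, fitted to simulated data. Everything about the data is an assumption for illustration (sample size, coefficients, uncorrelated predictors); only the variable names come from this thread, and sim, mimic, and fit_mimic are hypothetical names.

```r
library(lavaan)

# Simulated data: six causes, a latent variable with a nonzero disturbance,
# and two outcomes that depend on the latent variable
set.seed(123)
n <- 500
X <- matrix(rnorm(n * 6), n, 6)
colnames(X) <- paste0("x", 1:6)
eta <- as.vector(X %*% c(0.5, 0.1, 0.3, 0.1, -0.1, -0.3)) + rnorm(n, sd = 0.7)
sim <- data.frame(
  X,
  ysr_int_scaled = 0.8 * eta + rnorm(n, sd = 0.6),
  ysr_ext_scaled = 0.6 * eta + rnorm(n, sd = 0.6)
)

mimic <- '
  threat =~ ysr_int_scaled + ysr_ext_scaled
  threat ~ x1 + x2 + x3 + x4 + x5 + x6
'
fit_mimic <- sem(mimic, data = sim, estimator = "ML")

# Residual (disturbance) variance of threat, freely estimated by default
pe <- parameterEstimates(fit_mimic)
pe[pe$lhs == "threat" & pe$op == "~~" & pe$rhs == "threat", ]
fitMeasures(fit_mimic, "df")
```

One difference from the original specification may matter. The original output includes an estimated residual covariance between ysr_int_scaled and ysr_ext_scaled, which sem() adds by default when both outcomes are written as regressions. That covariance and a free threat disturbance both draw on the single covariance between the two outcomes, so they probably cannot be estimated together. With =~, the two indicator residuals are uncorrelated by default, which leaves room for the disturbance.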
