'<~' vs '~' when modelling an observed variable's effect on a latent variable?


Michael Truong

Nov 19, 2022, 11:13:45 AM
to lavaan
Hello,

Newbie at SEM and lavaan here, so apologies if I mess up the terminology, but I just had a question regarding the <~ and ~ that I was hoping to get some clarification on...

What I'm wondering is what the difference between `<~` and `~` is supposed to be, and when are we supposed (and not supposed) to use `<~`?

From my understanding based on ?model.syntax, `<~` is supposed to be used when we want to define the effects of an observed variable on a latent variable. This is similar to what is described in Beaujean's book on lavaan.

However, when I look at the lavaan tutorial pdf there doesn't appear to be any mention of '<~'. Furthermore, when I look at the tutorial section for growth curve modelling, it seems that '~' is used to represent the effect of `x1` and `x2` on the latent intercept and slope. 

So is `<~` simply not supposed to be used anymore, or am I not understanding something about formative latent variables...?

Thanks

Michael Truong

Nov 19, 2022, 11:25:40 AM
to lavaan
For what it's worth, I just checked the Beaujean book, and on page 86 they also use '~' to specify the observed variables' effects on slope and intercept, so I'm starting to wonder whether what I'm not understanding is what a formative latent variable is supposed to be.

Edward Rigdon

Nov 19, 2022, 11:41:52 AM
to lav...@googlegroups.com
Michael--
     "Not supposed to be used anymore" is about what Yves has said--the <~ operator is unreliable.
     The distinguishing function of both =~ and <~ in lavaan is to create a new variable, something that is not in the dataset. The regression operator ~ does not create a new variable.
     You can look at syntax with the <~ operator
F <~ x1 + x2
as shorthand for the three lines
F =~ 0*x1
F ~ x1 + x2
F ~~ 0*F
The first line just uses the =~ functionality to instantiate unobserved variable F. The second line specifies F's relationship to x1 and x2. The third line fixes F's residual variance to 0, so that F is a composite of x1 and x2 and not just dependent on them.
     That still may leave an identification problem, because lavaan is a tool for modeling common factors, and the F defined here is a composite, not a common factor.

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/eec818ca-8691-489c-9863-07ce109d1da3n%40googlegroups.com.

Michael Truong

Nov 19, 2022, 2:53:50 PM
to lavaan
Hi Edward,

Thanks, I didn't know that--about the <~ operator being a shorthand. Do you know where I can read more about the underlying math that the operators correspond to?

Secondly, if I understand this correctly, does that mean that if we want to define something like:

x1 -> F
F -> x2 + x3 + x4

where '->' represents the arrows in a diagram, x(n) are different measures in the dataset, and F is the latent variable, then we should use:

F ~ x1 
F =~ x2 + x3 + x4

Or does lavaan simply not handle this?

Lastly, I think I'm a bit confused about common versus composite factors in lavaan. I was under the impression that if we want to model the effect of an observed variable on some latent variable (such as x1 -> F, or x1 to i in the growth curve tutorial), then we should use <~. But if lavaan doesn't handle this anymore and it also doesn't handle composite factors, then what is F ~ x1 supposed to mean? Is it an incorrect approximation of the composite factor, or something else?

Thanks

Edward Rigdon

Nov 19, 2022, 3:15:25 PM
to lav...@googlegroups.com
Michael--
     I'm not an insider and not a coder, and, while lavaan is open source, like all of R, I could not easily tell you exactly what lavaan does with a <~ operator.
     But yes, what you describe is correct, if you want x1 to predict F, while F is a common factor with indicators x2, x3 and x4. The ~ operator describes the regression of F on x1, but does not create a variable called F. The second line creates F, a variable that is not present in the dataset.
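
The two-line specification discussed above (an observed predictor of a common factor that has its own indicators) can be sketched as follows. This is only an illustration under stated assumptions: the HolzingerSwineford1939 data bundled with lavaan stands in for the poster's dataset, with x1 as the observed predictor and x4, x5, x6 as the indicators of F. Those variable choices are not from the thread.

```r
library(lavaan)

# Assumption: HolzingerSwineford1939 (shipped with lavaan) replaces the
# poster's data; x1 is the predictor, x4-x6 are the indicators of F.
model <- '
  F =~ x4 + x5 + x6   # =~ creates the latent variable F
  F ~ x1              # ~ only regresses F on x1; it creates nothing new
'

fit <- sem(model, data = HolzingerSwineford1939)
summary(fit, standardized = TRUE)
```

Here F keeps a free residual variance, so it is a common factor predicted by x1 rather than a composite of x1.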

Michael Truong

Nov 19, 2022, 4:51:38 PM
to lavaan
Thank you, I think that clarifies most of the things for me. I'll try to dig in the source code myself to understand the discrepancies. 

Yves Rosseel

Nov 22, 2022, 5:09:12 AM
to lav...@googlegroups.com
The <~ operator is a shorthand notation.

For example:

model <- ' fy =~ church + members + friends
f <~ 1*income + occup + educ
fy ~ f '

is equivalent to

model <- ' fy =~ church + members + friends
f =~ 0
f ~~ 0*f
f ~ 1*income + occup + educ
fy ~ f '

where 'f =~ 0' creates a 'phantom' latent variable.

If you accept this shorthand, there is nothing wrong with using the <~
operator. But you cannot just replace any (reflective) =~ definition by
a (formative) <~ definition. The rules of identification still follow
those of reflective latent variables.

Yves.
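
One way to see what the shorthand turns into on a given lavaan installation is to build the parameter table without fitting anything. The sketch below assumes only that lavaan is installed. It reuses the model string from the message above and calls `lavaanify()` with `auto = TRUE`, which applies the usual defaults. No data are needed, and the exact rows may differ between lavaan versions, so no output is shown.

```r
library(lavaan)

# Model string copied from the message above; no dataset is required
# because lavaanify() only parses the syntax into a parameter table.
shorthand <- '
  fy =~ church + members + friends
  f  <~ 1*income + occup + educ
  fy ~ f
'

pt <- lavaanify(shorthand, auto = TRUE)

# free == 0 marks a fixed parameter; ustart holds the fixed value, if any.
pt[, c("lhs", "op", "rhs", "free", "ustart")]
```

Inspecting the rows that involve `f` shows which parameters lavaan treats as fixed and which as free for the composite.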

Michael Truong

Nov 22, 2022, 11:02:12 AM
to lavaan
Hi Yves,

Thanks for jumping in.

So then if I understand correctly, does this mean that a latent variable cannot be both reflective and formative? 

Let's say we have a diagram where two indicators point to a latent variable, and that latent variable then points to three indicators. Does the choice of whether you define the LV as formative or reflective then have to be based on substantive concerns?

Thanks

Edward Rigdon

Nov 22, 2022, 11:23:51 AM
to lav...@googlegroups.com
Michael--
In such a model, if you are using factor model software, like lavaan, then the variable is a common factor, because that is what lavaan creates. On the regression side, the common factor will have residual variance--an error variance that resolves the discrepancy between common factor and dependent composite.

If you constrain the residual variance of the factor to 0--implying that it is a strict composite of its predictors--then it is a composite of its predictors. However, if the residual variance is not actually 0, then it is a misspecified model, and model parameters are likely to be biased. Thus, what you will get is a composite formed with biased weights and / or a factor formed with biased loadings. How much of each kind of bias you get will depend on how many indicators are on each side of the factor, the relative strength of correlations, your sample size, and your estimator.

I am assuming that, otherwise, the model is correct--in particular, that the common factor effectively mediates relations between the predictors and the indicators that load on the common factor, and between the various indicators.

It does not actually matter whether you label the variable "reflective" or "formative" (though it does seem to be important to many people). What matters is the statistical model, and whether or not it is consistent with the data.
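
The difference between the two statistical models described above comes down to a single parameter, the residual variance of the factor. A minimal sketch, assuming hypothetical variable names (x1, x2 as predictors and y1, y2, y3 as indicators, none of them from the thread), compares the parameter tables of the two specifications without fitting any data:

```r
library(lavaan)

# Hypothetical names: x1, x2 predict F; y1-y3 load on F.
free_resid <- '
  F =~ y1 + y2 + y3
  F ~ x1 + x2
'
# Same model, but F is forced to be an exact composite of x1 and x2.
zero_resid <- paste(free_resid, "F ~~ 0*F", sep = "\n")

# Helper (illustrative only): return the row for F's residual variance.
f_var <- function(model) {
  pt <- lavaanify(model, auto = TRUE)
  pt[pt$lhs == "F" & pt$op == "~~" & pt$rhs == "F",
     c("lhs", "op", "rhs", "free", "ustart")]
}

f_var(free_resid)  # free > 0: the residual variance is estimated
f_var(zero_resid)  # free == 0, ustart == 0: the variance is fixed at zero
```

With real data, fitting both versions and comparing their fit is one way to judge whether the zero-variance constraint is tenable, bearing in mind that a variance fixed at 0 lies on the boundary of the parameter space.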

