std.all for user-defined parameters in multiple group models

Shu Fai Cheung (張樹輝)

Jul 5, 2023, 3:53:39 AM
to lavaan
Hi All,

I encountered an interesting behavior of lavaan and would like to hear comments from others, including new users of lavaan.

Suppose, in a multiple group model, a user (a) imposes between-group equality constraints on regression paths, and (b) defines a parameter from these paths, e.g., the difference in effect:

``` r
# Adapted from the help page of cfa().
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9
              speed ~ c(a, a) * visual + c(b, b) * textual
              amb := a - b'

```

This model syntax looks OK (I will discuss a problem with this model later).
The user-defined parameter is amb, a minus b.
These are the results (I added group.label on purpose, explained later):

```r
fit1 <- cfa(HS.model, data = HolzingerSwineford1939,
            group = "school",
                group.label = c("Pasteur", "Grant-White"))
parameterEstimates(fit1, standardized = TRUE, se = FALSE)[c(10, 11, 46, 47, 73), ]
#>      lhs op     rhs block group label   est std.lv std.all std.nox
#> 10 speed  ~  visual     1     1     a 0.236  0.359   0.359   0.359
#> 11 speed  ~ textual     1     1     b 0.110  0.165   0.165   0.165
#> 46 speed  ~  visual     2     2     a 0.236  0.289   0.289   0.289
#> 47 speed  ~ textual     2     2     b 0.110  0.163   0.163   0.163
#> 73   amb :=     a-b     0     0   amb 0.127  0.194   0.194   0.194

```

The difference is .127, and the standardized difference is .194.

However, if we change the order of the groups, these are the results:

```r
fit2 <- cfa(HS.model, data = HolzingerSwineford1939,
            group = "school",
                group.label = c("Grant-White", "Pasteur"))
parameterEstimates(fit2, standardized = TRUE, se = FALSE)[c(10, 11, 46, 47, 73), ]
#>      lhs op     rhs block group label   est std.lv std.all std.nox
#> 10 speed  ~  visual     1     1     a 0.236  0.289   0.289   0.289
#> 11 speed  ~ textual     1     1     b 0.110  0.163   0.163   0.163
#> 46 speed  ~  visual     2     2     a 0.236  0.359   0.359   0.359
#> 47 speed  ~ textual     2     2     b 0.110  0.165   0.165   0.165
#> 73   amb :=     a-b     0     0   amb 0.127  0.126   0.126   0.126

```

The unstandardized difference is still .127, as expected. However, the difference in the standardized solution is now .126. That is, this difference depends on the order of the groups.

The reason is that the value of amb in the standardized solution is computed from the standardized estimates of the *first* group, whichever group that happens to be:

```r
# In fit1, the std.all of 'amb' is computed using results from "Pasteur"
.359 - .165
#> [1] 0.194
# In fit2, the std.all of 'amb' is computed using results from "Grant-White"
.289 - .163
#> [1] 0.126

```
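
For readers who want to verify this, here is a minimal sketch (assuming fit1 and fit2 fitted above) that pulls the group-specific standardized paths with standardizedSolution() and computes the difference separately for each group:

```r
# A sketch for checking the group-specific standardized differences by hand
# (assumes fit1 and fit2 from above).
std_diff <- function(fit) {
  std <- standardizedSolution(fit)
  a <- std[std$lhs == "speed" & std$op == "~" & std$rhs == "visual",  "est.std"]
  b <- std[std$lhs == "speed" & std$op == "~" & std$rhs == "textual", "est.std"]
  a - b  # one value per group, in the order given by group.label
}
std_diff(fit1)  # approximately .194 (Pasteur) and .126 (Grant-White)
std_diff(fit2)  # the same two values, with the group order reversed
```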

This phenomenon is not new, and a similar case has been discussed in this group before.

However, the behavior of lavaan also makes sense: when computing amb (:= a - b), lavaan assumes there is only one value for all parameters labelled a, and one value for all parameters labelled b. That is true in the unstandardized solution, but not in the standardized solution.

Users who are aware of this phenomenon (equal Bs do not imply equal betas, because the SDs can differ) may already have noticed that the model defined above, though not wrong, can lead to misleading results in the standardized solution.

To (a) impose between-group equality constraints on regression paths, (b) define a parameter from these paths, e.g., the difference in effects, *and* (c) estimate that difference in both the unstandardized and the standardized solutions, the correct way to define the model is to use explicit equality constraints rather than shared labels:

``` r
# Adapted from the help page of cfa().
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9
              speed ~ c(a1, a2) * visual + c(b1, b2) * textual
              a1 == a2
              b1 == b2
              amb1 := a1 - b1
              amb2 := a2 - b2'
fit1 <- cfa(HS.model, data = HolzingerSwineford1939,
            group = "school",
                group.label = c("Pasteur", "Grant-White"))
fit2 <- cfa(HS.model, data = HolzingerSwineford1939,
            group = "school",
                group.label = c("Grant-White", "Pasteur"))

parameterEstimates(fit1, standardized = TRUE, se = FALSE)[c(10, 11, 46, 47, 73, 74), ]
#>      lhs op     rhs block group label   est std.lv std.all std.nox
#> 10 speed  ~  visual     1     1    a1 0.236  0.359   0.359   0.359
#> 11 speed  ~ textual     1     1    b1 0.110  0.165   0.165   0.165
#> 46 speed  ~  visual     2     2    a2 0.236  0.289   0.289   0.289
#> 47 speed  ~ textual     2     2    b2 0.110  0.163   0.163   0.163
#> 75  amb1 :=   a1-b1     0     0  amb1 0.127  0.194   0.194   0.194
#> 76  amb2 :=   a2-b2     0     0  amb2 0.127  0.126   0.126   0.126
parameterEstimates(fit2, standardized = TRUE, se = FALSE)[c(10, 11, 46, 47, 73, 74), ]
#>      lhs op     rhs block group label   est std.lv std.all std.nox
#> 10 speed  ~  visual     1     1    a1 0.236  0.289   0.289   0.289
#> 11 speed  ~ textual     1     1    b1 0.110  0.163   0.163   0.163
#> 46 speed  ~  visual     2     2    a2 0.236  0.359   0.359   0.359
#> 47 speed  ~ textual     2     2    b2 0.110  0.165   0.165   0.165
#> 75  amb1 :=   a1-b1     0     0  amb1 0.127  0.126   0.126   0.126
#> 76  amb2 :=   a2-b2     0     0  amb2 0.127  0.194   0.194   0.194

```

Defined this way, we correctly get an estimate of the difference in effects in the unstandardized solution, which is the same across groups, and *two* estimates of the difference in the standardized solution, which can differ across groups because the SDs of the variables involved are not constrained to be equal.
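
If standard errors or confidence intervals for the standardized differences are also wanted, a minimal sketch (assuming fit1 from the block above) is to read them off standardizedSolution(), which reports delta-method SEs for defined parameters as well:

```r
# Delta-method SEs and CIs for amb1 and amb2 in the std.all metric
# (a sketch; assumes fit1 from the block above).
std1 <- standardizedSolution(fit1, type = "std.all")
std1[std1$op == ":=", c("lhs", "est.std", "se", "ci.lower", "ci.upper")]
```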

The difference between the unstandardized and the standardized solutions is not the issue here; it is well known (same Bs do not imply same betas).

The issue I am interested in is a software design one.

Should lavaan compute the user-defined parameter (amb in the example) in the standardized solution (std.all), in cases like the one above?

That is, should an SEM program check whether the values it uses to compute a user-defined parameter (a and b, in this case), though constrained to be equal in the unstandardized solution, can take different values in the standardized solution, and hence refuse to compute the parameter in the standardized solution?

Or should users themselves have the responsibility to be careful when using labels if they are also going to read the results from the standardized solution?

-- Shu Fai




Yves Rosseel

Jul 5, 2023, 9:57:12 AM
to lav...@googlegroups.com
On 7/5/23 09:53, Shu Fai Cheung (張樹輝) wrote:
> That is, should an SEM program check whether the values it uses to
> compute a user-defined parameter (a and b, in this case), though
> constrained to be equal in the unstandardized solution, can take
> different values in the standardized solution, and hence refuses to
> compute the parameter in the standardized solution?
>
> Or should users themselves have the responsibility to be careful when
> using labels if they are also going to read the results from the
> standardized solution?

Good question. This is indeed the difficulty of software design. In the
first years I worked on lavaan, I would have opted for the former (i.e.,
lavaan tries to be smart). Today, I am leaning more towards the latter
(i.e., the user is responsible).

The reason is the following: what 'seems' to be smart (automatic)
behavior in one setting, may be very dumb (or unwanted) in another
setting. And it is very difficult (as a software developer) to foresee
all those settings.

If I could rewrite lavaan, it would be 'less' smart.

Yves.

Keith Markus

Jul 6, 2023, 9:47:17 AM
to lavaan
Thanks for this interesting thread.

My inclination might be a hidden option number 3: do not auto-correct the code, but issue a warning to let the user know that lavaan needs to resolve an ambiguous reference for the standardized defined parameter. I think that is consistent with what Yves is saying, and it has the added advantage of giving the user a heads-up that may prompt them to recognize the issue and change their code themselves.

I envision the warning being issued whenever any label is used more than once outside of defined parameters and also used in a defined parameter -- but I have not fully thought that through. The trick here is that lavaan does not realize it is resolving an ambiguous reference; it is just following a default rule for parsing the model syntax. So, the condition for the warning needs to recognize independently when an ambiguous reference occurs.
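
A rough sketch of such a check, operating on the parameter table, might look like the following; the helper name is hypothetical and this is not lavaan code:

```r
# A rough sketch of the check described above: flag labels that are attached
# to more than one parameter and also appear in a := definition.
# The helper name is hypothetical; this is not part of lavaan.
check_label_reuse <- function(fit) {
  pt <- lavaan::parTable(fit)
  free_labels <- pt$label[!(pt$op %in% c(":=", "==")) & pt$label != ""]
  reused <- names(which(table(free_labels) > 1))
  defs <- pt$rhs[pt$op == ":="]
  flagged <- reused[vapply(reused, function(lb)
    any(grepl(paste0("\\b", lb, "\\b"), defs)), logical(1))]
  if (length(flagged) > 0)
    warning("Label(s) attached to more than one parameter and used in a ",
            "defined parameter: ", paste(flagged, collapse = ", "),
            ". Their standardized values may differ across blocks/groups.")
  invisible(flagged)
}
# For the first model in this thread (with c(a, a) and c(b, b)),
# this would flag 'a' and 'b'.
```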

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Shu Fai Cheung (張樹輝)

Jul 26, 2023, 10:16:31 PM
to lavaan
(Sorry for my late follow-up. Had a lot of things to do after a trip.)

Thanks, Yves and Keith, for your insightful comments.

As a developer of some R tools, I also want (hope) users to understand that they should be responsible for what they do (including selecting the tools they use). However, I also care about user-friendliness, because most users are, well, just users. They already need to spend a lot of time on their own content areas, and it is not realistic to expect them to understand methods like SEM as well as methodologists do. I want to help users (myself included, as I am also a user of the tools I develop) to do things right and avoid doing things wrong. However, as Yves mentioned, being "smart" is not easy. There are so many ways things can go wrong.

I think Keith's option is a good one: if anything potentially problematic occurs, warn the user. This is also good for learning, as it prompts users to ask questions (e.g., in this group), and they may learn something in the process.

-- Shu Fai