discriminantValidity in semTools: "scaling factor is negative"


Brian O'Neill

Jun 25, 2023, 4:56:24 PM
to lavaan
Hello/Hei,

I'm conducting a CFA with three latent variables, and the latent correlations all seem reasonable, as the upper CI limits are all less than .8 (per https://doi.org/10.1177/1094428120968614). However, I receive "scaling factor is negative" warnings when I run discriminantValidity:

MYFMODEL <- '
f.C =~ C.1 + C.2 + C.3 + C.4
f.N =~ NE.EV.3R + NE.D.2 + NE.D.4R + NE.A.3R
f.GJS =~ GJS.1 + GJS.2R + GJS.3
'

FIT01 <- cfa(MYFMODEL, BFG, estimator = "MLR", missing = "ML", mimic = "Mplus", std.lv=TRUE)

discriminantValidity(FIT01, cutoff = 0.8, merge = FALSE, level = 0.95) 

  lhs op   rhs        est   ci.lower   ci.upper Df      AIC      BIC    Chisq Chisq diff Df diff   Pr(>Chisq)
1 f.C ~~   f.N -0.1320105 -0.3782692  0.1142482 42 4181.299 4281.663 134.5332   45.17942       1 1.797848e-11
2 f.C ~~ f.GJS  0.3219815  0.1234612  0.5205019 42 4162.609 4262.972 115.8426         NA       1           NA
3 f.N ~~ f.GJS -0.2949092 -0.4695532 -0.1202653 42 4158.628 4258.992 111.8620         NA       1           NA

Warning messages:
1: In lav_test_diff_SatorraBentler2001(mods[[m]], mods[[m + 1]]) :
  lavaan WARNING: scaling factor is negative
2: In lav_test_diff_SatorraBentler2001(mods[[m]], mods[[m + 1]]) :
  lavaan WARNING: scaling factor is negative


When I change the cutoff to .77 I do not get the warnings:

discriminantValidity(FIT01, cutoff = 0.77, merge = FALSE, level = 0.95)
  lhs op   rhs        est   ci.lower   ci.upper Df      AIC      BIC    Chisq Chisq diff Df diff    Pr(>Chisq)
1 f.C ~~   f.N -0.1320105 -0.3782692  0.1142482 42 4170.521 4270.884 123.7547   36.84004       1  1.282296e-09
2 f.C ~~ f.GJS  0.3219815  0.1234612  0.5205019 42 4152.315 4252.678 105.5487  275.85788       1  6.001507e-62
3 f.N ~~ f.GJS -0.2949092 -0.4695532 -0.1202653 42 4149.571 4249.935 102.8054 1377.75843       1 1.430858e-301


Q1: Would using the Satorra-Bentler 2010 calculation instead of lav_test_diff_SatorraBentler2001 provide a warning-free result (and, if so, how could I use Satorra-Bentler 2010 instead)?
Q2: Is the discriminantValidity cutoff value related to the CI upper value rule of thumb in Rönkkö & Cho (2022)? My highest CI upper value is .52, so I'm not sure why changing the cutoff to .77 made a difference.
Q3: Do my factors show evidence of discriminant validity?   :-)
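
Regarding Q1, a hedged sketch of one way this could be explored: discriminantValidity() does not appear to expose a method argument for the difference test, but the same nested comparison can be run by hand with lavTestLRT(), which does support method = "satorra.bentler.2010". The sketch below reuses MYFMODEL and BFG from this post; the choice of pair (f.C with f.GJS) and the sign of the fixed cutoff are illustrative and would need to match the sign of the estimated correlation:

```r
library(lavaan)
library(semTools)

## Unconstrained model, as in the post above
fit <- cfa(MYFMODEL, BFG, estimator = "MLR", missing = "ML",
           mimic = "Mplus", std.lv = TRUE)

## Constrained model: fix one factor correlation at the cutoff
## (use -0.8 instead when the estimated correlation is negative)
mod_con <- paste0(MYFMODEL, "\n  f.C ~~ 0.8 * f.GJS\n")
fit_con <- cfa(mod_con, BFG, estimator = "MLR", missing = "ML",
               mimic = "Mplus", std.lv = TRUE)

## Strictly positive Satorra-Bentler (2010) scaled difference test,
## instead of the default 2001 version that produced the warning
lavTestLRT(fit, fit_con, method = "satorra.bentler.2010")
```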

I would appreciate any insight from the community!

Christian Arnold

Jun 25, 2023, 5:06:16 PM
to lav...@googlegroups.com
Hi Brian,

You can ignore this function; you only need the CIs to interpret the results, and the delta method used is only "conditionally" suitable for this. Use the bootstrap or monteCarloCI for better results (look at the correlations in the standardized solution). About your error message: I suspect MLM is used internally, but it doesn't have to be, and I didn't check it.

HTH

Christian 


From: lav...@googlegroups.com <lav...@googlegroups.com> on behalf of Brian O'Neill <bmmo...@gmail.com>
Sent: Sunday, June 25, 2023 10:56:33 PM
To: lavaan <lav...@googlegroups.com>
Subject: discriminantValidity in semTools: "scaling factor is negative"

Brian O'Neill

Jun 25, 2023, 9:09:45 PM
to lavaan
Thank you Christian!

Christian Arnold

Jun 26, 2023, 3:15:27 PM
to lav...@googlegroups.com
Brian,

Honestly, I don't think this function produces anything useful. If you apply the marker approach, then you can easily include the correlations as self-defined parameters and forgo the chi-squared difference tests. If you are only interested in the CIs (which definitely makes sense), then you will very quickly find a lot of articles showing that the ML delta method (even with corrections like "M" or "R") is only conditionally suitable if the data are not normally distributed. There is a lot of literature about this, also about correlations. The function and the related article are not fully thought out; sorry @Mikko, we had discussed this elsewhere. There should be better solutions (resampling).

HTH

Christian
From: lav...@googlegroups.com <lav...@googlegroups.com> on behalf of Brian O'Neill <bmmo...@gmail.com>
Sent: Monday, June 26, 2023 3:09:45 AM
To: lavaan <lav...@googlegroups.com>
Subject: Re: discriminantValidity in semTools: "scaling factor is negative"
 

Rönkkö, Mikko

Jun 27, 2023, 2:21:16 AM
to lav...@googlegroups.com

Hi,

Christian is right that there are multiple different ways CIs can be calculated. We do not discuss these different ways in the ORM article because it is long enough already without that discussion. (Similarly, we could discuss the different ways nested model comparisons can be implemented and the different ways disattenuated correlations can be implemented.)

As a practical matter, I would personally just look at the CIs and not do the LR test at all. There is practically no difference in the performance of these tests. The only reason I would go for the LR test is if a reviewer asked me to do so. In both cases, scale the LVs by fixing their variances instead of standardizing the estimates after estimation. The discriminantValidity function will automatically re-estimate incorrectly scaled models and will not rescale estimates to their standardized values. As such, I do not think that any of the results the function prints out use the delta method.

Mikko

Christian Arnold

Jun 27, 2023, 1:31:01 PM
to lav...@googlegroups.com
Hi Mikko,

Thanks for your reply. I agree with you that the p-values do not contain information that cannot be taken from the CIs. Brian used MLR, probably because the data are not normally distributed. In that case resampling techniques seem to me better for getting "good" CIs, even if "M" or "R" corrections are used. I will not provide a definition of "good" and refer to the literature for this.

I understand that this was not the focus of the article, which was more concerned with the shortcomings of, for example, the AVE.

I don't understand your last statement. When I look at the source code, the first step is to make sure that the variances of the latent variables are fixed, eliminating the marker approach. Then the CIs are extracted via the parameterEstimates function. Why is this not the delta method? One could easily integrate the bootstrap or monteCarloCI at this point...?

Best

Christian 


From: lav...@googlegroups.com <lav...@googlegroups.com> on behalf of Rönkkö, Mikko <mikko....@jyu.fi>
Sent: Tuesday, June 27, 2023 8:21:20 AM
To: lav...@googlegroups.com <lav...@googlegroups.com>
Subject: Re: discriminantValidity in semTools: "scaling factor is negative"

Rönkkö, Mikko

Jun 28, 2023, 3:39:24 AM
to lav...@googlegroups.com

Hi,

We must have a different understanding of what "delta method" means. I understand the delta method as a technique for approximating the distribution of a nonlinear function of an asymptotically normal parameter estimate. This involves pre- and post-multiplying the variance estimates by the first derivatives of the parameter estimates (https://en.wikipedia.org/wiki/Delta_method). lavaan does not calculate CIs this way but uses what I call the normal approximation method.

See lines 479-480 of https://github.com/yrosseel/lavaan/blob/master/R/lav_object_methods.R

The discriminantValidity function already supports bootstrap CIs. If the lavaan object was estimated with bootstrap SEs, the function reports percentile intervals; this behavior is inherited from parameterEstimates. I just added an option to do other kinds of bootstrap CIs, following the options of parameterEstimates.

I do not understand how monteCarloCI would work in this context.

Best regards,

Mikko
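
As a concrete illustration of the bootstrap behavior described above (my own sketch, not code from the thread, reusing MYFMODEL and BFG from the first post; the bootstrap count is illustrative):

```r
library(lavaan)
library(semTools)

## Request bootstrap SEs at estimation time
fit_boot <- cfa(MYFMODEL, BFG, missing = "ML", std.lv = TRUE,
                se = "bootstrap", bootstrap = 1000)

## discriminantValidity() then reports percentile bootstrap
## intervals, inherited from parameterEstimates()
discriminantValidity(fit_boot, cutoff = 0.8)
```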

Shu Fai Cheung (張樹輝)

Jun 28, 2023, 4:47:37 AM
to lavaan
Hi,

This topic is new to me. I happened to be exploring topics related to CIs and LR tests and found this case relevant.

I created a dataset for illustration. I used empirical = TRUE just to make sure that I have full control over the sample covariance matrix. This is OK, as it only serves to illustrate that the following situation is possible:

``` r
library(semTools)
#> Loading required package: lavaan
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
#>
#> ###############################################################################
#> This is semTools 0.5-6
#> All users of R (or SEM) are invited to submit functions or ideas for functions.
#> ###############################################################################
library(lavaan)

mod_p <-
"
f1 =~ .8 * x1 + .8 * x2 + .8 * x3
f2 =~ .7 * x4 + .7 * x5 + .7 * x6
f3 =~ .7 * x7 + .8 * x8 + .9 * x9
f1 ~~ .771*f2
f1 ~~ .772*f3
f2 ~~ .773*f3
"

dat <- simulateData(mod_p, sample.nobs = 300, seed = 45351,
                    empirical = TRUE)

mod <-
"
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
"

fit <- cfa(mod, dat, std.lv = TRUE)
parameterEstimates(fit)[22:24, ]
#>    lhs op rhs   est    se      z pvalue ci.lower ci.upper
#> 22  f1 ~~  f2 0.771 0.067 11.566      0    0.640    0.902
#> 23  f1 ~~  f3 0.772 0.060 12.779      0    0.654    0.890
#> 24  f2 ~~  f3 0.773 0.066 11.632      0    0.643    0.903

discriminantValidity(fit)
#>   lhs op rhs   est  ci.lower  ci.upper Df      AIC      BIC    Chisq Chisq diff
#> 1  f1 ~~  f2 0.771 0.6403441 0.9016559 25 8421.692 8495.768 4.135843   4.135843
#> 2  f1 ~~  f3 0.772 0.6535972 0.8904029 25 8422.700 8496.776 5.144058   5.144058
#> 3  f2 ~~  f3 0.773 0.6427546 0.9032454 25 8421.579 8495.655 4.022965   4.022965
#>      RMSEA Df diff Pr(>Chisq)
#> 1 0.102239       1 0.04198416
#> 2 0.117531       1 0.02332606
#> 3 0.100382       1 0.04488474
```

<sup>Created on 2023-06-28 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>


This is the first time I have used this function, so please correct me if I am misreading the output.

As shown above, the CIs for f1 ~~ f2 and f2 ~~ f3 both include .90, the default cutoff value.

However, the chi-square tests have p-values < .05 for both of these factor correlations. Does this mean that the CI method and the LR test lead to different conclusions?

This is a toy example I artificially created for illustration, so I tweaked the population factor correlations such that the confidence limits are close to .90 and the p-values are close to .05. Nevertheless, if the goal is to do a test, and the discrepancy between the two methods occurs exactly when we most need to draw a conclusion, then the choice between CI and LR test matters.

-- Shu Fai

Rönkkö, Mikko

Jun 28, 2023, 5:43:23 AM
to lav...@googlegroups.com

Hi,

"Does this mean that the CI method and the LR test lead to different conclusions?" If you draw a binary conclusion based on a strict rule, then the answer is yes. But if we ask whether a CI upper limit of 0.9016559 and a p-value of 0.04 for the LR test that est < .9 are substantively different results, I would say that they are not.

There are a number of reasons why the tests might differ, and this is a general feature of SEM. For some discussion, see:

Gonzalez, R., & Griffin, D. (2001). Testing parameters in structural equation modeling: Every "one" matters. Psychological Methods, 6(3), 258-269. https://doi.org/10.1037/1082-989X.6.3.258

(The article discusses scaling differences, which are not relevant to this case, but I think it explains quite well why variance estimates and tests might differ between different techniques.)

Consider the following code:

## The Industrialization and Political Democracy example
## Bollen (1989), page 332
model <- '
  # latent variable definitions
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + a*y2 + b*y3 + c*y4
     dem65 =~ y5 + a*y6 + b*y7 + c*y8

  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60

  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

fit <- sem(model, data = PoliticalDemocracy)
summary(fit, fit.measures = TRUE)

## The same model without the y1 ~~ y5 residual covariance
model2 <- '
  # latent variable definitions
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + a*y2 + b*y3 + c*y4
     dem65 =~ y5 + a*y6 + b*y7 + c*y8

  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60

  # residual correlations
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

fit2 <- sem(model2, data = PoliticalDemocracy)
lavTestLRT(fit, fit2)

The two models differ in that the second does not include the y1 ~~ y5 error covariance. The estimate from the first model is:

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .y1 ~~
   .y5                0.583    0.356    1.637    0.102

The LR test gives:

Chi-Squared Difference Test

     Df    AIC    BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)
fit  38 3153.6 3218.5 40.179
fit2 39 3154.7 3217.2 43.207     3.0274 0.16441       1    0.08187 .
These two p-values should be the same asymptotically, but in small samples they will differ. In some cases this difference leads to values that fall on different sides of a cutoff.

Mikko

Shu Fai Cheung (張樹輝)

Jun 28, 2023, 7:09:27 AM
to lavaan
Hi,

Thanks a lot for your comments. Coincidentally, it was the paper by Gonzalez and Griffin that led me to create the toy example, to see how large the difference between the Wald CI and the LR test can be in practice.

There are a lot of papers and discussions on CIs vs. drawing binary conclusions using p-values, so I would avoid discussing that issue in this thread. I explored that case to ask a "what-if" scenario: what if a user wants to draw a binary conclusion (for whatever reason)?

But I agree that the difference in the toy example is not practically large if we do not insist on making a binary conclusion. A better conclusion, in the toy example, may be "not having enough precision to draw a conclusion" (e.g., the CI is too wide).

-- Shu Fai

Shu Fai Cheung (張樹輝)

Jun 28, 2023, 7:56:59 AM
to lavaan
By the way, I thought about exploring the same issue using bootstrap CIs instead of Wald CIs, but it took a lot of time and I gave up. This is because the bootstrapping is done again when fitting the constrained model, once for each factor correlation.

I proposed the following minor change to discriminantValidity():


The current approach, a simple call to update(), is indeed the safest approach. Nevertheless, for the constrained models, it may be safe to disable the computation of SEs, I believe.
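
A minimal sketch of that idea (my own illustration, not Shu Fai's actual patch; mod and dat are from the reprex earlier in the thread): the chi-square difference test does not need standard errors, so the constrained refits could skip the expensive bootstrap via se = "none":

```r
library(lavaan)

## Bootstrapped fit of the unconstrained model (the slow part, done once)
fit_boot <- cfa(mod, dat, std.lv = TRUE, se = "bootstrap", bootstrap = 500)

## Constrained model for one pair, correlation fixed at the cutoff
mod_con <- paste0(mod, "\n  f1 ~~ 0.9 * f2\n")

## se = "none" skips SE computation entirely for the refit;
## the LR test only needs the chi-square statistics
fit_con <- cfa(mod_con, dat, std.lv = TRUE, se = "none")
lavTestLRT(fit_boot, fit_con)
```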

My two cents.

-- Shu Fai

Christian Arnold

Jun 29, 2023, 7:27:44 AM
to lavaan
Hi Shu Fai et al.

I think there are some aspects that need to be considered, and that's why I claimed that the function is not as efficient as it could be. Here are a few extensions to Shu Fai's script:

fit.2 <- cfa(mod, dat)
standardizedSolution(fit.2)[22:24, ]

Note: I did not fix the variances and just looked at the standardized solution. We get effectively the same CIs.

mod <- paste0(
  mod,
  "
  f1 ~~ c.f1f2 * f2 + c.f1f3 * f3
  f2 ~~ c.f2f3 * f3
 
  f1 ~~ v.f1 * f1
  f2 ~~ v.f2 * f2
  f3 ~~ v.f3 * f3
 
  cor.f1f2 := c.f1f2 / (v.f1^0.5 * v.f2^0.5)
  cor.f1f3 := c.f1f3 / (v.f1^0.5 * v.f3^0.5)
  cor.f2f3 := c.f2f3 / (v.f2^0.5 * v.f3^0.5)
  "
)

fit.3 <- cfa(mod, dat)
parameterEstimates(fit.3)[25:27, ]

set.seed(123)
monteCarloCI(fit.3)

set.seed(123)
fit.4 <- cfa(mod, dat, se = "bootstrap")
parameterEstimates(fit.4, boot.ci.type = "bca.simple")[25:27, ]

Note:
I added the correlations as defined parameters. We can now easily extract the CIs using the bootstrap or any other method, all based on standard lavaan functionality (apart from monteCarloCI, which is from semTools).

mod <- paste0(
  mod,
  "
  cor.f1f2.p := cor.f1f2 - 0.9
  cor.f1f3.p := cor.f1f3 - 0.9
  cor.f2f3.p := cor.f2f3 - 0.9
  "
)

fit.5 <- cfa(mod, dat)
parameterEstimates(fit.5)[25:30, ]


Note: We can also easily compute p-values that match the CIs. The defined parameters are not implemented completely correctly: the estimates of the correlations could be negative or above the cutoff. However, this would be easy to adjust; it serves only as an example for the toy model.


In all cases (except the bootstrap), it was not necessary to fit the model several times to get the desired results. Therefore, I conclude that there are more efficient and more flexible approaches, especially when suboptimal conditions are present (non-normal data, ...).

Best

Christian

