What exactly does “sample.nobs” do in lavaan::sem?

alodie

Sep 11, 2023, 5:14:02 AM

to lavaan

Hi!

For my project, I'm simulating covariance matrices using the correlation matrix from the publications, and then I run models on these simulated matrices. In some simulations, I want to examine how changing the sample size might affect the results. For instance, I'm curious about how the results would differ with a smaller or larger sample size.

Now, my question is: Can I modify the code below to adjust the sample size as needed in this specific context (e.g., 200 when the true sample size was 2000 or 2000 when the true sample size was 250)?

fit <- lavaan::sem(model, sample.cov = as.matrix(dat.cov), sample.nobs = samplesize, std.lv = TRUE)

What exactly does “sample.nobs” do?

Thanks in advance for your response.

Best,
Alodie

Julia Walther

Sep 11, 2023, 10:11:47 AM

to lavaan

Hi alodie,

"sample.nobs" indicates the sample size. When you use the sample covariance matrix as input, you have to indicate the sample size separately, because it can't be inferred from the sample covariance matrix (which has the size p x p), but it's needed for the test statistics and model fit. In contrast, if you input data (which has the size N x p), then the sample size is inferred from the number of rows. (p = number of observed variables, N = sample size)
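To illustrate the point above, here is a minimal sketch comparing the two input routes. It assumes the HolzingerSwineford1939 data set that ships with lavaan and the usual three-factor example model; neither comes from the original question. With complete data and default options, both calls should lead to the same test statistic, because the only thing the covariance route lacks is N.

```r
library(lavaan)

# Illustrative model (the standard lavaan example, not the poster's model)
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

dat <- HolzingerSwineford1939[, paste0("x", 1:9)]

# Route 1: raw data (N x p); N is taken from the number of rows
fit_data <- cfa(HS.model, data = dat)

# Route 2: covariance matrix (p x p); N must be supplied via sample.nobs
fit_cov <- cfa(HS.model, sample.cov = cov(dat), sample.nobs = nrow(dat))

# Sample size and chi-square as seen by each fit
nobs_data  <- lavInspect(fit_data, "nobs")
nobs_cov   <- lavInspect(fit_cov, "nobs")
chisq_data <- as.numeric(fitMeasures(fit_data, "chisq"))
chisq_cov  <- as.numeric(fitMeasures(fit_cov, "chisq"))

print(c(nobs_data = nobs_data, nobs_cov = nobs_cov))
print(c(chisq_data = chisq_data, chisq_cov = chisq_cov))
```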

Best,

Julia

Christian Arnold

Sep 11, 2023, 2:13:57 PM

to lav...@googlegroups.com

To complement Julia's answer: my best guess is that "nobs" stands for "number of observations", although probably only Yves can confirm that. I don't think artificially changing the sample size produces meaningful findings - just my 2 cents ...

HTH

Christian

alodie

Sep 12, 2023, 4:07:02 AM

to lavaan

Thank you for your responses. Do you know how the sample size affects this covariance matrix? Furthermore, because I can simply change the sample size in the code above (e.g., from 250 to 500 participants), it seems possible to simulate which sample size would have been best for getting robust results and stable estimates (given that the covariance matrix is the true covariance matrix and does not change across the different samples). Is this possible? Does this make sense?

Best,

Alodie

Shu Fai Cheung (張樹輝)

Sep 12, 2023, 4:24:59 AM

to lavaan

Try running the following and examine the results of lavInspect(fit, "sampstat"). It reports the actual sample covariance matrix being analyzed (with all options at their defaults):

library(lavaan)

HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '

dat_cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])

fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 20)
lavInspect(fit, "sampstat")
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 50)
lavInspect(fit, "sampstat")
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 200)
lavInspect(fit, "sampstat")
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 500)
lavInspect(fit, "sampstat")

You can see that the sample covariance matrices being analyzed are not identical, even though dat_cov is used in all cases. But the covariance matrices become more and more similar as the sample size increases.

-- Shu Fai

Shu Fai Cheung (張樹輝)

Sep 12, 2023, 11:11:22 AM

to lavaan

If we add likelihood = "wishart" , the covariance matrix being analyzed will be the same:

library(lavaan)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '

dat_cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])

fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 20, likelihood = "wishart")
(tmp1 <- lavInspect(fit, "sampstat"))
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 50, likelihood = "wishart")
(tmp2 <- lavInspect(fit, "sampstat"))
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 200, likelihood = "wishart")
(tmp3 <- lavInspect(fit, "sampstat"))
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = 500, likelihood = "wishart")
(tmp4 <- lavInspect(fit, "sampstat"))

# Verify that the covariance matrices analyzed are the same
all.equal(tmp1, tmp2)
all.equal(tmp2, tmp3)
all.equal(tmp3, tmp4)
# all.equal() uses check.attributes (ignore_attributes belongs to testthat::expect_equal)
all.equal(dat_cov, unclass(tmp1$cov), check.attributes = FALSE)

You can take a look at the sections "sample.cov.rescale" and "likelihood" in the help page of lavOptions.
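As a follow-up to the Wishart example above, this sketch looks at what still changes with sample.nobs once the analyzed matrix is held fixed. It rests on my reading of the "likelihood" section of ?lavOptions, which says that standard errors and test statistics are based on N-1 under the Wishart approach; treat that as an assumption to check. The sample sizes are arbitrary illustrative values.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

dat_cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])

# Illustrative sample sizes; only sample.nobs differs between the fits
n_values <- c(250, 500, 1000, 2000)

fits <- lapply(n_values, function(n) {
  cfa(HS.model, sample.cov = dat_cov, sample.nobs = n, likelihood = "wishart")
})

# Point estimates of all free parameters, one column per sample size
est <- sapply(fits, coef)
max_est_diff <- max(abs(est - est[, 1]))

# Chi-square and the standard error of one loading (visual =~ x2)
chisq <- sapply(fits, function(f) as.numeric(fitMeasures(f, "chisq")))
se_x2 <- sapply(fits, function(f) {
  pe <- parameterEstimates(f)
  pe$se[pe$lhs == "visual" & pe$op == "=~" & pe$rhs == "x2"]
})

# If the N-1 scaling holds, the last two columns are constant across rows
print(data.frame(N                = n_values,
                 chisq            = chisq,
                 se_x2            = se_x2,
                 chisq_per_Nm1    = chisq / (n_values - 1),
                 se_x2_x_sqrt_Nm1 = se_x2 * sqrt(n_values - 1)))
print(max_est_diff)
```

If the output behaves as expected, the point estimates do not move at all, while the chi-square and standard errors change as a fixed function of sample.nobs. In other words, refitting the same matrix with a different sample.nobs adds no sampling variability; it only rescales the inferential statistics.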

Hope this helps.

-- Shu Fai

Julia Walther

Sep 12, 2023, 5:29:15 PM

to lavaan

Hi again,

The sample size (N) is used to rescale the sample covariance matrix (S) because sample.cov.rescale=TRUE is the default (see ?lavOptions). lavaan uses the "biased" S without Bessel's correction (i.e., the denominator is N), whereas cov() returns the "unbiased" S with Bessel's correction (i.e., the denominator is N-1). Rescaling the sample covariance matrix means multiplying it by (N-1)/N, because it is assumed that you give the "unbiased" S as input.

library(lavaan)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '

N <- nrow(HolzingerSwineford1939)

# unbiased S:
S_u <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])
fit_u <- cfa(HS.model, sample.cov = S_u, sample.nobs = N, sample.cov.rescale=FALSE)
lavInspect(fit_u, "sampstat") # same as S_u

# biased S:
S_b <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])*(N-1)/N
fit_b <- cfa(HS.model, sample.cov = S_b, sample.nobs = N, sample.cov.rescale=FALSE)
lavInspect(fit_b, "sampstat") # same as S_b
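The rescaling described above can also be checked without lavaan. The following base-R sketch uses made-up random data (an assumption for illustration only) to show that multiplying cov() by (N-1)/N reproduces the covariance matrix with denominator N, and that the factor approaches 1 as N grows, which is why the analyzed matrices in the earlier example become more similar at larger sample sizes.

```r
set.seed(1)

N <- 20
X <- matrix(rnorm(N * 3), nrow = N, ncol = 3)  # made-up data, 3 variables

# "Unbiased" covariance matrix: denominator N - 1 (what cov() returns)
S_unbiased <- cov(X)

# "Biased" covariance matrix: denominator N, computed from centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
S_biased <- crossprod(Xc) / N

# Rescaling the unbiased matrix by (N - 1) / N gives the biased one
S_rescaled <- S_unbiased * (N - 1) / N
print(all.equal(S_rescaled, S_biased, check.attributes = FALSE))

# The rescaling factor for the sample sizes used earlier in the thread
N_values <- c(20, 50, 200, 500)
rescale_factor <- (N_values - 1) / N_values
print(rescale_factor)
```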

Irrespective of which estimator we use, we have to indicate the correct N to get correct test statistics and model fit statistics (correct as in not "cheating" with N and getting false statistics). I guess we can use robust standard error estimation when giving S as input, though.

Further, I guess that when likelihood = "wishart", sample.cov.rescale is set to FALSE, and thus the S we give as input is not changed by the N we indicate.

Hope that helps,

Best,

Julia

alodie

Sep 23, 2023, 5:50:52 AM

to lavaan

Thank you for all your responses. I now better understand how the covariance matrix is scaled with N.

Given that, is it safe to estimate the model with different Ns (e.g., 250, 500, 750, 1000 and 2000) in order to determine which sample size would have been the best to get stable results and estimates?

Just to be clear: the goal is not to "cheat". The goal is to find, by simulation, the minimum adequate sample size.

Shu Fai Cheung (張樹輝)

Sep 23, 2023, 6:00:15 AM

to lavaan

Could you clarify what you mean by "stable"? For parameter estimates, does it mean having standard errors that are small enough by some criterion?

-- Shu Fai

alodie

Oct 1, 2023, 2:07:31 PM

to lavaan

In this case, "stable" would mean that the results and estimates do not change with the sample size.