Fit bifactor on simulated dataset

Pierre-Charles Soulié

unread,

Sep 22, 2024, 7:53:08 AMSep 22

to lavaan

Hello everyone,

I am new to R and lavaan.

I have a syntax to estimate a bifactor and a second-order model in order to compare them. I would like to execute this syntax on a simulated dataset. Unfortunately, I have no dataset to call (“your dataset”) and associated results of simulations are not that of the observed variables but that of general and specific factors.

I would like therefore to have help regarding syntax either :

a) to get the associated data (for overserved variables and not loadings parameters) with my simulation in order to execute my syntax regarding bifactor model on this data;

b) or to execute my syntax on the results of the simulation.

I have a syntax on simsem but I could also simulate my date with lavaan. Syntaxes for simulation and estimation of models are below.

Many thantks fro your help.

Pierre-Charles

## Fit the bifactor model

fit_bifactor <- cfa(bifactor_model, data = your_data)

## Fit the second-order model

fit_second_order <- cfa(second_order_model, data = your_data)

## S-L decomposition

fit_sl <- schmidLeiman(fit_second_order)

## Extract loadings

loadings_bifactor <- parameterEstimates(fit_bifactor)

loadings_sl <- parameterEstimates(fit_sl)

## Compare models

anova(fit_bifactor, fit_second_order)

For non-nested (most likely)

## Manually extract factor loadings

loadings_bifactor <- parameterEstimates(fit_bifactor)$est

loadings_second_order <- parameterEstimates(fit_second_order)$est

## Compute differences and test significance

loading_differences <- loadings_bifactor - loadings_second_order

## Wald

lavTestWald(fit_bifactor, constraints = "loadings_bifactor == loadings_second_order")

library(simsem)

## define population model

populationModel <- '

# define loadings

G =~ .6*y1 + .6*y2 + .6*y3 + .6*y4 + .6*y5 + .6*y6 + .6*y7 + .6*y8 + .6*y9

S1 =~ .2*y1 + .4*y2 + .4*y3

S2 =~ .4*y4 + .4*y5 + .4*y6

S3 =~ .4*y7 + .4*y8 + .4*y9

# define residual variances of indicators (= 1 - .6^2 - .4^2)

y1 ~~ .6*y1

y2 ~~ .48*y2

y3 ~~ .48*y3

y4 ~~ .48*y4

y5 ~~ .48*y5

y6 ~~ .48*y6

y7 ~~ .48*y7

y8 ~~ .48*y8

y9 ~~ .48*y9

# set factor variances to 1 (so that loadinGS are in a standardized metric)

G ~~ 1*G

S1 ~~ 1*S1

S2 ~~ 1*S2

S3 ~~ 1*S3

# define factors to be orthoGonal

S1 + S2 + S3 ~~ 0*G

S1 + S2 ~~ 0*S3

S1 ~~ 0*S2

'

## define analysis model

analysisModel <- '

G =~ NA*y1 + y2 + y3 + y4 + y5 + y6 + y7 + y8 + y9

S1 =~ NA*y1 + y2 + y3

S2 =~ NA*y4 + y5 + y6

S3 =~ NA*y7 + y8 + y9

S1 + S2 + S3 ~~ 0*G

S1 + S2 ~~ 0*S3

S1 ~~ 0*S2

G ~~ 1*G

S1 ~~ 1*S1

S2 ~~ 1*S2

S3 ~~ 1*S3

'

Yago Luksevicius de Moraes

unread,

Sep 23, 2024, 4:52:54 PMSep 23

to lavaan

Hi,
I’m a bit confused about your question, thus forgive me if I understood it wrong.

a) If you want a data frame that simulates your syntax, you can use the following syntax:

data.sim <- simsem::sim(nRep = 1e4, #number of desired data frames model = populationModel, n = nrow(your_data), #number of observations in the data frames seed = 123, #seed to allow replications multicore = TRUE, #paralellize to make it faster dataOnly = TRUE) #returns a list of data frames with no analysis

Then, you can try to fit the list of data frames with the following syntax:

fit.sim <- simsem::sim(model = analysisModel, rawData = data.sim, lavaanfun = "cfa", seed = 123, multicore = TRUE, #orthogonal = TRUE, # Uncomment here to orthogonalize the factors, rather than doing it in the analysis model syntax. std.lv = TRUE) #Use this command to identify the model by standardizing the factors instead of doing it in the analysis model syntax

b) if you tell simsem to generate and analyze the data at once, you can get the estimated loadings through the summary function or inside the $coef of the result. E.g.,

#generate and analyze the data at once fit.sim <- simsem::sim(model = analysisModel, generate = populationModel, nRep = 1e4, n = nrow(your_data), lavaanfun = "cfa", seed = 123, multicore = TRUE, #orthogonal = TRUE, # Uncomment here to orthogonalize the factors, rather than doing it in the analysis model syntax. std.lv = TRUE) #Use this command to identify the model by standardizing the factors instead of doing it in the analysis model syntax # Get estimated loadings through summary function print(simsem::summary(fit.sim)) #get estimated loadings inside simsem result print(summary(fit.sim@coef))

Best regards,
Yago

Pierre-Charles Soulié

unread,

Sep 24, 2024, 8:45:17 AMSep 24

to lavaan

Hi,

Thank you so much Yago. It is very kind of you to have taken the time to help me.

It is also really helpful.

Explained in other words, my problem is that I get a data frame with the results or loadings of my model that is the loadings of the general factor and the specific factors and I need a dataset only with results in termes of observed variables (y1 to y9) in order to apply the other part of my syntax.

I hope it more clear.

Best regards

Pierre-Charles

Yago Luksevicius de Moraes

unread,

Sep 25, 2024, 9:49:36 AMSep 25

to lavaan

Hi, Pierre-Charles,

As far as I understand, what you are asking is the first part of my answer to question a.

Just set `nRep = 1` and `n = x`, where `x` is the number of desired rows in the data frame.