Is there a possibility to compare two estimation methods in SEM with the simsem package?


Monika Ramoskaite

Nov 24, 2019, 11:17:39 AM
to lavaan
I would like to use the ML and DWLS estimators in SEM and then, using a Monte Carlo approach, compare: (a) average relative bias of the estimators, (b) average relative mean squared error of the parameter estimates, (c) average relative bias of the standard error estimates, (d) relative bias of the chi-square statistic, and (e) the model rejection rate associated with the chi-square statistic at an alpha level of .05.

Is there such a possibility with the simsem package, or any other possibility to compare estimators?

Terrence Jorgensen

Nov 25, 2019, 6:16:22 AM
to lavaan
Is there such a possibility with the simsem package, or any other possibility to compare estimators?

You just need to run sim() separately for each estimator.  If you have a factorial Monte Carlo design, you should run sim() once for each cell in your design.  You can compare results across conditions by printing the summary of results in each condition, and constructing tables for the results you want to focus on.
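For instance, a minimal sketch (the one-factor model, population values, sample size, and object names below are all hypothetical, purely for illustration):

library(simsem)  # also loads lavaan

# Hypothetical data-generating model with fixed population values
popModel <- ' f =~ 0.7*y1 + 0.7*y2 + 0.7*y3 + 0.7*y4
              f ~~ 1*f
              y1 ~~ 0.51*y1
              y2 ~~ 0.51*y2
              y3 ~~ 0.51*y3
              y4 ~~ 0.51*y4 '

# Analysis model with free parameters
anaModel <- ' f =~ y1 + y2 + y3 + y4 '

# One sim() run per estimator; estimator= is passed through ... to lavaan
out.ml  <- sim(1000, n = 200, model = anaModel, generate = popModel,
               lavaanfun = "cfa", estimator = "ML")
out.mlr <- sim(1000, n = 200, model = anaModel, generate = popModel,
               lavaanfun = "cfa", estimator = "MLR")

# detail = TRUE adds relative bias, relative SE bias, coverage, etc., per parameter
summaryParam(out.ml,  detail = TRUE)
summaryParam(out.mlr, detail = TRUE)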

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Alex Schoemann

Nov 25, 2019, 9:21:11 AM
to lavaan
I'll add that you probably want to fit the models using each estimator to the same data sets, if possible. To do this, run your code (for one condition) using sim() with dataOnly = TRUE. This will give you a list of data frames; you can then fit models to these data by using the rawData= option in sim().

Alex

Monika Ramoskaite

Nov 25, 2019, 10:07:38 AM
to lav...@googlegroups.com
Thank you, Alex.

Would it be something like this?

fit <- sem(FullModel, data=data, std.lv=TRUE, estimator="ML")
raw <- sim(1000, n=nrow(data), model = fit, dataOnly=TRUE)
output <- sim(model = parTable(fit), rawData=raw)

And then change the estimator and repeat?


Terrence Jorgensen

Nov 25, 2019, 11:08:41 AM
to lavaan
I'll add that you probably want to fit the models using each estimator to the same data sets if possible.

Agreed; that greatly reduces Monte Carlo error (like a repeated-measures design).
 
To do this, run your code (for one condition) using sim() with dataOnly = TRUE. This will give you a list of data frames; you can then fit models to these data by using the rawData= option in sim().

Wait, is that necessary?  I think you can simply use the same seed= argument (which would be the same by default since the default argument is seed=123321) to generate the same data.  As long as the generate= and n= arguments are the same, then the same data-generation mechanism/parameters are used, so you should get the same data.  I think the rawData= argument is advantageous when you want to use the same data across programs (e.g., comparing lavaan to Mplus, you can generate data using Mplus' MONTECARLO, then analyze them with simsem too).
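A quick way to check this, reusing the hypothetical popModel and anaModel from the earlier sketch:

d1 <- sim(5, n = 100, model = anaModel, generate = popModel, lavaanfun = "cfa",
          seed = 123321, dataOnly = TRUE)
d2 <- sim(5, n = 100, model = anaModel, generate = popModel, lavaanfun = "cfa",
          seed = 123321, dataOnly = TRUE)
identical(d1, d2)  # TRUE: same seed=, generate=, and n= reproduce the same data sets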

Alex Schoemann

Nov 25, 2019, 3:26:28 PM
to lavaan
I believe that code is correct.

Terrence: It's not necessary to output the raw data* and use it repeatedly, but it (often) saves time**. It's redundant to generate the data 2, 3, or 4 times when it can just be stored as a list and reused.

Alex

* Assuming there are no other random processes in your code. If you're generating data and bootstrapping in some conditions, then you might end up in different places in the random number stream across conditions.
**This does cost you a little in terms of memory usage. If you're working with large datasets it might not be worth it to save the data as a list and regenerate it using the same seed. Also, generating the data separately can make it more difficult to parallelize computations in some situations.
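A sketch of that reuse workflow, again assuming the hypothetical popModel and anaModel from the earlier sketch:

# Generate the replications once and store them as a list of data frames
raw <- sim(1000, n = 200, model = anaModel, generate = popModel,
           lavaanfun = "cfa", dataOnly = TRUE)

# Fit each estimator to the same stored data sets
# (per the simsem docs, nRep and n are not needed when rawData is a list)
out.ml  <- sim(model = anaModel, rawData = raw, lavaanfun = "cfa", estimator = "ML")
out.mlr <- sim(model = anaModel, rawData = raw, lavaanfun = "cfa", estimator = "MLR")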

Terrence Jorgensen

Nov 26, 2019, 8:13:53 AM
to lavaan
Good points Alex, thanks.

Here is a message of Monika's that didn't make it to the forum, but made it to my own email:

As I understand, then it is enough to write sim() as it was: sim(1000, n=nrow(data), model=fit), as the seed= argument is set by default.

Perhaps, but you are doing something complicated.

You also mention the generate= argument, which I believe, if not specified, defaults to the model= argument?

If your analysis model estimates all the nonzero parameters in your data-generating model, then yes, as long as your model= argument includes estimates / starting values to use as parameters for data generation.
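In other words, a sketch of the equivalence being described (using the fit object from your earlier code):

# With generate = NULL, the model= object also provides the data-generating
# parameters (its estimates saved as population values), so these should match:
output1 <- sim(1000, n = nrow(data), model = fit)
output2 <- sim(1000, n = nrow(data), model = fit, generate = fit)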
 
Therefore if I run the below simulation two times (for each estimator):
fit <- sem(FullModel, data=data, std.lv=TRUE, estimator="ML")
output <- sim(1000, n=nrow(data), model = parTable(fit))

I should receive the output I am looking for, or should I specify the generate= argument after all?

I don't know; what specifically are you trying to do?  You vaguely mentioned comparing estimators, but you haven't explained whether you are using Monte Carlo to do power analysis, fit evaluation, or an empirical study comparing the estimators.

I have a model with a mediation effect. When I run it with lavaan, the sem() function, and the ML estimator, I get quite good results

Not according to your chi-squared test and RMSEA.
 
[image: lavaan model-fit output]
All parameter estimates are also significant, including the mediation effect (also tested with bootstrap, to get CIs).
However when I run simulation with simsem with sem() function

You mean using the sim() function, with sem() output as the model= argument?
 
I get poor results, poor model-fit indices (e.g., CFI = 0, RMSEA > 0.3), and the pValue() function shows that all simulations showed worse results than the analysis of the observed data. In addition, strangely, the andRule value (the number of replications in which all fit indices indicate a better model than the observed data) is equal to 1, which I suppose contradicts the other results?
[image: simsem summary output]
What does it mean?

Yes, those are inconsistent results.  One would expect that given your poor model fit, it would be the other way around: the model should fit your observed data worse than most/all the data generated from the model parameters.

After some trial-and-error, I found the problem was that the estimates in the parameter table must be saved as starting values (i.e., the $start and $ustart columns of a parameter table).  This is now automatically done when generate= is a parameter table, as well as when generate= is a fitted lavaan object, from which the parTable() will be extracted.  A further complication was that the estimated values would not be preserved as population parameters when leaving generate=NULL.  That has now been fixed as well.

You can install the development version again, and check that it works using this reproducible example adapted from the lavaan tutorial page:

set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect effect (a*b)
             ab := a*b
           # total effect
             total := c + (a*b)
         '

fit <- sem(model, data = Data)
summary(fit)
output <- sim(10, n = nrow(Data), model = fit)
summary(output)


Notice that the Average Param column in the summary() output verifies that the population parameters for generating data are in fact the estimates from summary(fit).
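Relatedly, if you want the bias summaries directly, summaryParam() can add them to that table:

summaryParam(output, detail = TRUE)  # adds relative bias, standardized bias, relative SE bias, coverage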


Also, when I try to apply DWLS estimator in simsem, although it runs well when modelling in lavaan

You were looking at the same indices, but with DWLS, you need to look at the scaled statistic ("chisq.scaled", "rmsea.scaled", etc.).  Also, I'm not sure if anything went right, because you are only providing some excerpts of syntax and output that confused you.  Are you using the ordered= argument to tell lavaan which variables are ordinal, or are you only setting estimator = "DWLS"?

Also, what is your goal in comparing estimators?  It is unclear whether you want to generate ordinal data regardless, then see what happens when you treat them as ordinal (with DWLS) or as continuous (with ML).

Monika Ramoskaite

Nov 26, 2019, 10:20:41 AM
to lav...@googlegroups.com
Thank you for your patience, and apologies if I do not express myself clearly; I will try to do so below :)
 
You mean using the sim() function, with sem() output as the model= argument?
 
Yes, sorry.  
 
You were looking at the same indices, but with DWLS, you need to look at the scaled statistic ("chisq.scaled","rmsea.scaled", etc.). 

Are these available in lavaan? In the list of fitMeasures I could not find these:

[image: list of available fitMeasures]
 
Also, I'm not sure if anything went right, because you are only providing some excerpts of syntax and output that confused you.  Are you using the ordered= argument to tell lavaan which variables are ordinal, or are you only setting estimator = "DWLS"?

Yes, I was not using the ordered= argument, my mistake.
 
Also, what is your goal in comparing estimators?  It is unclear whether you want to generate ordinal data regardless, then see what happens when you treat them as ordinal (with DWLS) or as continuous (with ML).

I am doing an empirical study. I have collected data via Likert-type questionnaires. In my case, once averaged, some variables have 12-16 categories, and some are continuous. I first read a recommendation (Finney & DiStefano, 2013) that variables with 6 or more categories should be treated as continuous and estimated with ML. But after seeing my results, reading more articles (DiStefano & Morgan, 2014; Li, 2015, 2016) in which DWLS outperformed ML even for variables with 10 categories, and considering the ongoing debate about whether Likert scales can be treated as continuous, I hypothesized that applying DWLS to the variables with 12-16 categories might yield better results than treating them all as continuous. To compare the two estimators, I would like to examine the bias of the parameter estimates and standard errors, as well as power.

The code that I have so far to achieve this:

[image: R code]

The sim() output (for the ML estimator) seems to be fine after the fix. However, I still have problems when applying sim() with the "DWLS" estimator, with the error below, although I do not think I am referencing something that is not included in the data, as the same code worked for the ML estimator, and the sem() function with DWLS also works fine:

[image: error message]
What might be the cause?

Best regards,
Monika


Terrence Jorgensen

Nov 29, 2019, 5:58:41 AM
to lavaan

You were looking at the same indices, but with DWLS, you need to look at the scaled statistic ("chisq.scaled","rmsea.scaled", etc.). 

Are these available in lavaan? In the list of fitMeasures I could not find these:

Because you were not declaring any variables as ordered=, DWLS was not actually used.
 
I am doing an empirical study. 

Then to make the comparison of methods valid, you need to analyze the same data.  When you fit your model using different assumptions, then generate data that match those assumptions, you will not learn anything because your assumptions are met either way.  You should instead assign the different real-data results to different objects, and run your simulation by generating ordinal data:

fit.ml <- sem(..., estimator = "MLR") # all your cited studies compared DWLS to robust ML
fit.dwls <- sem(..., ordered = ...) # DWLS will be chosen automatically
out.ml <- sim(..., model = fit.ml, generate = fit.dwls)
out.dwls <- sim(..., model = fit.dwls, generate = fit.dwls)

Terrence Jorgensen

Nov 29, 2019, 6:02:06 AM
to lavaan
Sorry, quick addendum.  I think the error is caused by the default setting of conditional.x = TRUE when estimator = "DWLS":

fit.ml <- sem(..., estimator = "MLR") # all your cited studies compared DWLS to robust ML

fit.dwls <- sem(..., ordered = ..., conditional.x = FALSE) # DWLS will be chosen automatically
out.ml <- sim(..., model = fit.ml, generate = fit.dwls, estimator = "MLR")
out.dwls <- sim(..., model = fit.dwls, generate = fit.dwls, ordered = ..., conditional.x = FALSE)

Monika Ramoskaite

Nov 29, 2019, 8:25:23 AM
to lav...@googlegroups.com
Hi Terrence,

Thank you for your notes.
 
You were looking at the same indices, but with DWLS, you need to look at the scaled statistic ("chisq.scaled","rmsea.scaled", etc.). 

Are these available in lavaan? In the list of fitMeasures I could not find these:
 
Because you were not declaring any variables as ordered=, so DWLS was not actually used.

I did include ordered= (as per your comment), but the scaled statistics were not included after declaring the ordered variables. It seems that the scaled statistics are not shown if I use both the ordered= and estimator="DWLS" arguments. Removing estimator="DWLS" and keeping only ordered= solved the problem, and the scaled statistics became visible.

 I am doing an empirical study. 

Then to make the comparison of methods valid, you need to analyze the same data.  When you fit your model using different assumptions, then generate data that match those assumptions, you will not learn anything because your assumptions are met either way.  You should instead assign the different real-data results to different objects, and run your simulation by generating ordinal data:

fit.ml <- sem(..., estimator = "MLR") # all your cited studies compared DWLS to robust ML

fit.dwls <- sem(..., ordered = ...) # DWLS will be chosen automatically
out.ml <- sim(..., model = fit.ml, generate = fit.dwls)
out.dwls <- sim(..., model = fit.dwls, generate = fit.dwls)

Thank you, it does make sense. 
 
Sorry, quick addendum.  I think the error is caused by the default setting of conditional.x = TRUE when estimator = "DWLS"

fit.ml <- sem(..., estimator = "MLR") # all your cited studies compared DWLS to robust ML
fit.dwls <- sem(..., ordered = ..., conditional.x = FALSE) # DWLS will be chosen automatically
out.ml <- sim(..., model = fit.ml, generate = fit.dwls, estimator = "MLR")
out.dwls <- sim(..., model = fit.dwls, generate = fit.dwls, ordered = ..., conditional.x = FALSE)


I did add the conditional.x = FALSE argument as per your example, and it did solve the previous error that appeared after the first iteration:
[image: error message]

However, I received another error after all iterations, for both out.dwls and out.mlr.

For out.dwls:
[image: error message]

For out.mlr:
[image: error message]

What might be the reason?
If I set generate=fit.mlr for both estimators instead of fit.dwls, out.dwls does not converge; out.mlr works fine.

Best regards,
Monika

jpma...@gmail.com

Nov 29, 2019, 9:21:33 AM
to lav...@googlegroups.com

Hi Monika and Terrence,

I am kind of working on a similar problem… Here is what I've learned so far (I am using these data for an academic evaluation):

[image]

 

My question now is a little different: what happens if you have a normative population but your sample is strongly biased (say, respondents always give high scores because of social pressure or an interviewer effect)?

 

I will be glad to share thoughts…

 

Best,

João Marôco


Terrence Jorgensen

Nov 29, 2019, 10:02:43 AM
to lavaan

What happens if you have a normative population but your sample is strongly biased (say, respondents always give high scores because of social pressure or an interviewer effect)?


Biased samples shouldn't be expected to provide unbiased results.  But I'm not sure what you mean by "biased" here.  If you are generating multivariate normal data, then frequently observing high-category scores is consistent with asymmetric thresholds.  If you still posit symmetric thresholds around zero, then asymmetric distributions of the categories must arise from nonnormal latent item responses.  In your case, you are imagining the influence of other factors (e.g., social pressure), which would be represented in the data-generating model as another common factor.  The bias of your sample's results would then be due to unmodeled multidimensionality.

Terrence Jorgensen

Nov 29, 2019, 10:03:04 AM
to lavaan
I did include ordered= (as per your comment), but the scaled statistics were not included after declaring the ordered variables. It seems that the scaled statistics are not shown if I use both the ordered= and estimator="DWLS" arguments. Removing estimator="DWLS" and keeping only ordered= solved the problem, and the scaled statistics became visible.

Interesting.  Probably due to a new experimental feature in lavaan (multiple types of robust test can be requested for the same model), setting estimator = "DWLS" does not automatically set test = "scaled.shifted".  So instead, use the argument estimator = "WLSMV" to make sure the robust test is requested, as well as robust SEs, when using DWLS.
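For example (with hypothetical model/data/variable names):

fit.dwls <- sem(model, data = mydata, ordered = c("y1", "y2", "y3"), estimator = "WLSMV")
fitMeasures(fit.dwls, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                        "cfi.scaled", "rmsea.scaled"))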

what might be the reason?

I don't know.  Make sure your parameter table for estimation using MLR does not include any thresholds:

parTable(fit.MLR)


 
If I set generate=fit.mlr for both estimators instead of fit.dwls, the output.dwls does not converge

Because you are generating continuous data, not ordinal data.

jpma...@gmail.com

Nov 29, 2019, 11:26:31 AM
to lav...@googlegroups.com

Hi Terrence,

Thanks for your comments.

 

So, yes, assume that your population is normative and has a nice normal distribution, but then you sample only the clinical sub-population from the overall population.

 

For example, say that you want to measure depression with the Beck Depression Inventory and you correctly expect that your item distributions are like this for the normative population:

 

[image: item distributions in the normative population]

 

But then, you extract a sample that is composed of patients with major depression. All items will have responses quite skewed to the right, something like this:

 

[image: right-skewed item distributions]

(I just changed the colors to make the simulation visualizations less boring.)

 

What kind of (wrong) estimation will you get for factor loadings from WLSMV or ML?

 

I know that WLSMV assumes that the latents are normal and ML assumes that the items are MVN… But I see people (students, but not only them) doing CFA all the time with this kind of skewed data, and I believe that WLSMV overestimates the true factor loadings while ML may underestimate them… but I cannot give definitive answers other than "if your items do not have sensitivity, then the measure can't be valid, whatever results you get from CFA"…

 

Cool, ah😊

 

Best,

João Marôco

 


Monika Ramoskaite

Nov 30, 2019, 4:24:24 AM
to lav...@googlegroups.com, tjorge...@gmail.com
(It seems that my email did not reach the lavaan group today; in case it really got lost, I am sending it again and cc'ing your gmail address as well.)

As I understand, then it is enough to write sim() as it was: sim(1000, n=nrow(data), model=fit), as the seed= argument is set by default. You also mention the generate= argument, which I believe, if not specified, defaults to the model= argument? Therefore, if I run the below simulation two times (once for each estimator):
fit <- sem(FullModel, data=data, std.lv=TRUE, estimator="ML")
output <- sim(1000, n=nrow(data), model = parTable(fit))

I should receive the output I am looking for, or should I specify the generate= argument after all?

However, when I run the above functions, I receive a result that I find confusing.

I have a model with a mediation effect. When I run it with lavaan's sem() function and the ML estimator, I get quite good results, keeping in mind the slightly non-normal data (to which I attribute the high chi-square and its significant p-value):

[image: lavaan model-fit output]

All parameter estimates are also significant, including the mediation effect (also tested with bootstrap, to get CIs).

However, when I run the simulation with simsem with the sem() function, I get poor results, poor model-fit indices (e.g., CFI = 0, RMSEA > 0.3), and the pValue() function shows that all simulations showed worse results than the analysis of the observed data. In addition, strangely, the andRule value (the number of replications in which all fit indices indicate a better model than the observed data) is equal to 1, which I suppose contradicts the other results?

[image: simsem summary output]

What does it mean?

Also, when I try to apply the DWLS estimator in simsem, although it runs well when modelling in lavaan:

[image: lavaan output]

when I run simulations, I receive the error:

[image: error message]
What would be the way to remedy this?

Apologies for so many questions, but I feel slightly confused. Appreciate all the help.





Monika Ramoskaite

Nov 30, 2019, 8:02:01 AM
to lav...@googlegroups.com
Apologies, please ignore the above email. This seems to be the one that for some reason got stuck and did not get sent to the group some days ago.

Best regards,
Monika


Monika Ramoskaite

Nov 30, 2019, 10:47:28 AM
to lav...@googlegroups.com
Hi All,

I am still trying to compare different estimators with the simsem package, this time ML and MLR.

The results after running sim() come out identical for both; however, I believe this should not be the case, as the data do have fair to moderate skew and kurtosis.
Has anyone run into a similar problem?

The code used is below:
[image: R code]

Best regards,
Monika

Monika Ramoskaite

Nov 30, 2019, 3:04:15 PM
to lav...@googlegroups.com
The outputs of both are provided below:

[images: ML and MLR summary outputs]

Terrence Jorgensen

Dec 4, 2019, 7:26:51 AM
to lavaan
I am still trying to compare different estimators with the simsem package, this time ML and MLR.
The results after running sim() come out identical for both; however, I believe this should not be the case.

Well, believe it.  MLR only adjusts standard errors; the point estimates under ML are already unbiased, so there is no need to adjust them.  Look at your estimates from both models, and you will see they are identical (but SEs and test stats differ).  Thus, you will generate the same (normally distributed) data in both cases.
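A quick way to see this in lavaan itself (hypothetical model/data names):

fit.ml  <- sem(model, data = mydata, estimator = "ML")
fit.mlr <- sem(model, data = mydata, estimator = "MLR")
all.equal(coef(fit.ml), coef(fit.mlr))      # TRUE: identical point estimates
cbind(ML  = parameterEstimates(fit.ml)$se,
      MLR = parameterEstimates(fit.mlr)$se)  # but the SEs differ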
 
the data do have fair to moderate skew and kurtosis.

There is no way to tell that from the parameter table.  You need to use the indDist= argument to specify how much skew/kurtosis you want items to have.  See the first example on the ?bindDist help page, and adapt it to pass your observed indicators' skew and kurtosis values, then pass that object to the indDist= argument.
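A minimal sketch (the skewness/kurtosis values below are placeholders for your observed values, one per indicator, and the models are the hypothetical popModel/anaModel from earlier):

library(simsem)
dist <- bindDist(skewness = c(1.5, 1.2, 0.8, 1.0),
                 kurtosis = c(3.0, 2.5, 1.0, 2.0))  # one value per indicator
out <- sim(1000, n = 200, model = anaModel, generate = popModel,
           lavaanfun = "cfa", indDist = dist)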

Monika Ramoskaite

Dec 29, 2019, 8:33:43 AM
to lav...@googlegroups.com
Hi Terrence,

Thank you for your answer.

I have tried indicating the distribution of the data as per your suggestion:
[image: R code]

And I understand that when the MLR estimator is used, only the standard errors and chi-square are adjusted; that is the case when using the sem() function in lavaan, but not with sim() in simsem. After running the simulations with kurtosis and skewness indicated, the outputs for both the ML and MLR estimators are identical, including the standard errors. You mentioned that MLR only adjusts standard errors, but I do not see that in the output after the simulations.

How can I access the adjusted standard errors after running simulations with MLR estimator?

Thank you!

Monika


Terrence Jorgensen

Jan 2, 2020, 8:31:38 AM
to lavaan
but not the case with sim() 

If you want to use MLR in your simulation, you have to tell it so. See the ... argument description on the ?sim help page. You specify any additional lavaan arguments/options that way, like estimator = "MLR".
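For example, extending the earlier nonnormal-data sketch:

out.ml  <- sim(1000, n = 200, model = anaModel, generate = popModel,
               lavaanfun = "cfa", indDist = dist, estimator = "ML")
out.mlr <- sim(1000, n = 200, model = anaModel, generate = popModel,
               lavaanfun = "cfa", indDist = dist, estimator = "MLR")
summaryParam(out.mlr)  # the average SEs should now differ from the ML run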

Monika Ramoskaite

Jan 8, 2020, 1:27:17 PM
to lav...@googlegroups.com
Hi Terrence,

Apologies, I got confused with it I guess.
I thought that passing my sem() object, created with estimator='MLR', to the model= and generate= arguments in sim() would be like estimating with 'MLR'.

It works fine with the estimator= argument indicated in sim().

Thank you.

Best regards,
Monika
