I'm trying to fit this model with 2 mediators and 1 moderator:
1) I recoded gender with dummy coding and the IV with ordinal coding
but the DV and MED2 are raw frequency data; is this a problem for fitting the model?
Also the IV is pretty skewed, should I use the MLM estimator?
2) I want to control for the clustering effect of the model by city where participants are living in (80 different cities) but R is warning me:
"Warning messages:
1: In lav_data_full(data = data, group = group, group.label = group.label, lavaan WARNING: `group.label' argument will be ignored if `group' argument is missing
cfa(model, data=DF, missing='fiml', group = "CITY")
2: In lavaan::lavaan(model = model, data = DF, missing = "fiml", group.label = DF$CITY, : lavaan WARNING: syntax contains parameters involving exogenous covariates; switching to fixed.x = FALSE"
Group involves exogenous covariates? Sorry I do not get that point.
3) Finally, once the model has been fitted, Rosseel (2012) notes that we can evaluate indirect effects in mediation analysis. But isn't using a CFA/SEM model actually an alternative to mediation analysis? Or are they complementary?
1) I recoded gender with dummy coding and the IV with ordinal coding
Your syntax and path diagram have no labels, so I can't tell how these variables operate in your model. But if they are both exogenous predictors, then dummy codes are necessary for any kind of categorical variable (binary, ordinal, nominal).
Also, you say you have a moderator, but there are no product terms in your path diagram or syntax. So neither predictor is a moderator, just a covariate.
but the DV and MED2 are raw frequency data; is this a problem for fitting the model?
If they are not approximately continuous, then they should be treated as counts. Mplus is the only SEM software I am aware of that allows count outcomes. If there are not very many categories, you could treat the count outcome as ordered.
Also the IV is pretty skewed, should I use the MLM estimator?
If you set fixed.x = TRUE (the default option: ?lavOptions), no assumptions are made about the distribution of exogenous predictors.
2) I want to control for the clustering effect of the model by city where participants are living in (80 different cities) but R is warning me:
"Warning messages:
1: In lav_data_full(data = data, group = group, group.label = group.label, lavaan WARNING: `group.label' argument will be ignored if `group' argument is missing
You did not tell lavaan which column in DF is the grouping variable. You do not need to specify "group.label" unless you want the cities to appear in a particular order in the output.
cfa(model, data=DF, missing='fiml', group = "CITY")
2: In lavaan::lavaan(model = model, data = DF, missing = "fiml", group.label = DF$CITY, : lavaan WARNING: syntax contains parameters involving exogenous covariates; switching to fixed.x = FALSE"
Group involves exogenous covariates? Sorry I do not get that point.
The message is not about group, it is about exogenous covariates (IV and MOD1). You do not need to specify that they (co)vary in the syntax. Using fixed.x = TRUE tells lavaan to just use their observed sample statistics, so that you don't need to assume they are normally distributed.
3) Finally, once the model has been fitted, Rosseel (2012) notes that we can evaluate indirect effects in mediation analysis. But isn't using a CFA/SEM model actually an alternative to mediation analysis? Or are they complementary?
You can conduct mediation analysis in many ways. SEM is one of the better frameworks because all paths are easily estimated simultaneously. If you have categorical outcomes, though, it gets tricky. See some later slides in this presentation:
You can also find a lot of advice about mediation on SEMNET:
FYI, there is no measurement model in your syntax or diagram, so this is a path analysis, not a CFA. But the cfa() and sem() functions both call lavaan() with the same default settings, so that detail is inconsequential. I was just confused by "CFA" in the subject.
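To make the point about estimating all paths simultaneously concrete, here is a minimal sketch of a two-mediator path model with indirect effects defined through lavaan's `:=` operator. The data are simulated and the path labels (a1, a2, b1, b2, c) are only illustrative assumptions; the variable names are borrowed from this thread, and the moderator is left out to keep the example short.

```r
library(lavaan)

# Simulated stand-in data: names follow the thread, values are made up
set.seed(1)
n    <- 500
IV   <- rnorm(n)
MED1 <- 0.5 * IV + rnorm(n)
MED2 <- 0.4 * IV + rnorm(n)
DV   <- 0.3 * MED1 + 0.2 * MED2 + 0.1 * IV + rnorm(n)
myData <- data.frame(IV, MED1, MED2, DV)

model <- '
  # a paths: IV -> mediators (labels are arbitrary)
  MED1 ~ a1 * IV
  MED2 ~ a2 * IV
  # b paths and direct effect
  DV ~ b1 * MED1 + b2 * MED2 + c * IV
  # residual covariance between the two mediators
  MED1 ~~ MED2
  # defined parameters: specific indirect effects and total effect
  ind1  := a1 * b1
  ind2  := a2 * b2
  total := c + a1 * b1 + a2 * b2
'

fit <- sem(model, data = myData)
pe  <- parameterEstimates(fit)
pe[pe$op == ":=", c("lhs", "est", "se", "pvalue")]
```

The default standard errors for the defined parameters come from the delta method; bootstrapped intervals are often preferred for indirect effects and can be requested with se = "bootstrap" in the sem() call, at the cost of a longer run.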
My first regression line should include: DV ~ MOD1*MED1 + MOD1*MED2 + IV
myData$med1xG <- myData$MED1 * myData$MOD1
myData$med2xG <- myData$MED2 * myData$MOD1
model <- '
# regressions
DV ~ MOD1 + MED1 + MED2 + med1xG + med2xG + IV
...
'
Do you recall a way to display R-squared values for this? (to evaluate how much variance the full model explained).
What do you mean by "approximately continuous"? Is there a statistical way to check for that?
Yes, now the error message refers to something more specific:
"Error in lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing, : lavaan ERROR: data contains only a single observation in group 9
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I guess lavaan assumes there are at least 2 observations per group?
Absolutely, is there a way to change the subject to "path analysis"? It would shed light on what's really inside the thread. Thanks in advance for your help!
My first regression line should include: DV ~ MOD1*MED1 + MOD1*MED2 + IV
Term-expansion with the asterisk operator (*) works like that in formula objects (?formula), but lavaan does not use formula objects. This is because SEMs model several regression equations simultaneously. In lavaan model syntax, the asterisk instead attaches a label or a fixed value to a parameter. So you need to explicitly include the product term as an additional variable in your ?model.syntax, like so:
myData$med1xG <- myData$MED1 * myData$MOD1
myData$med2xG <- myData$MED2 * myData$MOD1
model <- '
# regressions
DV ~ MOD1 + MED1 + MED2 + med1xG + med2xG + IV
...
'
Do you recall a way to display R-squared values for this? (to evaluate how much variance the full model explained).
R-squared for each endogenous variable is the proportion of its variance explained by its predictors, and you can request that from the summary() or parameterEstimates() output using the argument "rsquare = TRUE".
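As a minimal sketch of the rsquare = TRUE argument, the snippet below fits a small path model to simulated data (the data and the reduced model are assumptions for illustration, not the model from this thread) and pulls R-squared out in three ways.

```r
library(lavaan)

# Simulated stand-in data; only the variable names come from the thread
set.seed(2)
n    <- 300
IV   <- rnorm(n)
MED1 <- 0.5 * IV + rnorm(n)
DV   <- 0.4 * MED1 + 0.2 * IV + rnorm(n)
myData <- data.frame(IV, MED1, DV)

model <- '
  MED1 ~ IV
  DV   ~ MED1 + IV
'
fit <- sem(model, data = myData)

# 1) printed at the end of the summary
summary(fit, rsquare = TRUE)

# 2) as extra rows (op == "r2") in the parameter table
pe <- parameterEstimates(fit, rsquare = TRUE)
pe[pe$op == "r2", c("lhs", "est")]

# 3) as a named vector, one value per endogenous variable
r2 <- lavInspect(fit, "rsquare")
r2
```

Note that there is one R-squared per endogenous variable (here MED1 and DV), not a single value for the whole model.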
What do you mean by "approximately continuous"? Is there a statistical way to check for that?
Not in a hypothesis-testing way, but there are practical guidelines you might be able to find with a little Googling. I think a Poisson distribution is approximately normal when the mean is as little as 10 or 15. Binomial (not binary) variables are approximately normal when they are based on at least 30 trials. But continuity should be good enough, even if not normal, because there are robust estimators that adjust for excess kurtosis. Ordinal data can be treated as continuous when there are at least 5 categories, or at least 7 if the distributions are quite skewed. Here is some helpful reading, although it is about ordinal data, not counts per se.
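A rough descriptive check along these lines can be done in base R. The sketch below uses a simulated Poisson variable with mean 12 as a hypothetical stand-in for a count such as DV or MED2; it only reports descriptives and is not a formal test.

```r
set.seed(3)
counts <- rpois(1000, lambda = 12)     # hypothetical count variable

n_values <- length(unique(counts))     # how many distinct values occur
m <- mean(counts)
v <- var(counts)                       # under a Poisson model, close to the mean

z      <- (counts - m) / sd(counts)
skew   <- mean(z^3)                    # sample skewness (Poisson theory: 1/sqrt(lambda))
exkurt <- mean(z^4) - 3                # excess kurtosis (Poisson theory: 1/lambda)

round(c(n_values = n_values, mean = m, var = v,
        skew = skew, exkurt = exkurt), 2)

# Visual check: compare the histogram with a normal curve of the same mean and SD
hist(counts, freq = FALSE, breaks = seq(min(counts) - 0.5, max(counts) + 0.5, 1))
curve(dnorm(x, mean = m, sd = sqrt(v)), add = TRUE)
```

Many distinct values together with modest skewness and kurtosis suggest that treating the variable as continuous with a robust estimator is defensible; a handful of distinct values piled up near zero suggests treating it as a count or as ordered.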
Yes, now the error message refers to something more specific:
"Error in lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing, : lavaan ERROR: data contains only a single observation in group 9
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I guess lavaan assumes there are at least 2 observations per group?
At least two observations are necessary to calculate (co)variance, because there is no (co)variability in a single number. But you need a lot more than 2 observations per group in SEM. How many cities do you have? If it is more than 10, a multilevel SEM is probably the framework you want. lavaan does not yet provide multilevel functionality, but it will eventually (probably slowly) introduce such features:
Here are a couple of articles that provide a good conceptual introduction to MSEM:
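Before fitting anything by group, it can help to count the observations per city. The base R sketch below uses a made-up data frame in which city 9 appears only once, mirroring the error message; DF and CITY are the names used in this thread, while the values are assumptions.

```r
# Hypothetical data: cities 1 to 8 have 8 rows each, city 9 has a single row
DF <- data.frame(CITY = c(rep(1:8, each = 8), 9))

# Number of observations per city
n_per_city <- table(DF$CITY)
n_per_city

# Cities with fewer than 2 observations cannot yield a (co)variance
too_small <- names(n_per_city)[n_per_city < 2]
too_small

# Dropping them removes the error, but does not make the remaining groups large enough
DF_ok <- DF[!(DF$CITY %in% too_small), , drop = FALSE]
```

Dropping tiny groups is shown only to explain the error; with many small clusters, a multi-group model is still a poor fit for the problem, as noted above.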
So if I put something like DV ~ IV + MOD1 + MOD2 for example, my moderators are just covariates right?
But does that mean my IV is also one? Or is there a way to specify it as an IV? A priori, should they play the same role in the model?
No predictions at level 2; I just want to control for the clustering effect.
"Error in estimate.moments.EM(Y = X[[g]], Mp = Mp[[g]], Yp = missing.[[g]], : lavaan ERROR: Sigma_22.inv cannot be inverted"
I'm wondering if this is because of the number of groups or if it's the nature of the groups themselves? (they are identified by a number).
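library(lavaan)
# three-factor CFA on lavaan's built-in example data
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
# the grouping variable here is stored as a number
class(HolzingerSwineford1939$sex) # integer
fit <- cfa(HS.model, data = HolzingerSwineford1939, group = "sex")
summary(fit, fit.measures = TRUE)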