SEM with clustered data using lavaan.survey

1,499 views
Skip to first unread message

cantal...@yahoo.com

unread,
Jun 17, 2013, 11:48:49 AM6/17/13
to lav...@googlegroups.com
Dear all,

I'm trying to fit a SEM using the lavaan.survey package. 

I have two dependent (endogenous) variables that are continuous and normalized.

Among the explanatory (exogenous) variables, there are both continuous variables and factors (recoded as dummy variables (0/1)). In addition, observations (individual nestlings) are clustered in another variable (nest ID, coded as 1,2,3,… and treated as a continuous variable). There may be 1, 2 or 3  nestlings per nest, and thus, samples are not completely independent.

The dataset is complete and I wonder what estimator should I use. I tried with ML and MLR.


Results are apparently correct. However, I’m not familiar with R, and I still have many doubts about the syntax, etc…

so, I would appreciate very much if you could take a look to the script (below) to check if I made any important mistake.

Any advice and/or suggestion will be most wellcome.

Thank you very much in advance


#Installing packages


install.packages("lavaan", dependencies = TRUE)
library(lavaan)
library (survey)
install.packages("lavaan.survey")
library (lavaan.survey)
#calling dataset
data<-read.table("cut_GLM.txt",header=TRUE, sep="\t")


#inspecting dataset

str(data)


#coding categorical factors as dummy (0/1) variables

data$dSEX<-model.matrix(~as.factor(data$SEX))[,-1]
data$dfosterF<-model.matrix(~as.factor(data$fosterF))[,-1]
data$dfosterM<-model.matrix(~as.factor(data$fosterM))[,-1]
data$dbiolF<-model.matrix(~as.factor(data$biolF))[,-1]
data$dbiolM<-model.matrix(~as.factor(data$biolM))[,-1]
data$dMC1R<-model.matrix(~as.factor(data$MC1R))[,-1]

# centering and rescaling continuous explanatory variables

data$sage<-scale (data$age)
data$sweight<-scale (data$weight)
data$sbrood<-scale (data$brood)

#Initial SEM model with direct and indirect effects

model1 <- "
PHA~ dMC1R  + sweight
IHA~ dMC1R +  sweight + PHA
sweight~ dfosterF + sbrood + sage
"

#PA1 path analysis without cluster.
PA1 <- lavaan(model1,  data=data,auto.var=TRUE,std.ov = TRUE, estimator="ML")
summary(PA1)

#group= defining the cluster
group<- svydesign(ids=~nest, prob=~1, data=data)


#path análisis with cluster
survey.fit <- lavaan.survey(lavaan.fit=PA1, survey.design=group)
summary(survey.fit)
# parameter estimates, SE, z-values, standarized parameter values
parameterEstimates (survey.fit)

# model-implied (fitted) covariance matrix (and mean vector) of a fitted model
fitted (survey.fit)

#residuals of a fitted model
resid (fit.survey, type = "standarized")

#estimated covariance matrix of the parameter estimates
vcov(survey.fit)

AIC(survey.fit)
BIC(survey.fit)

yrosseel

unread,
Jun 20, 2013, 5:43:30 AM6/20/13
to lav...@googlegroups.com
On 06/17/2013 05:48 PM, cantal...@yahoo.com wrote:
> Dear all,

> The dataset is complete and I wonder what estimator should I use. I
> tried with ML and MLR.

The MLR is the same as ML (ie the estimates are identical), but with
robust standard errors and a robust (scaled) test statistic. Robust
against non-normality of the endogenous variables, that is.

> #coding categorical factors as dummy (0/1) variables
>
> data$dSEX<-model.matrix(~as.factor(data$SEX))[,-1]

There is nothing wrong with this, but perhaps a more simple approach is

data$dSEX <- ifelse(data$SEX == "male", 1, 0)

(assuming you have 'male' and 'female', but this works with any coding).

> #Initial SEM model with direct and indirect effects
>
> model1 <- "
> PHA~ dMC1R + sweight
> IHA~ dMC1R + sweight + PHA
> sweight~ dfosterF + sbrood + sage
> "

Looks good. It is preferred to have single quotes (') instead of double
quotes (") around the syntax.

> #PA1 path analysis without cluster.
> PA1 <- lavaan(model1, data=data,auto.var=TRUE,std.ov = TRUE,
> estimator="ML")

I would avoid using std.ov=TRUE. You loose the information contained in
the variances.

> #group= defining the cluster
> group<- svydesign(ids=~nest, prob=~1, data=data)

I'm not so familiar with svydesign (can't help you here).

but it looks you did it right.

Yves.

Laura Gangoso

unread,
Jun 20, 2013, 9:00:04 AM6/20/13
to lav...@googlegroups.com
Dear Yves,

thank you very much for your kind response. I'll follow your suggestions, but I still have a doubt.
You said you would avoid using std.ov=TRUE. I thought it was neccessary since observed continuous explanatory variables (BUT NOT DEPENDENT VARIABLES) were standarized before entering the analysis (data$sage<-scale (data$age)). Just to be sure,  should I remove std.ov=TRUE from the script?

Apart from this, I would like to know your opinion about a "strange" result I obtained with the former model (you can find it below, highlighted in yellow). I'm not sure if it's due to either lavaan or svydesign problems.

Here is the output.

 PA1 <- lavaan(model1,  data=data,auto.var=TRUE,std.ov = TRUE, estimator="MLR")
> summary(PA1)
lavaan (0.5-13) converged normally after  15 iterations

  Number of observations                            64

  Estimator                                         ML      Robust
  Minimum Function Test Statistic                6.578       5.665
  Degrees of freedom                                 5           5
  P-value (Chi-square)                           0.254       0.340
  Scaling correction factor                                  1.161
    for the Yuan-Bentler correction

Parameter estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  PHA ~
    dMC1R             0.333    0.106    3.134    0.002
    dfosterF         -0.253    0.119   -2.131    0.033
    sweight           0.366    0.102    3.590    0.000
  IHA ~
    dMC1R             0.258    0.123    2.089    0.037
    sage             -0.425    0.226   -1.881    0.060
    sweight           0.368    0.235    1.564    0.118
    sbrood            0.231    0.127    1.817    0.069
    PHA               0.070    0.132    0.531    0.595

Intercepts:
    PHA               0.000
    IHA               0.000

Variances:
    PHA               0.656    0.114
    IHA               0.777    0.109


> group<- svydesign(ids=~nest, prob=~1, data=data)
>
> survey.fit <- lavaan.survey(lavaan.fit=PA1, survey.design=group)
> summary(survey.fit)
lavaan (0.5-13) converged normally after  38 iterations

  Number of observations                            64

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              120.103      93.991
  Degrees of freedom                                 5           5
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.278
    for the Satorra-Bentler correction

Parameter estimates:

  Information                                 Expected
  Standard Errors                           Robust.sem

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  PHA ~
    dMC1R             2.788    0.394    7.071    0.000
    dfosterF          1.895    0.423    4.475    0.000
    sweight           0.020    0.232    0.085    0.932
  IHA ~
    dMC1R             0.934    0.539    1.732    0.083
    sage             -0.650    0.424   -1.531    0.126
    sweight           0.420    0.443    0.947    0.344
    sbrood            0.448    0.249    1.796    0.072
    PHA               0.646    0.097    6.676    0.000

Intercepts:
    PHA               0.000
    IHA               0.000

Variances:
    PHA               2.629    0.330
    IHA               2.675    0.426

>
> AIC(survey.fit)
[1] 1100.235
> BIC(survey.fit)
[1] 1121.824
>

When using the lavann.survey, the variable highlighted in yellow (dfosterF) gives a positive estimate. However, I certainly know that this effect is negative (I run similar GLMMs analysis before and after running SEM and I obtained a negative estimate), and it's indeed negative in the first part of the model. Intriguingly, the +/- sign of this variable estimate varies (apparently at random)  in following attempts.  Any idea??
any suggestion would be very helpful.

Once again, thank you very much 

Laura Gangoso

unread,
Jun 20, 2013, 9:03:27 AM6/20/13
to lav...@googlegroups.com

yrosseel

unread,
Jun 28, 2013, 6:21:02 AM6/28/13
to lav...@googlegroups.com
On 06/20/2013 03:00 PM, Laura Gangoso wrote:
> Dear Yves,
>
> thank you very much for your kind response. I'll follow your
> suggestions, but I still have a doubt.
> You said you would avoid using std.ov=TRUE. I thought it was neccessary
> since observed continuous explanatory variables (BUT NOT DEPENDENT
> VARIABLES) were standarized before entering the analysis
> (data$sage<-scale (data$age)). Just to be sure, should I remove
> std.ov=TRUE from the script?

Yes, I would remove it.

> Apart from this, I would like to know your opinion about a "strange"
> result I obtained with the former model (you can find it below,
> highlighted in yellow). I'm not sure if it's due to either lavaan or
> svydesign problems.

I'm not aware of any lavaan/svydesign problems that might cause this.
But the residual variances changed a lot. What happens if you omit std.ov?

Yves.

Reply all
Reply to author
Forward
0 new messages