CFA - clustered standard errors


Aaron Charlton

Nov 3, 2015, 6:56:06 PM
to lavaan
I am doing a scale development project with lavaan. I ran a CFA, but the problem is that I had 43 individuals answer the same questions four times, once for each set of stimuli. Is there a way I can account for this in lavaan? I was told that I need to "cluster the standard errors", clustering together each individual's responses. Does anyone know how to do that? Also, should I use the grouping aspect of the cfa function? Thanks. -Aaron

## confirmatory factor analysis
library(lavaan)
model1 <- ' history =~ HIS1 + HIS2 + HIS3 + HIS4
              motivation =~ MOT1 + MOT2 + MOT3 + MOT4
              emotion =~ EMO1 + EMO2 + EMO3 + EMO4
              fit =~ FIT1 + FIT2 + FIT3 + FIT4 + FIT5'  
fit2 <- cfa(model1, data=study1)

Stas Kolenikov

unread,
Nov 3, 2015, 10:25:32 PM11/3/15
to lav...@googlegroups.com
Probably something like

library(survey) # thanks to Thomas Lumley
library(lavaan.survey) # thanks to Daniel Oberski
person.as.cluster <- svydesign(ids=~person, probs=~1, data=study1)
fit2.clustered <- lavaan.survey(fit2, person.as.cluster, estimator="MLM")

assuming that your data are in semi-long format with each row representing one (out of the four) observations.

Alternatively, you could consider recasting your model as MTMM with the multiple methods being the multiple occasions if you think that the individuals learn in about the same way.



-- Stas Kolenikov, PhD, PStat (ASA, SSC)  
-- Principal Survey Scientist, Abt SRBI
-- Education Officer, Survey Research Methods Section of the American Statistical Association
-- Opinions stated in this email are mine only, and do not reflect the position of my employer
-- http://stas.kolenikov.name
 


Aaron Charlton

Nov 4, 2015, 4:47:18 PM
to lavaan
Stas, 
Thank you so much for the response. I have actually been trying to use this package, but every time I run the lavaan.survey function, I get this error message:

Warning message:
'lavaan::duplicationMatrix' is deprecated.
Use 'lav_matrix_duplication' instead.
See help("Deprecated") and help("lavaan-deprecated"). 

Following the error message, it fails to do any clustering. Any ideas why this might be happening? Thanks. 

Stas Kolenikov

Nov 4, 2015, 5:17:20 PM
to lav...@googlegroups.com
That would be a question to Daniel Oberski, and something for him to fix. (Duplication matrices are used to convert different vectorizations of a symmetric matrix, and that's something you have to process when dealing with the moment matrices.)
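To make the parenthetical concrete, here is a minimal base-R sketch of what a duplication matrix does: it maps the half-vectorization (the lower triangle, stacked by column) of a symmetric matrix onto the full vectorization. The helper names `vech` and `duplication_matrix` are illustrative and written from scratch here; this is not lavaan's own implementation of `lav_matrix_duplication`.

```r
# Half-vectorization: stack the lower triangle (including the diagonal) by column.
vech <- function(S) S[lower.tri(S, diag = TRUE)]

# Build the n^2 x n(n+1)/2 duplication matrix D so that D %*% vech(S) equals vec(S).
duplication_matrix <- function(n) {
  idx <- matrix(0L, n, n)
  # Number the lower-triangle cells in the same order vech() reads them.
  idx[lower.tri(idx, diag = TRUE)] <- seq_len(n * (n + 1) / 2)
  # Mirror those numbers into the upper triangle (symmetry).
  idx[upper.tri(idx)] <- t(idx)[upper.tri(idx)]
  D <- matrix(0, n * n, n * (n + 1) / 2)
  # Row k of D picks the vech element that sits in position k of vec(S).
  D[cbind(seq_len(n * n), as.vector(idx))] <- 1
  D
}

# A small symmetric "moment matrix" used only for illustration.
S <- matrix(c(4, 1, 2,
              1, 5, 3,
              2, 3, 6), nrow = 3, byrow = TRUE)
D <- duplication_matrix(3)

# The full vectorization is recovered from the six unique elements.
same <- isTRUE(all.equal(as.vector(D %*% vech(S)), as.vector(S)))
print(same)
```

For a 3 x 3 matrix, D has 9 rows and 6 columns, one column per unique element of the symmetric matrix.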




yrosseel

Nov 5, 2015, 3:04:20 AM
to lav...@googlegroups.com
On 11/04/2015 10:47 PM, Aaron Charlton wrote:
> Warning message:
> 'lavaan::duplicationMatrix' is deprecated.
> Use 'lav_matrix_duplication' instead.
> See help("Deprecated") and help("lavaan-deprecated").

This is a (completely harmless) warning message, not an error!

> Following the error message, it fails to do any clustering.

What happens? Do you get another error message? Does the model fail to
converge? Please show us the complete R script and the output, and we
may be able to help you better.

Yves.
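
The difference between a warning and an error can be seen in a small base-R sketch. The functions `old_double` and `new_double` are hypothetical stand-ins for a renamed function pair and have nothing to do with lavaan internals; the point is only that a deprecation warning lets the call finish and return its value, whereas an error aborts it.

```r
# Hypothetical replacement function and its deprecated predecessor.
new_double <- function(x) x * 2
old_double <- function(x) {
  .Deprecated("new_double")  # signals a warning; execution continues afterwards
  new_double(x)
}

# Collect warning messages while letting the call run to completion.
seen <- character(0)
result <- withCallingHandlers(
  old_double(21),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# An error, by contrast, stops evaluation: the string after stop() is never returned.
failed <- tryCatch(
  {
    stop("something actually went wrong")
    "never reached"
  },
  error = function(e) NULL
)

print(result)           # the deprecated call still produced its value
print(length(seen))     # how many warnings were raised along the way
print(is.null(failed))  # the erroring call produced nothing
```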


Aaron Charlton

Nov 5, 2015, 2:06:03 PM
to lavaan
I seem to get exactly the same model with and without clustering. 

Here is my code:

> str(study1)
'data.frame': 176 obs. of  21 variables:
 $ ID      : chr  "951138372" "951138372" "951138372" "951138372" ...
 $ stimulus: Factor w/ 4 levels "adidas_wc","mcD_olymp",..: 1 2 4 3 1 2 4 3 1 2 ...
 $ HIS1    : int  7 5 5 6 7 3 2 2 7 6 ...
 $ HIS2    : int  6 3 3 5 7 3 2 2 7 6 ...
 $ HIS3    : int  6 5 5 5 7 3 2 2 7 7 ...
 $ HIS4    : int  6 3 2 5 7 3 2 2 7 4 ...
 $ MOT1    : int  6 5 6 6 7 3 3 6 7 4 ...
 $ MOT2    : int  7 7 6 7 7 3 3 5 7 7 ...
 $ MOT3    : int  5 5 5 5 1 5 3 3 4 4 ...
 $ MOT4    : int  6 6 6 6 7 3 5 5 7 5 ...
 $ EMO1    : int  6 5 5 5 7 2 3 3 7 2 ...
 $ EMO2    : int  6 5 5 5 7 2 4 3 5 4 ...
 $ EMO3    : int  5 6 6 5 7 4 3 3 5 6 ...
 $ EMO4    : int  6 5 5 5 7 3 3 4 7 6 ...
 $ FIT1    : int  7 2 2 6 7 2 3 3 7 1 ...
 $ FIT2    : int  7 2 2 6 7 2 3 3 7 1 ...
 $ FIT3    : int  7 2 2 7 7 2 3 4 7 3 ...
 $ FIT4    : int  6 3 3 5 7 2 3 3 7 1 ...
 $ FIT5    : int  7 2 2 6 7 2 3 4 7 1 ...
 $ OUT1    : int  6 3 3 6 7 2 3 5 7 4 ...
 $ OUT2    : int  7 7 2 5 5 3 3 4 7 7 ...
> ## confirmatory factor analysis
> library(lavaan)
> model1 <- ' history =~ HIS1 + HIS2 + HIS3 + HIS4
+               motivation =~ MOT1 + MOT2 + MOT3 + MOT4
+               emotion =~ EMO1 + EMO2 + EMO3 + EMO4'
> #  fit =~ FIT1 + FIT2 + FIT3 + FIT4 + FIT5'  
> fit2 <- cfa(model1, data=study1)
> library(survey) 
> library(lavaan.survey) 
> person.as.cluster <- svydesign(ids=~ID, probs=~1, data=study1)
> fit2.clustered <- lavaan.survey(fit2, person.as.cluster, estimator="MLM")
Warning message:
'lavaan::duplicationMatrix' is deprecated.
Use 'lav_matrix_duplication' instead.
See help("Deprecated") and help("lavaan-deprecated"). 
> summary(fit2)
lavaan (0.5-19) converged normally after  41 iterations

  Number of observations                           176

  Estimator                                         ML
  Minimum Function Test Statistic              212.844
  Degrees of freedom                                51
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  Z-value  P(>|z|)
  history =~                                          
    HIS1              1.000                           
    HIS2              0.952    0.034   28.164    0.000
    HIS3              0.962    0.034   28.420    0.000
    HIS4              0.902    0.039   22.969    0.000
  motivation =~                                       
    MOT1              1.000                           
    MOT2              0.603    0.059   10.215    0.000
    MOT3             -0.346    0.093   -3.722    0.000
    MOT4              0.697    0.054   12.815    0.000
  emotion =~                                          
    EMO1              1.000                           
    EMO2              1.054    0.066   15.970    0.000
    EMO3             -0.410    0.078   -5.230    0.000
    EMO4              0.771    0.071   10.887    0.000

Covariances:
                   Estimate  Std.Err  Z-value  P(>|z|)
  history ~~                                          
    motivation        1.382    0.205    6.755    0.000
    emotion           1.313    0.200    6.553    0.000
  motivation ~~                                       
    emotion           1.575    0.209    7.525    0.000

Variances:
                   Estimate  Std.Err  Z-value  P(>|z|)
    HIS1              0.121    0.029    4.122    0.000
    HIS2              0.338    0.045    7.548    0.000
    HIS3              0.337    0.045    7.492    0.000
    HIS4              0.520    0.062    8.369    0.000
    MOT1              0.674    0.101    6.701    0.000
    MOT2              0.650    0.077    8.432    0.000
    MOT3              2.218    0.239    9.292    0.000
    MOT4              0.404    0.056    7.255    0.000
    EMO1              0.625    0.089    6.995    0.000
    EMO2              0.377    0.074    5.078    0.000
    EMO3              1.614    0.175    9.238    0.000
    EMO4              1.002    0.117    8.547    0.000
    history           2.409    0.271    8.900    0.000
    motivation        1.681    0.251    6.708    0.000
    emotion           1.737    0.250    6.948    0.000

> summary(fit2.clustered)
lavaan (0.5-19) converged normally after  41 iterations

  Number of observations                           176

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              212.844     138.145
  Degrees of freedom                                51          51
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.541
    for the Satorra-Bentler correction

Parameter Estimates:

  Information                                 Expected
  Standard Errors                           Robust.sem

Latent Variables:
                   Estimate  Std.Err  Z-value  P(>|z|)
  history =~                                          
    HIS1              1.000                           
    HIS2              0.952    0.024   39.953    0.000
    HIS3              0.962    0.027   35.710    0.000
    HIS4              0.902    0.033   27.354    0.000
  motivation =~                                       
    MOT1              1.000                           
    MOT2              0.603    0.078    7.781    0.000
    MOT3             -0.346    0.169   -2.044    0.041
    MOT4              0.697    0.053   13.116    0.000
  emotion =~                                          
    EMO1              1.000                           
    EMO2              1.054    0.054   19.382    0.000
    EMO3             -0.410    0.139   -2.956    0.003
    EMO4              0.771    0.078    9.941    0.000

Covariances:
                   Estimate  Std.Err  Z-value  P(>|z|)
  history ~~                                          
    motivation        1.382    0.246    5.614    0.000
    emotion           1.313    0.199    6.586    0.000
  motivation ~~                                       
    emotion           1.575    0.213    7.395    0.000

Intercepts:
                   Estimate  Std.Err  Z-value  P(>|z|)
    HIS1              4.438    0.132   33.497    0.000
    HIS2              4.278    0.142   30.192    0.000
    HIS3              4.517    0.128   35.302    0.000
    HIS4              4.205    0.128   32.868    0.000
    MOT1              4.642    0.126   36.806    0.000
    MOT2              5.608    0.112   50.256    0.000
    MOT3              4.085    0.185   22.085    0.000
    MOT4              5.188    0.103   50.450    0.000
    EMO1              4.614    0.104   44.270    0.000
    EMO2              4.710    0.134   35.235    0.000
    EMO3              4.438    0.116   38.386    0.000
    EMO4              4.506    0.113   39.945    0.000
    history           0.000                           
    motivation        0.000                           
    emotion           0.000                           

Variances:
                   Estimate  Std.Err  Z-value  P(>|z|)
    HIS1              0.121    0.039    3.099    0.002
    HIS2              0.338    0.123    2.735    0.006
    HIS3              0.337    0.060    5.636    0.000
    HIS4              0.520    0.099    5.238    0.000
    MOT1              0.674    0.128    5.271    0.000
    MOT2              0.650    0.093    6.999    0.000
    MOT3              2.218    0.359    6.184    0.000
    MOT4              0.404    0.067    6.074    0.000
    EMO1              0.625    0.150    4.155    0.000
    EMO2              0.377    0.094    4.007    0.000
    EMO3              1.614    0.235    6.855    0.000
    EMO4              1.002    0.223    4.499    0.000
    history           2.409    0.246    9.785    0.000
    motivation        1.681    0.273    6.159    0.000
    emotion           1.737    0.266    6.527    0.000

> library(semTools)
> reliability(fit2)
         history motivation   emotion     total
alpha  0.9634943  0.3767812 0.4649107 0.8679822
omega  0.9638765  0.6193784 0.7369433 0.9261453
omega2 0.9638765  0.6193784 0.7369433 0.9261453
omega3 0.9637533  0.6348940 0.7667986 0.8763399
avevar 0.8697854  0.4561635 0.5797538 0.6580224
> reliability(fit2.clustered)
         history motivation   emotion     total
alpha  0.9634943  0.3767812 0.4649107 0.8679822
omega  0.9638765  0.6193784 0.7369433 0.9261453
omega2 0.9638765  0.6193784 0.7369433 0.9261453
omega3 0.9637533  0.6348940 0.7667986 0.8763399
avevar 0.8697854  0.4561635 0.5797538 0.6580224

Stas Kolenikov

Nov 5, 2015, 3:44:35 PM
to lav...@googlegroups.com
So are HIS1, HIS2, HIS3, HIS4 your four occasions? If they are, and each individual is just one line of the data, then you don't have anything to cluster for. If you had scales administered four times, and you wanted to build a model where the latent construct is your latent variable, and items are the observed variables, then the idea of clustering could have been entertained to model the dependencies between the four occasions over time. As it stands, however, you have a CFA with no structure imposed; a very reasonable question to ask is whether you have measurement invariance between the occasions -- but it looks like motivation and emotion will likely fail that due to something odd happening on the third occasion.




Aaron Charlton

Nov 5, 2015, 4:12:44 PM
to lavaan
Hi, Stas.
Not exactly. HIS1-HIS4 are four different scale items that hopefully measure the latent variable 'history'. My occasions are contained in the 'stimulus' variable, a factor with four levels. I have the data in long form, so there are 4 observations for each person (ID) -- one for each of the four stimuli. Thanks. -Aaron 
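
Before clustering on ID, it may be worth confirming that the long-format layout really is as described: one row per person per stimulus. The sketch below uses a made-up toy data frame (the IDs and the stimulus labels `s1` to `s4` are placeholders, not the real study1 values); the same three checks could be run on study1 by swapping in that data frame. Note that the 176 rows in the str() output would correspond to 44 complete sets of four, so a check like this could also settle the exact number of respondents.

```r
# Toy long-format data: 3 hypothetical people, 4 hypothetical stimuli each.
toy <- data.frame(
  ID = rep(c("p01", "p02", "p03"), each = 4),
  stimulus = factor(rep(c("s1", "s2", "s3", "s4"), times = 3))
)

# Number of rows contributed by each person (each person is one cluster).
rows_per_person <- table(toy$ID)
n_clusters <- length(rows_per_person)

# Does every person have exactly four rows?
balanced <- all(rows_per_person == 4)

# Does every person see every stimulus exactly once?
crossed <- all(table(toy$ID, toy$stimulus) == 1)

print(n_clusters)
print(balanced)
print(crossed)
```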

Stas Kolenikov

Nov 5, 2015, 4:16:40 PM
to lav...@googlegroups.com
Oh I see. Well, your models aren't exactly the same with and without clustering. With clustering, the standard errors are different (as they should be; the point estimates should be the same). Also, with clustering, you should disregard the standard ML fit test statistic (the 212 number) and look only at the "robust" statistic (the 138 number).
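
A quick way to see this is to put a few numbers from the two summary() printouts side by side. The sketch below copies four loadings and their standard errors by hand from the output posted earlier in the thread (nothing is re-estimated), and it assumes the robust statistic is the ML statistic divided by the reported scaling correction factor, which is how a Satorra-Bentler scaled statistic is usually described.

```r
# Values copied by hand from summary(fit2) and summary(fit2.clustered).
se <- data.frame(
  param     = c("HIS2", "MOT2", "MOT3", "EMO3"),
  est       = c(0.952, 0.603, -0.346, -0.410),  # identical in both fits
  naive     = c(0.034, 0.059, 0.093, 0.078),    # Std.Err without clustering
  clustered = c(0.024, 0.078, 0.169, 0.139)     # Std.Err with clustering on ID
)

# How much each standard error changes once clustering is accounted for.
se$ratio <- round(se$clustered / se$naive, 2)

# Z-values implied by the clustered standard errors
# (these can differ slightly from the printout because the inputs are rounded).
se$z_clustered <- round(se$est / se$clustered, 2)
print(se)

# Scaled test statistic: ML chi-square divided by the scaling correction factor.
chisq_ml <- 212.844
scaling <- 1.541
chisq_scaled <- chisq_ml / scaling
print(round(chisq_scaled, 1))
```

The standard errors move in both directions: some shrink and some grow, with MOT3 and EMO3 nearly doubling, while the estimates themselves are unchanged.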



 

Aaron Charlton

Nov 5, 2015, 4:26:52 PM
to lavaan
Stas,
Thanks! Is there a resource you would recommend on interpreting the clustered output? -Aaron

Stas Kolenikov

Nov 5, 2015, 5:57:32 PM
to lav...@googlegroups.com


 
