CFA using lavaan in R with RStudio vs FACTOR


Jaime Alvelo

Aug 4, 2014, 12:28:10 PM
to lav...@googlegroups.com
Hello:
   I'm not a programmer, a statistician, nor a math wizard, but a social worker accustomed to using SPSS/PC and formerly SPSS-X. I've been informed that the version of SPSS/PC I have does not do Confirmatory Factor Analysis, and R with lavaan was recommended as a way of getting the CFA results I need. What have I done so far?
1) Downloaded R (multiple authorizations required in our system--initially in VINCI but more readily handled at the local PC)
2) Installed RStudio
3) Downloaded lavaan
4) Learned how to load data into R, and did so.
5) Watched tutorials on YouTube for CFA and found them very technical for a beginner.
6) Meanwhile, I was advised to use FACTOR and do a Hull analysis with an oblique Procrustes rotation on my data. I did so with the remote guidance of the package's creator, and found my data has one factor and not three as in the study hypothesis. I was told by a psychometrician that that was all I needed to go ahead and write it up.
   
My study: adaptation of an 18-item scale with three dimensions (6 items each) from English to Spanish. The hypothesis is that we would find the same three dimensions in Spanish as in the English version. Items are Likert-type, from 1 to 5, with 100 subjects. Exploratory Factor Analysis gave us three factors using the eigenvalue > 1.000 criterion, but the scree test showed a steep slope after the 1st factor and a leveling off after that, with a relatively small proportion of the variance explained by the other factors. We got stalled in the study upon realizing our software did not have CFA, and we have been struggling with that thereafter.
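For readers following along: the eigenvalue and scree inspection described above can be reproduced in R with the psych package. A minimal sketch, assuming the 18 Likert items sit in a data frame called `mydata` (a hypothetical name, not one used in this thread):

```r
# Minimal sketch: scree plot plus parallel analysis for the 18 items.
# `mydata` is a hypothetical data-frame name.
library(psych)
fa.parallel(mydata, fa = "fa")   # scree plot with parallel-analysis eigenvalues overlaid
```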
 
 
The question is: as this is all very technical, would it help any if I continue my venture into R and proceed with the lavaan analysis, or would that be a waste of time? (I would still need step-by-step guidance on how to run it.)
 
Jaime Alvelo
San Juan, Puerto Rico
 
 
 

Jaime Alvelo

Aug 4, 2014, 1:43:23 PM
to lav...@googlegroups.com

In my learning process I went to the lavaan training site and tried to follow the example there, which goes like this:
cfa> ## The famous Holzinger and Swineford (1939) example
cfa> HS.model <- ' visual  =~ x1 + x2 + x3
cfa+               textual =~ x4 + x5 + x6
cfa+               speed   =~ x7 + x8 + x9 '
cfa> fit <- cfa(HS.model, data=HolzingerSwineford1939)
cfa> summary(fit, fit.measures=TRUE)
     As in my study each factor has six variables per hypothesized dimension (using V1 to V18 as the variables), I changed it to the following, but got an error message:
 
library(lavaan)
This is lavaan 0.5-16
lavaan is BETA software! Please report any bugs.
> cfa> HS.model <- ' burden  =~ V1 + V2 + V10 + V16 + V17 + V18
+ cfa+               sadness =~ V4 + V8 + V9 + V11 + V12 + V15
+ cfa+               worry   =~ V3 + X5 + V6 + V7 + V13 + V14 '
Error in cfa > HS.model <- " burden  =~ V1 + V2 + V10 + V16 + V17 + V18\ncfa+               sadness =~ V4 + V8 + V9 + V11 + V12 + V15\ncfa+               worry   =~ V3 + X5 + V6 + V7 + V13 + V14 " :
  could not find function "><-"
 
Question: Why is it indicating it could not find function "><-"?

Jeremy Miles

Aug 4, 2014, 2:03:03 PM
to lav...@googlegroups.com
You appear to have copied the prompt. You should type:

HS.model <- ' burden  =~ V1 + V2 + V10 + V16 + V17 + V18
               sadness =~ V4 + V8 + V9 + V11 + V12 + V15
               worry   =~ V3 + X5 + V6 + V7 + V13 + V14 '

Jeremy





--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at http://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Jaime Alvelo

Aug 4, 2014, 3:49:50 PM
to lav...@googlegroups.com
Jeremy, you got that right! Then I also spotted an error in the worry variable name, where the X5 should have been a V5. Then I learned I had to substitute the file name of my data set in the fit command (the data had already been loaded):
 fit <- cfa(HS.model, data=HolzingerSwineford1939)   # <- I had made the error of keeping the example's data set name instead of my own
This was followed by:
summary(fit, fit.measures=TRUE)
 
And then I got this:
lavaan (0.5-16) converged normally after  45 iterations

  Number of observations                           100

  Estimator                                         ML
  Minimum Function Test Statistic              216.065
  Degrees of freedom                               132
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              875.299
  Degrees of freedom                               153
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.884
  Tucker-Lewis Index (TLI)                       0.865

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2597.247
  Loglikelihood unrestricted model (H1)      -2489.215

  Number of free parameters                         39
  Akaike (AIC)                                5272.495
  Bayesian (BIC)                              5374.096
  Sample-size adjusted Bayesian (BIC)         5250.924

Root Mean Square Error of Approximation:

  RMSEA                                          0.080
  90 Percent Confidence Interval          0.060  0.099
  P-value RMSEA <= 0.05                          0.009

Standardized Root Mean Square Residual:

  SRMR                                           0.075

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  burden =~
    V1                1.000
    V2                1.457    0.244    5.974    0.000
    V10               0.835    0.197    4.239    0.000
    V16               1.455    0.251    5.788    0.000
    V17               1.226    0.228    5.376    0.000
    V18               1.153    0.243    4.735    0.000
  sadness =~
    V4                1.000
    V8                1.442    0.197    7.318    0.000
    V9                0.478    0.190    2.509    0.012
    V11               1.252    0.197    6.340    0.000
    V12               1.149    0.184    6.236    0.000
    V15               1.277    0.199    6.402    0.000
  worry =~
    V3                1.000
    V5                1.320    0.298    4.426    0.000
    V6                1.822    0.379    4.801    0.000
    V7                1.191    0.312    3.817    0.000
    V13               1.468    0.332    4.418    0.000
    V14               0.824    0.256    3.214    0.001

Covariances:
  burden ~~
    sadness           0.392    0.094    4.191    0.000
    worry             0.325    0.095    3.417    0.001
  sadness ~~
    worry             0.367    0.101    3.616    0.000

Variances:
    V1                0.714    0.110
    V2                0.579    0.105
    V10               0.902    0.134
    V16               0.725    0.124
    V17               0.781    0.124
    V18               1.180    0.179
    V4                0.525    0.084
    V8                0.542    0.101
    V9                1.477    0.211
    V11               0.849    0.135
    V12               0.766    0.121
    V15               0.849    0.136
    V3                1.285    0.189
    V5                0.730    0.118
    V6                0.469    0.108
    V7                1.444    0.215
    V13               0.915    0.148
    V14               1.340    0.195
    burden            0.417    0.131
    sadness           0.496    0.130
    worry             0.385    0.160

I'm happy it ran...interpretation help needed.
 

Terrence Jorgensen

Aug 5, 2014, 12:52:04 AM
to lav...@googlegroups.com
 

I'm happy it ran...interpretation help needed.


Brown (2006) Confirmatory Factor Analysis for Applied Research is a good introductory text to help you learn how to interpret parameters and measures of fit in EFA and CFA.  Your other earlier question about whether to use FACTOR or continue with lavaan is not a question about how to use lavaan, but about whether to use EFA or CFA.  That question would be more appropriate to post on SEMNET, which is about theoretical and practical issues: http://www2.gsu.edu/~mkteer/semnet.html

Terry

Jaime Alvelo

Aug 5, 2014, 1:24:17 PM
to lav...@googlegroups.com
Terry:
     I did order the book from Amazon and will try to make sense of it when it arrives. I raised the question because a psychometrician suggested I use FACTOR and run the parallel test on the scale, since based on the scree test it seemed like our three-dimensional scale in English was showing only one dimension in Spanish. Then the creator of the FACTOR program wrote to me:

"From: Urbano Lorenzo Seva 
Sent: Wednesday, July 23, 2014 4:02 AM
To: Alvelo, Jaime
Subject: Re: FW: RE: RE: [EXTERNAL] Re: FACTOR Applications

Dear Jaime,

If Dr. Collazo retained one dimension, and considering the outcome in your own data, then I would advise to retain a single factor.

For the scree plots that you sent me, I see that you already defined the analysis correctly. To assess the outcome you need to study the congruence indices that FACTOR will print in the output file.

Best,

Urbano"
                  Our hypothesis was that the Spanish version of the instrument would have the same three dimensions as the English one, tested through Confirmatory Factor Analysis. The FACTOR program analysis, using a Pearson correlation matrix, with 3 factors, ULS extraction, and an oblique Procrustes rotation, showed one dimension with a scree test value of 10.600 (Hull method for selecting the number of common factors, 2011).
                   So I'm not so sure I needed to do the lavaan analysis, and that is why I posted the question. You see, the scale would be used for clinical purposes, and I can't be recommending using its three dimensions for distinct areas of intervention when in our culture it says something different. The scale measures caregiver burden in family members caring for Alzheimer's patients. I can vouch for the overall "burden" dimension, which in our culture includes worry & sadness. Anyway, thanks for the recommendation.
Jaime


Seongho Bae

Aug 7, 2014, 10:29:14 AM
to lav...@googlegroups.com
Wait, why did the psychometrician suggest doing the parallel test? What is your purpose in using FACTOR?

Seongho Bae

On Wednesday, August 6, 2014 at 2:24:17 AM UTC+9, Jaime Alvelo wrote:

Jaime Alvelo

Aug 7, 2014, 12:14:49 PM
to lav...@googlegroups.com
Seongo:
       The author of the original scale in English had already done an EFA on a large data sample and found three factors. He then reduced an initial 50-item scale with three of the factors to an 18-item scale (taking an equal number of items for each of the factors). Our team, using a bicultural team approach and cross-cultural methods, adapted the scale to Spanish (forward & backward translation using professional translators & testing for comprehension). What we wanted to do was verify that the Spanish version had an equivalent number of factors as the original English version (the original has a copyright, and we had permission from the author).
Our idea was to use CFA to test whether we had the same factors and whether the items fell equally into each factor. We had a dated software version of SPSS that did not do CFA, with no funds to buy the expensive software, so we began to explore available shareware. We came to find R with the lavaan option, and our psychometrician recommended the FACTOR program. In FACTOR we did the following:

                                     F A C T O R  

                           Unrestricted Factor Analysis
                                Release Version 9.2
                                    February, 2013
                            Rovira i Virgili University
                                Tarragona, SPAIN
                                   Programming:
                               Urbano Lorenzo-Seva
                            Mathematical Specification:
                               Urbano Lorenzo-Seva
                                 Pere J. Ferrando
                            Date: Wednesday, July 23, 2014
                            Time: 9:26:6
--------------------------------------------------------------------------------
DETAILS OF ANALYSIS
Participants' scores data file                       : ....Factor\ASCII.dat
Number of participants                               : 100
Number of variables                                  : 18
Variables included in the analysis                   : ALL
Variables excluded in the analysis                   : NONE
Number of factors                                    : 3
Number of second order factors                       : 0
Procedure for determining the number of dimensions   : Hull method for selecting the number of common factors (Lorenzo-Seva, Timmerman, & Kiers, 2011)
Dispersion matrix                                    : Pearson Correlations
Method for factor extraction                         : Unweighted Least Squares (ULS)
Rotation to user defined target                      : Semi-specified oblique Procrustes rotation (Browne, 1972b)
Number of random starts                              : 10
Maximum mumber of iterations                         : 100
Convergence value                                    : 0.00001000
 
          The analysis was based on some preliminary results I had sent to Dr. Lorenzo-Seva, and he had sent me the format for the Procrustes rotation, which looked like this:
Variable     F   1    F   2    F   3
V   1         ---     0.000    0.000 
V   2         ---     0.000    0.000 
V   3        0.000    0.000     ---  
V   4        0.000     ---     0.000 
V   5        0.000    0.000     ---  
V   6        0.000    0.000     ---  
V   7        0.000    0.000     ---  
V   8        0.000     ---     0.000 
V   9        0.000     ---     0.000 
V  10         ---     0.000    0.000 
V  11        0.000     ---     0.000 
V  12        0.000     ---     0.000 
V  13        0.000    0.000     ---  
V  14        0.000    0.000     ---  
V  15        0.000     ---     0.000 
V  16         ---     0.000    0.000 
V  17         ---     0.000    0.000 
V  18         ---     0.000    0.000 
 
                              As all this is very technical, and having fed our consultants this information, I'm assuming we have tested our hypothesis and the results rejected it... the Spanish version came up with only one factor, and not three as in the English version.
                        Seongho, I hope you have the background to understand all this technical jargon... I'm relying on the psychometricians and experts on the matter for the conclusions of this analysis. Any comment is more than welcome.
Jaime
PS: I did not answer your question about the parallel analysis, hoping that the above would clarify the whole point of the issue being presented... "did we or did we not find the same three factors?"

Seongho Bae

Aug 7, 2014, 3:29:00 PM
to lav...@googlegroups.com
Dear Jaime,

Okay, I got it. You want to make a Spanish-language scale, right?

First, FACTOR does Exploratory Factor Analysis, not CFA. Procrustes rotation is not a CFA method; it is just a kind of EFA method.

Second, you can use another estimator in lavaan when you try CFA. It may help your model fit your data.

Like this:
HS.model <- '
burden  =~ V1 + V2 + V10 + V16 + V17 + V18
sadness =~ V4 + V8 + V9 + V11 + V12 + V15
worry   =~ V3 + V5 + V6 + V7 + V13 + V14
'

fit <- cfa(HS.model, data=mydata, estimator='WLSMV', ordered=names(mydata))   # substitute your own data set name for 'mydata'
summary(fit, fit.measures=TRUE, standardized=TRUE)

Third, please do not use the eigenvalue-greater-than-one rule (also known as the Kaiser rule) when you estimate EFA models. That is not an accurate criterion. You can calculate CFI, TLI, and RMSEA when you estimate the EFA model via fa() in psych, efaUnrotate() in semTools, or mirt() with the M2 statistic. CFI and TLI > .90 and an RMSEA upper bound < .08 may be useful for finding the factor structure of the Spanish scale.
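A minimal sketch of comparing EFA solutions by fit index, assuming the items are in a data frame `mydata` (a hypothetical name); fa() in psych with maximum-likelihood extraction reports TLI and RMSEA for each solution:

```r
# Minimal sketch: fit a 1-, 2- and 3-factor ML EFA and compare fit.
# `mydata` is a hypothetical data-frame name.
library(psych)
for (k in 1:3) {
  efa <- fa(mydata, nfactors = k, fm = "ml", rotate = "oblimin")
  cat("factors:", k,
      "TLI:",   round(efa$TLI, 3),
      "RMSEA:", round(efa$RMSEA[1], 3), "\n")   # RMSEA[1] is the point estimate
}
```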

If you feel that is too hard to execute, I can help you for free.
You can contact me privately via my e-mail, with a development sample for EFA and a validation sample for CFA.

--
Seongho Bae

Master of Psychology in Industrial & Organizational Psychology.

On Friday, August 8, 2014 at 1:14:49 AM UTC+9, Jaime Alvelo wrote:

Jaime Alvelo

Aug 7, 2014, 4:02:18 PM
to lav...@googlegroups.com
Seongo:
    I did send you the output/results based on your recommendations. Please check your e-mail. Do we have confirmation of one, two, three, or more factors?
jaime


Seongho Bae

Aug 7, 2014, 4:16:53 PM
to lav...@googlegroups.com
Here is the scale validation process:

1. Split the file in half randomly. One half is for doing EFA (call it the 'development sample'); the other is for doing CFA (the 'validation sample').
2. In EFA, increase the number of factors while checking model fit (CFI, TLI, RMSEA), communalities, etc., on the development sample.
3. In CFA, check that the EFA structure is right using the validation sample.
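Step 1 can be sketched in a couple of lines of R, assuming the full data set is a data frame `mydata` (a hypothetical name):

```r
# Minimal sketch of the random split-half step.
# `mydata` is a hypothetical data-frame name.
set.seed(1)                                   # make the random split reproducible
idx        <- sample(nrow(mydata), floor(nrow(mydata) / 2))
develop    <- mydata[ idx, ]                  # development sample -> EFA
validation <- mydata[-idx, ]                  # validation sample  -> CFA
```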

I didn't receive any e-mail. My name is Seongho, not Seongo. Please check the e-mail address.

--
Seongho Bae

On Friday, August 8, 2014 at 5:02:18 AM UTC+9, Jaime Alvelo wrote:

Jaime Alvelo

Aug 8, 2014, 10:57:02 AM
to lav...@googlegroups.com
Ok, Seongho... got it now... sorry about that. People have problems with my name too (Jamie, Hymee...).
I sent it to seo...@kw.ac.kr yesterday!
And again this morning
Jaime

Seongho Bae

Aug 9, 2014, 4:48:25 AM
to lav...@googlegroups.com
Sorry, no, I didn't get any email from you.

Can you try sending it again to another of my email addresses?

seongh...@gmail.com

Thanks.

Seongho Bae

On Friday, August 8, 2014 at 11:57:02 PM UTC+9, Jaime Alvelo wrote:

Jaime Alvelo

Aug 11, 2014, 12:47:39 PM
to lav...@googlegroups.com
I've been informed that in order to conduct EFA and CFA I need a much larger sample, using the formula 20 × the number of variables, which would translate to 360 for EFA and 360 more for CFA. If that is true (many theses are done with 100 cases), I've wasted my time with R & FACTOR, and maybe I should just dump the study, because it took 5 years just to get 100 cases (each subject took 45 minutes, with 5 to 10 volunteers on a monthly basis, and the questionnaire included theoretical variables). So the additional 620 would take me 30 more years, at which time I would have passed away. Maybe I can just do a descriptive study and leave the rest to someone else.

Seongho Bae

Aug 11, 2014, 12:57:25 PM
to lav...@googlegroups.com
Oh, sorry.

If you cannot collect more cases, that may be okay. The CFA result is okay; the high correlations and the sample size problem are just a limitation.

Seongho Bae.

On Tuesday, August 12, 2014 at 1:47:39 AM UTC+9, Jaime Alvelo wrote:

Jaime Alvelo

Aug 11, 2014, 2:21:11 PM
to lav...@googlegroups.com
Seongho:
 
Ok... the truth hurts but must be faced... the study has limitations. Nonetheless, the data may not support a three-factor solution, and I was considering whether culture indicated one factor instead of three, so I did a one-factor run. I sent you the output to evaluate. Can you give feedback, please?
Jaime
...

Seongho Bae

Aug 11, 2014, 2:36:53 PM
to lav...@googlegroups.com
If you can find one factor in the FACTOR application using simplimax, promin, etc., you can report that.

You can also redo the CFA with the WLSMV estimator; it should be fine for a journal article or a thesis to report CFI, TLI, RMSEA, and WRMR.

Seongho Bae

On Tuesday, August 12, 2014 at 3:21:11 AM UTC+9, Jaime Alvelo wrote:

Mauricio Garnier-Villarreal

Aug 11, 2014, 11:22:06 PM
to lav...@googlegroups.com
Hi Jaime

I don't agree with the recommendation you got about the necessary sample size. According to these rules it would be impossible to have a large enough sample to estimate complex models; I have worked with models with more than 200 parameters.

The way to determine the necessary sample size would be with a proper power analysis. A power analysis for CFA is basically a small simulation with varying sample sizes. In R you can do this with the simsem package ( https://github.com/simsem/simsem/blob/master/SupportingDocs/Examples/Version05/ex18/ex18.R ); at that link you can see an example of a power analysis.
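The simulation idea can also be sketched with plain lavaan (simulateData() plus cfa()) instead of simsem. In this sketch the population loadings and factor covariance are made-up illustration values, not estimates from this thread:

```r
# Rough sketch of a simulation-based power analysis for CFA.
# Population values below are hypothetical; simsem automates this loop.
library(lavaan)

pop.model <- ' f1 =~ 0.7*y1 + 0.7*y2 + 0.7*y3
               f2 =~ 0.7*y4 + 0.7*y5 + 0.7*y6
               f1 ~~ 0.3*f2 '
fit.model <- ' f1 =~ y1 + y2 + y3
               f2 =~ y4 + y5 + y6 '

power.at <- function(n, nrep = 200) {
  sig <- replicate(nrep, {
    d   <- simulateData(pop.model, sample.nobs = n)   # generate data at size n
    fit <- cfa(fit.model, data = d)
    pe  <- parameterEstimates(fit)
    pe[pe$lhs == "f1" & pe$op == "~~" & pe$rhs == "f2", "pvalue"] < .05
  })
  mean(sig)   # proportion of replications where the factor covariance is significant
}

sapply(c(100, 200, 400), power.at)   # estimated power at each sample size
```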

Remember that in CFA what is analyzed is the covariance matrix, not the individual data; so if your sample is a good representation of the population (in terms of the covariance matrix), you are fine with a relatively small sample.

I think that N = 100 should be enough to estimate a three-factor CFA model. If the measures are of good quality, 100 is usually enough.

Hope this helps, and gives more hope.

Bye

--- Mauricio Garnier-Villarreal

Jaime Alvelo

Aug 12, 2014, 2:50:17 PM
to lav...@googlegroups.com
Thanks for your input, Mauricio. Sample size estimation would have had to be done prior to the study, so Seongho has pointed out a limitation that must be acknowledged if we are to publish in refereed journals, where experts review your submitted article. We based our items on an already established English-language scale developed with rigorous methodology and a good-sized sample. We used an interdisciplinary approach with cross-cultural methodology for the adaptation of the instrument to our language. But as the instrument has clinical applications in its sub-scale scores, we could not accept just the rigorous forward-and-backward translation method. We proceeded to a field test replicating the original study, and found equivalent, already-validated scales to test the construct validity of the translated scale. We found that culture impacted the dimensions of the construct, and thus we will not be recommending the use of the sub-scales. Nonetheless the full 18-item scale is still usable for clinical purposes, and we will publish on its use in the near future. I have two PhD psychologists with vast experience on my team who will help me with the technical pieces of the psychometrics. We want to express appreciation for the observations & recommendations of:
Jeremy Miles
Terrence Jorgensen
Seongho Bae
Mauricio Garnier
 
It is this support that motivates us to continue the research even if unfunded.
Jaime Alvelo, DSW
San Juan, Puerto Rico



Mauricio Garnier-Villarreal

Aug 13, 2014, 1:02:53 AM
to lav...@googlegroups.com
Hi Jaime

Good that you found a satisfactory solution.

As a kind of advertisement, I'll leave you the link to the stats camps for next year ( http://www.depts.ttu.edu/immap/statscamp/ ); they are one-week courses, several of them SEM-related. I will be giving a course on the foundations of SEM in Spanish, in case you or somebody you know may be interested; I focus on its application in lavaan.

good luck

bye

et

Feb 7, 2017, 8:45:45 AM
to lavaan
@ Seongho Bae:
Do you have a citation in mind for the point that you shouldn't rely on eigenvalues when doing an EFA with estimation techniques that give you fit indices?
Thanks a lot!