[R] SEM model testing with identical goodness of fits

hyena

Mar 14, 2009, 5:07:20 PM

to r-h...@stat.math.ethz.ch

Hi,

I am testing several models of three latent constructs that
measure risk attitudes.
Two models with different structures obtained identical fit measures,
from chi-square to BIC.
Model one assumes the three factors are correlated with each other; model
two assumes a higher-order factor exists and the three factors are related to
this higher-order factor instead of to each other.

Model1:
model.one <- specify.model()
tr<->tp,e.trtp,NA
tp<->weber,e.tpweber,NA
weber<->tr,e.webertr,NA
weber<->weber, e.weber,NA
tp<->tp,e.tp,NA
tr <->tr,e.trv,NA
....

Model two
model.two <- specify.model()
rsk->tp,e.rsktp,NA
rsk->tr,e.rsktr,NA
rsk->weber,e.rskweber,NA
rsk<->rsk, NA,1
weber<->weber, e.weber,NA
tp<->tp,e.tp,NA
tr <->tr,e.trv,NA
....

The summaries of both sem models give identical fit indices, using the same
data set.

Is there something wrong with this model specification?

Thanks

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

John Fox

Mar 14, 2009, 6:30:08 PM

to hyena, r-h...@stat.math.ethz.ch

Dear hyena,

From your verbal description, I would have thought that the second model is
more restrictive than the first, but that doesn't seem to be the case -- if
the two models have identical log-likelihoods and degrees of freedom, as you
seem to imply, then it's a good bet that the models are observationally
indistinguishable. On the other hand, you don't provide a whole lot of
information; it would have been much more informative had you shown the
input and output for both models.

John

John Fox

Mar 14, 2009, 6:34:24 PM

to hyena, r-h...@stat.math.ethz.ch

Dear hyena,

Actually, looking at this a bit more closely, the first model dedicates 6
parameters to the correlational and variational structure of the three
variables that you mention -- 3 variances and 3 covariances; the second
model also dedicates 6 parameters -- 3 factor loadings and 3 error variances
(with the variance of the factor fixed as a normalization). You don't show
the remaining structure of the models, but a good guess is that they are
observationally indistinguishable.
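
The counting argument can be checked against the full specifications posted later in this thread. The sketch below only tallies free parameters from those listings; the grouping into named categories is an illustrative choice, not something taken from the sem package.

```r
# Observed indicators in the full listings: 8 prob_*, 10 o*, 5 v* items.
p <- 8 + 10 + 5
moments <- p * (p + 1) / 2          # distinct variances and covariances

# Correlated model: weber variance fixed at 1, all 8 weber loadings free.
free_m1 <- c(latent_cov = 3, latent_var = 2,
             loadings = 8 + 9 + 4, error_var = 8 + 10 + 5)

# Higher-order model: rsk variance fixed at 1, first weber loading fixed at 1.
free_m2 <- c(higher_loadings = 3, latent_resid_var = 3,
             loadings = 7 + 9 + 4, error_var = 8 + 10 + 5)

df_m1   <- moments - sum(free_m1)
df_m2   <- moments - sum(free_m2)
df_null <- p * (p - 1) / 2          # null model estimates variances only

c(df_m1 = df_m1, df_m2 = df_m2, df_null = df_null)
```

Both models use the same number of free parameters, so their degrees of freedom coincide.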

John


hyena

Mar 15, 2009, 4:25:25 AM

to r-h...@stat.math.ethz.ch

Dear John,

Thanks for the prompt reply! Sorry for not supplying more detailed
information.

The target model consists of three latent factors: a general risk
scale from Weber's domain risk scales, a time perspective scale from
Zimbardo (future time orientation only), and a travel risk attitude scale.
Variables with the "prob_" prefix are items of the general risk scale, variables
"o1" to "o12" are items of future time perspective, and "v5" to "v17"
are items of the travel risk scale.

The purpose is to explore or find a best-fitting model that "correctly"
represents the underlying relationship of the three scales. So far, the
correlated model has the best fit indices, so I'd like to check whether
there is a higher-level factor that governs all three factors, thus the
second model.

The data are all 5-point Likert scale scores from respondents (N=397).
The data example listed below does not show the "prob_" variables (their names are
too long).

Given the following model structure, if the models are indeed
observationally indistinguishable, are there possible adjustments to
test the higher-level factor effects?

Thanks,

###########################
#data example, partial
#########################
1 1 1 1
id o1 o2 o3 o4 o5 o6 o7 o8 o9 o10 o11 o12 o13 v5 v13 v14 v16 v17
14602 2 2 4 4 5 5 2 3 2 4 3 4 2 5 2 2 4 2
14601 2 4 5 4 5 5 2 5 3 4 5 4 5 5 3 4 4 2
14606 1 3 5 5 5 5 3 3 5 3 5 5 5 5 5 5 5 3
14610 2 1 4 5 4 5 3 4 4 2 4 2 1 5 3 5 5 5
14609 4 3 2 2 5 5 2 5 2 4 4 2 2 4 2 4 4 4

####################################
#correlated model, three scales correlated with each other
model.correlated <- specify.model()
weber<->tp,e.webertp,NA
tp<->tr,e.tptr,NA
tr<->weber,e.trweber,NA
weber<->weber,NA,1

tp<->tp,e.tp,NA
tr <->tr,e.trv,NA

weber -> prob_wild_camp,alpha2,NA
weber -> prob_book_hotel_in_short_time,alpha3,NA
weber -> prob_safari_Kenia, alpha4, NA
weber -> prob_sail_wild_water,alpha5,NA
weber -> prob_dangerous_sport,alpha7,NA
weber -> prob_bungee_jumping,alpha8,NA
weber -> prob_tornado_tracking,alpha9,NA
weber -> prob_ski,alpha10,NA
prob_wild_camp <-> prob_wild_camp, ep2,NA
prob_book_hotel_in_short_time <-> prob_book_hotel_in_short_time,ep3,NA
prob_safari_Kenia <-> prob_safari_Kenia, ep4, NA
prob_sail_wild_water <-> prob_sail_wild_water,ep5,NA
prob_dangerous_sport <-> prob_dangerous_sport,ep7,NA
prob_bungee_jumping <-> prob_bungee_jumping,ep8,NA
prob_tornado_tracking <-> prob_tornado_tracking,ep9,NA
prob_ski <-> prob_ski,ep10,NA
tp -> o1,NA,1
tp -> o3,beta3,NA
tp -> o4,beta4,NA
tp -> o5,beta5,NA
tp -> o6,beta6,NA
tp -> o7,beta7,NA
tp -> o9,beta9,NA
tp -> o10,beta10,NA
tp -> o11,beta11,NA
tp -> o12,beta12,NA
o1 <-> o1,eo1,NA
o3 <-> o3,eo3,NA
o4 <-> o4,eo4,NA
o5 <-> o5,eo5,NA
o6 <-> o6,eo6,NA
o7 <-> o7,eo7,NA
o9 <-> o9,eo9,NA
o10 <-> o10,eo10,NA
o11 <-> o11,eo11,NA
o12 <-> o12,eo12,NA
tr -> v5, NA,1
tr -> v13, gamma2,NA
tr -> v14, gamma3,NA
tr -> v16,gamma4,NA
tr -> v17,gamma5,NA
v5 <-> v5,ev1,NA
v13 <-> v13,ev2,NA
v14 <-> v14,ev3,NA
v16 <-> v16, ev4, NA
v17 <-> v17,ev5,NA

sem.correlated <- sem(model.correlated, cov(riskninfo_s), 397)
summary(sem.correlated)
samelist = c('weber','tp','tr')
minlist=c(names(rk),names(tp))
maxlist = NULL
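# the fitted object above is sem.correlated (the original post passed an undefined sem2)
path.diagram(sem.correlated, out.file = "e:/sem2.dot", same.rank = samelist,
             min.rank = minlist, max.rank = maxlist, edge.labels = "values",
             rank.direction = 'LR')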

#############################################
#higher-level latent scale: a higher-level factor exists
##############################################
model.rsk <- specify.model()

rsk->tp,e.rsktp,NA
rsk->tr,e.rsktr,NA
rsk->weber,e.rskweber,NA
rsk<->rsk, NA,1
weber<->weber, e.weber,NA
tp<->tp,e.tp,NA
tr <->tr,e.trv,NA

weber -> prob_wild_camp,NA,1
weber -> prob_book_hotel_in_short_time,alpha3,NA
weber -> prob_safari_Kenia, alpha4, NA
weber -> prob_sail_wild_water,alpha5,NA
weber -> prob_dangerous_sport,alpha7,NA
weber -> prob_bungee_jumping,alpha8,NA
weber -> prob_tornado_tracking,alpha9,NA
weber -> prob_ski,alpha10,NA
prob_wild_camp <-> prob_wild_camp, ep2,NA
prob_book_hotel_in_short_time <-> prob_book_hotel_in_short_time,ep3,NA
prob_safari_Kenia <-> prob_safari_Kenia, ep4, NA
prob_sail_wild_water <-> prob_sail_wild_water,ep5,NA
prob_dangerous_sport <-> prob_dangerous_sport,ep7,NA
prob_bungee_jumping <-> prob_bungee_jumping,ep8,NA
prob_tornado_tracking <-> prob_tornado_tracking,ep9,NA
prob_ski <-> prob_ski,ep10,NA
tp -> o1,NA,1
tp -> o3,beta3,NA
tp -> o4,beta4,NA
tp -> o5,beta5,NA
tp -> o6,beta6,NA
tp -> o7,beta7,NA
tp -> o9,beta9,NA
tp -> o10,beta10,NA
tp -> o11,beta11,NA
tp -> o12,beta12,NA
o1 <-> o1,eo1,NA
o3 <-> o3,eo3,NA
o4 <-> o4,eo4,NA
o5 <-> o5,eo5,NA
o6 <-> o6,eo6,NA
o7 <-> o7,eo7,NA
o9 <-> o9,eo9,NA
o10 <-> o10,eo10,NA
o11 <-> o11,eo11,NA
o12 <-> o12,eo12,NA
tr -> v5, NA,1
tr -> v13, gamma2,NA
tr -> v14, gamma3,NA
tr -> v16,gamma4,NA
tr -> v17,gamma5,NA
v5 <-> v5,ev1,NA
v13 <-> v13,ev2,NA
v14 <-> v14,ev3,NA
v16 <-> v16, ev4, NA
v17 <-> v17,ev5,NA

sem.rsk <- sem(model.rsk, cov(riskninfo_s), 397)
summary(sem.rsk)

##############
#model one results
###############
Model Chisquare = 680.79 Df = 227 Pr(>Chisq) = 0
Chisquare (null model) = 2443.4 Df = 253
Goodness-of-fit index = 0.86163
Adjusted goodness-of-fit index = 0.83176
RMSEA index = 0.07105 90% CI: (NA, NA)
Bentler-Bonnett NFI = 0.72137
Tucker-Lewis NNFI = 0.7691
Bentler CFI = 0.79282
SRMR = 0.069628
BIC = -677.56

Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.4800 -0.8490 -0.0959 -0.0186 0.6540 8.8500

Parameter Estimates
Estimate Std Error z value Pr(>|z|)
e.webertp -0.058847 0.023473 -2.5070 1.2175e-02
e.tptrl 0.151913 0.031072 4.8890 1.0134e-06
e.trweber -0.255449 0.044469 -5.7444 9.2264e-09
e.tp 0.114260 0.038652 2.9562 3.1149e-03
e.trv 0.464741 0.068395 6.7950 1.0832e-11
alpha2 0.488106 0.051868 9.4105 0.0000e+00
alpha3 0.446255 0.052422 8.5127 0.0000e+00
alpha4 0.517707 0.050863 10.1784 0.0000e+00
alpha5 0.772128 0.045863 16.8356 0.0000e+00
alpha7 0.782098 0.045754 17.0934 0.0000e+00
alpha8 0.668936 0.048092 13.9095 0.0000e+00
alpha9 0.376798 0.052977 7.1124 1.1400e-12
alpha10 0.449507 0.051885 8.6635 0.0000e+00
ep2 0.761752 0.058103 13.1104 0.0000e+00
ep3 0.800857 0.060154 13.3134 0.0000e+00
ep4 0.731980 0.056002 13.0705 0.0000e+00
ep5 0.403819 0.040155 10.0565 0.0000e+00
ep7 0.388322 0.039930 9.7250 0.0000e+00
ep8 0.552524 0.046619 11.8519 0.0000e+00
ep9 0.858023 0.063098 13.5982 0.0000e+00
ep10 0.797945 0.059651 13.3770 0.0000e+00
beta3 1.670861 0.312656 5.3441 9.0871e-08
beta4 1.536421 0.292725 5.2487 1.5319e-07
beta5 1.530081 0.294266 5.1997 1.9966e-07
beta6 1.767803 0.329486 5.3653 8.0801e-08
beta7 0.870601 0.200366 4.3451 1.3924e-05
beta9 1.692284 0.312799 5.4101 6.2975e-08
beta10 1.009742 0.224155 4.5047 6.6480e-06
beta11 1.723416 0.324593 5.3095 1.0995e-07
beta12 1.452796 0.286857 5.0645 4.0940e-07
eo1 0.885742 0.065529 13.5168 0.0000e+00
eo3 0.681004 0.055626 12.2425 0.0000e+00
eo4 0.730277 0.057682 12.6603 0.0000e+00
eo5 0.732500 0.059305 12.3514 0.0000e+00
eo6 0.642921 0.055797 11.5226 0.0000e+00
eo7 0.913393 0.066903 13.6526 0.0000e+00
eo9 0.672777 0.054994 12.2336 0.0000e+00
eo10 0.883505 0.065198 13.5512 0.0000e+00
eo11 0.660627 0.055399 11.9249 0.0000e+00
eo12 0.758847 0.059582 12.7361 0.0000e+00
gamma2 0.689244 0.089575 7.6946 1.4211e-14
gamma3 0.880574 0.093002 9.4684 0.0000e+00
gamma4 1.083443 0.092856 11.6680 0.0000e+00
gamma5 0.589127 0.087252 6.7520 1.4584e-11
ev1 0.535257 0.050039 10.6968 0.0000e+00
ev2 0.779221 0.060274 12.9280 0.0000e+00
ev3 0.639632 0.054097 11.8239 0.0000e+00
ev4 0.454467 0.048438 9.3824 0.0000e+00
ev5 0.838702 0.062929 13.3277 0.0000e+00

#####################################
#model two results
##################################
Model Chisquare = 680.79 Df = 227 Pr(>Chisq) = 0
Chisquare (null model) = 2443.4 Df = 253
Goodness-of-fit index = 0.86163
Adjusted goodness-of-fit index = 0.83176
RMSEA index = 0.07105 90% CI: (NA, NA)
Bentler-Bonnett NFI = 0.72137
Tucker-Lewis NNFI = 0.7691
Bentler CFI = 0.79282
SRMR = 0.069627
BIC = -677.56

Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.4800 -0.8490 -0.0959 -0.0186 0.6540 8.8500

Parameter Estimates
Estimate Std Error z value Pr(>|z|)
e.rsktp 0.187069 0.045642 4.09859 4.1567e-05
e.rsktrl 0.812070 0.131731 6.16462 7.0652e-10
e.rskweber -0.153542 0.038132 -4.02660 5.6589e-05
e.weber 0.214671 0.046260 4.64056 3.4746e-06
e.tp 0.079263 0.028484 2.78270 5.3909e-03
e.trv -0.194712 0.197101 -0.98788 3.2321e-01
alpha3 0.914263 0.131132 6.97206 3.1233e-12
alpha4 1.060649 0.143622 7.38499 1.5254e-13
alpha5 1.581889 0.177961 8.88898 0.0000e+00
alpha7 1.602316 0.182893 8.76095 0.0000e+00
alpha8 1.370476 0.164966 8.30764 0.0000e+00
alpha9 0.771961 0.128670 5.99955 1.9787e-09
alpha10 0.920922 0.136148 6.76413 1.3411e-11
ep2 0.761752 0.058109 13.10909 0.0000e+00
ep3 0.800856 0.060155 13.31314 0.0000e+00
ep4 0.731979 0.056003 13.07044 0.0000e+00
ep5 0.403818 0.040155 10.05643 0.0000e+00
ep7 0.388322 0.039932 9.72459 0.0000e+00
ep8 0.552523 0.046620 11.85175 0.0000e+00
ep9 0.858024 0.063099 13.59811 0.0000e+00
ep10 0.797943 0.059651 13.37694 0.0000e+00
beta3 1.670904 0.310681 5.37820 7.5234e-08
beta4 1.536444 0.290968 5.28045 1.2887e-07
beta5 1.530096 0.292603 5.22926 1.7019e-07
beta6 1.767838 0.327427 5.39918 6.6945e-08
beta7 0.870626 0.199814 4.35718 1.3175e-05
beta9 1.692309 0.310816 5.44473 5.1885e-08
beta10 1.009760 0.223270 4.52259 6.1088e-06
beta11 1.723432 0.322488 5.34417 9.0830e-08
beta12 1.452761 0.285172 5.09434 3.4997e-07
eo1 0.885741 0.065519 13.51880 0.0000e+00
eo3 0.681003 0.055625 12.24265 0.0000e+00
eo4 0.730278 0.057683 12.66029 0.0000e+00
eo5 0.732501 0.059307 12.35108 0.0000e+00
eo6 0.642919 0.055799 11.52215 0.0000e+00
eo7 0.913394 0.066900 13.65310 0.0000e+00
eo9 0.672778 0.054994 12.23360 0.0000e+00
eo10 0.883503 0.065197 13.55124 0.0000e+00
eo11 0.660630 0.055397 11.92534 0.0000e+00
eo12 0.758852 0.059582 12.73619 0.0000e+00
gamma2 0.689244 0.089545 7.69720 1.3989e-14
gamma3 0.880580 0.092955 9.47317 0.0000e+00
gamma4 1.083430 0.092789 11.67631 0.0000e+00
gamma5 0.589119 0.087233 6.75338 1.4444e-11
ev1 0.535258 0.050034 10.69783 0.0000e+00
ev2 0.779219 0.060273 12.92808 0.0000e+00
ev3 0.639627 0.054096 11.82402 0.0000e+00
ev4 0.454472 0.048437 9.38269 0.0000e+00
ev5 0.838705 0.062929 13.32769 0.0000e+00

John Fox

Mar 15, 2009, 9:00:12 AM

to hyena, r-h...@stat.math.ethz.ch

Dear hyena,

> -----Original Message-----
> From: r-help-...@r-project.org [mailto:r-help-...@r-project.org] On Behalf Of hyena
> Sent: March-15-09 4:25 AM
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] SEM model testing with identical goodness of fits (2)
>
> Dear John,
>
> Thanks for the prompt reply! Sorry for not supplying more detailed
> information.
>
> The target model consists of three latent factors: a general risk
> scale from Weber's domain risk scales, a time perspective scale from
> Zimbardo (future time orientation only), and a travel risk attitude scale.
> Variables with the "prob_" prefix are items of the general risk scale, variables
> "o1" to "o12" are items of future time perspective, and "v5" to "v17"
> are items of the travel risk scale.
>
> The purpose is to explore or find a best-fitting model that "correctly"
> represents the underlying relationship of the three scales. So far, the
> correlated model has the best fit indices, so I'd like to check whether
> there is a higher-level factor that governs all three factors, thus the
> second model.

Both models are very odd. In the first, each of tr, weber, and tp has direct
effects on different subsets of the endogenous variables. The implicit claim
of these models is that, e.g., prob_* are conditionally independent of tr
and tp given weber, and that the correlations among prob_* are entirely
accounted for by their dependence on weber. The structural coefficients are
just the simple regressions of each prob_* on weber. The second model is the
same except that the variances and covariances among weber, tr, and tp are
parametrized differently. I'm not sure why you set the models up in this
manner, and why your research requires a structural-equation model. I would
have expected that each of the prob_*, v*, and o* variables would have
comprised indicators of a latent variable (risk-taking, etc.). The models
that you specified seem so strange that I think that you'd do well to try to
find competent local help to sort out what you're doing in relationship to
the goals of the research. Of course, maybe I'm just having a failure of
imagination.

>
> The data are all 5-point Likert scale scores from respondents (N=397).

It's problematic to treat ordinal variables as if they were metric (and to fit
SEMs of this complexity to a small sample).

> The data example listed below does not show the "prob_" variables (their names are
> too long).
>
> Given the following model structure, if the models are indeed
> observationally indistinguishable, are there possible adjustments to
> test the higher-level factor effects?

No. Because the models necessarily fit the same, you'd have to decide
between them on grounds of plausibility. Moreover both models fit very
badly.
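
One way to see why the models necessarily fit the same: several of the summary indices in the posted output (RMSEA, NFI, NNFI, CFI, BIC) are functions of the model chi-square, its degrees of freedom, the null-model chi-square, and N alone. The sketch below uses the usual textbook formulas, which is an assumption here since the sem source was not consulted, together with the values from the posted output.

```r
# Values copied from the summary() output posted earlier in the thread.
chisq      <- 680.79
df         <- 227
chisq_null <- 2443.4
df_null    <- 253
n          <- 397

# Textbook definitions (assumed to match what sem prints).
rmsea <- sqrt(max(chisq - df, 0) / (df * (n - 1)))
nfi   <- 1 - chisq / chisq_null
nnfi  <- (chisq_null / df_null - chisq / df) / (chisq_null / df_null - 1)
cfi   <- 1 - max(chisq - df, 0) / max(chisq_null - df_null, chisq - df, 0)
bic   <- chisq - df * log(n)

round(c(RMSEA = rmsea, NFI = nfi, NNFI = nnfi, CFI = cfi, BIC = bic), 4)
```

The results agree with the posted values up to rounding in the last printed digit, so two models with equal chi-square and df cannot differ on these indices.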

Regards,
John

hyena

Mar 15, 2009, 12:00:05 PM

to r-h...@stat.math.ethz.ch

Dear John,

Thanks for the reply.

Maybe I used the wrong terminology. As you pointed out, the
variables "prob*", "o*" and "v*" are in fact indicators of three latent
variables (scales): weber, tp, and tr respectively. So variables
"prob*", "o*" and "v*" are exogenous variables. E.g., the variable
"prob_dangerous_sport" is the answer to the question "How likely do you
think you are to engage in a dangerous sport?" (1 = very unlikely to 5 = very
likely). Variables weber, tr, tp are latent variables representing risk
attitudes in different domains (recreation, planned behaviour, travel
choice). I hope this makes sense of the models.

Exploratory analysis showed internal consistency (Cronbach's alpha) within
each scale (latent variables tr, tp, weber) and significant correlations
among the three scales. The two models mentioned in previous posts
are attempts to find out whether there is a more general factor that can
account for the correlations and make the three scales its subscales.
In this sense, SEM is used more as a CFA (sem is the only package I
know that does this; I did not search very hard, of course).

And indeed the model fit is quite bad.

regards,

William Revelle

Mar 15, 2009, 12:11:54 PM

to hyena, r-h...@stat.math.ethz.ch, John Fox

Dear Hyena,

Your model is of three correlated factors accounting for the
observed variables.
Those three correlations may be accounted for equally well by
correlations (loadings) of the lower order factors with a general
factor.
Those two models are indeed equivalent models and will, as a
consequence have exactly equal fits and dfs.

Call the three correlations rab, rac, rbc. Then a higher order
factor model will have loadings of
fa, fb and fc, where fa*fb = rab, fa*fc = rac, and fb*fc = rbc.
You can solve for fa, fb and fc in terms of factor inter-correlations.
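
As a rough check of these equations, the sketch below applies them on the covariance scale to the estimates posted for the correlated model and recovers the higher-order loadings posted for the second model. Two assumptions are made for illustration: the sign convention (the tp loading is taken as positive), and the rescaling of the weber loading by alpha2, needed because weber has unit variance in the first model but is scaled by its first indicator in the second.

```r
# Covariances among the latent variables, from the posted model-one estimates.
cov_weber_tp <- -0.058847   # weber <-> tp
cov_tp_tr    <-  0.151913   # tp <-> tr
cov_tr_weber <- -0.255449   # tr <-> weber
alpha2       <-  0.488106   # weber -> prob_wild_camp in model one

# With var(rsk) = 1, l_a * l_b = cov_ab for each pair, hence
# l_a^2 = cov_ab * cov_ac / cov_bc.
l_tp    <- sqrt(cov_weber_tp * cov_tp_tr / cov_tr_weber)
l_tr    <- cov_tp_tr / l_tp
l_weber <- cov_weber_tp / l_tp

# Model two fixes weber -> prob_wild_camp at 1, so weber is rescaled by alpha2.
l_weber_m2 <- l_weber * alpha2

round(c(rsk_tp = l_tp, rsk_tr = l_tr, rsk_weber = l_weber_m2), 4)
```

The values come out close to the posted model-two estimates of 0.187069, 0.812070 and -0.153542 for the three rsk loadings, which is what equivalence of the two models implies.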

You can not compare the one to the other, for they are equivalent models.

You can examine how much of the underlying variance of the original
items is due to the general factor by considering a bi-factor
solution where the general factor loads on each of the observed
variables and a set of residual group factors account for the
covariances within your three domains. This can be done in an
Exploratory Factor Analysis (EFA) context using the omega function in
the psych package. It is possible to then take that model and test it
using John Fox's sem package to evaluate the size of each of the
general and group factor loadings. (A discussion of how to do that
is at http://www.personality-project.org/r/book/psych_for_sem.pdf ).
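
To make the bi-factor idea concrete, here is a minimal base-R sketch of the Schmid-Leiman transformation, one common way to obtain general and residualized group loadings from a higher-order solution. All loadings are hypothetical values chosen for illustration, not estimates from this thread, and the sketch does not call the psych package.

```r
# Hypothetical standardized loadings: three group factors, three items each.
lambda <- matrix(0, nrow = 9, ncol = 3,
                 dimnames = list(paste0("x", 1:9), c("F1", "F2", "F3")))
lambda[1:3, "F1"] <- c(0.8, 0.7, 0.6)
lambda[4:6, "F2"] <- c(0.7, 0.6, 0.5)
lambda[7:9, "F3"] <- c(0.6, 0.5, 0.4)

# Hypothetical loadings of the group factors on the general factor.
g <- c(F1 = 0.9, F2 = 0.8, F3 = 0.7)

# Schmid-Leiman: split each item loading into a general part and a
# residualized group part.
general  <- as.vector(lambda %*% g)
residual <- lambda %*% diag(sqrt(1 - g^2))

# Model-implied correlation matrix of the items.
R <- tcrossprod(general) + tcrossprod(residual)
diag(R) <- 1

# Share of total-score variance attributable to the general factor.
omega_h <- sum(general)^2 / sum(R)
omega_h
```

Each item's communality is unchanged by the transformation; only its allocation between the general factor and the group factors changes.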

Bill

--
William Revelle http://personality-project.org/revelle.html
Professor http://personality-project.org/personality.html
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Attend ISSID/ARP:2009 http://issid.org/issid.2009/

John Fox

Mar 15, 2009, 12:46:17 PM

to hyena, r-h...@stat.math.ethz.ch

Dear Hyena,

OK -- I see that what you're trying to do is simply to fit a confirmatory
factor-analysis model.

The two models that you're considering aren't really different -- they are,
as I said, observationally equivalent, and fit the data poorly. You can
*assume* a common higher-level factor and estimate the three loadings on it
for the lower-level factors, but you can't test this model against the first
model.

I'm not sure what you gain from the CFA beyond what you learned from an
exploratory factor analysis. Using the same data first in an EFA and then
for a CFA essentially invalidates the CFA, which is no longer confirmatory.
One would, then, expect a CFA following an EFA to fit the data well, since
the CFA was presumably specified to do so, but I suspect that a closer
examination of the EFA will show that the items don't divide so neatly into
the three sets.

Regards,

hyena

Mar 15, 2009, 1:29:25 PM

to r-h...@stat.math.ethz.ch

Thanks for the clear explanation. The suggested bi-factor solution
sounds attractive. I am going to look into it in detail.

regards,

hyena

Mar 15, 2009, 3:13:18 PM

to r-h...@stat.math.ethz.ch

The purpose of carrying out this CFA is to test the validity of a newly
developed scale "tr" with "v*" items; the other two scales, "weber" and "tp",
are existing scales that measure specific risk attitudes. I am not sure
whether a simple correlation analysis is adequate for this purpose,
thus the CFA test.

Further, although a PCA has tested the dimensionality of all items, they
are not divided as the PCA result suggested; rather, their original grouping
remains. The indicators are indeed not very well separated in the PCA: mainly,
the "o*" items are located in two components.

Originally, the EFA was carried out on the first half of the sample
and the CFA on the second half. Because of the low fit indices from the CFA on the
partial sample, the full sample was tested in the CFA to see whether sample size
matters much, and the results are as poor as before.
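
For reference, a split of this kind can be sketched as below. The random assignment and the seed are assumptions for illustration (the post describes a first-half/second-half split), and the commented-out lines show where the data frame from the earlier code would be subset.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 397                         # sample size reported in the thread

# Randomly assign about half of the rows to the exploratory analysis
# and keep the remainder for the confirmatory analysis.
efa_rows <- sort(sample(n, size = floor(n / 2)))
cfa_rows <- setdiff(seq_len(n), efa_rows)

# efa_data <- riskninfo_s[efa_rows, ]
# cfa_data <- riskninfo_s[cfa_rows, ]

c(efa = length(efa_rows), cfa = length(cfa_rows))
```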

It seems it is time to read more about scale development. Thanks for
all the input.
