formative measurement model for CFA


Roger

Nov 26, 2012, 11:24:23 AM
to lav...@googlegroups.com

Dear Lavaan users,

Being new to lavaan, I can't figure out how to specify a formative measurement model. I have two constructs, each caused by a set of indicators. The code examples I see on the internet all seem to represent reflective constructs rather than formative ones.
Let's say I have construct Y1 caused by X1, X2, X3 and construct Y2 caused by X4, X5, X6, X7. My goal is to run a CFA for the formative scales (Y1 and Y2).
How would I specify that in lavaan?

thanks,
Roger

yrosseel

Nov 26, 2012, 1:35:37 PM
to lav...@googlegroups.com
This is indeed mostly undocumented, but this is the best approach:

model <- ' Y1 <~ X1 + X2 + X3
           Y2 <~ X4 + X5 + X6 + X7
           # ...
         '
where the '<~' operator is used to construct a formative factor.

Yves.

René Mayer

Nov 26, 2012, 4:16:14 PM
to lav...@googlegroups.com
Dear Roger,
'Caused by' means that Y is regressed on its indicators, like
Y1 ~ X1 + X2 + X3
Y2 ~ X4 + X5 + X6 + X7
and in the case of composite exogenous variables you have to add
Y1 ~~ 0*Y1
Y2 ~~ 0*Y2
HTH,
Rene


Roger

Nov 26, 2012, 5:29:11 PM
to lav...@googlegroups.com
On 26-11-2012 22:16, René Mayer wrote:
Hi Rene,

This doesn't seem to work:

data(HolzingerSwineford1939)
HS.model <- ' visual  ~ x1 + x2 + x3
              textual ~ x4 + x5 + x6
              speed   ~ x7 + x8 + x9 '
fit <- cfa(HS.model, data=HolzingerSwineford1939)

Error in `[.data.frame`(data, , unlist(ov.names)) : 
  undefined columns selected

It looks like it expects visual, textual, and speed to be variables in the data frame, rather than latent constructs.
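
One way to check how lavaan classifies each name is to parse the syntax without fitting it. The sketch below assumes that `lavaanify()` and `lavNames()` behave as in current lavaan releases. With only `~` in the model, the three construct names should be listed as observed variables and no latent variables should be defined, which is consistent with the error above.

```r
library(lavaan)

# Regression-only syntax: there is no '=~' or '<~', so nothing is
# declared as a latent variable.
HS.model <- ' visual  ~ x1 + x2 + x3
              textual ~ x4 + x5 + x6
              speed   ~ x7 + x8 + x9 '

pt <- lavaanify(HS.model)        # parse the syntax into a parameter table
ov <- lavNames(pt, type = "ov")  # names lavaan treats as observed
lv <- lavNames(pt, type = "lv")  # names lavaan treats as latent

print(ov)
print(lv)
```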

thanks,
Roger


Roger

Nov 26, 2012, 5:30:45 PM
to lav...@googlegroups.com

Thanks Yves.
I tried this for a few datasets, but every time I get the following error:


data(HolzingerSwineford1939)
HS.model <- ' visual  <~ x1 + x2 + x3
              textual <~ x4 + x5 + x6
              speed   <~ x7 + x8 + x9 '
fit <- cfa(HS.model, data=HolzingerSwineford1939)

Error in solve.default(E) : 
  Lapack routine dgesv: system is exactly singular: U[1,1] = 0
Warning messages:
1: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats, options = lavaanOptions,  :
  lavaan WARNING: could not compute standard errors!

2: In pchisq(chisq, df) : NaNs produced

This happens with each of the datasets I have tried. Am I missing something?

best, Roger


On 26-11-2012 19:35, yrosseel wrote:

yrosseel

Nov 27, 2012, 3:18:36 AM
to lav...@googlegroups.com
On 11/26/2012 11:30 PM, Roger wrote:
>
> Thanks Yves.
> I tried this for a few datasets, but every time I get the following error:
>
> data(HolzingerSwineford1939)
> HS.model <- ' visual <~ x1 + x2 + x3
> textual <~ x4 + x5 + x6
> speed <~ x7 + x8 + x9 '
> fit <- cfa(HS.model, data=HolzingerSwineford1939)
>
> Error in solve.default(E) :
> Lapack routine dgesv: system is exactly singular: U[1,1] = 0

The above model is heavily under-identified. You have df = -12.
Formative latents are only useful when embedded in a larger structural
system.

Below is lavaan code for an example that appears in the Mplus handouts
(Topic 1, slides 245 and following), written in three different ways
(the last one uses the <~ operator). The 'f' factor is formative, while
'fy' is reflective.

lower <- '
1.
.3607 1.
.2104 .2655 1.
.1002 .2845 .1763 1.
.1563 .1924 .1363 .3046 1.
.1583 .3246 .2264 .3056 .3447 1. '
HT <- getCov(lower, names=c("church", "members", "friends",
"income", "occup", "educ"))


# 1. oldest approach
model <- ' fy =~ church + members + friends
f =~ NA*fy
fy ~~ fy
f ~~ 0*f
f ~ 1*income + occup + educ '
fit <- sem(model, sample.cov=HT, sample.nobs=530)
summary(fit)


# 2. new in lavaan 0.4-10 - using 'phantom' latent (=~ 0)
model <- ' fy =~ church + members + friends
f =~ 0
f ~~ 0*f
f ~ 1*income + occup + educ
fy ~ f '
fit <- sem(model, sample.cov=HT, sample.nobs=530)
summary(fit)

# 3. new in lavaan 0.4-12 - using <~ operator
model <- ' fy =~ church + members + friends
f <~ 1*income + occup + educ
fy ~ f '
fit <- sem(model, sample.cov=HT, sample.nobs=530)
summary(fit)












Roger

Dec 1, 2012, 3:33:38 AM
to lav...@googlegroups.com
On 27-11-2012 9:18, yrosseel wrote:
Thank you, Yves, you are of course entirely correct. Thank you for the
code; it is very helpful, and I have adapted it for my own models.
I really appreciate your work on lavaan, which makes SEM so user-friendly.
The "<~" operator is very useful.

thanks, Roger

esther beierl

Sep 16, 2013, 9:57:58 AM
to lav...@googlegroups.com, r.t.a.j....@gmail.com
Dear all,
when I run a formative CFA in lavaan, I get the lavaan error "please refit the model with test="standard"". What does this mean?
Thanks in advance!
Esther

yrosseel

Sep 16, 2013, 12:04:40 PM
to lav...@googlegroups.com
On 09/16/2013 03:57 PM, esther beierl wrote:
> Dear all,
> when I run a formative CFA in lavaan, I get the lavaan error "please
> refit the model with test="standard"" - what does this mean?

Are you using bootstrapping, using the bootstrapLavaan() function? In
that case, you must fit the original model with test="standard" (or
simply omit the test argument, which is "standard" by default).

If not, can you show us your script?
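
The bootstrapping case mentioned above can be sketched as follows. This is only an illustration: the reflective Holzinger-Swineford model is a stand-in chosen because it fits cleanly, `R = 20` is far too small for real use, and the seed is arbitrary.

```r
library(lavaan)

# Stand-in model (reflective three-factor CFA from the lavaan tutorial).
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# Fit the original model with the default test statistic.
fit <- cfa(HS.model, data = HolzingerSwineford1939, test = "standard")

# Bootstrap the free parameters; R is kept small so the sketch runs quickly.
set.seed(1234)
boot <- bootstrapLavaan(fit, R = 20, FUN = "coef")

dim(boot)  # rows are bootstrap draws, columns are free parameters
```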

Yves.

esther beierl

Sep 17, 2013, 4:30:05 AM
to lav...@googlegroups.com
Dear Yves,

No, I am not using bootstrapping. I am using your script above (or at least I think I am :-)) with "<~", with just one formative factor and the following lines:

model1 <- ' f <~ x1 + x2 + x3 + x4 + x5 ' 

fit <- cfa(model1, data=my.data1)
summary(fit, fit.measures=TRUE)  

It says "lavaan ERROR: please refit the model with test="standard"".

Where is my mistake?
Thanks very much!
Esther

Yves Rosseel

Sep 17, 2013, 5:31:05 AM
to lav...@googlegroups.com
On 09/17/2013 10:30 AM, esther beierl wrote:
> Dear Yves,
>
> No, I am not using Bootstrapping, I am using your script above (or I
> think I am doing :-)) with "<~" and just with one formative factor and
> the following two lines:
>
> model1 <- ' f <~ x1 + x2 + x3 + x4 + x5 '
>
> fit <- cfa(model1, data=my.data1)

I get a LOT of trouble fitting the above model, but not an error related
to test="standard".

model1 is not identified. It has df = -5 and is over-saturated: whatever
values the regression weights take, the fitted covariance matrix equals
the observed covariance matrix, since the model contains only exogenous
variables.

You simply cannot swap '=~' for '<~'; formative factors only make
sense when embedded in a larger model.
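
As a rough sketch of what "embedded in a larger model" can look like, the lines below mirror the '<~' example earlier in this thread but use the HolzingerSwineford1939 data that ships with lavaan. The choice of x1-x3 as formative indicators and of 'textual' as the outcome is purely illustrative, not a substantive claim about these data, and the exact output may depend on the lavaan version.

```r
library(lavaan)

# 'f' is a formative composite of x1-x3; its first weight is fixed to 1
# to set the scale. 'textual' is an ordinary reflective factor that 'f'
# predicts, which gives the composite something to explain.
model <- ' textual =~ x4 + x5 + x6
           f <~ 1*x1 + x2 + x3
           textual ~ f '

fit <- sem(model, data = HolzingerSwineford1939)

# Degrees of freedom should no longer be negative.
fitMeasures(fit, "df")

# Rows of the parameter table that belong to the formative part.
pe <- parameterEstimates(fit)
pe[pe$op == "<~", ]
```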

(But I am still puzzled about where the error message came from.)

Yves.

esther theresa

Sep 17, 2013, 5:58:00 AM
to lav...@googlegroups.com
Actually I have 20 indicators and one factor (I was just too lazy to write it all...) and the model seems identified. 
But I see the problem, my model should be embedded in a larger reflective model, right?
Thanks!

esther theresa

Sep 26, 2013, 10:12:37 AM
to lav...@googlegroups.com
Now I tried to create a second-order CFA with a formative second-order factor: two latent variables, each with 10 indicators, form the second-order factor. I specified the factor loadings of the indicators (.40-.60) and the two paths (.50 each) between the latent variables and the second-order factor. The two latent variables don't correlate with each other.
I wanted to simulate data (with "simulateData") and get the factor scores for the second-order factor (with "predict(fit)"), but I get the following error message:
Error in computeEETA.LISREL(MLIST, mean.x = samplestats@mean.x[[g]], sample.mean = samplestats@mean[[g]],  : 
subscript out of bounds
Where is my mistake?
Thanks a lot for your help!

yrosseel

Sep 28, 2013, 1:21:11 PM
to lav...@googlegroups.com
On 09/26/2013 04:12 PM, esther theresa wrote:

> I wanted to simulate data (with "simulateData") and get the factor
> scores for the second-order-factor (with "predict(fit)"), but I get the
> following error message:
> Error in computeEETA.LISREL(MLIST, mean.x = samplestats@mean.x[[g]],
> sample.mean = samplestats@mean[[g]], :
> subscript out of bounds

We would need to see your full script to figure it out. Can you post it?

Yves.


esther theresa

Sep 30, 2013, 1:30:57 PM
to lav...@googlegroups.com
Dear Yves,
 
that is my Syntax. Do you see the mistake/s?
 
Thx!

model1 <- '
f1 =~ 0.46*x1 + 0.56*x2 + 0.48*x3 + 0.58*x4 + 0.59*x5 + 0.41*x6 + 0.51*x7 + 0.58*x8 + 0.51*x9 + 0.49*x10
f2 =~ 0.59*x11 + 0.49*x12 + 0.54*x13 + 0.51*x14 + 0.42*x15 + 0.58*x16 + 0.45*x17 + 0.41*x18 + 0.47*x19 + 0.59*x20
dx ~ 0.5*f1 + 0.5*f2
f1 ~~ 0*f2
'
my.data1 <- simulateData(model1, model.type=cfa, meanstructure=FALSE, int.lv.free=TRUE, fixed.x=FALSE, auto.fix.first=FALSE, sample.nobs= 1000000L, ov.var=NULL, skewness=NULL, kurtosis=NULL, seed=12345, empirical=FALSE, return.fit=TRUE, standardized=TRUE)

fit.model1 <- cfa(model1, data = my.data1)
summary(fit.model1, fit.measures=TRUE, standardized=TRUE)

f.scores.model1 <- predict(fit.model1)
str(f.scores.model1)
fs1  <- f.scores.model1[,1]
fs2  <- f.scores.model1[,2]
fs3 <- f.scores.model1[,3]

yrosseel

Oct 1, 2013, 1:37:33 PM
to lav...@googlegroups.com
On 09/30/2013 07:30 PM, esther theresa wrote:
> Dear Yves,
> that is my Syntax. Do you see the mistake/s?

This looks like a bug in lavaan. Will fix it ASAP.

Yves.

esther theresa

Oct 10, 2013, 7:55:17 AM
to lav...@googlegroups.com
Thanks very much!!

Is it also possible to fit just formative models (without any reflective measurement part) with PLS in lavaan? That would be great for my research...

Esther

yrosseel

Oct 10, 2013, 11:00:28 AM
to lav...@googlegroups.com
No. Maybe in some distant future, but it is not a priority. There is the
semPLS package for this.

Yves.

esther theresa

Oct 11, 2013, 5:43:56 AM
to lav...@googlegroups.com
Thanks!

Sofie VR

Jul 30, 2014, 9:13:41 AM
to lav...@googlegroups.com

Dear Lavaan users, 

I am currently exploring R and lavaan. If I get it right, you can use <~ for constructing a formative scale and =~ for a reflective one? Can you use the cfa function for both of them? Does it suffice to use the syntax of the reflective scale modeling, but only adjusting for <~?

For the formative scale, do you have to correlate the items immediately in your syntax, or afterwards, after exploring the modification indices?

Thank you for your help, 

Kind regards, 

Sofie

Terrence Jorgensen

Jul 30, 2014, 3:58:19 PM
to lav...@googlegroups.com

> If I get it right, you can use <~ for constructing a formative scale and =~ for a reflective one?

Yes, those are the correct operators.

> Can you use the cfa function for both of them?

Yes, as long as your constructs are properly identified, you can specify a model with both formative and reflective indicators/constructs.

> Does it suffice to use the syntax of the reflective scale modeling, but only adjusting for <~?
>
> For the formative scale, do you have to correlate the items immediately in your syntax, or afterwards, after exploring the modification indices?


I'm not sure what you mean by "adjusting for <~".  Could you provide your model (syntax, path diagram) to show us what you are trying to accomplish?

Terry

Sofie Van Regenmortel

Jul 31, 2014, 3:19:47 AM
to lav...@googlegroups.com
Dear Terry and other Lavaan users, 

Thank you for your answer. 

Below you can find my syntax. Y, Z and X represent latent factors; v followed by a number represents a manifest variable.

 

library(lavaan)

SEmodel <- ' Z <~ Y1 + Y2 + Y3 + Y4 + v60_R + Y6 + Y7
             Y1 =~ v71 + v72
             Y2 =~ v4801 + v4802 + v4803 + v4804 + v4805 + v4806 + v4807 + v4809
             Y3 <~ X1 + X2 + v50_R + v51_R + v40_R
             X1 =~ v3503 + v3509 + v3510
             X2 =~ v3504 + v3507 + v3508
             Y4 <~ v61_R + v62_R
             Y6 <~ v80_R + v26_R + v27_R + X3
             X3 =~ v3201 + v3202 + v3203 + v3204 + v3205 + v3206 + v3207 + v3208Y
             Y7 <~ v90_R + v91_R '

fit <- cfa(SEmodel, data = Dataset1)

summary(fit, fit.measures = TRUE, standardized = TRUE, modindices = TRUE)

 

I am wondering whether I have to add to my model the correlations between the formative indicators of one factor. For example, for factor Z, the correlations between Y1, Y2, Y3, Y4, v60_R, Y6, and Y7. Or should I wait for the modification indices and their values?


Thank you very much for your help. Kind regards, 


Sofie 

 



--
You received this message because you are subscribed to a topic in the Google Groups "lavaan" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lavaan/hnMCckaARoo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lavaan+un...@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at http://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Terrence Jorgensen

Aug 5, 2014, 12:32:32 AM
to lav...@googlegroups.com

> I am wondering whether I have to add to my model the correlations between the formative indicators of one factor. For example, for factor Z, the correlations between Y1, Y2, Y3, Y4, v60_R, Y6, and Y7. Or should I wait for the modification indices and their values?

I don't think you need to specify them explicitly; those correlations should be included by default. But I don't think your model will run, because your formative constructs are not identified. A formative construct has no definition unless it predicts something else, either by having additional reflective indicators or by being a predictor of other constructs that are themselves identified. Your construct Z is only an outcome, but is not observed, so it is not identified. Because Z is not identified, I don't think Y3, Y4, Y6, or Y7 are identified either, since they don't predict anything (except Z, which is not identified).

Terry

Sofie Van Regenmortel

Aug 5, 2014, 2:14:59 AM
to lav...@googlegroups.com
Thank you very much, I will see how I can make a new model. Kind regards, 

Sofie 


--

Sofie Van Regenmortel

Aug 6, 2014, 2:35:13 AM
to lav...@googlegroups.com
Dear Terry,
Dear other lavaan users, 

Is it not enough to determine the missing Y's in the Z part of the syntax by the following part of the syntax:

Y1 =~ v71 + v72
Y2 =~ v4801 + v4802 + v4803 + v4804 + v4805 + v4806 + v4807 + v4809
Y3 <~ X1 + X2 + v50_R + v51_R + v40_R
X1 =~ v3503 + v3509 + v3510
X2 =~ v3504 + v3507 + v3508
Y4 <~ v61_R + v62_R
Y6 <~ v80_R + v26_R + v27_R + X3
X3 =~ v3201 + v3202 + v3203 + v3204 + v3205 + v3206 + v3207 + v3208Y
Y7 <~ v90_R + v91_R

These lines contain some reflective scales, and every Y has manifest variables behind it. Is that not sufficient?


Thank you all for your help,




Kind regards, 


Sofie Van Regenmortel



--

TRM

Aug 13, 2014, 2:30:22 PM
to lav...@googlegroups.com
Is it possible with lavaan to estimate a model with only a formative construct and a structural model (no reflective construct) / does it make sense to do so? Something like this, for example:

y1 <~ x1 + x2 + x3 + x4
x5 ~ y1 + x6

where x's are observed.

Thanks,

TRM

Edward Rigdon

Aug 13, 2014, 9:58:56 PM
to lav...@googlegroups.com

     Would it help to do this, versus just putting the 4 components of y into the second equation directly?  Lavaan estimates factor-based models, so it must obey factor-based rules.  Under those rules, there is no way for the algorithm to decide the weights for the four components of y.

     You could perhaps derive the numbers that you want using a sheaf coefficient:

http://intersci.ss.uci.edu/wiki/pdf/Heise1972NominalInducedandBlockvarsinregressionanal.pdf

The idea is that you allow the components to have direct effects on the ultimate dependent and then transform their combined contribution into an estimate of the coefficient for the composite predicting the dependent.  I think it is elegant and under-used.  
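
A minimal base-R sketch of that idea, as I read the description in this thread (not a transcription of Heise's paper). The data are simulated and the names w1-w3 (block components), y (other predictor) and z (dependent variable) are made up for illustration. The sheaf coefficient is taken here as the standard deviation of the weighted sum of the components, using the standardized regression weights.

```r
# Simulated data, for illustration only.
set.seed(42)
n  <- 500
w1 <- rnorm(n)
w2 <- 0.5 * w1 + rnorm(n)
w3 <- 0.3 * w2 + rnorm(n)
y  <- rnorm(n)
z  <- 0.4 * w1 + 0.3 * w2 - 0.2 * w3 + 0.5 * y + rnorm(n)

# Standardize everything, then let each component affect z directly.
d    <- as.data.frame(scale(data.frame(w1, w2, w3, y, z)))
full <- lm(z ~ w1 + w2 + w3 + y, data = d)
b    <- coef(full)[c("w1", "w2", "w3")]

# Weighted sum of the components; its standard deviation is the
# coefficient for the block as a whole.
block <- as.matrix(d[, c("w1", "w2", "w3")])
sheaf <- sd(drop(block %*% b))

# Rescale the weights so that the composite has unit variance.
a <- b / sheaf
d$composite <- drop(block %*% a)

# Regressing z on the composite and y reproduces the sheaf coefficient
# and leaves the coefficient of y unchanged.
reduced <- lm(z ~ composite + y, data = d)
c(sheaf = sheaf, composite = coef(reduced)[["composite"]])
```

Because the composite is an exact linear combination of the fitted component effects, the reduced regression is just a repackaging of the full one; no new information is estimated.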

--Ed Rigdon


TRM

Aug 14, 2014, 3:08:48 PM
to lav...@googlegroups.com
OK, so I derived the components of y1 using the sheaf coefficient (7 variables actually, instead of 4) by "hand" following the steps in Heise's paper, so hopefully they are correct. Is the idea then to take those new coefficients for the formative LV and add them in like this:

fit<- '
y1 <~ (0.84*x1) + (0.28*x2) + (-0.83*x3) + (-0.07*x4)...etc
x9 ~ y1 + 0.59*x8
'

where x1-x7 and their coefficients are the values obtained using the sheaf coefficient, and x8 and its coefficient are from 'Step 1'? This does not result in a viable model for me. As you can see, I have a number of negative weights, and I am not sure that is acceptable.

Edward Rigdon

Aug 14, 2014, 3:39:16 PM
to lav...@googlegroups.com

     If your predictors / components are highly correlated, you are going to see negative weights.  If you don’t like negative partialed weights, you could avoid them by estimating your sheaf coefficients using correlation weights rather than regression weights.  Correlation weights ignore collinearity among the components, so if all zero order correlations with the dependent variable are positive, then the weights will be positive as well.
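
A toy calculation may make the contrast concrete. The correlations below are invented for illustration (two components correlated .9 with each other, and .5 and .3 with the dependent variable); they are not taken from anyone's data in this thread.

```r
# Invented correlations among two components (w1, w2) ...
R_ww <- matrix(c(1.0, 0.9,
                 0.9, 1.0), nrow = 2, byrow = TRUE,
               dimnames = list(c("w1", "w2"), c("w1", "w2")))
# ... and between each component and the dependent variable z.
r_wz <- c(w1 = 0.5, w2 = 0.3)

# Standardized regression weights partial out the other component.
reg_weights <- drop(solve(R_ww, r_wz))

# Correlation weights are simply the zero-order correlations.
cor_weights <- r_wz

rbind(regression = reg_weights, correlation = cor_weights)
```

With these numbers the regression weight of w2 turns negative even though w2 correlates positively with z, while the correlation weights keep the sign of the zero-order correlations.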

 

http://link.springer.com/article/10.1007/s11336-009-9127-y#page-1

 

     Besides avoiding negative weights, correlation weights have superior properties in out-of-sample prediction, except when sample size is very high and the actual predictability of the dependent variable is quite high.

 

http://jeb.sagepub.com/content/29/3/317.short

 

--Ed Rigdon

 

 


--

TRM

Aug 14, 2014, 3:44:55 PM
to lav...@googlegroups.com
Thanks Ed. Was I correct in my model syntax in adding the sheaf-derived coefficients into the formula for the formative variable or have I missed something there?

Edward Rigdon

Aug 14, 2014, 10:00:30 PM
to lav...@googlegroups.com

     It is not entirely clear to me what steps you took.  If you wanted to derive a value for the formatively defined variable, then yes, you add up the components using the weights.  But you don’t add up the weights to get another parameter.  As Heise notes in his Step 2, you compute the variance of the formatively defined variable using the standard formula:

Variance of the sum =

     Sum of the variances

     + 2 times the sum of the covariances

     You need the variance of the formatively defined variable in order to get the sheaf coefficient.
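
The variance formula above can be checked numerically. The weights and correlations in this sketch are toy values chosen for illustration, and the variables are assumed to be standardized so that the correlation matrix doubles as the covariance matrix.

```r
# Toy standardized weights for three components ...
b <- c(w1 = 0.40, w2 = 0.25, w3 = -0.15)
# ... and a toy correlation matrix among those components.
R <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(names(b), names(b)))

# Covariance matrix of the weighted components b_i * w_i.
S <- outer(b, b) * R

# Sum of the variances plus two times the sum of the covariances.
var_sum <- sum(diag(S)) + 2 * sum(S[upper.tri(S)])

# The same quantity written as a quadratic form, b' R b.
var_mat <- drop(t(b) %*% R %*% b)

# The square root of this variance is the sheaf coefficient.
c(var_sum = var_sum, var_mat = var_mat, sheaf = sqrt(var_sum))
```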

--Ed Rigdon

 


--

Sofie VR

Aug 17, 2014, 10:01:22 AM
to lav...@googlegroups.com
Dear all, 

Does anybody know whether the same estimators can be used in models with and without a formative construct?

Thank you for your answer and help, 

Kind regards, 

Sofie  

Edward Rigdon

Aug 17, 2014, 9:41:59 PM
to lav...@googlegroups.com
Sofie--
     I am not quite sure that I understand your question.  It seems to me there are factor-based techniques like factor-based SEM and composite-based techniques like partial least squares (PLS) path modeling or generalized structured component analysis (GSCA).  You can't estimate models of composites in a factor-based technique and vice versa.  Theo Dijkstra has blurred the lines a bit with his recent innovations, starting with PLS results and then transforming them to factor-model results.
     If you literally meant "estimators," though, then the answer is yes.  You can use least squares methods for both factor models and composite models.  But you won't find one package that estimates both kinds of models.
--Ed Rigdon


TRM

Aug 18, 2014, 12:13:29 PM
to lav...@googlegroups.com
Following the steps in Heise's paper, I: 

1. Did a standardized regression of my dependent variable (x9 ("z" in Heise)) on all my measured variables (x1 to x7 for the formative variable ("w1, w2" in Heise) and another variable x8 ("y" in Heise))

2. Derived the sheaf coefficient using the method in step 2 using the st. reg. coeff.'s for x1-x7 from above. I found the observed correlation between each pair of variables using cor.test()

3. Using the sheaf coefficient, found the a's for x1-x7 as Heise does in step 3.

4. Inserted the a's into the regression for the formative variable, y1, like this: 'y1 <~ (a1*x1) + (a2*x2) + (a3*x3) + (a4*x4)...etc' as I showed in my above post with numbers.

5. Then in the regression for the dependent variable:' x9 ~ y1 + x8' I added the st. reg. coeff. I found in "1." to get 'x9 ~ y1 + 0.59*x8'

Were these the proper steps? I was unsure of whether to add the coeff. in the last step.

Thanks,

TRM