Order in which manifest variables are stated changes regression results


Joseph Watson

Feb 20, 2019, 3:50:48 PM
to lavaan

Dear all,

In attempting to use SEM as part of an impact evaluation process, I found that my model gives differing results (for unstandardised regression estimates and p-values) depending on the position in which manifest variables are stated. (Std.all and std.lv results remain constant.)


A previous forum post (https://groups.google.com/forum/#!topic/lavaan/aI6dXkbStaQ) states that "You should use unstandardized coefficients for making inferences about a null hypothesis, and use standardized coefficients as a standardized measure of effect size".


So, since inferences about a null hypothesis rest on the unstandardized coefficients (and their associated p-values), and these depend on the order in which the manifest variables are stated within lavaan, is there a rule for the order in which they should be stated (e.g., should the manifest variable with the highest Std.all loading on its latent variable come first)?


Thank you so much again, and huge apologies if I have missed this in lavaan help.  


Edward Rigdon

Feb 20, 2019, 4:10:42 PM
to lav...@googlegroups.com
Details would help. Can you share some output?
--Ed Rigdon


Joseph Watson

Feb 21, 2019, 8:51:40 AM
to lavaan

Of course -

Evidence of the problem and my model specification are provided below:


'exp' manifest variables ordered one way (1)


'exp' manifest variables ordered another way (2)


This shows how changing the order of the manifest variables within the latent variable 'exp' alters the unstandardised regression coefficients and (more importantly) the p-values. Std.all remains the same.

The first image was produced using the following model:


maths.pri_model <- 'maths =~ h1706_sub_binary + h1706_add_binary + h1705

exp_c =~ pri_Child_Char_Rec6 + pri_Child_Char_Rec5 + pri_Child_Char_Rec4

exp =~ pri_Child_Char_Rec3 + pri_Child_Char_Rec2 + pri_Child_Char_Rec8 + pri_Child_Char_Rec9 + pri_Child_Char_Rec1 + h1708_binary

maths ~ exp + exp_c + age + asset_wealth_index + mother_attended_school + enr_binary'


maths.prifit <- cfa(model = maths.pri_model, data = Uwezo6plus_primary, ordered = c("h1705", "h1706_add_binary", "h1706_sub_binary", "h1708_binary"))


summary(maths.prifit, standardized = T, fit.measures = T)


Thanks again

Edward Rigdon

Feb 21, 2019, 10:21:47 AM
to lav...@googlegroups.com
     Thank you--that is very helpful.
     First, if you just want a fix, you have multiple choices. One, you could standardize the common factor, here exp, via the option std.lv = TRUE. Then the order of the observed variables would be irrelevant.
     If you don't want to fix the factor variance, then you might try "effects coding," where you constrain the average of the loadings to be 1, without constraining any particular loading. For a given factor f, put a label on the loading for each observed variable ov:
f =~ a1 * ov1 + a2 * ov2 + a3 * ov3
for example, and then impose a constraint:
a3 == 3 - a1 - a2
where the number is the number of observed variables loading on the factor.
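Both fixes can be tried side by side. The following is a minimal sketch, not the poster's model: it assumes lavaan's built-in HolzingerSwineford1939 data as a stand-in, with the factors `visual` and `textual` playing the roles of exp and maths, and `slope` is a hypothetical helper name. Because lavaan fixes the first loading to 1 by default, the effects-coding model frees it explicitly with `NA*` before applying the constraint.

```r
library(lavaan)

# Stand-in data: lavaan's built-in HolzingerSwineford1939 (not the poster's data).
dat <- HolzingerSwineford1939

# Same model twice; only the indicator order of `visual` differs,
# so the default marker variable is x1 in one and x3 in the other.
m_x1_first <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  textual ~ visual
'
m_x3_first <- '
  visual  =~ x3 + x2 + x1
  textual =~ x4 + x5 + x6
  textual ~ visual
'

# Helper (hypothetical name): pull the structural slope from a fitted model.
slope <- function(fit) {
  pe <- parameterEstimates(fit)
  pe[pe$op == "~", c("lhs", "rhs", "est", "se", "pvalue")]
}

# Default marker scaling: the unstandardized slope depends on the order.
slope(sem(m_x1_first, data = dat))
slope(sem(m_x3_first, data = dat))

# Option 1: fix the factor variances to 1 instead of fixing a loading.
fit_std_a <- sem(m_x1_first, data = dat, std.lv = TRUE)
fit_std_b <- sem(m_x3_first, data = dat, std.lv = TRUE)
slope(fit_std_a)
slope(fit_std_b)

# Option 2: effects coding for `visual`. NA* frees the first loading,
# and the constraint forces the three loadings to average 1.
m_effects <- '
  visual  =~ NA*x1 + a1*x1 + a2*x2 + a3*x3
  a3 == 3 - a1 - a2
  textual =~ x4 + x5 + x6
  textual ~ visual
'
fit_eff <- sem(m_effects, data = dat)
pe_eff <- parameterEstimates(fit_eff)
sum_loadings <- sum(pe_eff$est[pe_eff$op == "=~" & pe_eff$lhs == "visual"])
sum_loadings
```

Under either option the scale of the factor no longer hinges on a single indicator, so reordering the indicators should leave the structural slope unchanged up to numerical tolerance.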
      However, you have a particular problem. Notice that, in your first output, where the "R3" variable is the marker variable, with its loading fixed to 1 by default, none of the other loadings are statistically significant (except for the "binary" variable), but when "R1" is the marker variable, other loadings *are* statistically significant. This suggests that R3 does not load strongly on the factor, and that using it as marker variable is a bad idea. Notice that R3's loading in the second case is nonsignificant. So your set of variables does not really conform to the factor model you are specifying. That is your real problem, and you need to address it before moving forward.
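Why a weak marker hurts can be seen from a short derivation. This is a sketch under stated assumptions, with notation introduced here rather than taken from the output: indicator $j$ is the original marker ($\lambda_j = 1$), $\psi$ is the factor variance, $\beta$ is the slope of maths on exp, and $\lambda_k > 0$ is the loading of the new marker $k$ in the original parameterization. The standard errors use a first-order (delta-method) approximation.

```latex
% Switching the marker from j to k rescales the factor: \eta^{*} = \lambda_k \eta
\[
\begin{aligned}
\lambda_i^{*} &= \frac{\lambda_i}{\lambda_k}, \qquad
\psi^{*} = \lambda_k^{2}\,\psi, \qquad
\beta^{*} = \frac{\beta}{\lambda_k}, \\[4pt]
% The standardized slope is unchanged (\sigma_y is the SD of the outcome):
\frac{\beta^{*}\sqrt{\psi^{*}}}{\sigma_y}
  &= \frac{(\beta/\lambda_k)\,\lambda_k\sqrt{\psi}}{\sigma_y}
   = \frac{\beta\sqrt{\psi}}{\sigma_y}, \\[4pt]
% The sampling variance of the rescaled slope inherits the uncertainty in \lambda_k:
\operatorname{Var}(\hat\beta^{*})
  &\approx \frac{\operatorname{Var}(\hat\beta)}{\lambda_k^{2}}
   + \frac{\beta^{2}\operatorname{Var}(\hat\lambda_k)}{\lambda_k^{4}}
   - \frac{2\beta\operatorname{Cov}(\hat\beta,\hat\lambda_k)}{\lambda_k^{3}}.
\end{aligned}
\]
```

The model-implied moments, and hence the fit and the standardized solution, are the same under both markers, which matches the constant Std.all column. The Wald z-statistic $\hat\beta^{*}/\mathrm{se}(\hat\beta^{*})$ is not invariant, though: when $\lambda_k$ is small relative to its own standard error, the middle term inflates the variance and the p-value for the slope grows.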
Ed Rigdon

Joseph Watson

Feb 25, 2019, 7:39:02 AM
to lavaan
Dear Ed Rigdon,

Thank you so much for this detailed reply. This is really helpful. 

Apologies for overlooking something as simple as the loading values and their p-values. (I had checked these were OK when examining the latent variable 'exp' in isolation from the whole model, but then failed to look at them again within the larger model that you have just commented upon!)

Thanks to your advice, I ended up 'simplifying' my model to get something that makes sense in lavaan! 

(If of any interest, I have kept my "dependent variable", 'maths', as a latent variable calculated by lavaan, but replaced the latent variables of 'exp' and 'exp_c' with ability estimates created first through IRT that are then treated as observed measures within the lavaan model. This is not quite as good as getting everything done in lavaan, but it functions!)

Thank you again,
