Hello, I hope all is well with you. I fitted a one-factor two-group model with different identification methods (fixing the factor variance, UVI, vs. fixing a factor loading, ULI) using blavaan through Stan. I found that DIC performed quite differently depending on the use of UVI or ULI when the sample size was small. In the output below (based on n = 100 per group):

(1) The logl of the configural models (fit1) differed quite a lot between the two constraints, but it was similar for the metric models (fit2).

(2) When doing model comparison, DIC would select the metric model when using UVI, but the configural model when using ULI.

In addition, in a simulation study comparing configural and metric models, when using ULI with small sample sizes, DIC had unusually high false positive rates and also much higher true positive rates than the other fit measures. This was not observed with UVI, when the sample size was large, or when comparing metric and scalar models.

I am not sure why the identification method (UVI vs. ULI) makes such a big difference. It seems to only affect DIC with small sample sizes. Do you have any clues as to why this happened? Thank you so much!
(1) Fix factor variance to 1 (UVI)
model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12
f~~1*f'
"Lucida Console";color:blue">> fit1 (configural)color:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in"> npar logl ppp bic dic p_dic waic p_waic se_waic looic p_loo se_loo margloglikcolor:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in"> 72.000 -2772.953 0.947 5926.661 5687.456 70.775 5686.784 67.496 60.379 5687.270 67.740 60.417 -3093.320
"Lucida Console";color:blue">> fit2 (metric)color:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in"> npar logl ppp bic dic p_dic waic p_waic se_waic looic p_loo se_loo margloglikcolor:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in"> 60.000 -2777.333 0.968 5871.962 5672.741 59.037 5672.749 57.057 61.082 5672.982 57.173 61.082 -3045.330
(2) Fix factor loading to .7 (population value) (ULI)
model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+.7*x12
f~~c(NA,NA)*f
f~c(NA,NA)*1
x12~ 0*1'
--"Lucida Console";color:blue">> fit1 (configural)color:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in">npar logl ppp bic dic p_dic waic p_waic se_waic looic p_loo se_loo margloglikcolor:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in">72.000 -2790.510 0.854 5961.775 5672.763 45.872 5701.733 71.920 61.929 5702.018 72.063 61.931 -3080.562
"Lucida Console";color:blue">> fit2 (metric)color:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in">npar logl ppp bic dic p_dic waic p_waic se_waic looic p_loo se_loo margloglikcolor:black;border:none windowtext 1.0pt;mso-border-alt:none windowtext 0in;padding:0in">61.000 -2777.232 0.963 5877.048 5673.033 59.285 5673.559 57.809 61.307 5673.786 57.923 61.310 -3039.682
-Xinya
Thank you for both of your replies! That is very helpful! We have two additional questions.
1. For the SE of LOO or WAIC difference, since the distribution of the difference is not known, could you provide some references on how the SE can be used with these fit measures?
2. We played around a little bit. When ULI was used, the model (two-group CFA) seemed to take much longer to converge than when using UVI. Is this something you have observed?
Also, the group 2 estimates look odd and did not seem to match the trace plots (though the group 1 estimates matched the plots perfectly). Please see the code and snapshots below.
Using this code and selecting only replications with rhat < 1.05 (the screen is sketched after the code below), DIC still shows extremely high false positives with small sample sizes (compared with WAIC and LOO). Is there anything we missed? Thank you so much for your time!
Code for ULI (n/group = 100):
model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+.7*x12
f~~c(NA,NA)*f
f~c(NA,NA)*1
x12~ 0*1'
run1 <- blavaan::bcfa(model, data = dat, group = "grp", burnin=1500, sample=3000, bcontrol=list(cores=4), n.chains = 2)
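The per-replication convergence screen was along these lines (a rough sketch; blavInspect's "psrf" option returns the Rhat values shown in the summary):

max_rhat <- max(blavaan::blavInspect(run1, "psrf"), na.rm = TRUE)   # largest Rhat in this replication
keep_replication <- max_rhat < 1.05                                 # keep only converged replications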
Results for loading estimates:
Group 1 []:
Latent Variables:
Estimate Post.SD pi.lower pi.upper Rhat neff Prior
f =~
x1 0.719 0.119 0.521 0.994 1.000 1404.159 normal(0,10)
x2 0.816 0.139 0.584 1.131 1.000 1618.082 normal(0,10)
x3 0.722 0.125 0.512 0.998 1.000 1440.710 normal(0,10)
x4 0.705 0.123 0.495 0.985 1.000 1444.321 normal(0,10)
x5 0.878 0.140 0.648 1.201 1.000 1259.758 normal(0,10)
x6 0.644 0.121 0.439 0.919 1.000 1524.819 normal(0,10)
x7 0.841 0.134 0.621 1.148 1.000 1257.357 normal(0,10)
x8 0.746 0.129 0.533 1.039 1.000 1419.196 normal(0,10)
x9 0.732 0.122 0.534 1.016 1.000 1290.344 normal(0,10)
x10 0.849 0.131 0.638 1.155 1.000 1326.228 normal(0,10)
x11 0.859 0.136 0.642 1.174 1.000 1152.873 normal(0,10)
x12 0.700 NA NA
Group 2 []:
Latent Variables:
Estimate Post.SD pi.lower pi.upper Rhat neff Prior
f =~
x1 1.931 1.890 0.581 7.281 1.051 63.299 normal(0,10)
x2 2.484 2.424 0.778 9.282 1.052 62.666 normal(0,10)
x3 2.852 2.769 0.918 10.585 1.053 61.475 normal(0,10)
x4 2.647 2.558 0.839 9.614 1.052 61.942 normal(0,10)
x5 2.061 1.993 0.63 7.513 1.052 62.991 normal(0,10)
x6 2.229 2.167 0.702 8.162 1.052 62.770 normal(0,10)
x7 2.209 2.157 0.701 8.304 1.053 61.455 normal(0,10)
x8 2.071 1.999 0.634 7.608 1.050 63.565 normal(0,10)
x9 2.702 2.615 0.859 10.145 1.052 62.297 normal(0,10)
x10 2.463 2.373 0.793 9.077 1.053 61.873 normal(0,10)
x11 2.923 2.794 0.957 10.503 1.052 61.895 normal(0,10)
x12 0.700 NA NA
[Group 1 trace plots (images) omitted]
2. We played around a little bit. When ULI was used, the model (two-group CFA) seemed to take much longer to converge than when using UVI. Is this something you have observed?
model <- 'f =~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12'
run1 <- blavaan::bcfa(model, data = dat, group = "grp", burnin = 1500, sample = 3000,
                      bcontrol = list(cores = 4), n.chains = 2, std.lv = FALSE)
By adding std.lv = FALSE, blavaan will automatically use the first indicator as the marker (its loading is fixed to 1); if you want to use x12, you can just list it first. My guess about the time difference is that setting the constraints yourself in the syntax makes the generated Stan code more complicated to run. For efficiency, the recommendation is to use the std.lv argument.
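For example, something like this (a sketch reusing the data and settings from your call; with std.lv = FALSE the loading of the first listed indicator, here x12, is fixed to 1 automatically):

model2 <- 'f =~ x12+x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11'
run2 <- blavaan::bcfa(model2, data = dat, group = "grp", burnin = 1500, sample = 3000,
                      bcontrol = list(cores = 4), n.chains = 2, std.lv = FALSE)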
I don't have an explanation for the discrepancy between the estimates and the trace plots.
The current general recommendation for convergence is for all Rhat values to be lower than 1.01.
For simulations I usually run the model in a while loop, increasing the burn-in until all Rhats < 1.01, but always keeping the sample argument constant so that the posterior distributions have the same number of draws across iterations. For example:
library(blavaan)
data(HolzingerSwineford1939, package = "lavaan")   # example data from lavaan

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

BURN <- 3000
rhat <- 20   # start above the threshold so the loop runs at least once

while (rhat > 1.01) {
  BURN <- BURN + 2000   # increase burn-in by 2000 iterations every time
  fit <- bcfa(HS.model, std.lv = TRUE,
              data = HolzingerSwineford1939,
              n.chains = 3, burnin = BURN,
              sample = 1000, target = "stan")
  rhat <- max(blavInspect(fit, "psrf"), na.rm = TRUE)   # largest Rhat across parameters
  print(paste0("Rhat = ", rhat))
}
1. For the SE of LOO or WAIC difference, since the distribution of the difference is not known, could you provide some references on how the SE can be used with these fit measures?
From the development of LOO and WAIC, we know that the values themselves have no known distribution, but the difference between two models' values is approximately normal. One issue that has been raised is that the SE of the difference is most likely optimistic (smaller than it really is). So the recommendation for claiming that one model has consistently better out-of-sample predictive accuracy is that the ratio LOO_diff/SE be at least higher than 2, but preferably higher than 4 (see, e.g., Vehtari, Gelman, & Gabry, 2017, on practical Bayesian model evaluation using LOO and WAIC).
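As a rough sketch of how that ratio is used in practice (assuming loo1 and loo2 are loo objects for the two models, e.g., built from each model's pointwise log-likelihoods with the loo package):

library(loo)
cmp <- loo_compare(loo1, loo2)                        # elpd difference and its SE
ratio <- abs(cmp[2, "elpd_diff"]) / cmp[2, "se_diff"]
if (ratio < 2) {
  message("functionally equivalent predictive accuracy")
} else if (ratio < 4) {
  message("weak evidence of a difference in predictive accuracy")
} else {
  message("clear evidence of a difference in predictive accuracy")
}

(LOOIC is -2 times the elpd, so the ratio comes out the same on either scale.)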
Using this code and selecting only replications with rhat < 1.05, DIC still shows extremely high false positives with small sample sizes (compared with WAIC and LOO). Is there anything we missed?
I haven't seen that particular issue with DIC, but DIC does have many known issues; for example, it treats any difference as an indication of a "better" model. So it doesn't surprise me that it fails there.
From a purely Bayesian inference view, model selection is more complicated than in the frequentist view. The idea of a false positive can be controversial, since the goal is rarely to recover the true model. Rather, these indices help us identify the model with the best out-of-sample predictive accuracy, and in many scenarios the "wrong" model can have better accuracy.
The general rule of simply selecting the model with the lower DIC is overly simple, and it kind of borrows from the frequentist view. An interesting scenario with LOO and WAIC is when you find that two models have functionally the same predictive accuracy (a ratio LOO_diff/SE = 0.85, for example). Under the general frequentist recommendation you would choose the smaller model (Occam's razor), but from a Bayesian view you don't have to choose the smaller model; you can choose either, whichever is more theoretically relevant.
Hope some of this helps