DIC


Xinya Liang

Jul 6, 2020, 6:43:52 PM
to blavaan
Hello,

I hope all is well with you.

I fitted a one-factor two-group model with different identification methods (fixing the factor variance, UVI, vs. fixing a factor loading, ULI) using blavaan through Stan. I found that DIC performed quite differently depending on whether UVI or ULI was used when the sample size was small. In the output below (based on n=100 per group): (1) The logl in the configural models differed quite a lot between the two constraints, but it was similar in the metric models. (2) In model comparison, DIC selected the metric model under UVI but the configural model under ULI. In addition, in a simulation study comparing configural and metric models, DIC under ULI with small sample sizes had unusually high false positive rates and also much higher true positive rates than the other fit measures. This was not observed with UVI, with large sample sizes, or when comparing metric and scalar models.

I am not sure why the identification (UVI vs. ULI) makes such a big difference. This seems to only affect DIC with small sample sizes. Do you have any clues why this happened? Thank you so much!


(1) Fix factor variance to 1 (UVI)

 

model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12

                f~~1*f'

 

> fit1 (configural)
 npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
 72.000  -2772.953      0.947   5926.661   5687.456     70.775   5686.784     67.496     60.379   5687.270     67.740     60.417  -3093.320 

 

> fit2 (metric)
 npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
 60.000  -2777.333      0.968   5871.962   5672.741     59.037   5672.749     57.057     61.082   5672.982     57.173     61.082  -3045.330

 

 

(2) Fix factor loading to .7 (population value) (ULI)

 

model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+.7*x12

            f~~c(NA,NA)*f

            f~c(NA,NA)*1

            x12~ 0*1'

 

> fit1 (configural)
npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
72.000  -2790.510      0.854   5961.775   5672.763     45.872   5701.733     71.920     61.929   5702.018     72.063     61.931  -3080.562 

 

> fit2 (metric)
npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
61.000  -2777.232      0.963   5877.048   5673.033     59.285   5673.559     57.809     61.307   5673.786     57.923     61.310  -3039.682

 


 -Xinya

Ed Merkle

Jul 7, 2020, 12:16:47 AM
to Xinya Liang, blavaan
Xinya,

I believe that the std.lv argument is causing the difference here. You are using the model syntax to try to set UVI vs ULI yourself, but all the model estimation commands (blavaan/bcfa/bsem/bgrowth) have an std.lv argument that is set to FALSE by default. When std.lv=FALSE, you get ULI and the first loading is automatically set to 1. When you set std.lv=TRUE, you get UVI. If you use blavaan(), you might get more flexibility here by using extra arguments that are described at:

?lavaan::lavOptions

I recommend always using the std.lv argument for UVI (instead of adding f ~~ 1*f to the model syntax) because blavaan has to do some extra things to ensure model convergence; those extra things are triggered by std.lv=TRUE. For the ULI model, you want to make sure that the first loading is not being fixed to 1, in addition to the loading that you fixed to .7.
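For example, a UVI fit could be requested through the argument rather than the syntax (a sketch only; the data frame `dat` and grouping variable `grp` are assumed from your post):

```r
library(blavaan)

# UVI via the std.lv argument: blavaan fixes the factor variance to 1
# itself, so no f ~~ 1*f line or NA* on the first loading is needed
model <- 'f =~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12'

fit_uvi <- bcfa(model, data = dat, group = "grp",
                std.lv = TRUE, target = "stan")
```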

Best,
Ed



Mauricio Garnier-Villarreal

Jul 7, 2020, 4:30:41 PM
to blavaan
Xinya

On another note, it is not recommended to compare models based on DIC; it has shown poor performance and it's not really a Bayesian information criterion. It is better to do the comparisons based on LOO or WAIC. The comparisons based on them have an SE for the difference, which helps identify the magnitude of the difference.

You can do this with the blavCompare() function
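For instance (a sketch; `fit1` and `fit2` stand for two fitted blavaan models, such as your configural and metric fits):

```r
library(blavaan)

# Compare two fitted blavaan models; the printed output includes
# WAIC and LOO for each model along with the difference and its SE
comp <- blavCompare(fit1, fit2)
comp
```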

Xinya Liang

Jul 12, 2020, 2:31:34 PM
to blavaan

Thank you for both of your replies! That is very helpful! We have two additional questions.

 

1. For the SE of LOO or WAIC difference, since the distribution of the difference is not known, could you provide some references on how the SE can be used with these fit measures?

 

2. We played around a little bit. When ULI was used, the model (two-group CFA) seems to take much longer to converge (than using UVI). Is this something you observed?  

 

Also, the group 2 estimates look odd and did not seem to match the trace plots (though group 1 estimates matched the plots perfectly). Please see the code and snapshots below.

 

Using this code and selecting only replications with rhat < 1.05, DIC still shows extremely high false positive rates with small sample sizes (compared to WAIC and LOO). Is there anything we missed? Thank you so much for your time!

 

Code for ULI (n/group = 100):

 

model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+.7*x12

      f~~c(NA,NA)*f

      f~c(NA,NA)*1

      x12~ 0*1'

 

run1 <- blavaan::bcfa(model, data = dat, group = "grp", burnin=1500, sample=3000, bcontrol=list(cores=4), n.chains = 2)

 

Results for loading estimates:


Group 1 []:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat     neff    Prior      
  f =~                                                                                  
    x1                0.719    0.119    0.521    0.994    1.000 1404.159    normal(0,10)
    x2                0.816    0.139    0.584    1.131    1.000 1618.082    normal(0,10)
    x3                0.722    0.125    0.512    0.998    1.000 1440.710    normal(0,10)
    x4                0.705    0.123    0.495    0.985    1.000 1444.321    normal(0,10)
    x5                0.878    0.140    0.648    1.201    1.000 1259.758    normal(0,10)
    x6                0.644    0.121    0.439    0.919    1.000 1524.819    normal(0,10)
    x7                0.841    0.134    0.621    1.148    1.000 1257.357    normal(0,10)
    x8                0.746    0.129    0.533    1.039    1.000 1419.196    normal(0,10)
    x9                0.732    0.122    0.534    1.016    1.000 1290.344    normal(0,10)
    x10               0.849    0.131    0.638    1.155    1.000 1326.228    normal(0,10)
    x11               0.859    0.136    0.642    1.174    1.000 1152.873    normal(0,10)
    x12               0.700                                  NA       NA               

 

Group 2 []:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat     neff    Prior      
  f =~                                                                                 
    x1                1.931    1.890    0.581    7.281    1.051   63.299    normal(0,10)
    x2                2.484    2.424    0.778    9.282    1.052   62.666    normal(0,10)
    x3                2.852    2.769    0.918   10.585    1.053   61.475    normal(0,10)
    x4                2.647    2.558    0.839    9.614    1.052   61.942    normal(0,10)
    x5                2.061    1.993    0.630    7.513    1.052   62.991    normal(0,10)
    x6                2.229    2.167    0.702    8.162    1.052   62.770    normal(0,10)
    x7                2.209    2.157    0.701    8.304    1.053   61.455    normal(0,10)
    x8                2.071    1.999    0.634    7.608    1.050   63.565    normal(0,10)
    x9                2.702    2.615    0.859   10.145    1.052   62.297    normal(0,10)
    x10               2.463    2.373    0.793    9.077    1.053   61.873    normal(0,10)
    x11               2.923    2.794    0.957   10.503    1.052   61.895    normal(0,10)
    x12               0.700                                  NA       NA               
 

[Group 1 trace plots: Group1 loading.png]

[Group 2 trace plots: Group2 loading.png]

Best,
Xinya



Mauricio Garnier-Villarreal

Jul 12, 2020, 11:55:50 PM
to blavaan
Xinya

2. We played around a little bit. When ULI was used, the model (two-group CFA) seems to take much longer to converge (than using UVI). Is this something you observed?  

It seems odd to set a factor loading to 0.7 for identification; I haven't seen that before.

You can streamline this by simplifying the code

model <- 'f =~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12'

 run1 <- blavaan::bcfa(model, data = dat, group = "grp", burnin=1500, sample=3000, bcontrol=list(cores=4), n.chains = 2, std.lv=F)

 By adding std.lv=F it will automatically fix the first indicator's loading to 1; if you want to use x12, you can just put it first. My guess about the time difference is that you are setting the constraints on your own in the syntax, which makes the generated Stan code more complicated to run. For efficiency, the recommendation is to use the std.lv argument.

I have no notion on the discrepancy for the trace plots

The current general recommendation is for all Rhats to be lower than 1.01 for convergence.

For simulations I usually run the model in a while loop, increasing the burnin until all Rhats < 1.01, but always keeping the sample argument constant so that the posterior distributions have the same number of draws across iterations.



HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

BURN <- 3000
rhat <- 20  # initialize above the threshold so the loop runs at least once
while (rhat > 1.01) {

  BURN <- BURN + 2000  # increase burn-in by 2000 iterations each time

  fit <- bcfa(HS.model, std.lv = TRUE,
              data = HolzingerSwineford1939,
              n.chains = 3, burnin = BURN,
              sample = 1000, target = "stan")
  rhat <- max(blavInspect(fit, "psrf"), na.rm = TRUE)
  print(paste0("Rhat=", rhat))
}




1. For the SE of LOO or WAIC difference, since the distribution of the difference is not known, could you provide some references on how the SE can be used with these fit measures?

From the development of LOO and WAIC, we know that each by itself has no known distribution, but the difference between models does approximately follow a normal distribution. The one issue brought up is that the SE of the difference is most likely optimistic (smaller than it really is). So the recommendation for saying that one model presents consistently better out-of-sample predictive accuracy is for the ratio LOO_diff/SE to be at least higher than 2, BUT PREFERABLY higher than 4.
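As a sketch of that decision rule (the numbers below are made up for illustration; in practice they would come from blavCompare output):

```r
# hypothetical difference in LOO (elpd scale) and its standard error
loo_diff <- 14.4
se_diff  <- 3.2

ratio <- abs(loo_diff) / se_diff

# ratio > 2: evidence of better out-of-sample predictive accuracy;
# ratio > 4: preferred, since the SE tends to be optimistic
ratio > 2
ratio > 4
```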

 

Using this code and selecting only replications with rhat < 1.05, DIC still shows extremely high false positives with small sample size (comparing to WAIC and LOO). Is there anything we missed?

I haven't seen that particular issue with DIC, but it does have many issues, e.g., it treats any difference as an indication of a "better" model. So it doesn't surprise me that it fails there.

From a pure Bayesian inference view, the issue of model selection is more complicated than in the frequentist view. So the idea of a false positive can be controversial, since we rarely assume that the true model is among the candidates. But the indices help us identify the model with the best out-of-sample predictive accuracy, and in many scenarios the wrong model can have better accuracy.

The general rule of simply selecting the model with the lowest DIC is overly simple and kind of borrows the frequentist view. An interesting scenario with LOO and WAIC is when you find that two models have functionally the same predictive accuracy (ratio LOO_diff/SE = 0.85, for example). From a frequentist general recommendation you would choose the smaller model (Occam's razor), but from a Bayesian view you don't have to choose the smaller model; you can choose either, whichever is more theoretically relevant.



Hope some of this helps


Xinya Liang

Jul 14, 2020, 6:51:12 PM
to blavaan
Dear Mauricio,

Thank you so much for your quick response. This is very helpful. I fixed one factor loading to .7 because it is the population value in the simulation. I tried fixing the loading to 1 using the code you provided, and the fit measures are very similar to those obtained by setting one loading to .7. Thank you again for the help!

Best,
Xinya


Mauricio Garnier-Villarreal

Jul 15, 2020, 5:15:55 PM
to blavaan
Xinya

I wouldn't expect the fit measures to change whether you set the metric with 0.7 or with 1; just note that the automatic method in blavaan should be faster.
The same goes for ULI vs. UVI: there shouldn't be a big difference. In ML you would see exactly the same fit measures; in blavaan you will see some small differences due to the MCMC sampling process, but in general they should be really close to each other.

Also, notice that the blavCompare function will give you the LOO, WAIC, and log-Bayes factor. In Bayesian statistics there are two sides to model comparison: the information-criteria side (LOO, WAIC) and the Bayes factor side. As you might have noticed, I am on the IC side, but blavaan provides both pieces of information.

For the test of LOO and WAIC differences, I would recommend looking at correct model selection rates at several criteria, for example: LOO_diff/SE > 1, > 2, > 3, and > 4.
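That tally could be sketched like this, assuming you have collected the |LOO_diff|/SE ratio from each simulation replication (the ratios below are randomly generated stand-ins, not real results):

```r
# stand-in for |LOO_diff|/SE ratios from, say, 500 replications
set.seed(1)
ratios <- abs(rnorm(500, mean = 2, sd = 1.5))

# proportion of replications exceeding each cutoff
sapply(c(1, 2, 3, 4), function(k) mean(ratios > k))
```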

Hope this helps