DIC


Xinya Liang

Jul 6, 2020, 6:43:52 PM
to blavaan
Hello,

I hope all is well with you.

I fitted a one-factor two-group model with different identification methods (fixing the factor variance, UVI, vs. fixing a factor loading, ULI) using blavaan through Stan. I found that DIC performed quite differently depending on whether UVI or ULI was used when the sample size was small. In the output below (based on n=100 per group): (1) The logl in the configural models differed quite a lot between the two constraints, but it was similar in the metric models. (2) In model comparison, DIC selected the metric model under UVI but the configural model under ULI. In addition, in a simulation study comparing configural and metric models, DIC under ULI with small sample sizes had unusually high false positive rates and also much higher true positive rates than the other fit measures. This was not observed with UVI, with large sample sizes, or when comparing metric and scalar models.

I am not sure why the identification (UVI vs. ULI) makes such a big difference. This seems to only affect DIC with small sample sizes. Do you have any clues why this happened? Thank you so much!


(1) Fix factor variance to 1 (UVI)

 

model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12

                f~~1*f'

 

> fit1 (configural)
 npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
 72.000  -2772.953      0.947   5926.661   5687.456     70.775   5686.784     67.496     60.379   5687.270     67.740     60.417  -3093.320 

 

> fit2 (metric)
 npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
 60.000  -2777.333      0.968   5871.962   5672.741     59.037   5672.749     57.057     61.082   5672.982     57.173     61.082  -3045.330

 

 

(2) Fix factor loading to .7 (population value) (ULI)

 

model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+.7*x12

            f~~c(NA,NA)*f

            f~c(NA,NA)*1

            x12~ 0*1'

 

> fit1 (configural)
npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
72.000  -2790.510      0.854   5961.775   5672.763     45.872   5701.733     71.920     61.929   5702.018     72.063     61.931  -3080.562 

 

> fit2 (metric)
npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic      looic      p_loo     se_loo margloglik 
61.000  -2777.232      0.963   5877.048   5673.033     59.285   5673.559     57.809     61.307   5673.786     57.923     61.310  -3039.682

 


 -Xinya

Ed Merkle

Jul 7, 2020, 12:16:47 AM
to Xinya Liang, blavaan
Xinya,

I believe that the std.lv argument is causing the difference here. You are using the model syntax to try to set UVI vs ULI yourself, but all the model estimation commands (blavaan/bcfa/bsem/bgrowth) have an std.lv argument that is set to FALSE by default. When std.lv=FALSE, you get ULI and the first loading is automatically set to 1. When you set std.lv=TRUE, you get UVI. If you use blavaan(), you might get more flexibility here by using extra arguments that are described at:

?lavaan::lavOptions

I recommend always using the std.lv argument for UVI (instead of adding f ~~ 1*f to the model syntax) because blavaan has to do some extra things to ensure model convergence; those extra things are triggered by std.lv=TRUE. For the ULI model, you want to make sure that the first loading is not being fixed to 1, in addition to the loading that you fixed to .7.
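For example, a UVI fit could be requested through the argument rather than the syntax (a sketch only; the data frame `dat` and grouping variable `grp` are assumed from your post):

```r
library(blavaan)

# UVI via the std.lv argument: blavaan fixes the factor variance to 1
# itself, so no f ~~ 1*f line or NA* on the first loading is needed
model <- 'f =~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12'

fit_uvi <- bcfa(model, data = dat, group = "grp",
                std.lv = TRUE, target = "stan")
```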

Best,
Ed



Mauricio Garnier-Villarreal

Jul 7, 2020, 4:30:41 PM
to blavaan
Xinya

On another note, it is not recommended to compare models based on DIC; it has shown poor performance and it's not really a Bayesian information criterion. It is better to do the comparisons based on LOO or WAIC. The comparisons based on them have an SE for the difference, which helps identify the magnitude of the difference.

You can do this with the blavCompare() function
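For instance (a sketch; `fit1` and `fit2` stand for two fitted blavaan models, such as your configural and metric fits):

```r
library(blavaan)

# Compare two fitted blavaan models; the printed output includes
# WAIC and LOO for each model along with the difference and its SE
comp <- blavCompare(fit1, fit2)
comp
```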

Xinya Liang

Jul 12, 2020, 2:31:34 PM
to blavaan

Thank you for both of your replies! That is very helpful! We have two additional questions.

 

1. For the SE of LOO or WAIC difference, since the distribution of the difference is not known, could you provide some references on how the SE can be used with these fit measures?

 

2. We played around a little bit. When ULI was used, the model (two-group CFA) seems to take much longer to converge (than using UVI). Is this something you observed?  

 

Also, the group 2 estimates look odd and did not seem to match the trace plots (though group 1 estimates matched the plots perfectly). Please see the code and snapshots below.

 

Using this code and selecting only replications with rhat < 1.05, DIC still shows extremely high false positive rates with small sample sizes (compared to WAIC and LOO). Is there anything we missed? Thank you so much for your time!

 

Code for ULI (n/group = 100):

 

model <- 'f =~ NA*x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+.7*x12

      f~~c(NA,NA)*f

      f~c(NA,NA)*1

      x12~ 0*1'

 

run1 <- blavaan::bcfa(model, data = dat, group = "grp", burnin=1500, sample=3000, bcontrol=list(cores=4), n.chains = 2)

 

Results for loading estimates:


Group 1 []:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat     neff    Prior      
  f =~                                                                                  
    x1                0.719    0.119    0.521    0.994    1.000 1404.159    normal(0,10)
    x2                0.816    0.139    0.584    1.131    1.000 1618.082    normal(0,10)
    x3                0.722    0.125    0.512    0.998    1.000 1440.710    normal(0,10)
    x4                0.705    0.123    0.495    0.985    1.000 1444.321    normal(0,10)
    x5                0.878    0.140    0.648    1.201    1.000 1259.758    normal(0,10)
    x6                0.644    0.121    0.439    0.919    1.000 1524.819    normal(0,10)
    x7                0.841    0.134    0.621    1.148    1.000 1257.357    normal(0,10)
    x8                0.746    0.129    0.533    1.039    1.000 1419.196    normal(0,10)
    x9                0.732    0.122    0.534    1.016    1.000 1290.344    normal(0,10)
    x10               0.849    0.131    0.638    1.155    1.000 1326.228    normal(0,10)
    x11               0.859    0.136    0.642    1.174    1.000 1152.873    normal(0,10)
    x12               0.700                                  NA       NA               

 

Group 2 []:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat     neff    Prior      
  f =~                                                                                 
    x1                1.931    1.890    0.581    7.281    1.051   63.299    normal(0,10)
    x2                2.484    2.424    0.778    9.282    1.052   62.666    normal(0,10)
    x3                2.852    2.769    0.918   10.585    1.053   61.475    normal(0,10)
    x4                2.647    2.558    0.839    9.614    1.052   61.942    normal(0,10)
    x5                2.061    1.993    0.630    7.513    1.052   62.991    normal(0,10)
    x6                2.229    2.167    0.702    8.162    1.052   62.770    normal(0,10)
    x7                2.209    2.157    0.701    8.304    1.053   61.455    normal(0,10)
    x8                2.071    1.999    0.634    7.608    1.050   63.565    normal(0,10)
    x9                2.702    2.615    0.859   10.145    1.052   62.297    normal(0,10)
    x10               2.463    2.373    0.793    9.077    1.053   61.873    normal(0,10)
    x11               2.923    2.794    0.957   10.503    1.052   61.895    normal(0,10)
    x12               0.700                                  NA       NA               
 

[Group 1 trace plots: Group1 loading.png]

[Group 2 trace plots: Group2 loading.png]

Best,
Xinya



Mauricio Garnier-Villarreal

Jul 12, 2020, 11:55:50 PM
to blavaan
Xinya

2. We played around a little bit. When ULI was used, the model (two-group CFA) seems to take much longer to converge (than using UVI). Is this something you observed?  

It seems odd to set a factor loading to 0.7 for identification; I haven't seen that before.

You can streamline this by simplifying the code

model <- 'f =~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12'

 run1 <- blavaan::bcfa(model, data = dat, group = "grp", burnin=1500, sample=3000, bcontrol=list(cores=4), n.chains = 2, std.lv=F)

 By adding std.lv=F it will automatically fix the first indicator's loading to 1; if you want to use x12, you can just put it first. My guess about the time difference is that you are setting the constraints on your own in the syntax, which makes the generated Stan code more complicated to run. For efficiency, the recommendation is to use the std.lv argument.

I have no notion on the discrepancy for the trace plots

The current general recommendation is for all Rhats to be lower than 1.01 for convergence.

For simulations I usually run the model in a while loop, increasing the burnin until all Rhats < 1.01, but always keeping the sample argument constant so that the posterior distributions have the same number of draws across iterations.



HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

BURN <- 3000
rhat <- 20  # initialize above the threshold so the loop runs at least once
while (rhat > 1.01) {

  BURN <- BURN + 2000  # increase burn-in by 2000 iterations each time

  fit <- bcfa(HS.model, std.lv = TRUE,
              data = HolzingerSwineford1939,
              n.chains = 3, burnin = BURN,
              sample = 1000, target = "stan")
  rhat <- max(blavInspect(fit, "psrf"), na.rm = TRUE)
  print(paste0("Rhat=", rhat))
}




1. For the SE of LOO or WAIC difference, since the distribution of the difference is not known, could you provide some references on how the SE can be used with these fit measures?

From the development of LOO and WAIC, we know that each by itself has no known distribution, but the difference between models does approximately follow a normal distribution. The one issue brought up is that the SE of the difference is most likely optimistic (smaller than it really is). So the recommendation for saying that one model presents consistently better out-of-sample predictive accuracy is for the ratio LOO_diff/SE to be at least higher than 2, BUT PREFERABLY higher than 4.
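As a sketch of that decision rule (the numbers below are made up for illustration; in practice they would come from blavCompare output):

```r
# hypothetical difference in LOO (elpd scale) and its standard error
loo_diff <- 14.4
se_diff  <- 3.2

ratio <- abs(loo_diff) / se_diff

# ratio > 2: evidence of better out-of-sample predictive accuracy;
# ratio > 4: preferred, since the SE tends to be optimistic
ratio > 2
ratio > 4
```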

 

Using this code and selecting only replications with rhat < 1.05, DIC still shows extremely high false positives with small sample size (comparing to WAIC and LOO). Is there anything we missed?

I haven't seen that particular issue with DIC, but it does have many issues, e.g., it treats any difference as an indication of a "better" model. So it doesn't surprise me that it fails there.

From a pure Bayesian inference view, the issue of model selection is more complicated than in the frequentist view. So the idea of a false positive can be controversial, since we rarely assume that the true model is among the candidates. But the indices help us identify the model with the best out-of-sample predictive accuracy, and in many scenarios the wrong model can have better accuracy.

The general rule of simply selecting the model with the lowest DIC is overly simple and kind of borrows the frequentist view. An interesting scenario with LOO and WAIC is when you find that two models have functionally the same predictive accuracy (ratio LOO_diff/SE = 0.85, for example). From a frequentist general recommendation you would choose the smaller model (Occam's razor), but from a Bayesian view you don't have to choose the smaller model; you can choose either, whichever is more theoretically relevant.



Hope some of this helps


Xinya Liang

Jul 14, 2020, 6:51:12 PM
to blavaan
Dear Mauricio,

Thank you so much for your quick response. This is very helpful. I fixed one factor loading to .7 because it is the population value in the simulation. I tried fixing the loading to 1 using the code you provided, and the fit measures are very similar to those obtained by setting one loading to .7. Thank you again for the help!

Best,
Xinya


Mauricio Garnier-Villarreal

Jul 15, 2020, 5:15:55 PM
to blavaan
Xinya

I wouldn't expect the fit measures to change whether you set the metric with 0.7 or with 1; just note that the automatic method in blavaan should be faster.
The same goes for ULI vs. UVI: there shouldn't be a big difference. In ML you would see exactly the same fit measures; in blavaan you will see some small differences due to the MCMC sampling process, but in general they should be really close to each other.

Also, notice that the blavCompare function will give you the LOO, WAIC, and log-Bayes factor. In Bayesian statistics there are two sides to model comparison: the information-criteria side (LOO, WAIC) and the Bayes factor side. As you might have noticed, I am on the IC side, but blavaan provides both pieces of information.

For the test of LOO and WAIC differences, I would recommend looking at correct model selection rates at several criteria, for example: LOO_diff/SE > 1, > 2, > 3, and > 4.
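That tally could be sketched like this, assuming you have collected the |LOO_diff|/SE ratio from each simulation replication (the ratios below are randomly generated stand-ins, not real results):

```r
# stand-in for |LOO_diff|/SE ratios from, say, 500 replications
set.seed(1)
ratios <- abs(rnorm(500, mean = 2, sd = 1.5))

# proportion of replications exceeding each cutoff
sapply(c(1, 2, 3, 4), function(k) mean(ratios > k))
```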

Hope this helps