Is low Bayesian Fraction of Missing Information cause for concern in a Multivariate Probit model?


Jon Fawcett

Dec 30, 2016, 1:27:12 PM
to Stan users mailing list
I am working on a multivariate probit model to deal with some individual patient data I hope to incorporate into a larger epidemiological meta-analysis. I had been banging my head against a wall for quite some time until I came across Bob’s post (https://groups.google.com/d/msg/stan-users/0qdtaoAl1us/xC4Vh2MDpj4J) and the related model provided in the reference manual, which looks to be precisely what I need. However, I am having trouble getting the code to produce high n_eff. Specifically, when I run the sample code – even if I increase the number of iterations as high as 10000 per chain – I sometimes end up with very low n_eff for several parameters (some in the range of 400) and the following warning message:

2: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See

Given that this is sample code, I thought I would check to see whether the warning is something I should be worrying about/trying to correct, or simply a fact of life when fitting these sorts of models in Stan. From reading this thread (https://groups.google.com/d/msg/stan-users/JDMqS13wi2s/dYR64q20BQAJ) it seems that low n_eff is not necessarily fatal in this model. If so, should I just increase the number of iterations even further (which – besides re-parameterization – seems to be the suggestion in the FAQ)? I am running rstan 2.13.2 (gcc version 4.2.1). I ran the code as posted in the above link, except that I increased the number of iterations and added control=list(adapt_delta=.99, stepsize=.01). In case it matters, my actual application will involve estimating the prevalence of 8 rare (prevalence < 10%) disorders as well as their covariance, but I have not gotten that far yet.

Here is my sessionInfo():

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] foreign_0.8-67      openxlsx_3.0.0      psych_1.6.9         rstan_2.13.2        StanHeaders_2.13.1 
 [6] ggplot2_2.2.0       reshape2_1.4.2      metafor_1.9-9       Matrix_1.2-7.1      dplyr_0.5.0        
[11] SimCorMultRes_1.4.2 MASS_7.3-45        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      magrittr_1.5     mnormt_1.5-5     evd_2.3-2        munsell_0.4.3    colorspace_1.3-2
 [7] lattice_0.20-34  R6_2.2.0         stringr_1.1.0    plyr_1.8.4       tools_3.3.1      parallel_3.3.1  
[13] grid_3.3.1       gtable_0.2.0     DBI_0.5-1        lazyeval_0.2.0   assertthat_0.1   tibble_1.2      
[19] gridExtra_2.2.1  codetools_0.2-15 rsconnect_0.7    inline_0.3.14    stringi_1.1.2    scales_0.4.1    
[25] stats4_3.3.1 

Ben Goodrich

Dec 30, 2016, 2:28:14 PM
to Stan users mailing list
On Friday, December 30, 2016 at 1:27:12 PM UTC-5, Jon Fawcett wrote:
Given that this is sample code, I thought I would check to see whether the warning is something I should be worrying about/trying to correct,

Depends on who you talk to. These latent utility models are really hard for NUTS, although they are probably harder for Gibbs than people realize. I may be able to re-re-parameterize the one in example_models.

Ben

Ben Goodrich

Dec 30, 2016, 7:07:20 PM
to stan-...@googlegroups.com
 
This example

https://github.com/stan-dev/example-models/blob/master/misc/multivariate-probit/probit-multi.stan

also has a low BFMI, which can be seen by executing

options(mc.cores = parallel::detectCores())
bad <- stan_demo("probit-multi")

I implemented another parameterization

https://github.com/stan-dev/example-models/blob/master/misc/multivariate-probit/probit-multi-good.stan

that mixes much better but executes somewhat more slowly:

good <- stan_demo("probit-multi-good", init_r = 0.5)

The idea here is to use error quantiles to absorb the constraints implied by the observed outcomes. If y[i,d] == 1, then mu[d] + (L_Omega * z)[d] has to be positive for observation i, where mu = beta * x[i]; if y[i,d] == 0, it has to be negative. Since L_Omega is lower-triangular, you can recursively infer how big or small the next element of z has to be in order to satisfy the inequality constraint implied by y[i,d]. But you cannot declare z in the parameters block because the Stan language only supports scalar constraints. So, there is a one-to-one transformation where z[d] = inv_Phi(t) and t is some number on (0,1) that satisfies the inequality constraint implied by y. But you cannot declare t in the parameters block either, so there is a further transformation between t and u[i,d] involving the Phi() function, so that real<lower=0,upper=1> u[N,D]; is a valid declaration in the parameters block.
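A minimal numerical sketch of that quantile trick for a single outcome (Python rather than Stan, with a made-up mu) checks that mapping u through the admissible quantile range always satisfies the sign constraint:

```python
from statistics import NormalDist
import random

nd = NormalDist()                 # standard normal
Phi, inv_Phi = nd.cdf, nd.inv_cdf

random.seed(0)
mu = -1.2                         # made-up linear predictor for one outcome
b = Phi(-mu)                      # P(z <= -mu): the truncation point

# y == 1 requires mu + z > 0, i.e. z > -mu.  Map u ~ Uniform(0,1) onto
# the admissible quantile interval (b, 1), then pull back through inv_Phi.
for _ in range(1000):
    u = random.random()
    t = b + (1 - b) * u           # t is uniform on (b, 1)
    z = inv_Phi(t)                # z ~ standard normal truncated to (-mu, inf)
    assert mu + z > -1e-9         # the utility constraint holds (up to fp error)
```

The y == 0 case is symmetric: t = b * u squeezes u into (0, b), so z < -mu.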

This is a case where you have to use Jacobian adjustments, although the direction of the transformation is the opposite of what it usually is because your prior beliefs are that u is standard uniform, t is uniform but not standard uniform, and z is truncated standard normal (with the truncation point chosen to satisfy the constraint on utility implied by the outcome).
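Sketching the y[i,d] == 1 case in one dimension, with b = Phi(-mu) as the truncation point, the chain of transformations and its Jacobian are

```latex
t = b + (1 - b)\,u, \qquad
\left|\frac{\mathrm{d}t}{\mathrm{d}u}\right| = 1 - b, \qquad
z = \Phi^{-1}(t),
```

so log(1 - b) gets added to the target; in the y == 0 case, t = b u and the adjustment is log(b). (Sketched for d = 1; for d > 1 the truncation point also involves the earlier elements of z through L_Omega.)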

Ben

Jon Fawcett

Jan 1, 2017, 1:56:30 PM
to Stan users mailing list
Thanks, Ben!
This is an amazing, belated Christmas gift – and another example of the legendary support offered by this group. I will add you to the list of acknowledgements in the author’s note of the paper once it is ready to go out.

The model seems to do precisely what I need, and it converges much better than before. My particular application is a little different in that I am only trying to estimate the probit-transformed mean of each variable (rather than the influence of predictors), but I believe this to be a simple change: I made mu a parameter representing the mean prevalence for each disorder rather than calculating it from beta * x[n].

The only other wrinkle is the inclusion of missing data. For several patients we have no measurements for some of the disorders (around 12% of the data on hand, with the precise disorders affected varying). My understanding based on the earlier incarnation of this model is that for those trials an unbounded value is imputed instead. As such, I recoded y to take values in {-1, 0, 1}, using -1 for missing data (0's and 1's still mean the same; I did not think to check until just now, but if Stan could handle NA values I might keep them as such rather than artificially coding missing values as -1). I then extended the if-else statement that sorted and appropriately bounded the data in your code to deal with 0's and 1's as before, but with an additional statement dealing with the -1's as follows:

else {
  z[d] = inv_Phi(u[n,d]);
}

This statement ignores the bound variable. In simulations this *seems* to work, but given that I am still learning about the current implementation, I thought I had better check with one of the Oracles.

Ben Goodrich

Jan 1, 2017, 5:09:45 PM
to Stan users mailing list
On Sunday, January 1, 2017 at 1:56:30 PM UTC-5, Jon Fawcett wrote:
The model seems to do precisely what I need, and it converges much better than before. My particular application is a little different in that I am only trying to predict the probit-transformed mean of each variable (rather than estimate the influence of predictors), but I believe this to be a simple change: I made mu into a parameter representing the mean prevalence for each disorder rather than calculating it on the basis of beta * x[n].

That's fine.
 
The only other wrinkle is the inclusion of missing data. For several patients we have no measurements for some of the disorders (around 12% of the data on hand, with the precise disorders affected varying). My understanding based on the earlier incarnation of this model is that for those trials an unbounded value is imputed instead. As such, I recoded y to take values in {-1, 0, 1}, using -1 for missing data (0's and 1's still mean the same; I did not think to check until just now, but if Stan could handle NA values I might keep them as such rather than artificially coding missing values as -1).

The Stan language doesn't have an NA concept, so you either have to code the missing values as a magic number or pass the nonmissing values in as a giant one-dimensional array along with a two-dimensional indicator array marking where the observed values occur.
 
I then extended the if-else statement that sorted and appropriately bounded the data in your code to deal with 0’s and 1’s as before, but with an additional statement dealing with the -1’s as follows:

else {
  z[d] = inv_Phi(u[n,d]);
}

This statement ignores the bound variable. In simulations this *seems* to work, but given that I am still learning about the current implementation, I thought I had better check with one of the Oracles.

That should be fine, since there are no informative bounds on t when the outcome is missing.

Ben
