ADVI status

Alp Kucukelbir

unread,

Nov 30, 2015, 10:09:53 AM11/30/15

to stan development mailing list, David Blei

hi ben (& co.),

just a friendly ping about interfacing ADVI.

NIPS is just around the corner and i'm being asked to submit my slides to the organizers soon.

do let me know how i can help.

hope you all had a nice thanksgiving!

cheers
alp

Ben Goodrich

unread,

Dec 1, 2015, 2:22:52 AM12/1/15

to stan development mailing list, david...@columbia.edu

On Monday, November 30, 2015 at 10:09:53 AM UTC-5, Alp Kucukelbir wrote:

just a friendly ping about interfacing ADVI.
NIPS is just around the corner and i'm being asked to submit my slides to the organizers soon.
do let me know how i can help.

ADVI is re-enabled in the develop branch of rstan but for unrelated reasons we are not currently able to sync up with the develop branch of Stan the library.

Alp Kucukelbir

unread,

Dec 1, 2015, 7:50:07 AM12/1/15

to stan...@googlegroups.com, david...@columbia.edu

roger that. thanks ben! we can discuss more at today's meeting.

cheers

alp

--
You received this message because you are subscribed to a topic in the Google Groups "stan development mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-dev/qkQsG0f8krk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

unread,

Dec 2, 2015, 8:35:40 AM12/2/15

to stan development mailing list, david...@columbia.edu

I have ADVI with adaptation working again in R. Attached are comparisons of posterior means between a couple chains of NUTS, meanfield, and fullrank for 115 non-failing examples in example-models. Many of the failures are unrelated to ADVI but some are things like "stan::variational::advi::adapt_eta: All proposed step-sizes failed. Your model may be either severely ill-conditioned or misspecified." when NUTS produces something, which I guess is OK. For the non-failing examples, in R you could do something like

load("matrices.RData")
lapply(matrices, FUN = function(x) round(head(x), digits = 3))

The posterior mean estimates are similar for more examples than before, although fullrank does not seem to be noticeably better than meanfield in the cases where meanfield is considerably different than NUTS. I'll try some from rstanarm later; my guess is that most regression models will be okay with rstanarm's reparameterizations and initial values heuristics.

Ben

matrices.RData

Jiqiang Guo

unread,

Dec 2, 2015, 7:52:59 PM12/2/15

to stan...@googlegroups.com, David Blei

Ben,

I am having problem with a simple example as in testvb.R with branch feature/adapt_eta. I suppose you don't have this problem. Do you have any idea what I have missed?

> library(rstan)

Loading required package: ggplot2

rstan (Version 2.8.9, packaged: 2015-12-03 00:21:29 UTC, GitRev: df3f4e136e91)

For execution on a local, multicore CPU with excess RAM we recommend calling

rstan_options(auto_write = TRUE)

options(mc.cores = parallel::detectCores())

> # default algorithm

> vbf <- rstan:::vb(sm, data = dat, sample_file = 'vb.csv')

Error in .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: 0x7ff722d2c6e0>, :

negative length vectors are not allowed

Calls: <Anonymous> -> <Anonymous> -> .local -> <Anonymous> -> .External

Execution halted

Jiqiang

--
You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.

Ben Goodrich

unread,

Dec 2, 2015, 10:58:53 PM12/2/15

to stan development mailing list, david...@columbia.edu

On Wednesday, December 2, 2015 at 7:52:59 PM UTC-5, Jiqiang Guo wrote:

I am having problem with a simple example as in testvb.R with branch feature/adapt_eta. I suppose you don't have this problem. Do you have any idea what I have missed?

It compiles for me, although I had to update its arguments just now. Is this clang++?

Jiqiang Guo

unread,

Dec 2, 2015, 11:14:48 PM12/2/15

to stan...@googlegroups.com

Yes, I am using clang++. I will try g++ when I have time.

Jiqiang

Ben Goodrich

unread,

Dec 2, 2015, 11:21:21 PM12/2/15

to stan development mailing list

On Wednesday, December 2, 2015 at 11:14:48 PM UTC-5, Jiqiang Guo wrote:

Yes, I am using clang++. I will try g++ when I have time.

The only time I have seen something like that is when Rcpp, RcppEigen, and rstan were compiled with different compilers or options. Does any other example of vb work for you?

Jiqiang Guo

unread,

Dec 2, 2015, 11:50:16 PM12/2/15

to stan...@googlegroups.com

Weird, it is working for me now with clang++ and without any change of Rcpp or RcppEigen.

Thanks,

Jiqiang

Ben Goodrich

unread,

Dec 3, 2015, 12:04:56 AM12/3/15

to stan development mailing list

On Wednesday, December 2, 2015 at 11:50:16 PM UTC-5, Jiqiang Guo wrote:

Weird, it is working for me now with clang++ and without any change of Rcpp or RcppEigen.

OK, I'll merge it. I'm not sure what is going on with the truncated error messages but that is not due to ADVI.

Daniel Lee

unread,

Dec 3, 2015, 8:34:36 AM12/3/15

to stan-dev mailing list

Which messages are being truncated? There is a chance I've made an error setting up the streams. How do you test this?

Alp Kucukelbir

unread,

Dec 3, 2015, 8:35:39 AM12/3/15

to stan...@googlegroups.com

btw guys, i have no clear idea what's going on but please let me know
if i can help.

> You received this message because you are subscribed to a topic in the

> Google Groups "stan development mailing list" group.

> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/stan-dev/qkQsG0f8krk/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to

Ben Goodrich

unread,

Dec 3, 2015, 8:50:24 AM12/3/15

to stan development mailing list

On Thursday, December 3, 2015 at 8:35:39 AM UTC-5, Alp Kucukelbir wrote:

btw guys, i have no clear idea what's going on but please let me know
if i can help.

This is unrelated to ADVI.

Ben Goodrich

unread,

Dec 3, 2015, 8:55:45 AM12/3/15

to stan development mailing list

On Thursday, December 3, 2015 at 8:34:36 AM UTC-5, Daniel Lee wrote:

Which messages are being truncated? There is a chance I've made an error setting up the streams. How do you test this?

For example, in the rstan/tests/unitTests folder there is a file called runit.test.partial_inits.R. If you do

library(RUnit)
library(rstan)
source("runit.test.partial_inits.R")
test_partial_inits()

the first error message will just say

[1] "Error : "

which causes it to fail any test that greps for the content of the error message.

Ben

Ben Goodrich

unread,

Dec 5, 2015, 5:23:26 PM12/5/15

to stan development mailing list, david...@columbia.edu

On Wednesday, December 2, 2015 at 8:35:40 AM UTC-5, Ben Goodrich wrote:

I have ADVI with adaptation working again in R.

I have come across an odd issue with the arsenic model from ARM. With the GitHub versions of the R packages, I can do

ROOT <- "https://raw.githubusercontent.com/stan-dev/example-models/master/ARM/Ch.5/"
wells_data <- read_rdump(paste0(ROOT, "wells.data.R"))
wells_model <- stan_model(paste0(ROOT, "wells_d100ars.stan"))
MCMC <- sampling(wells_model, data = wells_data)
MF <- vb(wells_model, data = wells_data, algorithm = "meanfield")
FR <- vb(wells_model, data = wells_data, algorithm = "fullrank")

and ADVI (which chooses eta = 1) finds basically the same thing as HMC. If I do the "same" model in rstanarm, ADVI was always failing during the adaptation phase, unless I set adapt_engaged = FALSE, in which case ADVI sometimes worked okay but took many more iterations to converge. Then I made what I thought would be a superficial change in the way rstanarm does the log-likelihood for a logit model (where eta0 is the linear predictor vector when y = 0 and eta1 is the linear predictor vector when y = 1)

    if (link == 1) { // logit link
//      This was not working well
//      0 ~ bernoulli_logit(eta0);
//      1 ~ bernoulli_logit(eta1);
      increment_log_prob(logistic_ccdf_log(eta0, 0, 1));
      increment_log_prob(logistic_cdf_log(eta1, 0, 1));
    }

and ADVI adapted and converged quickly to essentially the same solution as HMC. The bernoulli_logit() function has some branch logic when it encounters extreme values of the linear predictor that is absent from logistic_[c]cdf_log(). Could that mess up ADVI in some way that HMC is immune to?

Ben

Alp Kucukelbir

unread,

Dec 5, 2015, 5:26:23 PM12/5/15

to stan...@googlegroups.com, David Blei

this is very interesting.

do you know (off the top of your head) what this branch logic looks like?

if not, i'll dig into it. it could very well be that something like
that would throw the optimization routine in ADVI off in some strange
way -- i'd definitely like to know how and why.

Ben Goodrich

unread,

Dec 5, 2015, 5:38:30 PM12/5/15

to stan development mailing list, david...@columbia.edu

On Saturday, December 5, 2015 at 5:26:23 PM UTC-5, Alp Kucukelbir wrote:

do you know (off the top of your head) what this branch logic looks like?

From the loop in bernoulli_logit.hpp:

        // pull out values of arguments
        const int n_int = value_of(n_vec[n]);
        const T_partials_return theta_dbl = value_of(theta_vec[n]);

        // reusable subexpression values
        const int sign = 2*n_int-1;
        const T_partials_return ntheta = sign * theta_dbl;
        const T_partials_return exp_m_ntheta = exp(-ntheta);

        // Handle extreme values gracefully using Taylor approximations.
        static const double cutoff = 20.0;
        if (ntheta > cutoff)
          logp -= exp_m_ntheta;
        else if (ntheta < -cutoff)
          logp += ntheta;
        else
          logp -= log1p(exp_m_ntheta);

        // gradients
        if (!is_constant_struct<T_prob>::value) {
          static const double cutoff = 20.0;
          if (ntheta > cutoff)
            operands_and_partials.d_x1[n] -= exp_m_ntheta;
          else if (ntheta < -cutoff)
            operands_and_partials.d_x1[n] += sign;
          else
            operands_and_partials.d_x1[n] += sign * exp_m_ntheta
              / (exp_m_ntheta + 1);
        }

Ben

Ben Goodrich

unread,

Dec 5, 2015, 6:31:28 PM12/5/15

to stan development mailing list, david...@columbia.edu

On Wednesday, December 2, 2015 at 8:35:40 AM UTC-5, Ben Goodrich wrote:

I have ADVI with adaptation working again in R.

Also, the QR reparameterization seems to expand the scope of linear regression models where ADVI succeeds. For example,

ROOT <- "https://raw.githubusercontent.com/stan-dev/example-models/master/ARM/Ch.4/"
mesquite_data <- read_rdump(paste0(ROOT, "mesquite.data.R"))
mesquite_data$weight <- mesquite_data$weight / 100
mesquite_model <- stan_model(paste0(ROOT, "mesquite.stan"))
ols <- lm(weight ~ diam1 + diam2 + canopy_height + total_height +
            density + group, data = mesquite_data)
MCMC <- sampling(mesquite_model, data = mesquite_data)
MF <- vb(mesquite_model, data = mesquite_data, algorithm = "meanfield")
FR <- vb(mesquite_model, data = mesquite_data, algorithm = "fullrank")

This plain parameterization does not work particularly well (especially for fullrank)

cbind(OLS = coef(ols), MCMC = summary(MCMC)[[1]][1:7,1], 
      MF = summary(MF)[[1]][1:7,1], FR = summary(FR)[[1]][1:7,1])

                    OLS      MCMC         MF          FR
(Intercept)   -7.285929 -7.274599 -6.8556038 -1.48931304
diam1          1.896690  1.880635  2.2193935  1.21671570
diam2          3.714621  3.746621  3.3493838  2.28108869
canopy_height  3.556653  3.597312  2.1498317  0.02911905
total_height  -1.017325 -1.061082 -0.6275955 -0.13897944
density        1.312542  1.310747  1.2938405  1.50406152
group         -3.632951 -3.651722 -3.4180468 -1.81033995

But with the QR option in stan_glm()

MF <- stan_glm(weight ~ diam1 + diam2 + canopy_height + total_height +
                density + group, data = mesquite_data, prior = NULL,
               prior_intercept = NULL, prior_ops = prior_options(scaled = FALSE),
              algorithm = "meanfield", QR = TRUE)
FR <- stan_glm(weight ~ diam1 + diam2 + canopy_height + total_height +
                 density + group, data = mesquite_data, prior = NULL,
               prior_intercept = NULL, prior_ops = prior_options(scaled = FALSE),
               algorithm = "fullrank", QR = TRUE)

it does better

cbind(OLS = coef(ols), MCMC = summary(MCMC)[[1]][1:7,1], 
      MF = coef(MF), FR = coef(FR))
                    OLS      MCMC        MF         FR
(Intercept)   -7.285929 -7.274599 -7.575135 -6.4010700
diam1          1.896690  1.880635  2.327150  1.2509559
diam2          3.714621  3.746621  3.258451  3.7515768
canopy_height  3.556653  3.597312  3.493274  3.6575743
total_height  -1.017325 -1.061082 -0.770157 -0.9841014
density        1.312542  1.310747  1.209084  1.3232964
group         -3.632951 -3.651722 -3.585129 -3.7281053

Still, it is worrying that ADVI can be so sensitive to things like this.

Ben

Alp Kucukelbir

unread,

Dec 8, 2015, 10:34:48 AM12/8/15

to stan...@googlegroups.com, David Blei

what does the QR option in stan_glm do precisely?

it parameterizes the stan model with the QR of the X matrix?

Ben Goodrich

unread,

Dec 8, 2015, 1:20:37 PM12/8/15

to stan development mailing list, david...@columbia.edu

On Tuesday, December 8, 2015 at 10:34:48 AM UTC-5, Alp Kucukelbir wrote:

what does the QR option in stan_glm do precisely?

it parameterizes the stan model with the QR of the X matrix?

Right, it substitutes Q in place of X when fitting and afterward pre-multiplies the coefficients in the orthogonal Q-space by the inverse of R to get coefficients in X-space.

Ben

Reply all

Reply to author

Forward