stochastic MAP in stan


David Blei

Mar 26, 2013, 6:20:36 PM
to stan-...@googlegroups.com
stanimaniacs,

thanks to the stan development team, implementing stochastic MAP
estimation in stan is easy. there was a little trickiness to getting
it to work, so i thought i'd share.

best
dave

---

you'll need

- a subsample(b) function that grabs a subsample of size b from your
  data set

- a stepsize(t) function that gets the t-th stepsize. (to start, you
  can even just choose a constant stepsize.)
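To make those two ingredients concrete, here is a minimal sketch (in Python rather than the thread's R, purely illustrative; the names `make_subsample` and `make_stepsize` are my own): `subsample(b)` draws a batch of size b without replacement, and `stepsize(t)` follows a Robbins-Monro schedule a / (t + 1)^kappa, which satisfies the usual stochastic-approximation conditions when 0.5 < kappa <= 1.

```python
import random

def make_subsample(data):
    """Return a subsample(b) function that draws b data points
    without replacement from the full data set."""
    def subsample(b):
        return random.sample(data, b)
    return subsample

def make_stepsize(a=0.1, kappa=0.6):
    """Return a stepsize(t) function following a Robbins-Monro
    schedule: steps shrink so that sum(rho_t) diverges while
    sum(rho_t^2) converges (requires 0.5 < kappa <= 1)."""
    def stepsize(t):
        return a / (t + 1) ** kappa
    return stepsize
```

As the post notes, a constant stepsize also works to get started; the decaying schedule is what the stochastic-approximation convergence theory asks for.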

here is a skeleton of the program:

# model_filename = stan model
# subsample(b)   = fun that returns a sample of b data points from the data
# n              = data set size
# batch_size     = size for each subsample to analyze on each iteration
# niter          = number of iterations
# stepsize(t)    = fun that returns the stepsize at iteration t
# model          = (opt) a pre-compiled model

stochastic.map <- function(model_filename, subsample, n, batch_size,
                           niter, stepsize, model = NULL)
{
  # compile the model (if needed) and initialize the parameters.
  # here, parameters are initialized to zero.

  if (is.null(model))
  {
    dummy_subsample <- subsample(batch_size)
    model <- stan(model_filename, data = dummy_subsample, chains = 0)
  }
  nparams <- sum(unlist(model@par_dims))
  params <- rep(0, nparams)
  grad_scale <- n / batch_size

  # run stochastic gradient ascent by subsampling a batch, computing a
  # noisy gradient, and taking a step.  at each point, compute the
  # per-data log probability as a score to track.

  score <- numeric(niter)
  for (t in 1:niter)
  {
    batch <- subsample(batch_size)
    iter_model <- stan(fit = model, data = batch, chains = 0)
    grad <- grad_scale * grad_log_prob(iter_model, params)
    params <- params + stepsize(t) * grad
    score[t] <- log_prob(iter_model, params) / batch_size
  }

  # return the model object (in case we want to run again)
  #        the params vector
  #        the score trajectory

  list(model = model, params = params, score = score)
}
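One detail worth flagging in the skeleton above: the gradient of the subsampled log probability is multiplied by grad_scale = n / batch_size. That scaling makes the minibatch gradient an unbiased estimate of the full-data gradient (the prior term aside), which is what stochastic gradient ascent relies on. A toy numeric check of that claim (Python, with made-up data, not Stan):

```python
import random

# toy setting: take the "full-data gradient" to be the sum of the data
# points (e.g. d/dtheta of sum_i -(x_i - theta)^2 / 2 at theta = 0).
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n, b = len(data), 2
full_grad = sum(data)  # 21.0

# average the scaled minibatch gradient over many random batches
random.seed(0)
trials = 20000
avg = sum(
    (n / b) * sum(random.sample(data, b))  # scale by n / batch_size
    for _ in range(trials)
) / trials

# avg should land close to full_grad: the estimator is unbiased
```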

Daniel Lee

Mar 26, 2013, 6:41:44 PM
to stan-...@googlegroups.com
Awesome!




Daniel Lee

Mar 26, 2013, 6:49:49 PM
to stan-...@googlegroups.com
Dave, thanks for putting together stochastic MAP. What's really neat is how clean it is, once Stan has abstracted away the model and the gradients. It reads like pseudo-code.

For users diving into Stan at this level: if you have suggestions, requests, or improvements for the API, please feel free to bring them up.



Daniel

Andrew Gelman

Mar 26, 2013, 6:58:28 PM
to stan-...@googlegroups.com
Once we've implemented black box VB, MAP will be a thing of the past.

Jiqiang Guo

Mar 26, 2013, 7:49:59 PM
to stan-...@googlegroups.com
Thanks for sharing.  Just one minor comment about getting the number of parameters; please see below.

On Tue, Mar 26, 2013 at 6:20 PM, David Blei <david...@gmail.com> wrote:
>   nparams <- sum(unlist(model@par_dims))

This is fine for some models, but log_prob and grad_log_prob are defined on the unconstrained parameter space, and the number of unconstrained parameters might differ from what is obtained here.  The function get_num_upars can be used instead, that is

nparams <- get_num_upars(model)

David Blei

Mar 27, 2013, 6:00:32 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
hi andrew

i agree completely.  if we had black box variational inference, i would never use MAP.

best
dave