stochastic MAP in stan


David Blei

Mar 26, 2013, 6:20:36 PM
to stan-...@googlegroups.com
stanimaniacs,

thanks to the stan development team, implementing stochastic MAP
estimation in stan is easy. there was a little trickiness to getting
it to work, so i thought i'd share.

best
dave

---

you'll need

- a subsample(b) function that grabs a subsample of size b from your
  data set

- a stepsize(t) function that gets the t-th stepsize. (to start, you
  can even just choose a constant stepsize.)
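To make those two ingredients concrete, here is a minimal sketch (in Python rather than the thread's R, purely illustrative; the names `make_subsample` and `make_stepsize` are my own): `subsample(b)` draws a batch of size b without replacement, and `stepsize(t)` follows a Robbins-Monro schedule a / (t + 1)^kappa, which satisfies the usual stochastic-approximation conditions when 0.5 < kappa <= 1.

```python
import random

def make_subsample(data):
    """Return a subsample(b) function that draws b data points
    without replacement from the full data set."""
    def subsample(b):
        return random.sample(data, b)
    return subsample

def make_stepsize(a=0.1, kappa=0.6):
    """Return a stepsize(t) function following a Robbins-Monro
    schedule: steps shrink so that sum(rho_t) diverges while
    sum(rho_t^2) converges (requires 0.5 < kappa <= 1)."""
    def stepsize(t):
        return a / (t + 1) ** kappa
    return stepsize
```

As the post notes, a constant stepsize also works to get started; the decaying schedule is what the stochastic-approximation convergence theory asks for.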

here is a skeleton of the program:

# model_filename = stan model
# subsample(b)   = fun that returns a sample of b data points from the data
# n              = data set size
# batch_size     = size for each subsample to analyze on each iteration
# niter          = number of iterations
# stepsize(t)    = fun that returns the stepsize at iteration t
# model          = (opt) a pre-compiled model

stochastic.map <- function(model_filename, subsample, n, batch_size,
                           niter, stepsize, model = NULL)
{
  # compile the model (if needed) and initialize the parameters.
  # here, parameters are initialized to zero.

  if (is.null(model))
  {
    dummy_subsample <- subsample(batch_size)
    model <- stan(model_filename, data = dummy_subsample, chains = 0)
  }
  nparams <- sum(unlist(model@par_dims))
  params <- rep(0, nparams)
  grad_scale <- n / batch_size

  # run stochastic gradient ascent by subsampling a batch, computing a
  # noisy gradient, and taking a step.  at each point, compute the
  # per-data log probability as a score to track.

  score <- numeric(niter)
  for (t in 1:niter)
  {
    batch <- subsample(batch_size)
    iter_model <- stan(fit = model, data = batch, chains = 0)
    grad <- grad_scale * grad_log_prob(iter_model, params)
    params <- params + stepsize(t) * grad
    score[t] <- log_prob(iter_model, params) / batch_size
  }

  # return the model object (in case we want to run again)
  #        the params vector
  #        the score trajectory

  list(model = model, params = params, score = score)
}
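One detail worth flagging in the skeleton above: the gradient of the subsampled log probability is multiplied by grad_scale = n / batch_size. That scaling makes the minibatch gradient an unbiased estimate of the full-data gradient (the prior term aside), which is what stochastic gradient ascent relies on. A toy numeric check of that claim (Python, with made-up data, not Stan):

```python
import random

# toy setting: take the "full-data gradient" to be the sum of the data
# points (e.g. d/dtheta of sum_i -(x_i - theta)^2 / 2 at theta = 0).
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n, b = len(data), 2
full_grad = sum(data)  # 21.0

# average the scaled minibatch gradient over many random batches
random.seed(0)
trials = 20000
avg = sum(
    (n / b) * sum(random.sample(data, b))  # scale by n / batch_size
    for _ in range(trials)
) / trials

# avg should land close to full_grad: the estimator is unbiased
```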

Daniel Lee

Mar 26, 2013, 6:41:44 PM
to stan-...@googlegroups.com
Awesome!




Daniel Lee

Mar 26, 2013, 6:49:49 PM
to stan-...@googlegroups.com
Dave, thanks for putting together stochastic MAP. What's really neat is how clean it is, once Stan has abstracted away the model and the gradients. It reads like pseudo-code.

For users diving into Stan at this level: if you have suggestions, requests, or improvements for the API, please feel free to bring them up.



Daniel

Andrew Gelman

Mar 26, 2013, 6:58:28 PM
to stan-...@googlegroups.com
Once we've implemented black box VB, MAP will be a thing of the past.

Jiqiang Guo

Mar 26, 2013, 7:49:59 PM
to stan-...@googlegroups.com
Thanks for sharing.  Just one minor comment about getting the number of parameters; please see below.

On Tue, Mar 26, 2013 at 6:20 PM, David Blei <david...@gmail.com> wrote:
>   nparams <- sum(unlist(model@par_dims))

This is fine for some models, but log_prob and grad_log_prob are defined on the unconstrained parameter space, and the number of unconstrained parameters might differ from what is obtained here.  The function get_num_upars can be used instead, that is

nparams <- get_num_upars(model)

David Blei

Mar 27, 2013, 6:00:32 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
hi andrew

i agree completely.  if we had black box variational inference, i would never use MAP.

best
dave