Re: [stan-users] Discrete parameters...again!


Bob Carpenter

Feb 3, 2016, 12:47:24 PM
to stan...@googlegroups.com, Andrew Gelman
[moved to stan-dev]

This is getting a bit esoteric for the users' list.
What's the parameter labeling problem?

- Bob

> On Feb 3, 2016, at 11:10 AM, Dustin Tran <dustinv...@gmail.com> wrote:
>
> I also recall discussion at some point of continuous relaxation stuff with HMC. Right, even with VI it’s easy to do discrete parameters. If only we could resolve the parameter labelling problem!
>
> Dustin
>> On Feb 3, 2016, at 2:17 AM, Ole Petter Hansen <olepetter...@outlook.com> wrote:
>>
>> Ok, well now you know that discrete-parameter people (at least one) would like to move over to Stan - even if the sampler is not better than BUGS.
>>

Dustin Tran

Feb 3, 2016, 1:35:09 PM
to stan...@googlegroups.com, Andrew Gelman

Bob Carpenter

Feb 3, 2016, 2:14:25 PM
to stan...@googlegroups.com
But is there anything you need that you can't get from the outside?
I don't mean easily, I mean in principle.

I've been very resistant to changing the language itself.

As far as I know, for instance, you should be able to change
the data in a model directly, so the online stuff shouldn't be
an obstacle either.

We'll add "#fullrank" markup in the models over my dead body. If
we want to mark some subset of variables, we need to do it more
generically, not one ad hoc algorithm at a time.

The other markup I believe you guys mentioned is having some
way to mark parameters that were data-specific for some kind of
fancier streaming.

- Bob

Dustin Tran

Feb 3, 2016, 2:55:44 PM
to stan...@googlegroups.com
That link is a reference to the problem, not to a solution.

Dustin

Bob Carpenter

Feb 3, 2016, 3:07:24 PM
to stan...@googlegroups.com
We can bring it up at the next meeting. I'm still not sure
what functionality you need from the model.

You have this as an example command:

./foo advi meanfield="w" fullrank="alpha, sigma"

which makes me think it's something you want to control from the
outside. This makes sense to me as the inference isn't something
I like to think of as part of the model (though I know we often
have to program the model in Stan with the inference engine in mind).

But then in a section labeled "(archives)" you have annotations in
the model itself

parameters {
  vector<lower=0,#fullrank>[D] alpha;
  real<lower=0,#fullrank> sigma;
  vector<#meanfield>[D] w;
}

This won't work because "#" starts a line comment.

If we were going to label in the models themselves, the only feasible
syntax I can imagine would follow Java annotations (what Doxygen uses
for C++ doc comments, too):

parameters {
  @fullrank
  vector<lower=0>[D] alpha;

  @fullrank
  real<lower=0> sigma;

  @meanfield
  vector[D] w;
}

But what I really want to know is whether you can evaluate the algorithm
without us doing this to make sure it works before we modify the language
and model-generating code.

Having said all this, this degree of labeling's not a big deal.

But isn't there also labeling where data gets matched to parameters somehow
for online inference?

- Bob

Dustin Tran

Feb 3, 2016, 3:34:13 PM
to stan...@googlegroups.com
Sure we can bring it up. The problem’s not so much that it can’t be done outside of Stan. For example, with GMO we currently use two Stan programs for the same model, in which we plug in the imputed values that we marginalize over during the E-step. The problem is how we can do these things in “vanilla” Stan, where someone wants to do MML by specifying a single Stan program and running a Stan command to find the estimates of the hyperparameters.

Dustin

Bob Carpenter

Feb 3, 2016, 3:47:24 PM
to stan...@googlegroups.com
OK, we can discuss it on the list if you don't want to wait.

Are the two Stan programs because you want only a subset
of derivatives at any one point and doing them all is
too inefficient? That's not reflected in the Wiki. And
every time Andrew comes up with a new MML algorithm, he
says the expensive step is computing derivatives for the low-level
parameters, whereas the extra cost of the high-level parameters is
negligible.

Sorry to be so obtuse here, but we've had a stream of
such requests over the last few years and I have trouble
keeping straight the requests to modify the language and the
algorithms proposed.

I just want to make sure we have an algorithm we're going
to use or a general markup technique before changing Stan's
underlying language to reflect features of inference.

- Bob

Dustin Tran

Feb 3, 2016, 4:42:50 PM
to stan...@googlegroups.com
Oh no, it’s fine, we can wait until the next meeting. I have no clear solution or proposal either. I just wanted to declare that this is a problem to be solved, and moreover that these things all relate in general to some notion of labeling parameters.

Dustin

Bob Carpenter

Feb 3, 2016, 4:47:25 PM
to stan...@googlegroups.com
What I'm looking for is the set of methods you're going to need
in the C++ model class, based on the annotations in the Stan program.

I keep guessing that it's a subset of derivatives, with other
parameters set to constants, but I'm really not sure.

- Bob

Andrew Gelman

Feb 3, 2016, 6:09:49 PM
to stan...@googlegroups.com
GMO requires 2 Stan programs: the full program (i.e., the original model being specified, in all its joint-distribution glory) and the inner program which we call when we are finding the conditional mode and more generally working with the conditional distributions of local parameters. The local program looks just like the full program except that in the local program, the hyperparameters have been moved from the parameters block to the data block.
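For concreteness, the inner step is currently driven from R roughly like this (a sketch, not the actual GMO code; local_model.stan, stan_data, and phi_current are placeholder names):

library(rstan)

# the "local" program: same model, but the hyperparameters phi sit in its data block
local_mod <- stan_model("local_model.stan")

# inner step: conditional mode of the local parameters alpha given the current phi
inner_fit  <- optimizing(local_mod, data = c(stan_data, list(phi = phi_current)))
alpha_mode <- inner_fit$par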
A

Ben Goodrich

Feb 6, 2016, 12:57:29 AM
to stan development mailing list
On Wednesday, February 3, 2016 at 6:09:49 PM UTC-5, Andrew Gelman wrote:
GMO requires 2 Stan programs:  the full program (i.e., the original model being specified, in all its joint-distribution glory) and  the inner program which we call when we are finding the conditional mode and more generally working with the conditional distributions of local parameters.  The local program looks just like the full program except that in the local program, the hyperparameters have been moved from the parameters block to the data block.

We have the fixed param functionality in Stan already. Can we generalize that to "partially fixed param" for GMO purposes?

Bob Carpenter

Feb 6, 2016, 6:54:28 PM
to stan...@googlegroups.com
I was trying to ask Dustin, but I still don't understand
why you need to do this. Is it to make the derivatives
cheaper in the case where the hyperparameters are not data?

How is the I/O going to be done in that second Stan program?

Is this all in R, or what?

- Bob

Bob Carpenter

Feb 6, 2016, 6:57:28 PM
to stan...@googlegroups.com
With the current Stan model function, all the parameters
are "fixed" when the log density is evaluated. And you
get derivatives for all of them. Is the issue I/O from
the outside, or wanting more efficient subsets of derivatives
at second order (because the cost isn't that much higher
at first order)?

So what I still can't figure out is what people want
in the way of functionality in the model. If someone could
just lay out the methods they want, it'd be a huge
help for me. I'm just not sure where the instantiations
are coming in and where derivatives are needed and of what
order.

- Bob

Andrew Gelman

Feb 6, 2016, 7:51:54 PM
to stan...@googlegroups.com
The reason we need 2 programs is that in the inner loop of GMO we need to compute the conditional mode of the local parameters given the hyperparameters, or we need to draw from the conditional posterior distribution of the local parameters, given the hyperparameters. Stan does not allow the user to compute a conditional mode or draw from conditional distributions.

Dustin Tran

Feb 6, 2016, 7:54:08 PM
to stan...@googlegroups.com
GMO finds the mode of the marginal posterior density, p(phi | y) = int p(phi, alpha | y) d alpha. There are two things it uses from Stan:

1. grad_{phi} log p(phi, alpha | y): we run grad_log_prob on the Stan program with phi and alpha in its parameter block, and subset to the gradients with respect to phi
2. get an approximate distribution, g(alpha) \approx p(alpha | phi, y): we run an inference algorithm (optimizing, advi, sampling) on the Stan program with alpha in its parameter block and phi in its data block

The two requirements must use a Stan program with phi in its parameter block and a Stan program with phi in its data block, respectively. For GMO to be in Stan natively, it also requires the user to specify which parameters to optimize over, i.e., which ones are phi and which ones are alpha. We manually do this at the moment via the two Stan programs.
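As a rough sketch of requirement 1 with existing rstan calls (here fit_joint is a stanfit object for the joint program, and idx_phi, the positions of phi in the unconstrained parameter vector, is assumed known):

library(rstan)

upars <- c(phi, alpha)                    # unconstrained values, with phi declared first
g_all <- grad_log_prob(fit_joint, upars)  # gradient of log p(phi, alpha | y)
g_phi <- g_all[idx_phi]                   # keep only the components for phi

Requirement 2 is then just a separate optimizing() / vb() / sampling() call on the second program, with the current phi passed in as data.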

In terms of solutions, a “partially fixed param” block as Ben suggests is possible. So is parameter labeling, where the user specifies a file like they do for inits.

Ben Goodrich

Feb 6, 2016, 9:34:33 PM
to stan development mailing list
On Saturday, February 6, 2016 at 7:54:08 PM UTC-5, Dustin Tran wrote:
In terms of solutions, a “partially fixed param” block as Ben suggests is possible. So is parameter labeling, where the user specifies a file like they do for inits.

I don't think we need another block in a Stan program. We already have the "fixed_param" method, which either takes or draws initial values for all the parameters and then leaves them fixed for the duration while it executes transformed parameters, model, and generated quantities. If we could generalize that to a "partially_fixed_param" method where the parameters to be held fixed were specified at runtime, that would be a lot easier for people to use.
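For reference, the existing behavior is reachable from rstan roughly as below (a sketch; I'd have to double-check that inits are honored exactly like this), and the generalization would just add a runtime argument naming the parameters to hold fixed:

# current fixed_param: parameters stay at their initial values
fit <- sampling(mod, data = stan_data, chains = 1,
                algorithm = "Fixed_param",
                init = list(list(phi = phi0, alpha = alpha0)))

# hypothetical generalization, not an existing interface:
# fit <- sampling(mod, data = stan_data, chains = 1,
#                 algorithm = "Partially_fixed_param", fixed = "phi")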

Ben

Michael Betancourt

Feb 7, 2016, 5:34:44 AM
to stan...@googlegroups.com
I think the underlying problem here is not how to implement these
algorithms but rather how to generically identify the two sets of
parameters on the C++ side.

Bob Carpenter

Feb 7, 2016, 3:18:46 PM
to stan...@googlegroups.com
If you have a parameter vector theta = (alpha, beta)
in Stan, and want to optimize beta holding alpha
fixed, that can be done by simply changing what gets
sent to the optimizer on the C++ side.

Or if you're doing it in R, you can send it in to an optimizer.

It does require some parameter fiddling, but it can be
done automatically.

- Bob

Bob Carpenter

Feb 7, 2016, 3:32:45 PM
to stan...@googlegroups.com
I still can't put the pieces together from the algorithm
sketches.

If you have that sense of what we'll need, I need to turn it into
an API spec for what the model needs to provide. Then we need to
figure out if there need to be language changes or not.

So far, the only proposal I've seen is to add a method that
looks like this to the generated C++ class for a Stan program:

/**
 * Return the sequence of parameter names that are labeled with
 * the specified tag in the order they appear in the Stan
 * program declaration.
 *
 * @param[in] tag Tag for parameters.
 * @return Parameters labeled with tag.
 */
vector<string> parameters(const string& tag) const;

There's already such a method to return all the parameters.

We can do that all from the outside without changing the model class
by having the user pass in the sequence of parameter names.

If that's all you need, it doesn't need to be done on the inside
like this. If you need more than this, can you say what it is
in terms of what functions the C++ class and/or external libraries
need to provide?

Doing this and allowing a syntax of:

@foo
real x;

@bar @foo
real y;

real z;

would return:

{ "x", "y" } for "foo"

{ "y" } for "bar"

- Bob

Andrew Gelman

Feb 7, 2016, 4:07:19 PM
to stan...@googlegroups.com
I didn’t know that. Could you let me know what R code would allow me to run Stan-optimize, holding certain parameters fixed and optimizing the others? Can R also call Stan-NUTS in this way? I thought I asked you about this 2 yrs ago and you said that it was _not_ possible to run Stan and do conditional optimization or conditional sampling.
A

Bob Carpenter

Feb 7, 2016, 4:40:45 PM
to stan...@googlegroups.com
I don't know all the R commands, but the basic idea
is that you can use RStan to compile a Stan program
and give it data (y) to get a log density function

my_model_lpdf(theta);

which returns

const + log p(theta | y)

for a fixed y.

Now suppose

theta = c(alpha, beta)

which it will be if you define the alpha parameters first and
then the beta parameters. And now you want just a function
of beta with alpha fixed,

f(beta) = const + log p(alpha, beta | y)

R gives you higher order functions, so I define

my_marginal_lpdf <- function(alpha) {
  marginal_log_density <- function(beta) {
    return(my_model_lpdf(c(alpha, beta)))
  }
  return(marginal_log_density)
}

and note that

my_marginal_lpdf(alpha)(beta) == my_model_lpdf(c(alpha, beta))

so that you can plug my_marginal_lpdf(alpha) into whatever
optimization or other system you want.

It's not set up to sample that way, but you should be able to
plug it into an optimizer.
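For example (alpha_hat and beta_init are whatever values you condition on and initialize at):

g   <- my_marginal_lpdf(alpha_hat)           # fixes alpha, leaves beta free
opt <- optim(beta_init, g, method = "BFGS",
             control = list(fnscale = -1))   # fnscale = -1 turns optim into a maximizer
beta_mode <- opt$par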

- Bob

Michael Betancourt

Feb 7, 2016, 4:54:43 PM
to stan...@googlegroups.com
I’m pretty sure all that is needed to implement these algorithms is

my_model_lpdf(alpha, beta);

If the ordering of the parameters is correct then this can be done
with the current methods,

theta = c(alpha, beta); // or the closest C++ equivalent
my_model_lpdf(theta);

The problem is that the algorithms have no idea if the model
was written with the parameters in the correct order. So the idea
would be that the parameter labels induce functions like

my_model_indices("label_name")

which could then be used to slice the parameters/gradients
appropriately.
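
Something like this, say, where my_model_indices is the hypothetical lookup above and the rest is existing rstan functionality:

idx_phi   <- my_model_indices("hyperparameters")   # e.g. c(1, 2)
idx_alpha <- my_model_indices("local")             # e.g. 3:12

g       <- grad_log_prob(fit, upars)   # full joint gradient
g_phi   <- g[idx_phi]                  # slice for the outer update
g_alpha <- g[idx_alpha]                # slice for the inner conditional step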

Andrew Gelman

Feb 7, 2016, 5:26:42 PM
to stan...@googlegroups.com
But that just seems silly. I want to use Stan’s internal optimizer, not whatever inferior optimizer R happens to have.

Ben Goodrich

Feb 7, 2016, 5:32:17 PM
to stan development mailing list
On Sunday, February 7, 2016 at 5:26:42 PM UTC-5, Andrew Gelman wrote:
But that just seems silly.  I want to use Stan’s internal optimizer, not whatever inferior optimizer R happens to have.

As best we have been able to tell, LBFGS in Stan is inferior, or at least requires a lot of fiddling with settings to get close to the optimum.

Andrew Gelman

Feb 7, 2016, 5:33:40 PM
to stan...@googlegroups.com
LBFGS in Stan is inferior to what?

Ben Goodrich

Feb 7, 2016, 6:22:46 PM
to stan development mailing list, gel...@stat.columbia.edu
On Sunday, February 7, 2016 at 5:33:40 PM UTC-5, Andrew Gelman wrote:
LBFGS in Stan is inferior to what?

Optimizers that are in R. We thought we were going to be able to verify that all of rstanarm was working by doing optimization and taking off the priors. But it was very frustrating to get LBFGS from Stan to reproduce the same answers. You had to mess with the settings a lot.

Ben

Andrew Gelman

Feb 7, 2016, 6:32:24 PM
to stan...@googlegroups.com, Marcus Brubaker, Dustin Tran
I’m surprised, because I thought Dustin was telling me that R’s optim was giving problems.  I’m cc-ing Marcus.  This seems like something we could work out.
A

Dustin Tran

Feb 7, 2016, 8:04:41 PM
to stan...@googlegroups.com
It seems like this is another way to implement GMO in R, and not so much a way to get it into Stan internally.

Also, we want to do inference with the function my_marginal_lpdf(alpha) in Stan itself: certainly we could use R’s optimizer and maybe after some additional fiddling that would work; however, we’re also using ADVI and NUTS to infer that marginal conditional density. Ideally we hope to apply any inference algorithm that comes into Stan in the future too. From what I understand, this function is written in R and Stan can’t use it.

> If you have that sense of what we'll need, I need to turn it into
> an API spec for what the model needs to provide. Then we need to
> figure out if there need to be language changes or not.
>
> So far, the only proposal I've seen is add a method that
> looks like this to the generated C++ class for a Stan program:

I’m not sure if I followed, but we do want this to be done “on the inside”, so that we can go ahead and start implementing GMO in Stan.

Dustin

Dustin Tran

Feb 7, 2016, 8:09:02 PM
to Andrew Gelman, stan...@googlegroups.com, Marcus Brubaker
R’s optim (nlm, specifically) did not work very well for the outer loop of GMO, in which the parameters phi are updated with stochastic gradients, \nabla_{phi} log p(phi | y).

I haven’t tried R’s optim for doing a Laplace approximation, g(alpha) \approx p(alpha | phi). Stan-optimize seems to be doing well, so I haven’t tried any other optimization routine. Maybe Stan-optimize is bad at getting the right posterior mode estimate, as Ben says, but since GMO uses it as an importance sampler, it doesn’t care so much if it didn’t fully converge. (a guess)

Dustin

Andrew Gelman

Feb 7, 2016, 8:32:37 PM
to stan...@googlegroups.com
Yes, I’d much prefer to fix Stan’s internal optimizer (if Ben’s right that it has a problem) and then do my optimizing in Stan. I really don’t want to be writing R programs that grab log densities from Stan and then run them through R’s optimizer. What a mess. Having 2 Stan programs is much less ugly than that!

Bob Carpenter

Feb 7, 2016, 10:49:51 PM
to stan...@googlegroups.com
I take it you don't want to call the Stan C++ code
directly, but want to use rstan::optimizing().
Which means you need a compiled Stan model with data
associated.

That's going to be inefficient reading it in and out
if you have to do it every time, because most of the
data is real data that stays fixed.

It sounds like you need some currying (as in Haskell B. Curry,
not stew: https://en.wikipedia.org/wiki/Currying ).

Suppose we have a Stan program populated with data;
it defines

log q(theta | y)

the log unnormalized posterior for fixed data y and parameters
theta.

Then what I think you want to do is pick out a subset of the
parameters of theta, let's say alpha, so that

theta = (alpha, beta).

And what you want is the curried function f,
defined by

f = lambda alpha . lambda beta . log q(alpha, beta | y).

where that's the lambda abstraction operator so popular
with the functional programming set ( https://en.wikipedia.org/wiki/Lambda_calculus ).

That'll let you plug in some actual values alpha' for alpha
and get a new function g defined by:

g = f(alpha') = lambda beta . log q(alpha', beta | y)

Now if we take values beta' for beta and apply g, we get

g(beta') = log q(alpha', beta' | y).

The key is that we can create the function g that is
differentiable so we can plug it into our optimizers.

You are doing that the hard way with two Stan programs. I
was suggesting the above functional abstraction, which R
supports, and which seems much cleaner to me if you're just
experimenting with optimization. But it sounds like you need
the rest of Stan's solvers, too (do we have a general name
for optimization, MCMC, and VI?)

When it comes time to do it in C++, we'll essentially roll
our own bind operation, and provide a function that'll return
another function, which will look something like this:

stan_model marginalize(const vector<string>& labels,
                       const vector<double>& alpha);

where in the returned stan_model, you get something
that implements the model base class and concepts.

I think that should be sufficient, and it won't involve changing
the modeling language syntax. But it will involve a somewhat
expensive run-time bind operation (mainly for weaving the parameters
back together when we do g(beta')).
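
In R today the bind/weave step can be sketched with a closure (idx_alpha, the unconstrained positions of alpha, is assumed known; log_prob and get_num_upars are existing rstan calls on a stanfit object):

make_conditional_lp <- function(fit, alpha_fixed, idx_alpha) {
  n        <- get_num_upars(fit)
  idx_beta <- setdiff(seq_len(n), idx_alpha)
  function(beta) {
    theta <- numeric(n)
    theta[idx_alpha] <- alpha_fixed   # plug in alpha'
    theta[idx_beta]  <- beta          # weave beta' back into place
    log_prob(fit, theta)              # log q(alpha', beta' | y) + const
  }
}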

This could be precompiled if we had labels in the program itself. But
I'd really rather stay away from that if at all possible and
let the program just define the density.

Whew.

- Bob

Andrew Gelman

Feb 7, 2016, 11:12:40 PM
to stan...@googlegroups.com, Dustin Tran
No no no no no! I don’t want to use rstan:: anything! I want to do GMO entirely within Stan, using Stan’s optimizers, Stan’s samplers, etc. But Stan doesn’t currently allow conditional optimization or sampling, so we’re hacking it within R. Your suggestion, to use optim() in R, I don’t want to do, because we have zero control over optim(), it’s a black box, and that gives us problems.

Daniel Lee

Feb 8, 2016, 12:40:40 AM
to stan-dev mailing list
On Sun, Feb 7, 2016 at 11:12 PM, Andrew Gelman <gel...@stat.columbia.edu> wrote:
No no no no no!  I don’t want to use rstan:: anything!  I want to do GMO entirely within Stan, using Stan’s optimizers, Stan’s samplers, etc.  But Stan doesn’t currently allow conditional optimization or sampling, so we’re hacking it within R.  Your suggestion, to use optim() in R, I don’t want to do, because we have zero control over optim(), it’s a black box, and that gives us problems.

1. rstan::optimizing() calls Stan's C++ lbfgs implementation (that Marcus wrote from scratch).
2. We need to make the core of Stan better to make things like this easier. This is what Bob's been saying all along. We would be in better shape if we would stop tacking on more features and reduce it to a set of things we can expose and expose well. The optimization code isn't in great shape to be exposed. If you want it to be available to use the way you do, we really need to prioritize getting it working and free up some resources to take care of it. Right now, our own implementation of lbfgs (and bfgs) feels too much like a black box that gives us problems. And no one actively working on the code base has really made it feel less like a black box. Right now, I trust Stan's optimization as much as I trust R's optim().
 


I know I'm late to this thread, but I have some thoughts on alternating discrete sampling... it involves multiple warmup phases, so it won't be practical. I'll table it until the next time it pops up. And how did we merge two separate discussions on this thread?