replace sampling notation; add proportionality control

Bob Carpenter

Nov 13, 2015, 4:46:53 PM
to stan...@googlegroups.com
After the meeting a couple weeks ago, I've been thinking
about syntax. I think we want two simultaneous changes:

CHANGE I
------------------------------

Replace

y ~ normal(mu, sigma);

with

ld << normal(y | mu, sigma);

where "ld" stands for "log density". And have it compute up to
a proportion by default. The fact that there
is no "_log" suffix means we won't break backward compatiblity,
because we don't now have functions named normal(), etc.
Everything's always on the log scale --- we're doing computation
here, not theory, so I don't even think we need the non-log form.


CHANGE II
----------------------------------

For versions of the densities that don't drop constants,
we can use:

ld << normal_full(y | mu, sigma);

(I don't love "_full" here, but "_normalized" seems too verbose
and "_norm" seems too terse.)


Now some alternatives:


Alt 1. Named arguments, as in

ld << normal(y | mu, sigma, normalize=true);

or something like that. I'm not a fan --- too verbose.
For an even more R-like experience, we'd postfix the pipe
and wrap it in % signs :-)

normal(y, mu, sigma, propto=TRUE) %>% ld;

Not serious, of course.


Alt 2. Non-named arguments:

ld << normal(y | mu, sigma, true);

I'm going to be Goldilocks here and say this is too terse;
it won't be clear what the "true" is signalling.


Alt 3. Remove "ld" from LHS as there's no alternative:

<< normal(y | mu, sigma);

OK unless we want multiple accumulators. And it looks funny
to a C++ programmer used to seeing "<<" as a binary operator.
For what it's worth, I don't see the point of multiple accumulators,
because I think we'd want to block out different executable chunks
rather than running them all and then just looking at one output variable.


Alt 4. Keep normal_log, but break backward compatibility by
making normal_log() drop constants and then add normal_full_log().
Given that we don't have to break backward compatibility, let's not.

- Bob



Ben Goodrich

Nov 13, 2015, 5:04:28 PM
to stan development mailing list
On Friday, November 13, 2015 at 4:46:53 PM UTC-5, Bob Carpenter wrote:
After the meeting a couple weeks ago, I've been thinking
about syntax.  I think we want two simultaneous changes:

I wasn't at that meeting, but can we consider an alternative that uses a different operator to indicate whether to exclude or include the constants? Like

ld << normal(y | mu, sigma);  // drops constants
ld <<< normal(y | mu, sigma); // keeps constants

I'm not wedded to the <<< but something like that.

Ben

Bob Carpenter

Nov 14, 2015, 1:53:55 AM
to stan...@googlegroups.com
We could do that --- it would mean that we wouldn't
be able to control proportionality using them as functions.

In general, I prefer neat compositional semantics where
each expression can be evaluated independently; right now,
the ~ doesn't do that, but everything else does.

- Bob

Ben Goodrich

Nov 14, 2015, 10:50:15 AM
to stan development mailing list
On Saturday, November 14, 2015 at 1:53:55 AM UTC-5, Bob Carpenter wrote:
We could do that --- it would mean that we wouldn't
be able to control proportionality using them as functions.

Is that a big deal? I would think that if someone calls normal(y | mu, sigma) somewhere in their Stan program that is not in a

ld << normal(y | mu, sigma);

context, then we would need to evaluate it with the constants.

Ben


Marco Inacio

Nov 14, 2015, 11:37:25 AM
to stan...@googlegroups.com
There was a case, when marginalizing out discrete parameters, where I needed propto and it wasn't in that context. I can elaborate if you wish.

Ben Goodrich

Nov 14, 2015, 12:06:53 PM
to stan development mailing list
On Saturday, November 14, 2015 at 11:37:25 AM UTC-5, Marco Inacio wrote:
There was a case, when marginalizing out discrete parameters, where I needed propto and it wasn't in that context. I can elaborate if you wish.

OK. I think I would still prefer some decoration (like a "prime") rather than variations on the function name or arguments. Maybe

foo <- normal(y|mu,sigma); // includes constants
bar <- normal(y|mu,sigma)'; // excludes constants
ld << normal(y|mu,sigma); // does whatever y ~ normal(mu, sigma) does now depending on the template parameter

Ben

Bob Carpenter

Nov 14, 2015, 1:02:53 PM
to stan...@googlegroups.com
I was proposing arbitrary expressions on the RHS of <<
with the effect of adding their value or all the values in
their container to the log density accumulator.

What if someone did this:

ld << count[n] * normal(y[n] | mu, sigma);

Are you thinking that <<< would have scope over everything
on the right-hand side?

I think it's much easier to understand purely compositional
semantics where you can look at an expression in isolation
and figure out what its value is, assuming you know the
value of the variables used in it.

To take a relevant but neutral example, when we write

y ~ normal(mu, sigma);

we don't evaluate "normal(mu, sigma)" and evaluate "y"
and put them together with the operator "~". It's holistic.
But it has confused users who think the right-hand side gets
evaluated as some kind of random-number generator.

In contrast, when we write

increment_log_prob(normal_log(y, mu, sigma));

then the embedded expression "normal_log(y, mu, sigma)" is
evaluated like any other function and then passed to the increment
statement by value. Perfectly compositional.

I was hoping that "<<" could be like increment_log_prob.
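
To be concrete about the semantics I'm after, here's an illustrative
C++ sketch (made-up names, not the real accumulator --- the point is
just that the RHS is evaluated first, by itself):

#include <iostream>

// Illustrative only: "ld << e;" evaluates e like any other expression,
// then adds the resulting value to the accumulator.
struct LogDensityAccum {
  double sum = 0;
  LogDensityAccum& operator<<(double term) {
    sum += term;                // same effect as increment_log_prob(term)
    return *this;
  }
};

int main() {
  LogDensityAccum ld;
  double u = -0.5;
  ld << u << 2 * u;             // arbitrary expressions on the RHS
  std::cout << ld.sum << "\n";  // prints -1.5
}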

If we introduce "<<<" with some notion of controlling how embedded
expressions are evaluated, it gets trickier to doc and explain.

I think Ben's "<<<" suggestion would be workable to implement because
we have the scope when we generate the code for the expressions. But
it also makes all the code generation for expressions context-sensitive.

- Bob

Bob Carpenter

Nov 14, 2015, 1:10:53 PM
to stan...@googlegroups.com

> On Nov 14, 2015, at 12:06 PM, Ben Goodrich <goodri...@gmail.com> wrote:
>
> On Saturday, November 14, 2015 at 11:37:25 AM UTC-5, Marco Inacio wrote:
> There was a case, when marginalizing out discrete parameters, where I needed propto and it wasn't in that context. I can elaborate if you wish.
>
> OK. I think I would still prefer some decoration (like a "prime") rather than variations on the function name or arguments.

I completely agree, but hadn't been thinking along the lines
of symbols.

> foo <- normal(y|mu,sigma); // includes constants
> bar <- normal(y|mu,sigma)'; // excludes constants

I don't like "'" because we use it for transpose
and I'd rather have the decoration on the function name.

How about using "~" as a distribution prefix?

ld << ~normal(y | mu, sigma);

I don't think it'll conflict with anything at all, but
we'd need to limit "~" to built-in distribution functions.

> ld << normal(y|mu,sigma); // does whatever y ~ normal(mu, sigma) does now depending on the template parameter

I want the function to be an expression that can be evaluated
in any context. So whether we use a "~" prefix or not will determine
how that template parameter is instantiated (propto=true vs. propto=false).

- Bob

Ben Goodrich

Nov 14, 2015, 4:47:42 PM
to stan development mailing list
On Saturday, November 14, 2015 at 1:10:53 PM UTC-5, Bob Carpenter wrote:

> On Nov 14, 2015, at 12:06 PM, Ben Goodrich <goodri...@gmail.com> wrote:
>
> On Saturday, November 14, 2015 at 11:37:25 AM UTC-5, Marco Inacio wrote:
> There was a case, when marginalizing out discrete parameters, where I needed propto and it wasn't in that context. I can elaborate if you wish.
>
> OK. I think I would still prefer some decoration (like a "prime") rather than variations on the function name or arguments.

I completely agree, but hadn't been thinking along the lines
of symbols.  

> foo <- normal(y|mu,sigma); // includes constants
> bar <- normal(y|mu,sigma)'; // excludes constants

I don't like "'" because we use it for transpose
and I'd rather have the decoration on the function name.

How about using "~" as a distribution prefix?

I would like ~ if it were not implicated in the current

y ~ normal(mu, sigma);

syntax. If we want people to move away from the latter, it is going to be confusing if we overload the ~.

The same could be said about overloading the ' but since a log-density returns a scalar, I don't know why anyone (who thinks about it) would think it is transposing something. The biggest problem with it is that it is hard to see.

Ben

Michael Betancourt

Nov 14, 2015, 4:51:13 PM
to stan...@googlegroups.com
Firstly, I think we should at least consider (I’m still not sure if it’s good or not)
having the ld suffix on distributions themselves,

target << normal_ld(y | mu, sigma),

to emphasize what we’re adding. Also, ld as the accumulator is a bit odd,
but I can’t think of a better two-letter name off the top of my head. Maybe
“td” for “target distribution”?

And for the normalization, why don’t we just stray a bit from stream notation
and do something like

td <{ normal(y | mu, sigma); // no normalization
td <[ normal(y | mu, sigma); // normalization

It’s not obvious which is which, but they’ll both give the same fit, so the worst
outcome of confusion is just slower performance, not a bad fit.

Ben Goodrich

Nov 15, 2015, 2:24:15 PM
to stan development mailing list
 
Or, unless we already ban the leading underscore

foo <- normal(y | mu, sigma);  // includes constants
foo <- _normal(y | mu, sigma); // excludes constants
ld << normal(y | mu, sigma);   // does whatever y ~ normal(mu, sigma) does now

Ben

Bob Carpenter

Nov 15, 2015, 8:41:14 PM
to stan...@googlegroups.com
There are a bunch of related issues in play here. Sorry
for the epic email. You can cut to the chase at the bottom
for a concrete proposal.


1. Do we continue to mark the probability functions and log scale?
--------------------------------------------------------------

How do we write the function?

normal() : unmarked

normal_log() : marks scale

normal_ld() : marks scale and that it's a density

The normal_log form has the advantage of backward compatibility.

I don't think the advantages of "_ld" outweigh the backward
compatibility issues, but it's possible.


2. How to mark +/- normalizing constants?
------------------------------------------

The marker has to go directly on the function name, because
it changes how the function is evaluated, not what the result
is.

2.a. Prefix or postfix?

Despite my initial suggestion of "~" as a prefix, I think
postfix makes the most sense for readability. I think that's
because word recognition in reading keys strongly off of
prefixes.

2.b. Mark the propto=true or propto=false case?

I think people will take the unmarked case as including
the normalizing constants. But they're rarely needed
and marking them makes almost all of our programs longer, which
in turns makes them harder to read even if they're more
explicit.

2.c. What symbol?

If we exclude the currently used symbols,

', #, ^, &, *, -, +, \, /, ~, ;, ., <, >, {, }, [, ], !, _

that leaves us with the following

@, $, %, `, ?, |

I want to reserve ? for the ternary operator, % if we add
a modulus operator, | for probability notation, and @ for
annotations. And the full-height ones look awfully heavy
as markers.

But it doesn't have to be just one symbol:

normal          normal_ld
normal_q        normal_ld_q
normal_prop     normal_ld_prop
normal_propto   normal_ld_propto
normal_o=       normal_ld_o=

That "q" is BDA notation for an unnormalized density and
"o=" is my ASCII art for the propto symbol, but the = is
a dealbreaker in my opinion.

We could also add an argument, but if we don't name it it'll
just be opaque and whether it comes first or last it'll confuse
the other arguments.

We could also use a template notation like in C++:

normal<true> vs. normal<false>

It'd be easier to ignore the C++ semantics and just borrow
the form:

normal<propto> vs. normal



3. How to mark the syntax for incrementing.
--------------------------------------------

I really don't like the <[ or <{ things. I'm not even
sure why --- probably because they're not familiar operators.
So I think we should go with <<.

3.a. One or two symbols for propto=true vs. false?

I don't think we should use two different symbols for the
proportional or not proportional case because we want to allow
general expressions on the right-hand side and this distinction
isn't a function of the right-hand side value.

3.b. Do we just use << or do we include a target of some kind?

<< normal(mu | y, sigma);

ld << normal(mu | y, sigma);

td << normal(mu | y, sigma);

I don't think we need "td" or "ld" on the left side, but I
could live with it. Even if we intend to add more accumulators
later, we can have a default one that is what we add to now.


4. Do we separate out Jacobians?
---------------------------------

Even with the normalization included, this is still going to confuse
people, because the target density is on the unconstrained scale and
includes the Jacobian for MCMC but not for optimization.

We could change the model interface so that it includes the Jacobian
term broken out from the main log density in the unconstrained
model.

I'm also not sure whether the constants are included in the
Jacobians, or even if there are any.

We can deal with this as a separate issue, though.


PROPOSAL
==================================

I think minimal and backward-compatibility-preserving function
notation is best:

<< normal_log(mu | y, sigma);

For the case that doesn't include constants, I like the version
that looks like an ordinary function and is spelled out:

<< normal_propto_log(mu | y, sigma);

I could live with putting "td" or "ld" on the LHS of "<<".

I could also live with dropping the "_log" or replacing it
with "_ld", though it's more work and more backward-compatiblity
breaking if we ever move it from deprecated to eliminated.

- Bob


Ben Goodrich

Nov 15, 2015, 9:21:34 PM
to stan development mailing list
On Sunday, November 15, 2015 at 8:41:14 PM UTC-5, Bob Carpenter wrote:
I want to reserve ? for the ternary operator, % if we add
a modulus operator

According to the documentation, we already have a modulus operator.

 
PROPOSAL
==================================

I think minimal and backward-compatibility-preserving function
notation is best:

  << normal_log(mu | y, sigma);

You mean

<< normal_log(y | mu, sigma);

right? I think

<< normal(y | mu, sigma);

would be better because some people are going to be confused by the _log suffix. For most users, the fact that everything is evaluated in terms of log-densities is a calculation detail that they don't need to know unless they are doing increment_log_prob().
 
For the case that doesn't include constants, I like the version
that looks like an ordinary function and is spelled out:

  << normal_propto_log(mu | y, sigma);

I would prefer the << to imply that constants are being dropped when the log-density is being incremented during sampling. But if someone calls log_prob(stanfit, theta) from R, then it should do what it does now in terms of having flags for the propto and the Jacobians. But for the same reason I don't really like the _log suffix, most users don't need to mentally process whether the constants are dropped or not when writing their Stan program.
 
I could live with putting "td" or "ld" on the LHS of "<<".

Me too, but I think it is fine just with an empty left-hand side.
 
I could also live with dropping the "_log" or replacing it
with "_ld", though it's more work and more backward-compatiblity
breaking if we ever move it from deprecated to eliminated.

I don't know if _ld is a sufficient improvement over _log that it would justify the backward incompatibility. Regardless, if someone assigns normal_log() or normal_ld() to a symbol as opposed to dumping it into the <<, then it should include the constants unless there is some marking that implies the constants should be dropped. Maybe normal_log<propto>(y | mu, sigma) would be the most explicit.

Ben 

Andrew Gelman

Nov 15, 2015, 10:48:19 PM
to stan...@googlegroups.com
If we’re talking big changes, I’d also like to float the idea of getting rid of the transformed parameters block, combining it with the model, and then when we declare parameters, just have some tag to indicate which intermediate quantities are saved.

Here’s my reasoning. To me, the blocks go like this:

data: Things specified from the outside
transformed data: Computations done exactly once, at the very beginning of the computation
parameters: The space over which the optimization or sampling or inference is performed
model: Computations done once per _step_; what is required to compute the log probability density or objective function
generated quantities: Computations done once per _iteration_

In fact, I wouldn’t mind relabeling the model block as the compute log density block, and I wouldn’t mind relabeling the generated quantities block as the postprocessing block.

But the big thing to me is that the transformed parameters block creates difficulty, for two reasons. First, it can be confusing to remember whether to put a certain computation in the model block or the transformed parameters block. Second, sometimes when I’m playing around with a model, I end up moving computations back and forth between the two blocks. I’d find it much easier to just have all the computations done each step in the model block, and then have some sort of tagging to indicate which intermediate quantities are saved.

Indeed, I’d prefer the terms “intermediate quantities” or “qois” to the term “transformed parameters”, which invites confusion with Stan’s internal transformations and is also a bit misleading, as these quantities can depend on data as well.

A

Bob Carpenter

Nov 15, 2015, 11:25:14 PM
to stan...@googlegroups.com
We should somehow try to centralize this ongoing discussion.

OK, sounds like everyone wants to throw out the old design
and start fresh. This is making me very nervous, as it'll
break everyone's existing model (or, if we are willing to
maintain two designs simultaneously, which is a lot of work
on the doc and code front, then we can keep both and try
to wean them from one to the other).

Andrew, can you mock up what you'd like a model to look like?

* save / no-save markings w/o transformed parameter block

* some new syntax for incrementing log density

I also want to know what you want to call all the blocks --- I
couldn't quite line up what you thought should be called "qoi":

data = ???
transformed data = qoi ???
parameters = qoi ???
X transformed parameters
model = compute log density ???
generated quantities = postprocessing / qoi ???

I don't like that "postprocessing" and "compute log density"
are procedural rather than describing what's in the block.
I'd be OK with "log density" in place of model.

I'm guessing you want something like this:

parameters {
  vector[N] beta_raw;
  @save real mu;
  @save real<lower=0> tau;
  @save real<lower=0> sigma;
}

log density {
  @save vector[N] beta <- mu + tau * beta_raw;
  td << _normal_ld(y | x * beta, sigma);
}

Is the @save thing once and for all? If not, it'll require
a recompile each usage.

- Bob

Ben Goodrich

Nov 15, 2015, 11:46:36 PM
to stan development mailing list
On Sunday, November 15, 2015 at 11:25:14 PM UTC-5, Bob Carpenter wrote:
Is the @save thing once and for all?  If not, it'll require
a recompile each usage.

I think the @save things would need to support conditionals like

@save[DEBUG] vector[K] beta;

where DEBUG is declared in the data block. Otherwise, they would be a step backward from what rstan can currently do just fine. If we get rid of the transformed parameters block, then we give up constraint checking. But people were misusing constraint checking anyway, so we may be better off without it.

Ben


Bob Carpenter

Nov 16, 2015, 2:54:14 AM
to stan...@googlegroups.com
Excellent point on losing constraint checking.

Another issue is which variables are visible in the
generated quantities (or whatever it might be called) block.

I was originally thinking of having the transformed parameter
block include any necessary Jacobians, but never did that.

What does RStan do for saving? Presumably if DEBUG = 0,
that variable wouldn't be saved? I don't like putting all
of this in the program because it's not so much the model
changing as the I/O. So I'd like not to clutter up the model
with this kind of meta-annotation on usage, and I'd like to
not have to recompile if I change my mind about variables to
save.

- Bob

Bob Carpenter

Nov 16, 2015, 3:11:14 AM
to stan...@googlegroups.com
I agree the call from R should have flags for +/- Jacobian
and +/- propto. I don't think that needs to change.

What I'm proposing is that

<< u;

just be shorthand for

increment_log_prob(u);

In particular, we should be able to write

<< log_mix(lambda, lp1, lp2);

or even

<< -x' * x / 2;

In other words, "<<" can't affect how the expression u
is evaluated --- it gets evaluated the same way no matter
what context it shows up in. I think any other behavior would
be confusing.

That would mean if we want

<< normal(y | mu, sigma);

to evaluate without the normalizing constants, then that's what
the function normal(y | mu, sigma) should do.

Then we could have something like

<< normal_(y | mu, sigma);

for the normalized version.

- Bob

Michael Betancourt

Nov 16, 2015, 4:40:24 AM
to stan...@googlegroups.com
If the argument against alternative stream notations is that they’re
awkward, then why is a stream to nothing not also awkward? Pedagogically,
having an anchor that absorbs all of these increments helps to emphasize
that we’re aggregating terms into the log posterior.

If we’re going ASCII art, cx is a much better propto than o=.

The reason why _log is so bad is that it conflicts with the secondary
notation that appends the link function to GLM’d distributions.
poisson_log_log is the devil and must die. At least one of those
notations should change even if it breaks backwards compatibility.
Personally I prefer _ld because it gives us a sneaky way to remind
users that we’re always working with densities, which are not
invariant to reparameterizations.

Ultimately we’re trying to do too much with suffixes here. They can’t
denote the distribution name, the normalization, and the link function
without becoming giant. If we do use a template-like notation to
turn normalization on and off, then I suggest that we make the arguments
named “normal<norm=true>”. It’s more explicit, and given that the default
“normal” will see the most use, the extra typing won’t be too much of a
burden. I actually don’t mind using this for the link functions, too,
“binomial<link=logit>”, but then we’d have to figure out what to do
with link function/distribution pairs that don’t exist.

Michael Betancourt

Nov 16, 2015, 4:52:00 AM
to stan...@googlegroups.com
I am very much pro transformed parameters. Firstly, given that most
users have no idea about our transformations under the hood, I don’t
think there’s any risk of confusion. Secondly, I think that the transformed
parameters block is perfectly suited to all of the auxiliary parameterizations
that are seeing heavy use. We certainly need to improve our discussion
in the manual, but ultimately what we’re doing is not

parameters = target space
model = target distribution

but rather

parameters = auxiliary target space
transformed parameters = completion of target space
model = target distribution

In particular I like how increment_log_prob can’t be called here — Jacobians
arise not because of our parameterization but rather from the incorrect
use of densities. The scope of transformed parameters in both the
model and generated quantities block is also desirable.

I also agree that conditional labeling is bad, but I also think that the
label names shouldn’t have any meaning because, as Bob notes, we
don’t want to conflate I/O with the model specification. Additionally,
I don’t care what RStan does right now because it’s functionality like
this that drifts the interfaces from the API and causes confusion in
user experiences. So ultimately I think the only viable solution would
be something like

real w;
name1 real x;
name1 real y;
name2 real z;

with unlabeled parameters defaulting to “default_label”. Then somehow
we’d have to rearrange all of our model I/O code to take in a label argument
that could be called by the interfaces, although that in and of itself is another giant
rabbit hole.

Ben Goodrich

Nov 16, 2015, 11:14:21 AM
to stan development mailing list
On Monday, November 16, 2015 at 2:54:14 AM UTC-5, Bob Carpenter wrote:
Excellent point on losing constraint checking.

Another issue is which variables are visible in the
generated quantities (or whatever it might be called) block.

I think if we get rid of the transformed parameters block, then we have to put the model block within the scope of generated quantities.
 
What does RStan do for saving?

Currently, you just specify pars = c("alpha", "beta", ...) to save whatever subset of the parameters and transformed parameters blocks you want. And that works just fine and does exactly what users want (PyStan does the same thing). I don't love having an @save[FLAG] syntax in the modeling language because what you choose to save is not really an essential part of the model. I just like that better than an unconditional @save syntax where users would have to recompile the model to change what quantities get saved. This is mostly a problem with an absence of a mechanism to save a subset of the parameters in the API / CmdStan.

Whether to get rid of the transformed parameters block is a separate question and a close call. It has its good points, but those go way over the heads of almost all users, so I can understand why, to them, the division between the transformed parameters and model blocks seems completely arbitrary.

Ben

Bob Carpenter

Nov 16, 2015, 4:59:15 PM
to stan...@googlegroups.com
Daniel also just brought up the issue of what we do with
truncation.

I think the most natural thing to do would be to have it
be an operator that applies to the density, as in:

normal(y | mu, sigma) T[lb, ub];

And we'd want to be able to do that both up to a proportion
and not.
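
Computationally, truncation just renormalizes by the probability mass
in [lb, ub], i.e., subtracts log(F(ub) - F(lb)) on the log scale.
An illustrative C++ sketch (made-up name, not Stan source):

#include <cmath>

// Illustration only: T[lb, ub] renormalizes the density by the
// probability mass the untruncated density puts on [lb, ub].
double std_normal_cdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

double normal_trunc_ld(double y, double mu, double sigma,
                       double lb, double ub) {
  double z = (y - mu) / sigma;
  double lp = -0.5 * z * z - std::log(sigma)
              - 0.5 * std::log(2.0 * M_PI);
  double mass = std_normal_cdf((ub - mu) / sigma)
              - std_normal_cdf((lb - mu) / sigma);
  // the truncation term depends on mu and sigma, so it can't be
  // dropped as a constant when they're parameters
  return lp - std::log(mass);
}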

- Bob

Bob Carpenter

Nov 16, 2015, 9:47:14 PM
to stan...@googlegroups.com

> On Nov 16, 2015, at 4:40 AM, Michael Betancourt <betan...@gmail.com> wrote:
>
> If the argument against alternative stream notations is that they’re
> awkward then why is a stream to nothing not also awkward?

It is. I think it's awkward either way, as is
the existing increment_log_prob(). Given that it's
bad either way, I would choose the shorter notation.

> Pedagogically
> having an anchor that absorbs all of these increments helps to emphasize
> that we’re aggregating terms into the log posterior.

Yes, but it also introduces something that looks like a variable
that isn't really a variable. Either way, it's a bit of a mess
syntactically.

But as I said, I'm not that concerned about this part of the
equation. I'm more concerned that the propto thing gets
attached to the function name.

> If we’re going ASCII art, cx is a much better propto than o=.

Depends on how much you think propto looks like an alpha and
how much you think those ends should be straight :-)

> The reason why _log is so bad is that it conflicts with the secondary
> notation that appends the link function to GLM’d distributions.
> poisson_log_log is the devil and must die.

It's awkward, but then so is poisson_log_ld. Once you start
stacking up suffixes, it's a problem. The real problem is when
you do it on things like negative binomial where there are
two arguments that a transform can apply to.

> At least one of those
> notations should change even if it breaks backwards compatibility.
> Personally I prefer _ld because it gives us a sneaky way to remind
> users that we’re always working with densities, which are not
> invariant to reparameterizations.
>
> Ultimately we’re trying to do too much with suffixes here. They can’t
> denote the distribution name, the normalization, and the link function
> without becoming giant.

I completely agree.

> If we do use a template-like notation to
> turn normalization on and off, then I suggest that we make the arguments
> named “normal<norm=true>”.

I could live with this. Or even with norm+ or norm- or +norm
or -norm. I think we want to make these as short as possible
so that they're still readable. The verbosity's getting out
of hand.

> It’s more explicit, and given that the default
> “normal” will see the most use, the extra typing won’t be too much of a
> burden. I actually don’t mind using this for the link functions, too,
> “binomial<link=logit>”, but then we’d have to figure out what to do
> with link function/distribution pairs that don’t exist.

I'm OK with this, too. Writing bernoulli_logit is awful!

So then we get something like this:

binomial<link=logit, norm=true>(y | alpha);

Presumably this is going to require parsing = as an operator
in there with optional spaces, and allowing the orders to swap as
everyone expects with named arguments.

We'd definitely have to limit values and error check that
during parsing with warnings.

But I like the basic idea of attaching these as clear modifiers
that aren't conflated with the other arguments. That is, I like
the above much much better than

binomial(y | alpha, link="logit")

though a semicolon in there might help:

binomial(y | alpha; link = "logit");
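
And just to pin down what a link modifier would mean computationally,
it's composing the inverse link with the density. A naive illustrative
C++ sketch (made-up name; the fused forms like bernoulli_logit exist
precisely because this naive version isn't numerically stable):

#include <cmath>

// Illustration only: <link=logit> means the location argument
// arrives on the log-odds scale.
double bernoulli_logit_ld(int y, double alpha) {
  double theta = 1.0 / (1.0 + std::exp(-alpha));  // inverse logit
  return y == 1 ? std::log(theta) : std::log(1.0 - theta);
}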

- Bob

Andrew Gelman

Nov 16, 2015, 11:17:46 PM
to stan...@googlegroups.com
Hi, yes, this was the general idea. I just find the transformed parameters thing very confusing. But maybe things will be confusing no matter what, I’m not sure.

Michael Betancourt

Nov 17, 2015, 3:47:30 AM
to stan...@googlegroups.com
Yes, but it also introduces something that looks like a variable
that isn't really a variable.  Either way, it's a bit of a mess
syntactically. 

What about something like lp<< with no space, so there
isn’t this “ghost” variable that people might mistakenly
think that they can use elsewhere?  Or even use the
name of the model block to avoid confusion,

model << 

It's awkward, but then so is poisson_log_ld.  Once you start
stacking up suffixes, it's a problem.  The real problem is when
you do it on things like negative binomial where there are
two arguments that a transform can apply to.

Right now this only comes up for the negative binomial 2, and
we only have one of the possible three transforms (which is
annoying because I have a model that needs the transform
on both).  But I think we can come up with some sufficiently
appropriate naming scheme, like “log1”, “log2”, etc.  At some
point none of this will be obvious enough and users will have
to look up the link definitions in the manual anyways.

I could live with this.  Or even with norm+ or norm- or +norm
or -norm.  I think we want to make these as short as possible
so that they're still readable.  The verbosity's getting out
of hand.

I have to say that I don’t find norm+ or norm- readable at all.

binomial<link=logit, norm=true>(y | alpha);
binomial(y | alpha; link = "logit", norm = "true");

I’m cool with either of these.

Aki Vehtari

Nov 17, 2015, 8:05:42 AM
to stan development mailing list
On Tuesday, November 17, 2015 at 4:47:14 AM UTC+2, Bob Carpenter wrote:
> So then we get something like this:
>
> binomial<link=logit, norm=true>(y | alpha);

+1

Aki

Ben Goodrich

Nov 17, 2015, 1:11:40 PM
to stan development mailing list

I think this is fine too.

Ben Goodrich

Nov 17, 2015, 11:27:21 PM
to stan development mailing list

Although I would add that I think norm should default to false.

Bob Carpenter

Nov 18, 2015, 5:16:25 PM
to stan...@googlegroups.com
Norm defaulting to true or false is a big deal. Right now,
it is schizophrenic, with norm=true (propto=false)
for functions and norm=false for sampling notation.

Default to true: I think this is what people expect
from the function.

Default to false: This is what most people want most
of the time and is more efficient.

Me: OK either way
Ben: default to false

Any other opinions?

- Bob

Ben Goodrich

Nov 18, 2015, 5:33:06 PM
to stan development mailing list
On Wednesday, November 18, 2015 at 5:16:25 PM UTC-5, Bob Carpenter wrote:
Me: OK either way
Ben: default to false

Any other opinions?

Maybe a parser warning if a density function is called in an assignment statement without norm = true.

Ben

Michael Betancourt

Nov 18, 2015, 5:36:11 PM
to stan...@googlegroups.com
Default to false — normalization usually doesn’t matter,
and when it does it’s not a bad idea for the user to make
the effort to specify that they want it.

Aki Vehtari

Nov 19, 2015, 3:07:34 AM
to stan development mailing list
I don't know how I would like the norm=true/false to be, but here's
an example where both a sampling statement and a function call are used. When computing LOO (or WAIC) we now have, for example,

model {
  y ~ normal(X * b);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N)
    log_lik[n] <- normal_log(y[n], X[n] * b);
}

- here the "schizophrenic" behavior works as desired, although I guess that most users don't even know that "y ~ normal" drops the constant.
- when computing log_lik in generated quantities, it's nice that normal_log has the _log to remind us that it returns a log density (_lp would be fine, too)
- it would be nice if the log_lik computation could be vectorized. Now _log functions default to computing the sum of the log densities, but maybe we could have an option for that too?

Maybe the code could look like this

model {
  << normal_lp(y | X * b);
}
generated quantities {
  vector[N] log_lik;
  log_lik <- normal_lp<norm=true,vec=true>(y, X * b);
}

I think this would be clearer than the current code (more consistent use of normal_lp).

Aki

Aki Vehtari

Nov 19, 2015, 4:24:06 AM
to stan development mailing list
> generated quantities {
> vector[N] log_lik;
> log_lik <- normal_lp<norm=true,vec=true>(y, X * b);
> }

or

log_lik <- normal_lp<norm=true,sum=false>(y, X * b);

Aki

Bob Carpenter

Nov 19, 2015, 2:58:29 PM
to stan...@googlegroups.com

> On Nov 19, 2015, at 3:07 AM, Aki Vehtari <aki.v...@aalto.fi> wrote:
>
> ...
> Maybe the code could look like this

Edited to add sigma, make | consistent, add logp on left
of << and replace _lp with _ld:

> model {
> logp << normal_ld(y | X * b, sigma);
> }
> generated quantities {
> vector[N] log_lik;
> log_lik <- normal_ld<norm=true, vec=true>(y | X * b, sigma);
> }
>
> I think this would be clearer than the current code (more consistent use of normal_lp).

That's what we're aiming for.

And I happen to be at this very minute working on vectorization
for all of the functions in Stan. So we might be able to do it that
way without too much effort. But I think we'll probably take it in
stages as the vectorization is a new feature that could be added
on top of changing the existing behavior. I opened an issue for vectorized
output, assigned it to myself, and linked it to the vectorization issue so I don't forget:

https://github.com/stan-dev/stan/issues/1697
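
For reference, here's the kind of thing the vectorized output version
would compute, sketched in C++ with made-up names (the actual design
is what the issue is for):

#include <cmath>
#include <cstddef>
#include <vector>

// Made-up name and signature, purely to illustrate vec=true:
// one log density per observation instead of their sum.
std::vector<double> normal_ld_vec(const std::vector<double>& y,
                                  double mu, double sigma) {
  std::vector<double> out(y.size());
  for (std::size_t n = 0; n < y.size(); ++n) {
    double z = (y[n] - mu) / sigma;
    out[n] = -0.5 * z * z - std::log(sigma)
             - 0.5 * std::log(2.0 * M_PI);
  }
  return out;  // plays the role of log_lik in Aki's example
}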

- Bob