Correct use of truncated distributions


terrance savitsky

Sep 1, 2012, 9:51:11 PM
to stan-...@googlegroups.com
Hi, I have a parameter matrix, t[i,j], declared as unrestricted (no range constraint) in the parameters block, and I override the implicit uniform priors by specifying a set of truncated normal distributions for t[i,j] in the model block (with the lower and upper truncation points differing for each (i,j) combination).   Will the specification of a truncated distribution in the model block automatically perform a Jacobian adjustment to lp__ for HMC sampling, associated with transforming the parameters t[i,j] between restricted and unrestricted spaces, or do I need to include the Jacobian manually (as shown on page 164 of the users manual)?    This question has broad applicability, as the above case is probably the most common use of truncated distributions.

Thanks, Terrance.

--
Thank you, Terrance Savitsky

Jiqiang Guo

Sep 1, 2012, 10:40:41 PM
to stan-...@googlegroups.com
Hi, 

You need not add the Jacobian to lp__ if you are putting distributions directly on your parameters. 

However, the problem in your case is that you have an unrestricted parameter but later specify a restricted support for it.  This is problematic in Stan.  It is like the following simple *problematic* example.

parameters {
  real y;
}

model {
  y ~ uniform(0, 1); // problem: y is declared with support of the whole real line. 
}
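For a bounded scalar, the fix is to declare the support in the parameters block, so that Stan handles the transform to the unconstrained space and the Jacobian adjustment automatically (a minimal corrected version of the example above):

parameters {
  real<lower=0,upper=1> y;  // support now matches the distribution below
}

model {
  y ~ uniform(0, 1); // now consistent with the declared support (and redundant)
}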

--
Jiqiang 

terrance savitsky

Sep 2, 2012, 2:06:22 PM
to stan-...@googlegroups.com
Hi Jiqiang,  Thanks for your response, but it leaves me confused.   You comment exactly on the issue I am trying to resolve.   Stan doesn't allow element-by-element setting of restrictions for a parameter matrix, t[i,j], so the user is forced to declare it as unrestricted.  Are you telling me that Stan is not able to manage a situation where t[i,j] is declared as unrestricted in the parameters block with a set of truncated normal distributions specified in the model block?  What I'm really asking is how to handle, in Stan, a parameter matrix on which I want to impose a truncated prior.  I'm interpreting your answer to mean that it is not possible to specify a truncated distribution (in the model block) for a parameter matrix whose declaration would require distinct restriction bounds on each element.

Terrance

Marcus Brubaker

Sep 2, 2012, 2:32:49 PM
to stan-...@googlegroups.com
Currently Stan cannot declare arrays where the element bounds are
different for different elements. This can be worked around though.
For instance

data {
  real y_lower[5];
  real y_upper[5];
}

parameters {
  real<lower=0,upper=1> base_y[5];
}

transformed parameters {
  real y[5];
  for (i in 1:5)
    y[i] <- y_lower[i] + (y_upper[i] - y_lower[i]) * base_y[i];
}

The only thing to be careful about is that if the upper and lower
bounds are also parameters and you want a uniform prior over the
values of y, then you need to declare the uniform prior explicitly;
it will not be handled implicitly in the right way without the
explicit declaration.
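For that case, the explicit declaration would look something like this
(a sketch, with y, y_lower, and y_upper as defined above):

model {
  for (i in 1:5)
    y[i] ~ uniform(y_lower[i], y_upper[i]); // explicit, not implied
}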

Cheers,
Marcus

Ben Goodrich

Sep 2, 2012, 2:34:39 PM
to stan-...@googlegroups.com
If I understand you correctly, I think the most straightforward way to proceed would be to specify bounded parameters individually (or in blocks of parameters that have the same bounds) and if matrix algebra is necessary, insert the bounded parameters into a matrix or vector in the transformed parameters block. Then, you could specify priors (truncated or not) on the individual parameters, which obviates the need for a prior on the vector or matrix in the transformed parameters block and does not require the user to adjust lp__ with a log Jacobian determinant. For example, something like

parameters {
  real<lower=0,upper=1> alpha;
  real<lower=-1,upper=1> beta;
  real<lower=-1,upper=0> gamma;
}
transformed parameters {
  vector[3] vec;
  vec[1] <- alpha;
  vec[2] <- beta;
  vec[3] <- gamma;
}
model {
  real sum_of_squares;
  sum_of_squares <- vec' * vec;
  // priors on alpha, beta, and gamma
  // some function of sum_of_squares
}
}

Ben

terrance savitsky

Sep 2, 2012, 3:06:40 PM
to stan-...@googlegroups.com
Hi Marcus, Ben and Jiqiang,   Thank you for your quick and helpful responses and for your work to deliver Stan.  The approach Marcus suggests may work, as it scales with the size of the parameter matrix.  For example, I have something like a 20,000 x 25 parameter matrix whose values are restricted according to a data matrix of the same size and another set of ordered parameters.  The problem for me is that I would also like more distributional control (for identifiability and other reasons) over the resulting distribution on y[i] in Marcus' example.  I could replace the uniform on base_y with a more general beta distribution (or a mixture of beta distributions), and that might do it, but it is an unorthodox approach (from the standpoint of producing publishable work).

My application is to build a data-augmentation generating model for observed ordinal outcomes, where a latent continuous response is restricted according to the observed ordered outcomes and a set of cutpoints.  If I employ a mixture model for the latent continuous response, then I can fix the cutpoints.   The vanilla implementation treats the cutpoints as random.  Stan does offer a solution for such problems that makes all of this go away: the ordered_logistic distribution.  The problem is that one is then locked into a logistic distribution for the inverse link function.   The computational generality and power of Stan (vs. BUGS implementations) seems best deployed on more general and challenging models, however.

Terrance.

Marcus Brubaker

Sep 2, 2012, 3:40:41 PM
to stan-...@googlegroups.com
You can use any distribution on y[i], including different ones for
different elements. I just picked one as an example.

Also, you can use an ordered type for the cutpoints without using the
ordered_logistic distribution.

Cheers,
Marcus

terrance savitsky

Sep 2, 2012, 4:11:00 PM
to stan-...@googlegroups.com
Hi Marcus, Just to be sure I understand, I'll repeat what you said.  Using your example, I can directly place a prior on y[i] in the model block, and that will override the prior implied through the uniform on the parameters base_y[i].  Is that correct?  Yes, it is a nice feature to be able to declare cutpoints as ordered.  The issue, though, focuses on imposing a set of proper prior specifications on the latent restricted real responses in a way that works for Stan.  I think you have shown me an approach.

terrance savitsky

Sep 2, 2012, 10:23:50 PM
to stan-...@googlegroups.com
So, to accomplish a range-restricted prior for the y[i], must I place the prior directly on the base_y[i] (where the prior is restricted to [0,1] support)?  Since the y[i] are declared as unrestricted, I cannot place restricted/truncated priors directly on them, can I?




Marcus Brubaker

Sep 2, 2012, 10:54:53 PM
to stan-...@googlegroups.com
No, you can put whatever prior you want on the ys in the model block
(truncated or otherwise). Though they are declared as unrestricted,
they are restricted in effect (because of their definition), so
placing a truncated prior on them will be fine. Note there are three
distinct but related issues at play here.

1. Ensuring that parameters lie in a proper range. This can be done
by declaring them to have a proper range or by constructing them in
that way.
2. Placing the correct prior on the parameters. You can put any prior
you want on any parameter in the model block; however, if you
construct the parameters yourself (as I did in the example I sent),
then you can't fully rely on the implicit prior behaviour and should
declare the prior explicitly.
3. With truncated distributions in Stan, if the range of the parameters
is not constrained to lie in the truncated range, then the sampler is
likely to be very inefficient.
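For instance, a truncated normal prior on the constructed y from my
earlier example might look like this (a sketch, with mu and sigma
standing in for whatever hyperparameters you actually use):

model {
  for (i in 1:5)
    y[i] ~ normal(mu, sigma) T[y_lower[i], y_upper[i]];
}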

Cheers,
Marcus

terrance savitsky

Sep 19, 2012, 9:29:13 PM
to stan-...@googlegroups.com
Hi Marcus, You provided a trick (below) to allow separate restricted intervals for the elements of a vector or matrix, "y", on which a truncated (or restricted-range) prior is then imposed.   You point out that it may be wise to state an explicit prior on "y" in order to have a known distributional behavior, given that the trick is based on employing multiple parameter sets.  You also note that, for sampler efficiency, the truncation ranges on the prior should match the restricted values for "y" set via your trick.   I'm confused about why this set-up produces a correct model.  While the range of each element of "y" is restricted to the correct range by your construction, "y" is still declared as unrestricted.   So how does Stan know to perform the transformation to unrestricted, with the accompanying Jacobian adjustment, for HMC sampling?  Is it the truncated prior that communicates the need to transform?  If so, then it seems like it would be correct to simply declare "y" as unrestricted (but initialize with values in the correct truncated range) and just employ a restricted distribution.   The issue would then be one of sampling efficiency, though the result would still be correct, right?


Bob Carpenter

Sep 19, 2012, 10:13:35 PM
to stan-...@googlegroups.com


On 9/19/12 9:29 PM, terrance savitsky wrote:
> Hi Marcus, ... You also note that the truncation ranges on the prior should match the restricted values for
> "y" set via your trick for sampler efficiency. I'm confused by why this set-up works to produce a correct model.
> While the range for each element of "y" is restricted to the correct range by using your algorithm, "y" is still
> declared as unrestricted. So how does Stan know to perform the transformation to unrestricted - with the accompanying
> Jacobian adjustment - for HMC sampling?

The parameters that HMC/NUTS sees are the unconstrained
versions of variables declared in the parameters block.

So declaring a 5-array of [0,1]-bounded parameters,
as in your example

> parameters {
> real<lower=0,upper=1> base_y[5];
> }

uses logit() to unconstrain to (-infinity,infinity).
What happens in the code is that the parameters are always
kept in the unconstrained space by the sampler and then
inverse transformed to their constrained variants,
here with inv_logit(). This means the log absolute
Jacobian for the n-th parameter is log(inv_logit'(y[n])),
and that is what gets added to the log probability
accumulator lp__ implicitly before the user code
executes over the constrained parameter which it
sees as base_y[n].
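In effect, for a single (0,1)-bounded parameter the sampler works
with something like the following hand-written version (a sketch of
what Stan does internally; you never write this yourself):

parameters {
  real base_y_raw;                 // what the sampler actually sees
}
transformed parameters {
  real base_y;
  base_y <- inv_logit(base_y_raw); // inverse transform back to (0,1)
}
model {
  // log absolute Jacobian of inv_logit: log(base_y) + log(1 - base_y),
  // added to lp__ automatically when the bounds are declared
  lp__ <- lp__ + log(base_y) + log(1 - base_y);
}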

After that, setting the base_y[n] from the array into
a vector doesn't require a transform.

All of this clunkiness is because I haven't yet
implemented lower- and upper-bound constraints for
vector parameters. But they're on the to-do list,
as I can see they're going to be more important for
vectorized operations than I'd originally anticipated.

> Is it the truncated prior that communicates the need to transform?

No. Although the probability functions do check their
domains and throw exceptions if inputs are out of support.
That's why declarations with constraints are important.

> If so, then
> it seems like it would be correct to simply declare "y" as unrestricted (but initialize with values in the correct
> truncated range) and just employ a restricted distribution. The issue would then be one of sampling efficiency, though
> the result would still be correct, right?

No, see above.

- Bob
