# Ordered Vectors, Prior Distributions, and Efficient Reparametrization


### Rick Farouni

Jan 21, 2015, 11:27:59 PM
Hi,

Let's say I want to estimate K threshold vectors for an ordered logistic model and I would like to estimate them hierarchically. My questions are the following:

1) If I re-parameterize the Stan model to make it run more efficiently, which of the three parameter vectors (tau_raw[K], mu_tau_unif, tau[K]) in the reparameterized model do I need to specify as an ordered vector?

2) Is there a difference between the two versions of the model with respect to how Stan's model compiler transforms the constrained ordered vectors into unconstrained vectors before sampling?

3) For the sake of convenience, I put a normal prior on tau, but am I allowed to put such a prior on a set of ordered vectors? Does Stan reject samples that do not meet the constraint?

Here is a stripped-down version of the model:

```stan
data {
  int<lower=0> K;               // number of items
  int<lower=0> N;               // number of responses
  int<lower=1,upper=K> kk[N];   // item index for response n
  int<lower=0,upper=1> y[N];    // data vector
  int<lower=2> C;               // number of response categories (e.g. 5)
}
parameters {
  ordered[C-1] tau[K];            // K threshold vectors, each of dimension C-1
  vector[C-1] mu_tau;             // mean vector for thresholds
  vector<lower=0>[C-1] sigma_tau; // scale
}
model {
  // hyperpriors
  mu_tau ~ cauchy(0, 5);
  sigma_tau ~ cauchy(0, 3);
  // prior
  for (k in 1:K)
    for (i in 1:(C-1))
      tau[k][i] ~ normal(mu_tau[i], sigma_tau[i]);
  // likelihood
  for (n in 1:N)
    y[n] ~ ordered_logistic(....., tau[kk[n]]);
}
```

And here is the re-parameterized model:

```stan
parameters {
  ordered[C-1] tau_raw[K];                              // K threshold vectors, each of dimension C-1
  ordered<lower=-pi()/2,upper=pi()/2>[C-1] mu_tau_unif; // mean vector for thresholds
  vector<lower=0,upper=pi()/2>[C-1] sigma_tau_unif;     // scale
}
transformed parameters {
  ordered[C-1] tau[K];              // K threshold vectors, each of dimension C-1
  ordered[C-1] mu_tau;              // mean vector for thresholds
  vector<lower=0>[C-1] sigma_tau;   // scale
  mu_tau <- 5 * tan(mu_tau_unif);       // reparameterization: mu_tau ~ cauchy(0, 5)
  sigma_tau <- 3 * tan(sigma_tau_unif); // reparameterization: sigma_tau ~ half-cauchy(0, 3)
  for (k in 1:K)
    tau[k] <- mu_tau + sigma_tau .* tau_raw[k]; // reparameterization: tau[k] ~ normal(mu_tau, sigma_tau)
}
model {
  /* implicit hyperpriors:
   *   mu_tau_unif ~ uniform(-pi()/2, pi()/2) implies mu_tau ~ cauchy(0, 5)
   *   sigma_tau_unif ~ uniform(0, pi()/2) implies sigma_tau ~ half-cauchy(0, 3)
   */
  // prior
  for (k in 1:K)
    tau_raw[k] ~ normal(0, 1); // implies the hierarchical prior tau[k] ~ multi_normal(mu_tau, diag(sigma_tau))
  // likelihood
  for (n in 1:N)
    y[n] ~ ordered_logistic(....., tau[kk[n]]);
}
```
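As a sanity check on the tan trick used above (a NumPy sketch, not part of the Stan model): if u is uniform on (-pi/2, pi/2), then scale * tan(u) has a Cauchy(0, scale) distribution, which is what the transformed parameters block relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

# u ~ uniform(-pi/2, pi/2), so 5 * tan(u) ~ cauchy(0, 5).
u = rng.uniform(-np.pi / 2, np.pi / 2, size=1_000_000)
x = 5.0 * np.tan(u)

# The quartiles of cauchy(0, 5) sit at -5 and +5, since tan(+/- pi/4) = +/- 1.
q1, q3 = np.quantile(x, [0.25, 0.75])
print(round(q1, 2), round(q3, 2))
```

With a million draws, the empirical quartiles land very close to -5 and +5.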

### Bob Carpenter

Jan 22, 2015, 2:39:26 PM

> On Jan 21, 2015, at 11:27 PM, Rick Farouni <rfar...@gmail.com> wrote:
>
> Hi,
>
> Let's say I want to estimate K threshold vectors for an ordered logistic model and I would like to estimate them hierarchically. My questions are the following:
>
> 1) If I re-parameterize the Stan model to make it run more efficiently, which of the three parameter vectors (tau_raw[K], mu_tau_unif, tau[K]) in the reparameterized model do I need to specify as an ordered vector?

tau --- that's the one that's the parameter to the ordered_logistic().

I'm not sure the reparameterization will be more efficient, though you
can always test it. Some others on the list might have a better idea
in principle.

> 2) Is there a difference between the two versions of the model with respect to how Stan's model compiler transforms the constrained ordered vectors into unconstrained vectors before sampling?

Yes. There's a chapter in the manual that explains the transforms
in detail, but the general idea is that lower bounds lead to log transforms
(exp inverse transforms in practice) and lower-and-upper bounds lead
to logit transforms (inverse logit in practice).
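For concreteness, here is a sketch of those two inverse transforms (illustrative Python, with function names of my own choosing, not Stan's internals):

```python
import math

def lower_bound_constrain(y, lb):
    # Unconstrained y in R maps to x in (lb, inf) via x = lb + exp(y);
    # the inverse is the log transform y = log(x - lb).
    return lb + math.exp(y)

def interval_constrain(y, lb, ub):
    # Unconstrained y in R maps to x in (lb, ub) via the inverse logit;
    # the inverse is the logit transform y = logit((x - lb) / (ub - lb)).
    return lb + (ub - lb) / (1.0 + math.exp(-y))

# Round trips recover the unconstrained value:
x = lower_bound_constrain(1.3, 0.0)
assert abs(math.log(x - 0.0) - 1.3) < 1e-12
x = interval_constrain(-0.7, -1.0, 1.0)
assert abs(math.log((x + 1.0) / (1.0 - x)) - (-0.7)) < 1e-12
```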

> 3) For the sake of convenience, I put a normal prior on tau, but am I allowed to put such a prior on a set of ordered vectors?

Yes, you can do that. Like every other sampling statement,
it just adds terms to the log density. It basically winds
up providing a similar prior to having a bunch of independent
normals that get sorted (though that's not going on under the hood).

> Does Stan reject samples that do not meet the constraint?

Not quite. Because tau is a parameter declared as ordered[C-1],
Stan will use (C - 1) unconstrained parameters, all but the first
of which are log transformed differences from the previous parameter.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.

### Rick Farouni

Jan 22, 2015, 2:55:42 PM

On Thursday, January 22, 2015 at 2:39:26 PM UTC-5, Bob Carpenter wrote:

> > On Jan 21, 2015, at 11:27 PM, Rick Farouni <rfar...@gmail.com> wrote:
> >
> > Hi,
> >
> > Let's say I want to estimate K threshold vectors for an ordered logistic model and I would like to estimate them hierarchically. My questions are the following:
> >
> > 1) If I re-parameterize the Stan model to make it run more efficiently, which of the three parameter vectors (tau_raw[K], mu_tau_unif, tau[K]) in the reparameterized model do I need to specify as an ordered vector?
>
> tau --- that's the one that's the parameter to the ordered_logistic().

But in the FAQ, Ben Goodrich says that "the restrictions on the support of a transformed parameter in the transformed parameters {} block do not affect the sampling because Stan samples from the space of the parameters". In the reparameterized model, tau is in the transformed parameters block.

> I'm not sure the reparameterization will be more efficient, though you
> can always test it. Some others on the list might have a better idea
> in principle.
>
> > 2) Is there a difference between the two versions of the model with respect to how Stan's model compiler transforms the constrained ordered vectors into unconstrained vectors before sampling?
>
> Yes. There's a chapter in the manual that explains the transforms
> in detail, but the general idea is that lower bounds lead to log transforms
> (exp inverse transforms in practice) and lower-and-upper bounds lead
> to logit transforms (inverse logit in practice).
>
> > 3) For the sake of convenience, I put a normal prior on tau, but am I allowed to put such a prior on a set of ordered vectors?
>
> Yes, you can do that. Like every other sampling statement,
> it just adds terms to the log density. It basically winds
> up providing a similar prior to having a bunch of independent
> normals that get sorted (though that's not going on under the hood).

Is there a reference out there that can explain what goes on under the hood?

### Bob Carpenter

Jan 22, 2015, 3:10:25 PM

> On Jan 22, 2015, at 2:55 PM, Rick Farouni <rfar...@gmail.com> wrote:
>
>
>
> On Thursday, January 22, 2015 at 2:39:26 PM UTC-5, Bob Carpenter wrote:
>
> > > On Jan 21, 2015, at 11:27 PM, Rick Farouni <rfar...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Let's say I want to estimate K threshold vectors for an ordered logistic model and I would like to estimate them hierarchically. My questions are the following:
> > >
> > > 1) If I re-parameterize the Stan model to make it run more efficiently, which of the three parameter vectors (tau_raw[K], mu_tau_unif, tau[K]) in the reparameterized model do I need to specify as an ordered vector?
> >
> > tau --- that's the one that's the parameter to the ordered_logistic().
>
> But in the FAQ, Ben Goodrich says that "the restrictions on the support of a transformed parameter in the transformed parameters {} block do not affect the sampling because Stan samples from the space of the parameters". In the reparameterized model, tau is in the transformed parameters block.

It doesn't technically need to be declared as ordered if it's
in the transformed parameters block --- the ordered declaration will
just do error checking after the transformed params are defined.

It's then up to you to make sure that the transformed parameter really
is ordered, or you'll get rejections. You don't want rejections of
this kind, as they can cause the sampler to devolve into an inefficient
random walk. You want support over all the legal *parameter*
values --- that is, any value of the parameters satisfying the declared
constraints should not be rejected elsewhere in the model.

> I'm not sure the reparameterization will be more efficient, though you
> can always test it. Some others on the list might have a better idea
> in principle.
>
> > 2) Is there a difference between the two versions of the model with respect to how the Stan's model compiler transforms the constrained ordered vectors into an unconstrained vectors before sampling?
>
> Yes. There's a chapter in the manual that explains the transforms
> in detail, but the general idea is that lower-bounds lead to log transforms
> (exp inverse transforms in practice) and lower and upper bounds lead
> to logit (inverse logit in practice) transforms.

I guess I should come to grips with that. I knew it was
in there. The efficiency of some of these transforms can depend
on how much data you have and how much that constrains the posterior.

> > 3) For the sake of convenience, I put a normal prior of tau, but am I allowed to put such a prior on set of ordered vectors?
>
> Yes, you can do that. Like every other sampling statement,
> it just adds terms to the log density. It basically winds
> up providing a similar prior to having a bunch of independent
> normals that get sorted (though that's not going on under the hood).
>
> Is there a reference out there that can explain what goes on under the hood?

The manual, but I can summarize. Every sampling statement:

y ~ foo(theta);

increments the log density by log foo(y|theta). Any constrained
parameter increments the log density with the log Jacobian of the
inverse transform. The transforms we use are all documented.

So if you look at an ordering

ordered[K] c;

then the unconstrained parameters

c1, ..., cK

map to the constrained values

c1, c1 + exp(c2), c1 + exp(c2) + exp(c3), ...

The log Jacobian of the inverse transform is just (c2 + ... + cK).
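The transform above can be sketched in a few lines (illustrative Python, not Stan's actual C++):

```python
import numpy as np

def ordered_constrain(y):
    # Unconstrained y maps to a strictly increasing vector:
    # x[0] = y[0], and x[k] = x[k-1] + exp(y[k]) for k >= 1.
    x = np.empty(len(y))
    x[0] = y[0]
    for k in range(1, len(y)):
        x[k] = x[k - 1] + np.exp(y[k])
    return x

def ordered_log_jacobian(y):
    # log |det J| of the inverse transform: just y[1] + ... + y[K-1].
    return float(np.sum(y[1:]))

y = np.array([0.5, -1.0, 2.0])
x = ordered_constrain(y)
assert np.all(np.diff(x) > 0)  # strictly increasing by construction
```

Whatever the unconstrained values, the result is ordered, and the log Jacobian term is what gets added to the log density.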

You can also increment the log density directly using increment_log_prob.

We then just sample from whatever posterior is defined by the density.

- Bob