implementing specific improper priors in STAN


Mark Leeds

Apr 21, 2013, 10:51:25 PM
to stan-...@googlegroups.com

Hi everyone: I looked in the STAN manual but I couldn't find an example of what I'm trying to do so that's why I'm asking here. Now that I understand the problem more clearly, ( adl_gibbs2.pdf was a real eye opener ), the fix they recommend for dealing with the ridge at lambda = 1 is to set the following priors for the parameters.

The prior on lambda should be proportional to (1-lambda)^2. ( Originally it's improper uniform, but that results in the ridge at lambda = 1. )
The prior on sigmasquared should be proportional to 1/sigmasquared. ( The log of sigmasquared is improper uniform, same as always. )
The prior on beta should be ~ k, some constant, i.e. an improper uniform prior, same as always.

The revised prior on lambda should eliminate the ridge in the posterior density for (beta, lambda). ( See adl_gibbs2.pdf for details. )

But is there a way to implement the above in STAN? I remember reading somewhere in the manual or in Ben's posts that if you don't specify priors, then they will be improper and flat, but

A) Will STAN automatically know to set the flat improper prior on log(sigmasquared) and not on sigmasquared? ( Box and Tiao explain why it should be on log sigmasquared. )

B) Is it possible to set the prior on lambda to be proportional to (1-lambda)^2? I realize that I can set the prior to a density with a parameter (1-lambda^2), but then the mathematical description of the
cancelling out of the ridge in adl_gibbs2.pdf will no longer necessarily hold. So I'd prefer it to just be proportional to (1-lambda)^2.

I tried the code below but STAN was not happy about not seeing a density.  I get

DIAGNOSTIC(S) FROM PARSER:
Parser expecting: <distribution and parameters>

Thanks.

R CODE: koyck_paper.R

#=================================================================================

library("rstan");

set.seed(123)
T <- 200;
x <- rnorm(T,0,1);

#=========================================
# NOT OKAY : 0.3
#=========================================
beta <- 1.0;
lambda <- 0.95
sigma <- 1.0
#=========================================
# NOT OKAY 1.6
#=========================================
#beta <- 2.6;
#lambda <- 0.95
#sigma <- 1.0
#=========================================
# NOT OKAY : -0.1
#=========================================
#beta <- 0.5;
#lambda <- 0.95
#sigma <- 1.0
#====================================
# NOT OKAY: 0.6
#====================================
#beta <- 1.0;
#lambda <- 0.90
#sigma <- 1.0
#====================================
# NOT GREAT : 1.8
#========================================
#beta <- 2.6;
#lambda <- 0.90
#sigma <- 1.0
#========================================
# NOT SO GOOD : 0.2
#========================================
#beta <- 0.5;
#lambda <- 0.90
#sigma <- 1.0
#========================================
#NOT BAD: 0.8
#=========================================
#beta <- 1.0;
#lambda <- 0.85
#sigma <- 1.0
#=========================================
#SO SO : 2.0
#=========================================
#beta <- 2.6;
#lambda <- 0.85
#sigma <- 1.0
#=========================================
#SO SO : 0.3
#=========================================
#beta <- 0.5;
#lambda <- 0.85
#sigma <- 1.0
#====================================
# SO SO:  0.7
#====================================
#beta <- 1.0;
#lambda <- 0.80
#sigma <- 1.0
#====================================
# DECENT: 2.2
#====================================
#beta <- 2.6;
#lambda <- 0.80
#sigma <- 1.0
#====================================
#SO SO : 0.3
#====================================
#beta <- 0.5;
#lambda <- 0.80
#sigma <- 1.0
#====================================
# OKAY:  0.8
#====================================
#beta <- 1.0;
#lambda <- 0.75
#sigma <- 1.0
#====================================
# GOOD: 2.7
#====================================
#beta <- 2.6;
#lambda <- 0.75
#sigma <- 1.0
#====================================
# STILL SO SO : 0.3
#====================================
#beta <- 0.5;
#lambda <- 0.75
#sigma <- 1.0
#====================================
# OKAY:  0.8
#====================================
#beta <- 1.0;
#lambda <- 0.70
#sigma <- 1.0
#====================================
# GOOD: 2.4
#====================================
#beta <- 2.6;
#lambda <- 0.70
#sigma <- 1.0
#====================================
# GOOD: 0.4
#====================================
#beta <- 0.5;
#lambda <- 0.70
#sigma <- 1.0
#===========================================

y <- rep(NA,T);

y[1] <- rnorm(1,beta * (1 - lambda) * x[1],sigma);

for (t in 2:T)
  y[t] <- rnorm(1, beta * (1 - lambda) * x[t] + lambda * y[t-1], sigma);

fit <- stan(file="koyck_paper.stan", data=c("T","y","x"), iter=1000, chains=4);
print(fit)


# STAN CODE
#====================================================================

data {
  int<lower=0> T; // number of time points
  real y[T]; // output at time t
  real x[T]; // predictor for time t
}
parameters {
  real beta1mlambda; // equals beta * (1-lambda)
  real <lower=0, upper=1> lambda; // lag
  real <lower=0> sigma; // noise scale
}

model {
  lambda ~ (1-lambda)^2;     // fix according to adl_gibbs.pdf
  y[1] ~  normal(beta1mlambda * x[1], sigma);
  for (t in 2:T)
    y[t] ~ normal( beta1mlambda * x[t] + lambda * y[t-1],sigma);
}

generated quantities {
  real beta;
  beta <-  beta1mlambda / (1 - lambda);
}

#==========================================================================================================

Bob Carpenter

Apr 21, 2013, 11:31:44 PM
to stan-...@googlegroups.com
The chapter on implementing your own distributions
covers how to do this.


On 4/21/13 10:51 PM, Mark Leeds wrote:
>
> Hi everyone: I looked in the STAN manual but I couldn't find an example of what I'm trying to do so that's why I'm
> asking here. Now that I understand the problem more clearly, ( adl_gibbs2.pdf was a real eye opener ), the fix they
> recommend for dealing with the ridge at lambda = 1 is to set the following priors for the parameters.
>
> lambda should be set to be proportional to (1-lambda)^2. ( originally, it's improper uniform but this results in the
> ridge at lambda = 1 ).

If this is what you want, you just add the log probability
up to a proportion that doesn't depend on params, so that's just:

lp__ <- lp__ + 2 * log1m(lambda);

Note that log1m(x) = log(1 - x), but is more arithmetically stable.

> sigmasquared should be proportional to 1/sigmasquared ( log of sigmasquared is improper uniform. same as always ).

If the parameter is sigmasquared, that's

lp__ <- lp__ - log(sigmasquared);

> beta should be ~ k ( some constant. i.e: improper uniform prior. same as always ).

You don't need to do anything for this one.

> The revised prior on lambda should eliminate the ridge in the posterior density for (beta,lambda). ( see
> adl_gibbs2.pdf for details ).
>
> But is there a way to implement the above in STAN. I remember reading somewhere in the manual or in Ben's posts that if
> you don't specify priors, then they will be improper and flat but

"Stan" isn't an acronym, so it's not capitalized.

> A) Will STAN automatically know to set the flat improper on log(sigmasquared) and not sigmasquared. ( box and tiao
> explains why it should be on log sigmasquared ).

No. If you declare

parameters {
real<lower=0> sigmasquared;
}

That implies an improper flat prior on sigmasquared,
not on log(sigmasquared).

You need to work out the change of variables to put a flat
prior on log(sigmasquared). If it works out to
p(sigmasquared) propto 1/sigmasquared, then you need what
I wrote above. (I'm not quite up to working out changes
of variables in my head.)
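For the record, the change of variables is quick to check. If u = log(sigmasquared) gets the flat improper prior p(u) proportional to 1, then transforming back picks up the Jacobian of the transform:

```latex
p(\sigma^2) \;=\; p(u)\,\left|\frac{du}{d\sigma^2}\right|
\;\propto\; \frac{1}{\sigma^2},
\qquad u = \log(\sigma^2),
```

so the -log(sigmasquared) increment above is indeed the right one.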

> B) is it possible to set the prior on lambda to be proportional to (1-lambda)^2. I realize that I can set the prior to
> a density with a parameter (1-lambda^2) but then the mathematical description of the
> cancelling out of the ridge in adl_gibbs2.pdf will no longer necessarily hold. So, I'd prefer to just be proportional to
> (1-lambda)^2.

Yes, no problem. See above. But if you want to allow (1-lambda)
to be negative, you have to be careful not to do what I did
above because you'll try to take the log of a negative.
Instead, use:

lp__ <- lp__ + 2 * log(abs(1 - lambda));

You can't take the log of a negative number, of course.

> I tried the code below but STAN was not happy about not seeing a density. I get

In Stan's defense, it is a programming language, so you have to
follow its syntax to make it happy. This isn't legal:

> lambda ~ (1-lambda)^2; // fix according to adl_gibbs.pdf

Only named distributions are allowed after the twiddle.

What you want is to increment lp__ as described in the
custom probability function chapter of the manual.

- Bob

Mark Leeds

Apr 22, 2013, 11:17:38 AM
to stan-...@googlegroups.com
Thanks Bob, it's much appreciated. I'll digest the below and give it a go when I get some time later this week. Also, I'll look at the relevant manual chapter. I read through that chapter at some point in
the past, but I didn't connect its relevance to what I'm trying to do. And yes, if log(sigmasquared) is improper and flat, then sigmasquared ~ 1/sigmasquared. They explain it in Box and Tiao. To all following this thread, that book gives a nice description of priors and their relevant issues around pages 25-30. Thanks again to Bob for the useful clarifications.

Mark Leeds

Apr 23, 2013, 1:04:00 PM
to stan-...@googlegroups.com

Hi Bob: I wrote the Stan and R code below, and the estimates for lambda near 1 still aren't very good. But I could be doing something odd/wrong in the Stan code, so if you or anyone else wants to
take a look at it, it's appreciated. The intention of the Stan code for the 3 parameters, sigmasquared, beta and lambda respectively, was for

A) sigmasquared to be restricted between zero and infinity and have the prior for sigmasquared be 1/sigmasquared.

B) beta to be unrestricted  and have its prior be flat and improper.

C) lambda to be restricted between 0 and 1 and have its prior be (1-lambda)^2.

Based on your previous useful explanation, I think that's what the code  at the bottom of this posting does. It's essentially an attempt at implementing the fix to the ridge at lambda = 1 problem described in adl_gibbs2.pdf. ( I also tried the version where beta1mlambda is made into one variable and then divided by (1-lambda) in the generated quantities section but that didn't work either ).

I just have a couple of questions because I have not dared to try to open the hood of Stan. I've written code in C and C++, more C than C++, but I would need infrastructure training/tutelage to get involved with going inside of Stan. My guess is that this would require more time than the Stan team has at the moment.

1) Does Stan construct U(x) = -log(p(x)) symbolically? For example, similarly to how Mathematica would construct an expression. I ask because this "fix" described in adl_gibbs2.pdf might be specific to the Gibbs sampling framework. I don't know enough about the details of Stan to know whether the fix should work in Stan also. Essentially, at least in the Gibbs sampling framework, by giving that prior above to lambda, the (1-lambda) in the denominator of the posterior density of beta conditional on lambda gets zapped, and one ends up with (1-lambda) in the numerator, which is fine because the density of beta conditional on lambda = 1 becomes zero. But I'm not sure whether, in Stan, U(x) gets constructed symbolically using the prior for lambda. Michael mentioned earlier that the lambda parameter
gets unconstrained so that the unconstrained version is on the interval (-inf, +inf), but I don't see how that happens. If it does, then doesn't the posterior density need to be modified?

2) Does the Stan team ever offer tutoring/training sessions for Stan newbies who have a decent amount of experience programming but no experience dealing with something the size and nature of
Stan? I'd like to understand Stan in more depth, and I think the only way to accomplish that would be to get inside it.

Thanks for your great help. If the code below seems fine to you, then I may have hit another brick wall with the koyck model. But I'm used to that happening with Mr. Koyck.

# R CODE
#====================================================================


library("rstan");

set.seed(123)
T <- 200;
x <- rnorm(T,0,1);

#=========================================
# NOT OKAY : 0.3
#=========================================
beta <- 1.0;
lambda <- 0.95
sigmasquared <- 1.0
#=========================================
# NOT OKAY 1.2

#=========================================
#beta <- 2.6;
#lambda <- 0.95
#sigmasquared <- 1.0

#=========================================
# NOT OKAY : -0.1
#=========================================
#beta <- 0.5;
#lambda <- 0.95
#sigmasquared <- 1.0

#====================================
# NOT OKAY: 0.6
#====================================
#beta <- 1.0;
#lambda <- 0.90
#sigmasquared <- 1.0

#====================================
# NOT GREAT : 1.8
#========================================
#beta <- 2.6;
#lambda <- 0.90
#sigmasquared <- 1.0

#========================================
# NOT SO GOOD : 0.2
#========================================
#beta <- 0.5;
#lambda <- 0.90
#sigmasquared <- 1.0

#========================================
#NOT BAD: 0.8
#=========================================
#beta <- 1.0;
#lambda <- 0.85
#sigmasquared <- 1.0

#=========================================
#SO SO : 2.0
#=========================================
#beta <- 2.6;
#lambda <- 0.85
#sigmasquared <- 1.0

#=========================================
#SO SO : 0.3
#=========================================
#beta <- 0.5;
#lambda <- 0.85
#sigmasquared <- 1.0

#====================================
# SO SO:  0.7
#====================================
#beta <- 1.0;
#lambda <- 0.80
#sigmasquared <- 1.0

#====================================
# DECENT: 2.2
#====================================
#beta <- 2.6;
#lambda <- 0.80
#sigmasquared <- 1.0

#====================================
#SO SO : 0.3
#====================================
#beta <- 0.5;
#lambda <- 0.80
#sigmasquared <- 1.0

#====================================
# OKAY:  0.8
#====================================
#beta <- 1.0;
#lambda <- 0.75
#sigmasquared <- 1.0

#====================================
# GOOD: 2.7
#====================================
#beta <- 2.6;
#lambda <- 0.75
#sigmasquared <- 1.0

#====================================
# STILL SO SO : 0.3
#====================================
#beta <- 0.5;
#lambda <- 0.75
#sigmasquared <- 1.0

#====================================
# OKAY:  0.8
#====================================
#beta <- 1.0;
#lambda <- 0.70
#sigmasquared <- 1.0

#====================================
# GOOD: 2.4
#====================================
#beta <- 2.6;
#lambda <- 0.70
#sigmasquared <- 1.0

#====================================
# GOOD: 0.4
#====================================
#beta <- 0.5;
#lambda <- 0.70
#sigmasquared <- 1.0
#===========================================

y <- rep(NA,T);

y[1] <- rnorm(1,beta * (1 - lambda) * x[1], sqrt(sigmasquared));


for (t in 2:T)
  y[t] <- rnorm(1, beta * (1 - lambda) * x[t] + lambda * y[t-1], sqrt(sigmasquared));

fit <- stan(file="koyck_paperb.stan", data=c("T","y","x"), iter=1000, chains=4);
print(fit)

#=====================================================================================
# STAN CODE FOR IMPLEMENTING RIDGE FIX DESCRIBED IN ADL_GIBBS2.pdf. PRIOR ON LAMBDA
# IS (1-LAMBDA)^2, BETA IS IMPROPER AND FLAT, SIGMASQUARED IS ~ 1/SIGMASQUARED
#=====================================================================================


data {
  int<lower=0> T; // number of time points
  real y[T]; // output at time t
  real x[T]; // predictor for time t
}
parameters {
  real beta;  // part of x coeff
  real <lower=0, upper=1> lambda; // lagged coeff
  real <lower=0> sigmasquared; // noise scale
}

model {
  y[1] ~ normal(beta * (1 - lambda) * x[1], sqrt(sigmasquared));

  for (t in 2:T)
    y[t] ~ normal(beta * (1 - lambda) * x[t] + lambda * y[t-1], sqrt(sigmasquared));

  lp__ <- lp__ + 2 * log1m(lambda);
  lp__ <- lp__ - log(sigmasquared);
}
#============================================================================================================

Ben Goodrich

Apr 23, 2013, 3:33:14 PM
to stan-...@googlegroups.com
On Tuesday, April 23, 2013 1:04:00 PM UTC-4, Mark Leeds wrote:

Hi Bob: I wrote the Stan and R code below, and the estimates for lambda near 1 still aren't very good. But I could be doing something odd/wrong in the Stan code, so if you or anyone else wants to
take a look at it, it's appreciated. The intention of the Stan code for the 3 parameters, sigmasquared, beta and lambda respectively, was for

A) sigmasquared to be restricted between zero and infinity and have the prior for sigmasquared be 1/sigmasquared.

B) beta to be unrestricted  and have its prior be flat and improper.

C) lambda to be restricted between 0 and 1 and have its prior be (1-lambda)^2.

Based on your previous useful explanation, I think that's what the code  at the bottom of this posting does. It's essentially an attempt at implementing the fix to the ridge at lambda = 1 problem described in adl_gibbs2.pdf. ( I also tried the version where beta1mlambda is made into one variable and then divided by (1-lambda) in the generated quantities section but that didn't work either ).

How so?
 
I just have a couple of questions because I have not dared to try to open the hood of Stan. I've written code in C and C++, more C than C++, but I would need infrastructure training/tutelage to get involved with going inside of Stan. My guess is that this would require more time than the Stan team has at the moment.

The problems you are having are not really C++ problems.
 
1) Does Stan construct U(x) = -log(p(x)) symbolically? For example, similarly to how Mathematica would construct an expression.

No, it's all double precision numerics.
 
I ask because this "fix" described in adl_gibbs2.pdf might be specific to the Gibbs sampling framework. I don't know enough about the details of Stan to know whether the fix should work in Stan also. Essentially, at least in the Gibbs sampling framework, by giving that prior above to lambda, the (1-lambda) in the denominator of the posterior density of beta conditional on lambda gets zapped, and one ends up with (1-lambda) in the numerator, which is fine because the density of beta conditional on lambda = 1 becomes zero.

It's the same information matrix regardless of whether you are doing Gibbs or HMC, so any numeric problems would be essentially the same in both cases.
 
But I'm not sure whether, in Stan, U(x) gets constructed symbolically using the prior for lambda. Michael mentioned earlier that the lambda parameter
gets unconstrained so that the unconstrained version is on the interval (-inf, +inf), but I don't see how that happens. If it does, then doesn't the posterior density need to be modified?

For every parameter in Stan with constrained support, there is an unconstrained primitive parameter that you don't see but which is actually the space that Stan is sampling from. In the case of a parameter constrained to be between 0 and 1 (such as lambda), the unconstrained parameter is the output of the standard inverse logistic CDF. Conversely, the standard logistic CDF transforms the unconstrained parameter to the (0,1) interval.
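In symbols, a sketch of that transform and its Jacobian adjustment (the manual's transforms chapter is the authoritative reference):

```latex
u = \operatorname{logit}(\lambda) = \log\frac{\lambda}{1-\lambda},
\qquad
\lambda = \operatorname{logit}^{-1}(u) = \frac{1}{1+e^{-u}},
\qquad
\frac{d\lambda}{du} = \lambda(1-\lambda),
```

so Stan samples on u and adds log(lambda) + log(1 - lambda) to the log density as the Jacobian adjustment.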
 
2) Does the Stan team ever offer tutoring/training sessions for Stan newbies who have a decent amount of experience programming but no experience dealing with something the size and nature of
Stan. I'd like to understand Stan in more depth and I think the only way to accomplish that would be to get inside it.

So far, no, but probably some time in the future. Basically, the model that you posted yields a sampler that doesn't move, which can be verified by doing

apply(fit, "chains", FUN = function(x) length(unique(as.data.frame(x)))) / nrow(fit) # acceptance rates

which are all 0.008 when I do it, which ideally should be around 0.6. Another thing to do is

pairs(fit)

which shows that the two dimensional density plot of beta and lambda is sort of a triangle (not good). This suggests that we need to reparameterize to get better geometry, so I would be interested to see what combining beta and 1 - lambda into beta1mlambda would do.

Ben

Mark Leeds

Apr 23, 2013, 4:38:39 PM
to stan-...@googlegroups.com
Thanks Bob. I have to leave, but I will look closer at your email late tonight or tomorrow. Thanks for your patience and great help. Yes, the problems I'm having are not C++ problems, but it would still be fun/interesting to see under the hood someday! When I get back to you, I'll also have the other beta1mlambda results you asked for.

Mark Leeds

Apr 23, 2013, 4:40:45 PM
to stan-...@googlegroups.com
Oops, Ben, I'm sorry. I thought that the previous response was from Bob. Thanks, and sorry for referring to you as Bob.



Bob Carpenter

Apr 23, 2013, 5:55:45 PM
to stan-...@googlegroups.com


On 4/23/13 3:33 PM, Ben Goodrich wrote:
> On Tuesday, April 23, 2013 1:04:00 PM UTC-4, Mark Leeds wrote:
>
>
> Hi Bob: I wrote the Stan and R code below and still the estimates for lambda near 1 aren't very good. But I could be
> doing something odd/wrong in the Stan code so if you or anyone else wants to
> take a look at it, it's appreciated. The intention of the Stan code for the 3 parameters, sigmasquared, beta and
> lambda respectively was for
>
> A) sigmasquared to be restricted between zero and infinity and have the prior for sigmasquared be 1/sigmasquared.
>
> B) beta to be unrestricted and have its prior be flat and improper.
>
> C) lambda to be restricted between 0 and 1 and have its prior be (1-lambda)^2.
>
> Based on your previous useful explanation, I think that's what the code at the bottom of this posting does. It's
> essentially an attempt at implementing the fix to the ridge at lambda = 1 problem described in adl_gibbs2.pdf. ( I
> also tried the version where beta1mlambda is made into one variable and then divided by (1-lambda) in the generated
> quantities section but that didn't work either ).
>
>
> How so?
>
> I just have a couple of questions because I have not dared to try to open the hood of Stan. I've written code in C
> and C++, more C than C++, but I would need infrastructure training/tutelage to get involved with going inside of
> STAN. My guess is that this would require more time than the Stan team has time for at the moment.
>
>
> The problems you are having are not really C++ problems.

And not even necessarily Stan problems. It's about
the shape of your posterior and identifiability. So it's really
a stats problem.

Are you sure Gibbs can fit your data?

Have you tried fitting simpler models? Or using simulated
data to see if it's the model or the data that is problematic?

> 1) Does Stan construct U(x) = -log(p(x)) symbolically? For example, similarly to how Mathematica would construct
> an expression.
>
>
> No, it's all double precision numerics.

I'm not sure what you mean by construct U(x). It's never constructed
per se --- it corresponds to a program.

So if I write

U(theta) = -log bernoulli(y|theta)

then it gets translated into Stan roughly as:

double log_prob(double theta, bool y) {
  return -log(y ? theta : 1 - theta);
}

Is that symbolic or numerical? Of course evaluating the value
involves numerical processing.

The gradients are computed symbolically on a line-by-line
basis using algorithmic (aka automatic) differentiation.
This calculates derivatives up to machine precision, unlike finite differences.

> I ask because this "fix" described in adl_gibbs2.pdf might be specific to the Gibbs sampling framework. I don't know
> enough about the details of Stan to know whether the fix should work in Stan also. Essentially, at least in the Gibbs
> sampling framework, by giving that prior above to lambda, the (1-lambda) in the denominator of the posterior density of
> beta conditional on lambda gets zapped and one ends up with (1-lambda) in the numerator, which is fine because the
> density of beta conditional on lambda = 1 becomes zero.
>
>
> It's the same information matrix regardless of whether you are doing Gibbs or HMC, so any numeric problems would be
> essentially the same in both cases.
>
> But I'm not sure whether, in Stan, U(x) gets constructed symbolically using the prior for lambda. Michael mentioned
> earlier that the lambda parameter
> gets unconstrained so that the unconstrained version is on the interval (-inf, +inf), but I don't see how that
> happens. If it does, then doesn't the posterior density need to be modified?

What do you mean by symbolic?

Read the manual to see how the transforms work and
how their Jacobian determinants get added to the log
probability function.

> For every parameter in Stan with constrained support, there is an unconstrained primitive parameter that you don't see
> but which is actually the space that Stan is sampling from. In the case of a parameter constrained to be between 0 and 1
> (such as lambda), the unconstrained parameter is the output of the standard inverse logistic CDF. Conversely, the
> standard logistic CDF transforms the unconstrained parameter to the (0,1) interval.
>
> 2) Does the Stan team ever offer tutoring/training sessions for Stan newbies who have a decent amount of experience
> programming but no experience dealing with something the size and nature of
> Stan. I'd like to understand Stan in more depth and I think the only way to accomplish that would be to get inside it.

I don't think you need to get inside the code, but if you
want to roll your own priors and likelihoods, you should
have a better understanding of stats. Looking at the code's
not going to help with these issues. You can't see in the code
where a distribution gets unstable or a poor choice of prior
or parameter support interacts poorly with data.


> So far, no, but probably some time in the future. Basically, the model that you posted yields a sampler that doesn't
> move, which can be verified by doing
>
> apply(fit, "chains", FUN = function(x) length(unique(as.data.frame(x)))) / nrow(fit) # acceptance rates
>
> which are all 0.008 when I do it, which ideally should be around 0.6. Another thing to do is
>
> pairs(fit)
>
> which shows that the two dimensional density plot of beta and lambda is sort of a triangle (not good). This suggests
> that we need to reparameterize to get better geometry, so I would be interested to see what combining beta and 1 -
> lambda into beta1mlambda would do.

Right, that redundant "beta * (1 - lambda)" expression is
bad news for identifiability.

- Bob

Ben Goodrich

Apr 23, 2013, 10:00:40 PM
to stan-...@googlegroups.com
On Tuesday, April 23, 2013 3:33:14 PM UTC-4, Ben Goodrich wrote:
So far, no, but probably some time in the future. Basically, the model that you posted yields a sampler that doesn't move, which can be verified by doing

apply(fit, "chains", FUN = function(x) length(unique(as.data.frame(x)))) / nrow(fit) # acceptance rates

which are all 0.008 when I do it, which ideally should be around 0.6.

I messed up. That should be

apply(fit, "chains", FUN = function(x) nrow(unique(as.data.frame(x)))) / nrow(fit) # acceptance rates

in which case the acceptance rates are in the reasonable range. But ...
 
Another thing to do is

pairs(fit)

which shows that the two dimensional density plot of beta and lambda is sort of a triangle (not good). This suggests that we need to reparameterize to get better geometry, so I would be interested to see what combining beta and 1 - lambda into beta1mlambda would do.

This geometry still doesn't look too good.

All that said, the posterior distribution of beta is not terribly inconsistent with the population value.

Ben



Mark Leeds

Apr 24, 2013, 2:20:25 PM
to stan-...@googlegroups.com

Hi Bob: I wrote the Gibbs sampling code 3 years ago, and I think my understanding of stats is at least as good as yours. I know you're an amazing programmer/developer, but that's all I can
see so far, so please stop with the digs on knowledge. They're not useful, aside from indicating that you have ego issues. You've yet to show me that you understand statistics more deeply than I do. Feel free to show me that, though. I'd appreciate being enlightened by your greatness. Ben has been way more insightful/helpful, and there's no need for you to read my postings if you don't have anything useful to say aside from bashings about lack of knowledge. Also, to be totally frank, centering is generally not done in financial econometric models. That comment showed your lack of experience/knowledge right away.

Thanks Ben for the R code correction. I noticed that also. The pairs function is really useful. I'm gonna run that on the original formulation ( beta1mlambda ) and see if that sheds light on the issue.