Sichel Distribution (aka Poisson Generalized Inverse Gaussian distribution)


Brad Ochocki

Oct 16, 2015, 5:53:44 PM
to stan-...@googlegroups.com
Hello,

I'd like to ask if anybody is willing and able to implement the Sichel distribution in Stan. I'd like to use it to model some count data that I have, but I'm having difficulty implementing it myself: I don't have any experience with custom distributions, I couldn't find a reference for the distribution in the manual, and I'm having trouble finding much about it online.

I've used the gamlss package in R to see that the fit to my data is much better with a Poisson Inverse Gaussian (PIG) or Poisson Generalized Inverse Gaussian (Sichel) than with a negative binomial.

Any help at all would be greatly appreciated!


Thanks,

Brad

Ben Goodrich

Oct 17, 2015, 12:56:59 AM
to Stan users mailing list
On Friday, October 16, 2015 at 5:53:44 PM UTC-4, Brad Ochocki wrote:
I've used the gamlss package in R to see that the fit to my data is much better with a Poisson Inverse Gaussian (PIG) or Poisson Generalized Inverse Gaussian (Sichel) than with a negative binomial.

Is this just a Poisson mixed with the GIG distribution for the lambda parameter? If so, then the main limitation is that Stan does not implement Bessel functions with real orders for some reason, but you could do inverse Gaussian pretty easily. Just write the log-density for the inverse Gaussian as a Stan function to use as a prior for lambda and use a Poisson likelihood for the data.

Ben

Brad Ochocki

Oct 18, 2015, 3:34:03 PM
to stan-...@googlegroups.com
Thanks Ben! From what I understand, you're right, I'm just looking for a Poisson mixed with the GIG distribution for the lambda parameter. I was able to implement the inverse Gaussian, but I don't have a lot of experience with Stan and I'm having trouble assigning the inverse Gaussian as a prior for lambda. 

Here's some practice code I wrote to fit the inverse Gaussian:

// Attempting to implement Inverse Gaussian in Stan.

// PDF:
//f(x; mu,shape) = (shape/(2*pi*x^3))^0.5 * exp((-shape*(x-mu)^2)/(2*x*mu^2))

// logPDF:
// log(f(x; mu,shape)) = 0.5*(log(shape)-log(2*pi)-3*log(x)-shape*(((x-mu)/mu)^2)/x)
                           
data {
  int<lower=1> N;
  real<lower=0> x[N];
}
transformed data {
  real denom; // calculate this constant only once
  denom <- 0.0;
  for (i in 1:N)
    denom <- denom + (log(2*pi()) + 3*log(x[i]));
}
parameters {
  real<lower=0> mu;
  real<lower=0> shape;
}
model {
  real summands[N];
  // put priors on mu and shape here if you want

  // log-likelihood
  increment_log_prob(0.5*(N * log(shape) - denom));
  for (i in 1:N) {
    summands[i] <- -0.5 * shape * pow((x[i] - mu)/mu,2)/x[i];
  }
  increment_log_prob(sum(summands)); // faster than doing inside loop
}

And the test:

library(rstan);library(gamlss)
N      <- 1000
mu     <- 3
shape  <- 2
sigma  <- sqrt(1/shape)  # changing shape to sigma to match gamlss's parameterization
x      <- rIG(N,mu,sigma)
ig_test <- stan(file = "./IG.stan", data = list(N = N, x = x), verbose = FALSE, refresh = 1000)
print(ig_test)

          mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
mu        3.05    0.00 0.12     2.84     2.97     3.05     3.12     3.29  2072    1
shape     2.10    0.00 0.10     1.91     2.03     2.10     2.16     2.29  2446    1
lp__  -2007.51    0.03 1.01 -2010.13 -2007.94 -2007.20 -2006.77 -2006.51  1264    1

That seems to work. But I'm having trouble figuring out how to translate the inverse Gaussian log-density to a Stan function to use as a prior for lambda. Here's my attempt, which spits back errors:

// Attempting to implement Poisson inverse Gaussian in Stan.

functions {
  real IG_log(real x, real mu, real shape){
    return 0.5*(log(shape) - log(2*pi()) - 3*log(x) - shape * pow((x - mu)/mu,2)/x);
  }
}
                          
data {
  int<lower=1> N;
  int<lower=0> x[N];
}

parameters {
  real<lower=0> mu;
  real<lower=0> shape;
}

model {
  // log-likelihood
  for (i in 1:N) {
     increment_log_prob(poisson_log(x[i],(IG_log(x[i],mu,shape))));
  }
}


library(rstan);library(gamlss)
N      <- 1000
mu     <- 3
shape  <- 2
sigma  <- sqrt(1/shape)  # changing shape to sigma to match gamlss's parameterization
x <- rPIG(N,mu,sigma)
pig_test <- stan(file = "./PIG.stan", data = list(N = N, x = x), verbose = FALSE, refresh = 1000)
print(pig_test)

SAMPLING FOR MODEL 'PIG' NOW (CHAIN 1).
 [1] "Rejecting initial value:"                                                                                                              
 [2] "  Error evaluating the log probability at the initial value."                                                                          
 [3] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                        
 [4] "Exception thrown at line 22: stan::math::poisson_log: Rate parameter is -3.57973, but must be >= 0!"                                   
 [5] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"
 [6] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                              
 [7] "Rejecting initial value:"                                                                                                              
 [8] "  Error evaluating the log probability at the initial value."                                                                          
 [9] "Initialization between (-2, 2) failed after 100 attempts. "                                                                            
[10] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model." 

So it looks like Poisson's lambda is going negative, but I don't know how to keep it positive.

Ben Goodrich

Oct 18, 2015, 4:45:48 PM
to stan-...@googlegroups.com
On Sunday, October 18, 2015 at 3:34:03 PM UTC-4, Brad Ochocki wrote:
Thanks Ben! From what I understand, you're right, I'm just looking for a Poisson mixed with the GIG distribution for the lambda parameter. I was able to implement the inverse Gaussian, but I don't have a lot of experience with Stan and I'm having trouble assigning the inverse Gaussian as a prior for lambda. 

Something like this, I think.

functions {
   // ignoring the 2pi constant; note the 0.5 on the last term,
   // which comes from the exp((-shape*(x-mu)^2)/(2*x*mu^2)) kernel
   real IG_log(real x, real mu, real shape){
     return 0.5 * log(shape) - 1.5 * log(x) - 0.5 * shape * square( (x - mu) / mu) / x;
   }
}

data {
  int<lower=1> N;
  int<lower=0> x[N];
}

parameters {
  real<lower=0> mu;
  real<lower=0> shape;
  real<lower=0> lambda;
}

model {
  x ~ poisson(lambda);
  lambda ~ IG(mu, shape); // edited. Thanks Bob!
}

Ben

Bob Carpenter

Oct 19, 2015, 1:04:13 PM
to stan-...@googlegroups.com
Small correction. This

> lambda ~ IG_log(mu, shape);

should be

lambda ~ IG(mu, shape);

to match the function definition. Blame me for the absolutely
terrible "_log" suffix convention for densities.

- Bob

Brad Ochocki

Oct 19, 2015, 6:38:39 PM
to stan-...@googlegroups.com
Thanks again Bob. I caught the _log suffix, but the model is having a lot of trouble converging, which seems a bit odd to me since the inverse Gaussian model by itself converges nicely and quickly. I'm going to have another look at the likelihood function and post what I find.

Quick follow-up question: does Stan not implement modified Bessel functions? I saw some functions in the manual (e.g., modified_bessel_second_kind(int v, real z)). I have to admit I'm not very familiar with Bessel functions, but it looks like they might enable me to try an alternate parameterization from this paper (http://www.jstor.org/stable/2288808).

Bob Carpenter

Oct 20, 2015, 9:51:52 AM
to stan-...@googlegroups.com
The first thing to check is whether the model is well identified.
Are there very high posterior correlations among parameters?

And yes, we have the Bessel functions listed in the manual, but only for
integer v. I know almost nothing about them.

- Bob


Matt Hoffman

Oct 20, 2015, 12:14:55 PM
to stan-...@googlegroups.com
One possible reason for not having more flexible versions of the modified Bessel functions is that there's no nice expression for their derivative w.r.t. v, which makes it harder to use them in Stan. (This is my main complaint about the otherwise awesome GIG distribution.)

Matt

Brad Ochocki

Oct 20, 2015, 4:37:55 PM
to Stan users mailing list, mdho...@cs.princeton.edu
Okay, I think I'm finally just about on the same page... we can't do the GIG because Stan can only calculate modified Bessel functions for integer v, while the GIG requires real v.

@Bob: There is sometimes a strong correlation (~0.5) between 'lambda' and 'shape', but it seems to vary from simulation to simulation. Here's an example output with strong correlation, where mu=3 and shape=2 in the simulated data: 

                 mean      se_mean           sd   2.5%           25%           50%           75%         97.5% n_eff  Rhat
mu      2.372467e+253          Inf          Inf   3.71 1.140895e+177 5.767031e+221 1.248539e+227 3.277302e+253  4000   NaN
shape    1.515148e+12 1.805725e+12 3.113701e+12   8.44  9.500000e+00  1.140000e+01  5.061459e+11  1.111952e+13     3  3.07
lambda   3.630000e+00 1.100000e-01 1.600000e-01   3.35  3.490000e+00  3.710000e+00  3.740000e+00  3.820000e+00     2  5.95
lp__     4.908500e+02 1.433400e+02 2.032600e+02 147.10  4.221500e+02  5.859200e+02  6.287900e+02  6.895600e+02     2 34.27

Bob Carpenter

Oct 20, 2015, 4:43:01 PM
to stan-...@googlegroups.com
If mu = 3, where does the 10^250 scale come from?

If the posteriors vary chain to chain, then they haven't all
converged to exploring the same distribution and you either need
to reparameterize to improve mixing or run longer. If the problem
is multimodality, there may not be a solution.

- Bob

Brad Ochocki

Oct 22, 2015, 6:47:32 PM
to stan-...@googlegroups.com
Thank you guys for your help, it's been incredibly useful.

I'm not sure what exactly was going wrong with the previous model, but I managed to get the Poisson-inverse Gaussian (PIG) up and running. I ended up writing code for the Poisson-generalized inverse Gaussian (Sichel), since this is the most general case, but I didn't have the patience to see if the model converged; I think that the gamma parameter is strongly correlated with one of the other parameters. However, the PIG model converged quickly and accurately. 

I ended up writing modified Bessel functions of the first and second kind... they're probably not the most robust versions that could be written, but they produce output that is very similar to the output of R's modified Bessel functions, and they work well in the regions of parameter space that I'm working in. I also took advantage of the following recurrence formula to calculate the probability mass function, which allows the calculation of the PMF while only making three calls to the Bessel functions:

Bessel function in the denominator:

bKd   <- besselK(omega,gamma);

for x = 0:

f(0; xi, omega, alpha) = pow(xi,0) * pow(omega / alpha, gamma+0) * besselK(alpha, gamma+0) / (tgamma(0+1) * bKd);

for x = 1:

f(1; xi, omega, alpha) = pow(xi,1) * pow(omega / alpha, gamma+1) * besselK(alpha, gamma+1) / (tgamma(1+1) * bKd);

for x >= 2:

f(x; xi, omega, alpha) = (2 * xi * omega / pow(alpha, 2)) * ((gamma + x - 1) / x) * f[x-1] + pow(xi * omega / alpha, 2) * f[x-2] / (x * (x-1))
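
(For reference, those base cases are the x = 0 and x = 1 instances of the general Sichel pmf in this parameterization, written the same way:

f(x; xi, omega, gamma) = pow(xi,x) * pow(omega/alpha, gamma+x) * besselK(alpha, gamma+x) / (tgamma(x+1) * bKd);

and the recurrence follows from the Bessel identity K_{v+1}(z) = K_{v-1}(z) + (2v/z)*K_v(z).)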
 

When gamma = -0.5, these functions give the pmf for the PIG [from Johnson, Kotz, and Kemp 1992. Univariate Discrete Distributions, pg 457].

Here's the final code:

functions {
  // modified Bessel function of the first kind:
  real besselI(real z, real v){
    real summands[101];
    
    for (m in 0:100){
      if (m+v+1<=0){
        summands[m+1] <- 0;
      } else {
        summands[m+1] <- ((z/2)^(v+2*m))/(tgamma(m+1)*tgamma(m+v+1));
      }
    }
  return sum(summands);
  }
  
  // modified Bessel function of the second kind (sometimes referred to as
  // modified Bessel function of the third kind):
  real besselK(real z, real v){
    real v1;
    v1 <- v - 1E-12;
    return pi() / 2 * (besselI(z,-v1) - besselI(z,v1)) / sin(v1 * pi());
  }
  
  // log(pmf) of the Sichel distribution, using the recurrence formula from
  // Chesson and Lee 2005:
  vector log_sichel_prob(real omega, real xi, real gamma){
    // Only fitting data for patches 0:50
    vector[51] P;
    real alpha;
    real bKd;
    int nn;
    
    alpha <- sqrt(pow(omega,2) + 2 * omega * xi);
    bKd   <- besselK(omega,gamma);
    
    P[1] <- pow(xi,0) * pow(omega/alpha, gamma+0) * besselK(alpha, gamma+0) / (tgamma(0+1) * bKd);
    P[2] <- pow(xi,1) * pow(omega/alpha, gamma+1) * besselK(alpha, gamma+1) / (tgamma(1+1) * bKd);
    for(n in 3:51){
      nn <- n-1;
      P[n] <- ( 2 * xi * omega / pow(alpha, 2) ) * ( (gamma + nn - 1) / nn ) * P[n-1] + pow(xi * omega / alpha, 2) * P[n-2] / ( nn * (nn-1));
    }
    return log(P);
  }
}

data {
 int<lower=1> N;
 int<lower=0> y[N];
}

parameters {
 real<lower=0, upper=10> omega;
 real<lower=0, upper=10> xi;
 //real<lower=0, upper=10> gamma;
}

transformed parameters {
  real mu;
  real sig2;
  
  // the mean and variance below are only valid when gamma = -0.5
  mu   <- xi;
  sig2 <- xi*(1+xi/omega);
}

model {
  real fit[N];
  vector[51] logsichel;
  
  //logsichel <- log_sichel_prob(omega, xi, gamma);
  logsichel <- log_sichel_prob(omega, xi, -0.5);
  
  for(i in 1:N){
    fit[i] <- logsichel[ y[i] + 1 ];
  }
  increment_log_prob(sum(fit));
}

Here's an example output:

4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

          mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
omega     1.47    0.00 0.10     1.28     1.40     1.46     1.53     1.67  2254    1
xi        3.80    0.00 0.09     3.64     3.74     3.80     3.86     3.99  2330    1
mu        3.80    0.00 0.09     3.64     3.74     3.80     3.86     3.99  2330    1
sig2     13.73    0.02 0.97    12.01    13.03    13.69    14.36    15.82  2142    1
lp__  -3912.86    0.03 0.93 -3915.39 -3913.20 -3912.56 -3912.21 -3911.95  1162    1

And finally, a visual comparison of the fit to my real data using the negative binomial and PIG.


Thanks again!

Brad

Bob Carpenter

Oct 22, 2015, 9:05:28 PM
to stan-...@googlegroups.com
Awesome. Thanks for writing back and posting results.

Although the explicit sum of an array is more efficient
than a running sum, there might be lots of unneeded zero
terms for v < 0. You could solve that by slicing off the
tail of the array. But I'd recommend the following
much simpler form (I also fiddled the loop bounds out of habit
and switched z/2 to (0.5 * z) for efficiency).

So this:

> real besselI(real z, real v){
>   real summands[101];
>
>   for (m in 0:100){
>     if (m+v+1<=0){
>       summands[m+1] <- 0;
>     } else {
>       summands[m+1] <- ((z/2)^(v+2*m))/(tgamma(m+1)*tgamma(m+v+1));
>     }
>   }
>   return sum(summands);
> }

can be simplified to

real besselI(real z, real v){
  real sum;
  sum <- 0;
  for (m in 1:101)
    if (m + v > 0)
      sum <- sum + (0.5 * z)^(v + 2 * (m - 1))
             / (tgamma(m) * tgamma(m + v));
  return sum;
}

I'd also be inclined to add a test that v > -101, so that the
result isn't returned as 0 without raising an error. That is:

if (v <= -101)
  reject("v must be > -101 in besselI(z, v)");

Why use the - 1E-12 in the besselK? Should that just be an edge
case that gets detected?

> real v1;
> v1 <- v - 1E-12;
> return pi() / 2 * (besselI(z,-v1) - besselI(z,v1)) / sin(v1 * pi());


This could be sped up if there are common terms in that sum
for besselI(z, -v1) and besselI(z, v1).

And then there'd be even more room for optimization in the distribution
itself.

> vector log_sichel_prob(real omega, real xi, real gamma){
>   // Only fitting data for patches 0:50
>   vector[51] P;
>   real alpha;
>   real bKd;
>   int nn;
>
>   alpha <- sqrt(pow(omega,2) + 2 * omega * xi);
>   bKd   <- besselK(omega,gamma);
>
>   P[1] <- pow(xi,0) * pow(omega/alpha, gamma+0) * besselK(alpha, gamma+0) / (tgamma(0+1) * bKd);
>   P[2] <- pow(xi,1) * pow(omega/alpha, gamma+1) * besselK(alpha, gamma+1) / (tgamma(1+1) * bKd);
>   for(n in 3:51){
>     nn <- n-1;
>     P[n] <- ( 2 * xi * omega / pow(alpha, 2) ) * ( (gamma + nn - 1) / nn ) * P[n-1] + pow(xi * omega / alpha, 2) * P[n-2] / ( nn * (nn-1));
>   }
>   return log(P);
> }

You should remove pow(xi,0) * altogether and replace pow(xi,1) with xi.
You also should remove "+ 0" and replace 0+1 with 1 and 1+1 with 2.
Common terms like (omega/alpha) and (xi * omega / alpha) should be computed once and stored,
as in:

real omega_over_alpha;
real xi_times_omega_over_alpha;
omega_over_alpha <- omega / alpha;
xi_times_omega_over_alpha <- xi * omega_over_alpha;

pow(a,2) can be replaced with square(a), and I'd stick to either
a^b or pow(a,b) notation throughout for consistency.

Finally, it may be necessary to work on the log scale throughout,
so you'd save log_P as an array rather than P, and then do
log_sum_exp(log_P) rather than log(P) at the end.

- Bob


Brad Ochocki

Oct 22, 2015, 10:29:25 PM
to stan-...@googlegroups.com
Thanks again! I implemented most of the changes you suggested (code is at the bottom); the model ran ~7% faster for the simple case I'm currently running, and had higher n_eff for most of the parameters, which is great!

[EDIT: I guess there's considerable variation in n_eff over different runs, but the updated model seems to be performing at least as well as the other model, if not slightly better.]

I've addressed some specific questions below:

Why use the - 1E-12 in the besselK?  Should that just be an edge 
case that gets detected?

For integer v, besselI(z, v) == besselI(z, -v) and sin(v * pi) == 0, so the besselK formula becomes 0/0. So for integer v, besselK should be evaluated using besselI(z, v) and besselI(z, -v) as v approaches the integer, but not at the integer itself. To get around this I just did the simplest thing I could think of, which was to subtract a very small number from any v. I'm just realizing as I type this that subtracting a small number from v doesn't prevent v1 itself from being an integer, so maybe there's a more elegant solution to that problem. Maybe the odds of getting an integer v are so vanishingly small that it's not even worth considering?
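
One more elegant option might be a small guard (a hypothetical helper, untested) that explicitly pushes any near-integer order off the integer:

// hypothetical helper: keep the order away from integers, where
// besselI(z,v) - besselI(z,-v) and sin(v * pi) both vanish and the
// besselK formula becomes 0/0
real noninteger_order(real v){
  real d;
  d <- v - round(v);          // signed distance to the nearest integer
  if (fabs(d) < 1E-8)
    return round(v) + 1E-8;   // nudge off the singularity
  return v;
}

besselK would then start with v1 <- noninteger_order(v); instead of v1 <- v - 1E-12;.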

Finally, it may be necessary to work on the log scale throughout, 
so you'd save log_P as an array rather than P, and then do 
log_sum_exp(log_P) rather than log(P) at the end. 

I tried this initially, but the recurrence formula works with non-log-transformed probabilities, so I thought it would be simpler to reference P[n - 1] instead of exp(P[n - 1]) every iteration and just take the log at the end. I could be wrong about that; it's also quite possible that I'm misunderstanding your suggestion.

Cheers,

Brad

Updated code:

functions {
  // modified Bessel function of the first kind
  real besselI(real z, real v){
    real sums;

    // check the argument once, before the loop
    if (v <= -101) {
      reject("v must be > -101 in besselI(z, v)");
    }
    sums <- 0;
    for (m in 1:101) {
      if (m + v > 0) {
        sums <- sums + pow(0.5 * z, v + 2 * (m - 1)) / ( tgamma(m) * tgamma(m+v) );
      }
    }
    return sums;
  }
  
  // modified Bessel function of the second kind (sometimes referred to as
  // modified Bessel function of the third kind)
  real besselK(real z, real v){
    real v1;
    v1 <- v - 1E-12;
    return pi() / 2 * (besselI(z,-v1) - besselI(z,v1)) / sin(v1 * pi());
  }
  
  // log(pmf) of the Sichel distribution, using the recurrence formula from
  // Chesson and Lee 2005.
  vector log_sichel_prob(real omega, real xi, real gamma){
    // Only fitting data from patches 0:50
    vector[51] P;
    real alpha;
    real bKd;
    int nn;
    real omega_over_alpha;
    real xi_times_omega_over_alpha;
    real square_xi_times_omega_over_alpha;
    real two_times_xi_times_omega_over_square_alpha;
    
    
    // calculate constants
    alpha <- sqrt(square(omega) + 2 * omega * xi);
    bKd   <- besselK(omega, gamma);
    omega_over_alpha <- omega / alpha;
    xi_times_omega_over_alpha <- xi * omega_over_alpha;
    square_xi_times_omega_over_alpha <- square(xi_times_omega_over_alpha);
    two_times_xi_times_omega_over_square_alpha <- 2 * xi_times_omega_over_alpha / alpha;
    
    // calculate probabilities
    P[1] <-  1 * pow(omega_over_alpha, gamma)   * besselK(alpha, gamma)   / (tgamma(1) * bKd);
    P[2] <- xi * pow(omega_over_alpha, gamma+1) * besselK(alpha, gamma+1) / (tgamma(2) * bKd);
    for(n in 3:51){
      nn <- n-1;
      P[n] <- two_times_xi_times_omega_over_square_alpha * ( (gamma + nn - 1) / nn ) * P[n-1] + square_xi_times_omega_over_alpha * P[n-2] / ( nn * (nn-1));
    }
    return log(P);
  }
}

data {
 int<lower=1> N;
 int<lower=0> y[N];
}

parameters {
 real<lower=0, upper=10> omega;
 real<lower=0, upper=10> xi;
 //real<lower=0, upper=10> gamma;
}

transformed parameters {
  real mu;
  real sig2;
  
  // the mean and variance below are only valid when gamma = -0.5
  mu   <- xi;
  sig2 <- xi*(1+xi/omega);
}

model {
  real fit[N];
  vector[51] logsichel;
  
  //logsichel <- log_sichel_prob(omega, xi, gamma);
  logsichel <- log_sichel_prob(omega, xi, -0.5);
  
  for(i in 1:N) {
    fit[i] <- logsichel[ y[i] + 1 ];
  }
  increment_log_prob(sum(fit));
}

Output:
Old Code (average of 8.16 seconds per chain): 
          mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
omega     1.35    0.00 0.06     1.23     1.30     1.35     1.39     1.48  1829    1
xi        4.19    0.00 0.07     4.05     4.14     4.19     4.24     4.34  1813    1
mu        4.19    0.00 0.07     4.05     4.14     4.19     4.24     4.34  1813    1
sig2     17.22    0.02 0.92    15.54    16.60    17.18    17.83    19.15  1626    1
lp__  -8101.35    0.03 1.02 -8104.08 -8101.76 -8101.04 -8100.63 -8100.35  1146    1

Updated Code (average of 7.61 seconds per chain):
          mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
omega     1.35    0.00 0.06     1.23     1.31     1.35     1.39     1.48  1864    1
xi        4.19    0.00 0.07     4.05     4.14     4.19     4.24     4.34  1998    1
mu        4.19    0.00 0.07     4.05     4.14     4.19     4.24     4.34  1998    1
sig2     17.21    0.02 0.93    15.56    16.58    17.16    17.79    19.19  1604    1
lp__  -8101.34    0.03 1.02 -8103.97 -8101.74 -8101.03 -8100.61 -8100.36  1163    1


Bob Carpenter

Oct 22, 2015, 10:38:56 PM
to stan-...@googlegroups.com
Well, the n_eff shouldn't change --- it should compute the
exact same posterior!

The place you'll likely get integers is in initialization
if you do something like init=0 or pass in user-defined inits.
Otherwise the integers have measure zero and you shouldn't have
to worry about them.

That was indeed my suggestion, but there's a better way to implement
this:

> P[n] <- two_times_xi_times_omega_over_square_alpha * ( (gamma + nn - 1) / nn ) * P[n-1]
> + square_xi_times_omega_over_alpha * P[n-2] / ( nn * (nn-1));


by using log_sum_exp magic:

log_P[n] <- log_sum_exp(log(two_times_xi...alpha) + log((gamma + nn - 1) / nn) + log_P[n-1],
                        log(square_xi...alpha) + log_P[n-2] - log(nn * (nn - 1)));

There you go, recursion on the log scale. The main reason to do
this would be to prevent any underflows or overflows in the arithmetic.
Otherwise, it's more costly to compute on the log scale. And usually
harder to read if you're not used to log-scale arithmetic, where

log(a * b) = log(a) + log(b)

and

log(a + b) = log_sum_exp(log(a), log(b))

And you can continue to reduce. I think a -log(nn) term could be
brought out. But anyway, don't get carried away because I love to
optimize code!

And this assumes, of course, everything's positive --- can't be taking
logs of negative numbers!

- Bob


Ben Goodrich

Oct 23, 2015, 2:05:43 AM
to Stan users mailing list
On Thursday, October 22, 2015 at 6:47:32 PM UTC-4, Brad Ochocki wrote:
I ended up writing modified Bessel functions of the first and second kind... they're probably not the most robust versions that could be written, but they produce output that is very similar to the output of Rs modified Bessel functions, and they work well in the regions of parameter space that I'm working in.

I wrote versions of the modified Bessel functions that may be more universally appropriate because they do not have a fixed truncation point and utilize the log-sum-exp logic until the exp part underflows to zero. See attached. But Bob is right that you could write a more efficient version of the besselK function that avoided some of the duplication.
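
Roughly, the idea is something like the following sketch (the attached bessel.stan is the real version; this one assumes v > -1 so every term in the series is positive):

// sketch: log-scale besselI with no fixed truncation point; accumulate
// log-terms until the newest term underflows relative to the running total
real besselI_log(real z, real v){
  real log_term;
  real log_sum;
  int m;
  log_term <- v * log(0.5 * z) - lgamma(v + 1);   // m = 0 term of the series
  log_sum  <- log_term;
  m <- 1;
  while (log_term > log_sum - 40) {               // stop near exp(-40) relative
    // successive term ratio: (z/2)^2 / (m * (m + v))
    log_term <- log_term + 2 * log(0.5 * z) - log(m) - log(m + v);
    log_sum  <- log_sum_exp(log_sum, log_term);
    m <- m + 1;
  }
  return log_sum;
}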

Ben

bessel.stan

Michael Betancourt

Oct 23, 2015, 4:49:02 AM
to stan-...@googlegroups.com
It could very well be different if the new version admits more accurate
floating point calculations.  Additionally always keep in mind that
implementations like this will naively autodiff through the loops to
compute the gradient, which can lead to unstable and inaccurate
evaluations.  Implementing the functions themselves is pretty
straightforward, but implementing their derivatives explicitly is
the really, really nasty part.  See, for example, all the work that
had to be done for the derivatives of the incomplete beta function.

andre.p...@googlemail.com

Jan 28, 2017, 10:40:56 AM
to Stan users mailing list
Replying to an old thread, but maybe this will be useful to someone someday.

The Poisson GIG can be calculated directly:

https://www.jstor.org/stable/2288808?seq=1#page_scan_tab_contents

The Bessel function for gamma = 1/2, 3/2, ... can be calculated more easily:
K_m12<-function(z, v) {
  if(v==-0.5||v==0.5)
    return (sqrt(pi/(2*z))*exp(-z));
  return (K_m12(z, v - 2) + 2*(v-1)/z * K_m12(z, v - 1));
}

for gamma = 1/2:

real PIG(int y, real mu, real tau) {
  if (y == 0)
    return exp(1/tau * (1 - (1 + 2*tau*mu)^0.5));
  if (y == 1)
    return mu * (1 + 2*tau*mu)^(-0.5) * PIG(0, mu, tau);
  return 2.*tau*mu / (1 + 2*tau*mu) * (1. - 3./(2.*y)) * PIG(y-1, mu, tau)
         + mu^2 / (1 + 2.*tau*mu) / (y*(y-1)) * PIG(y-2, mu, tau);
}

library(gamlss.dist)

contains an implementation of the PIG distribution.

Andre

Bob Carpenter

Jan 30, 2017, 3:35:17 PM
to stan-...@googlegroups.com
Thanks for sharing.

As long as that recursive function doesn't go too deep,
you should be OK---C++ doesn't support deep recursion, so
unfolding it iteratively is usually better.

Shouldn't that be pi() for Stan?


- Bob

andre.p...@googlemail.com

Feb 13, 2017, 12:58:39 AM
to stan-...@googlegroups.com
The code not in the box is R code, and just an efficient implementation of besselK for special gamma values.

The PIG code in the box is the special case for gamma = 1/2, already optimized. However, there should
maybe also be versions for gamma = 3/2, 5/2, ..., which are easy to implement.

It's analogous to the Matérn covariance functions:


Those gammas define the "smoothness" level in the Matérn-cov case.

Even though we might wish to have a continuous gamma value, it's expensive to calculate,
and (for me) it's enough to try those {1/2, 3/2, 5/2, infinity} values.
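
For Stan, a hypothetical translation of the K_m12 helper above might look like this (an untested sketch; note pi() rather than pi, as Bob pointed out, and it only covers half-integer orders v in {-0.5, 0.5, 1.5, 2.5, ...}):

// hypothetical Stan translation of the R helper above; valid only for
// half-integer orders, where besselK has a closed form to recurse from
real besselK_half(real z, real v){
  if (v == -0.5 || v == 0.5)
    return sqrt(pi() / (2 * z)) * exp(-z);
  return besselK_half(z, v - 2) + 2 * (v - 1) / z * besselK_half(z, v - 1);
}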


Andre

