User-defined- distribution (generalized poisson)

hongxuan yan

unread,

Jul 17, 2015, 7:56:12 AM7/17/15

to stan-...@googlegroups.com

Hi,

Whith the help of Bob, I am trying to define a generalized poisson distribution. But I am not very clear about dealing with "n" in the function.

Do I need define n<-uniform_rng(-1,1000000)? If so, should I regard n as a parameter or not?

For defining a distribution f(x), there are 3 things, first, we need know the pdf. Second, we need define the parameters. Last, how can we define the x? Besides,for many of distributions, the range of x is (-inf, inf).

library(rstan)

cat("Stan version:", stan_version(), "\n")

stanmodelcode <- "

functions {

real generalized_poisson_log(int y, int n, real theta, real lambda) {

return log(theta)

+ (n - 1) * log(theta + n * lambda)

- lgamma(n + 1)

- n * lambda

- theta;

}

data {

int<lower=0> N;

int y[N];

}

parameters {

real<lower=0, upper=0.5> d;

real c;

real<lower=0> sig2;

real<lower=-1, upper=1> u;

real<lower=-1, upper=1> lambda;

vector[N] e;

}

transformed parameters {

vector[N] mu;

real D0;

real D1;

real D2;

real D3;

D0 <- 1;

D1 <- 2*d*u;

D2 <- 2*u*((d-1)/2.0+1)*D1-(2*(d-1/2.0)+1)*D0;

D3 <- 2*u*((d-1)/3.0+1)*D2-(2*(d-1/3.0)+1)*D1;

mu[1] <- exp(c+e[1]);

mu[2] <- exp(c+e[2] + D1*e[1]);

mu[3] <- exp(c+e[3] + D1*e[2] + D2*e[1]);

for (i in 4:N) {

mu[i] <- exp(c+e[i] + D1*e[i-1] + D2*e[i-2]+D3*e[i-3]);

}

model {

u ~ uniform(-1,1);

c ~ normal(0,10);

d ~ uniform(0,0.5);

sig2~gamma(3,1);

e~normal(0,sig2);

lambda~uniform(-1,1);

for (j in 1:N) {

y[j] ~ generalized_poisson(n, mu[j]*(1-lambda), lambda);

}

"

y=c(0,1,0,1,3,9,2,3,5,3,5,2,2,0,1,0,1,3,3,2,1,1,5,0,3,1,0,1,4,0,0,1,6,14,1,1,0,0,1,1,1,1,0,2,0,2,0,1,0,1,0,1,0,1,0,1,0,0,2,0,1,0,1,0,0,1,2,0,0,1,2,0,3,1,1,0,2,0,4,0,2,1,1,1,1,0,1,1,0,2,1,3,1,2,4,0,0,0,1,0,1,0,2,2,4,2,3,3,0,0,2,7,8,2,4,1,1,2,4,0,1,1,1,3,0,0,0,0,1,0,2,2,0,0,0,0,0,1,2,0,2,0,0,0,1,0,1,0,1,0,2,0,0,1,2,0,1,0,0,0,1,2,1,0,1,3,6)

dat <- list(N = 167, y = y);

fit <- stan(model_code = stanmodelcode, model_name = "example",

data = dat, iter = 2012, chains = 3)

print(fit)

Bob Carpenter

unread,

Jul 17, 2015, 11:46:35 AM7/17/15

to stan-...@googlegroups.com

If you're defining the probability (mass/density) function, you
shouldn't need any RNGs. In fact, Stan won't let you include RNGs
in the definition of a distribution.

I don't understand what your concerns are about x.

Was the version I wrote below not the right generalization of the
Poisson?

- Bob

> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Jonah

unread,

Jul 17, 2015, 4:50:33 PM7/17/15

to stan-...@googlegroups.com

To piggy-back on what Bob said, if you are trying to implement the log density function (or mass function) you don't need RNGs because you just need to compute the log of the pdf/pmf using the arguments specified to your function (the data, parameters, whatever). There's no RNG involved because inside the function everything is known (in the sense that it's passed in as an argument) and you just do the needed computation and spit out the result. The RNGs would be relevant if you want to write a function to draw from a distribution rather than just evaluate the log-pdf/pmf. Hope that makes sense.

hongxuan yan

unread,

Jul 18, 2015, 3:10:03 AM7/18/15

to stan-...@googlegroups.com

Hi,

Yes, your version is totally correct. My question is about the value of "n" in the function. In R or winbugs, we say y~generalized poisson(theta, lambda) but there is one "extra n". What value should I put for n? For example, for normal, we only need mu and sigma to define it. We say y~normal(mu,sigma). Now in this generalized Poisson function, the value of "n" need to be defined, I can not define generalized Poisson only by theta and lambda.

Seth Flaxman

unread,

Jul 18, 2015, 9:33:35 AM7/18/15

to stan-...@googlegroups.com

I think that's just a typo--there's only two parameters in the
distribution and the observation is denoted n (but don't be
confused--later it's in a variable y).

real generalized_poisson_log(int n, real theta, real lambda) {

return log(theta)
+ (n - 1) * log(theta + n * lambda)
- lgamma(n + 1)
- n * lambda
- theta;
}

And then later:

y[j] ~ generalized_poisson(mu[j]*(1-lambda), lambda);

Recall that this is equivalent to:

increment_log_prob(generalized_poisson_log(y[j], mu[j]*(1-lambda), lambda));

So for consistency you may want to relabel either n or y...
Seth

hongxuan yan

unread,

Jul 20, 2015, 1:00:19 AM7/20/15

to stan-...@googlegroups.com, fla...@gmail.com

Thanks so much, it makes sense now.

Tadaishi Yatabe

unread,

Jan 10, 2017, 7:00:28 PM1/10/17

to Stan users mailing list, fla...@gmail.com

Hello folks,

I'm trying to solve a problem that also seems to require a generalized Poisson. I'm trying to fit a model of a count of events as a function of two variables, but unsuccesfully so far. I'm following the parameterization of the generalized poisson presented here http://www.stata-journal.com/sjpdf.html?articlenum=st0279. Please give me some advice for my code if possible. Thanks!

# Generalized poisson

stancode <- "functions {

real generalized_poisson(int n, real theta, real lambda) {

return log(theta)

+ (n - 1) * log(theta + n * lambda)

- lgamma(n + 1)

- n * lambda

- theta;

}

data {

int<lower=0> N;

int No_outbreaks[N];

real Score_surv100_c[N];

real windeg_c[N];

}

parameters {

vector[N] theta;

real<lower=-1, upper=1> lambda;

real a;

real bs;

real bi;

real bsi;

}

model {

vector[N] lambdastar;

theta[N] ~ exponential(1);

lambda ~ normal(0,1);

bsi ~ normal( 0 , 1 );

bi ~ normal( 0 , 1 );

bs ~ normal( 0 , 1 );

a ~ normal( 0 , 100 );

for (i in 1:N){

lamdastar[i] = theta[i]/(1-lambda)

}

for (i in 1:N){

log(lamdastar[i]) = a + bs * Score_surv100_c[i] + bi * windeg_c[i] + bsi * Score_surv100_c[i] * windeg_c[i];

}

No_outbreaks ~ generalized_poisson(No_outbreaks[i], theta, lambda);

}

"

d.stan <- list(No_outbreaks=c(1, 1, 0, 2, 1, 0, 2, 2, 1, 2, 3, 1, 1, 1, 2, 2, 2), Score_surv100_c=c(-2.7588458, -2.7588458, 3.4339806, 3.0432020, 2.7448574, 3.7356885, -1.7610177, -4.1047376, -2.4006051, -2.4006051, -5.1737057, -0.7110643, -0.3697673, -0.7110643, 3.5651855, 3.9328326, 2.6945123) , windeg_c=c(1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1), N=17)