Compound Poisson in stan

Linas Mockus

Feb 10, 2015, 3:20:43 PM

to stan-...@googlegroups.com

Hi,

I have some nasty heteroscedasticity in count data. After sorting the output, putting it into equal-size bins, and calculating the mean and variance of each bin, I get a quadratic relation between mean and variance: var=-10000*mean+0.1*mean (plot attached). I used the negative binomial (parametrization 2) and got some results. However, I am still searching for a better underlying distribution to explain the data (the model is a fixed-effects gravity model). The Stan code is attached as well.

My questions:
1. Should I stop with the negative binomial or pursue a more complicated distribution such as a compound Poisson?
2. I would love to implement a compound Poisson anyway to see if it works better. Any suggestions on how?

Thanks for good ideas,
Linas

Attachments: Hetero.jpg, nb3.stan

Bob Carpenter

Feb 10, 2015, 11:56:50 PM

to stan-...@googlegroups.com

If you can build a model of the overdispersion, you can
use that directly. But I'd be nervous about a linear
variance formula with a negation --- you might want to
use the log link function in a GLM to characterize it.

The negative binomial is a compound Poisson-gamma :-)
The best way to implement a compound distribution like
the negative binomial (or Student-t) is to integrate out
the latent parameter. If you can't do that, then you
can just code everything up explicitly using the latent
parameters, but it can be much slower sampling.
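
A minimal sketch of the two routes, not taken from the attached nb3.stan. It assumes current Stan array syntax and an intercept-only model; the names N, y, mu, phi, and lambda are illustrative. A Poisson whose rate is gamma-distributed with shape phi and rate phi / mu marginalizes to neg_binomial_2(mu, phi), so the explicit version below should target the same distribution for y as the one-line version in the closing comment, at the cost of N extra parameters.

```stan
// Explicit latent-parameter version of the Poisson-gamma compound.
data {
  int<lower=0> N;
  array[N] int<lower=0> y;      // observed counts
}
parameters {
  real<lower=0> mu;             // mean of the counts
  real<lower=0> phi;            // overdispersion parameter
  vector<lower=0>[N] lambda;    // one latent Poisson rate per observation
}
model {
  // Priors on mu and phi are omitted for brevity; add ones suited to the data.
  // gamma(shape, rate): mean mu, variance mu^2 / phi.
  lambda ~ gamma(phi, phi / mu);
  y ~ poisson(lambda);

  // Marginalized equivalent: remove lambda from the parameters block and
  // replace the two sampling statements above with
  //   y ~ neg_binomial_2(mu, phi);
}
```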

- Bob

Linas Mockus

Feb 11, 2015, 8:30:18 AM

to stan-...@googlegroups.com

Thanks, Bob

Could you point me to a reference on how to use a model of the overdispersion directly in Stan? I would like to explore this idea first. By the way, glm(y~x+x^2-1,family=gaussian(link="log")) gave me 1e-5 as the coefficient for x and -1e-12 for x^2.

Thank you,
Linas

Bob Carpenter

Feb 11, 2015, 1:48:51 PM

to stan-...@googlegroups.com

I'm not exactly sure what you're looking for, but
you can search for [Poisson regression overdispersion] on the web.

I have an econometrics book that covers count data in detail:

http://www.amazon.com/Regression-Analysis-Econometric-Society-Monographs/dp/1107667275/

I've only got the first edition, but it has a lot of info
on regression for overdispersion.

The basic idea is that you can model the overdispersion just
like you model the mean.
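
One way this could look in Stan, as a sketch only: both the log mean and the log of the overdispersion parameter get their own linear predictor in a single covariate x. The covariate, the coefficient names, and the choice of a log link for phi are assumptions made for illustration.

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;   // observed counts
  vector[N] x;               // hypothetical single covariate
}
parameters {
  real a_mu;                 // intercept and slope for the log mean
  real b_mu;
  real a_phi;                // intercept and slope for log(phi)
  real b_phi;
}
model {
  // Priors omitted for brevity; add ones suited to the data.
  vector[N] eta = a_mu + b_mu * x;          // log of the mean
  vector[N] phi = exp(a_phi + b_phi * x);   // positive by construction
  // neg_binomial_2_log takes the mean on the log scale.
  y ~ neg_binomial_2_log(eta, phi);
}
```

Here the variance of y[n] is exp(eta[n]) + exp(eta[n])^2 / phi[n], so a larger phi[n] means less overdispersion for that observation.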

- Bob

Linas Mockus

Feb 16, 2015, 6:07:35 PM

to stan-...@googlegroups.com

Thanks, Bob. Can you suggest how to implement the overdispersed negative binomial in Stan: Var(y)=psi*mu+psi/kappa*mu, where mu=E(y)?

Linas

Bob Carpenter

Feb 16, 2015, 6:25:56 PM

to stan-...@googlegroups.com

You need to set the parameters so that you get the desired
mean and variance.

The expectations and variances of both of our negative binomials
are given in the manual. For example,

y ~ neg_binomial_2(mu, phi)
E[y] = mu
var[y] = mu + mu^2 / phi

So if you want

var[y] = alpha

just set

alpha = mu + mu^2 / phi

and solve,

phi = mu^2 / (alpha - mu)

Because phi > 0, this only works for *over*dispersion,
requiring alpha > mu.
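
As a sketch, the solved formula can be wrapped in a user-defined function and used with whatever target variance is wanted. The function name phi_from_variance and the example variance alpha = k * mu with k > 1 are hypothetical choices for illustration, not something from the manual.

```stan
functions {
  // phi such that neg_binomial_2(mu, phi) has variance alpha.
  // Only defined for overdispersion, i.e. alpha > mu.
  real phi_from_variance(real mu, real alpha) {
    if (alpha <= mu) {
      reject("need alpha > mu; found alpha = ", alpha, ", mu = ", mu);
    }
    return square(mu) / (alpha - mu);
  }
}
data {
  int<lower=0> N;
  array[N] int<lower=0> y;   // observed counts
}
parameters {
  real<lower=0> mu;          // mean of the counts
  real<lower=1> k;           // assumed variance inflation factor
}
model {
  // Priors omitted for brevity; add ones suited to the data.
  real alpha = k * mu;       // example target variance, exceeds mu when k > 1
  y ~ neg_binomial_2(mu, phi_from_variance(mu, alpha));
}
```

With this choice phi simplifies to mu / (k - 1), so the constraint k > 1 is what keeps phi positive.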

- Bob
