Hierarchical model for a reinforced Poisson process: problem with specifying priors

256 views
Skip to first unread message

Marion Dumas

unread,
Jan 27, 2016, 5:48:50 PM1/27/16
to Stan users mailing list
Hello, 
I have a large dataset of court cases and their citations over time. I need to estimate the importance of a case given its citations over time.  In other words, consider a population of cases, each represented by a vector of arrival times (the number of days after its birth at which a citation was made to it). The goal is to fit a model that explains these citation profiles in terms of the importance of a case, plus other processes, but I mostly care about recovering the "importance" parameter. I want to use a bayesian approach (rather than simply likelihood maximization) because some of the cases have very short vectors and I would therefore benefit from pooling. I am also interested in estimating the population-level distribution of the case-level parameters. 
The likelihood is drawn from this recent paper that develops a parsimonious model of the life-history of a paper, which they call a reinforced Poisson process. Basically the rate at which a paper d (or case) gets cited at time t is x_d(t) = lambda_d f(t;theta_d)i_d(t). lambda_d is the intrinsic importance of the case/paper, f(t;\theta_d) is a log-normal distribution for the time t of citation, reflecting the fact that there is a time to maturity of a paper/case and then it decays as it ages. Finally i_d(t) is the number of citations received up to time t, and this models the preferential attachment process (highly cited cases are more likely to be cited just because they are highly visible). From there one can derive the probability of each citation arrival time for case d and form the likelihood. I am interested first and foremost in the parameter lambda_d, but also in theta_d (scale and location parameter of the lognormal) .

I attach the precise mathematical formulation of the likelihood so that you can understand my Stan code.
I am not sure how to ascribe good priors for this problem. For now, I keep theta_d fixed to values estimated from the data, and use a Gamma(alpha,beta) for the lambda_d parameters. I do not have strong priors about alpha and beta, so I would like to estimate them from data. For now, I thought that uniform distribution for both could work, as a first step. But I am seeking guidance on this choice of priors. As you will see below my parameterization of the priors is giving me problems.

Below is my Stan code (called from python with pystan). You will see that I increment the log-probability with each of the terms of the likelihood in the mathematical formula attached (in the same order as in the formula). When I try to fit, I get the following error: 
Exception thrown at line 25: stan::math::gamma_log: Shape parameter is inf, but must be finite!
OR 
Exception thrown at line 25: stan::math::gamma_log: Inverse scale parameter is 0, but must be > 0!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

I would appreciate feedback on whether I successfully coded the likelihood and what priors and hyperpriors I should use to get a proper posterior and avoid the errors I am getting.
Thank you so much

imp_code = """
data {
int<lower=0> NNd; // total number of citation events
int<lower=0> N; // total number of documents
int<lower=0> Nd[N]; // number of citation events of each document
int<lower=0> Ts[N]; // length ( days) of the period of observation of each document
vector[NNd] y; // citation events
}
transformed data {
int<lower=0> m;
real<lower=0> mu; // location of the lognormal decay function; to be modeled later
real<lower=0> s; // scale of the lognormal decay function; to be modeled later
m <- 3;
mu <-7.466;
s <- 1.14;
}
parameters {
real<lower=0> l[N]; // importance parameter for each case
real<lower=0> alpha; // shape parameter of l~Gamma
real<lower=0> beta;  // inverse scale parameter of l~Gamma
}
model {
int pos;
pos <- 1;
l ~ gamma(alpha,beta);
alpha ~ uniform(0.01,100);
beta ~ uniform(0.01,100);
for (n in 1:N) {
    segment(y, pos,Nd[n]) ~ lognormal(mu,s); 
    increment_log_prob(l[n]*normal_cdf((log(segment(y,pos,Nd[n]))-mu)/s,0,1));
    increment_log_prob(-l[n]*(Nd[n]+m)*normal_cdf((log(Ts[n])-mu)/s,0,1));
    increment_log_prob(l[n]^Nd[n]);
    pos <- pos + Nd[n];    
}
}"""

#this part is python
netdata = {'NNd':sum(Nd),'N':len(Nd),'Nd':Nd,'Ts':Ts,'y':y} 
fit = ps.stan(model_code = imp_code, data = netdata, iter = 10000, chains = 3, thin = 1, warmup = 5000)
stan_likelihood_coding_question.pdf

Andrew Gelman

unread,
Jan 27, 2016, 5:52:46 PM1/27/16
to stan-...@googlegroups.com
Hi, quick comments:

1.  In many applications you’ll want overdisp Poisson, not Poisson.  Poisson can have problems becasue variance is too small.

2.  no on uniform(0,01, 100).  I almost think Stan should have a parser and spit out a warning when people put in lower and upper bounds that are not 0 or 1, because it’s almost always a bad idea.

A

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
To post to this group, send email to stan-...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
<stan_likelihood_coding_question.pdf>

Jonah Gabry

unread,
Jan 27, 2016, 8:06:52 PM1/27/16
to Stan users mailing list, gel...@stat.columbia.edu
I agree with Andrew about not using the constrained uniform, but even if it were a good prior to use you would need to tweak your Stan code a bit to use it correctly. Right now you have the parameters declared with <lower=0>, but uniform(0.01, 100) means that the constraints need to be <lower=0.01,upper=100>. That said, it'd be better to leave the <lower=0> and use a prior that imposes a "soft" constraint by decaying quickly enough to put very little probability mass at extreme values. So maybe a half-normal or half-cauchy with a moderate scale parameter, or something like that. The nice thing is that you don't need to use a special half-normal or half-cauchy function: if you use normal or cauchy (or whatever) for a parameter declared with <lower=0> then Stan knows it needs to be the half-normal, etc. 

Jonah Gabry

unread,
Jan 27, 2016, 8:29:08 PM1/27/16
to Stan users mailing list, gel...@stat.columbia.edu
Also, are those warning messages you're getting occurring during warmup or sampling? Like the message says, if the warning only happens occasionally (especially during warmup) then you're fine. If you're getting a lot of those during sampling then that's a problem.

Marion Dumas

unread,
Jan 27, 2016, 11:49:24 PM1/27/16
to Stan users mailing list, gel...@stat.columbia.edu
I am getting this warning at every step.

Marion Dumas

unread,
Jan 27, 2016, 11:50:58 PM1/27/16
to stan-...@googlegroups.com, gel...@stat.columbia.edu
Isn't overdispersed if lambda is gamma-distributed?
Thank you, I will follow your advice on the priors.

Bob Carpenter

unread,
Jan 28, 2016, 1:59:00 PM1/28/16
to stan-...@googlegroups.com
I'd also suggest fitting with simulated data. There can
often be problems with fits for very misspecified models.

I'd also recommend putting the Stan code in a separate
file so you can easily track down line number references.

- Bob

> On Jan 27, 2016, at 5:48 PM, Marion Dumas <mari...@gmail.com> wrote:
>
Reply all
Reply to author
Forward
0 new messages