Hello,
I have a large dataset of court cases and their citations over time. I need to estimate the importance of a case given its citations over time. In other words, consider a population of cases, each represented by a vector of arrival times (the number of days after its birth at which a citation was made to it). The goal is to fit a model that explains these citation profiles in terms of the importance of a case, plus other processes, but I mostly care about recovering the "importance" parameter. I want to use a bayesian approach (rather than simply likelihood maximization) because some of the cases have very short vectors and I would therefore benefit from pooling. I am also interested in estimating the population-level distribution of the case-level parameters.
The likelihood is drawn from
this recent paper that develops a parsimonious model of the life-history of a paper, which they call a reinforced Poisson process. Basically the rate at which a paper d (or case) gets cited at time t is x_d(t) = lambda_d f(t;theta_d)i_d(t). lambda_d is the intrinsic importance of the case/paper, f(t;\theta_d) is a log-normal distribution for the time t of citation, reflecting the fact that there is a time to maturity of a paper/case and then it decays as it ages. Finally i_d(t) is the number of citations received up to time t, and this models the preferential attachment process (highly cited cases are more likely to be cited just because they are highly visible). From there one can derive the probability of each citation arrival time for case d and form the likelihood. I am interested first and foremost in the parameter lambda_d, but also in theta_d (scale and location parameter of the lognormal) .
I attach the precise mathematical formulation of the likelihood so that you can understand my Stan code.
I am not sure how to ascribe good priors for this problem. For now, I keep theta_d fixed to values estimated from the data, and use a Gamma(alpha,beta) for the lambda_d parameters. I do not have strong priors about alpha and beta, so I would like to estimate them from data. For now, I thought that uniform distribution for both could work, as a first step. But I am seeking guidance on this choice of priors. As you will see below my parameterization of the priors is giving me problems.
Below is my Stan code (called from python with pystan). You will see that I increment the log-probability with each of the terms of the likelihood in the mathematical formula attached (in the same order as in the formula). When I try to fit, I get the following error:
Exception thrown at line 25: stan::math::gamma_log: Shape parameter is inf, but must be finite!
OR
Exception thrown at line 25: stan::math::gamma_log: Inverse scale parameter is 0, but must be > 0!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
I would appreciate feedback on whether I successfully coded the likelihood and what priors and hyperpriors I should use to get a proper posterior and avoid the errors I am getting.
Thank you so much
imp_code = """
data {
int<lower=0> NNd; // total number of citation events
int<lower=0> N; // total number of documents
int<lower=0> Nd[N]; // number of citation events of each document
int<lower=0> Ts[N]; // length ( days) of the period of observation of each document
vector[NNd] y; // citation events
}
transformed data {
int<lower=0> m;
real<lower=0> mu; // location of the lognormal decay function; to be modeled later
real<lower=0> s; // scale of the lognormal decay function; to be modeled later
m <- 3;
mu <-7.466;
s <- 1.14;
}
parameters {
real<lower=0> l[N]; // importance parameter for each case
real<lower=0> alpha; // shape parameter of l~Gamma
real<lower=0> beta; // inverse scale parameter of l~Gamma
}
model {
int pos;
pos <- 1;
l ~ gamma(alpha,beta);
alpha ~ uniform(0.01,100);
beta ~ uniform(0.01,100);
for (n in 1:N) {
segment(y, pos,Nd[n]) ~ lognormal(mu,s);
increment_log_prob(l[n]*normal_cdf((log(segment(y,pos,Nd[n]))-mu)/s,0,1));
increment_log_prob(-l[n]*(Nd[n]+m)*normal_cdf((log(Ts[n])-mu)/s,0,1));
increment_log_prob(l[n]^Nd[n]);
pos <- pos + Nd[n];
}
}"""
#this part is python
netdata = {'NNd':sum(Nd),'N':len(Nd),'Nd':Nd,'Ts':Ts,'y':y}
fit = ps.stan(model_code = imp_code, data = netdata, iter = 10000, chains = 3, thin = 1, warmup = 5000)