Pareto/NBD and HB Pareto/NBD


Willemijn Jansen

Jan 21, 2017, 5:57:44 AM
to Stan users mailing list
Hi guys!

I'm working on the Pareto/NBD model as explained in http://merc.e.u-tokyo.ac.jp/mmrc/dp/pdf/MMRC76_2006.pdf

First, I'm trying to estimate the original Pareto/NBD using Stan on the CDNOW data set.
However, my hyperparameters (r, alpha, s, beta) have really small n_eff (135 and 357 out of 6000 iterations for s and beta). I use the following code:

ParetoCode <-"
data{
int N; //number of observations, integer
int K; // number of columns
matrix[N,K] x; // 
}
transformed data{
vector [N] no_x;
vector [N] last_x;
vector [N] time;
vector<lower=0>[N] x_one;
no_x = x[,1];
last_x = x[,2];
time = x[,3];
x_one = no_x + 1;
}
parameters{
vector <lower=0>[N]lambda; 
vector <lower=0>[N] mu;
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}
model{
vector[N] ll1;
vector[N] ll2;
vector[N] mu_lambda;
vector[N] log_lambda;
vector[N] log_mu;
vector[N] log_mu_lambda;

mu_lambda = mu + lambda;
log_mu_lambda = log(mu_lambda);
log_lambda = log(lambda);
log_mu = log(mu);

//prior hyperparameters
r ~ normal(0.5,1);
alpha ~ normal(10,1);
s ~ normal(0.5,1);
beta ~ normal(10,1);

//prior
mu ~ gamma(s,beta);
lambda ~ gamma(s,beta);

//likelihood
ll1 = no_x .* log_lambda + log_mu - log_mu_lambda - last_x .* (mu_lambda);
ll2 = x_one .* log_lambda - log_mu_lambda - time .* (mu_lambda);

target+= log(exp(ll1)+exp(ll2));
}
"
Does it do any harm that n_eff is rather small? The paper reports different estimates, but it uses MLE for the hyperparameters.


Second, for the hierarchical Bayes (HB) model, I use the following code:
HBCode <-"data {
  int<lower=0> n_cust;
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] time_of_last_txn;
vector<lower=0>[n_cust] observation_time;
}
transformed data {
vector<lower=0>[n_cust] x_one;
x_one = x + 1;
}
parameters {
vector[n_cust] lambda_raw;
vector[n_cust] mu_raw;
vector<lower=0>[2] Beta; // log_lambda_mean and log_mu_mean
cov_matrix[2] Sigma; // lambda_sd,cov(lambda,mu),mu_sd

}
transformed parameters {
vector<lower=0>[n_cust] lambda; 
vector<lower=0>[n_cust] mu; 
row_vector[2] elementswise;

for ( i in 1:n_cust) {
elementswise[1] = lambda_raw[i]*Sigma[1,1] + mu_raw[i]*Sigma[2,1];
elementswise[2] = lambda_raw[i]*Sigma[1,2] + mu_raw[i]*Sigma[2,2];
lambda[i] = exp(Beta[1] + elementswise[1] );
mu[i] = exp(Beta[2] + elementswise[2] );
}
}
model {
// local variables likelihood
vector[n_cust] ll1;
vector[n_cust] ll2;
vector[n_cust] mu_lambda;
vector[n_cust] log_lambda;
vector[n_cust] log_mu;
vector[n_cust] log_mu_lambda;
real v0;
//local variables hyper
vector[2] Beta0;
matrix[2,2] Sigma0;
matrix[n_cust,2] theta;

mu_lambda = lambda + mu;
log_mu_lambda = log(mu_lambda);
log_lambda = log(lambda);
log_mu = log(mu);

theta[,1] = lambda;
theta[,2] = mu;
Beta0[1] = 0;
Beta0[2] = 0;
Sigma0[1,1]= 1;
Sigma0[1,2]= 0;
Sigma0[2,1]= 0;
Sigma0[2,2]= 1;
v0 = 3;

// hyperprior
Beta ~ multi_normal(Beta0, Sigma0);
Sigma ~ inv_wishart(v0, v0*Sigma0);

//prior
for ( i in 1:n_cust){
theta[i,] ~ multi_normal(Beta0,Sigma0); // implies theta ~ lognormal(zero, identity);
}

//likelihood
ll1 = x .* log_lambda + log_mu - log_mu_lambda - time_of_last_txn .* (mu_lambda);
ll2 = (x_one) .* log_lambda - log_mu_lambda - observation_time .* (mu_lambda);

target+= log(exp(ll1)+exp(ll2));
}
"

This is based on the pareto-nbd-ish code at https://gist.github.com/capers/9bd73b57e604db376e75.
The estimates of Sigma don't correspond to the estimates reported in Abe (2006), and n_eff is even smaller.
How can I improve these results?



Bob Carpenter

Jan 23, 2017, 2:41:20 PM
to stan-...@googlegroups.com
You can run Stan's optimizer to get an MLE and check your model.
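
In RStan that's a couple of calls (a sketch; `stan_data` here stands for whatever data list you're already passing to stan()):

  # optimizing() maximizes the model's target; since your model block
  # includes priors, this is a penalized MLE / MAP rather than the
  # paper's plain MLE
  sm <- stan_model(model_code = ParetoCode)
  mle <- optimizing(sm, data = stan_data)
  mle$par[c("r", "alpha", "s", "beta")]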

The problem you're having with s and beta probably arises from
the centered parameterization you use:

> parameters {
>   vector<lower=0>[N] lambda;
>   vector<lower=0>[N] mu;
...
>   real<lower=0> s;
>   real<lower=0> beta;


...
> model {
...

>   log_lambda = log(lambda);
>   log_mu = log(mu);
>
...

> // priors on the individual-level rates
> lambda ~ gamma(r, alpha);
> mu ~ gamma(s, beta);


That's not uncommon in hierarchical models coded the way you've coded yours,
with a centered prior. The centered approach winds up linking
(r, alpha, lambda) and (s, beta, mu) in the prior. I don't know if
it's possible to write a non-centered gamma prior---it looks like
it should be easy enough for the scale, but I don't know about the shape:

https://en.wikipedia.org/wiki/Gamma_distribution#Generating_gamma-distributed_random_variables
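
For the scale, it would look something like this (just a sketch of the idea with made-up names, shown for lambda only; the shape r still enters the raw prior directly, so it's only half non-centered):

parameters {
  vector<lower=0>[N] lambda_raw;
  real<lower=0> r;
  real<lower=0> alpha;
}
transformed parameters {
  vector<lower=0>[N] lambda;
  lambda = lambda_raw / alpha;   // gamma(r, 1) scaled by 1/alpha is gamma(r, alpha)
}
model {
  lambda_raw ~ gamma(r, 1);      // the rate alpha no longer appears in the prior draw
}

The likelihood then stays as before, written in terms of lambda.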

You might find it works better just declaring log_lambda and log_mu
as parameters (rather than as locally defined variables in the model) and
giving them a non-centered normal prior. Or, equivalently, leave them on
the positive scale and use a non-centered lognormal prior. Or maybe someone
can figure out how to do the non-centered parameterization of the gamma.
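
For the lognormal version, the first model would look roughly like this (a sketch, not a drop-in replacement: it swaps your gamma priors for lognormals, so m_lambda, sigma_lambda, m_mu, sigma_mu are new hyperparameters with made-up names and placeholder hyperpriors; data and transformed data blocks as in your original program):

parameters {
  vector[N] log_lambda_raw;     // standard normal
  vector[N] log_mu_raw;         // standard normal
  real m_lambda;                // location/scale replacing (r, alpha)
  real<lower=0> sigma_lambda;
  real m_mu;                    // location/scale replacing (s, beta)
  real<lower=0> sigma_mu;
}
transformed parameters {
  vector[N] log_lambda;
  vector[N] log_mu;
  log_lambda = m_lambda + sigma_lambda * log_lambda_raw;   // non-centered shift and scale
  log_mu = m_mu + sigma_mu * log_mu_raw;
}
model {
  vector[N] ll1;
  vector[N] ll2;
  vector[N] log_mu_lambda;
  vector[N] mu_lambda;

  // the raw parameters don't depend on the hyperparameters,
  // which breaks the funnel-shaped prior dependence
  log_lambda_raw ~ normal(0, 1);
  log_mu_raw ~ normal(0, 1);

  // placeholder hyperpriors --- adjust to your data
  m_lambda ~ normal(0, 1);
  sigma_lambda ~ normal(0, 1);
  m_mu ~ normal(0, 1);
  sigma_mu ~ normal(0, 1);

  for (n in 1:N)
    log_mu_lambda[n] = log_sum_exp(log_lambda[n], log_mu[n]);  // log(lambda + mu), stably
  mu_lambda = exp(log_mu_lambda);

  ll1 = no_x .* log_lambda + log_mu - log_mu_lambda - last_x .* mu_lambda;
  ll2 = x_one .* log_lambda - log_mu_lambda - time .* mu_lambda;

  for (n in 1:N)
    target += log_sum_exp(ll1[n], ll2[n]);
}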

- Bob
