Pareto/NBD and HB Pareto/NBD


Willemijn Jansen

Jan 21, 2017, 5:57:44 AM
to Stan users mailing list
Hi guys!

I'm working on the Pareto/NBD model as explained in http://merc.e.u-tokyo.ac.jp/mmrc/dp/pdf/MMRC76_2006.pdf

First, I'm trying to estimate the original Pareto/NBD using Stan on the CDNOW data set.
However, my hyperparameters (r, alpha, s, beta) have really small n_eff (135 and 357 out of 6000 iterations for s and beta). I use the following code:

ParetoCode <-"
data{
int N; //number of observations, integer
int K; // number of columns
matrix[N,K] x; // 
}
transformed data{
vector [N] no_x;
vector [N] last_x;
vector [N] time;
vector<lower=0>[N] x_one;
no_x = x[,1];
last_x = x[,2];
time = x[,3];
x_one = no_x + 1;
}
parameters{
vector <lower=0>[N]lambda; 
vector <lower=0>[N] mu;
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}
model{
vector[N] ll1;
vector[N] ll2;
vector[N] mu_lambda;
vector[N] log_lambda;
vector[N] log_mu;
vector[N] log_mu_lambda;

mu_lambda = mu + lambda;
log_mu_lambda = log(mu_lambda);
log_lambda = log(lambda);
log_mu = log(mu);

//prior hyperparameters
r ~ normal(0.5,1);
alpha ~ normal(10,1);
s ~ normal(0.5,1);
beta ~ normal(10,1);

//prior
mu ~ gamma(s,beta);
lambda ~ gamma(s,beta);

//likelihood
ll1 = no_x .* log_lambda + log_mu - log_mu_lambda - last_x .* (mu_lambda);
ll2 = x_one .* log_lambda - log_mu_lambda - time .* (mu_lambda);

target+= log(exp(ll1)+exp(ll2));
}
"
Does it do any harm that n_eff is rather small? The paper reports different estimates, but it uses MLE for the hyperparameters.


Second, for the hierarchical Bayes (HB) model, I use the following code:
HBCode <-"data {
  int<lower=0> n_cust;
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] time_of_last_txn;
vector<lower=0>[n_cust] observation_time;
}
transformed data {
vector<lower=0>[n_cust] x_one;
x_one = x + 1;
}
parameters {
vector[n_cust] lambda_raw;
vector[n_cust] mu_raw;
vector<lower=0>[2] Beta; // log_lambda_mean and log_mu_mean
cov_matrix[2] Sigma; // lambda_sd,cov(lambda,mu),mu_sd

}
transformed parameters {
vector<lower=0>[n_cust] lambda; 
vector<lower=0>[n_cust] mu; 
row_vector[2] elementswise;

for ( i in 1:n_cust) {
elementswise[1] = lambda_raw[i]*Sigma[1,1] + mu_raw[i]*Sigma[2,1];
elementswise[2] = lambda_raw[i]*Sigma[1,2] + mu_raw[i]*Sigma[2,2];
lambda[i] = exp(Beta[1] + elementswise[1] );
mu[i] = exp(Beta[2] + elementswise[2] );
}
}
model {
// local variables likelihood
vector[n_cust] ll1;
vector[n_cust] ll2;
vector[n_cust] mu_lambda;
vector[n_cust] log_lambda;
vector[n_cust] log_mu;
vector[n_cust] log_mu_lambda;
real v0;
//local variables hyper
vector[2] Beta0;
matrix[2,2] Sigma0;
matrix[n_cust,2] theta;

mu_lambda = lambda + mu;
log_mu_lambda = log(mu_lambda);
log_lambda = log(lambda);
log_mu = log(mu);

theta[,1] = lambda;
theta[,2] = mu;
Beta0[1] = 0;
Beta0[2] = 0;
Sigma0[1,1]= 1;
Sigma0[1,2]= 0;
Sigma0[2,1]= 0;
Sigma0[2,2]= 1;
v0 = 3;

// hyperprior
Beta ~ multi_normal(Beta0, Sigma0);
Sigma ~ inv_wishart(v0, v0*Sigma0);

//prior
for ( i in 1:n_cust){
theta[i,] ~ multi_normal(Beta0,Sigma0); // implies theta ~ lognormal(zero, identity);
}

//likelihood
ll1 = x .* log_lambda + log_mu - log_mu_lambda - time_of_last_txn .* (mu_lambda);
ll2 = (x_one) .* log_lambda - log_mu_lambda - observation_time .* (mu_lambda);

target+= log(exp(ll1)+exp(ll2));
}
"

This is based on the pareto-nbd-ish code at https://gist.github.com/capers/9bd73b57e604db376e75.
The estimates of Sigma don't correspond to the estimates reported in Abe (2006), and n_eff is even smaller.
How can I improve these results?



Bob Carpenter

Jan 23, 2017, 2:41:20 PM
to stan-...@googlegroups.com
You can run Stan's optimizer to get an MLE and check your model.
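
In RStan that's a couple of calls (a sketch; `stan_data` here stands for whatever data list you're already passing to stan()):

  # optimizing() maximizes the model's target; since your model block
  # includes priors, this is a penalized MLE / MAP rather than the
  # paper's plain MLE
  sm <- stan_model(model_code = ParetoCode)
  mle <- optimizing(sm, data = stan_data)
  mle$par[c("r", "alpha", "s", "beta")]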

The problem you're having with s and beta probably arises from
the centered parameterization you use:

> parameters {
>   vector<lower=0>[N] lambda;
>   vector<lower=0>[N] mu;
...
>   real<lower=0> s;
>   real<lower=0> beta;


...
> model {
...

>   log_lambda = log(lambda);
>   log_mu = log(mu);
>
...

> // priors on the individual-level rates
> lambda ~ gamma(r, alpha);
> mu ~ gamma(s, beta);


That's not uncommon in hierarchical models coded the way you've coded yours,
with a centered prior. The centered approach winds up linking
(r, alpha, lambda) and (s, beta, mu) in the prior. I don't know if
it's possible to write a non-centered gamma prior---it looks like
it should be easy enough for the scale, but I don't know about the shape:

https://en.wikipedia.org/wiki/Gamma_distribution#Generating_gamma-distributed_random_variables
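
For the scale, it would look something like this (just a sketch of the idea with made-up names, shown for lambda only; the shape r still enters the raw prior directly, so it's only half non-centered):

parameters {
  vector<lower=0>[N] lambda_raw;
  real<lower=0> r;
  real<lower=0> alpha;
}
transformed parameters {
  vector<lower=0>[N] lambda;
  lambda = lambda_raw / alpha;   // gamma(r, 1) scaled by 1/alpha is gamma(r, alpha)
}
model {
  lambda_raw ~ gamma(r, 1);      // the rate alpha no longer appears in the prior draw
}

The likelihood then stays as before, written in terms of lambda.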

You might find it works better just declaring log_lambda and log_mu
as parameters (rather than as locally defined variables in the model) and
giving them a non-centered normal prior. Or, equivalently, leave them on
the positive scale and use a non-centered lognormal prior. Or maybe someone
can figure out how to do the non-centered parameterization of the gamma.
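
For the lognormal version, the first model would look roughly like this (a sketch, not a drop-in replacement: it swaps your gamma priors for lognormals, so m_lambda, sigma_lambda, m_mu, sigma_mu are new hyperparameters with made-up names and placeholder hyperpriors; data and transformed data blocks as in your original program):

parameters {
  vector[N] log_lambda_raw;     // standard normal
  vector[N] log_mu_raw;         // standard normal
  real m_lambda;                // location/scale replacing (r, alpha)
  real<lower=0> sigma_lambda;
  real m_mu;                    // location/scale replacing (s, beta)
  real<lower=0> sigma_mu;
}
transformed parameters {
  vector[N] log_lambda;
  vector[N] log_mu;
  log_lambda = m_lambda + sigma_lambda * log_lambda_raw;   // non-centered shift and scale
  log_mu = m_mu + sigma_mu * log_mu_raw;
}
model {
  vector[N] ll1;
  vector[N] ll2;
  vector[N] log_mu_lambda;
  vector[N] mu_lambda;

  // the raw parameters don't depend on the hyperparameters,
  // which breaks the funnel-shaped prior dependence
  log_lambda_raw ~ normal(0, 1);
  log_mu_raw ~ normal(0, 1);

  // placeholder hyperpriors --- adjust to your data
  m_lambda ~ normal(0, 1);
  sigma_lambda ~ normal(0, 1);
  m_mu ~ normal(0, 1);
  sigma_mu ~ normal(0, 1);

  for (n in 1:N)
    log_mu_lambda[n] = log_sum_exp(log_lambda[n], log_mu[n]);  // log(lambda + mu), stably
  mu_lambda = exp(log_mu_lambda);

  ll1 = no_x .* log_lambda + log_mu - log_mu_lambda - last_x .* mu_lambda;
  ll2 = x_one .* log_lambda - log_mu_lambda - time .* mu_lambda;

  for (n in 1:N)
    target += log_sum_exp(ll1[n], ll2[n]);
}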

- Bob
