Non-Homogeneous Poisson Process (NHPP) Model

747 views
Skip to first unread message

Sean O'Riordain

unread,
May 27, 2013, 1:06:33 PM5/27/13
to stan-...@googlegroups.com
Hi stan-users,

I'm a newcomer to Stan-type models, and I'm wondering if it is possible to implement a Non-Homogeneous Poisson Process (NHPP process model in Stan?  For this exercise I am looking at bugs in software in the NTDS dataset referred to in [goel1997nhpp].

I have been exploring this dataset with grid approximation and I'd like to try it with Stan.

I am using R3.0.1 on Ubuntu and rstan version 1.3.0.

My complete code is:

library(rstan)
# set_cppo("fast")  # for best running speed
set_cppo('debug') # make debug easier

# the failure time in days from the start of testing
bugt <- c(9L, 21L, 32L, 36L, 43L, 45L, 50L, 58L, 63L, 70L, 71L, 77L,
  78L, 87L, 91L, 92L, 95L, 98L, 104L, 105L, 116L, 149L, 156L, 247L,
  249L, 250L, 337L, 384L, 396L, 405L, 540L, 798L, 814L, 849L)

# restrict bugs to the first 26 elements to match the original paper
N <- 26

# visualize the dataset with the mean value function with
# some nominal values
alpha <- 34
beta <- 0.00579
mvt <- function(x) { alpha*(1-exp(-beta*x)) }
curve(mvt, to=250, lty=3, xlab='Time (days)',
      ylab='Cumulative Bugs Found')
points(bugs, 1:26)

bugs_dat <- list(N=N, bugs=bugt[1:N])

# here I try to implement the NHPP model for the NTDS data set
# i.e. Pr{N(t) = y} = Poisson(y, mvt(t)), y=0,1,2,...
# where mvt(t) = alpha*(1-exp(-beta*t))
smodel <- '
  data {
    int<lower=0> N; // number of data points (bugs observed)
    real<lower=0> bugs[N]; // the observed bug times in days
  }
  parameters {
    real<lower=1> alpha; // estimated number of bugs in the system
    real<lower=0> beta;  // finding "rate" parameter
  }
  model {
    for (d in 1:N) {
      bugs[d] ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
      // lp__ ~ mvt[d]^d*exp(-mvt[d])/d!
    }
  } '

# we expect alpha to be about 34 and beta about 0.00579
# for the first 26 elements of bugt
fit <- stan(model_code = smodel, data = bugs_dat, iter = 1000, chains = 4)

However I'm getting an error:

TRANSLATING MODEL 'smodel' FROM Stan CODE TO C++ CODE NOW.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  :
  failed to parse Stan model 'smodel' with error message:
EXPECTATION FAILURE LOCATION: file=input; line=12, column=7

      bugs[d] ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
      ^-- here

DIAGNOSTIC(S) FROM PARSER:
no matches for function name="poisson_log"
    arg 0 type=real
    arg 1 type=int
    arg 2 type=real
available function signatures for poisson_log:
0.  poisson_log(int, real) : real
1.  poisson_log(int, real[1]) : real
2.  poisson_log(int, vector) : real
3.  poisson_log(int, row vector) : real
4.  poisson_log(int[1], real) : real
5.  poisson_log(int[1], real[1]) : real
6.  poisson_log(int[1], vector) : real
7.  poisson_log(int[1], row vector) : real
unknown distribution=poisson
Parser expecting: "}"


Is it possible to use such a NHPP model?
Pr{N(t) = y} = Poisson(y, mvt(t)), y=0,1,2,...
where is the mean value function:
 mvt(t) = alpha*(1-exp(-beta*t))

bugs[d] in the model above is the time bug d was detected and alpha is the total expected number of bugs.

Many thanks in advance for your help.

Kind regards,
Seán


@article{goel1997nhpp,
  title={Time-Dependent Error-Detection Rate Model for Software Reliability and Other Performance Measures},
  author={Amrit L. Goel and Kazu Okumoto},
  journal={IEEE Trasactions on Reliability},
  volume={R-28},
  number={3},
  pages={206--211},
  year={1997},
  month={August},
  publisher={IEEE},
  keywords={Software reliability, NHPP, Software modelling, Software failure analysis},
}



--
Kind regards,

Sean O'Riordain


ntds.csv

Jiqiang Guo

unread,
May 27, 2013, 1:32:00 PM5/27/13
to stan-...@googlegroups.com

I don't see a problem that stan would not work here.  But your data 'bugs' need to be declared as int in Stan.  In addition, your 'bugs' appear on both sides of the specification if Poisson distribution.  Should the second one be time in your model?

--
Jiqiang

--
You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Sean O'Riordain

unread,
May 27, 2013, 2:13:23 PM5/27/13
to stan-...@googlegroups.com
Thank you Jiqiang,

I see your point!  That's definitely wrong!  However, when I say
    for (d in 1:N)
      d ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
where d is the number of the bug and bugs[d] is the time of that bug, I still get an error:

TRANSLATING MODEL 'smodel' FROM Stan CODE TO C++ CODE NOW.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  :
  failed to parse Stan model 'smodel' with error message:
EXPECTATION FAILURE LOCATION: file=input; line=12, column=7

      d ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
      ^-- here

The bugs_dat here is integer, but in general it will be POSIXct in R - which I presume I just translate back to type real and scale it to be in days rather than seconds.

Thanks again,
Seán

Ben Goodrich

unread,
May 27, 2013, 3:44:04 PM5/27/13
to stan-...@googlegroups.com
On Monday, May 27, 2013 2:13:23 PM UTC-4, Sean wrote:
Thank you Jiqiang,

I see your point!  That's definitely wrong!  However, when I say
    for (d in 1:N)
      d ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
where d is the number of the bug and bugs[d] is the time of that bug, I still get an error:

But d is just the loop index, so it seems weird to model that. And you have d on both the left and right sides of the ~. So, I think it should be something like

for(d in 1:N) {
  d
~ poisson(alpha * (1 - exp(-beta*bugs[d])));
}

But that is still really suboptimal. You would want to use a vectorized loglikelihood and the poisson_log function

transformed data {
 
int d_vec[N];
 
for(d in 1:N) {
    d_vec
[d] <- d;

 
}
}
parameters
{
  real<lower=1> alpha; // estimated number of bugs in the system
  real
<lower=0> beta;  // finding "rate" parameter
}
}
model
{

  real log_lambda
[N];
  real log_alpha
;
  log_alpha
<- log(alpha);
 
for(d in 1:N) {
    log_lambda
[d] <- log_alpha + log1m(exp(-beta * bugs[d]));
 
}
  d_vec
~ poisson_log(log_lambda);
 
...
}

The next version of Stan will have better way to handle the log1m(exp(-beta * bugs[d])) piece.

Ben

Sean O'Riordain

unread,
May 27, 2013, 4:29:37 PM5/27/13
to stan-...@googlegroups.com
Thanks Ben,

That compiles nicely and I'm getting vaguely sane answers for alpha now - I'll have another look at beta tomorrow - but it's great to know that this is possible to do in Stan!  Your vectorized code runs really quickly too!

d is also the cumulative number of bugs found to the time bugs[d] - I think it's pretty standard in the domain.

All the best from Dublin, Ireland
Seán




--
You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Bob Carpenter

unread,
May 27, 2013, 6:44:37 PM5/27/13
to stan-...@googlegroups.com
Was there not more to that error message? If not, what
came before that line?

This is an error with the model, not with the R interface.

- Bob

On 5/27/13 2:13 PM, Sean O'Riordain wrote:
> Thank you Jiqiang,
>
> I see your point! That's definitely wrong! However, when I say
> for (d in 1:N)
> d ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
> where d is the number of the bug and bugs[d] is the time of that bug, I still get an error:
>
> TRANSLATING MODEL 'smodel' FROM Stan CODE TO C++ CODE NOW.
> Error in stanc(file = file, model_code = model_code, model_name = model_name, :
> failed to parse Stan model 'smodel' with error message:
> EXPECTATION FAILURE LOCATION: file=input; line=12, column=7
>
> d ~ poisson(d, alpha*(1-exp(-beta*bugs[d])));
> ^-- here
>
> The bugs_dat here is integer, but in general it will be POSIXct in R - which I presume I just translate back to type
> real and scale it to be in days rather than seconds.
>
> Thanks again,
> Se�n
>
>
>
> On 27 May 2013 18:32, Jiqiang Guo <guo...@gmail.com <mailto:guo...@gmail.com>> wrote:
>
> I don't see a problem that stan would not work here. But your data 'bugs' need to be declared as int in Stan. In
> addition, your 'bugs' appear on both sides of the specification if Poisson distribution. Should the second one be
> time in your model?
>
> --
> Jiqiang
>
> Se�n
>
>
> @article{goel1997nhpp,
> title={Time-Dependent Error-Detection Rate Model for Software Reliability and Other Performance Measures},
> author={Amrit L. Goel and Kazu Okumoto},
> journal={IEEE Trasactions on Reliability},
> volume={R-28},
> number={3},
> pages={206--211},
> year={1997},
> month={August},
> publisher={IEEE},
> keywords={Software reliability, NHPP, Software modelling, Software failure analysis},
> }
>
>
>
> --
> Kind regards,
>
> Sean O'Riordain
> seor...@tcd.ie <mailto:seor...@tcd.ie>
>
>
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> stan-users+...@googlegroups.com <mailto:stan-users%2Bunsu...@googlegroups.com>.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> stan-users+...@googlegroups.com <mailto:stan-users%2Bunsu...@googlegroups.com>.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
>
>
> --
> Kind regards,
>
> Sean O'Riordain
> seor...@tcd.ie <mailto:seor...@tcd.ie>
Reply all
Reply to author
Forward
0 new messages