truncated distribution with censored data

Peter Smits

Nov 10, 2014, 4:17:27 PM
to stan-...@googlegroups.com
Dear List,

Sorry if this is a bit of a beginner question. I'm mostly new to this level of modeling and Bayesian data analysis and don't come from a statistics background at all.

I'm wondering about how to update the log-probability of censored data from a truncated distribution.

I'm writing a survival model of species fossil durations. Because the fossil record is resolvable only to a certain degree of accuracy, there is a minimum duration for all observed species. In this case, the minimum duration is 1. Also, because not all species are extinct, a few of the observations are right censored.

From the manual, I understand that for truncated distributions I first increment the log probability by the log of the sampling distribution and then subtract the log of the complementary cumulative distribution function evaluated at the truncation point. I also know that to integrate over the censored observations, I increment the log probability by the log of the complementary cumulative distribution function. My question is whether I then need to subtract some value from the log probability and, if so, what.

Following the suggestions from BDA, the Stan manual, and various posts on this list, I've been able to write a model that I believe is at least heading in the right direction. Attached is a short Stan model that fits a Weibull distribution to the observed durations. I've implemented truncation for the uncensored observations, but I'm stuck on what to do with the censored observations. I've also attached an R object (as an ASCII file using dput) of the data I'm using. Hopefully this works as a minimal example.

Thanks in advance for any advice. Again, I'm relatively new to this whole business.

Cheers,

Peter Smits
zero_weibull.stan
minimal_data.txt

Bob Carpenter

Nov 10, 2014, 11:22:16 PM
to stan-...@googlegroups.com
Your description of the model seems reasonable. As I understand it, there is
censoring both above and below, but no truncation.

The best thing to do to understand these models is to write out the
densities.
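
As a rough sketch of what writing out the densities gives for this problem, assuming Stan's Weibull parameterization (shape alpha, scale sigma). The symbols y_i, c_n, N_unc, N_below and N_above are shorthand introduced for this note only, with c_n = t_current - t_origin[n]:

```latex
\begin{align*}
f(y \mid \alpha, \sigma) &= \frac{\alpha}{\sigma}
    \left(\frac{y}{\sigma}\right)^{\alpha - 1}
    \exp\!\left(-\left(\frac{y}{\sigma}\right)^{\alpha}\right),
\qquad
F(y \mid \alpha, \sigma) = 1 - \exp\!\left(-\left(\frac{y}{\sigma}\right)^{\alpha}\right) \\[6pt]
\log L(\alpha, \sigma) &=
    \sum_{i=1}^{N_{\mathrm{unc}}} \log f(y_i \mid \alpha, \sigma)
    \;+\; N_{\mathrm{below}} \, \log F(1 \mid \alpha, \sigma)
    \;+\; \sum_{n=1}^{N_{\mathrm{above}}} \log\bigl(1 - F(c_n \mid \alpha, \sigma)\bigr) \\[6pt]
% Contrast only: a distribution genuinely truncated below at 1,
% with an observation right censored at c > 1.
\log \Pr(Y > c \mid Y > 1) &= \log\bigl(1 - F(c \mid \alpha, \sigma)\bigr)
    - \log\bigl(1 - F(1 \mid \alpha, \sigma)\bigr)
\end{align*}
```

The three terms of log L line up with cases 3, 1 and 2 below, in that order. The last line is there only for comparison: under genuine truncation at 1 a censored observation would need the extra subtraction, but with censoring below nothing is subtracted.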

1. Censoring below.

For species with a survival time recorded as 1, the value might
be anywhere between 0 and 1. If there are N_censored_below observations,
you just need

    increment_log_prob(N_censored_below * weibull_cdf_log(1, alpha, sigma));

Note this uses the CDF, not the CCDF, because it's the total probability from
0 (the lower bound of support for Weibull) to 1.

2. Censoring above.

For the species still alive, you need to know when they originated to know
what the censoring point is --- if species n arose (no idea what the
right term is here) at time t_origin[n] and is not yet
extinct at t_current, then you know the survival time is somewhere
above (t_current - t_origin[n]). For each such observation, you censor with
its own bounds. In Stan:

    for (n in 1:N_censored_above)
      increment_log_prob(weibull_ccdf_log(t_current - t_origin[n], alpha, sigma));

If t_origin is a vector of size N_censored_above, you can vectorize to
just

    increment_log_prob(weibull_ccdf_log(t_current - t_origin, alpha, sigma));

This uses the CCDF because it's 1 - CDF(U) to indicate the true value is
somewhere above U.

3. Uncensored

For species with survival time greater than 1 who are extinct, there's no
truncation needed. In Stan, in vector form, that's

    dur_unc ~ weibull(alpha, sigma);

4. Priors

I have no idea what the scale of survival times is, but you want
the priors to be consistent with the expected posteriors, especially
when you don't have a lot of data. There's a relevant section
in the regression chapter of the current (v2.5) Stan language manual that discusses
priors for scales --- the same arguments apply here. Also, with gamma, you
should double check the parameterization --- we use shape and inverse scale (aka rate).
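
Putting pieces 1 to 3 together, a complete program might look roughly like the sketch below. It keeps the Stan 2.5-era function names used in this thread; later Stan releases write the same statements with target += and the weibull_lcdf / weibull_lccdf functions. The name N_unc, the layout of the data block, and the gamma prior constants are placeholders made up for illustration and are not taken from the attached zero_weibull.stan.

```stan
data {
  int<lower=0> N_unc;                  // extinct species with duration > 1 (hypothetical name)
  int<lower=0> N_censored_below;       // species recorded at the minimum duration of 1
  int<lower=0> N_censored_above;       // species not yet extinct
  vector<lower=1>[N_unc] dur_unc;      // fully observed durations
  real t_current;                      // time at which extant species are observed
  vector[N_censored_above] t_origin;   // origination time of each extant species
}
parameters {
  real<lower=0> alpha;                 // Weibull shape
  real<lower=0> sigma;                 // Weibull scale
}
model {
  // Placeholder priors: the constants are assumptions and should be chosen
  // to match the scale of the durations. gamma() takes shape and rate.
  alpha ~ gamma(2, 1);
  sigma ~ gamma(2, 0.1);

  // 3. Uncensored durations: ordinary Weibull density, no truncation term.
  dur_unc ~ weibull(alpha, sigma);

  // 1. Censored below: true duration lies somewhere in (0, 1).
  increment_log_prob(N_censored_below * weibull_cdf_log(1, alpha, sigma));

  // 2. Censored above: true duration exceeds t_current - t_origin[n].
  increment_log_prob(weibull_ccdf_log(t_current - t_origin, alpha, sigma));
}
```

If the vectorized subtraction is not accepted by a given Stan version, the loop form from item 2 does the same thing one observation at a time.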

- Bob

Peter Smits

Nov 11, 2014, 11:45:15 AM
to stan-...@googlegroups.com
Bob (and list),

Thanks so much for the reply. Thinking about it as left censoring, as opposed to just a truncated distribution, makes a lot of sense. The example Stan lines are really helpful.

Also, thanks for the tip about setting priors for the scale parameter. This isn't the final model and I was going to reparameterize sigma as a linear regression, but it is good to know there is advice for me out there.

Cheers,

Peter
--
Peter D Smits
Grad student
Committee on Evolutionary Biology
University of Chicago