95% HPD

plef

unread,

May 10, 2010, 10:43:44 AM5/10/10

to beast-users

Dear all,

I wonder how the 95% HPD intervalls are calculated in tracer (v1.5). I
want to work directly on the log file (without using tracer) and I
didn't manage to find the same results as tracer.

- is the 95% HPD for each parameter (for exemple the tree root heigh)
the distribution minus the 2.5% lowest and the 2.5% highest
estimation ?
- Or does we have to consider first the posterior (or the likelihood),
take the 95% intervall as described just above and then consider the
distribution of each parameter for those subset of samples ?

In both case It didn't match with tracer output (with or without
burnin)... Can you please provide me your calaculation methods ?

thanks

paul

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To post to this group, send email to beast...@googlegroups.com.
To unsubscribe from this group, send email to beast-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beast-users?hl=en.

alexei

unread,

May 10, 2010, 4:23:58 PM5/10/10

to beast-users

The x% Highest Posterior Density (HPD) interval, is the *smallest*
interval that contains x% of the samples. If one tail is much longer
than the other then most of the removed values will come from the
longer tail. Sort the values and consider the interval starting from
the very smallest value to the x-percentile value and store the
difference in the values (i.e. the size of the interval). Now
increment one value at a time (at both ends, so that you always have x
% of the values in the interval) until you get to the interval that
starts at the (100-x)-percentile value and finishes at the largest
value. Each step compare the new interval width to the minimum found
so far. If smaller than minimum update minimum and store the start
value. This algorithm takes O(N) operations where N is the size of the
sample. You need to sort the sample first which takes O(N log(N)) with
say a merge sort.

The x% Central Posterior Density (CPD) interval, on the other hand, is
the interval containing x% of the samples after [(100-x)/2]% of the
samples are removed from each tail.

The shorthand for both types of interval is the x% Credible Interval.

Alexei

plef

unread,

May 11, 2010, 3:02:14 AM5/11/10

to beast-users

excellent !
Thanks a lot for you quick answer.
best wishes

jspa...@gmail.com

unread,

Mar 2, 2014, 1:43:08 PM3/2/14

to beast...@googlegroups.com, plef...@yahoo.fr

Hi guys,

Related to these posts, I'd like to know if there's the possibility of including in the code the following request: Beauti also showing the 95% highest (prior?) density for a given prior on a parameter, for each of the possible probability distributions, besides the 95% central distribution only. I mention this because every now and then I'd like to set the prior based on what the posterior on Tracer will show (which is the 95% HPD), but Beauti focus on the central distribution instead... though I wonder if it is even possible to include the highest density in the prior tab, given it's probably calculated by integration for each distribution, which may not be an easy task...

Cheers,

Jose.

Rupert Collins

unread,

Mar 2, 2014, 8:13:55 PM3/2/14

to beast...@googlegroups.com

This won't answer the Jose's question, but regarding Paul's first question, yes, 95% HPD is not calculated in the same way as a 95% quantile. See http://stats.stackexchange.com/questions/24588/quantile-intervals-vs-highest-posterior-density-intervals

Calculating HPDs outside of Tracer can be done in the R package LaplacesDemon. See http://www.bayesian-inference.com/software

Hope this helps a bit.

To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.

To post to this group, send email to beast...@googlegroups.com.

Visit this group at http://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/groups/opt_out.

Andrew Rambaut

unread,

Mar 3, 2014, 2:37:09 AM3/3/14

to beast...@googlegroups.com

Dear Jose,

The HPD is the shortest interval that contains 95% of the probability density. For most parametric distributions in BEAUti it will be the same as the interval spanning the 2.5% and 97.5% percentiles. What I usually do is play around with the mean and variance (or whatever) until I get a good correspondence. If you want to do this precisely, you would need to do a fitting procedure perhaps in R.

The issue is for an actual BEAST run, the posterior density of any parameter may not exactly fit one of the standard densities in BEAUti, so you may not be able to find one that gives percentiles that correspond to your HPDs.

A.

To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.

To post to this group, send email to beast...@googlegroups.com.

signature.asc

Alexei Drummond

unread,

Mar 3, 2014, 2:56:27 AM3/3/14

to beast...@googlegroups.com

Dear Jose,

As you say, BEAUti reports the CPD (central prior density) rather than the HPD (highest prior density) interval. As Andrew says, these are the same for symmetric distributions and for the 95% CPD it is just the interval spanning from the 2.5% percentile to the 97.5% percentile. In the case of asymmetric distributions like Lognormal the HPD will depart from the CPD and tend to use up more of the lower tail than the upper tail because that will result in the shortest interval containing 95% of the probability. I think you are correct that in general computing the 95% highest prior density will take some sort of numerical optimisation, which is why we don't do it.

Cheers

Alexei

jspa...@gmail.com

unread,

Mar 8, 2014, 11:34:08 AM3/8/14

to beast...@googlegroups.com

Dear A. Rambaut and A. Drummond,

Thks so much for the replies, guys! It helped me figure out 2 solutions, one more heuristic, the other using an R script (thks Andrew for the suggestion on using R for this task!)... But first, I just want to clarify that my question was motivated by the difference on what's reported in the intended prior (reported in Beauti's prior tab using quantiles) and the effective prior as observed in Tracer (which uses the highest 95% density interval). The issue is that, e.g., if I intend to set a lognormal within the 95% density interval, it is not feasible to set it properly in beauti, because depending on the skewness of the distribution, the effective prior's 95% density (as seen in Tracer) can be more towards the lowest 95% interval of the intended prior in beauti (if very skewed), or more towards the 2.5% - 97.5% interval of the intended prior (if closer to a Normal).

As this may be of general interest to Beast users because lognormal distributions are heavily used as priors on both rates and dates, after playing with some values of the log(stdev) parameter (which determines the skewness level), I found out that a general rule of thumb is that if the intended prior of a Lognormal in Beast is "very skewed" (stdev >= 0.7), then Beauti's lowest 95% interval is very close to the 95% HPD reported in Beast; contrarily, if the logN distribution is closer to a Normal (stdev <= 0.3), then it's safe to assume the central 2.5% - 97.5% interval (i.e., Tracer will report a 95% HPD close to that range). Regarding the Exponential distribution (also used a lot for priors), the lowest 95% of the intended prior is always the closest match. It's important to mention once again that these results concern what range one should focus on when setting up Beauti priors, regarded that Tracer reports the 95% HPD.

Now, because the heuristic solution above (in terms of stdev values) can only be applied to the lognormal, an analog reasoning makes it extensible to other distributions, although I haven't tried others. Following Andrew Rambaut's suggestion, I found in R the "TeachingDemos" package, which has the hpd function. Below, I post an example code on how to calculate exactly the 95% HPD bounds of a LogN intended prior... it's a simple solution, as the bounds must be found by trial-and-error, by changing mean and stdev values:

# Load package "TeachingDemos", which has the "hpd" function

library(TeachingDemos)

my_mean = put_your_value_here

stdev = some_other_value

# Calculate lower and upper 95% density of the intended lognormal prior

# if mean is given in REAL space:

real_mean_HPD <- hpd(qlnorm, meanlog=log(my_mean)-stdev^2/2, sdlog=stdev)

print(real_mean_HPD)

# if mean is given in LOG space:

log_mean_HPD <- hpd(qlnorm, meanlog=my_mean, sdlog=stdev)

print(log_mean_HPD)

Other distributions can be used, this was just a LogN example.

All the best,

Jose.

jspa...@gmail.com

unread,

Mar 8, 2014, 2:10:43 PM3/8/14

to beast...@googlegroups.com

Hi RC,

Sorry, I hadn't seen your post before... thks for your suggestion, using LaplacesDemon seems a good alternative to calculate 95% highest density intervals of a prior distribution, but instead of doing numeric integration (which I believe is the case with the hpd() function in TeachingDemos.... though I'm not sure, please correct me if this is incorrect), it seems to calculate the HPD based on points drawn from the specified distribution (at least the p.interval() function in LaplacesDemon), and therefore it isn't as precise in calculating the intervals (I've tried it a little bit, hard to get close to stabilized lower and upper bound values unless n>=10,000,000 points, but even though it's not as good as an approximation).