Re: describing phases with cumulative probability plots

Rayfo...@aol.com

Feb 27, 2015, 7:28:53 AM
to ox...@googlegroups.com
Hi Erik,
 
I once had an academic colleague who bemoaned the term 'Probability Density' as unnecessary jargon, berating me for putting it on diagrams. She said she understood a Probability curve (she sketched a Gaussian pdf in the air). Stabbing at the imaginary middle, she added 'with the maximum Probability in the centre, getting less as it slopes down at each side.'
 
She is not alone. It seems ingrained in the psyche to read the height of the PDF as the probability itself. Of course it is not; I blame early contact with teachers who are themselves woolly on statistics. You of course know this, but nevertheless it is so easy to slip into the trap, as when you say 'so you can read increasing probabilities as a higher curve or increasingly dark color'.
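To make the point concrete: the height of a PDF is a density (probability per unit of X), and only areas under it are probabilities. A small Python sketch with a deliberately narrow, hypothetical normal curve shows a peak height well above 1, which no probability can be, while the area within one standard deviation of the mean is the familiar 68.2%.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal PDF at x: a density, i.e. probability per unit of x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """Probability that the value lies in [a, b]: the area under the PDF."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

mu, sigma = 0.0, 0.1  # hypothetical, deliberately narrow curve
print(f"height at the mode: {normal_pdf(mu, mu, sigma):.3f}")  # about 3.989, more than 1
print(f"P(within 1 sd): {normal_prob_between(mu - sigma, mu + sigma, mu, sigma):.3f}")  # about 0.683
print(f"P(within 0.001 of the mode): {normal_prob_between(mu - 0.001, mu + 0.001, mu, sigma):.4f}")  # about 0.008
```

The tallest point of the curve only tells you where a small slice of X holds the most area; the probability of that slice is still tiny.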
 
The task is to convey the probability as an area under the PDF, which is made more difficult when the PDF is multimodal and/or skewed. One answer, as used in OxCal, is the concept of HPD (Highest Posterior Density or Highest Probability Density; I'm never sure which, and I've seen both terms used). In visual terms, this draws a line horizontally across the mode(s), i.e. the highest point(s) of the PDF. The line is then moved downwards towards the X axis, accumulating the area under the curve wherever the curve lies above the line as a cumulative probability. When the line reaches the X axis, the probability is '1', and the range of X values is then how much of X is required in order to be 100% certain the parameter (the date) lies in that interval. Stopping the line part-way down, when the accumulated area reaches the chosen probability, gives a shorter range; with a multimodal PDF the line may cut the curve in several places, so that range can consist of more than one interval. OxCal can display intervals at 68.2%, 95.4% and 99.7% probability. They are based on accumulating probability from the centre outwards to the tails. In a normal PDF the accumulation is more rapid near the median/mean/mode and slows at the tails.
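The falling-line picture can be sketched on a grid in Python. This is a minimal illustration, not OxCal's own code: it assumes the PDF is available as heights on an evenly spaced grid, admits grid points in order of decreasing height (equivalent to lowering the line) until the chosen fraction of the area is collected, and then reports the admitted points as one or more ranges. The function name `hpd_intervals` and the example curves are hypothetical.

```python
import math
from typing import List, Optional, Tuple

def hpd_intervals(xs: List[float], density: List[float], mass: float) -> List[Tuple[float, float]]:
    """Highest-density region holding at least `mass` of the probability.

    Assumes xs is evenly spaced and density holds (possibly unnormalised) PDF
    heights at those points. Points are admitted from the highest downwards,
    like lowering a horizontal line, until enough area has been collected.
    """
    step = xs[1] - xs[0]
    total = sum(density) * step
    order = sorted(range(len(xs)), key=lambda i: density[i], reverse=True)
    included = [False] * len(xs)
    accumulated = 0.0
    for i in order:
        included[i] = True
        accumulated += density[i] * step / total
        if accumulated >= mass:
            break
    # Collapse the admitted grid points into contiguous ranges of x.
    intervals: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i, inside in enumerate(included):
        if inside and start is None:
            start = xs[i]
        elif not inside and start is not None:
            intervals.append((start, xs[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, xs[-1]))
    return intervals

def gaussian(x: float, mu: float, sigma: float) -> float:
    """Unnormalised bell curve; hpd_intervals rescales it anyway."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

xs = [1000 + 0.5 * i for i in range(901)]  # hypothetical grid of years, 1000 to 1450
single = [gaussian(x, 1210, 30) for x in xs]
bimodal = [gaussian(x, 1150, 15) + gaussian(x, 1300, 15) for x in xs]
print(hpd_intervals(xs, single, 0.954))   # one range, roughly 1150 to 1270
print(hpd_intervals(xs, bimodal, 0.954))  # two separate ranges
```

With the single made-up normal curve (centre 1210, standard deviation 30) the 95.4% range comes out close to 1150 to 1270; with the two-humped curve the line cuts the PDF in four places and the range splits into two intervals.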
 
Once it is decided what value of probability will be acceptable (let's say 95.4%), the range of uncertainty of the parameter in question (i.e. the Start date) is defined by the range encompassed on the X axis. All of it, not more at one end than the other, all of it. Hence I suggest your red bars are sufficient, without shading or amplification. If you have a 95.4% red bar that goes from 1270 BC to 1150 BC, then all you can say is that there is a 95.4% probability that the start date lies between these two dates. If you use a 68.2% bar, then the range of dates is shorter, but the probability of the date being in the range is less (from 1 in 20 to about 1 in 3).
 
The corollary is that the '40 dates' (I still like to think in terms of the Evidence) is also a moving feast governing the length of the red bar.
 
If the object is to simplify the diagram, then I think PDFs, cumulative Probabilities, shading and heights are not the way to go.
 
I think your earlier diagram is the simplified explanation for the audience in question (which may be larger than realised).  Viz:
 
The green bar is the evidence for the group activity.
The red bar shows how uncertain we are about when the activity started.
The blue median shows where there is a 50:50 chance of the Start being before or after that date.
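The median can be read straight off the cumulative curve: it is the date where the accumulated probability reaches 0.5. A rough Python sketch, assuming an evenly spaced grid of PDF values (the grid, the function name `median_from_grid` and the skewed example curve are all hypothetical), also shows that for a skewed PDF the median need not sit at the peak.

```python
import math
from typing import List

def median_from_grid(xs: List[float], density: List[float]) -> float:
    """Date with a 50:50 chance of the parameter lying before or after it.

    Integrates with the trapezoidal rule and interpolates linearly where the
    cumulative probability first reaches half of the total area.
    """
    cumulative = [0.0]
    for i in range(1, len(xs)):
        cumulative.append(cumulative[-1] + 0.5 * (density[i - 1] + density[i]) * (xs[i] - xs[i - 1]))
    half = 0.5 * cumulative[-1]
    for i in range(1, len(xs)):
        if cumulative[i] >= half:
            frac = (half - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    return xs[-1]

xs = [1000 + 0.5 * i for i in range(901)]  # hypothetical grid, 1000 to 1450
symmetric = [math.exp(-0.5 * ((x - 1210) / 20) ** 2) for x in xs]
skewed = [0.7 * math.exp(-0.5 * ((x - 1200) / 20) ** 2)
          + 0.3 * math.exp(-0.5 * ((x - 1260) / 20) ** 2) for x in xs]
peak = xs[max(range(len(xs)), key=lambda i: skewed[i])]
print(f"symmetric median: {median_from_grid(xs, symmetric):.1f}")  # about 1210, at the peak
print(f"skewed: peak {peak:.1f}, median {median_from_grid(xs, skewed):.1f}")
```

For the skewed curve the median lands about ten years to the right of the peak, pulled over by the longer tail.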
 
The slide being rapidly followed by the Date() pdf!
 
Most enjoyable!
 
Best wishes
 
Ray
 
In a message dated 27/02/2015 03:13:13 GMT Standard Time, erik....@gmail.com writes:
Dan and Ray—

I have a few other options that might be useful after playing with gradients and transparencies.
Ray, you defended my red bar better than I could have. Dan, your points are well taken and I think, as usual, they lead back to the fine line of how much detail to exclude. The truer you are to the data, the faster you lose your audience.
1. The first one (top in the figure) is the same as before, except with a gradient instead of a solid bar. It adds a tiny bit of information that might discourage looking at the tails. You could add two bars or, as I mentioned, just one for the 68% range.
2. Next, I used the probability curves and the gradients, so you can read increasing probabilities as a higher curve or increasingly dark color. Using the curves has the advantage of representing 99% of the probability, but is a little less intuitive than bars. I would say that bars are good enough as long as the curves have a fairly normal shape (which these do, like many phase boundaries). If they're really odd or bimodal, I would agree with Dan that bars might obscure an important detail and the curve is necessary.
3. The third one just joins the curves together (green vertical bar indicates the median).
4. This last one might be the most interesting, following Ray's idea on accumulating probabilities in response to Dan's concern about showing how it becomes increasingly likely that a phase has started (which remains unclear with the simple boundary curve and median). The curve is outlined in red, just like in Ray's figure. Note that I didn't actually calculate the cumulative probability as it builds from left to right and then right to left (or top to bottom? Ray, you lost me there). I would have no idea how to; I just nudged the figure over. In this case the curves aren't anything weird (bimodal or jagged), so they're probably pretty close. It would be very cool if OxCal (or R?) would display a cumulative probability as it increased/decreased over time.
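A minimal sketch of the left-to-right and right-to-left accumulation, in Python rather than OxCal or R, assuming the Start boundary has already been exported as PDF heights on an evenly spaced grid of years. The boundary here is a made-up normal curve centred on 1210 BC, years are stored as signed numbers with BC negative, and `cumulative_probability` is a hypothetical helper; all of these are assumptions for illustration only.

```python
import math
from typing import List

def cumulative_probability(xs: List[float], density: List[float]) -> List[float]:
    """Running area under the PDF from the left, normalised so the last value is 1."""
    running = [0.0]
    for i in range(1, len(xs)):
        running.append(running[-1] + 0.5 * (density[i - 1] + density[i]) * (xs[i] - xs[i - 1]))
    total = running[-1]
    return [r / total for r in running]

# Hypothetical Start boundary: roughly normal around 1210 BC, stored as -1210.
xs = [-1400 + 0.5 * i for i in range(601)]          # 1400 BC to 1100 BC
start_pdf = [math.exp(-0.5 * ((x + 1210) / 30) ** 2) for x in xs]
started_by = cumulative_probability(xs, start_pdf)  # accumulated left to right
not_yet = [1.0 - p for p in started_by]             # accumulated right to left

for i in range(0, len(xs), 40):                      # one row every 20 years
    print(f"{-xs[i]:6.0f} BC  started: {started_by[i]:.3f}  not yet: {not_yet[i]:.3f}")
```

Reading down the table, the 'started' column is the probability that the Start lies at or before that year, i.e. that the phase has begun by then; it rises from 0 to 1 and passes 0.5 at the median.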

Anyway, I'd like to hear your opinions on how these communicate the idea without losing necessary detail.
The idea is to not recreate something that inspired Dan to start this thread!

Cheers
Erik

--
You received this message because you are subscribed to the Google Groups "OxCal" group.
To unsubscribe from this group and stop receiving emails from it, send an email to oxcal+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rayfo...@aol.com

Feb 28, 2015, 6:48:47 AM
to ox...@googlegroups.com
Hi Erik,
 
My last e-mail had an oops moment.
 
When I said:
 If you use a 68.2% bar, then the range of dates is shorter, but the probability of the date being in the range is less (from 1 in 20 to about 1 in 3).
 
I should of course have said:
 If you use a 68.2% bar, then the range of dates is shorter, but the probability of the date being OUTSIDE the range is GREATER (from 1 in 20 to about 1 in 3).
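The corrected figures are simple arithmetic: the chance of the date falling outside a range is one minus its probability. A short Python sketch using the three OxCal levels; the range widths additionally assume a normal PDF with a hypothetical standard deviation of 30 years, and the helper names are illustrative.

```python
from statistics import NormalDist

def outside_odds(level: float) -> float:
    """Return N such that the chance of lying outside a `level` range is 1 in N."""
    return 1.0 / (1.0 - level)

def normal_range_width(level: float, sigma: float) -> float:
    """Width of the central `level` range of a normal PDF with standard deviation sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * sigma

sigma = 30.0  # hypothetical standard deviation, in years
for level in (0.682, 0.954, 0.997):
    print(f"{level:.1%} range: outside about 1 in {outside_odds(level):.0f}, "
          f"width about {normal_range_width(level, sigma):.0f} years")
```

For 95.4% the odds of falling outside come out at about 1 in 22, so 'about 1 in 20' is a fair round figure; for 68.2% they are about 1 in 3, and under the normal assumption the 68.2% range is roughly half the width of the 95.4% one.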
 
My apologies.
 
regards
 
Ray