Hi Erik,
I once had an academic colleague who bemoaned the
term 'Probability Density' as unnecessary jargon, berating me for putting
it on diagrams. She said she understood a Probability curve (she
sketched a Gaussian pdf in the air). Stabbing at the imaginary middle
she added 'with the maximum Probability in the centre, getting less as it slopes
down at each side.'
She is not alone. It seems ingrained in the psyche to associate the height
of the PDF with increasing Probability. Of course the height is a density, not
a probability; I blame it on early contact with teachers who are themselves
woolly on statistics. You of course know this, but nevertheless it is so easy to
slip into the trap, as when you say 'so you can read increasing probabilities as
a higher curve or increasingly dark color'.
The task is to convey the probability as an area under the PDF, made more
difficult when the PDF is multimodal and/or skewed. One answer, as used in
OxCal, is the concept of HPD (Highest Posterior Density or Highest Probability
Density; I'm never sure which, and I've seen both terms used). This,
in visual terms, draws a line horizontally across the mode(s), i.e. the highest
point(s) of the PDF. The line is then moved downwards towards the X axis,
accumulating the area under the parts of the curve above it as a cumulative
probability. When
the line reaches the X axis, the probability is '1' and the range of the X
values is then how much of X is required in order to be 100% certain the
parameter (the date) lies in that interval. OxCal can display intervals at
68.2%, 95.4% and 99.7% probability. They are based on accumulating
probability from the centre outwards to the tails. In a Normal pdf the
accumulation is more rapid nearer the Median/Mean/Mode and slows at the
tails.
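The line-lowering procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OxCal's own implementation: the function name `hpd_intervals`, the evenly spaced year grid (BC years written as negative numbers) and the two-humped example density are all invented for the purpose.

```python
import math

def hpd_intervals(xs, density, mass=0.954):
    """Return the HPD region as a list of (first, last) grid values.

    xs      -- evenly spaced grid (e.g. calendar years); an assumption here
    density -- PDF heights on that grid (need not be normalised)
    mass    -- probability the region must contain
    """
    total = sum(density)
    # Visit grid cells from the highest density downwards: this is the
    # horizontal line being lowered from the mode(s) towards the X axis.
    order = sorted(range(len(xs)), key=lambda i: density[i], reverse=True)
    keep = [False] * len(xs)
    accumulated = 0.0
    for i in order:
        keep[i] = True
        accumulated += density[i] / total
        if accumulated >= mass:
            break
    # Merge neighbouring kept cells into contiguous intervals.
    intervals, start = [], None
    for i, kept in enumerate(keep):
        if kept and start is None:
            start = i
        if start is not None and (not kept or i == len(xs) - 1):
            end = i if kept else i - 1
            intervals.append((xs[start], xs[end]))
            start = None
    return intervals

def gauss(x, mean, sd):
    """Unnormalised Gaussian bump, used only to build the made-up example."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)

# Hypothetical two-humped density for a Start date (negative = BC).
years = list(range(-1350, -1049))
density = [gauss(y, -1260, 15) + 0.6 * gauss(y, -1150, 12) for y in years]

for mass in (0.682, 0.954, 0.997):
    print(mass, hpd_intervals(years, density, mass))
```

With a multimodal curve the result may be more than one interval, which is exactly the case where a single bar starts to hide detail.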
Once it is decided what value of probability will be acceptable (let's say
95.4%), then the range of uncertainty of the parameter in question (i.e. the
Start date) is defined by the range encompassed on the X axis. All of it,
not more at one end than the other, all of it. Hence I suggest your red
bars are sufficient, without shading or amplification. If you have a 95.4%
red bar that goes from 1270 BC to 1150 BC, then all you can say is that
there is a 95.4% probability that the start date lies between these two
dates. If you use a 68.2% bar, then the range of dates is shorter,
but the probability of the date being in the range is lower (the chance of it
lying outside the bar rises from about 1 in 20 to about 1 in 3).
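The trade-off between bar length and confidence can be checked with a little arithmetic. The sketch below assumes an idealised Normal pdf, where the 68.2%, 95.4% and 99.7% levels correspond to roughly 1, 2 and 3 standard deviations; real calibrated dates will differ, and the helper name `coverage` is made up for the illustration.

```python
import math

def coverage(k):
    """Probability that a Normal variable lies within k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = coverage(k)
    outside = 1 - inside
    print(f"{k} sigma: inside {inside:.1%}, outside about 1 in {1 / outside:.0f}")
```

For 1 and 2 standard deviations the chance of being outside the bar comes out at roughly 1 in 3 and 1 in 22, in line with the rounded figures quoted above.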
The corollary is that the '40 dates' (I still like to think in terms
of the Evidence) is also a moving feast governing the length of the red
bar.
If the object is to simplify the diagram, then I think PDFs, cumulative
Probabilities, shading and heights are not the way to go.
I think your earlier diagram is the simplified explanation for the audience
in question (which may be larger than realised). Viz:
The green bar is the evidence for the group activity.
The red bar shows how uncertain we are about when the activity
started.
The Blue Median shows where there is a 50:50 chance of the Start being
before or after that date.
The slide being rapidly followed by the Date() pdf!
Most enjoyable!
Best wishes
Ray
Dan and Ray—
I have a few other options that might be useful after playing with
gradients and transparencies.
Ray, you defended my red bar better than I could have. Dan, your points
are well taken and I think, as usual, it leads back to the fine line of how
much detail to exclude. The truer you are to the data the faster you lose your
audience.
1. The first one (top in the figure) is the same as before, except with a
gradient instead of solid bar. It adds a tiny bit of information that might
discourage looking at the tails. You could add two bars, or like I mentioned,
just one for the 68% range.
2. Next, I used the probability curves and the gradients, so you can read
increasing probabilities as a higher curve or increasingly dark color. Using
the curves has the advantage of representing 99% of the probability, but is a
little less intuitive than bars. I would say that bars are good enough as long
as they have a fairly normal shape (which these do, like many phase
boundaries). If they're really odd or bimodal, I would agree with Dan that
they might be obscuring an important detail and the curve is necessary.
3. The third one just joins the curves together (green vertical bar
indicates the median).
4. This last one might be the most interesting, following Ray's
idea on accumulating probabilities in response to Dan's concern about showing
how it becomes increasingly likely that a phase has started (which remains
unclear with the simple boundary curve and median). The curve is outlined in
red, just like in Ray's figure. Note that I didn't actually calculate the
cumulative probability as it builds from left to right and then right to left
(or top to bottom? Ray, you lost me there). I would have no idea how to; I just
nudged the figure over. In this case the curves aren't anything weird (bimodal
or jagged), so they're probably pretty close. It would be very cool if OxCal
(or R?) would display a cumulative probability as it increased/decreased over
time.
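For what it is worth, a left-to-right cumulative probability is a running sum of the PDF. The sketch below assumes the boundary's density has been exported as heights on an evenly spaced year grid; the Gaussian-shaped Start and End densities and the helper names `cumulative` and `year_at` are invented for illustration and are not OxCal functions.

```python
import math

def cumulative(density):
    """Running sum of the PDF, scaled so the final value is 1."""
    total = sum(density)
    running, out = 0.0, []
    for d in density:
        running += d / total
        out.append(running)
    return out

def year_at(years, cdf, p):
    """First grid year at which the cumulative probability reaches p."""
    for year, c in zip(years, cdf):
        if c >= p:
            return year
    return years[-1]

# Hypothetical Start and End boundary densities (negative years = BC).
years = list(range(-1350, -1049))
start_density = [math.exp(-0.5 * ((y + 1210) / 25) ** 2) for y in years]
end_density = [math.exp(-0.5 * ((y + 1120) / 20) ** 2) for y in years]

# Probability that the phase has started by each year (builds left to right).
started_by = cumulative(start_density)
# Probability that the phase has not yet ended (falls away left to right).
not_yet_ended = [1 - c for c in cumulative(end_density)]

median_year = year_at(years, started_by, 0.5)
print("50:50 year for the Start:", median_year)
```

The year where `started_by` crosses 0.5 is the median, so a plot of this curve shows both the 50:50 point and how quickly it becomes likely that the phase has begun.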
Anyway, I'd like to hear your opinions on how these communicate the idea
without losing necessary detail.
The idea is to not recreate something that inspired Dan to start
this thread!
Cheers
Erik