Sum function as implemented in OxCal, Calib, and CalPal


Boulanger, Matthew Turner

May 29, 2013, 11:19:00 AM
to ox...@googlegroups.com

Chris and co.:

 

Can anyone explain to me why the summed probability distributions generated by OxCal and Calib are fundamentally different from those produced by CalPal, as observed by Buchanan et al. (2011), “A comment on Steele's (2010) ‘Radiocarbon dates as data: quantitative strategies for estimating colonization front speeds and event densities’”, in Journal of Archaeological Science?

 

-Matt

 

MILLARD A.R.

May 29, 2013, 11:32:57 AM
to ox...@googlegroups.com
> From: Boulanger, Matthew Turner
> Sent: 29 May 2013 16:19
There's an explanation from the authors of CalPal about why it differs from the other two in this paper:
http://www.academia.edu/1608091/Concepts_of_probability_in_radiocarbon_analysis
In particular from p.13 onwards. However, I must admit I have not fully understood what is being said, and I certainly don't understand how to replicate the CalPal calculations.


Best wishes

Andrew
--
 Dr. Andrew Millard                       A.R.M...@durham.ac.uk  
 Durham University
 Senior Lecturer in Archaeology              Tel: +44 191 334 1147
 Archaeology:      http://www.dur.ac.uk/archaeology/       
Personal webpage: http://community.dur.ac.uk/a.r.millard/


Rayfo...@aol.com

Jun 2, 2013, 11:26:29 AM
to ox...@googlegroups.com
Hi,
 
A paper by Alan N. Williams, "The use of summed radiocarbon probability distributions in archaeology: a review of methods", Journal of Archaeological Science 39 (2012) 578-589, may cast some light on the question posed.
 
In particular ..."when using summed probability plots: 1) a minimum sample size of 500 radiocarbon dates should be used ........2) a moving average trendline of 500-800 years should be used to offset the effects of the calibration process."
 
Figure 5 in that paper provides a clear example of why summed distribution probabilities can cause spikes that are an artefact of the summing process.
 
It may be the case that the CalPal summing process applies such a smoothing, whereas Calib and OxCal do not.  Perhaps distribution summing in CalPal serves a different purpose.
 
It might be instructive to run the IntCal09 curve through a 500-year moving-average trendline and use it as the calibration curve in OxCal for such long-term demographic questions.
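For concreteness, here is a minimal Python sketch of the kind of centred moving-average smoothing being suggested. The curve below is an invented trend-plus-wiggles series standing in for IntCal09 (the values are illustrative only, not real calibration data):

```python
import math

def moving_average(values, window):
    """Centred moving average over `window` samples (truncated at the edges)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

step = 5                                   # years between curve points
ts = list(range(0, 3000, step))            # toy calendar ages
# Toy 'calibration curve': linear trend plus wiggles of ~40-year amplitude.
curve = [t + 40 * math.sin(t / 50) for t in ts]
# A ~500-year window corresponds to 500/step samples.
smoothed = moving_average(curve, 500 // step)
```

In the interior the smoothing suppresses the wiggles while leaving the long-term trend essentially unchanged; the first and last ~250 years see a truncated window and are biased, which is worth remembering before swapping such a curve into a calibration.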
 
regards
 
Ray Kidd
--
You received this message because you are subscribed to the Google Groups "OxCal" group.
To unsubscribe from this group and stop receiving emails from it, send an email to oxcal+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Rayfo...@aol.com

Jun 2, 2013, 2:30:15 PM
to ox...@googlegroups.com
Hello Matt, Andrew,
 
Further to my last re
"A paper by Alan N. Williams "The use of summed radiocarbon probability distributions in archaeology: a review of methods"  Journal of Archaeological Science 39 (2012) 578-589 may cast some light on the question posed."
 
I have run a rough 500-year trend smoothing on IntCal09 and used it as a trial curve in OxCal.  Then I ran two models of pseudo R_Dates at 25-year steps between 11,000 +/- 25 BC and 8,000 +/- 25 BC.  In the first I used the IntCal09 curve, and in the second the 500-year trend-smoothed curve.
 
 
Perhaps the answer to your question lies in the CalPal smoothing method.  I think it may be horses for courses, but I'll leave the details to the experts.
 
regards
 
Ray Kidd
 

Christopher Ramsey

Jun 3, 2013, 5:31:08 AM
to <oxcal@googlegroups.com>
I think it is important to be specific about what we mean by peaks in the Sum distributions and in what circumstances they arise:

1. Ideally, Sum should be applied in conjunction with a Bayesian model (such as a single-phase model).  Doing this will usually reduce one source of the peaks seen, which is uncertainty in the end-points of the distribution.  The peaks seen may often be just the results of the calibration of the earliest and latest points.

2. Consider what method is being used to generate the 'pseudo' radiocarbon dates.  If you just have regularly spaced radiocarbon dates you will certainly get peaks from the calibration curve.  This is ultimately because real radiocarbon dates are not uniformly distributed in radiocarbon date space if they are uniformly distributed in real time.

Here is a bit of OxCal code which can be used to test this:

 Plot()
 {
  var(a);
  Sequence()
  {
   Boundary("Start Sim");
   Sum("Sim")
   {
    a=13000;
    while(a>=9000)
    {
     // simulate a measurement for calendar age a
     R_Simulate(calBP(a),30);
     a=a-25;
    };
   };
   Boundary("End Sim");
  };
  Sequence()
  {
   Boundary("Start RC");
   Sum("RC")
   {
    a=11000;
    while(a>=8000)
    {
     // calibrate a date measured as a BP
     R_Date(a,30);
     a=a-25;
    };
   };
   Boundary("End RC");
  };
 };
   
This puts radiocarbon dates at 25 year intervals through the period of about 11-7kBC.  In the first model these are simulated to be uniform in true age (so probability density 0.04).  In the second case they are simply equally spaced in radiocarbon date.  The latter gives a very peaked distribution (similar to that shown by Ray).  The former is noisy, but does not have the extreme peaks.  In both these cases the spacing of the dates is similar to their uncertainty and so you do expect some noise.  If you use the first approach with more closely spaced dates the distribution will be even less noisy (see fourth plot below with dates spaced every 5 years instead).

I attach plots from the two approaches - several of the first method since this is a simulation and gives different answers each time - and a final one at five times the date density (0.2 average probability density) to show how the noise reduces.  Remember that real dates should follow the pattern of the Sim examples, not the RC example.
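The two cases can be caricatured in a few lines of Python: an invented wiggly curve stands in for IntCal, and a simple grid-based calibration with a uniform prior stands in for OxCal's machinery (none of this reproduces the program itself):

```python
import math, random

def cal_curve(t):
    """Toy calibration curve: calendar age t (BP) -> radiocarbon age, with wiggles."""
    return 0.9 * t + 150 * math.sin(t / 100)

grid = list(range(4000, 6001, 5))   # calendar grid for calibration
sigma = 30                          # measurement uncertainty (14C years)

def calibrate(r14):
    """Posterior over the calendar grid for one measured 14C age (uniform prior)."""
    w = [math.exp(-0.5 * ((r14 - cal_curve(t)) / sigma) ** 2) for t in grid]
    s = sum(w)
    return [x / s for x in w]

def spd(measurements):
    """Sum as a plain average of the normalised calibrated distributions."""
    acc = [0.0] * len(grid)
    for r in measurements:
        for i, p in enumerate(calibrate(r)):
            acc[i] += p / len(measurements)
    return acc

random.seed(1)
# Case 1 (the 'Sim' model): events uniform in calendar time, measurements noisy.
sim = [random.gauss(cal_curve(t), sigma) for t in range(4500, 5501, 25)]
# Case 2 (the 'RC' model): ages equally spaced in radiocarbon time instead.
rc = [4200 + 25 * k for k in range(25)]

spd_sim = spd(sim)
spd_rc = spd(rc)
```

With a wiggly curve, spd_rc shows structure driven purely by the curve's shape, while spd_sim is noisy but tracks the uniform event density, matching the distinction drawn above.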

So, in conclusion, I don't think that the algorithm used in OxCal and Calib generates consistent spurious peaks in the Sum distributions (except at the ends).

Christopher



Untitled-6.pdf
Untitled-8.pdf
Untitled-9.pdf
Untitled-10.pdf
Untitled-7.pdf

Boulanger, Matthew Turner

Jun 3, 2013, 10:13:46 AM
to ox...@googlegroups.com

Ray:

 

Thanks for the emails.  These are issues that have come up in several published articles, and a manuscript I have submitted has been harshly criticized in the review process for failing to address them.

 

Regarding the Williams (2012) paper, I am not convinced that a blanket statement about a sample size of 500 dates or a 500-800 year moving average is always appropriate, primarily because Williams dealt with (and ONLY with) a time frame of 50,000 years.  No attempt was made to evaluate sample size for shorter time intervals.  So, a more accurate statement of his findings would be that these criteria are needed when dealing with time intervals of ca. 50,000 years.  This is an important distinction to make, especially for folks (like me) working in North America, where our total amount of time is 1/5 that range.  I'm looking at an interval of 5000 years, where asking for 500 dates is equivalent to saying we need a date for every 10(!) years.

 

All of this is related, but somewhat beside the point, because (as several folks have observed) probability distributions created under OxCal and Calib are fundamentally different (regardless of smoothing or any other option) from those under CalPal.  This is easily observed by creating an equal-interval series of dates with overlapping standard errors and calibrating them, just as you did.  Logically, if the distributions of all the dates are continuous and overlap, then the summed probability for the entire range should be more or less constant for the entire interval of time, which, as you can see from your plots, it is not, regardless of smoothing.

 

So, some aspect of the procedure in OxCal (and Calib) makes it such that the resulting SPD is influenced by the shape of the calibration curve.  This is not the case in CalPal; in fact, the SPDs from CalPal appear just as you would logically anticipate given a uniform distribution of dates with overlapping probabilities over a given time period.  It is only when the standard error is less than the gap between individual dates that "artificial" peaks and valleys begin to form.  Note that these are "artificial" in the sense that they are influenced by the composition of the sample itself, and not by the calibration curve.  You can observe this by doing effectively the same thing that you've done, but comparing the results with those you would get in CalPal; see the attached image, in which, for an interval of 8000 radiocarbon years and a constant error of 120 years, a sample size of 33 is perfectly sufficient to capture the underlying uniform constant probability of dates.  The paper that Andrew directed me to actually addresses why this is the case, though I can't say I actually ~understand~ all of it, yet.

 

-Matt

SPD_Eval_1.png

Christopher Ramsey

Jun 3, 2013, 12:23:25 PM
to <oxcal@googlegroups.com>
Here is an equivalent to my fourth graph but with 120 year uncertainty on all the dates:



This is with about 800 dates across the range. 

It is still a simulation, and so there will still be some noise: there are of the order of 48 dates per 240 years, so noise of the order of ~14% is to be expected.
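The ~14% figure follows from simple counting statistics (assuming roughly Poisson-like scatter in the number of dates per bin; this is a back-of-envelope check, not anything OxCal computes):

```python
import math

n_dates = 800          # simulated dates across the range
bin_width = 240        # years per bin
span = 4000            # years covered by the simulation
dates_per_bin = n_dates * bin_width / span        # 48 dates per 240-year bin
noise_fraction = 1 / math.sqrt(dates_per_bin)     # relative counting noise, ~0.14
```

So with 48 dates per bin, the expected relative fluctuation is 1/sqrt(48), i.e. about 14%, as quoted.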

CalPal is using a method to smooth out the noise - I presume by hypothesising extra events - but I don't really understand the underlying assumption in formal terms.  Bernie's paper speaks of "produce some previously non-existent particles 'out of a vacuum' on the calendric time-scale".  Or perhaps it is considering all possible dates you might get at each point (equivalent in OxCal to running multiple simulations).  In real situations, you only get one measurement though.

On 3 Jun 2013, at 15:13, "Boulanger, Matthew Turner" <Boula...@missouri.edu> wrote:

> All of this is related, but somewhat beside the point because (as several folks have observed) probability distributions created under OxCal and Calib are fundamentally different (regardless of smoothing or any other option) than those under CalPal.  This is easily observed by creating an equal-interval series of dates with overlapping standard errors and calibrating them – just as you did.  Logically, if the distributions of all the dates are continuous and overlap, then the summed probability for the entire range should be more or less constant for the entire interval of time  – Which, as you can see from your plots, it is not, regardless of smoothing.

But I'm pretty sure Ray's plots were based on equally spaced radiocarbon dates - not dates equally spaced in calendar time.  I think Bernie's are too.



> So, some aspect of the procedure in OxCal (and Calib) makes it such that the resulting SPD is influenced by the shape of the calibration curve.  This is not the case in CalPal – and in fact, the SPDs from CalPal actually appear  as you would logically anticipate given a uniform distribution of dates with overlapping probabilities over a given time period.  It is only when the standard error is less than the gap between individual dates that “artificial” peaks and valleys begin to form.  Note, that these are “artificial” in the fact that they are influenced by the composition of the sample itself, and not the calibration curve.  You can observe this by doing effectively the same thing that you’ve done, but comparing the results with those of what you would get in CalPal – see attached image in which, for an interval of 8000 radiocarbon years and a constant error of 120 years, a sample size of 33 is perfectly sufficient to capture the underlying uniform constant probability of dates.  The paper that Andrew directed me to actually addresses why this is the case, though I can’t say I actually ~understand~ all of it, yet.

One thing I don't understand is what the distribution of ticks is under the CalPal plots.  These appear to be non-linearly distributed in calendar time.  It would be very useful if anyone can explain how this really works in simple terms.  I would be worried by using a mathematical process that cannot be formally defined - even if it does produce clean-looking output.

Christopher

Untitled-11.pdf

MILLARD A.R.

Jun 3, 2013, 12:33:20 PM
to ox...@googlegroups.com
> From: Boulanger, Matthew Turner
> Sent: 03 June 2013 15:14
...
> Logically, if
> the distributions of all the dates are continuous and overlap, then
> the summed probability for the entire range should be more or less
> constant for the entire interval of time

This has been assumed by those using summed distributions, but I do not think I have seen it demonstrated. The uncertainty in radiocarbon dates derives from the measurement on the C14-scale, which is non-linearly related to the calendar scale, and so my statistical intuition is that non-uniformity of summed dates is expected.

Imagine the case where there is one dated sample per year. Via the calibration curve we can read off the true radiocarbon content of each year, and we will get a spiky distribution on the C14-scale. We measure all those samples with Gaussian error. The resulting summed probability distribution on the C14-scale will smooth out the peaks and troughs in the distribution. If that distribution is calibrated back to calendar years, the result will not be uniform, because the peaks are no longer high enough to compensate for plateaux and the troughs not deep enough to compensate for steep sections of the curve.
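That intuition can be illustrated numerically. The sketch below uses an invented wiggly curve, one sample per calendar year, and a crude change-of-variables 'read-back' in place of a real calibration (strictly valid only where the toy curve is monotone), so it is a caricature of the argument, not of any program:

```python
import math

def c(t):
    """Toy calibration curve with wiggles: calendar age -> 14C age."""
    return 0.9 * t + 150 * math.sin(t / 100)

def dc(t):
    """Derivative of the toy curve."""
    return 0.9 + 1.5 * math.cos(t / 100)

sigma = 30                          # Gaussian measurement error (14C years)
samples = list(range(4000, 6001))   # one dated sample per calendar year

def f14(r):
    """Summed probability on the 14C scale: one Gaussian per measured sample."""
    norm = sigma * math.sqrt(2 * math.pi) * len(samples)
    return sum(math.exp(-0.5 * ((r - c(t)) / sigma) ** 2) for t in samples) / norm

# Read the summed 14C distribution back to the calendar scale.
# With no measurement error this would be exactly uniform; with error it is not.
interior = range(4500, 5501, 10)
f_cal = [f14(c(t)) * abs(dc(t)) for t in interior]
```

The read-back density dips wherever the toy curve flattens, matching the point above: after Gaussian smoothing, the peaks on the 14C scale are no longer high enough to compensate for the plateaux.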

Christopher Ramsey

Jun 3, 2013, 1:47:35 PM
to <oxcal@googlegroups.com>
I think that, from the derivation of the calibration process, it should be uniform, as this is the prior.

However, I think there may be subtleties in the way we simulate dates from the curve and correlation of errors between points on the curve which could produce some minor effects when the errors become comparable to those of the curve itself.

However, the point of my previous emails is that it is uniform with the OxCal/Calib procedure (or at least very close to uniform) if you have enough data.  Here are another three plots, this time through the Iron Age plateau.  I have not used phase modelling because I wanted to push the number of simulations much higher.  So in these cases I have 20 dates every 5 years (each with 50-year uncertainty).

You will see that there are no consistent peaks etc. around the Iron Age plateau, as was suggested by earlier comments.  You should ignore the edge effects, as these are the result of using unmodelled dates.

From these, I think that perhaps what CalPal is doing is, for each date, simulating a large number of other dates that you might get in the same time range were you to make them.  This would smooth the curve.

However, I still don't see how you can expect to get a uniform distribution from a sparse number of dates; just because of measurement noise, you would always expect to see some noise.  To remove it you have to use some noise-removal algorithm which, as with image enhancement, might make a clearer picture but is ultimately removing information.

Christopher



On 3 Jun 2013, at 17:33, MILLARD A.R. <a.r.m...@durham.ac.uk> wrote:
Untitled-15.pdf
Untitled-14.pdf
Untitled-13.pdf

Rayfo...@aol.com

Jun 3, 2013, 4:18:31 PM
to ox...@googlegroups.com
Hi Christopher,
 
I think, in simple terms (I only do simple), what they are doing in CalPal is akin to applying negative feedback in an amplifier to reduce distortion in the transfer characteristic.  In Fig. 9 of the paper it is as if a single determination without error is applied to the calibration curve.  Where it hits, it generates a pulse.  If there are multiple pulses, they are weighted so they always add up to, say, 95%.  The paper then says (p. 16):
"We may now re-formulate the question whether – or not – a correction of the 14C-histogram shape (or corresponding shape of the calibrated frequency distribution) to allow for the folding properties of the calibration curve is possible. As already noted above, the correction must be applied to the probability distribution for each individual date. Now that the distribution is reduced to a series of digital 'true-false' (yes-no; 1/0) decisions (Fig. 9), it becomes apparent that the correction – to be applicable – must assign an individual truth-value to each of the alternative readings......"
 
I think the 'correction' might be a form of negative feedback that alters the local shape of the calibration curve by the relevant proportion for each date distribution.
 
From my telecoms days: distortion with feedback = d/(1+Ab), where d is the original distortion, A the amplifier gain, and b the fraction of output fed back.
 
Whether or not this is the case is moot.  Whether or not it is a valid application in the circumstances would, as you say, require exposing the mathematics to the oxygen of daylight.
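For reference, the standard feedback relation above as a one-liner (the numbers are illustrative only):

```python
def distortion_with_feedback(d, A, b):
    """Residual distortion of an amplifier: gain A, feedback fraction b."""
    return d / (1 + A * b)

# e.g. 10% open-loop distortion, gain 100, full feedback -> about 0.1%
residual = distortion_with_feedback(0.10, 100, 1.0)
```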
 
regards
 
Ray

Rayfo...@aol.com

Jun 4, 2013, 6:09:17 PM
to ox...@googlegroups.com
Hello Christopher,
 
I have been thinking about the CalPal chart and what I said earlier about negative feedback for correcting transfer-characteristic distortion in amplifiers.  As a thought experiment, if 100% negative feedback is applied to the wiggles, then what is left is the trendline of the calibration curve without the wiggles.  So I took IntCal09 and derived its trendline as a polynomial of order 2.  I exchanged this for the IntCal09 curve but retained the error terms.  Then I ran several models using your suggested simulation program.
 
The results are attached as 'Chart_of_ModIntcal.doc', but here is a taster:
 

In the model below, a uniform distribution of dates at 25 yr intervals between about 3500 and 1500 BP has a region of 21 additional dates at 5 yr intervals between 2600 and 2500 BP.  Shown after modeling.

 
 
I of course cannot say if this is what CalPal is doing, but it seems to be a fair comparison. 
 
If so, then the implication might be that the process has the effect of a hard negative feedback that eliminates the wiggles of the calibration curve.  But then there is a question about how far it can be claimed that the IntCal09 curve is employed if it is 'undistorted' beyond all recognition?
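A dependency-free Python sketch of fitting an order-2 trendline by least squares, applied to a toy wiggly series rather than to IntCal09 (the normal-equations solver is generic; nothing here is CalPal's actual procedure):

```python
import math

def polyfit2(xs, ys):
    """Least-squares quadratic fit; returns coefficients (a0, a1, a2)."""
    n = 3
    # Normal equations: sum(x^(i+j)) * beta = sum(x^i * y).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    v = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            v[r] -= f * v[col]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (v[r] - sum(A[r][k] * beta[k] for k in range(r + 1, n))) / A[r][r]
    return beta

xs = [t / 1000 for t in range(0, 3000, 25)]          # scaled for conditioning
ys = [0.9 * x + 0.15 * math.sin(20 * x) for x in xs]  # toy trend + wiggles
a0, a1, a2 = polyfit2(xs, ys)
```

With many wiggle periods across the fitting range, the quadratic recovers the underlying trend (a1 near 0.9 here) and the wiggles average out, which is the 'trendline without the wiggles' effect described above.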
 
I will try some previous models and see what the effects on real dates are.
 
Best wishes
 
Ray
Chart_of_ModIntcal.doc

Christopher Ramsey

Jun 5, 2013, 8:22:30 AM
to <oxcal@googlegroups.com>
Yes - it may be something like that.  I think there are three different things to keep in mind.

1. The main reason for the lumpiness of the Sum distributions is that some radiocarbon dates give much higher calendrical precision than others.  If you have enough dates this all averages out and, as I have demonstrated, you do get a fairly uniform distribution for the Sum.  However, you have to go to a very large number of measurements.

2. I think the simulations being used in some cases don't take account of the fact that, even if you know the date precisely, the radiocarbon dates that you get for it will typically vary, and this in itself introduces noise which you should see in the Sum distribution if the number of dates is limited (again, it will smooth out in the limit of a high number of dates).

3. The Sum function in OxCal and Calib is NOT a model; it is just a mathematical average of a series of probability distribution functions from calibrated radiocarbon dates (or anything else for that matter).

Of course, typically the dates that you have are only representative of some underlying distribution, and so you may want to do more than simply sum the dates to get at this.  The question is then what model you use to do that.  There are ways within OxCal, for example, to generate a distribution if you think that the underlying process is uniform.  However, typically in these cases what we want is something that gives an indication of changing rates.  To do this you need some underlying model.

I think that CalPal is doing something to simulate nearby dates.  This is rather like a kernel density approach.  You can simulate something similar in OxCal.  Take our same example again.  The calibrated dates in our set typically have a pdf standard deviation of about 100, so we could use a kernel distribution of N(0,100) to infer the underlying distribution of events.  The following code does this in two ways.  The first includes the randomness referred to above in the dates you expect to get; the second does not: it just assumes we get exactly the right value.

 Plot()
 {
  var(a);
  Sum("Sim-rand")
  {
   a=13000;
   while(a>=9000)
   {
    // simulate a measurement for calendar age a, then add the kernel
    R_Simulate(calBP(a),30)+N(0,100);
    a=a-25;
   };
  };
  Sum("Sim-exact")
  {
   a=13000;
   while(a>=9000)
   {
    // take the central (error-free) measurement, then add the kernel
    R_Date(BP(calBP(a)),30)+N(0,100);
    a=a-25;
   };
  };
 };

I attach the two plots from this exercise.  You will see that the second one looks much more like the CalPal output.  This is fine, but I think the underlying model for generating this distribution needs to be explicit.  Note that the properly randomised simulation does still contain more noise, as you would expect it to.
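In conventional terms the second construction behaves like a kernel density estimate. A small Python sketch of the same idea, with exact calendar ages and a Gaussian kernel of width 100 (illustrative values, not OxCal's internals):

```python
import math

ages = list(range(9000, 13001, 25))   # one 'exact' calendar age every 25 years
bw = 100.0                            # kernel standard deviation, as in N(0,100)

def kde(t):
    """Average of one Gaussian kernel per date, evaluated at calendar age t."""
    norm = bw * math.sqrt(2 * math.pi) * len(ages)
    return sum(math.exp(-0.5 * ((t - a) / bw) ** 2) for a in ages) / norm

center = kde(11000)   # deep in the dated interval
edge = kde(9000)      # at the end of the interval
```

In the interior the estimate is flat at about 1/4000 per year (the uniform density over the 4,000-year span); at the ends it falls to roughly half that, the same edge effect visible in the Sum plots.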

Christopher
Untitled-17.pdf
Untitled-16.pdf

Chris Carleton

Nov 20, 2013, 7:45:05 PM
to ox...@googlegroups.com
I'm not sure if anyone who posted to this thread is interested in the topic anymore, or has already figured out a good answer, but I've recently come to this topic and have a concise answer to the original question.  I've spent some time reading the linked papers in this thread, and a few others, regarding the difference between OxCal/Calib and CalPal vis-à-vis the summed radiocarbon distributions.  I've also spent some time emailing back and forth with Bernie Weninger about CalPal.  From those readings, emails, and what Christopher Ramsey has stated here (and on the OxCal webpage), it seems to me that the primary differences in the calculation are as follows:

OxCal: calibrates the set of radiocarbon dates (including any Bayesian constraints), corrects the areas of the posterior distributions, and calculates a common mean (which amounts to an OR statement about the events in question).  OxCal's summed probability distribution conforms to Kolmogorov's probability axioms, but has to be evaluated with care, including all of the issues Dr Ramsey has mentioned in this thread regarding sampling etc.; it will always be 'noisy', and peaks will correspond to the slope of the calibration curve.

CalPal: sums the C-14 distributions (in radiocarbon years) and then calibrates the resulting histogram against the calibration curve without correcting for area, thus creating a smoother-looking curve that is relatively (compared to OxCal/Calib) less responsive to the slope of the calibration curve, but that does not conform to the probability axioms.  It will generally be smoother than OxCal/Calib sums because it is not a 'sum' of the posteriors in the same strict Kolmogorov sense.
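The difference in the order of operations can be caricatured in a few lines of Python (toy curve, toy dates; this follows the verbal description above, not either program's source code):

```python
import math

def c(t):
    """Toy calibration curve: calendar age -> 14C age, with wiggles."""
    return 0.9 * t + 150 * math.sin(t / 100)

grid = list(range(4000, 6001, 5))     # calendar grid
sigma = 50                            # measurement uncertainty (14C years)
dates = [c(t) for t in range(4500, 5501, 100)]   # 11 'measured' 14C ages

def lik(r, t):
    return math.exp(-0.5 * ((r - c(t)) / sigma) ** 2)

def spd_normalised():
    """OxCal/Calib order: calibrate each date (area-correct it), then average."""
    acc = [0.0] * len(grid)
    for r in dates:
        w = [lik(r, t) for t in grid]
        s = sum(w)
        for i, x in enumerate(w):
            acc[i] += x / (s * len(dates))
    return acc

def spd_unnormalised():
    """CalPal-like order (as described above): sum on the 14C scale first,
    map through the curve, and only normalise the final histogram."""
    acc = [sum(lik(r, t) for r in dates) for t in grid]
    s = sum(acc)
    return [x / s for x in acc]

spd_ox = spd_normalised()
spd_cp = spd_unnormalised()
```

Both versions integrate to the same total, but they differ pointwise: the per-date area correction makes the first version sensitive to how widely each date's likelihood spreads along the curve, which is exactly the behavioural difference described above.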

I hope you read no bias in the above statements, since I make no attempt here to debate the relative strengths of the two approaches for estimating an underlying function (generally, palaeodemography).

Chris  