Outliers and shifting dates?

Sarah Martini

Sep 8, 2025, 10:42:03 AM
to ox...@googlegroups.com
Hello fellow OxCal users!
I have a couple of questions about the proper application of Outlier_Model() and trying to account for offsets between dates.
  1. I am unclear on where exactly in the code the Outlier_Model() should be placed. Should it be placed within a Sequence() or can it be placed before (if the same model should be applied to all parts of a code)? I tried the following code based on some supplementary information in other articles and received the warning that Outlier_Model() had not been set (see screenshot).
 Options()
 {
  Curve="SHCal20.14c";
 };
 Plot()
 {
  Outlier_Model("General", T(5), U(0,4), "t");
  Sequence("Cerro Ñañañique - Panecillo")
  {
   Boundary('Panecillo')
   {
    color = "blue";
   };
   Phase()
   {
    KDE_Plot('Start Panecillo')
    {
     R_Date('OBDY-81',2540,250)
     {
      Outlier("General", 0.05);
     };
     R_Date('OBDY-172',2420,670)
     {
      Outlier("General", 0.05);
     };
     R_Combine("Combine Sample 88-5")
     {
      Outlier("General", 0.05);
      R_Date('OBDY-NA',2220,170)
      {
       Outlier("General", 0.05);
      };
      R_Date('OBDY-564',2634,160)
      {
       Outlier("General", 0.05);
      };
     };
     R_Date('OBDY-256',2380,160)
     {
      Outlier("General", 0.05);
     };
     R_Combine("Combine Sample 88-6")
     {
      Outlier("General", 0.05);
      R_Date('OBDY-557',2640,160)
      {
       Outlier("General", 0.05);
      };
      R_Date('OBDY-853',2850,50)
      {
       Outlier("General", 0.05);
      };
     };
    };
    Interval('Duration Panecillo');
   };
   Boundary('End Panecillo')
   {
    color = "red";
   };
  };
 };
 
[Screenshot: OxCal warning that Outlier_Model() had not been set]
  2. With a different series of radiocarbon dates, I am trying to incorporate dates measured on both apatite and collagen fractions of bone into a phased model. I dated one bone using both fractions and found that the apatite fraction is 80 C14 years “younger” than the collagen fraction. Because of this, I would like to try to account for this offset within the phase model to make sure that apatite measurements aren’t pushing things earlier than they should. Is there a way to make this correction? I have tried
N("appcorr", -80, 20);
Shift(Shifted Date, Apatite date, appcorr)
This does move the apatite dates back towards what I would expect them to be, but feels slightly arbitrary.
I appreciate the help!
Sarah

--
Sarah Martini (she/her)

Erik Marsh

Sep 9, 2025, 3:39:11 PM
to OxCal
Hi Sarah,

You can place the outlier model anywhere before the first tag; I usually put it at the top with the calibration curve. I think your error might be because you mixed double and single quotes in the names of events (OxCal can sometimes handle this but double quotes are best). Anyway, this version of the code runs fine; I streamlined it a little. Phase can be substituted for KDE_Plot, which simplifies the nesting. You are only calling one outlier model, so you don't have to repeat its name in each tag.

When calling outliers and R_Combine, the outlier tag applies to all dates nested within the R_Combine, after they are combined (see this post).  If you want to apply outlier tags to each date, you have to group them with Combine instead. A chi-square test is applied anytime you use either command, and this result is what tells you if the dates combine well or not. In this case, both combinations pass the chi-square test (you will get a warning in red text if not). Sample 88-6 has a 13% chance of being an outlier, but its chi-square and agreement index are okay. A second option is to use an s-type outlier, which assumes the uncertainties have been underestimated (see the example in Bronk Ramsey 2009:1029).
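
A minimal sketch of the Combine alternative described above, assuming the same General outlier model and the dates from this thread. The phase name is only a placeholder, and this exact version should be treated as untested. Each R_Date inside a Combine carries its own Outlier tag, so the dates are assessed individually rather than after being merged:

```
Plot()
{
 Curve("SHCal20","SHCal20.14c");
 Outlier_Model("General", T(5), U(0,4), "t");
 Sequence("Cerro Ñañañique - Panecillo")
 {
  Boundary("Panecillo");
  Phase("Panecillo dates")
  {
   R_Date("OBDY-81",2540,250) { Outlier(0.05); };
   R_Date("OBDY-172",2420,670) { Outlier(0.05); };
   // Combine() works on the calibrated dates, so each one gets its own tag
   Combine("Sample 88-5")
   {
    R_Date("OBDY-NA",2220,170) { Outlier(0.05); };
    R_Date("OBDY-564",2634,160) { Outlier(0.05); };
   };
   R_Date("OBDY-256",2380,160) { Outlier(0.05); };
   Combine("Sample 88-6")
   {
    R_Date("OBDY-557",2640,160) { Outlier(0.05); };
    R_Date("OBDY-853",2850,50) { Outlier(0.05); };
   };
  };
  Boundary("End Panecillo");
 };
};
```

Compared with the R_Combine version, the only structural change is that the Outlier tags move from the combination down to the individual dates.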

Nested as below, we assume the combinations are okay (based on the chi-square tests), so the outlier tag evaluates the combined date's place in the phase. This is not especially well defined since the phase only has five widely-spaced events. I queried both Interval and Span so you can compare. Better color choices are green for starting boundaries (go) and red for ending boundaries (stop). Look at the example in Bronk Ramsey 2017:Fig 3, made with View > Plot Stacks.

As for your second question, start a new thread, since it's clearer for everyone if you only ask one question per thread.
Hope this helps, Erik


 Plot()
 {
  Curve("SHCal20","SHCal20.14c");

  Outlier_Model("General", T(5), U(0,4), "t");
  Sequence("Cerro Ñañañique - Panecillo")
  {
   Boundary("Panecillo");
   KDE_Plot("Start Panecillo")
   {
    R_Date("OBDY-81",2540,250)    {     Outlier(0.05);    };
    R_Date("OBDY-172",2420,670)    {     Outlier(0.05);    };
    R_Combine("Sample 88-5")
    {
     R_Date("OBDY-NA",2220,170);
     R_Date("OBDY-564",2634,160);
     Outlier(0.05);
    };
    R_Date("OBDY-256",2380,160)    {     Outlier(0.05);    };
    R_Combine("Sample 88-6")
    {
     R_Date("OBDY-557",2640,160);
     R_Date("OBDY-853",2850,50);
     Outlier(0.05);
    };
    Interval("Interval Panecillo");
    Span("Span Panecillo");
   };
   Boundary("End Panecillo");
  };
 };

Sarah Martini

Sep 10, 2025, 4:50:13 AM
to OxCal
Dear Erik,
Thank you for the cleaned up code and the explanation. That's really good to know about the problem with single quotation marks and certainly solved one of my issues with the code. 
I have a follow-up question about the substitution of the KDE_Plot.

I ran the code that you provided substituting the KDE_Plot for the Phase and ran a second code where I re-introduced the Phase:
 Plot()
 {
  Curve("SHCal20","SHCal20.14c");
  Outlier_Model("General", T(5), U(0,4), "t");
  Sequence("Cerro Ñañañique - Panecillo")
  {
   Boundary("Panecillo");
   Phase("CN-Pan")
   {
    KDE_Plot("Start Panecillo");
    R_Date("OBDY-81",2540,250) {Outlier(0.05); };
    R_Date("OBDY-172",2420,670) { Outlier(0.05); };
    R_Combine("Sample 88-5")
    {
     R_Date("OBDY-NA",2220,170);
     R_Date("OBDY-564",2634,160);
     Outlier(0.05);
    };
    R_Date("OBDY-256",2380,160)  {   Outlier(0.05); };
    R_Combine("Sample 88-6")
    {
     R_Date("OBDY-557",2640,160);
     R_Date("OBDY-853",2850,50);
     Outlier(0.05);
    };
    Interval("Interval Panecillo");
    Span("Span Panecillo");
   };
   Boundary("End Panecillo");
  };
 };

With the phase re-introduced, I got a very different Agreement Index for Sample 88-6 (A = 61.2) and the probability of it being an outlier jumps up to 40%. The boundaries and Interval/Span are also different (albeit only slightly). I realize that technically both models pass the agreement indices, but the change was quite notable. Do you know why there is such a difference based on this substitution? If I am considering these dates as part of a phase, should I keep the phase and have the KDE_Plot nested?

KDE substituted for phase

KDE_sub.png

Phase & KDE
PhaseAndKDE.png

Thanks again!
Sarah

Erik Marsh

Sep 11, 2025, 11:31:59 AM
to OxCal
Hi Sarah,

Those are two equivalent ways of coding that phase, but multiple runs of the same code can produce different results because of the MCMC's randomized starting points. I see that the KDE is not smooth but spiky. I think what is happening is that the MCMC algorithm is struggling to converge; there is a low chance the phase is very short (as in this case). We can see this in the bimodal shape of the Span and Interval queries: there is a very high peak for the phase lasting <100 years, followed by a smoother curve with a median closer to 400 years.

To (partially) address this, I increased the number of iterations (below) so multiple runs should give more consistent results (but the Span is still bimodal). Any way we run it, the boundaries are very imprecise, telling us the phase is poorly defined – more dates or priors (such as contextual relationships) would help. For a phase with so few dates, the error ranges are unexpectedly large, much more than I usually see, even for legacy dates. I also tried the model without the outlier tags – it resolves quickly and provides reasonable and similar results (the R_Combines pass the chi-square test). If you have no other reason to believe these are outliers, this simpler model may work for you.
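
As a rough hand check of the chi-square results mentioned above, here is the arithmetic for the two combinations. This is a sketch that assumes the test statistic takes the standard form for combining measurements with Gaussian errors, which for two dates reduces to the squared difference over the summed variances. The symbols T, r and σ are illustrative notation, not OxCal output.

```latex
\begin{align*}
T &= \frac{(r_1 - r_2)^2}{\sigma_1^2 + \sigma_2^2} \\
\text{Sample 88-5:}\quad T &= \frac{(2634 - 2220)^2}{160^2 + 170^2} = \frac{171396}{54500} \approx 3.14 \\
\text{Sample 88-6:}\quad T &= \frac{(2850 - 2640)^2}{50^2 + 160^2} = \frac{44100}{28100} \approx 1.57
\end{align*}
```

Both values fall below 3.84, the 5% critical value of chi-square with one degree of freedom, which is consistent with both combinations passing.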

Erik



Options() { kIterations=1000; };


 Plot()
 {
  Curve("SHCal20","SHCal20.14c");
  Sequence("Cerro Ñañañique - Panecillo")
  {
   Boundary("Panecillo");
   KDE_Plot("Start Panecillo")
   {
    R_Date("OBDY-81",2540,250);
    R_Date("OBDY-172",2420,670);

    R_Combine("Sample 88-5")
    {
     R_Date("OBDY-NA",2220,170);
     R_Date("OBDY-564",2634,160);
    };
    R_Date("OBDY-256",2380,160);

    R_Combine("Sample 88-6")
    {
     R_Date("OBDY-557",2640,160);
     R_Date("OBDY-853",2850,50);
    };
    Interval("Interval Panecillo");
    Span("Span Panecillo");
   };
   Boundary("End Panecillo");
  };
 };
