Additional details on KDE algorithm and/or source code

Michael Price

Mar 22, 2021, 6:01:18 PM
to OxCal
Hi folks,

Many thanks for creating such a high-quality and useful tool for working with radiocarbon dates.

I am trying to implement the KDE algorithm in R, but cannot quite deduce from the 2017 paper the full, precise implementation. In particular, it is not clear to me exactly how the Metropolis sampling is implemented.

Is the source code available? I could easily figure out the details if it is. Failing that, here is my best guess at the precise details (as pseudo code). I would greatly appreciate help, corrections, and clarifications.

I believe that the event times -- \bold{t} in the 2017 paper -- are sampled completely independently of the sampling that involves the shaping parameter, g. If desired, the sampling of t can utilize constraints (e.g., stratigraphic information). Let's say a single sample of t can be obtained by calling the function sample_t:

t = sample_t()

My guess is that sampling then involves two nested for loops, where the outer for loop is over samples of t and the inner for loop is over samples of g. Let Nt be the number of samples of t and Ng the number of samples of g for each value of t. The algorithm would be:

FOR m in 1:Nt
    t = sample_t()
    h_S = calc_silverman_bandwidth(t) # depends only on t, so computed once per outer iteration
    FOR n in 1:Ng
        g = sample_g() # uniform draw on 0 to 1
        h = g * h_S
        L = calc_predictive_likelihood(t, h)
        accept_g = metropolis_rule(Lprev, L) # I am not showing the steps to store L and g
        IF NOT accept_g:
            g = gprev
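
For concreteness, the nested-loop guess above could be fleshed out as the following runnable sketch. The thread's implementations are in R and C++; Python is used here purely for illustration, and every function body is a toy stand-in -- in particular sample_t() just draws from a normal distribution rather than from calibrated dates.

```python
import math
import random

random.seed(42)

def sample_t(n=20):
    # Toy stand-in: a real sample_t() would draw calibrated (possibly
    # constrained) event times; here it just draws from a standard normal.
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def calc_silverman_bandwidth(t):
    # Rule-of-thumb bandwidth h_S = 1.06 * sd * n^(-1/5) (toy version)
    n = len(t)
    mean = sum(t) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in t) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

def calc_predictive_log_likelihood(t, h):
    # Leave-one-out log predictive likelihood of a normal-kernel KDE,
    # using log-sum-exp so small bandwidths do not underflow to log(0).
    n = len(t)
    total = 0.0
    for i in range(n):
        logs = [-0.5 * ((t[i] - t[j]) / h) ** 2 for j in range(n) if j != i]
        m = max(logs)
        lse = m + math.log(sum(math.exp(v - m) for v in logs))
        total += lse - math.log((n - 1) * h * math.sqrt(2 * math.pi))
    return total

def run(Nt=5, Ng=50):
    g_draws = []
    for _ in range(Nt):
        t = sample_t()
        h_S = calc_silverman_bandwidth(t)   # depends only on t
        g_prev = 0.5
        logL_prev = calc_predictive_log_likelihood(t, g_prev * h_S)
        for _ in range(Ng):
            g = random.random()             # uniform proposal on (0, 1)
            logL = calc_predictive_log_likelihood(t, g * h_S)
            # Metropolis rule, computed in log space
            if math.log(random.random()) < logL - logL_prev:
                g_prev, logL_prev = g, logL
            g_draws.append(g_prev)
    return g_draws
```

This keeps the structure of the pseudocode (outer loop over t, inner Metropolis loop over g) without claiming to match OxCal's actual implementation.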

I can imagine any number of viable alternatives. For example, one could alternate sampling of t and g and use conditional likelihoods (Gibbs sampling or something like it).

While in principle the sampling approach should not matter (subject to the normal assumptions), in practice it could matter a great deal (e.g., a good sampling approach may need far fewer samples to obtain a good set of posterior draws; a bad sampling approach may take the time of the universe).

Much obliged for any help you can provide.

Mike

Christopher Ramsey

Mar 22, 2021, 7:44:13 PM
to OxCal group
Mike

Yes - happy to provide some more details on this.

Essentially you are right: we sample t_1 ... t_N and also g. In OxCal these are all just treated as parameters of the model and are not treated differently in terms of the order. The updating is randomised, so there is no particular order, and the order shifts over time.

All moves use the Metropolis-Hastings algorithm - so the probability before and after a proposed shift is calculated; if the ratio is > 1 the move is accepted, and if < 1 it is accepted with probability equal to that ratio.
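
That acceptance rule, worked in log space (to avoid the probability overflow mentioned later in this message), could be sketched as:

```python
import math
import random

def mh_accept(log_p_new, log_p_old, rng=random.random):
    """Metropolis-Hastings acceptance for a symmetric proposal.

    Accept if the probability ratio is >= 1; otherwise accept with
    probability equal to the ratio.  Working with log probabilities
    avoids overflow/underflow of the raw products."""
    log_ratio = log_p_new - log_p_old
    if log_ratio >= 0:
        return True
    return math.log(rng()) < log_ratio
```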

When updating g, equation 5 from the paper is used. When updating any of the t_1 ... t_N, equation 9 is used. Obviously the likelihoods from the radiocarbon calibration are also included.

The Silverman kernel width is calculated every time as it depends on each value of t_i, though this is probably unimportant in most applications where this would be pretty constant.
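
As a sketch, a Silverman rule-of-thumb bandwidth can be computed as below. The common 0.9 * min(sd, IQR/1.34) * n^(-1/5) form is used here; the exact constants and robustness tweaks OxCal applies are an assumption, not a statement of its code.

```python
import math
import statistics

def silverman_bandwidth(t):
    # Silverman's rule of thumb: h = 0.9 * min(sd, IQR/1.34) * n^(-1/5).
    # (The exact constant used by OxCal is an assumption here.)
    n = len(t)
    sd = statistics.stdev(t)
    q = statistics.quantiles(t, n=4)   # Q1, median, Q3
    iqr = q[2] - q[0]
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)
```

Because the rule depends on the current t values, it would indeed be re-evaluated whenever they change, as described above.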

Most moves are for individual parameters. There are also occasional group moves included to stretch/shrink or shift the whole distribution. These are only really important when the likelihoods all overlap and you are dealing with a potentially short overall group length (as is the case for how you calculate the Silverman kernel estimate).

I think you should be able to get it running with just the moves for t_i and g, randomly updated, for the types of application you have been looking at. The issue might be speed in R. The code is written in C++ and is still fairly slow for large group sizes. You will also almost certainly need to work in log space to prevent overflow of the probabilities.

You could start with the KDE_Plot routine - which really just finds g while otherwise sampling t_i from the likelihoods.

Best wishes

Christopher

Michael Price

Mar 23, 2021, 12:06:58 AM
to OxCal
Hi Christopher,

Thank you kindly for the quick and clear response. I hope your patience extends far enough to answer some additional questions :).

My first has to do with the particulars of the four trial state changes. I would like to implement the exact algorithm in OxCal, if possible, so I would like to implement those, too. With what probability is each trial state change chosen relative to the others? If there are 99 t_i's, is an update on g attempted 1/100 of the time on average (excluding the shifts and expansions/contractions)? Sticking to the four trial state changes, to guarantee detailed balance it is necessary that the proposal distribution be symmetric (well, that's not quite right: a symmetric proposal distribution is sufficient, not necessary, for detailed balance with typical sampling approaches). This is easy to maintain for the first three types of trial state changes, but requires some care for the fourth.

Let s be a rescaling factor and mu the current mean of the vector t. The proposal t' = mu + (t - mu)*s seems like what one would use for the expansion/contraction (yes? did I screw something up, or do you use a different equation?). How do you draw s to ensure that detailed balance is maintained?

My last question has to do with the trial state change for t_i. The article reads, "the acceptance of this trial move depends on the product of the KDE distribution probability given in Equation 9 for all the other events, and the likelihood associated with any direct measurements on t_i." I know that the KDE approach mixes frequentist and Bayesian reasoning, so perhaps my question is, as physicists sometimes say, not even wrong. Nevertheless, here goes: is it more accurate to say that the second component is the posterior probability of t_i given all available information aside from the KDE? Or do you envision the KDE component of the conditional density to be something like a prior for the event?

To remove the possibility of ambiguity on the preceding point, Equation 9 gives a formula for p(t_i|g,t_{-i}) (up to a constant of proportionality), where t_{-i} represents all the elements of the vector t aside from the i-th element. Suppose the only measurement that bears on t_i is a radiocarbon determination with fraction modern value phi_{m,i} (m for measurement) and associated uncertainty sigma_{m,i} (i.e., there is no stratigraphic or other information to constrain t_i further). The probability of t_i given the radiocarbon measurement is (assuming I haven't messed up any of the math; I don't know how to produce a proportional-to symbol in plain text)

p(t_i|phi_{m,i},sigma_{m,i}) is proportional to p(phi_{m,i}|t_i,sigma_{m,i}) p(t_i)

where phi_{m,i}|t_i,sigma_{m,i} is normally distributed and depends on the calibration curve in the usual way and p(t_i) is the prior on t_i (often assumed to be an improper uniform distribution on the interval -Inf to Inf, but not necessarily so). As I interpret it, the quantity that enters into the Metropolis step for t_i is:

z_i = p(t_i|g,t_{-i}) * p(phi_{m,i}|t_i,sigma_{m,i})

Do I understand correctly? Why not the following instead?

z_i = p(t_i|g,t_{-i}) * p(phi_{m,i}|t_i,sigma_{m,i}) p(t_i)
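
To make the two candidates concrete, here is a sketch of the first expression (i.e., with a flat prior on t_i) in log space. Everything here is illustrative: calib_curve is a hypothetical function mapping a calendar date to a fraction-modern value, and the leave-one-out KDE stands in for Equation 9 up to its normalising constant.

```python
import math

def log_kde_conditional(ti, t_others, h):
    """Log of the KDE density at t_i built from all the other events
    (normal kernels of width h) -- Equation 9 up to a constant."""
    n = len(t_others)
    dens = sum(
        math.exp(-0.5 * ((ti - tj) / h) ** 2) for tj in t_others
    ) / (n * h * math.sqrt(2 * math.pi))
    return math.log(dens)

def log_calibration_likelihood(ti, phi_m, sigma_m, calib_curve):
    """Normal likelihood of the measured fraction modern phi_m given the
    fraction modern predicted by the (hypothetical) calibration curve."""
    phi = calib_curve(ti)
    return (-0.5 * ((phi_m - phi) / sigma_m) ** 2
            - math.log(sigma_m * math.sqrt(2 * math.pi)))

def log_z(ti, t_others, h, phi_m, sigma_m, calib_curve):
    # log z_i = log p(t_i | g, t_{-i}) + log p(phi_{m,i} | t_i, sigma_{m,i})
    return (log_kde_conditional(ti, t_others, h)
            + log_calibration_likelihood(ti, phi_m, sigma_m, calib_curve))
```

The second candidate would simply add log p(t_i) to the return value of log_z.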

All the best,
Mike Price

Christopher Ramsey

Mar 23, 2021, 11:15:02 AM
to OxCal group
Mike

Thanks for your followup email which I will try to answer.

On the MCMC updates - the whole process runs through all parameters in a random order before repeating, so one pass includes a trial update of every parameter. This does mean, as you suggest, that g is only updated (or not) once per pass through all the other parameters. This may not be ideal - it is just the way the program is set up more generally.

Yes - detailed balance is important. For the shift moves:

All can be moved together using a trial taken as +/- 0.5 sd of the event distribution - with a uniform distribution of trial states.

For the fourth scaling move you can use a multiplier about the mean with a log multiplier set uniform between +/- ln(2). In fact this move does not seem to be necessary in the way it is for many sequence type models. If you do include it you should take into account the change in phase space associated with the move. I don't find it makes any difference (as it should not) but neither does it seem to help convergence and mixing (which it might in some circumstances). It is actually currently disabled in the code for this reason.
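
A sketch of this scaling move follows. The phase-space factor of s^(N-1) is my reading of the "change in phase space" point (the map rescales the N-1 deviations about the preserved mean), not a statement of OxCal's exact code; since log(s) is uniform and symmetric about 0, the forward and reverse multipliers are equally likely.

```python
import math
import random

def propose_rescale(t, rng=random.random):
    """Group scaling move: t' = mu + (t - mu) * s with
    log(s) ~ Unif(-ln 2, +ln 2).

    Returns the proposal and the log phase-space (Jacobian) term that
    would enter the acceptance ratio: for N points rescaled about their
    preserved mean, |dt'/dt| = s^(N-1)."""
    n = len(t)
    mu = sum(t) / n
    s = math.exp((2 * rng() - 1) * math.log(2))
    t_new = [mu + (x - mu) * s for x in t]
    log_jacobian = (n - 1) * math.log(s)
    return t_new, log_jacobian
```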

The calibration likelihood is the only other contribution, as you say. In fact, the way this is implemented in OxCal you are free to set any other constraints too - the KDE_Model is not standalone code but embedded in the general MCMC for OxCal, and the parameters can have other priors (and/or likelihoods) associated with them in principle.

Best wishes

Christopher

Michael Price

Mar 23, 2021, 1:31:55 PM
to ox...@googlegroups.com
Got it, and thanks again.

I'm not quite 100% clear on how the shift move works. The first part seems clear. To avoid ambiguity, I'll resort to equations. Let t be the current vector of times. The shifted proposal is either t' = t + 0.5 sd(t) or t' = t - 0.5 sd(t). I'm guessing the two possibilities (+/-) are chosen with 50% probability each. I'm not sure what "with a uniform distribution of trial states" means here, though, since the first part of the description seems to fully describe the proposal. I'd worry about doing anything beyond the equation immediately above since it respects detailed balance, and any modifications to it must make sure to also.

I believe I understand how the scaling move works (though I have taken note that it is currently disabled in OxCal). The equation I posited, t' = mu + (t-mu)*s, is correct, and the logarithm of s is uniformly distributed, log(s) ~ Unif(-ln(2), ln(2)). This respects detailed balance.

Regarding the overall sampling procedure with the four trial state changes (in practice, three), is one shift and one rescale attempted for each cycle through the parameters?

Can you point me to a description of the currently implemented version of the MCMC in OxCal? In particular, I need to choose a proposal distribution for the t_i update step. I suspect a normal draw around the current value is sufficient.

Lastly, can you clarify which of the following equations is correct for the factor entering the Metropolis step (adopting my previous notation and example)?

z_i = p(t_i|g,t_{-i}) * p(phi_{m,i}|t_i,sigma_{m,i})

vs

z_i = p(t_i|g,t_{-i}) * p(phi_{m,i}|t_i,sigma_{m,i}) p(t_i)

(Or possibly my question is simply not even wrong.)

I've managed to vectorize the key operations in R, and can sample relatively quickly. I will send some benchmarks once I have them. I can achieve further speed-up by running separate sampling chains across multiple cores. I suspect further improvements in performance might be achieved by modifying the sampling procedure (e.g., using Hamiltonian MCMC, though that would probably require analytically calculating some derivatives). Also, I suspect that recursion can be used to speed up the kernel calculation, though I have yet to derive an exact equation for it. I will report back if I do. It would require raising a matrix to the power of the ratio of the Silverman bandwidths, which I worry could lead to some numerical instability, but it would, I think, be faster than calculating the log-normal density each time for every pair of events (well, excluding the diagonal since j cannot equal i).
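
The pairwise kernel evaluation described here vectorises naturally; a NumPy analogue of the R vectorisation (illustrative only, not the code I am actually running) is:

```python
import numpy as np

def loo_log_predictive(t, h):
    """Leave-one-out log predictive likelihood of a normal-kernel KDE,
    vectorised over all pairs; the diagonal j == i is excluded."""
    t = np.asarray(t, dtype=float)
    n = t.size
    diff = t[:, None] - t[None, :]                 # all pairwise differences
    K = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(K, 0.0)                       # exclude j == i
    dens = K.sum(axis=1) / (n - 1)                 # LOO density at each t_i
    return float(np.log(dens).sum())
```

This replaces the double loop over events with a single n-by-n array operation, which is where most of the speed-up comes from.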

If I looked over the algorithm source code, I could answer all these questions myself in short order. Is it available somewhere?

Mike


Christopher Ramsey

Mar 23, 2021, 2:21:37 PM
to OxCal group
Mike

Yes, sorry - the shift proposal is drawn from a Unif distribution over [-0.5, +0.5] sd(t), so it allows for smaller increments too. This shift proposal is used.
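
A sketch of this shift move as described (the proposal is symmetric, since the reverse shift -delta is equally likely, so no Hastings correction is needed):

```python
import math
import random

def propose_shift(t, rng=random.random):
    """Group shift move: every t_i moved by the same delta, with
    delta ~ Unif(-0.5 sd(t), +0.5 sd(t))."""
    n = len(t)
    mean = sum(t) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in t) / (n - 1))
    delta = (rng() - 0.5) * sd
    return [x + delta for x in t]
```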

It is the scaling move that is not used and I have checked it was never included in the released version of OxCal, probably because I was not 100% sure how to deal with the phase space effects given the model is not purely Bayesian. The paper was written ahead of the release and the code is there but disabled.

There is one shift proposal per cycle.

For the proposal distribution for t_i - a normal draw would be sensible and indeed I have been intending to implement this as it would probably be more efficient. It currently uses a Unif over the range covered by the likelihood - so it actually behaves rather like a Gibbs sampler (which was the original coding). Where there are constraints (not generally used here) they limit the Unif. Detailed balance is ensured this way and it allows for larger shifts but is probably slower than a draw from a Normal. This should not affect the results of the MCMC in theory but it might affect the speed.
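
A sketch of this uniform independence proposal for t_i follows. Here lo and hi are hypothetical bounds of the range covered by the calibrated likelihood, and the current value is assumed to lie inside them; because the proposal density is the same constant for the forward and reverse moves, the Hastings ratio reduces to the ratio of target probabilities.

```python
import math
import random

def update_ti_uniform(ti, log_target, lo, hi, rng=random.random):
    """One MH update of t_i using an independence proposal: a uniform
    draw over (lo, hi).  log_target is the log of the (unnormalised)
    conditional probability of t_i."""
    trial = lo + (hi - lo) * rng()
    log_ratio = log_target(trial) - log_target(ti)
    if log_ratio >= 0 or math.log(rng()) < log_ratio:
        return trial
    return ti
```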

For the use of the calibration steps - I would look at equation 9 and the derivation leading to it at:

http://dx.doi.org/10.1017/s0033822200033865

The source code is not currently available - but I do hope to get it into a form where it could be - which will require more explanatory documentation to be useful. The part that deals with the MCMC is about 7000 lines of code and uses routines from elsewhere - there are nearly 40 files, some in plain C and some in C++ mostly because it has been written over several decades as programming ideas have changed.

Best wishes

Christopher
