Hi,
First of all, thanks for making Duke freely available. I think it will prove very valuable for what I'm doing.
I have a question about the workings of the Bayesian algorithm. I'm looking at the
"How it works" page and trying to follow the steps, but it seems I'm missing something.
To make a simple example in Bayes' terms, we have the following assumptions:
event A: entities are equal (match)
event B: property (e.g. address, name) of two entries is equal (or is event B actually that the entries have a certain similarity score, calculated by a comparator function?)
we assume: P(A) = 0.5 without prior knowledge. This seems arbitrary but I get that you have to start somewhere.
we assert: P(A | B) = p_high, P(A | not B) = p_low
For example, in the first step on the "How it works" page we have p_high = 0.65 and p_low = 0.25 for the address comparison.
The page then tells us that the similarity between the two given addresses is 0.867, and that this updates the probability to 0.6127.
This is where I am getting lost. How do you get the new probability, starting from 0.5? And how does a similarity score between two properties translate to P(B), the probability that the properties are actually equal?
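To make the question concrete, here is a small Python sketch of the two readings I could come up with. All function names are my own and hypothetical; none of this is taken from Duke's code or documentation. The first reading treats the similarity score as P(B) and applies the law of total probability, which gives roughly 0.597 rather than 0.6127. The second is purely a guess: it assumes the similarity scales the distance between 0.5 and p_high quadratically (falling back to p_low below a similarity of 0.5) and then combines that value with the prior using the usual Bayes combination of two probabilities. That guess lands within 0.0001 of the page's 0.6127, but I don't know whether it is what Duke actually does.

```python
def total_probability(p_high, p_low, p_b):
    """Reading 1: treat the similarity as P(B).

    P(A) = P(A|B) * P(B) + P(A|not B) * (1 - P(B))
    """
    return p_high * p_b + p_low * (1.0 - p_b)


def scaled_probability(p_high, p_low, sim):
    """Reading 2 (a guess, not confirmed by the page).

    Map the similarity onto a probability between 0.5 and p_high,
    and fall back to p_low when the similarity is below 0.5.
    """
    if sim >= 0.5:
        return (p_high - 0.5) * sim * sim + 0.5
    return p_low


def combine(p1, p2):
    """Combine two probability estimates, assuming they are independent."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


# Values quoted from the first step on the "How it works" page.
prior = 0.5
p_high, p_low, sim = 0.65, 0.25, 0.867

reading_1 = total_probability(p_high, p_low, sim)
reading_2 = combine(prior, scaled_probability(p_high, p_low, sim))

print(f"reading 1: {reading_1:.4f}")
print(f"reading 2: {reading_2:.4f}")
```

Note that with a prior of exactly 0.5, `combine` simply returns the second value, so the starting point only matters from the second property onwards.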
I would be glad for any explanation, or a link to a more detailed description of what the algorithm is doing.