Doing the calculation to combine probabilities?


Andrew Leer

Oct 26, 2012, 9:55:03 AM
to duke-...@googlegroups.com
I'm reading the Larsblog post "Bayesian Identity Resolution". Until now I had been struggling to figure out how the high and low probabilities from different fields combine to give the final probability that two records match, but it's beginning to make a little more sense.

I came to the part of the blog post that reads:

So here you see immediately a benefit of the approach: let's say two organizations have the same name, but different organization numbers. That gives us 0.9 and 0.1 probability, which combines to 0.5.
From what I can tell from my humble understanding of the source code of Duke, it looks like the two probabilities are combined using this source:

/**
 * Combines two probabilities using Bayes' theorem.
 */
public static double computeBayes(double prob1, double prob2) {
  return (prob1 * prob2) /
    ((prob1 * prob2) + ((1.0 - prob1) * (1.0 - prob2)));
}

But when I plug the probabilities from the blog post (prob1 = 0.1 and prob2 = 0.9) into that equation, I get 0.18, not the 0.5 stated in the blog post.

Am I looking at the wrong source for combining them? I'm trying to understand this well enough to tune a Duke configuration file.

Thank you,
    Andrew J. Leer

Andrew Leer

Oct 26, 2012, 10:03:38 AM
to duke-...@googlegroups.com
Oh, never mind, I forgot to divide: 0.18 is only the denominator, and 0.09 / 0.18 = 0.5.
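
For anyone who hits the same confusion, here is a minimal sketch that works through the arithmetic step by step. The class name BayesCombineDemo and the intermediate variable names are made up for illustration; only the computeBayes formula is taken from the snippet quoted in the first message.

```java
// Hypothetical demo class; only the formula inside computeBayes comes from the quoted snippet.
class BayesCombineDemo {

    /** Combines two probabilities using the same formula as the quoted computeBayes. */
    static double computeBayes(double prob1, double prob2) {
        double numerator = prob1 * prob2;
        double denominator = numerator + (1.0 - prob1) * (1.0 - prob2);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double prob1 = 0.1; // different organization numbers
        double prob2 = 0.9; // same name

        // Numerator: both fields say "match".
        double numerator = prob1 * prob2;                    // about 0.09
        // Second term of the denominator: both fields say "no match".
        double complement = (1.0 - prob1) * (1.0 - prob2);   // about 0.09
        // Stopping here gives the 0.18 from the question.
        double denominator = numerator + complement;         // about 0.18

        System.out.println("numerator   = " + numerator);
        System.out.println("denominator = " + denominator);
        // The final division is the step that was skipped.
        System.out.println("combined    = " + computeBayes(prob1, prob2)); // about 0.5
    }
}
```

One property of this formula that may help when tuning: a probability of 0.5 is neutral, since computeBayes(p, 0.5) works out to p, so a field whose result is 0.5 leaves the combined value unchanged.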