CF formula breakthrough! (hopefully)

Bruce Frederiksen
Mar 27, 2011, 1:06:30 AM
to Naimath developers
Hi Everybody,

I just finished writing the CF (certainty factor) formula objects for the new ZODB-based Naimath engine. I'm pretty excited because I've figured out a way to generalize the Mycin formula to make it much more flexible.

Here's the idea!

The original Mycin formula for combining two positive CFs, A and B, is A + B - A*B. This causes the first match to contribute more than subsequent matches: if you look at how much the overall CF increases as each new match is added, the increase tapers off, or flattens out.
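A minimal Python sketch of the tapering effect, assuming CFs are folded in one match at a time starting from 0. The function names and the CF value 0.3 are illustrative, not taken from the Naimath code.

```python
def mycin_combine(a: float, b: float) -> float:
    """Original Mycin rule for two positive CFs: A + B - A*B."""
    return a + b - a * b


def running_totals(cfs, combine):
    """Overall CF after each successive match, folding left to right from 0."""
    totals = []
    total = 0.0
    for cf in cfs:
        total = combine(total, cf)
        totals.append(total)
    return totals


# Five matches, each with an illustrative CF of 0.3.
totals = running_totals([0.3] * 5, mycin_combine)
# Increase contributed by each match: each one adds less than the last.
increments = [t - p for p, t in zip([0.0] + totals, totals)]
for t, inc in zip(totals, increments):
    print(f"overall={t:.4f}  increase={inc:.4f}")
```

With these numbers each match adds 70% of what the previous one added, so the overall CF approaches 1 but never reaches it.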

Contrast this with our original CF formula, A + B. Here, the overall CF keeps growing linearly as new matches are added.

One might also imagine a situation where you would want the overall CF to grow more and more rapidly as new matches are added.  This could be done using the formula: A + B + A*B.
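The same kind of sketch for the two alternatives (hypothetical names, illustrative CFs): with A + B every match adds the same amount, while with A + B + A*B each match adds more than the one before. Both totals can climb past 1, which is why these formulas need a divisor.

```python
def additive_combine(a: float, b: float) -> float:
    """Original Naimath formula: A + B."""
    return a + b


def accelerating_combine(a: float, b: float) -> float:
    """Formula whose growth speeds up with each match: A + B + A*B."""
    return a + b + a * b


def increments(cfs, combine):
    """Increase in the overall CF contributed by each successive match."""
    total, incs = 0.0, []
    for cf in cfs:
        new_total = combine(total, cf)
        incs.append(new_total - total)
        total = new_total
    return incs


cfs = [0.3] * 4  # illustrative: four matches, each with CF 0.3
add_incs = increments(cfs, additive_combine)
acc_incs = increments(cfs, accelerating_combine)
print("A + B      :", [round(x, 4) for x in add_incs])
print("A + B + A*B:", [round(x, 4) for x in acc_incs])
```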

So I've generalized all of these into a single formula, A + B + K*A*B, where K is a constant set for that formula (i.e., in that one rule). Selecting K = -1 gives the Mycin formula, K = 0 gives our original formula, and K = +1 gives the formula that grows more and more rapidly. Values of K between -1 and 0, or between 0 and 1, give intermediate levels of aggressiveness in how quickly the curve flattens out or steepens.
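A sketch of the generalized rule in Python, with hypothetical function names (not the Naimath API). It relies on a standard identity: 1 + K*f(A, B) = (1 + K*A)(1 + K*B). That identity means the combination is associative and commutative, so the order in which matches arrive doesn't change the result, and any number of CFs can also be combined in closed form.

```python
def combine(a: float, b: float, k: float) -> float:
    """Generalized CF combination: A + B + K*A*B."""
    return a + b + k * a * b


def combine_all(cfs, k: float) -> float:
    """Fold combine() over any number of CFs, starting from 0."""
    total = 0.0
    for cf in cfs:
        total = combine(total, cf, k)
    return total


def combine_all_closed_form(cfs, k: float) -> float:
    """Same result without folding, using 1 + K*f(A, B) = (1 + K*A)(1 + K*B)."""
    if k == 0:
        return float(sum(cfs))
    product = 1.0
    for cf in cfs:
        product *= 1 + k * cf
    return (product - 1) / k


cfs = [0.4, 0.3, 0.2]  # illustrative CFs
for k in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"K={k:+.1f}  combined={combine_all(cfs, k):.4f}")
```

With these CFs, K = -1 gives 0.664 (Mycin), K = 0 gives 0.9 (plain sum), and K = +1 gives 1.184.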

All of this is made possible by an idea James had a long time ago: add up the max CF values of all the questions to get a final divisor. Dividing by it keeps the final CF value from exceeding 1, which is required whenever K > -1.
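A sketch of the divisor under one stated assumption. The post describes adding up the max CFs, which is exactly right for K = 0. For K > 0, though, a plain sum is too small, because the cross terms push the combined max above the sum. So this sketch assumes the divisor is the max CFs combined with the same K. For positive CFs in [0, 1] and K > -1, the formula is increasing in each argument, so the normalized value stays at or below 1. The names and numbers are hypothetical.

```python
def combine_all(cfs, k: float) -> float:
    """Fold A + B + K*A*B over the CFs, starting from 0."""
    total = 0.0
    for cf in cfs:
        total = total + cf + k * total * cf
    return total


def normalized_cf(matched_cfs, max_cfs, k: float) -> float:
    """Overall CF divided by the best possible overall CF for the rule.

    Assumption: the divisor is the max CFs combined with the same K;
    for K = 0 this is exactly the plain sum of the max CFs.
    """
    divisor = combine_all(max_cfs, k)
    return combine_all(matched_cfs, k) / divisor


max_cfs = [0.6, 0.7, 0.5]  # illustrative max CF for each question
print(normalized_cf([0.6, 0.7], max_cfs, k=0.0))  # two of three answered
print(normalized_cf(max_cfs, max_cfs, k=1.0))     # every answer at its max
```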

Another thing that has been bothering me is that the aggressiveness of the Mycin curve also depends on the individual CF numbers chosen: higher CF numbers create a more aggressive curve, while lower CF numbers create a less aggressive one. I've also been bothered that, with our original CF formula, an individual CF value doesn't show up as-is when it belongs to the only question answered correctly. Because the total of the CF values is generally > 1, if you tag an answer with CF 0.6, you'll get much less than 0.6 when that's the only "right" answer.
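Some illustrative arithmetic for both complaints (the CF values are made up for the example). Under Mycin, a second equal match adds cf*(1 - cf). That is only 20% of its own CF when the CF is 0.8, but 80% when it is 0.2. Under the original additive formula, a lone 0.6 is divided by the total of the max CFs.

```python
def mycin_combine(a: float, b: float) -> float:
    """Original Mycin rule for two positive CFs: A + B - A*B."""
    return a + b - a * b


# How much does a second, equal match add under Mycin?
for cf in (0.8, 0.2):
    gain = mycin_combine(cf, cf) - cf
    print(f"CF {cf}: second match adds {gain:.2f} ({gain / cf:.0%} of its own CF)")

# Original additive formula: a lone correct answer is diluted by the divisor.
max_cfs = [0.6, 0.7, 0.5]  # illustrative max CFs; their total is 1.8
lone_answer = 0.6 / sum(max_cfs)
print(f"lone 0.6 answer contributes {lone_answer:.2f}")
```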

I had thought a bit about implementing a scaling function, so that doctors could enter an initial set of CF values that felt right to them, and then scale them by the total CF value to get the actual CF values contributed to the CF for the diagnosis. And guess what?! When this scaling function is applied to rules with a K value other than 0, it also scales the K value. So scaling the individual CF values also shows what impact the choice of smaller or larger CF values had on the shape of the curve. It all comes out quite nicely and seems to make a lot of sense.
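One way to see why the scaling carries over to K, assuming "scaling" means multiplying every CF in a rule by one positive factor $s$ (for example $s = 1/T$ for a total $T$). This is a reconstruction, not taken from the Naimath code. $F_K$ is the generalized formula folded over $n$ CFs.

```latex
\begin{aligned}
f_K(a, b) &= a + b + K\,ab \\
s\,f_K(a, b) &= (sa) + (sb) + \frac{K}{s}\,(sa)(sb) = f_{K/s}(sa,\; sb) \\
F_K(a_1, \dots, a_n) &= \frac{\prod_{i=1}^{n} (1 + K a_i) - 1}{K}, \qquad K \neq 0 \\
s\,F_K(a_1, \dots, a_n) &= F_{K/s}(s a_1, \dots, s a_n)
\end{aligned}
```

With $s = 1/T$, the rule's $K$ becomes $K T$. For example, a Mycin rule ($K = -1$) whose max CFs total 1.8 behaves like $K = -1.8$ on the scaled values. So the extra aggressiveness that came from large raw CFs shows up explicitly in the scaled $K$.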

In addition to the modified formula, I also have a simple "max" formula, which can also be scaled.
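A small sketch of why the "max" formula scales cleanly: multiplying every CF by the same positive factor multiplies the maximum by that factor, so there is no K-like parameter to adjust. The names, the scale factor, and the choice of 0.0 for a rule with no CFs are illustrative assumptions.

```python
def max_combine(cfs) -> float:
    """'max' formula: the overall CF is the largest individual CF (0.0 if none)."""
    return max(cfs, default=0.0)


def scale(cfs, s: float):
    """Multiply every CF by the same positive factor s."""
    return [s * cf for cf in cfs]


cfs = [0.6, 0.25, 0.4]  # illustrative CFs
s = 1 / 1.25            # illustrative scale factor
print(max_combine(scale(cfs, s)), s * max_combine(cfs))
```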

So now, you can combine these different kinds of formulas in any way imaginable, and scale the whole mess when you're done.
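A sketch of what combining and scaling "the whole mess" could look like, using hypothetical node classes (not Naimath's actual ZODB formula objects). Scaling walks the tree and multiplies each leaf CF by s, leaves "max" nodes alone, and divides each combine node's K by s. By the identity s·f_K(a, b) = f_{K/s}(sa, sb), the whole tree's result is then exactly s times the original.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """A single answer's CF (hypothetical node type)."""
    cf: float


@dataclass
class Combine:
    """A + B + K*A*B folded over the children's values (hypothetical)."""
    k: float
    children: List["Node"]


@dataclass
class Max:
    """Largest of the children's values (hypothetical)."""
    children: List["Node"]


Node = Union[Leaf, Combine, Max]


def evaluate(node: Node) -> float:
    """Compute the overall CF of a formula tree."""
    if isinstance(node, Leaf):
        return node.cf
    values = [evaluate(child) for child in node.children]
    if isinstance(node, Max):
        return max(values, default=0.0)
    total = 0.0
    for v in values:
        total = total + v + node.k * total * v
    return total


def scale(node: Node, s: float) -> Node:
    """Scale every CF by s > 0, dividing each K by s so the result scales by s."""
    if isinstance(node, Leaf):
        return Leaf(node.cf * s)
    children = [scale(child, s) for child in node.children]
    if isinstance(node, Max):
        return Max(children)
    return Combine(node.k / s, children)


# Illustrative rule: a Mycin-style combination of a leaf, a max, and a K=0.5 group.
rule = Combine(-1.0, [
    Leaf(0.6),
    Max([Leaf(0.3), Leaf(0.5)]),
    Combine(0.5, [Leaf(0.2), Leaf(0.4)]),
])
s = 0.5  # illustrative scale factor
print(evaluate(rule), evaluate(scale(rule, s)))
```

Here the rule evaluates to 0.928 and the scaled rule to 0.464. Inside the scaled tree, the outer K has become -2 and the inner K has become 1.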

All of this code is checked into SourceForge! The next question is: how do we play with this stuff?

-Bruce