Duke Handling of Missing Values


Atif Khan

Feb 24, 2017, 8:33:08 PM
to duke

Consider the following dataset (ID, first name, last name):
1,john,doe
2,john,
3,john,watson

For matching, I assume both attributes (first name and last name) are equally important, so both use the Exact Comparator with high=0.999 and low=0.001.

My expectation is:
#1: 1-match-1: a match score of ~1
#2: 1-match-2: a match score between 0.5 and 1, but noticeably lower than #1
#3: 1-match-3: a match score of ~0.5 (only one of the two attributes matches)


I get the following scores:
#1: 1-match-1: Overall: 0.999998997998
#2: 1-match-2: Overall: 0.999
#3: 1-match-3: Overall: 0.4999999999999998
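These numbers can be reproduced with a small Python sketch, assuming Duke combines the per-property probabilities with Bayes' theorem starting from a neutral 0.5, and skips any property where either value is empty. The helper names (`combine`, `exact_prob`, `score`) are illustrative only, not Duke's API.

```python
# Assumed model: Bayesian combination of per-property probabilities,
# starting from 0.5, with missing values skipped. Not Duke's actual code.
HIGH, LOW = 0.999, 0.001


def combine(p1: float, p2: float) -> float:
    """Bayesian combination of two independent match probabilities."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def exact_prob(a: str, b: str):
    """Exact comparison: HIGH if equal, LOW if not, None if a value is missing."""
    if not a or not b:
        return None
    return HIGH if a == b else LOW


def score(r1, r2) -> float:
    prob = 0.5  # neutral starting point
    for a, b in zip(r1, r2):
        p = exact_prob(a, b)
        if p is None:
            continue  # missing value: property is ignored entirely
        prob = combine(prob, p)
    return prob


# (first name, last name); the ID column is not compared
records = {1: ("john", "doe"), 2: ("john", ""), 3: ("john", "watson")}

for other in (1, 2, 3):
    print(f"1-match-{other}: {score(records[1], records[other])}")
```

Because the last name of record 2 is skipped, #2 is scored on the first name alone and collapses to exactly the `high` value, which is why it sits so close to #1.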



Notice how close the scores are for #1 and #2. I understand that Duke ignores missing values. However, if I wanted to process missing values, what would be the best course of action.

I would like to achieve something like the following:
#1: 1-match-1: Overall: 0.999998997998
#2: 1-match-2: Overall: 0.75
#3: 1-match-3: Overall: 0.4999999999999998
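One way to get a score like 0.75 for #2, sketched under the same assumed Bayesian model, is to give a missing value its own probability instead of skipping the property. `MISSING_PROB` is a hypothetical parameter, not a Duke setting; here it is solved from the target score by rearranging the combination formula. Records #1 and #3 have no missing values, so their scores do not change.

```python
# Assumed model as before, plus a hypothetical probability for missing values.
HIGH, LOW = 0.999, 0.001


def combine(p1: float, p2: float) -> float:
    """Bayesian combination of two independent match probabilities."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def solve_missing_prob(known: float, target: float) -> float:
    """Return m such that combine(known, m) == target.

    Rearranging k*m / (k*m + (1-k)*(1-m)) = t gives
    m = t*(1-k) / (k*(1-t) + t*(1-k)).
    """
    return target * (1 - known) / (known * (1 - target) + target * (1 - known))


# Hypothetical: chosen so that one matching property plus one missing value gives 0.75.
MISSING_PROB = solve_missing_prob(HIGH, 0.75)


def exact_prob(a: str, b: str) -> float:
    if not a or not b:
        return MISSING_PROB  # missing value now counts as weak evidence against a match
    return HIGH if a == b else LOW


def score(r1, r2) -> float:
    prob = 0.5  # neutral starting point
    for a, b in zip(r1, r2):
        prob = combine(prob, exact_prob(a, b))
    return prob


records = {1: ("john", "doe"), 2: ("john", ""), 3: ("john", "watson")}

print(f"MISSING_PROB = {MISSING_PROB:.6f}")
for other in (1, 2, 3):
    print(f"1-match-{other}: {score(records[1], records[other])}")
```

In this model a probability of 0.5 is equivalent to ignoring the value, and anything below 0.5 pulls the score down. The value needed here (about 0.003) is close to `low`, because `high=0.999` is very strong evidence; a "halfway" score of 0.75 does not fall out naturally from Bayesian combination.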
