Duke Handling of Missing Values #241

Open
atifijazkhan opened this issue Feb 24, 2017 · 0 comments

@atifijazkhan

Consider the following dataset:
1,john,doe
2,john,
3,john,watson

For matching purposes, I assume that both name attributes are equally important, so both are configured with high=0.999 and low=0.001 and use the ExactComparator.

My expectation is:
#1: 1-match-1: a match score of ~1
#2: 1-match-2: a match score somewhere between 0.5 and 1, but much lower than #1
#3: 1-match-3: a match score of ~0.5 (as only one of the two attributes matches)

I get the following scores:
#1: 1-match-1: Overall: 0.999998997998
#2: 1-match-2: Overall: 0.999
#3: 1-match-3: Overall: 0.4999999999999998
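These numbers are consistent with Duke combining per-property probabilities with a naive Bayes formula, starting from a neutral 0.5: a property contributes `high` when the exact comparison succeeds, `low` when it fails, and nothing at all when either value is missing. The sketch below reproduces the three scores under that assumption. `bayes`, `score` and the property names `firstname`/`lastname` are hypothetical illustrations, not Duke's Java API.

```python
# Minimal sketch of Duke-style scoring. Assumptions: naive Bayes combination,
# neutral starting probability 0.5, missing values skipped. Not Duke's API.

def bayes(p1: float, p2: float) -> float:
    """Combine two independent match probabilities (naive Bayes)."""
    return (p1 * p2) / (p1 * p2 + (1 - p1) * (1 - p2))

# Hypothetical property names; (high, low) as in the configuration above.
PROPS = {"firstname": (0.999, 0.001), "lastname": (0.999, 0.001)}

def score(a: dict, b: dict) -> float:
    prob = 0.5  # neutral starting point
    for name, (high, low) in PROPS.items():
        va, vb = a.get(name), b.get(name)
        if not va or not vb:
            continue  # missing value: the property is ignored entirely
        # Exact comparison: high on equality, low otherwise.
        prob = bayes(prob, high if va == vb else low)
    return prob

r1 = {"firstname": "john", "lastname": "doe"}
r2 = {"firstname": "john", "lastname": ""}
r3 = {"firstname": "john", "lastname": "watson"}

print(score(r1, r1))  # ~0.999998997998
print(score(r1, r2))  # ~0.999
print(score(r1, r3))  # ~0.5
```

Case #2 shows why the scores are so close: the empty last name never enters the product, so the result is just the first name's 0.999.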

Notice how close the scores are for #1 and #2. I understand that Duke ignores missing values. However, if I wanted to take missing values into account, what would be the best course of action?

I would like to achieve something like the following:
#1: 1-match-1: Overall: 0.999998997998
#2: 1-match-2: Overall: 0.75
#3: 1-match-3: Overall: 0.4999999999999998
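One possible approach, sketched here as plain naive Bayes arithmetic rather than as a Duke configuration (the issue does not show a built-in Duke option for this), is to stop skipping a property with a missing value and instead feed a fixed "missing" probability into the combination. Working in odds makes the target easy to hit: 0.999 has odds 999:1 and 0.75 has odds 3:1, so the missing-value term needs odds 1:333, i.e. a probability of 1/334 ≈ 0.003. A value of 0.5 is neutral (it reproduces the current behaviour), anything below 0.5 penalises a missing value. `MISSING_PROB`, `score` and the property names are hypothetical.

```python
# Sketch: treat a missing value as weak evidence against a match instead of
# ignoring it. MISSING_PROB is a hypothetical tuning parameter, not a Duke setting.

def bayes(p1: float, p2: float) -> float:
    """Naive Bayes combination of two match probabilities."""
    return (p1 * p2) / (p1 * p2 + (1 - p1) * (1 - p2))

PROPS = {"firstname": (0.999, 0.001), "lastname": (0.999, 0.001)}

# Odds 1:333 turns 0.999 (odds 999:1) into 0.75 (odds 3:1).
# Passing 0.5 instead reproduces the "ignore missing values" behaviour.
MISSING_PROB = 1 / 334

def score(a: dict, b: dict, missing_prob: float = MISSING_PROB) -> float:
    prob = 0.5  # neutral starting point
    for name, (high, low) in PROPS.items():
        va, vb = a.get(name), b.get(name)
        if not va or not vb:
            prob = bayes(prob, missing_prob)  # penalise instead of skipping
        else:
            prob = bayes(prob, high if va == vb else low)
    return prob

r1 = {"firstname": "john", "lastname": "doe"}
r2 = {"firstname": "john", "lastname": ""}
r3 = {"firstname": "john", "lastname": "watson"}

print(score(r1, r1))  # ~0.999998997998 (unchanged)
print(score(r1, r2))  # ~0.75
print(score(r1, r3))  # ~0.5 (unchanged)
```

The penalty of 1/334 is tuned to high=0.999; with other `high` values the same penalty gives a different score for #2. A single global value also treats every missing property as the same amount of evidence against a match, so a per-property value may fit real data better.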
