Deduplication and record linkage #259

Open
xinelim opened this issue Oct 31, 2018 · 1 comment

Comments

xinelim commented Oct 31, 2018

Hi,
Given two datasets, is it possible to deduplicate each one and then perform record linkage across the two? Please advise.

uderline commented Nov 1, 2018

Hi!

You will need to do this step by step: first deduplicate each dataset individually (writing the result to a new file, for example), then link the two deduplicated datasets. There is no way to do both at the same time.

At one point I tried something similar from Python, collecting the matches/links from the console output of the command java no.priv.garshol.duke.Duke .... config.xml. It was a waste of time. You should go directly with Java and use the MatchListener classes, writing your own if you need to.
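To make the two-step flow concrete, here is a minimal, library-independent sketch in plain Java. It does not use Duke's API: the `DedupThenLink` class, the `LinkListener` interface, and the `key` normalisation are all hypothetical names invented for illustration, and exact matching on a normalised string stands in for the configurable comparison a real record-linkage engine would do. It only shows the order of operations (deduplicate each dataset, then link the cleaned results) and the idea of receiving matches through a callback instead of parsing console output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DedupThenLink {

    /** Hypothetical callback for reporting linked pairs; NOT Duke's MatchListener. */
    interface LinkListener {
        void matches(String left, String right);
    }

    /** Hypothetical normalisation: trim, lowercase, collapse runs of whitespace. */
    static String key(String value) {
        return value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Step 1: keep the first record seen for each normalised key. */
    static List<String> deduplicate(List<String> records) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String record : records) {
            firstByKey.putIfAbsent(key(record), record);
        }
        return new ArrayList<>(firstByKey.values());
    }

    /** Step 2: link two already-deduplicated datasets, reporting each pair to the listener. */
    static void link(List<String> left, List<String> right, LinkListener listener) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String l : left) {
            index.put(key(l), l);
        }
        for (String r : right) {
            String l = index.get(key(r));
            if (l != null) {
                listener.matches(l, r);
            }
        }
    }

    public static void main(String[] args) {
        // Deduplicate each dataset on its own first...
        List<String> a = deduplicate(Arrays.asList("Ann Lee", "ann  lee", "Bob Ray"));
        List<String> b = deduplicate(Arrays.asList("BOB RAY", "Cy Doe", "cy doe"));
        // ...then link the cleaned datasets and handle matches in a callback.
        link(a, b, (l, r) -> System.out.println(l + " <-> " + r));
    }
}
```

With Duke itself, the equivalent would be two separate deduplication runs followed by a linkage run over the cleaned outputs, with your own listener taking the place of the lambda above.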
