usage of duke-es for record linkage


Nathalie C

Sep 29, 2015, 10:21:25 AM
to duke
Hi,

I would like to use Duke and Elasticsearch for record linkage. I tried Yann's entity-resolution plugin, and it can do what I want, but many of my values are null and the plugin does not handle them correctly.

Then I discovered duke-es, and I wonder whether I could use it instead. I want to use it as a data source (and also as the database, if possible), and as far as I understand from https://github.com/larsga/Duke/pull/193, that should be possible.

I receive the data through a REST interface. I would then run Duke only on these new records, comparing them against what is already in the database, and finally add the records to Elasticsearch (or add them beforehand, if that makes things easier).
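
To make the workflow concrete, here is a rough sketch of what I have in mind. It is plain Python, not Duke or duke-es code: the in-memory list stands in for the Elasticsearch index, and `similarity`, `link_batch`, the field names, and the 0.85 threshold are all names and values I made up for illustration. The null handling shown (skip a field when either value is missing) is the behaviour I am hoping for, not something I know either tool does.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the records already stored in Elasticsearch.
existing = [
    {"id": "1", "name": "Acme Corporation", "city": "Oslo"},
    {"id": "2", "name": "Globex", "city": None},
]


def similarity(a, b):
    """Mean string similarity over the fields where both values are present."""
    scores = []
    for field in ("name", "city"):
        x, y = a.get(field), b.get(field)
        if x is None or y is None:
            continue  # a null value is treated as no evidence either way
        scores.append(SequenceMatcher(None, x.lower(), y.lower()).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def link_batch(new_records, store, threshold=0.85):
    """Compare each new record against the store only, then add the batch."""
    links = []
    for rec in new_records:
        for old in store:
            if similarity(rec, old) >= threshold:
                links.append((rec["id"], old["id"]))
    # New records are stored only after matching, so they are never
    # compared with each other within the same batch.
    store.extend(new_records)
    return links


# A batch as it might arrive over the REST interface (made-up data).
incoming = [
    {"id": "3", "name": "ACME Corporation", "city": None},
    {"id": "4", "name": "Initech", "city": "Bergen"},
]
links = link_batch(incoming, existing)
```

In this sketch, record 3 is linked to record 1 on the name alone because its city is null, and both new records end up in the store afterwards.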

Is this possible using duke-es?

Could you provide a sample XML configuration for Elasticsearch? I could not find out how to add cleaners. An example of how to use the API for this specific task would also be useful.

Best regards




