How can I achieve incremental indexing in Duke dedup?

Vamshi

Dec 21, 2015, 1:34:25 AM
to duke
Hi ER Experts,

We are using Duke for entity resolution (ER) in record linkage mode, with Oracle and PostgreSQL as data sources.
We use no.priv.garshol.duke.databases.LuceneDatabase as the database, so the records are stored in a Lucene index and compared against it.
The linked records are then indexed into Solr for searching. I run Duke from the command line, and the whole application works fine.
The only problem is with incremental database records. When I add new records to the database and run Duke again, it rebuilds the index
of the entire first data source, overwriting the old one. How can I index only the newly added records?

Please suggest how I can overcome this problem.

Thanks & Regards,
Vamshi

Lars Marius Garshol

Dec 27, 2015, 5:40:20 AM
to duke

* Vamshi

The only problem is with incremental database records. When I add new records to the database and run Duke again, it rebuilds the index of the entire first data source, overwriting the old one. How can I index only the newly added records?

If you use the "--noreindex" option Duke will not overwrite the existing Lucene index, and instead just add the new records. That should give you the behaviour you're looking for.
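For a scheduled job, the choice between a full run and an incremental run could be wrapped in a small script. What follows is only a rough sketch under several assumptions: the main class is taken to be no.priv.garshol.duke.Duke, the classpath and config file names are placeholders, and `index_dir` is assumed to be the directory configured as the LuceneDatabase path. The helper names `build_duke_command` and `run_duke` are hypothetical, not part of Duke.

```python
import subprocess
from pathlib import Path


def build_duke_command(config, classpath, incremental):
    """Return the argument list for one Duke run (hypothetical helper)."""
    # Assumption: Duke's command-line entry point is no.priv.garshol.duke.Duke.
    cmd = ["java", "-cp", classpath, "no.priv.garshol.duke.Duke"]
    if incremental:
        # Keep the existing Lucene index instead of overwriting it.
        cmd.append("--noreindex")
    cmd.append(str(config))
    return cmd


def run_duke(config, classpath, index_dir):
    """Do a full run if no index exists yet, otherwise an incremental run."""
    index_path = Path(index_dir)
    # Treat a non-empty index directory as "an index already exists".
    incremental = index_path.is_dir() and any(index_path.iterdir())
    cmd = build_duke_command(config, classpath, incremental)
    return subprocess.run(cmd, check=True)
```

With this, the first invocation builds the index as usual, and later invocations pass "--noreindex" so the existing index is kept. The check on the index directory is a design choice of this sketch, not something Duke does itself.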

I hope this helps.

--Lars Marius