open source deduplication

Forest Gregg and Derek Eder of Open City are building a generic open-source library for finding duplicate or matching records in any data set. If that makes no sense to you, check out the presentation slides, which give a friendlier overview.

You can get the library here: https://github.com/open-city/dedupe


Dedupe 1.0 Forest Gregg 8/18/15
sorry for the recruiter spam! Derek Eder 6/24/15
Dedupe 0.8.0 Forest Gregg 4/16/15
StaticRecordLink: matchBlocks API doubts Alexandre Rodrigues 2/26/15
matchBlocks error: A 0-dimensional array is not permitted Alexandre Rodrigues 1/25/15
moving to python 3 Forest Gregg 11/4/14
Having trouble training a model Asoka Diggs 10/28/14
Can someone help de-dupe 40 million email addresses? Evan Frey 9/18/14
Dedupe-Web issue Eric van Zanten 8/27/14
Dedupe 0.6.0 Released Forest Gregg 8/11/14
Spreadsheet Deduper and Dedupe 0.5 Forest Gregg 3/17/14
Data Deduping software Jeffery Tully 1/26/14
Re: Dedupe for record linkage Forest Gregg 12/1/13
Re: [dedupe] Abridged summary of open-source-...@googlegroups.com - 2 Messages in 1 Topic Declan Frye 11/26/13
Re: using dedupe on the web Friedrich Lindenberg 11/25/13
Re: Dedupe solution Forest Gregg 11/23/13