duke
1–30 of 164
This is the main mailing list for discussion of the Duke deduplication engine. Asking for help, proposing new features, discussion of further development of Duke, and more, are all good discussion topics here and very much welcome.
dukeUser
2/24/20
list of duplicate record
Hello, my question is how can I get the list of duplicate records with their scores in order the
Venkatesh T, JP de Vooght
2
11/5/19
Pls suggest to improve performance in keyvaluedb
Hi Venkatesh, did you eventually solve this? If I understand correctly, blocking is possible only with
Chris colón, …, JP de Vooght
3
11/4/19
Still alive.
I just found this project and am looking at the DBLP2-ACM data with it. First with my own config.xml
sandy baroo
4/2/18
Fwd: Us congress hearing of maan alsaan Money laundry (Congress money laundering case of the billionaire Maan Al Sanea)
YouTube videos of US Congress money laundering hearing of Saudi Billionaire " Maan Al sanea
s142...@gmail.com, …, Brandon Hoult
6
7/8/17
linkfile all confidence probabilities equals to 1
Look again at Solr. It has all the same comparators as Duke, including the phonetic, geospatial,
Venkatesh T
7/3/17
Pls suggest to improve performance using MapDBBlockingDatabase
Dear Lars, First of all, thank you very much for this wonderful framework. I am trying my best to
zubairmessi
5/9/17
Can anyOne Help me Finding Duplicates From 2 SQL Database
When I try to run this I get the error. Please let me know how we can fix this problem. I am trying
Salvatore Taddeo
4/4/17
Execute Duke 1.1
Hi all, I'm trying to execute Duke version 1.1 because it is simpler than the new versions, I just
Atif Khan
2/24/17
Duke Handling of Missing Values
Consider the following dataset: 1,john,doe 2,john, 3,john,watson For matching purposes, I am assuming
Mauro Fraboni
2/20/17
Match score using option --linkfile
I run duke for RecordLinkage using the following command: java no.priv.garshol.duke.Duke --linkfile=
Jeff Headley
11/16/16
Duke with rest services and a single record to be matched
I would like to use Duke if possible in this scenario and so far I haven't been able to figure it
wrh...@gmail.com
10/14/16
Jaro-Winkler returning the wrong value?
I was doing some testing of the implementation of the Jaro-Winkler comparator in Duke and I think
Roman Hennig
2
8/4/16
print out "maybe" matches
also, is there an option to display the match score in the matchfile?
Antonio Quintana, Lars Marius Garshol
2
7/24/16
ElasticSearch as Database
* Antonio Quintana wrote: I'd like to build a class similar to LuceneDatabase but using ElasticSearch
Roman Hennig, Lars Marius Garshol
5
7/4/16
detailed explanation of how Duke's Bayesian algorithm works?
Well if it works well, that's fair enough. I just want to understand what I'm using :) Thanks
Nigel Vivian, Lars Marius Garshol
9
7/1/16
Lucene database: parameters have no effect
Ordinarily I would agree with you, but I have a version of Duke SNAPSHOT built a while ago and this
Nicola Ghirardi, Lars Marius Garshol
2
6/30/16
Single thread used?
* Nicola Ghirardi I'm using Duke to link 6M vs 600k data sources. After the first part (indexing?
Soundarya Thiagarajan, Lars Marius Garshol
2
4/1/16
Duke - config xml - Data Deduplication
* Soundarya Thiagarajan https://github.com/larsga/Duke/blob/master/doc/example-data/countries.xml -
Soundarya Thiagarajan, Mike Vasquez
2
3/23/16
Data deduplication - https://github.com/larsga/Duke/wiki/SemanticDogfood
You have to add lucene jars to the classpath (there is more than one jar) On Wed, Mar 23, 2016 at 10:
Daniel Perry
2/24/16
Is there a plan to tag a new release and update maven central at some point?
Hello there, Firstly this looks like a great project, so thanks for taking the time to share it. I
Vamshi
2/24/16
how to pass jdbc params dynamically instead of exposing it in duke config file ?
Hi Experts, I have a requirement that I should not expose database credentials in duke configuration
andrea patricelli
2
2/10/16
Genetic algorithm main: correct Lucene versions.
Solved. I'm also using Elasticsearch (with the Duke plugin) and there are some conflicting dependencies.
Vamshi, Lars Marius Garshol
2
12/27/15
how can I achieve Incremental indexing in duke dedup
* Vamshi only problem I have is incremental database records. When I add any new records into the database
Elmo Macalalad
12/20/15
GPG Passphrase
Hi mates, I would like to ask how to access the GPG passphrase. Thanks
Lucas Adams
2
11/23/15
Bayes Probability Estimator
http://cs.wellesley.edu/~anderson/writing/naive-bayes.pdf Here is a pdf that is similar to what I was
Brandon Fletcher, …, Monte Cillo Co
3
11/16/15
Duke sourcing Hbase Question
Hi, I guess you may use https://phoenix.apache.org/ :D On Friday, August 7, 2015 at 7:24:37 AM UTC+8,
Kai Hüner, Lars Marius Garshol
7
10/20/15
Upgrade to Lucene 5
Ok, I did a little profiling of a job with 229000 records. Not huge, but enough to get a real test. -
kishore kumar suthar, Lars Marius Garshol
2
10/7/15
Duke deduplication engine : How can I give a string to deduplication program
* kishore kumar suthar proc.link(); proc.deduplicate(); You should have only one of these. .link() is
Nathalie C
9/29/15
usage of duke-es for record linkage
Hi, I would like to use duke and elasticsearch for record linkage. I tried using Yann's plugin
Vasco, …, Lars Marius Garshol
3
9/29/15
Prior probabilities
* Vasco Is there any way to adjust the prior? It seems to be always set to .5? Is there some reason I