Hello, the Dedupe library asks me for 10 examples of duplicates and 10 examples of non-duplicates.
Will increasing this number give me better results?
For example, I have one million records and 10,000 of them are labeled as duplicates. Should I use all 10,000 as training data, or will a few of them be sufficient?