Question: does sample_size count pairs or records?

Flávio Juvenal

Jan 28, 2020, 6:56:55 PM
to open source deduplication
Hi folks, once again, thanks for this excellent library! I have a question about the sample method of the Dedupe class.

I've checked the docs and the code, but it's not clear to me: does the sample_size argument of the sample method refer to the number of pairs or the number of records to sample?
I've read the sampling.py source code, but it's a bit difficult to follow, so I thought it would be easier to ask here.

Thanks,
Flávio.