possible to ID focal record within cluster -or- best way to choose canonical record from cluster

Tim Stallmann

Sep 22, 2022, 10:34:53 AM
to open source deduplication
Hi! Enjoying this tool so far, awesome work!

I'm curious about whether there's any way to ID what the "focal" record is within a given cluster? 

The documentation for the cluster() method refers to the similarity score as being calculated from a focal record and other records, but I can't tell if the methodology actually does identify a specific focal record within each cluster or not.

Do other folks on the list have any strategies you've used to ID what the best "canonical" record is to choose from within a given cluster?
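One simple heuristic, sketched here only as an illustration: if each cluster comes back as a pair of (record ids, scores), you could treat the highest-scoring member as a stand-in for the "focal" record. The cluster shape, the sample values, and the `pick_representative` helper below are all assumptions for the sake of the example, not something the library documents as its focal record.

```python
from typing import Hashable, Sequence, Tuple

# Assumed shape of one cluster: (record_ids, scores), same length, same order.
Cluster = Tuple[Sequence[Hashable], Sequence[float]]


def pick_representative(cluster: Cluster) -> Hashable:
    """Return the id of the highest-scoring member (hypothetical helper).

    On ties, the member that appears first in the cluster wins.
    """
    record_ids, scores = cluster
    best_index = max(range(len(record_ids)), key=lambda i: scores[i])
    return record_ids[best_index]


# Made-up clusters for illustration only.
clusters = [
    ((1, 2, 3), (0.90, 0.95, 0.70)),
    ((4, 5), (0.80, 0.80)),
]

representatives = [pick_representative(c) for c in clusters]
```

This picks an existing record rather than building a new one, so it is only as good as the scores are at reflecting "most typical member".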

Tim Stallmann

Sep 22, 2022, 11:09:09 AM
to open source deduplication
Lol I am realizing I totally failed to notice the `canonicalize` convenience function, which is already documented!
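
For anyone landing here later, the general idea of building a canonical record from a cluster can be sketched in plain Python. This is NOT the library's `canonicalize` implementation, just a simplified stand-in that takes the most common non-empty value per field; the `canonical_by_majority` helper and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def canonical_by_majority(
    records: List[Dict[str, str]]
) -> Dict[str, Optional[str]]:
    """Build one record from a cluster by majority vote per field.

    Hypothetical helper: empty values are ignored, and ties go to the
    value seen first. A field with no non-empty values becomes None.
    """
    # Collect field names in first-seen order across all records.
    fields: List[str] = []
    for record in records:
        for field in record:
            if field not in fields:
                fields.append(field)

    canonical: Dict[str, Optional[str]] = {}
    for field in fields:
        values = [r[field] for r in records if r.get(field)]
        canonical[field] = Counter(values).most_common(1)[0][0] if values else None
    return canonical


# Made-up cluster of three duplicate records.
cluster_records = [
    {"name": "Acme Corp", "city": "Durham"},
    {"name": "Acme Corp", "city": ""},
    {"name": "ACME Corporation", "city": "Durham"},
]

canonical = canonical_by_majority(cluster_records)
```

Unlike picking a single member of the cluster, this kind of approach can combine fields from different records, which helps when no one record is complete.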