possible to ID focal record within cluster -or- best way to choose canonical record from cluster

Tim Stallmann

Sep 22, 2022, 10:34:53 AM

to open source deduplication

Hi! Enjoying this tool so far, awesome work!

I'm curious about whether there's any way to ID what the "focal" record is within a given cluster?

The documentation for the cluster() method refers to the similarity score as being calculated from a focal record and other records, but I can't tell if the methodology actually does identify a specific focal record within each cluster or not.

Do other folks on the list have any strategies you've used to ID what the best "canonical" record is to choose from within a given cluster?
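
One possible strategy, sketched under assumptions: if the clustering output is a list of pairs, each holding a tuple of record ids and a matching tuple of per-record confidence scores, then the member with the highest score can serve as a stand-in representative for its cluster. The function name `pick_representatives` and the sample data are hypothetical, and this does not claim that the library itself designates a single focal record.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumed shape of one cluster: (record_ids, scores), aligned by position.
Cluster = Tuple[Sequence[Hashable], Sequence[float]]


def pick_representatives(clusters: List[Cluster]) -> Dict[int, Hashable]:
    """Map each cluster's position in the list to its highest-scoring record id.

    Ties are broken in favor of the record that appears first in the cluster.
    """
    representatives = {}
    for cluster_id, (record_ids, scores) in enumerate(clusters):
        best_index = max(range(len(record_ids)), key=lambda i: scores[i])
        representatives[cluster_id] = record_ids[best_index]
    return representatives


# Hypothetical clustering output, made up for illustration.
clustered = [
    ((1, 2, 3), (0.79, 0.95, 0.81)),
    ((4, 5), (0.60, 0.60)),
]
reps = pick_representatives(clustered)
```

This picks an existing record rather than building a merged one, which may matter if downstream systems need a real record id to point to.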

Tim Stallmann

Sep 22, 2022, 11:09:09 AM

to open source deduplication

Lol I am realizing I totally failed to notice the `canonicalize` convenience function, which is already documented!
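
For readers who want to see the general idea of building one merged record from a cluster, here is a minimal standalone sketch. It is not the library's `canonicalize` implementation, which may choose values differently; it simply takes the most common non-empty value for each field. The helper name `most_common_values` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def most_common_values(record_cluster: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one merged record using the most frequent non-empty value per field.

    Assumes field values are hashable (for example strings or numbers).
    Fields with no non-empty value in any record are set to None.
    """
    field_names = {name for record in record_cluster for name in record}
    canonical = {}
    for name in field_names:
        # Skip missing, None, and empty-string values.
        values = [record[name] for record in record_cluster if record.get(name)]
        canonical[name] = Counter(values).most_common(1)[0][0] if values else None
    return canonical


# Hypothetical cluster of duplicate records, made up for illustration.
cluster = [
    {"name": "Acme Corp", "city": "Durham"},
    {"name": "Acme Corp", "city": None},
    {"name": "ACME Corporation", "city": "Durham"},
]
merged = most_common_values(cluster)
```

A frequency vote is only one choice; picking the longest value or the value closest to all the others are alternatives, depending on how messy the fields are.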
