Regarding Custom Blocking & Centroid Calculation for Non-Numerical Vectors

Indrajit Saha

May 27, 2021, 9:35:31 AM
to open source deduplication
Hi All,

First of all, thank you for making this package open source. I hope you are all doing well.

I have two questions:

Question 1: How can I manually specify the field names to be used for blocking?

Question 2: Say we have 10 rows. Using the classification model, we can compute a 10 by 10 similarity matrix and then convert it to a distance matrix. Once the distance matrix is ready, we apply hierarchical clustering. My question is: how can we use the linkage="Centroid" method? The rows are not numeric vectors, so how can we take the average of all the elements within a cluster?
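
To make the pipeline in Question 2 concrete, here is a minimal sketch, assuming SciPy rather than this package's own clustering code. The function name cluster_from_similarity, the threshold value, and the toy 4 by 4 matrix are hypothetical and only for illustration. The SciPy documentation notes that the "centroid", "median", and "ward" methods are correctly defined only for Euclidean distances, so the sketch uses "average" linkage instead, which needs nothing beyond the pairwise distances:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_from_similarity(similarity, threshold=0.5):
    """Cluster records from a symmetric pairwise similarity matrix in [0, 1].

    Hypothetical helper: converts similarity to distance, then runs
    average-linkage hierarchical clustering on the precomputed distances.
    """
    similarity = np.asarray(similarity, dtype=float)
    # Assumed conversion: distance = 1 - similarity.
    distance = 1.0 - similarity
    # A record is at distance 0 from itself.
    np.fill_diagonal(distance, 0.0)
    # linkage() expects the condensed (upper-triangle) form.
    condensed = squareform(distance, checks=True)
    # Average linkage never needs a cluster "mean" of the records themselves,
    # only the mean of the pairwise distances between two clusters.
    tree = linkage(condensed, method="average")
    # Cut the tree wherever the merge distance exceeds the threshold.
    return fcluster(tree, t=threshold, criterion="distance")


# Toy similarity scores for 4 records (made-up values, not model output).
toy_similarity = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
]
labels = cluster_from_similarity(toy_similarity, threshold=0.5)
print(labels)
```

With these toy scores, records 0 and 1 should land in one cluster and records 2 and 3 in another, without ever averaging the records themselves.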

I hope it is clear what I meant to ask!

Hope to hear from you soon.