Help with array/list variable definitions

ilan....@credifi.com

Mar 27, 2018, 5:28:14 AM

to open source deduplication

Appreciate any help in this area:

I have a data set with many companies in it.

Features include CompanyName, Size, Type, Phone, and other general information. These variables are easy to define using the pre-made 'core' list.

However, my company data set also contains an array of addresses per company. A company can have hundreds of addresses, and each address has sub-parts (Street, City, State...).

If each company had only one address, I would define the variable using the 'AddressType' definition.

But how do I include all of the addresses? One company record may have a single address that matches one of the 100 addresses of another company. That overlap is meaningful, because it could indicate that the two records are duplicates.
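To make the data shape concrete, here is a minimal sketch in plain Python of what I mean by "one shared address". It does not use the library at all. The field names (`Addresses`, `Street`, `City`, `State`) and the helpers `normalize` and `shares_address` are hypothetical, made up for this illustration, and the exact match after lowercasing is a simplification of real address comparison.

```python
from typing import Dict, List, Tuple

Address = Dict[str, str]


def normalize(addr: Address) -> Tuple[str, ...]:
    """Reduce an address to a comparable key (hypothetical, very crude)."""
    return tuple(addr.get(k, "").strip().lower() for k in ("Street", "City", "State"))


def shares_address(a: dict, b: dict) -> bool:
    """True if the two company records have at least one address in common."""
    keys_a = {normalize(x) for x in a["Addresses"]}
    keys_b = {normalize(x) for x in b["Addresses"]}
    return not keys_a.isdisjoint(keys_b)


# Two illustrative records: one with several addresses, one with a single address.
company_a = {
    "CompanyName": "Acme Corp",
    "Addresses": [
        {"Street": "1 Main St", "City": "Springfield", "State": "IL"},
        {"Street": "22 Oak Ave", "City": "Dayton", "State": "OH"},
    ],
}
company_b = {
    "CompanyName": "ACME Corporation",
    "Addresses": [
        {"Street": "22 oak ave ", "City": "Dayton", "State": "OH"},
    ],
}

print(shares_address(company_a, company_b))  # True
```

This is the kind of comparison I would like the variable definition to capture, ideally with fuzzy rather than exact matching on each address.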

Question: How can I include an array (or list) of features in the variable definition, so that every element is taken into account when creating pairs/blocks?