One-to-many, many-to-one, and many-to-many matching (Gazetteer)


Dylan Culfogienis

Jun 22, 2020, 11:30:18 AM
to open source deduplication
So, I have a clean dataset that I created using a combination of filtering methods and deduplication. This dataset has three identifier columns: manufacturer_catalog_number, gxci_number, and nsn_number.

The messy dataset that I am trying to record-link to the clean one has only a manufacturer_catalog_number; however, the value in that column may actually be any of the three numbers in the clean dataset.
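One way to picture the identifier problem is to index every clean record under all three of its numbers, so a messy manufacturer_catalog_number can hit a record through whichever column actually holds that value. The sketch below is plain Python and is not part of the dedupe API. The record values, the helper names (`build_id_index`, `candidates_for`), and the normalization rule (ignore case, hyphens, and spaces) are all hypothetical assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical clean records: field names are from the post, values are made up.
clean_records = {
    1: {"manufacturer_catalog_number": "ABC-123", "gxci_number": "GX900",
        "nsn_number": "6515-01-234-5678"},
    2: {"manufacturer_catalog_number": "XYZ-77", "gxci_number": "GX901",
        "nsn_number": "6515-01-999-0000"},
}

ID_FIELDS = ("manufacturer_catalog_number", "gxci_number", "nsn_number")


def normalize_id(value):
    # Assumption: case, hyphens, and spaces carry no meaning in these identifiers.
    return value.upper().replace("-", "").replace(" ", "")


def build_id_index(records):
    """Map each normalized identifier, from any of the three columns, to record ids."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in ID_FIELDS:
            value = record.get(field)
            if value:
                index[normalize_id(value)].add(record_id)
    return index


def candidates_for(messy_number, index):
    """Return clean record ids whose catalog, GXCI, or NSN number equals the messy number."""
    return sorted(index.get(normalize_id(messy_number), set()))
```

For example, `candidates_for("gx900", build_id_index(clean_records))` returns `[1]` even though the messy value sits in a manufacturer_catalog_number column and the clean one sits in gxci_number. Exact lookup like this only covers identical numbers; near-misses would still need a fuzzy comparator.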

Furthermore, my clean dataset has both short_descriptions and long_descriptions. The messy dataset has a single description that may correspond to EITHER the short or the long one.
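For the description, one common approach is to compare the messy text against both clean descriptions and keep the better score. The sketch below uses the standard library's `difflib.SequenceMatcher` purely as a stand-in similarity measure; it is not what dedupe uses internally. The dictionary keys `short_description` and `long_description` are assumed names for the clean columns.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1]; difflib is only a stand-in for a real string comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def description_score(messy_description, clean_record):
    """Score the messy description against both clean descriptions; keep the better match."""
    scores = [
        similarity(messy_description, clean_record[field])
        # Assumed key names for the short and long description columns.
        for field in ("short_description", "long_description")
        if clean_record.get(field)
    ]
    return max(scores, default=0.0)  # 0.0 when the record has no description at all
```

Taking the maximum means a messy description that copies the short form is not penalized for differing from the long form, and vice versa.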

Is it possible to do many-to-one or one-to-many matching of fields like this, comparing one messy field against several clean ones? Or, better yet, is it possible to do many-to-many matching?

Thanks,

- Dylan Culfogienis