Hi Tim -
If I understand correctly, you have two fairly straightforward options here.
If all you want to do is use a really straightforward string to
determine blocking, like an ISO 3166 country code, and you're certain
that the strings are accurate in your data set, you can declare
country_code as an "Exact" type variable in training
(https://docs.dedupe.io/en/latest/Variable-definition.html), like this:
{'field': 'country_code', 'type': 'Exact'}
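Since the Exact approach hinges on the codes being clean, a quick sanity check before training might save you some grief. This is only a rough sketch: it assumes your records are in a dict of record_id -> field dict (the shape dedupe usually takes) and that the codes are meant to be ISO 3166-1 alpha-2. The helper name invalid_country_codes is made up for illustration, and it only checks the format, not whether each code is actually assigned to a country.

```python
import re

# Hypothetical helper: flags values that are not exactly two upper-case
# ASCII letters (the ISO 3166-1 alpha-2 format). Format check only; it does
# not verify that the code is a real, assigned country code.
ALPHA2 = re.compile(r"[A-Z]{2}")


def invalid_country_codes(records, field="country_code"):
    """Return (record_id, value) pairs whose field is not a well-formed alpha-2 code."""
    bad = []
    for record_id, record in records.items():
        value = record.get(field)
        if not isinstance(value, str) or not ALPHA2.fullmatch(value):
            bad.append((record_id, value))
    return bad


# Illustrative sample data (assumed shape: record_id -> dict of fields).
data = {
    1: {"name": "Acme Ltd", "country_code": "GB"},
    2: {"name": "Acme Limited", "country_code": "gb"},
    3: {"name": "Acme Inc", "country_code": None},
    4: {"name": "Acme Corp", "country_code": "USA"},
}

print(invalid_country_codes(data))
```

If that list comes back non-empty, Exact will treat "gb" and "GB" (or "USA" and "US") as different values, which is usually a sign you want the custom comparator route instead.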
Alternatively, if you've come up with something really clever for your
country code comparisons, or there's something weird about them in
your dataset, or if you are just using country codes as an example,
you can consider declaring a custom comparator type. According to the
docs (https://docs.dedupe.io/en/latest/Variable-definition.html#custom-types),
"The comparator must be a function that can take in two field values
and return a number."
The docs give a simple example of such a comparator:
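def sameOrNotComparator(field_1, field_2):
    # Distance of 0 when both values are present and identical,
    # 1 when both are present but differ.
    if field_1 and field_2:
        if field_1 == field_2:
            return 0
        else:
            return 1
    # If either value is empty, the function falls through and returns None.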
And the matching variable definition:
{'field': 'country_code', 'type': 'Custom',
 'comparator': sameOrNotComparator}
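If it helps to see what that comparator actually hands back to dedupe, here's a quick standalone check. It just re-declares the docs function so it runs on its own, then calls it on a few made-up pairs of codes. Note the third case: when one value is empty, it doesn't return a number at all, it returns None.

```python
def sameOrNotComparator(field_1, field_2):
    # Same logic as the docs example: 0 when both values are present and
    # equal, 1 when both are present and differ.
    if field_1 and field_2:
        if field_1 == field_2:
            return 0
        else:
            return 1
    # Either value empty: falls through and implicitly returns None.


for a, b in [("US", "US"), ("US", "CA"), ("US", "")]:
    print(repr(a), repr(b), sameOrNotComparator(a, b))
```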
If you're normalizing different kinds of country codes, your function
might look at the number of characters in the string (two letters for
ISO 3166-1 alpha-2, three for alpha-3), but you've already got that
logic worked out.
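Just to illustrate the length-based idea, here's a rough sketch of what such a comparator could look like. Everything in it is hypothetical: ALPHA3_TO_ALPHA2 is a deliberately tiny lookup table (a real one would cover every country), and normalize_country_code and countryCodeComparator are names I made up. It upper-cases and trims each value, maps three-letter codes to their two-letter equivalents where it knows them, and then compares the results the same way the docs example does.

```python
# Hypothetical, deliberately incomplete table of ISO 3166-1 alpha-3 -> alpha-2.
ALPHA3_TO_ALPHA2 = {
    "USA": "US",
    "CAN": "CA",
    "MEX": "MX",
    "GBR": "GB",
    "DEU": "DE",
    "FRA": "FR",
}


def normalize_country_code(code):
    """Upper-case and trim a code; map known alpha-3 codes to alpha-2."""
    code = code.strip().upper()
    if len(code) == 3:
        # Unknown three-letter codes are returned unchanged rather than guessed.
        return ALPHA3_TO_ALPHA2.get(code, code)
    return code


def countryCodeComparator(field_1, field_2):
    """Return 0 if both codes normalize to the same value, 1 if they differ."""
    if field_1 and field_2:
        if normalize_country_code(field_1) == normalize_country_code(field_2):
            return 0
        return 1
    # Either value missing: return None, mirroring the docs example.
    return None


# Variable definition in the same dict style as above.
country_code_variable = {'field': 'country_code', 'type': 'Custom',
                         'comparator': countryCodeComparator}

for a, b in [("usa", "US"), ("GBR", "FR"), ("DEU", "de")]:
    print(a, b, countryCodeComparator(a, b))
```

Keeping the normalization in its own function means you could also run it over your data up front and go back to a plain Exact variable, if you'd rather not maintain a custom comparator.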
--
All the best,
Josh W.
https://joshwieder.net