Hello all,
We are creating a workflow for reconciling our place records with the Getty Thesaurus of Geographic Names (TGN). We plan to use OpenRefine and the Getty's Reconciliation Service API to do this. However, we've run into a snag.
It seems that "latitude" and "longitude" are not taken into account at all by the reconciliation service when suggesting matches. This hampers the reconciliation workflow, especially when multiple matches with identical labels are returned.
Many of our places have "latitude" and "longitude" on record in our collections database. We were hoping that we could use these coordinates to help with reconciliation. Intuitively, we expected that the scores returned by the reconciliation service would be influenced by the distance between the coordinates in our query vs. the coordinates of each candidate. Failing that, we expected that candidates where the coordinates matched exactly would be bumped to the top. However, it seems that neither is the case at this time.
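To make that expectation concrete, here is a minimal sketch of the kind of distance-based ordering we had in mind, applied client-side to a list of candidates. It is only an illustration, not how the Getty service works: the function names, the candidate ids, and the approximate coordinates are all placeholders rather than TGN data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank_by_distance(query_lat, query_lon, candidates):
    """Return the candidates sorted nearest-first relative to the query coordinates."""
    return sorted(
        candidates,
        key=lambda c: haversine_km(query_lat, query_lon, c["lat"], c["lon"]),
    )


# Placeholder candidates with identical labels (ids and coordinates are illustrative only).
candidates = [
    {"id": "candidate-a", "name": "Peoria", "lat": 33.58, "lon": -112.24},
    {"id": "candidate-b", "name": "Peoria", "lat": 40.69, "lon": -89.59},
]
ranked = rank_by_distance(40.70, -89.60, candidates)
```

With identical labels, the nearest candidate would sort first, which is the behaviour we hoped the scores would reflect.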
I'll show what we are doing in OpenRefine. I've attached a sample of our data to this email. In these screenshots, I'm working with that sample.
When starting the reconciliation in OpenRefine, we reconcile on the `title` column and opt to include `latitude` and `longitude` in the reconciliation. When typing in the "As Property" input, `latitude` and `longitude` are offered as suggestions by the service:
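For reference, this is roughly the shape of the query batch we understand OpenRefine to send in this configuration, following the general Reconciliation Service API convention of a `queries` form parameter holding JSON. It is a sketch under assumptions: the property ids `latitude` and `longitude` are taken from the suggestions described above, the `build_queries` helper is hypothetical, and the row values are placeholders rather than our attached sample. Nothing is sent over the network here.

```python
import json


def build_queries(rows, limit=5):
    """Build a reconciliation query batch keyed q0, q1, ... from table rows."""
    queries = {}
    for i, row in enumerate(rows):
        queries[f"q{i}"] = {
            "query": row["title"],  # the column being reconciled
            "limit": limit,
            # Extra columns passed "As Property" (property ids assumed, see above).
            "properties": [
                {"pid": "latitude", "v": row["latitude"]},
                {"pid": "longitude", "v": row["longitude"]},
            ],
        }
    return queries


# Placeholder row; real values would come from the collections database.
rows = [{"title": "Peoria", "latitude": 40.69, "longitude": -89.59}]

# Form payload as it would be POSTed to a reconciliation endpoint.
payload = {"queries": json.dumps(build_queries(rows))}
```

Inspecting a payload like this (for example with the browser's network tools) would at least confirm whether the coordinates are reaching the service.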
Once the automated part of the process completes, we can find instances of this issue at play. Let's take a look at "Peoria" first:
In this case, there are 16 exact matches on "Peoria" that have been returned, and the one we are looking for happens to be the top result. Coordinates are not included in the preview modal, so we have to verify this by clicking on the full record display link:
However, note that 11 of the 16 exact-match candidates share the same score. I don't know why the other 5 exact-match candidates have a different score; I would have expected the scores to vary with each candidate's distance from the queried coordinates.
Next, let's take a look at "Springfield":
In this case, I looked up the coordinates via Google and confirmed that this record referred to Springfield, Illinois:
The reconciliation service returned 22 exact matches on Springfield, and I confirmed that none of them was Springfield, IL!
The web search interface returns 100 exact-match results for "Springfield" in TGN:
Turns out that the "Springfield" we want does exist in TGN, and it has coordinates that are almost an exact match for our query:
If we go through the same exercise for the next item, "Gary", we'd find that the one with the closest coordinates was the third result returned by the reconciliation service.
Alright, so clearly, this is not a sustainable workflow for reconciliation. Have we made a mistake in how we configured OpenRefine? Are there other techniques we should be using for this, instead of querying the reconciliation service?
If not, could the Getty's reconciliation service be updated to take latitude and longitude into account when reconciling against TGN? Ideally, this would be done by factoring distance into the score formula rather than by requiring an exact match on these fields.
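As one possible illustration of factoring distance into a score, the sketch below blends a label-match score with an exponential decay on distance. The formula, the function name, and the `scale_km` and `weight` parameters are assumptions made for the sake of example; they do not describe the service's actual scoring.

```python
from math import exp


def distance_weighted_score(name_score, distance_km, scale_km=50.0, weight=0.5):
    """Scale a label-match score by proximity to the queried coordinates.

    proximity is 1.0 at zero distance and decays towards 0.0 as distance grows,
    so the result ranges from name_score down to name_score * (1 - weight).
    """
    proximity = exp(-distance_km / scale_km)
    return name_score * ((1 - weight) + weight * proximity)


# Two candidates with the same label score but different distances (placeholder values).
near = distance_weighted_score(100.0, distance_km=2.0)
far = distance_weighted_score(100.0, distance_km=2000.0)
```

Under a scheme like this, candidates with identical labels would no longer tie: the nearer one keeps most of its score while distant ones are discounted, without requiring the coordinates to match exactly.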
I hope all of these attachments get posted successfully, but if not, I'm happy to share more details.
Thank you,
Illya