Mismatch of reconciliation results - image files & Wikicommons

Dr Thneed

Mar 28, 2022, 6:40:04 PM
to openr...@googlegroups.com
Hi.
Something very strange has just happened in reconciliation. I have a short list of people, already reconciled correctly with Wikidata. I have then extended the data by adding the column "image", to get the file name of images from Wikidata. I have then reconciled that column of file names with the Wikicommons API, the first time I have used this service.

What I have found is that the filenames, which are the exact file names of Wikicommons images, are reconciling bizarrely to the wrong item! So there are only four filenames I am trying to match, and the reconciliation service has returned the correct four items, but matched them to the wrong cell somehow. E.g. there are images for Sonja Macfarlane and Barbara Brookes, and OpenRefine has matched each to the wrong one, with 100% confidence. I have rejected the matches before taking the screenshot so that you can see both the contents of the reconciled cell and the image that OpenRefine is trying to tell me it matches to. Any ideas for what is going on? I am using v3.5.2(1) on a Mac.

Cheers,
Tamsin
--
Tamsin Braisher

Dunedin
New Zealand
Screen Shot 2022-03-29 at 11.32.05 AM.png

Owen Stephens

Mar 29, 2022, 5:08:00 AM
to OpenRefine
I can see the same issue Tamsin - so it's not just you!

It looks like for some reason the results aren't being linked up to the correct row in OpenRefine - in fact it looks like they are "off by one" - i.e. the correct match is being shown one row below where it should be (except the last row which wraps back around to the top).
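To make the "off by one" pattern concrete, here is a minimal sketch of what a one-row shift with wrap-around looks like. The function name and the file names are hypothetical, chosen only for illustration; this is not code from OpenRefine or from the reconciliation service.

```python
def rotate_down_by_one(results):
    """Shift every result one row down; the last row wraps around to the top."""
    if not results:
        return []
    return results[-1:] + results[:-1]


# Hypothetical file names, one per row, in the order they should match.
expected = ["File:A.jpg", "File:B.jpg", "File:C.jpg", "File:D.jpg"]

# What the rows would show if the results were shifted as described above.
observed = rotate_down_by_one(expected)

for row, (want, got) in enumerate(zip(expected, observed)):
    print(row, want, "->", got)
```

With four rows, every row ends up showing the match that belongs to the row above it, and the first row shows the match for the last row.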

I'm not quite sure where the issue is occurring at the moment - I'll try to find some time to do some more investigation today and update once I know more (or maybe others can take a look!).

Owen

Owen Stephens

Mar 29, 2022, 5:28:38 AM
to OpenRefine
OK - I think I've established that the problem is in the Wikicommons Reconciliation service rather than in OpenRefine directly. I can see there has been a change to the code in that service that creates the result set, and although I'm not 100% sure that's the cause of the problem, I suspect that the change has resulted in this 'off by one' issue in the results.

I need to reach out to the team responsible for that service to see if it can be fixed.

Owen

For a little more detail, intended only for those who have a technical interest, I'm seeing a POST request like:
curl --location --request POST 'https://commonsreconcile.toolforge.org/en/api' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'queries={
    "q0": {
      "query": "File:Commons-logo.svg"
    },
    "q1": {
      "query": "File:Allah-green-transparent.svg"
    }
  }'

Returning:

{
  "q0": {
    "result": [
      {
        "id": "M74943657",
        "match": true,
        "name": "File:Allah-green-transparent.svg",
        "score": 100
      }
    ],
    "type": [
      {
        "id": "mediafile",
        "name": "Media file"
      }
    ]
  },
  "q1": {
    "result": [
      {
        "id": "M317966",
        "match": true,
        "name": "File:Commons-logo.svg",
        "score": 100
      }
    ],
    "type": [
      {
        "id": "mediafile",
        "name": "Media file"
      }
    ]
  }
}

which (assuming the query keys are what ties each result back to its query) looks like the results are assigned to the wrong keys: q0 asked for File:Commons-logo.svg but got File:Allah-green-transparent.svg, and q1 the other way round.
A possible cause is this commit to the reconciliation code, which changes how the result set is built, although I don't really understand the code at the moment tbh: https://gerrit.wikimedia.org/r/c/labs/tools/commons-recon-service/+/769409/1/service/reconcile/processresults.py
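For anyone who wants to check a response for this kind of key mix-up without reading it by eye, here is a rough sketch in Python. It uses the request and response shown above as plain dictionaries (no network call), and the helper name mismatched_keys is made up for this example. It assumes that an exact file-name query should come back with the same name as its top result, which holds for the two files here but is not a general rule for reconciliation services.

```python
def mismatched_keys(queries, response):
    """Return the query keys whose top result name differs from the query text.

    Assumption: each query is an exact file name, so the top result's
    "name" is expected to equal the query string.
    """
    bad = []
    for key, query in queries.items():
        results = response.get(key, {}).get("result", [])
        top_name = results[0]["name"] if results else None
        if top_name != query["query"]:
            bad.append(key)
    return bad


# The queries and the (trimmed) response copied from the example above.
queries = {
    "q0": {"query": "File:Commons-logo.svg"},
    "q1": {"query": "File:Allah-green-transparent.svg"},
}
response = {
    "q0": {"result": [{"id": "M74943657", "match": True,
                       "name": "File:Allah-green-transparent.svg", "score": 100}]},
    "q1": {"result": [{"id": "M317966", "match": True,
                       "name": "File:Commons-logo.svg", "score": 100}]},
}

print(mismatched_keys(queries, response))
```

For the response above this flags both q0 and q1; a response with the results under the right keys would give an empty list.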

Dr Thneed

Mar 29, 2022, 3:55:26 PM
to openr...@googlegroups.com
Thanks Owen, it is always reassuring to know the problem is not an individual one (although I didn't see how this one could be)!
Cheers,
Tamsin

Owen Stephens

Mar 30, 2022, 7:53:19 AM
to OpenRefine
The issue has now been reported to the team who work on the Wikimedia Commons reconciliation service.

Hopefully they can fix the issue and get this reconciliation service working again quickly!

Best wishes

Owen
