question about reconciliation

D063520

Jun 17, 2022, 4:37:09 AM
to OpenRefine
Hi,

I'm new to OpenRefine and I've run into this problem. Imagine I have a table with

1) a known Wikidata identifier (let's say ORCID (P496))
2) a property I want to add to Wikidata (let's say the employer)

Now I would like to reconcile column 1 against the entity that matches the ORCID and add the second column. My problem is that I'm not able to reconcile. I feel it needs the name of the person, but I do not have it.

Can someone help me out?

Thank you
D063520

Owen Stephens

Jun 17, 2022, 7:48:49 AM
to OpenRefine
Hi and welcome!

Can you give some more details about the data and your reconciliation settings and process and what it means when you say "I'm not able to reconcile"?

I find that reconciling against the https://wikidata.reconci.link/en/api reconciliation service using an ORCID works, although the match confidence is not as high as for a name.

The more information you can give, the more likely it is that we'll be able to help work out what's going wrong.

Best wishes

Owen

D063520

Jun 17, 2022, 11:16:17 AM
to OpenRefine
Hi,

thank you for your help! My main question is: what is the workflow?

I have a table like this:

Column,Column 4,Column 5
ORCID ID,Other Names,Affiliations
0000-0002-0046-2219,,"The QA Company, University of Kaiserslautern, Université Jean Monnet Saint-Etienne, Université de Lyon"
0000-0001-8151-7236,,Universitat der Bundeswehr München
0000-0003-3407-0798,Evandro Luiz Diefenbach,"Fundação Getúlio Vargas, National Telecommunications Agency, Universidade Federal do Rio Grande do Sul, Universidade de Brasília, Universidade de Lisboa, Universidade de Lisboa Instituto Superior de Ciências Sociais e Políticas"
0000-0002-4996-5714,,Technische Universitat Darmstadt

How do I reconcile against the first column?

1) click on the column
2) reconcile
3) use values as identifiers?

Owen Stephens

Jun 17, 2022, 11:57:40 AM
to OpenRefine
If you're new to reconciliation it will be worth reading through the documentation at https://docs.openrefine.org/manual/reconciling

But the basic approach will be:

1) Click on ORCID ID column menu
2) Choose "Reconcile"
3) Choose "Start reconciling"
4) If you haven't already done so, add the Wikidata reconciliation service as follows:
4a) click "Add standard service"
4b) Enter the URL https://wikidata.reconci.link/en/api into the prompt and click "Add Service"
5) Select the "Wikidata reconci.link (en)" reconciliation service in the "Services" tab
6) Select "Reconcile against no particular type" (this step shouldn't be necessary but when looking up people there are problems when reconciling against Wikidata with a type specified)
7) The reconciliation should now run and you'll see the results
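
For anyone curious what OpenRefine sends to the service during these steps, here is a rough Python sketch of the request payload, one query per ORCID. It assumes the service follows the usual reconciliation API convention of a "queries" form field holding a JSON object. The function names are made up for illustration, and nothing is actually sent. Leaving out "type" corresponds to "Reconcile against no particular type".

```python
import json

# Service URL quoted in the thread; this sketch never contacts it.
SERVICE_URL = "https://wikidata.reconci.link/en/api"


def build_recon_queries(orcids, limit=3):
    """Build one reconciliation query per ORCID (hypothetical helper).

    No "type" key is set, which mirrors reconciling against no particular type.
    """
    queries = {}
    for index, orcid in enumerate(orcids):
        queries[f"q{index}"] = {"query": orcid.strip(), "limit": limit}
    return queries


def build_form_data(orcids):
    """Wrap the queries as the form field a reconciliation client would POST."""
    return {"queries": json.dumps(build_recon_queries(orcids))}


form_data = build_form_data(["0000-0002-0046-2219", "0000-0001-8151-7236"])
print(form_data["queries"])
```

Each candidate the service returns for a query carries a score, which is where the match confidence shown in OpenRefine comes from.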

Based on the four records you've given as an example, only one of these people/ORCIDs exists in Wikidata, so I don't think you should necessarily expect all your ORCIDs to reconcile successfully.

Best wishes

Owen

Owen Stephens

Jun 17, 2022, 12:04:08 PM
to OpenRefine
By the way, the "Use values as identifiers" option only applies in situations where you have the unique identifiers from a specific reconciliation service and want to match them up. So, for example, for Wikidata this only works if you have Wikidata IDs. This option actually skips the reconciliation step and just links up based on the IDs - see https://docs.openrefine.org/manual/reconciling#reconciling-with-unique-identifiers for more information.

Owen

Antoine Beaubien

Jun 20, 2022, 2:22:12 AM
to OpenRefine
I think you should go the other way around…

So, I would gather the ORCIDs and, with the Wikidata Query Service, send this query:

# ORCIDs of persons
SELECT ?vOrcPers ?vOrcPersLabel ?vOrcIDs
WHERE
{
  VALUES ?vOrcIDs { "0000-0002-0046-2219" "0000-0001-8151-7236" "0000-0003-3407-0798" "0000-0002-4996-5714" }
  ?vOrcPers      wdt:P496         ?vOrcIDs
                 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}


Of course, your ?vOrcIDs list would be longer, containing all your ORCIDs…

Then I would import the result into a new project and use cross() to pull the resulting Wikidata IDs into your main project, matching on the ORCID.
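
If the list of ORCIDs is long, the VALUES line of the query above can be generated rather than typed. Here is a minimal Python sketch, assuming the ORCIDs have been exported from the project as plain strings. The function name and the format check are illustrative additions, not part of the original query.

```python
import re

# Basic shape check only (four groups of four, last character may be X);
# it does not verify the ORCID checksum.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def build_orcid_query(orcids):
    """Return the SPARQL query from the thread with a generated VALUES list."""
    cleaned = []
    for raw in orcids:
        orcid = raw.strip()
        if not ORCID_PATTERN.match(orcid):
            raise ValueError(f"not an ORCID: {raw!r}")
        if orcid not in cleaned:  # drop duplicates, keep original order
            cleaned.append(orcid)
    values = " ".join(f'"{orcid}"' for orcid in cleaned)
    return (
        "SELECT ?vOrcPers ?vOrcPersLabel ?vOrcIDs\n"
        "WHERE\n"
        "{\n"
        f"  VALUES ?vOrcIDs {{ {values} }}\n"
        "  ?vOrcPers wdt:P496 ?vOrcIDs .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}\n"
    )


print(build_orcid_query(["0000-0002-0046-2219", "0000-0001-8151-7236"]))
```

The printed text can be pasted into the Wikidata Query Service and the result downloaded as CSV.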

Regards,
   Antoine

Owen Stephens

Jun 20, 2022, 6:52:59 AM
to OpenRefine
I don't want to overload you with options (the approach suggested by Antoine sounds like it might be the most efficient if all you care about is ORCIDs already in Wikidata), but I'd just add that you can also use the ORCID API from OpenRefine to grab any relevant information from the public profile. That could be helpful if you want to reconcile against Wikidata using names, or if you want to create new entries in Wikidata for those that don't already exist.

To do this you can use an approach like:
  1. From the ORCID column choose "Edit column -> Add column by fetching URLs"
  2. Specify a name for your new column (like "ORCID Data")
  3. Use GREL to create a URL like "https://pub.orcid.org/v3.0/"+value+"/person" (this will only fetch the person information - if you wanted other details you can amend to "https://pub.orcid.org/v3.0/"+value to get all the available information)
  4. Set the Throttle delay to something reasonable (I'd say 500 or 1000)
  5. If you want to retrieve JSON rather than XML, click the 'show' option on "HTTP headers to be used when fetching URLs" and change the Accept value to application/vnd.orcid+json (this is a matter of preference - you can extract the relevant data from either format)
  6. Click OK
This should then populate your new column by fetching all the information from the ORCID registry. You can then extract information using GREL like:
XML example
with(value.parseXml(),x,x.select("personal-details|given-names")[0].ownText() + " " + x.select("personal-details|family-name")[0].ownText())
JSON example
with(value.parseJson().person.name,j,j.get("given-names").value + " " + j.get("family-name").value)
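
For comparison, the same URL construction and name extraction can be sketched outside OpenRefine in Python. This is only an illustration: the function names are made up, the sample response is invented test data rather than a real ORCID record, and the code assumes the name may sit either under a "person" key or at the top level of the JSON, since that can depend on which endpoint was fetched.

```python
import json


def orcid_person_url(orcid):
    """Build the same URL as the GREL expression in step 3."""
    return "https://pub.orcid.org/v3.0/" + orcid + "/person"


def full_name(response_text):
    """Extract 'given family' from an ORCID JSON response (illustrative helper)."""
    data = json.loads(response_text)
    # Assumption: accept the name under "person" or directly at the top level.
    person = data.get("person", data)
    name = person.get("name") or {}
    given = (name.get("given-names") or {}).get("value", "")
    family = (name.get("family-name") or {}).get("value", "")
    return (given + " " + family).strip()


# Invented sample data, shaped like the fields the GREL examples read.
sample = '{"name": {"given-names": {"value": "Ada"}, "family-name": {"value": "Lovelace"}}}'
print(orcid_person_url("0000-0002-0046-2219"))
print(full_name(sample))
```

The `or {}` guards are there because a profile may leave a name field empty, in which case the GREL versions above would raise an error for that row.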

Happy to answer more on this approach if you have questions

Owen

D063520

Jun 20, 2022, 4:07:32 PM
to OpenRefine
Hi,

thank you very much for your answers! Everything was helpful for understanding how things work. I have one more question related to this. Basically I have the data:

Column 1, Column2
Harald Krichel, value1
Seewolf, value1
Internetsaarbruecken, value1
Bertilow, value1
TR, value1
Eckhard, value1
Eckhard Bick, value3
Oscar, value2
WTM, value1
Albrecht, value2

I know that these are Wikimedia usernames (https://www.wikidata.org/wiki/Property:P4174). I want to reconcile them, i.e. find the QIDs based on them, and add column 2 as a property. I would imagine that I can easily reconcile against them. When I use the reconciliation suggested by Owen, I do not get good results. I would not mind having to specify P4174. In fact I would be happy to, so that this is the only information used. I understand Antoine's solution, only it sounds a bit complicated. I feel this is a very common workflow. Can you confirm that in this case only the workflow that Antoine proposes is the right one?

Thank you again for the efforts!
D063520

Antoine Beaubien

Jun 20, 2022, 5:16:21 PM
to OpenRefine
My solution is not complicated.

And it doesn't rely on a variable process. Querying Wikidata for an item by its ORCID is not like reconciliation, which can match wrong or similar-but-different entities.
I can do the whole process in a few minutes.

My process is not the only one, but it's the most straightforward, and it doesn't require reconciliation. Building the WDQS query is easy, exporting the result from the WDQS as a CSV file is easy, and importing it into OpenRefine is easy. You just need the last step: how to write the cross() function call.
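
To make that last step concrete, here is a small Python sketch of the lookup that cross() performs: for each row of the main project, find the row in the WDQS result with the same ORCID and copy its Wikidata ID across. The column names follow the query and the sample table earlier in the thread. The function name, the QID and the label in the sample rows are made up for illustration.

```python
def attach_qids(main_rows, wdqs_rows, main_key="ORCID ID"):
    """Add a QID column to main_rows by matching ORCIDs against the WDQS result."""
    qid_by_orcid = {}
    for row in wdqs_rows:
        # WDQS returns the item as a URI; keep only the trailing Q-number.
        qid_by_orcid[row["vOrcIDs"]] = row["vOrcPers"].rsplit("/", 1)[-1]
    # Rows whose ORCID is not in Wikidata get None, like a blank cell.
    return [dict(row, QID=qid_by_orcid.get(row[main_key])) for row in main_rows]


# Invented sample rows: the QID and label below are placeholders, not real data.
wdqs_rows = [
    {
        "vOrcPers": "http://www.wikidata.org/entity/Q00000000",
        "vOrcPersLabel": "Example Person",
        "vOrcIDs": "0000-0002-0046-2219",
    }
]
main_rows = [
    {"ORCID ID": "0000-0002-0046-2219"},
    {"ORCID ID": "0000-0001-8151-7236"},
]

for row in attach_qids(main_rows, wdqs_rows):
    print(row["ORCID ID"], row["QID"])
```

Because the match is an exact comparison of identifiers, there is no scoring and no ambiguity, which is the difference from reconciliation described above.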

Regards, 
   Antoine

Thad Guidry

Jun 20, 2022, 5:38:34 PM
to openr...@googlegroups.com
Duplicate your Wikimedia username column using "Add column based on this column".
Then you can reconcile the entities in the first column against the type "Q41546637" (Wikimedian). But be forewarned: you won't get reconciliation results (or not very good ones), because Wikimedians are generally not loaded into Wikidata as such, owing to notability requirements (a few are, but they are the exception).
You can still include the duplicated Wikimedia username as a property under "Also use relevant details from other columns..." in the dialog. But again, Wikimedians are not directly in the Wikidata graph. Their data is stored as part of Wikimedia's (and Wikidata's) metadata, in SQL tables behind the scenes, though it is accessible via the Wikidata API, which you could query directly. After you have consulted the API documentation (API:Userinfo - MediaWiki), you could ask someone in the Wikidata community how to go about doing that.
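
As a starting point for querying the API directly, here is a hedged Python sketch that builds lookup URLs for a list of usernames. It uses the MediaWiki "list=users" query module rather than the Userinfo module mentioned above, because as far as I know Userinfo describes only the account making the request. The batch size of 50 is my assumption about the per-request limit, and the function name is made up. The sketch only builds URLs and sends nothing.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def users_query_urls(usernames, batch_size=50):
    """Build one API URL per batch of usernames (illustrative helper)."""
    urls = []
    for start in range(0, len(usernames), batch_size):
        batch = usernames[start:start + batch_size]
        params = {
            "action": "query",
            "list": "users",
            "ususers": "|".join(batch),  # several names separated by a pipe
            "usprop": "editcount|registration",
            "format": "json",
        }
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls


for url in users_query_urls(["Seewolf", "Bertilow", "WTM"]):
    print(url)
```

Note that this only tells you about the user accounts themselves. It does not return a Wikidata item for the person behind the account.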

Best of luck!




Antoine Beaubien

Jun 22, 2022, 10:59:11 PM
to OpenRefine
Hi Thad,

   I'm curious: how can you make the assumption that these people are Wikimedians? I don't believe D063520 said that in any of their posts.

Regards,
   Antoine

Thad Guidry

Jun 22, 2022, 11:21:59 PM
to openr...@googlegroups.com

D063520

Jun 23, 2022, 4:46:25 AM
to OpenRefine
Hi,

thank you again for all the responses! I think what Antoine proposed is the solution that works out properly.

Thank you
D063520