OpenRefine/Reconciliation help needed to extract literal TGN/AAT terms

40 views
Skip to first unread message

ALLISON PLANCK

unread,
Mar 11, 2023, 11:05:52 AM3/11/23
to Getty Vocabularies as Linked Open Data
Hi all, 

First of all, thank you for this incredible resource!

I am currently working with an open access museum collections data set for a digital humanities project, and I have decided to learn/experiment with the Getty's reconciliation program via open refine to populate missing artist biographical data. I have been combing through tutorials provided by the Getty, as well as through this message board, to learn more about the process and to find answers to my questions.

I have succeeded in reconciling the artists names and pulling their ULAN ID's - now I am trying to populate artist gender/sex and birth location. I found a really useful work sheet (attached below) that has led me through a process of adding columns by fetching URLS then running JSON parse transformation on the resulting column. The result populates a link to the identified vocabulary term, however I need to be able to see this term literally as I wish to further create data visualizations based on the results. I've included a screen shot below for reference to what I'm speaking of. So instead of seeing my data expressed as http://vocab.getty.edu/tgn/7011781-place , I would like to further transform/parse it to say "London". Is there an additional step that someone could provide or point me to in order to help me populate these literal terms in my columns? 

Thank you in advance for any assistance with this issue to a newcomer!

Allison

(screenshot exampling indicated issue)
ex.png
OpenRefine ULAN Reconcilation Steps.pdf

Tom Morris

unread,
Mar 11, 2023, 3:02:16 PM3/11/23
to ALLISON PLANCK, Getty Vocabularies as Linked Open Data
It's kind of heavyweight, so there may be a simpler way, but if you fetch https://vocab.getty.edu/tgn/7011781.json and parse the JSON returned, the `_label` key will contain the English label "London".

There's a richer set of terms returned as an array under the `identified_by` key including additional languages, historical terms, etc, but they would take more work to parse.

Tom


--
You received this message because you are subscribed to the Google Groups "Getty Vocabularies as Linked Open Data" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gettyvocablo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gettyvocablod/4787d273-d739-433c-a115-5b51b0bae1f5n%40googlegroups.com.

ALLISON PLANCK

unread,
Mar 13, 2023, 10:37:43 PM3/13/23
to Getty Vocabularies as Linked Open Data
Dear Tom, 

Thank you for extending your help! 

I will definitely continue to ask around to see if someone may be able to help facilitate this route, as the language to further parse that information is currently out of my depth. As I am trying to pull all of the TGN locations on my spreadsheet (16842 rows), would this direction of further fetching and parsing need to be applied on each individual cell or is there a way for this to be done by the batch? Thank you again for your help Tom!

So far, I have yet to find another avenue (besides the instruction sheet attached in my first message) that enables me to generate the birthplace without already having the TGN location ID. I have found instructions on how to fetch and parse the TGN ID to generate the literal term, but none on how to populate that TGN ID from the ULAN ID/artist that doesn't result in the http link indicated in the screenshot. 

Thank you again for this valuable resource and the community of folks willing to provide their assistance where they can!

Best,
Allison

Vladimir Alexiev

unread,
Apr 19, 2023, 9:38:16 AM4/19/23
to Getty Vocabularies as Linked Open Data
Hi Allison!

You can fetch the main GVP label like this:
select (str(?lab) as ?label) {
  [] foaf:focus <http://vocab.getty.edu/tgn/7011781-place>; gvp:prefLabelGVP/xl:literalForm ?lab
}


Run this query and copy the URL from the button "Download as CSV".
You need to replace the fixed semantic URL <http://vocab.getty.edu/tgn/7011781-place> with "value" (the current value):

+ value +
"%3B+gvp%3AprefLabelGVP%2Fxl%3AliteralForm+%3Flab%0D%0A%7D&_implicit=false&implicit=true&_equivalent=false&_form=%2Fsparql"

This will fill out the new column with strings like "label\nLondon". Now you need to make a new column that strips "label\n".

If doing this for 16k rows is too slow, then you can:
* collect all distinct places in a separate sheet
* add the place label to that smaller sheet 
* use the "cross" function https://openrefine.org/docs/manual/grelfunctions#crosscell-s-projectname-optional-s-columnname-optional to copy from the smaller to the bigger sheet.

>  see this term literally as I wish to further create data visualizations... http://vocab.getty.edu/tgn/7011781-place ,

If you mean map visualizations, then you saw there are coordinates in the *-place node, right?

Cheers!

Vladimir Alexiev

unread,
Apr 19, 2023, 9:40:29 AM4/19/23
to Getty Vocabularies as Linked Open Data
https://forum.openrefine.org/t/how-to-translate-gvp-sparql-to-grel/534/2 shows a better way where you put the human-readable query in the GREL expression and use escape() to do the URL escaping


Reply all
Reply to author
Forward
0 new messages