ULAN Subject and OpenRefine

Xiaoli Ma

May 10, 2018, 9:38:28 AM
to Getty Vocabularies as Linked Open Data
Hi there,

We are testing whether OpenRefine can help us match local names to ULAN. We used the following expression in OpenRefine (Edit column --> Add column by fetching URLs) but got HTTP error 505: HTTP Version Not Supported.

'http://vocab.getty.edu/sparql.json?query=select ?Subject ?Term{?Subject a skos:Concept; rdfs:label "'+ value + '";gvp:prefLabelGVP [xl:literalForm ?Term].}'

Any clues?

We tested the query in the browser with an actual name, "Leonardo da Vinci", and it triggers the download of a JSON file.

http://vocab.getty.edu/sparql.json?query=select ?Subject ?Term{?Subject a skos:Concept; rdfs:label "Leonardo da Vinci";gvp:prefLabelGVP [xl:literalForm ?Term].}

Thanks in advance for your time and help.

BTW, if you have success stories of using OpenRefine to match names in ULAN or terms in AAT, please point us in the right direction.

Thanks!

Xiaoli

Artstor/Ithaka

Vladimir Alexiev

May 10, 2018, 10:00:04 AM
to Getty Vocabularies as Linked Open Data
The spaces cause this problem. You need to URL-encode spaces as "+".

I first removed the optional spaces from your query, but still got 505 HTTP Version Not Supported:
curl -vg 'http://vocab.getty.edu/sparql.json?query=select*{?Subject+a+skos:Concept;rdfs:label"Leonardo da Vinci";gvp:prefLabelGVP[xl:literalForm?Term].}'

I then URL-encoded spaces in the name being searched, and it returned the expected data:
curl -g 'http://vocab.getty.edu/sparql.json?query=select*{?Subject+a+skos:Concept;rdfs:label"Leonardo+da+Vinci";gvp:prefLabelGVP[xl:literalForm?Term].}'
(note: option -g takes care of a curl error " [globbing] nested brace in column 120").

A similar query is shown at http://vocab.getty.edu/doc/queries/#OpenRefine_Reconciliation_Service and it says to use escape(value, 'url'). I made a note to add info about encoding the other spaces in the query.
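
For anyone building the request outside OpenRefine, the same idea (no raw spaces anywhere in the URL) can be sketched in Python. This is only an illustration: the function name is made up, and it percent-encodes the whole query rather than only the spaces, which is stricter than the curl examples above and is assumed, not verified, to be accepted by the endpoint.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://vocab.getty.edu/sparql.json"


def build_label_query_url(name: str) -> str:
    """Return a request URL for an exact-label lookup, with no raw spaces."""
    # Escape backslashes and double quotes so the name stays a valid
    # SPARQL string literal.
    literal = name.replace("\\", "\\\\").replace('"', '\\"')
    sparql = (
        "select ?Subject ?Term {"
        '?Subject a skos:Concept; rdfs:label "' + literal + '"; '
        "gvp:prefLabelGVP [xl:literalForm ?Term].}"
    )
    # urlencode() uses quote_plus(), so spaces become "+" and characters
    # such as braces, quotes and "?" are percent-encoded.
    return ENDPOINT + "?" + urlencode({"query": sparql})


url = build_label_query_url("Leonardo da Vinci")
print(url)
```

The printed URL can be pasted into curl or a browser; decoding its query string gives back the original SPARQL text unchanged.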

Xiaoli Ma

May 10, 2018, 10:57:40 AM
to Getty Vocabularies as Linked Open Data

Xiaoli Ma

May 10, 2018, 10:58:43 AM
to Getty Vocabularies as Linked Open Data

Marcia Zeng

Jul 29, 2018, 3:33:51 PM
to Getty Vocabularies as Linked Open Data
Xiaoli, I tried to batch-process a huge dataset from the open MET data, specifically to align it with ULAN, using OpenRefine. Gregg Garcia from the Getty gave me this specific expression, since I need ULAN:

'http://vocab.getty.edu/sparql.json?query=select+distinct*{?x+skos:inScheme+ulan:;(xl:prefLabel|xl:altLabel)/gvp:term+"' + escape(value, 'url') + '"}' 

It was very helpful. The second step, parsing the JSON to get the ULAN URI, is covered in his instructions here:
https://www.google.com/search?q=Reconciliation+example+using+OpenRefine+with+the+Getty&ie=utf-8&oe=utf-8&client=firefox-b-1

Marcia

Vladimir Alexiev

Jul 29, 2018, 5:20:58 PM
to Marcia Zeng, Getty Vocabularies as Linked Open Data
Comment to Xiaoli (sorry I don't have his/her email):
I think the important point is that you need a valid URL. Omit spaces or escape them as "+".
Using rdfs:label is fine for ULAN because ULAN names have no parenthetical qualifiers; and both skos:pref&altLabel contribute to rdfs:label.
But instead of "a skos:Concept", check for "skos:inScheme ulan:".


--
You received this message because you are subscribed to a topic in the Google Groups "Getty Vocabularies as Linked Open Data" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gettyvocablod/rBOBeyYqD9E/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gettyvocablod+unsubscribe@googlegroups.com.
To post to this group, send email to gettyv...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gettyvocablod/770398ef-89ce-4d93-aef2-211f1caf2005%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Vladimir Alexiev, PhD, PMP
Lead, Data and Ontology Management
Ontotext Corp, www.ontotext.com
Email: vladimir...@ontotext.com, skype:valexiev1
Mobile: +359 888 568 132, SMS: 359888...@sms.mtel.net
Calendar: https://www.google.com/calendar/embed?src=vladimir...@ontotext.com
Publications: http://vladimiralexiev.github.io/pubs/

Drew Roberson

Mar 3, 2019, 3:36:34 PM
to Getty Vocabularies as Linked Open Data
Hi,

I'm pretty new to all of this and I'm trying to follow the steps on this thread and on http://vocab.getty.edu/queries#OpenRefine_Reconciliation_Service to match names with ULAN authorities and I'm having some trouble. 

When I select "Add column by fetching URLs..." and enter the expression:

'http://vocab.getty.edu/sparql.json?query=select+distinct*{?x+skos:inScheme+ulan:;(xl:prefLabel|xl:altLabel)/gvp:term+"' + escape(value, 'url') + '"}' 

the preview looks like it will retrieve the correct string. For example, for the name "Adam, Victor" the preview looks like:


but when I hit "OK" the result for all of the names I'm working with is:

{
  "head" : {
    "vars" : [ "x" ]
  },
  "results" : {
    "bindings" : [ ]
  }
}

Can someone tell me what I'm doing wrong/how to correct this? I'm sure it's a very simple/silly mistake, but again, I'm new to all of this.

Thanks,
Drew

Vladimir Alexiev

Mar 4, 2019, 9:46:24 AM
to Drew Roberson, Getty Vocabularies as Linked Open Data
You have an extra trailing space in the name, which translates to an extra + before the quote. This URL works fine:


Label matching is by exact string.

If you want to make the request more robust, see the FTS queries, and the variant that uses FTS first and regex second to increase precision.
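
Because matching is by exact string, stray whitespace is worth removing before the name goes into the URL. In OpenRefine, value.trim() handles the trailing space. A minimal Python sketch of the same clean-up follows; the helper name is hypothetical, and collapsing internal runs of whitespace goes a little beyond the trailing-space case discussed here.

```python
import re


def clean_name(raw: str) -> str:
    """Trim both ends and collapse internal whitespace runs to one space."""
    return re.sub(r"\s+", " ", raw).strip()


# The trailing space that produced the extra "+" is removed.
print(repr(clean_name("Adam, Victor ")))
```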

HTH ! Vladimir 

Drew Roberson

Mar 6, 2019, 11:39:32 AM
to Getty Vocabularies as Linked Open Data
Thank you! That worked!

Unfortunately, now I'm having trouble with the next step (b). In addition to the link I posted before, I'm also following these steps (which I think are identical):
  1. Add column by fetching URLs 
    'http://vocab.getty.edu/sparql.json?query=select+distinct*{?x+skos:inScheme+aat:;(xl:prefLabel|xl:altLabel)/gvp:term"' + escape(value, 'url') + '"@en}' 

  2. Add column based on this column --> Parse the JSON to obtain the URL 
    value.parseJson().results.bindings[0].x.value

  3. Add column based on this column --> Parse the identifier out of the URL 
    value[27,37]

  4. Add column by fetching URLs --> Use another query to fetch prefLabelGVP 
    'http://vocab.getty.edu/sparql.json?query=select*where{?x+gvp:prefLabelGVP[skosxl:literalForm+?label];dc:identifier"' + escape(value, 'url') + '"}'

  5. Add column based on this column --> Parse the JSON to obtain the label 
    value.parseJson().results.bindings[0].label.value

For example, when I did step a (step 1 in the list above) on the name "Webb, Todd" I got the following:


But when I add a new column based on the column created in the first step and enter:

value.parseJson().results.bindings[0].x.value

the preview says, "null"
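
One way a "null" can arise is that bindings[0] does not exist because the response contains an empty bindings list, like the empty result shown earlier in this thread. A small Python sketch that makes that case explicit is below. The function name and the sample URI are invented for illustration, and whether this is the actual cause here is not confirmed.

```python
import json
from typing import Optional


def first_binding(response_text: str, var: str) -> Optional[str]:
    """Return the value of `var` in the first result row, or None if absent."""
    data = json.loads(response_text)
    bindings = data.get("results", {}).get("bindings", [])
    if not bindings:
        # Empty result set: the query ran, but nothing matched.
        return None
    cell = bindings[0].get(var)
    return cell["value"] if cell else None


# Same shape as the empty response quoted earlier in the thread.
EMPTY = '{"head": {"vars": ["x"]}, "results": {"bindings": []}}'

# Made-up single hit; the identifier is a placeholder, not a real lookup.
ONE_HIT = (
    '{"head": {"vars": ["x"]}, "results": {"bindings": ['
    '{"x": {"type": "uri", "value": "http://vocab.getty.edu/ulan/500000000"}}'
    "]}}"
)

print(first_binding(EMPTY, "x"))
print(first_binding(ONE_HIT, "x"))
```

If the fetched column shows an empty bindings list, the problem is in the query from the first step rather than in the parsing expression.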

Thanks again,
Drew

Ettore RIZZA

Mar 6, 2019, 1:14:16 PM
to Drew Roberson, Getty Vocabularies as Linked Open Data
Hello,

Note that your query works with "Todd Webb", but not with "Webb, Todd", even though the latter is the preferred term. In general, it seems that even the API has problems with commas. Weird.

Anyway, if you do not need the features of SPARQL, maybe it would be better to use the ULAN web API. This OpenRefine formula should allow you to handle commas:

'http://vocabsservices.getty.edu/ULANService.asmx/ULANGetTermMatch?name="' + value.replace(",","").escape('url') + '"&roleid=&nationid='

Use it in a "fetch url". As you can see, the result is the same regardless of the order of the names.
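
For reference, here is roughly the same URL construction in Python. It is a sketch, not a tested client: the function name is hypothetical, and unlike the GREL formula it percent-encodes the surrounding double quotes as well, on the assumption that the service decodes them normally.

```python
from urllib.parse import quote_plus

SERVICE = "http://vocabsservices.getty.edu/ULANService.asmx/ULANGetTermMatch"


def build_term_match_url(name: str) -> str:
    """Build a ULANGetTermMatch URL for a name, dropping commas first."""
    # Remove commas, as value.replace(",", "") does in the GREL formula.
    cleaned = name.replace(",", "")
    # Wrap the name in double quotes, then URL-encode the quoted string.
    return SERVICE + "?name=" + quote_plus('"' + cleaned + '"') + "&roleid=&nationid="


print(build_term_match_url("Webb, Todd"))
```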

téléchargement.png

This time the result will be in XML (and no longer in JSON). You can parse it with a formula like this:

forEach(value.parseXml().select('Subject'), e, e.htmlText()).join(',')

The result will look like this:

téléchargement (1).png

If you just want the ULAN's link, this GREL formula should do the trick:

forEach(value.parseXml().select('Subject_ID'), e, "http://vocab.getty.edu/page/ulan/" + e.htmlText()).join(',')
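
The same extraction can be sketched in Python with the standard library. Only the element names Subject and Subject_ID come from the formulas above; the rest of the sample document (root element, other children, the identifier itself) is an assumption made for illustration, and a namespaced response would need the namespace added to the tag name.

```python
import xml.etree.ElementTree as ET

# Assumed response layout; only <Subject> and <Subject_ID> are taken
# from the GREL formulas, everything else here is made up.
SAMPLE = """<Vocabulary>
  <Count>1</Count>
  <Subject>
    <Preferred_Term>Webb, Todd</Preferred_Term>
    <Subject_ID>500000000</Subject_ID>
  </Subject>
</Vocabulary>"""


def ulan_page_links(xml_text: str) -> list:
    """Return one ULAN page link per <Subject_ID> element in the response."""
    root = ET.fromstring(xml_text)
    return [
        "http://vocab.getty.edu/page/ulan/" + (el.text or "").strip()
        for el in root.iter("Subject_ID")
    ]


print(",".join(ulan_page_links(SAMPLE)))
```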

I hope this helps,

Ettore Rizza


Ettore RIZZA

Mar 6, 2019, 1:29:32 PM
to Drew Roberson, Getty Vocabularies as Linked Open Data
(forgot to mention 1) : parseXml() is a brand new function in OpenRefine. If you are using an older version, replace it with parseHtml() (which does exactly the same thing, except in a few very specific cases).

(forgot to mention 2) : I created some time ago a little reconciliation service that facilitates the matching between an OpenRefine Column and AAT. If there is a need, I could add ULAN or TGN one of these days.

Ettore Rizza

Drew Roberson

Mar 6, 2019, 4:55:48 PM
to Getty Vocabularies as Linked Open Data
This is so helpful!!! Thank you! Now I just have to retrieve the URIs. Any idea on how to do that? It looks like it retrieved strings of numbers/possible URIs, but I have 385 names, so I'm not sure how to systematically parse out those URIs.

Screen Shot 2019-03-06 at 3.55.04 PM.png

Also if there are any other easier ways to reconcile my data set against ULAN with the goal of retrieving URIs, I welcome any and all input.

Drew

Getty Vocabularies LOD

Mar 6, 2019, 5:05:59 PM
to Getty Vocabularies as Linked Open Data
Just as an FYI, the Getty will be implementing a reconciliation service for OpenRefine this month (March, 2019). I will post on this forum when it is available.

Gregg Garcia
Getty Digital

Ettore RIZZA

Mar 6, 2019, 6:47:15 PM
to Getty Vocabularies LOD, Getty Vocabularies as Linked Open Data
@Gregg Garcia : Great! Looking forward to testing it. :)

@Drew Roberson : The best procedure would probably be:

1 Fetch URLs using forEach(value.parseXml().select('Subject'), e, e.htmlText()).join('|')

2 On your new column, click on "Edit cells / Split Multi-Valued Cells" using the symbol | as a separator.

3 Based on the column created in 1, create a new column named "URI" using this GREL formula :

forEach(value.parseXml().select('Subject_ID'), e, "http://vocab.getty.edu/page/ulan/" + e.htmlText()).join('|')

4 Split this new column as in 2.

5 Visually check the names of people who got more than one result in order to select the correct one (for example by adding a flag).
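
The checking in step 5 can be narrowed down mechanically by flagging only the cells that hold more than one candidate. A rough Python sketch of that idea is below; the helper names and the sample rows are invented, and the "|" separator simply mirrors the join('|') used in the formulas above.

```python
def split_candidates(cell: str, sep: str = "|") -> list:
    """Split a joined cell into its non-empty candidate values."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def needs_review(cell: str, sep: str = "|") -> bool:
    """True when a cell holds more than one candidate and needs a human check."""
    return len(split_candidates(cell, sep)) > 1


# Made-up cells: one unambiguous result, one with two candidates, one empty.
rows = {
    "name A": "http://vocab.getty.edu/page/ulan/500000001",
    "name B": "http://vocab.getty.edu/page/ulan/500000002|http://vocab.getty.edu/page/ulan/500000003",
    "name C": "",
}

for name, cell in rows.items():
    print(name, "REVIEW" if needs_review(cell) else "ok")
```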



Ettore Rizza


Vladimir Alexiev

Mar 7, 2019, 6:55:36 AM
to Getty Vocabularies as Linked Open Data
Please use http://vocab.getty.edu/ulan/nnnnn-agent as canonical semantic url to store in your data, and not
http://vocab.getty.edu/page/ulan/nnnnn
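
If page links have already been collected, converting them is a string rewrite. A minimal Python sketch follows; the function name is hypothetical and the identifier in the example is only a placeholder.

```python
import re

# Matches the page form http://vocab.getty.edu/page/ulan/nnnnn
PAGE_RE = re.compile(r"^http://vocab\.getty\.edu/page/ulan/(\d+)$")


def to_agent_uri(page_url: str) -> str:
    """Rewrite a ULAN page link into the nnnnn-agent form."""
    match = PAGE_RE.match(page_url.strip())
    if match is None:
        raise ValueError("not a ULAN page link: " + page_url)
    return "http://vocab.getty.edu/ulan/" + match.group(1) + "-agent"


print(to_agent_uri("http://vocab.getty.edu/page/ulan/500000000"))
```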

Vladimir Alexiev

Mar 7, 2019, 6:59:36 AM
to Drew Roberson, Getty Vocabularies as Linked Open Data
Drew, you're using the wrong scheme: aat: instead of ulan:
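
A side effect of switching schemes: the fixed substring value[27,37] from the steps quoted earlier fits the 27-character prefix "http://vocab.getty.edu/aat/", while the ULAN prefix is one character longer. A Python sketch that takes the last path segment instead avoids counting characters. The function name is hypothetical and the two URIs are just example identifiers.

```python
def local_id(uri: str) -> str:
    """Return the last path segment of a vocabulary URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


aat_uri = "http://vocab.getty.edu/aat/300015045"
ulan_uri = "http://vocab.getty.edu/ulan/500010879"

# Fixed offsets work for the AAT prefix but pick up the slash for ULAN.
print(aat_uri[27:37], ulan_uri[27:37])

# Taking the last segment works for both.
print(local_id(aat_uri), local_id(ulan_uri))
```

In GREL, value.split('/') followed by taking the last element should give the same result.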

Drew Roberson

Mar 7, 2019, 1:22:13 PM
to Getty Vocabularies as Linked Open Data
Vladimir: Just to be clear, am I currently using AAT or ULAN? Because for what I'm doing, I want to reconcile against ULAN.

Anyone: I got URIs for almost everything, but the names that didn't work have diacritical marks, hyphenated names, or initials ("Smith, John A."). Is there a systematic workaround, or a way to identify and replace these things?
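
For the "identify" part, the three patterns can at least be flagged automatically so the problem rows are easy to facet on. A Python sketch using only the standard library is below; the function names are made up, and whether a name stripped of diacritics actually matches better is something to test, not a given.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_flags(name: str) -> dict:
    """Flag diacritics, hyphens and single-letter initials in a name."""
    decomposed = unicodedata.normalize("NFD", name)
    return {
        "diacritics": any(unicodedata.combining(ch) for ch in decomposed),
        "hyphen": "-" in name,
        # A single capital letter followed by a period, e.g. "A." or "G."
        "initial": bool(re.search(r"\b[A-Z]\.", name)),
    }


for sample in ["Smith, John A.", "Dawson-Watson", "Dürer, Albrecht"]:
    print(sample, name_flags(sample), strip_diacritics(sample))
```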

Ettore RIZZA

Mar 7, 2019, 1:39:09 PM
to Drew Roberson, Getty Vocabularies as Linked Open Data
Can you provide some real examples of names that don't work ? 

Ettore Rizza


Drew Roberson

Mar 11, 2019, 3:45:20 PM
to Getty Vocabularies as Linked Open Data
Some names that didn't match:
Ernö Friedmann, Endre
Dawson-Watson
Claudet, Antoine François Jean
Lékégian, G.

Then again, the following names DID match:
Desrochers, Étienne Jehandier
Dürer, Albrecht
Ferrier, père, fils et Soulier

They're supposed to all come from ULAN, so I'm not sure what's up.

Drew

ZENG, MARCIA

Mar 11, 2019, 4:21:32 PM
to Drew Roberson, Getty Vocabularies as Linked Open Data

I had similar issues, especially with French names. You might find lots of discussions about bugs related to OpenRefine and CSV conversion. 


I am not a techy person, so Gregg and Vladimir have been the ones to help out. Yet I am not sure such issues can be solved easily.


Marcia


Ettore RIZZA

Mar 11, 2019, 5:16:50 PM
to ZENG, MARCIA, Drew Roberson, Getty Vocabularies as Linked Open Data
ULAN's search engine does not give the same result depending on whether one looks for "Ernö Friedmann, Endre" or "Friedmann, Endre Ernö". IMHO, their indexing system is a bit too rigid. But I'm not a specialist in Getty's tools.

I know in any case what I would do if I had to match artists' names with Ulan (at least while waiting for the official reconciliation service). 

Knowing that (1) OpenRefine has a very good reconciliation service with Wikidata and (2) Wikidata has a lot of links to ULAN, I would check first with Wikidata to get the ULAN IDs, then use the Getty API for the names Wikidata does not know, like "Ferrier, père, fils et Soulier".
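
For a scripted variant of the Wikidata step, a lookup URL for the Wikidata query service can be built as below. Note that this is a different technique from the OpenRefine reconciliation service: it is a plain SPARQL query on an exact English label, using P245, the Wikidata property for the ULAN ID. The function name is made up and the sketch only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

WDQS = "https://query.wikidata.org/sparql"


def wikidata_ulan_query_url(label: str, lang: str = "en") -> str:
    """Build a query-service URL asking for items with this label and a ULAN ID."""
    # Escape backslashes and quotes so the label is a valid SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    sparql = (
        "SELECT ?item ?ulan WHERE { "
        f'?item rdfs:label "{escaped}"@{lang} ; wdt:P245 ?ulan . '
        "}"
    )
    return WDQS + "?" + urlencode({"query": sparql, "format": "json"})


print(wikidata_ulan_query_url("Albrecht Dürer"))
```

An exact-label query is much stricter than reconciliation, so it is best kept for names that are already clean.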

This 2-minute screencast shows you the steps 1 and 2 (the Wikidata API is particularly slow tonight, sorry for that).

screencast-vocab.getty.edu-2019.03.11-22-01-05.gif

Note: There is a Google Group dedicated to OpenRefine if you have more specific questions about the reconciliation with Wikidata.

Hope this helps.

Ettore Rizza


Drew Roberson

Mar 11, 2019, 5:28:11 PM
to Ettore RIZZA, ZENG, MARCIA, Getty Vocabularies as Linked Open Data
Thank you! I was just going to ask if there is a Wikidata group because I also have some questions about that as well. I'm reconciling names from LCnaf, VIAF, ULAN, and George Eastman House/Wikidata (I also have thousands of local names that I'll end up double checking as well).

Stay tuned...

Drew

ZENG, MARCIA

Mar 11, 2019, 5:34:16 PM
to Drew Roberson, Ettore RIZZA, Getty Vocabularies as Linked Open Data

Thank you both.

You might know this, but just in case, here is another source:

In Mix'n'match you can see many authority control files, some of which are matched against Wikidata items. More valuable still, it includes the original name authorities, which may be in other languages, domain-specific, or regional.

https://tools.wmflabs.org/mix-n-match/#/


Marcia


Getty Vocabularies LOD

Mar 11, 2019, 5:43:59 PM
to Getty Vocabularies as Linked Open Data
The Vocab web site and XML web service index on the normalized forms of terms. The LOD site can search on exact match, as you did with the predicate+object combination gvp:term+"Adam%2C+Victor", and also has a Lucene index, which would use the predicate+object combination luc:term+"Adam%2C+Victor".

So the most-to-least strict versions of the Vocabs search would be exact->normalized->Lucene.
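
That ordering suggests a simple fallback: try the strictest search first and loosen it only when nothing comes back. A Python sketch of the control flow is below. The three lookup functions are stand-ins with hard-coded behaviour, not real calls to the Getty services, and all names and identifiers are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Lookup = Callable[[str], List[str]]


def cascade(name: str, strategies: Iterable[Tuple[str, Lookup]]) -> Tuple[Optional[str], List[str]]:
    """Run lookups from strictest to loosest; return the first non-empty result."""
    for label, lookup in strategies:
        hits = lookup(name)
        if hits:
            return label, hits
    return None, []


# Stand-in lookups; a real version would call the matching service.
def exact(name: str) -> List[str]:
    return ["ulan:1"] if name == "Adam, Victor" else []


def normalized(name: str) -> List[str]:
    return ["ulan:1"] if name.lower().replace(",", "") == "adam victor" else []


def lucene(name: str) -> List[str]:
    return ["ulan:1", "ulan:2"] if "adam" in name.lower() else []


STRATEGIES = [("exact", exact), ("normalized", normalized), ("lucene", lucene)]

for query in ["Adam, Victor", "adam victor", "Adam", "zzz"]:
    print(query, cascade(query, STRATEGIES))
```

Recording which level produced the hit is useful later, since looser matches deserve more manual checking.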

The upcoming service for OpenRefine will be using Elasticsearch which features a tokenized, Lucene-like indexing scheme with a boost in score for any exact matches that are found.

Gregg Garcia
Getty Digital

Getty Vocabularies LOD

Mar 21, 2019, 2:20:03 PM
to Getty Vocabularies as Linked Open Data
If you would like to be a beta tester for the Vocabularies OpenRefine reconciliation service please email me at: gga...@getty.edu

We are just starting our rollout and want only a small group of people testing before we open the service to the public.

Thanks.

Gregg Garcia
Software Architect
Getty Digital / J. Paul Getty Trust

Jessica Breiman

Apr 17, 2019, 12:34:37 PM
to Getty Vocabularies as Linked Open Data
Are there any updates on this project? Thanks so much -- it's going to be fantastic!

Jessica

Getty Vocabularies LOD

Apr 22, 2019, 5:30:29 PM
to Getty Vocabularies as Linked Open Data
Beta testing is underway. If you would like to be a beta tester, please email me at gga...@getty.edu with your name, organization and current project you will be testing.

Thanks.

Gregg