Query for all artists in ULAN database


Maximilian Füsslin

Aug 23, 2018, 4:58:50 AM
to Getty Vocabularies as Linked Open Data
Hi folks!

I am trying to extract some data from the ULAN database.
I need the name, bio, nationality, scopeNote, birth and death information for every artist in the database.

So I came up with the following query:

select ?x ?name ?bio ?nationality ?type ?ScopeNote ?birth ?died {
  ?x gvp:broaderExtended ulan:500000002. # Persons, Artists
  optional {?x gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?type]]}
  optional {?x foaf:focus [gvp:nationalityPreferred [gvp:prefLabelGVP [xl:literalForm ?nationality]]]}
  optional {?x gvp:prefLabelGVP [xl:literalForm ?name]}
  optional {?x foaf:focus [gvp:biographyPreferred [schema:description ?bio; gvp:estStart ?birth; gvp:estEnd ?died]]}
  optional {?x skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}
}

This query works fine, and the results page reports a result size of about 192k.
However, if I try to download the dataset, I only get around 45k rows in a CSV file.

Why does the number of rows in the CSV file differ from the result size shown on vocab.getty.edu?

If I try the following query, though, everything seems fine. I got 192k rows in the CSV file, which matches the size on the vocab.getty.edu results page:

select ?x ?name ?bio ?nationality ?type {
  ?x gvp:broaderExtended ulan:500000002. # Persons, Artists
  optional {?x gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?type]]}
  optional {?x foaf:focus [gvp:nationalityPreferred [gvp:prefLabelGVP [xl:literalForm ?nationality]]]}
  optional {?x gvp:prefLabelGVP [xl:literalForm ?name]}
  optional {?x foaf:focus [gvp:biographyPreferred [schema:description ?bio]]}
}

So the problem must lie in the way I retrieve the "scopeNote" and "birth/died" values...
Is there a way to make the first query work when downloading result as CSV or JSON?

Thank you in advance for your help!

Maximilian Füsslin

Vladimir Alexiev

Aug 23, 2018, 7:40:57 AM
to mfue...@gmail.com, Getty Vocabularies as Linked Open Data
You are hitting a server limit. I optimized the query slightly by traversing foaf:focus once (and that field is not optional):

select ?x ?name ?bio ?nationality ?type ?ScopeNote ?birth ?died {
  ?x gvp:broaderExtended ulan:500000002; foaf:focus ?agent. # Persons, Artists
  optional {?x gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?type]]}
  optional {?agent gvp:nationalityPreferred [gvp:prefLabelGVP [xl:literalForm ?nationality]]}
  optional {?x gvp:prefLabelGVP [xl:literalForm ?name]}
  optional {?agent gvp:biographyPreferred [schema:description ?bio; gvp:estStart ?birth; gvp:estEnd ?died]}
  optional {?x skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}
}

Now I get 69k rows (18 MB TSV). You should be able to get the rest by playing with OFFSET.
But please note that the SPARQL repo is not intended for downloading huge result sets...
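Paging with OFFSET can be scripted. The following is a minimal Python sketch, not an official Getty client: `paged_queries` appends `order by ?x limit N offset K` to a query, and `build_request` builds (but does not send) a standard SPARQL 1.1 protocol GET request asking for TSV. The endpoint URL, the page size, both helper names, and the assumption that the endpoint honors the standard `text/tab-separated-values` media type are illustrative assumptions. The `order by` is there because SPARQL leaves the order of results unspecified without it, so unordered pages could overlap or miss rows.

```python
from typing import Iterator
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public Getty SPARQL endpoint, with prefixes such as gvp:
# and ulan: predefined server-side as they are in the web UI.
ENDPOINT = "http://vocab.getty.edu/sparql"


def paged_queries(query: str, page_size: int, total: int) -> Iterator[str]:
    """Yield copies of `query` that each fetch at most `page_size` rows.

    ORDER BY fixes the order of solutions (up to ties on ?x), which
    OFFSET paging relies on to avoid overlapping or skipped pages.
    """
    for offset in range(0, total, page_size):
        yield f"{query}\norder by ?x\nlimit {page_size} offset {offset}"


def build_request(query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request asking for TSV."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "text/tab-separated-values"})
```

To fetch the data, one would pass each request to `urllib.request.urlopen` and concatenate the response bodies, dropping the TSV header line from every page after the first. The page size (a hypothetical value such as 20,000) should stay well below the limit you ran into, and the `total` can be taken from the result size shown on the results page.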

Vladimir Alexiev

Aug 23, 2018, 7:43:26 AM
to mfue...@gmail.com, Getty Vocabularies as Linked Open Data
Also: gvp:estStart and gvp:estEnd are decidedly not birth/death dates; they often span a range of 100 years or more (see the documentation).
You're better off just printing schema:description ?bio.
Cheers!
--
Vladimir Alexiev, PhD, PMP
Lead, Data and Ontology Management
Ontotext Corp, www.ontotext.com
Email: vladimir...@ontotext.com, skype:valexiev1
Mobile: +359 888 568 132, SMS: 359888...@sms.mtel.net
Calendar: https://www.google.com/calendar/embed?src=vladimir...@ontotext.com
Publications: http://vladimiralexiev.github.io/pubs/

Maximilian Füsslin

Aug 27, 2018, 3:44:10 AM
to Getty Vocabularies as Linked Open Data
Thank you very much for your answers!
I will try your query optimization.

Regarding birth and death dates: Is there a way to retrieve the "correct" birth and death dates? Or are they not even stored within the vocab database?
I saw that some death dates are 2050 or 2080... But the dates of people who have already died seem to be correct (I tested around 10 different artists). Do you think I can just use all dates earlier than 2018 and assume they are correct? I need the dates for filtering...

I looked into the documentation again and found a sample query that uses the birth date:

5.7 Architects Born in the 14th or 15th Century

Select all architects (type aat:300024987 or its descendants) with birth date between 1300 and 1499. We'll take a shortcut: search only in the preferred biography: observation shows that if there is a birth date at all, it will be found in the preferred biography:

select * {
  ?x a gvp:PersonConcept;
     gvp:prefLabelGVP/xl:literalForm ?name;
     gvp:agentTypePreferred|(gvp:agentTypePreferred/gvp:broaderGenericExtended) aat:300024987;
     foaf:focus/gvp:biographyPreferred [
       schema:description ?bio;
       gvp:estStart ?birth]
     filter ("1300"^^xsd:gYear < ?birth && ?birth <= "1499"^^xsd:gYear)
}

We have to provide proper types xsd:gYear to the query literals in order for the comparisons to work.
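Why the typed literals matter can be illustrated with a Python analogy (this is not how the SPARQL engine is implemented). In standard SPARQL 1.1, comparing a plain string such as "1300" with an xsd:gYear value is a type error, and a FILTER whose expression raises an error drops the row, so untyped bounds would silently filter out every artist. Python similarly refuses to order a str against an int. The helper `filter_keeps` is hypothetical; it mirrors the documentation's bounds (exclusive lower, inclusive upper), with ints standing in for xsd:gYear values.

```python
def filter_keeps(low, birth, high) -> bool:
    """Imitate a SPARQL FILTER on `low < birth <= high`.

    Comparing incompatible types raises TypeError in Python; SPARQL
    likewise treats it as an error, and FILTER drops rows on error.
    """
    try:
        return low < birth <= high
    except TypeError:
        return False


# Typed bounds (ints standing in for xsd:gYear) compare by value.
print(filter_keeps(1300, 1450, 1499))
# Untyped string bounds against a typed year: error, so the row is dropped.
print(filter_keeps("1300", 1450, "1499"))
# Comparing years as strings is lexicographic: "999" sorts after "1300".
print("999" > "1300")
```

The three calls print `True`, `False` and `True`: only the consistently typed comparison behaves as a year range.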


Thank you again for your help!

Vladimir Alexiev

Aug 27, 2018, 5:27:43 AM
to Maximilian Füsslin, Getty Vocabularies as Linked Open Data
>Is there a way to retrieve the "correct" birth and death dates? Or are they not even stored within the vocab database?

Correct dates (when available/filled) are stored in the same fields, but there is no flag to mark correct vs estimated dates.
 
>I saw that some death dates were like 2080 or 2050... But the one of people who died already seems to be correct (tested around 10 different artists). Do you think I can just use all dates lower than 2018 and assume they are correct?

That's a good idea, but you need more heuristics, e.g.:
  • If the difference between start and end is 100 years or more, the dates are inexact.
  • If the dates are round (divisible by 10), they are slightly suspect.
  • You could also check against dates in other LOD sources. All of ULAN is coreferenced to VIAF, and you can fetch ULAN-Wikidata coreferences from https://tools.wmflabs.org/wikidata-todo/beacon.php
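The first two heuristics, together with the "earlier than 2018" rule from the question, can be combined into a simple classifier. This is a minimal Python sketch under stated assumptions: `classify_dates` and its labels are hypothetical, birth and death are taken as integer years read from gvp:estStart and gvp:estEnd, the sample rows are made up, and the third heuristic (checking VIAF or Wikidata) is not covered.

```python
from typing import Optional


def classify_dates(birth: Optional[int], died: Optional[int], this_year: int = 2018) -> str:
    """Classify an estStart/estEnd pair with the heuristics above.

    Returns "missing", "inexact", "suspect" (round numbers) or "plausible".
    """
    if birth is None or died is None:
        return "missing"
    # An end date that is not in the past cannot be a real death date,
    # and a span of a century or more points to an estimate.
    if died >= this_year or died - birth >= 100:
        return "inexact"
    # Round dates (divisible by 10) are only slightly suspect.
    if birth % 10 == 0 or died % 10 == 0:
        return "suspect"
    return "plausible"


# Hypothetical rows: (name, estStart, estEnd).
rows = [("Artist A", 1606, 1669), ("Artist B", 1950, 2050), ("Artist C", 1400, 1500)]
for name, birth, died in rows:
    print(name, classify_dates(birth, died))
```

Whether to keep "suspect" rows is a judgment call: since round dates are described as only slightly suspect, a filter might keep them and drop only the "inexact" and "missing" ones.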

Maximilian Füsslin

Aug 29, 2018, 3:22:01 AM
to Getty Vocabularies as Linked Open Data
Thank you again for your answers.
I will try your suggestions!