1 AAT and 2 TGN Questions

Nicholas Cipolla

Dec 8, 2020, 7:49:23 PM
to Getty Vocabularies as Linked Open Data
Hello,

I'm trying to do a few things.

1. I'd like to pull all AAT Nationality IDs, Pref Label or English Label, and all other non-preferred terms as well. The idea is to map against the Nationality label, and we need the non-pref terms as well to help in mapping. Gregg was kind enough to start me off with the ID query: 

select distinct ?nationality {?x skos:inScheme ulan:; foaf:focus/schema:nationality ?nationality}

Ideally, it would be great to order them by display order similar to (at least I think) this query: 

select * {
  {?x gvp:broaderPreferred ulan:500125789; gvp:displayOrder ?o1} union
  {?x gvp:broaderPreferred [gvp:broaderPreferred ulan:500125789; gvp:displayOrder ?o1]; gvp:displayOrder ?o2} union
  {?x gvp:broaderPreferred [gvp:broaderPreferred [gvp:broaderPreferred ulan:500125789; gvp:displayOrder ?o1]; gvp:displayOrder ?o2]; gvp:displayOrder ?o3}.
  ?x gvp:prefLabelGVP/xl:literalForm ?name;
     foaf:focus/gvp:biographyPreferred/schema:description ?bio
} order by ?o1 ?o2 ?o3

The resulting CSV would be column 1: AAT ID; column 2: Label all in first normal form. 

2. We're doing a similar pull of TGN data: all sovereign states. I used this query to get all sovereign state TGN IDs and the English or Preferred Label:

select ?c (coalesce(?labEn,?labGVP) as ?lab) {
  ?c gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "sovereign states"@en]
  optional {?c xl:prefLabel [xl:literalForm ?labEn; dct:language gvp_lang:en]}
  optional {?c gvp:prefLabelGVP [xl:literalForm ?labGVP]}}


 I can easily de-dupe the list later. What I'd love is again to have all the labels for each country in first normal form.

3. Lastly, we're trying to grab the hierarchy of TGN terms for a given ID. For example, New York City would have the parents: 
 
World (facet)
North and Central America (continent) (P)
United States (nation) (P)
New York (state) (P)
New York (inhabited place) (P)

The reasoning behind this is that we pulled the birth and death locations of ULAN persons/artists, but that data exists in TGN URI form (we couldn't figure out how to get the label). We're trying to parse out the location information into up to four parts: 1. City, 2. State, 3. Province, 4. Nation. We're creating a lookup dictionary for mapping purposes, starting with the country/nation level. Therefore, we need that hierarchy of labels to parse out a given TGN ID. Any suggestions or insights here would be most helpful. I did see Alessio's recent post generously sharing a Python script to extract hierarchies from AAT terms.

many thanks in advance,
Nicholas Cipolla

Vladimir Alexiev

Feb 5, 2021, 4:44:46 AM
to Getty Vocabularies as Linked Open Data
Please post different questions in different emails.

1. If you want nationalities, why are you playing with one particular ULAN entry, ulan:500125789?
- GVP uses "nationality" as a catch-all term for any culturally significant group: nation, religion, period/style, even sex. There is not one single AAT hierarchy for all "nationalities".
- Sort order won't do you any good, since there's a hierarchy amongst all these different factors and sort order is local to each of these hierarchies and subhierarchies.
This said, there are 4 ways to get them:

a. By usage (what Gregg showed you). It's inefficient since it has to grep all ULAN and do DISTINCT. It also may not return everything that is considered "nationality".
You can use a subquery to get the label:

select * {
  {select distinct ?nat {
    ?x skos:inScheme ulan:; foaf:focus/schema:nationality ?nat}}
  ?nat gvp:prefLabelGVP/xl:literalForm ?nationality}

I haven't tried the above

b. By the ULAN facet 500125081 Unknown People by Culture.

select * {
  ?nat a skos:Concept; gvp:broaderExtended ulan:500125081.
  ?nat gvp:prefLabelGVP/xl:literalForm ?nationality
}

This returns 2241.

Unfortunately there's no link to AAT. Eg see Unknown Abassid:

c. By the AAT hierarchy 300015646 Styles and Periods 

select * {
  ?nat a skos:Concept; gvp:broaderExtended aat:300015646.
  ?nat gvp:prefLabelGVP/xl:literalForm ?nationality
}

This returns 5692. Some are art movements eg Jugend (not what you'd call a nationality), others are geological periods eg Cretaceous (would be useful if you want to designate the unknown creator of a cave painting)

d. By asking Gregg whether we can post the internal nationality sheet currently used for mapping ULAN ingests. 
It has 2263 entries, and includes stuff like Freemasonic, Rosicrucian, Manichaean, Mithraist, Zoroastrian.

In brief: you first need to decide what you consider "nationality"...

2. There are about 400 countries, but 5796 total labels.
Some of the labels are duplicated with different lang tags, so let's strip that:

select ?c ?prefLab ?label {
  {select distinct ?c {?c gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "sovereign states"@en]} order by ?c}
  optional {?c xl:prefLabel [xl:literalForm ?labEn; dct:language gvp_lang:en]}
  optional {?c gvp:prefLabelGVP [xl:literalForm ?labGVP]}
  bind(coalesce(?labEn,?labGVP) as ?prefLab)
  {select distinct ?c ?label {?c rdfs:label ?lab. bind(str(?lab) as ?label)}}
}

You can also try to group by ?c and use (group_concat(?label; separator="; ") as ?labels)
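
If the end goal is the lookup dictionary mentioned in the question, the one-row-per-label output can also be folded into a dict after export. A minimal Python sketch, assuming rows of (place, prefLab, label) in the shape the query above returns; build_label_lookup and the sample rows (placeholder IDs and labels) are made up for illustration and are not real TGN data:

```python
from collections import defaultdict


def build_label_lookup(rows):
    """Map each case-folded label to the set of place IDs that carry it.

    rows: iterable of (place_id, pref_label, label) tuples, one per label.
    A set is kept per label because the same label may belong to
    more than one place.
    """
    lookup = defaultdict(set)
    for place_id, pref_label, label in rows:
        for text in (pref_label, label):
            if text:  # skip unbound (empty or None) values
                lookup[text.strip().casefold()].add(place_id)
    return dict(lookup)


# Hand-written sample rows with placeholder IDs (not query output).
sample_rows = [
    ("c1", "United States", "United States"),
    ("c1", "United States", "USA"),
    ("c2", "Place Two", "Congo"),
    ("c3", "Place Three", "Congo"),
]
lookup = build_label_lookup(sample_rows)
```

Labels that map to more than one ID (like the made-up "Congo" rows here) are the ones you'd want to review by hand before mapping.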

3. "We're trying to parse out the location information into up to four parts: 1. City, 2. State, 3. Province, 4. Nation."
Well, good luck with that :-) since ADM region organization varies wildly between countries.

I see two ways:

a. Parse gvp:parentString. Eg for NY City it shows "New York, United States, North and Central America, World"
Optionally, get "?x gvp:broaderPreferred ?parent. ?parent gvp:prefLabelGVP/xl:literalForm ?parentPlace" and match the labels to the pieces of gvp:parentString.
Unless you use gvp:broaderPreferred you'll be surprised to learn that NY City has 5 additional parents:
the counties of Bronx, Kings, New York, Queens and Richmond (yes, places are weird in this way)
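
A rough Python sketch of option (a), for illustration only. It assumes the parent string is a plain comma-separated list ordered from nearest parent to root, as in the NY City example, and that the nation sits directly below the continent. Neither assumption is guaranteed: a label that itself contains a comma would break the split, and not every place has that shape. split_parent_string and guess_nation are made-up helper names:

```python
def split_parent_string(parent_string):
    """Split a gvp:parentString into labels, nearest parent first."""
    return [part.strip() for part in parent_string.split(",") if part.strip()]


def guess_nation(parent_string):
    """Return the label just below the continent, or None if too short.

    Assumes the string ends with "<nation>, <continent>, World",
    which is an assumption about the data, not a documented rule.
    """
    parts = split_parent_string(parent_string)
    if len(parts) < 3:
        return None
    return parts[-3]


nyc = "New York, United States, North and Central America, World"
parts = split_parent_string(nyc)
nation = guess_nation(nyc)
```

Matching the pieces against the gvp:broaderPreferred labels, as suggested above, is the safer check when the guess matters.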

b. Use a hard-coded number of levels, eg

select * {
  bind(tgn:7007567 as ?x)
  optional {?x gvp:broaderPreferred ?p1. ?p1 gvp:prefLabelGVP/xl:literalForm ?p1label
    optional {?p1 gvp:broaderPreferred ?p2. ?p2 gvp:prefLabelGVP/xl:literalForm ?p2label
      optional {?p2 gvp:broaderPreferred ?p3. ?p3 gvp:prefLabelGVP/xl:literalForm ?p3label
        optional {?p3 gvp:broaderPreferred ?p4. ?p4 gvp:prefLabelGVP/xl:literalForm ?p4label}}}}}
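
If four levels turn out not to be enough, the same nested pattern can be generated rather than typed. A small Python sketch that only builds the query text (it does not run it); parents_query is a hypothetical helper name, and the variable naming (?p1, ?p1label, ...) simply follows the query above:

```python
def parents_query(tgn_id, levels=4):
    """Build a SPARQL query walking gvp:broaderPreferred a fixed number of levels up."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    lines = ["select * {", f"  bind(tgn:{tgn_id} as ?x)"]
    child = "?x"
    for i in range(1, levels + 1):
        indent = "  " * i
        # Each level opens one more nested optional block.
        lines.append(
            f"{indent}optional {{{child} gvp:broaderPreferred ?p{i}. "
            f"?p{i} gvp:prefLabelGVP/xl:literalForm ?p{i}label"
        )
        child = f"?p{i}"
    # Close every optional block, then the outer select group.
    lines[-1] += "}" * levels + "}"
    return "\n".join(lines)


query = parents_query("7007567")
```

With the default of 4 levels this reproduces the query above; unused levels just come back unbound because every step is optional.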

