Broader term as text,


Rolf Blijleven

Nov 9, 2017, 12:10:16 PM
to Getty Vocabularies as Linked Open Data
Hi, 

This query (modified from elsewhere on this forum) does almost what I want: 

SELECT DISTINCT ?subject ?label ?broader ?scope_note_nl
WHERE {
  ?subject luc:term "weef*"; a gvp:Concept.
  OPTIONAL {?subject xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?label]}
  OPTIONAL {?subject gvp:broaderGeneric ?broader]}
  OPTIONAL {?subject skos:scopeNote [dct:language gvp_lang:nl; rdf:value ?scope_note_nl]}
  }
ORDER BY ?term

Instead of the URL to the broaderGeneric, I want to get that term itself, in Dutch [nl]. 

I'm a total SPARQL noob, but I do know some SQL (and Python). I get lost very easily trying to work out when to use gvp, xl, skos, dct, and so on. I can't seem to find an overview of which prefix covers what and when each should be used. Any suggestions? 

Many thanks, 
Rolf



Getty Vocabularies LOD

Nov 9, 2017, 12:22:12 PM
to Getty Vocabularies as Linked Open Data
Hi, Rolf. Just use the same code to get the broader label that you used for the term and scope note:

SELECT DISTINCT ?subject ?label ?blabel ?scope_note_nl
WHERE {
  ?subject luc:term "weef*"; a gvp:Concept.
  OPTIONAL {?subject xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?label]}
  OPTIONAL {?subject gvp:broaderGeneric ?broader. ?broader xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?blabel]}
  OPTIONAL {?subject skos:scopeNote [dct:language gvp_lang:nl; rdf:value ?scope_note_nl]}
}
ORDER BY ?label  # the original ordered by ?term, which is never bound in this query

Gregg Garcia
Getty Digital
J. Paul Getty Trust

Richard Light

Nov 9, 2017, 12:25:36 PM
to gettyv...@googlegroups.com

Rolf,

SELECT DISTINCT ?subject ?label ?broaderlabel ?scope_note_nl
WHERE {
  ?subject luc:term "weef*"; a gvp:Concept.
  OPTIONAL {?subject xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?label]}
  OPTIONAL {?subject gvp:broaderGeneric ?broader . ?broader xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?broaderlabel]}
  OPTIONAL {?subject skos:scopeNote [dct:language gvp_lang:nl; rdf:value ?scope_note_nl]}
}

You had an unmatched closing square bracket in your second OPTIONAL statement.  I've just updated that second statement to match on ?broader, and then go on to find (and display) its label as ?broaderlabel.  Does that give you what you are after?

Best wishes,

Richard


--
Richard Light

Rolf Blijleven

Nov 9, 2017, 2:16:07 PM
to Getty Vocabularies as Linked Open Data
Hi Gregg, Richard, 

Yes, THANKS! 

I didn't know how to 'nest' the language specifier. Your answer led me here, useful for a noob like me. 

Richard is right about the closing bracket. Copy-paste glitch, sorry.  

Thanks again! 

cheers,
Rolf



Vladimir Alexiev

Nov 10, 2017, 11:39:45 AM
to Getty Vocabularies as Linked Open Data
Hi Rolf!

1. Have you read the doc to learn the shape of the data? I know it's big (100p) but there's a very useful Overview diagram

2. Don't use DISTINCT if you can help it. It kills performance on large result sets (even if you set a LIMIT)

3. You wrote in email "the term, its preferred term (if any), its broader term(s), related term(s), equivalent term(s), source (ie the source of this term), scope note and term ID". So you're getting a bunch of data for each concept.
The query with a bunch of optionals as Gregg wrote has a defect: if a concept has 5 labels and 3 broaders, it will return 15 rows (Cartesian product). As you add cols, that Cartesian product will cause a combinatorial explosion.
Consider a different approach: get just concept IDs, then fetch the semantic resource for each one (HTTP GET on the semantic URL, see Semantic Resolution for details). 

If that's too much data for you, consider a CONSTRUCT UNION query: use the one given as example and simplify it. 
Keep in mind that wildcard properties (?p1...?pD) return all direct statements at the respective node.
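
To make the row multiplication in point 3 concrete, here is a small Python sketch. The label and broader values are made-up placeholders, not real AAT data; the code only mimics how independent OPTIONAL blocks on the same subject combine.

```python
from itertools import product

# Made-up placeholder values for one concept (not real AAT data).
labels = [f"label_{i}" for i in range(1, 6)]      # 5 labels
broaders = [f"broader_{i}" for i in range(1, 4)]  # 3 broader concepts

# Two independent OPTIONAL blocks on the same subject yield every combination.
rows = list(product(labels, broaders))
print(len(rows))  # 15

# A third multi-valued column multiplies the row count again.
notes = ["note_1", "note_2"]
rows_with_notes = list(product(labels, broaders, notes))
print(len(rows_with_notes))  # 30
```

Each extra multi-valued column multiplies the result size, which is why fetching one resource per concept, or using CONSTRUCT, scales better than a wide SELECT.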

Hope to help! V

Rolf Blijleven

Nov 21, 2017, 6:14:41 AM
to Getty Vocabularies as Linked Open Data
Hi Vlad, 

thanks for your comments. Let me explain the use case. 

A few years ago, the Dutch RKD made an AAT webservice available. You can query it by URL, and it returns AAT records in Adlib XML. This year, Axiell came up with an Adlib for Windows version that can make use of this service. (See this in action in slides 5 to 7 of this presentation.) However, RKD have not been able to update the contents of the webservice for about 10 years or so. I'm exploring how best to resolve that. It's a project I'm doing in my spare time; there is no funding for it. (I'm self-employed, Adlib technical support is one of the things I do for a living, and I'm also a board member of the Dutch Adlib user group.) 

One option is to export RKD's data and import it into the webservice. I know how to do that, it just needs to be done on a regular basis. 

One other option could be to build a service that SPARQL queries the GVP AAT directly, and transforms the response into AdlibXML on the fly. That's what I'm exploring right now. The big question is of course: is this a good idea at all? 

Step one is to come up with a SPARQL query that returns the desired data (more on 'desired' below). 
The end result could be a web service, probably in Flask/Python (I can do that), that accepts a search term as input, wraps it in a SPARQL query to GVP, gets json back, transforms that into AdlibXML and returns that. 
The main reason for choosing this option, apart from it being fun and educational to build, would be that it'd automatically be up to date. 
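
The core of such a service could look roughly like the Python sketch below. It is only a sketch under assumptions: it covers building the query and converting a SPARQL JSON result (W3C results format) into XML, and leaves out the Flask routing and the HTTP call. The element names (`recordList`, `record`, `term`, `broader_term`, `scope_note`) and the helper names are hypothetical; the real Adlib XML schema is not reproduced here. The query text reuses the pattern given earlier in this thread and relies on the prefixes the endpoint predefines.

```python
import xml.etree.ElementTree as ET

# Query pattern taken from earlier in this thread; doubled braces are
# literal braces for str.format.
QUERY_TEMPLATE = """
SELECT ?subject ?label ?broaderlabel ?scope_note_nl
WHERE {{
  ?subject luc:term "{term}"; a gvp:Concept.
  OPTIONAL {{?subject xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?label]}}
  OPTIONAL {{?subject gvp:broaderGeneric ?broader . ?broader xl:prefLabel|xl:altLabel [dct:language gvp_lang:nl; xl:literalForm ?broaderlabel]}}
  OPTIONAL {{?subject skos:scopeNote [dct:language gvp_lang:nl; rdf:value ?scope_note_nl]}}
}}
"""

# Hypothetical mapping from SPARQL variables to XML element names.
FIELD_MAP = {
    "label": "term",
    "broaderlabel": "broader_term",
    "scope_note_nl": "scope_note",
}


def build_query(term: str) -> str:
    """Insert the search term, escaping backslashes and double quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.format(term=escaped)


def bindings_to_xml(results: dict) -> str:
    """Turn a SPARQL JSON result into a simple XML string, one record per row."""
    root = ET.Element("recordList")
    for binding in results["results"]["bindings"]:
        record = ET.SubElement(root, "record", {"uri": binding["subject"]["value"]})
        for var, tag in FIELD_MAP.items():
            if var in binding:  # OPTIONAL variables may be absent from a row
                ET.SubElement(record, tag).text = binding[var]["value"]
    return ET.tostring(root, encoding="unicode")


# Placeholder result in SPARQL JSON shape (not real AAT data).
sample = {
    "head": {"vars": ["subject", "label", "broaderlabel", "scope_note_nl"]},
    "results": {"bindings": [{
        "subject": {"type": "uri", "value": "http://example.org/concept/1"},
        "label": {"type": "literal", "xml:lang": "nl", "value": "voorbeeldterm"},
    }]},
}
print(bindings_to_xml(sample))
```

Escaping the term before it goes into the query string matters because the input comes straight from the user.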

'Desired' data must be matching terms with only some 1:1 related terms (not 1:n related terms). The user chooses a term from a list, Adlib creates a new thesaurus record for it, also fills in its scope note, and perhaps also creates a record for its broader term. Narrowers wouldn't do, because it'd force the creation of many related records for that term. Don't want that. 

More below


On Friday, November 10, 2017 at 17:39:45 UTC+1, Vladimir Alexiev wrote:
> Hi Rolf!
>
> 1. Have you read the doc to learn the shape of the data? I know it's big (100p) but there's a very useful Overview diagram

As I'm doing this in my (scarce) spare time, I'm trying to minimize time on documentation I don't need. I'm not afraid of diagrams (I'm an EE), but I'm new to ontologies. I'm hoping to meet some giants who are so kind as to let me stand on their shoulders. I guess I already am :-) 


> 2. Don't use DISTINCT if you can help it. It kills performance on large result sets (even if you set a LIMIT)

Thanks for the tip, point accepted.  

> 3. You wrote in email "the term, its preferred term (if any), its broader term(s), related term(s), equivalent term(s), source (ie the source of this term), scope note and term ID". So you're getting a bunch of data for each concept.
> The query with a bunch of optionals as Gregg wrote has a defect: if a concept has 5 labels and 3 broaders, it will return 15 rows (Cartesian product). As you add cols, that Cartesian product will cause a combinatorial explosion.
> Consider a different approach: get just concept IDs, then fetch the semantic resource for each one (HTTP GET on the semantic URL, see Semantic Resolution for details). 

Before I started this exploration, I thought that a single concept has only one broader. (That's the way it is in RKD's AAT-gateway). 
If this is not the case, then that's a bit of a show stopper. The only reason for letting the user derive the broader, is to add proper context into their local thesaurus. 
"All data about concept" would be way over the top for this use case. 
 

> If that's too much data for you, consider a CONSTRUCT UNION query: use the one given as example and simplify it. 
> Keep in mind that wildcard properties (?p1...?pD) return all direct statements at the respective node.

Thanks, this has to sink in. I'll be back! 


> Hope to help! V

Cheers, 
Rolf 

vladimir...@ontotext.com

Nov 21, 2017, 1:05:42 PM
to Getty Vocabularies as Linked Open Data
> thought that a single concept has only one broader.

AAT is a multi-parent hierarchy, in fact very broadly branching (factor 2x going up).
But gvp:broaderPreferred is single.

> The only reason for letting the user derive the broader, is to add proper context into their local thesaurus. 

Maybe gvp:parentString can also be of some help.

>> "the term, its preferred term (if any), its broader term(s), related term(s), equivalent term(s), source (ie the source of this term), scope note and term ID"
> web service that accepts a search term as input, wraps it in a SPARQL query to GVP, gets json back, transforms that into AdlibXML and returns that.

1. Say Concept (or in AAT speak, Subject). Term is just the label, a concept has many labels.
2. Do you really need to return all those fields for a list of results? It's harder to write such a query when some of the fields are multi-valued, because you need either CONSTRUCT or GROUP_CONCAT (which requires GROUP BY, so it is slow). I think you need a subset for the list, and the full set for a single concept.
3. For the related concepts, do you want their URLs, prefLabelGVP, or both?
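
Point 2 can also be approached client-side: run a plain SELECT and fold the repeated rows into one record per concept in Python, which plays roughly the role GROUP_CONCAT would play inside the query. The sketch below assumes the W3C SPARQL JSON results shape; the bindings are made-up placeholders, and `collapse_bindings` is a hypothetical helper name.

```python
def collapse_bindings(bindings, key="subject", sep="; "):
    """Group rows by the key variable and join distinct values per column."""
    grouped = {}
    for row in bindings:
        record = grouped.setdefault(row[key]["value"], {})
        for var, cell in row.items():
            if var == key:
                continue
            values = record.setdefault(var, [])
            if cell["value"] not in values:  # drop Cartesian-product repeats
                values.append(cell["value"])
    return {
        subject: {var: sep.join(vals) for var, vals in record.items()}
        for subject, record in grouped.items()
    }


# Placeholder rows: one concept with 2 labels and 2 broaders gives 4 rows.
c1 = {"value": "http://example.org/c1"}
bindings = [
    {"subject": c1, "label": {"value": "label_a"}, "broader": {"value": "broader_x"}},
    {"subject": c1, "label": {"value": "label_a"}, "broader": {"value": "broader_y"}},
    {"subject": c1, "label": {"value": "label_b"}, "broader": {"value": "broader_x"}},
    {"subject": c1, "label": {"value": "label_b"}, "broader": {"value": "broader_y"}},
]
print(collapse_bindings(bindings))
```

This tidies the output but does not reduce the work the endpoint does, so it suits small result lists better than large ones.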

> 'Desired' data must be matching terms with only some 1:1 related terms (not 1:n related terms).
> The user chooses a term from a list, Adlib creates a new thesaurus record for it, also fills in its scope note, and perhaps also creates a record for its broader term.

AAT is 12 levels deep, so is "broader term" enough? Don't you need all ancestors?
Anyway, here's a query that does a bit of what you need.

construct {
  ?x a skos:Concept;
    skos:prefLabel ?label;
    skos:inScheme ???;
    skos:broader ?broader.
  ?broader a skos:Concept;
    skos:inScheme ???;
    skos:prefLabel ?broaderLabel.
} where {
  ?x # see FTS sample queries
  ?x xl:prefLabel [dct:language gvp_lang:nl; xl:literalForm ?label].
  optional {
    ?x gvp:broaderPreferred ?broader.
    ?broader xl:prefLabel [dct:language gvp_lang:nl; xl:literalForm ?broaderLabel].
  }
}


Sources are not included, because I don't know in what form you want them.
AAT sources are URLs (aat_source:1235) and may have local parts (with bibo:locator, eg page number).
Do you want them as text?

Rolf Blijleven

Nov 22, 2017, 4:37:05 AM
to Getty Vocabularies as Linked Open Data
Hi Vlad,
you bring up some good points that I had not considered before. Will take some experimenting and discussion to decide what's best.
Cheers,
Rolf
