queries for TGN and ULAN

Vladimir Alexiev

Nov 13, 2015, 4:24:25 AM
to Getty Vocabularies as Linked Open Data

Cristiano Bianchi <c.bi...@keepthinking.it> writes:


We are trying to use the ULAN and TGN queries, following the successful use of AAT - but struggling.

 

In both cases we want to search the term/label only, not full text and sort by relevance. We are doing autocomplete - so partial strings should be ok (e.g. picass for picasso)

 

ULAN

In ULAN I'd like to search for Picasso and get the great Pablo first (if possible). If not, happy to settle on alphabetical sorting. The only way we can think of is the FULL TEXT search, but that searches all fields. Is there a way to only search the Term field?

 

select ?Subject ?Term ?Parents ?Descr ?ScopeNote ?Type (coalesce(?Type1,?Type2) as ?ExtraType) {
 ?Subject luc:term "picasso"; skos:inScheme ulan: ;a ?typ.
 ?typ rdfs:subClassOf gvp:Subject; rdfs:label ?Type.
 filter (?typ != gvp:Subject)
 optional {?Subject gvp:placeTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type1]]}
 optional {?Subject gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type2]]}
 optional {?Subject gvp:prefLabelGVP [xl:literalForm ?Term]}
 optional {?Subject gvp:parentStringAbbrev ?Parents}
 optional {?Subject foaf:focus/gvp:biographyPreferred/schema:description ?Descr}
 optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}
ORDER BY (fn:lower-case(str(?Term)))

 

TGN

Similarly, if I use the query below to search for London, I get anything that has London anywhere. If I don't use ORDER BY, my London comes back as the last Term, together with lots of others that I do not really need (e.g. Barking and Dagenham). I'd like to search the Term only.

 

select ?Subject ?Term ?Parents ?Descr ?ScopeNote ?Type (coalesce(?Type1,?Type2) as ?ExtraType) {
  ?Subject luc:term "london"; skos:inScheme tgn: ;a ?typ.
  ?typ rdfs:subClassOf gvp:Subject; rdfs:label ?Type.
  filter (?typ != gvp:Subject)
  optional {?Subject gvp:placeTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type1]]}
  optional {?Subject gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type2]]}
  optional {?Subject gvp:prefLabelGVP [xl:literalForm ?Term]}
  optional {?Subject gvp:parentStringAbbrev ?Parents}
  optional {?Subject foaf:focus/gvp:biographyPreferred/schema:description ?Descr}
  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}
  ORDER BY (fn:lower-case(str(?Term)))


 

Can you please help!?

Vladimir Alexiev

Nov 13, 2015, 5:08:31 AM
to Getty Vocabularies as Linked Open Data, Cristiano Bianchi
Hi Cristiano!

Regarding searching, there have been two discussions here:
> we want to search the term/label only, not full text 

luc:term searches all the labels (but not the scope note). It seems to me, though, that you want to search by prefLabel only. E.g. the query above returns "Francoise Gilot" because she has the altLabel "Francoise Picasso".

> partial strings should be ok (e.g. picass for picasso)

This means you can't use Lucene "exact phrase" search:
luc:term '  "picasso*"  '  # returns 15, same as "picasso"
luc:term '  "picass*"  '   # returns nothing because there's nobody named "picass"

So your best bet is to search by luc:term with wildcard to get a wider list of candidates, then filter by regex on prefLabel to pick only the relevant ones:

select ?Subject ?Term ?Parents ?Descr ?ScopeNote ?Type (coalesce(?Type1,?Type2) as ?ExtraType) {
 ?Subject luc:term "picass*"; skos:inScheme ulan: ; a ?typ; gvp:prefLabelGVP [xl:literalForm ?Term].
 filter (regex(?Term,"picass","i"))
 ?typ rdfs:subClassOf gvp:Subject; rdfs:label ?Type.
 filter (?typ != gvp:Subject)
 optional {?Subject gvp:placeTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type1]]}
 optional {?Subject gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type2]]}
 optional {?Subject gvp:parentStringAbbrev ?Parents}
 optional {?Subject foaf:focus/gvp:biographyPreferred/schema:description ?Descr}
 optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}
ORDER BY (fn:lower-case(str(?Term)))
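
For autocomplete, the search string comes from the user, so the query has to be built on the client. Below is a minimal Python sketch of that step, not a tested client. The function names, the trimmed-down select list and the sanitizing rule (keep only letters, digits and spaces) are my assumptions. It also assumes the endpoint predefines the prefixes (luc:, gvp:, xl:, ulan:, tgn: etc.), as the queries above do.

```python
def sanitize_prefix(user_input: str) -> str:
    """Keep only letters, digits and single spaces, lower-cased.

    This is a deliberately strict (assumed) rule: it drops quotes, backslashes
    and Lucene/regex metacharacters, so the text can be pasted into the
    luc:term literal and the regex pattern without further escaping.
    """
    kept = "".join(ch if ch.isalnum() else " " for ch in user_input)
    return " ".join(kept.split()).lower()


# Same pattern as the query above: luc:term with a wildcard to collect
# candidates, then a regex on the GVP preferred label to keep relevant ones.
QUERY_TEMPLATE = """\
select ?Subject ?Term ?Parents ?Descr {{
  ?Subject luc:term "{prefix}*"; skos:inScheme {scheme}: ;
    gvp:prefLabelGVP [xl:literalForm ?Term].
  filter (regex(?Term, "{prefix}", "i"))
  optional {{?Subject gvp:parentStringAbbrev ?Parents}}
  optional {{?Subject foaf:focus/gvp:biographyPreferred/schema:description ?Descr}}
}}
ORDER BY (fn:lower-case(str(?Term)))
"""


def build_autocomplete_query(user_input: str, scheme: str = "ulan") -> str:
    """Return the SPARQL text for one autocomplete lookup."""
    if scheme not in ("ulan", "tgn", "aat"):
        raise ValueError("unknown scheme: " + scheme)
    prefix = sanitize_prefix(user_input)
    if not prefix:
        raise ValueError("nothing to search for")
    return QUERY_TEMPLATE.format(prefix=prefix, scheme=scheme)


print(build_autocomplete_query("picass", "ulan"))
print(build_autocomplete_query("lond", "tgn"))
```

The same function covers TGN by passing "tgn" as the scheme. With multi-word input the wildcard applies only to the last word, so how Lucene treats the earlier words should be checked against the endpoint.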

About relevance: unfortunately there's no explicit info that would single out Pablo as more important than the other Picassos.
  • Maybe he has more associative relations (influenced more people)? We could enable RDFRank in GraphDB, which is a graph centrality measure (similar to Google PageRank), but only if there's evidence that important people have more connections (that's true in a big CH dataset, but I doubt it's true in a thesaurus).
  • Other heuristics about importance could include: length of biography, number of biographies, number of revision actions, number of life events. (If you look at the results of my query, you'll notice Pablo has the longest biography). But of course this depends on particular editorial actions.
  • If you explore some of the above and come up with a measure, we'll optimize the respective query and put it in the documentation.
  • Regarding TGN, there is a flag "important" in Getty's internal system. Getty plan to publish it one day but it's currently underpopulated (about 15 instances), so we decided together it's not yet worth it.
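
One of the heuristics above, biography length, can be tried on the client without changing the query. Here is a rough Python sketch. It assumes each result row has already been turned into a dict with "Term" and "Descr" keys (that row shape and the function name are hypothetical), and the sample rows are made up just to show the ordering.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def rank_by_biography_length(rows: List[Row]) -> List[Row]:
    """Longest ?Descr first; ties and missing biographies fall back to alphabetical ?Term."""
    def sort_key(row: Row):
        descr = row.get("Descr") or ""   # the optional ?Descr may be unbound
        term = row.get("Term") or ""
        return (-len(descr), term.lower())
    return sorted(rows, key=sort_key)


# Made-up rows, only to show the ordering (not real ULAN data).
rows = [
    {"Term": "Example, Anna", "Descr": "painter"},
    {"Term": "Example, Boris", "Descr": "painter, sculptor and printmaker"},
    {"Term": "Example, Clara", "Descr": None},
]
for row in rank_by_biography_length(rows):
    print(row["Term"])
```

This is only a proxy for importance: as noted above, biography length depends on particular editorial actions.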
Cheers! Vladimir

Cristiano Bianchi

Nov 19, 2015, 11:01:23 AM
to Vladimir Alexiev, Getty Vocabularies as Linked Open Data, Parahat Melayev
Thanks Vladimir! We'll give these a go.
Best, Cristiano