General question about reliability of Getty Vocabularies

Andre Waloszek

Nov 3, 2015, 4:58:52 AM
to Getty Vocabularies as Linked Open Data
Hi everyone, 

While querying the Getty Vocabs over the last few days, I found that the response to the same query is not always the same. Does anyone know what the issue could be? For example, the following query should return two results.

select ?VocNode ?VocNodeID ?VocNodePath ?VocTerm ?VocTermID ?VocTermLabel ?VocNodeScopeNote ?VocNodeProviderView {
  ?VocNode a skos:Concept; luc:term "spark"; skos:inScheme aat: ; gvp:prefLabelGVP [xl:literalForm ?VocTermLabel].
  ?VocNode dc:identifier ?VocNodeID.
  ?VocNode gvp:prefLabelGVP ?VocTerm .
  ?VocNode gvp:parentString ?VocNodePath .
  ?VocNode rdfs:seeAlso ?VocNodeProviderView .
  ?VocNode skos:scopeNote [dct:language gvp_lang:en; rdf:value ?VocNodeScopeNote] .
  ?VocTerm dc:identifier ?VocTermID .
} order by asc(lcase(str(?VocTerm))) LIMIT 100

But in the last few days it has not always returned them. Sometimes the response contains zero results, which is definitely wrong, and no error is returned. I have a similar issue with another query that I use a lot. The following query should return a given node together with its hierarchy, but sometimes it returns only the node without its hierarchy, and sometimes it returns zero results. Again, there is no error message in the response.

select ?VocNode ?VocNodeLogicalName ?VocNodePath ?VocTerm ?VocTermID ?VocTermLabel {
  { ?VocNode dc:identifier "300180842" }   UNION { aat:300180842 gvp:broaderPreferredExtended ?VocNode} .
  ?VocNode gvp:prefLabelGVP [xl:literalForm ?VocTermLabel]; gvp:parentString ?parentStr .
  ?VocNode dc:identifier ?VocNodeID .
  ?VocNode gvp:prefLabelGVP ?VocTerm .
  ?VocTerm dc:identifier ?VocTermID .
  bind (if (?parentStr="Top of the AAT hierarchies", "", ?parentStr) as ?VocNodePath)
} order by asc(strlen(?VocNodePath))


Is there something wrong with my queries in general, or is there some maintenance going on? At the moment I can't trust any of my query results. Is there any information about maintenance or the reliability of the Getty Vocabs on the website? Unfortunately I did not find anything.

Best and thanks in advance for any help.
André

Vladimir Alexiev

Nov 3, 2015, 6:54:48 PM
to Getty Vocabularies as Linked Open Data, gga...@getty.edu
Hi Andre and thanks for the heads up!
There's an internal query timeout, but your queries are good.
Some improvements can be made (see at bottom) but they are not essential.
I can confirm the problem: running your second query several times, I got 0 results the first time and 11 results after that.

I would suspect that one of the cluster worker nodes is toast, and somehow the master isn't noticing this and is routing queries to it.
Could be that the database copy of the bad worker is empty or somehow defective, so the worker is answering but finding nothing.
(Gregg, remember we had a similar problem once, but it involved the entity files not the RDF store.)

Gregg, could you investigate? 
Also we should talk about some stronger monitoring procedure, i.e. what query to use to ping.
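As a starting point for that discussion, one possible ping is an ASK on a concept that is known to exist. This is only a sketch: using aat:300180842 (the node from Andre's second query) and gvp:prefLabelGVP as the probe is an assumption, not an agreed procedure.

```sparql
# Hypothetical health-check query (an assumption, not an agreed monitoring procedure).
# Prefixes aat: and gvp: are assumed to be predefined by the endpoint, as in Andre's queries.
# A known AAT concept should always have a GVP preferred label.
ask { aat:300180842 gvp:prefLabelGVP ?label }
```

A worker with an intact database copy should answer true, while an empty store would answer false. If the master spreads queries across workers, the ping would have to be repeated, or sent to each worker directly, to catch a single bad one.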

PS: Small query improvements:
- you may want to sprinkle in some OPTIONALs for fields that might be missing (e.g. I think not 100% of AAT nodes have scope notes)
- don't use gvp:prefLabelGVP twice. Use it once and get all the data you need out of its target node
- instead of {?VocNode dc:identifier "300180842"} try {bind(aat:300180842 as ?VocNode)}, that will save you a join
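Applied to the two queries above, these suggestions might look roughly as follows. Both are sketches that assume the endpoint's predefined prefixes, as the original queries do. In the first, only the scope note is made optional, since that is the one field mentioned as possibly missing.

```sparql
select ?VocNode ?VocNodeID ?VocNodePath ?VocTerm ?VocTermID ?VocTermLabel ?VocNodeScopeNote ?VocNodeProviderView {
  ?VocNode a skos:Concept ; luc:term "spark" ; skos:inScheme aat: ;
           dc:identifier ?VocNodeID ;
           gvp:parentString ?VocNodePath ;
           rdfs:seeAlso ?VocNodeProviderView ;
           gvp:prefLabelGVP ?VocTerm .            # used once; label and ID both come from ?VocTerm
  ?VocTerm xl:literalForm ?VocTermLabel ;
           dc:identifier ?VocTermID .
  # Nodes without an English scope note are still returned, with ?VocNodeScopeNote unbound
  optional { ?VocNode skos:scopeNote [dct:language gvp_lang:en ; rdf:value ?VocNodeScopeNote] }
} order by asc(lcase(str(?VocTerm))) limit 100
```

In the second, the select list uses ?VocNodeID in place of ?VocNodeLogicalName, because the original query never binds ?VocNodeLogicalName.

```sparql
select ?VocNode ?VocNodeID ?VocNodePath ?VocTerm ?VocTermID ?VocTermLabel {
  # bind the node directly instead of looking it up by dc:identifier (saves a join)
  { bind(aat:300180842 as ?VocNode) } union { aat:300180842 gvp:broaderPreferredExtended ?VocNode }
  ?VocNode dc:identifier ?VocNodeID ;
           gvp:parentString ?parentStr ;
           gvp:prefLabelGVP ?VocTerm .            # used once
  ?VocTerm xl:literalForm ?VocTermLabel ;
           dc:identifier ?VocTermID .
  bind(if(?parentStr = "Top of the AAT hierarchies", "", ?parentStr) as ?VocNodePath)
} order by asc(strlen(?VocNodePath))
```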

Gregg Garcia

Nov 3, 2015, 7:48:18 PM
to Getty Vocabularies as Linked Open Data, gga...@getty.edu

Looks like the culprit is worker 1 on node 1.  I removed it from the master for now until I can look at it / rebuild it tomorrow.  Let me know if there are any more problems.


Gregg Garcia
Software Architect
J. Paul Getty Trust