Hello All,
My name is Saku Seppälä and I'm an IT expert helping the National Library of Finland evaluate VocBench for ontology editing. In our evaluation we found that opening hierarchies is very slow in our test setup (SKOS lexicalization and model). It could take half a minute to open a branch that had 230 narrower concepts, and even very small branches were relatively slow to open.
After some debugging I found that the SPARQL query Semantic Turkey uses to search for narrower concepts was the culprit. It turns out that restricting concepts to (multiple) selected schemes with OR-type filtering results in very slow queries. More specifically, the slow part is the one that checks whether a narrower concept has children of its own. That part of the query currently uses the following structure:
OPTIONAL {
BIND(
EXISTS {
?aNarrowerConcept (<http://www.w3.org/2004/02/skos/core#broader> ) ?resource .
?subPropInScheme2 rdfs:subPropertyOf* skos:inScheme .
?aNarrowerConcept ?subPropInScheme2 ?scheme2 .
FILTER (?scheme2=<http://www.yso.fi/onto/tao/> || ?scheme2=<http://www.yso.fi/onto/yso/>)
}
as ?attr_more )
}
This structure, where the FILTER comes at the end of the OPTIONAL-BIND-EXISTS construct, is very hard for the GraphDB engine to execute. It may be that earlier versions of GraphDB executed it faster and that later changes to query optimization made this type of query drastically slower. It may even trigger an optimization bug in GraphDB.
As a solution, I propose changing the query structure to UNION-based scheme selection, as follows:
OPTIONAL {
BIND(
EXISTS {
?aNarrowerConcept (<http://www.w3.org/2004/02/skos/core#broader> ) ?resource .
?subPropInScheme2 rdfs:subPropertyOf* skos:inScheme .
{?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/tao/> }
UNION
{?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/yso/> }
}
as ?attr_more )
}
In my tests this structure executed up to 30 times faster. Your results may vary depending on your data. I think this approach is a little more elegant and should be easier for all SPARQL engines to execute. The changed part of the query is produced by the filterAndOrScheme() method of the it/uniroma2/art/semanticturkey/services/core/SKOS.java class.
Currently it produces the FILTER-based pattern shown in the first query above.
Proposed change:
?subPropInScheme2 rdfs:subPropertyOf* skos:inScheme . {?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/tao/> } UNION {?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/yso/> }
In my view these queries are completely interchangeable and always produce the same results in any context when run on the same data.
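To make the difference between the two fragments concrete, here is a minimal Java sketch of how each one could be assembled from a list of selected scheme IRIs. It is not the actual filterAndOrScheme() code: the class name SchemePatternSketch, the method names filterPattern and unionPattern, and their parameters are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is NOT the Semantic Turkey implementation.
public class SchemePatternSketch {

    // FILTER style: bind the scheme to a variable, then restrict it with an OR-type FILTER.
    static String filterPattern(String conceptVar, String propVar, String schemeVar,
                                List<String> schemeIris) {
        String alternatives = schemeIris.stream()
                .map(iri -> schemeVar + "=<" + iri + ">")
                .collect(Collectors.joining(" || "));
        return propVar + " rdfs:subPropertyOf* skos:inScheme . "
                + conceptVar + " " + propVar + " " + schemeVar + " . "
                + "FILTER (" + alternatives + ")";
    }

    // UNION style: one group graph pattern per scheme, joined with UNION.
    // Assumes at least one scheme IRI is passed in.
    static String unionPattern(String conceptVar, String propVar, List<String> schemeIris) {
        String branches = schemeIris.stream()
                .map(iri -> "{" + conceptVar + " " + propVar + " <" + iri + "> }")
                .collect(Collectors.joining(" UNION "));
        return propVar + " rdfs:subPropertyOf* skos:inScheme . " + branches;
    }

    public static void main(String[] args) {
        List<String> schemes = List.of(
                "http://www.yso.fi/onto/tao/",
                "http://www.yso.fi/onto/yso/");
        System.out.println(filterPattern("?aNarrowerConcept", "?subPropInScheme2", "?scheme2", schemes));
        System.out.println(unionPattern("?aNarrowerConcept", "?subPropInScheme2", schemes));
    }
}
```

With the two YSO schemes used in this message, unionPattern should yield the proposed fragment and filterPattern the FILTER-based part of the original query. The UNION form needs no intermediate ?scheme2 variable, since each branch matches a concrete scheme IRI directly.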
I will attach a patch that alters the filterAndOrScheme() method to produce the UNION-based query structure proposed above. It is a very simple patch: it changes 3 lines and removes 1 line of code. The changes only affect cases where OR-type scheme selection is used with the SKOS model. I have applied the patch to the source code of the Semantic Turkey/VocBench 10.1.1 release, built my own binary and tested it with our test data.
Here is a link to my test build if you want to try it:
http://www.tsk.fi/tiedostot/vocbench3-10.1.1-full-sparqlfix.zip
It is based on the vocbench/semanticturkey 10.1.1 release and will overwrite that version if extracted to the same folder.
I'm happy to answer any comments or questions you may have.
Kind regards,
Saku Seppälä
--
You received this message because you are subscribed to the Google Groups "semanticturkey-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to semanticturkey-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/semanticturkey-user/8530668a-0ac0-4231-8483-f079fe6bb59cn%40googlegroups.com.
Hi Saku!
Thanks from me and the whole ST dev team for the great improvement you contributed!
Armando