SPARQL scheme selection optimization for filterAndOrScheme() method (SKOS.java)

Saku Seppälä

Dec 17, 2021, 10:54:51 AM
to semanticturkey-user

 Hello All,

My name is Saku Seppälä and I'm an IT expert helping the National Library of Finland evaluate VocBench for ontology editing. In our evaluation we have found that opening hierarchies is very slow in our test setup (SKOS lexicalization and model). It could take half a minute to open a branch that had 230 narrower concepts. Opening even very small branches also seemed relatively slow.

After some debugging I found that the SPARQL query Semantic Turkey uses to search for narrower concepts was the culprit. It turns out that the SPARQL structure used to restrict concepts to (multiple) selected schemes with OR-type filtering results in very slow queries. More specifically, it is the part that determines whether a narrower concept has children of its own. The troublesome part of the query currently uses the following structure:

OPTIONAL {
  BIND( EXISTS {
    ?aNarrowerConcept (<http://www.w3.org/2004/02/skos/core#broader> ) ?resource .
    ?subPropInScheme2 rdfs:subPropertyOf* skos:inScheme .
    ?aNarrowerConcept ?subPropInScheme2 ?scheme2 .
    FILTER (?scheme2=<http://www.yso.fi/onto/tao/> || ?scheme2=<http://www.yso.fi/onto/yso/>)
  } as ?attr_more )
}

This structure, where the FILTER clause sits at the end of the OPTIONAL-BIND-EXISTS construct, is very hard for the GraphDB engine to execute. It may be that some previous versions of GraphDB could execute it more quickly, and that later changes to query optimization have made this type of query drastically slower. It may even trigger an optimization bug in GraphDB.

As a solution, I propose that the query structure be changed to UNION-based scheme selection as follows:

OPTIONAL {
  BIND( EXISTS {
    ?aNarrowerConcept (<http://www.w3.org/2004/02/skos/core#broader> ) ?resource .
    ?subPropInScheme2 rdfs:subPropertyOf* skos:inScheme .
    {?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/tao/> }
    UNION
    {?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/yso/> }
  } as ?attr_more )
}

In my tests this structure gives up to 30x faster execution times. Your results may vary depending on your data. I think this approach is a little more elegant and should be easier for all SPARQL engines to execute. The changed part of the query is produced by the filterAndOrScheme() method of the it/uniroma2/art/semanticturkey/services/core/SKOS.java class.

Currently it produces:

?aNarrowerConcept ?subPropInScheme2 ?scheme2 . FILTER (?scheme2=<http://www.yso.fi/onto/tao/> || ?scheme2=<http://www.yso.fi/onto/yso/>)

Proposed change:

?subPropInScheme2 rdfs:subPropertyOf* skos:inScheme . {?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/tao/> } UNION {?aNarrowerConcept ?subPropInScheme2 <http://www.yso.fi/onto/yso/> }

In my view these queries are completely interchangeable and always produce the same results in any context when run on the same data.
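To make the difference between the two query fragments concrete, here is a minimal Java sketch that builds both forms from a list of scheme IRIs. It is not the actual filterAndOrScheme() code: the class name SchemeUnionSketch and the methods filterForm() and unionForm() are hypothetical names chosen for illustration, and the variable names are passed in as plain strings. Inside EXISTS only the existence of a match matters, so a concept that belongs to both schemes does not change the value of ?attr_more.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, NOT the real SKOS.java code: it only illustrates
// the two query shapes discussed in this thread.
final class SchemeUnionSketch {

    // Old shape: bind the scheme to a variable, then FILTER on it.
    static String filterForm(String conceptVar, String propVar,
                             String schemeVar, List<String> schemeIris) {
        String alternatives = schemeIris.stream()
                .map(iri -> schemeVar + "=<" + iri + ">")
                .collect(Collectors.joining(" || "));
        return conceptVar + " " + propVar + " " + schemeVar
                + " . FILTER (" + alternatives + ")";
    }

    // Proposed shape: one group graph pattern per scheme, joined by UNION.
    static String unionForm(String conceptVar, String propVar,
                            List<String> schemeIris) {
        return schemeIris.stream()
                .map(iri -> "{" + conceptVar + " " + propVar + " <" + iri + "> }")
                .collect(Collectors.joining(" UNION "));
    }
}
```

Calling unionForm("?aNarrowerConcept", "?subPropInScheme2", ...) with the tao and yso scheme IRIs yields the UNION fragment shown above. With a single scheme the result is one group without any UNION keyword, which is still valid SPARQL.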

I will attach a patch that alters the filterAndOrScheme() method to produce the UNION-based query structures proposed above. It is a very simple patch: it changes 3 lines and removes 1 line of code. The changes only affect situations where OR-type scheme selection is used with the SKOS model. I have tested this patch on the source code of the Semantic Turkey/VocBench 10.1.1 release, built my own binary and tested it with our test data.

Here is a link to my test build if you want to try it:
http://www.tsk.fi/tiedostot/vocbench3-10.1.1-full-sparqlfix.zip

It is based on vocbench/semanticturkey 10.1.1 release and will overwrite this version if extracted to the same folder.

I'm happy to answer any comments or questions you may have.


Kind regards,  Saku Seppälä

semanticturkey.patch

tur...@info.uniroma2.it

Dec 22, 2021, 6:14:06 AM
to semanticturkey-user
Hi Saku Seppälä,
I've tried your version of ST/VB (downloading the zip you provided) but there is something strange with the concept tree.
I've created a local project with two schemes, "S1" and "S2". In "S1" I have two concepts, "C-A" and "C-B", where C-A is topConceptOf S1 and C-B has broader C-A. If I set C-B as topConceptOf S2 and (selecting only S1 in the scheme tab) refresh the concept tree, I see both C-A and C-B as top concepts of S1, while only C-A should be shown as a top concept of S1 (since C-B belongs to S1 but is topConceptOf S2, not S1).
I've also tested on a remote project and the problem is still there, so I think your fix to the SPARQL query does not correctly handle the case in which a concept of the current scheme is topConceptOf another scheme.
Could you please test this simple case and let me know if you have the same bug on your version?

Thank you

Best Regards

Andrea

Saku Seppälä

Dec 23, 2021, 5:41:59 AM
to semanticturkey-user
Hi Andrea,

I tried your example and there is indeed a problem with the optimization when filtering the top concepts of a scheme. I should have looked at the code more carefully: I also need to change the code path that handles the filtering of top concepts, as it works differently from the filtering of narrower concepts. I missed this issue because none of my test cases contained this kind of structure, so they all worked perfectly. After a quick look, I think it can be fixed easily with another small change to the filterAndOrScheme() method.

However, due to the holidays and my contract with the National Library of Finland being up for renewal for next year, the fix for this issue has to wait until early next year. I can then post a new patch and a link to a new binary for testing.

Kind regards,  Saku Seppälä

Saku Seppälä

Dec 30, 2021, 10:48:39 AM
to semanticturkey-user
Hi Andrea,

I already got clearance to continue this work, so that the changes can be included in ST/VB ASAP. Here is a new version of the SPARQL optimization. To handle the filtering of the top concepts properly I decided to change the code a little more than before, mainly for the sake of readability and consistency. Now the 'and' and 'or' query cases are handled in a similar fashion.

This patch changes the filtering of top concepts to use the same kind of UNION construct as I did previously with narrower concepts:

{?resource skos:topConceptOf|^skos:hasTopConcept  <http://www.yso.fi/onto/mao/associatedYsoConcepts/>}
 UNION {?resource skos:topConceptOf|^skos:hasTopConcept  <http://www.yso.fi/onto/tao/>}
 UNION {?resource skos:topConceptOf|^skos:hasTopConcept  <http://www.yso.fi/onto/yso/>}
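
As an illustration of handling the 'and' and 'or' cases in a similar fashion, here is a minimal Java sketch. It is not taken from the attached patch: the class TopConceptFilterSketch, the method topConceptFilter() and the requireAll flag are hypothetical, and the 'and' case is assumed to be a plain conjunction of triple patterns, which this thread does not spell out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, NOT the code from semanticturkey2.patch.
final class TopConceptFilterSketch {

    // Property path used in the thread for top-concept filtering.
    static final String TOP_CONCEPT_PATH = "skos:topConceptOf|^skos:hasTopConcept";

    // requireAll = true  -> assumed 'and' case: every scheme must match.
    // requireAll = false -> 'or' case: UNION of one group per scheme.
    static String topConceptFilter(String resourceVar, List<String> schemeIris,
                                   boolean requireAll) {
        if (requireAll) {
            return schemeIris.stream()
                    .map(iri -> resourceVar + " " + TOP_CONCEPT_PATH + " <" + iri + "> .")
                    .collect(Collectors.joining(" "));
        }
        return schemeIris.stream()
                .map(iri -> "{" + resourceVar + " " + TOP_CONCEPT_PATH + " <" + iri + ">}")
                .collect(Collectors.joining(" UNION "));
    }
}
```

Both branches iterate over the same list and differ only in how the per-scheme patterns are wrapped and joined, which is the kind of symmetry the consistency argument above is about.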


I will attach a patch to this message and a test build of VocBench/Semantic Turkey can be downloaded from here:
http://www.tsk.fi/tiedostot/vocbench3-10.1.1-full-sparqlfix2.zip

As previously, it is based on VB/ST 10.1.1 release and will overwrite this version if extracted to the same folder.


I'm happy to answer any comments or questions you may have.

Kind regards,  Saku Seppälä

PS.

As a side note, there seems to me to be a slight logical inconsistency in the SPARQL query. The skos:inScheme filtering uses the direct SKOS property and any of its subproperties, but the skos:topConceptOf|^skos:hasTopConcept filtering uses only these direct SKOS properties and not their subproperties. I did not change this, because the change would not be completely transparent and interchangeable with the previous implementation, and it might need somewhat larger code changes.

Also, while looking at how SPARQL variable names are passed to the filterAndOrScheme method, I wondered whether it would be a good idea to create a mechanism that generates unique SPARQL variable names dynamically (an ID-number generator). That is, when a method wants a unique local SPARQL variable that does not clash with any other variable, it could create one itself, and variable names would not have to be provided in the method call. Only the variable names needed to connect the snippet to the rest of the SPARQL query would be passed in.
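
A minimal Java sketch of such an ID-number generator could look like the following. The class SparqlVarNamer and its fresh() method are hypothetical and do not exist in Semantic Turkey; the sketch assumes one instance is created per query and that caller-supplied variable names never use the same hint_number pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a per-query generator for local SPARQL variable names.
final class SparqlVarNamer {

    // Counter shared by all names produced for one query.
    private final AtomicInteger counter = new AtomicInteger();

    // Returns a new variable name such as "?scheme_1"; the hint only aids readability.
    String fresh(String hint) {
        return "?" + hint + "_" + counter.incrementAndGet();
    }
}
```

A snippet-producing method would then receive the namer plus only the connecting variables (for example ?resource) and call fresh("scheme") for anything local to the snippet.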

semanticturkey2.patch

Andrea Turbati

Jan 3, 2022, 4:04:06 AM
to semantict...@googlegroups.com
Hi Saku Seppälä,
I've tried your version and it now seems to be working fine, and performance is much better when more than one scheme is selected (in both getTopConcepts and getNarrowerConcepts). I'll do some more tests and then integrate your suggestions into the source code. Regarding the use of skos:hasTopConcept you are right; I'll see how to fix this by using the subproperties as well.

Thank you

Best Regards

Andrea



Armando Stellato

Jan 3, 2022, 6:21:30 AM
to semantict...@googlegroups.com

Hi Saku!

 

Thanks from my side, and from the whole ST dev team as well, for the great improvement you brought!

 

Armando
