[SHACL API] Caching & Root shapes

Matthias

Jul 9, 2020, 5:52:40 PM
to TopBraid Suite Users
Hello,
I am trying to validate my model with several shapes, but I want to use only one SHACL rule per validation in order to distribute the work in a cluster. I noticed that validating the model with multiple shapes at once is significantly faster than running a separate validation for each shape. For instance, if I call the validator once with a shapes model comprising three shapes, it takes 52 seconds, while calling the validator three times with one shape each takes 197 seconds in total.

I profiled the execution and it seems that most of the time (73.8%) is spent executing org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(), called from the statement iterator in org.topbraid.jenax.util.JenaUtil.hasSuperClass(). I don't understand why validating with multiple shapes at once yields such a significant speedup.

I am aware of org.topbraid.shacl.validation.ClassesCache (used in org.topbraid.shacl.validation.ValidationEngine) but I am not sure whether this is the reason for it. Are there any other caches implemented that could yield these performance boosts?

Also, I noticed that the collection rootShapes in org.topbraid.shacl.validation.ValidationEngine contains 51 items, which are not directly included in my shapes file (e.g., "sh:MinExclusiveConstraintComponent", "dash:NonRecursiveConstraintComponent", "dash:QueryTestCase", "sh:MaxInclusiveConstraintComponent"). I don't think that these shapes are executed on each validation, as they have no focus nodes, but where do they come from?

Any help would be greatly appreciated.

Holger Knublauch

Jul 9, 2020, 8:11:48 PM
to topbrai...@googlegroups.com

Hi Matthias,

On 10/07/2020 07:43, Matthias wrote:
> Hello,
> I am trying to validate my model with several shapes, but I want to use only one SHACL rule per validation in order to distribute the work in a cluster. I noticed that validating the model with multiple shapes at once is significantly faster than running a separate validation for each shape. For instance, if I call the validator once with a shapes model comprising three shapes, it takes 52 seconds, while calling the validator three times with one shape each takes 197 seconds in total.
>
> I profiled the execution and it seems that most of the time (73.8%) is spent executing org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(), called from the statement iterator in org.topbraid.jenax.util.JenaUtil.hasSuperClass(). I don't understand why validating with multiple shapes at once yields such a significant speedup.

It looks like you have rule-based inferencing activated on your data or shapes graph. This should be switched off because SHACL already does (trivial) subClassOf subsumption as part of the standard. If you need the engine to see additional RDFS or rule inferences, it is better to materialize them beforehand.
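To illustrate why materializing beforehand helps, here is a minimal sketch in plain Java. It is a toy model, not the Jena or TopBraid API: class names are plain strings, and `SubclassClosure`, `materialize` and `hasSuperClass` are hypothetical names chosen for this example. The idea is that the transitive subClassOf closure is computed once up front, so every later superclass check is a simple set lookup instead of a fresh reasoner query. With Jena, the analogous step would usually be to copy the statements of the inference model into a plain in-memory model before handing it to the validator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: class names are plain strings, not Jena resources.
class SubclassClosure {

    // Computes, once and up front, all direct and indirect superclasses
    // of every class that appears as a key in directSupers.
    static Map<String, Set<String>> materialize(Map<String, List<String>> directSupers) {
        Map<String, Set<String>> closure = new HashMap<>();
        for (String cls : directSupers.keySet()) {
            Set<String> seen = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>(directSupers.get(cls));
            while (!todo.isEmpty()) {
                String sup = todo.pop();
                // seen.add returns false for already visited classes,
                // which also stops the walk on cyclic hierarchies.
                if (seen.add(sup)) {
                    todo.addAll(directSupers.getOrDefault(sup, List.of()));
                }
            }
            closure.put(cls, seen);
        }
        return closure;
    }

    // After materialization, a superclass check is a plain set lookup.
    static boolean hasSuperClass(Map<String, Set<String>> closure, String cls, String sup) {
        return closure.getOrDefault(cls, Set.of()).contains(sup);
    }

    public static void main(String[] args) {
        Map<String, List<String>> direct = Map.of(
            "Dog", List.of("Mammal"),
            "Mammal", List.of("Animal"));
        Map<String, Set<String>> closure = materialize(direct);
        System.out.println(hasSuperClass(closure, "Dog", "Animal")); // prints true
        System.out.println(hasSuperClass(closure, "Animal", "Dog")); // prints false
    }
}
```

The one-off cost of building the closure is paid before validation starts, which matches the setup described in this thread where each validation run would otherwise trigger the rule engine again.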


> I am aware of org.topbraid.shacl.validation.ClassesCache (used in org.topbraid.shacl.validation.ValidationEngine) but I am not sure whether this is the reason for it. Are there any other caches implemented that could yield these performance boosts?
>
> Also, I noticed that the collection rootShapes in org.topbraid.shacl.validation.ValidationEngine contains 51 items, which are not directly included in my shapes file (e.g., "sh:MinExclusiveConstraintComponent", "dash:NonRecursiveConstraintComponent", "dash:QueryTestCase", "sh:MaxInclusiveConstraintComponent"). I don't think that these shapes are executed on each validation, as they have no focus nodes, but where do they come from?

The engine automatically adds dash.ttl and shacl.ttl as sub-graphs to the shapes graph. They contain those definitions.

HTH
Holger




Matthias

Jul 11, 2020, 1:07:14 PM
to TopBraid Suite Users
Hello Holger,

Thanks a lot for your helpful suggestions! I needed the rule inferences, so I materialized them beforehand and got significant performance improvements.

All the best,
Matthias