Inference 40x faster after TBC restart and save model

Arthur Keen

Oct 18, 2010, 4:35:30 PM
to topbrai...@googlegroups.com
I did some informal testing of SPIN versus Jena rules performance in TopBraid Composer 3.3.2. I created a schema model, an instance data model, and separate models containing equivalent Jena and SPIN rules to infer familial relationships. I used a CONSTRUCT query to create synthetic family data (without saving). I ran tests on 10, 100, and 1000 families (without saving) and collected statistics. For 1000 families, SPIN at 40 s was 5% faster than Jena. I had some issues with Composer, restarted it, and accidentally saved my synthetic data after being prompted. After the restart and reload of the models, the same inferences (SPIN and Jena rules) on 1000 families took less than a second, i.e., a > 40x speedup. I am trying to understand whether this is a TopBraid or Jena phenomenon, what caused the speedup, and how I can leverage it when doing inferences on freshly created information.
 
Thanks
Arthur

Scott Henninger

Oct 19, 2010, 12:02:35 AM
to TopBraid Suite Users
Arthur; I'm afraid there isn't an easy explanation for this. Since
you re-load a file, caching should not be an issue. I take it this is
a repeatable behavior, and that the >40x speedup was only on the data
with 1000 "families"? Would it be possible to share the rules or
some variation thereof to further analyze what may be going on? Also
let us know what the other details are, such as heap space, number of
triples/file size, etc.

-- Scott

Arthur Keen

Oct 19, 2010, 12:13:39 PM
to topbrai...@googlegroups.com
Scott,

It is repeatable. It happened on Friday last week, and I spent yesterday confirming it before raising the issue with you guys, because it seemed too good to be true. I did not try it on the 10- and 100-family models, since the 1000-family model was already hitting 1 second response time. I was expecting an answer like "you have to re-index", "re-balance", or "optimize" since a large number of new triples got added, or that there is something wrong with the way I am creating instances. While demonstrating this to a colleague yesterday, I took the saved 1000-family model and added 100 more families to make 1100, and the inference speed slowed down dramatically. The reset on the unsaved models is also orders of magnitude slower than on the saved/restarted models.

Unfortunately, the test models are on a secure network, else I would send you a copy.  It is a very simple model, so I will reconstruct it and send it to you later today.

FYI, it consists of the following models:
schema.n3  Defines Person, Man, Woman, hasChild, hasFather, hasMother, hasSon, hasDaughter
rulesspin.n3  Imports schema.n3 and defines SPIN rules for inferring hasFather, hasMother, hasSon, hasDaughter from hasChild
rulesjena.n3  Imports schema.n3 and defines Jena rules for inferring hasFather, hasMother, hasSon, hasDaughter from hasChild
rulesswrl.n3  Imports schema.n3 and defines SWRL rules for inferring hasFather, hasMother, hasSon, hasDaughter from hasChild
instances.n3  Imports schema.n3 and defines a SPARQL CONSTRUCT query for generating families consisting of 2 parents (Man/Woman) and 2 children (Man/Woman) using hasChild
instancesspin.n3  Imports instances.n3 and rulesspin.n3
instancesjenarules.n3  Imports instances.n3 and rulesjena.n3
instancesswrlrules.n3  Imports instances.n3 and rulesswrl.n3
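
For readers without the original files, the following is a rough Python sketch of what the models above describe: synthetic families of two parents and two children linked by hasChild, and the four relationships inferred from hasChild. It is not the SPIN, Jena, or SWRL rule text from the thread. Plain tuples stand in for RDF triples, and the function names make_families and infer_relations are hypothetical.

```python
# Illustrative only: plain Python tuples stand in for RDF triples.
# make_families and infer_relations are hypothetical names, not TopBraid APIs.

def make_families(n):
    """Create n synthetic families: 2 parents (Man/Woman), 2 children (Man/Woman)."""
    triples = set()
    for i in range(n):
        father, mother = f"father{i}", f"mother{i}"
        son, daughter = f"son{i}", f"daughter{i}"
        for man in (father, son):
            triples.add((man, "rdf:type", "Man"))
        for woman in (mother, daughter):
            triples.add((woman, "rdf:type", "Woman"))
        for parent in (father, mother):
            for child in (son, daughter):
                triples.add((parent, "hasChild", child))
    return triples


def infer_relations(triples):
    """Derive hasFather, hasMother, hasSon, hasDaughter from hasChild."""
    men = {s for s, p, o in triples if p == "rdf:type" and o == "Man"}
    women = {s for s, p, o in triples if p == "rdf:type" and o == "Woman"}
    inferred = set()
    for parent, predicate, child in triples:
        if predicate != "hasChild":
            continue
        if parent in men:
            inferred.add((child, "hasFather", parent))
        if parent in women:
            inferred.add((child, "hasMother", parent))
        if child in men:
            inferred.add((parent, "hasSon", child))
        if child in women:
            inferred.add((parent, "hasDaughter", child))
    return inferred


if __name__ == "__main__":
    data = make_families(1000)
    print(len(data), "asserted triples,", len(infer_relations(data)), "inferred triples")
```

In this sketch each hasChild triple yields two inferred triples, one from the parent's type and one from the child's type.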

Test procedure
1) Set the iterator to 1000 on the query in instances.n3.
2) Run the SPARQL CONSTRUCT query in instances.n3 for 1000 families.
3) Assert the inferred triples.
4) Run inferences on instancesspin.n3, record the time, hit reset (avg of 3 runs = 38 seconds).
5) Run inferences on instancesjenarules.n3, record the time, hit reset (avg of 3 runs = 41.667 seconds).
6) Run inferences on instancesswrl.n3, record the time, hit reset (avg of 3 runs = 41.667 seconds).
7) Save instances.n3, restart TBC, and repeat 4), 5), and 6): inference time is 1 second.

Also, simply saving instances.n3 and closing and re-opening it has no effect on performance, probably because the imports keep it in memory. I have not tried closing and re-opening all of the models; I think that may have the same effect as restarting TBC.

Scott Henninger

Oct 19, 2010, 2:02:08 PM
to TopBraid Suite Users
Arthur; Yes, having the model and scripts will help us understand the
issue better.

-- Scott

Holger Knublauch

Oct 21, 2010, 12:06:29 AM
to topbrai...@googlegroups.com
Hi Arthur,

Many thanks for sending along your example files (off-list). We had a good look into your scenario and found that the performance loss is entirely caused by the user interface. The TopSPIN inferencing always takes less than a second. But then opening the Inferences view takes a few seconds, and (worse) updating the SPARQL view that is already populated with 2000 rows takes the rest of the time. The latter was caused by a deeper issue in TBC: repainting views is fast as long as the content is from the current model, but gets slower when the rows are from a different model. This happened when you had first executed the SPARQL query on the instances file and then switched to the SPIN file. The SPARQL view was still operating on the old model, with slower labels. For 3.4 I have changed the behavior of the SPARQL view so that it always switches its views to the current model when files have been switched. This significantly improves performance of the scenario you describe. Meanwhile, please close the SPARQL view before switching models, and you should see similar performance gains.

The rest of the performance loss is caused by the fact that TBC always opens an Inferences view after running inferences. I should probably add an option to switch this behavior off, because it's a time sink. Meanwhile you could play with the "Maximum number of instances to display" preference to reduce the workload of that view.
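
The point about wall-clock time mixing inferencing with view updates can be illustrated by timing the two phases separately. The Python sketch below is only an analogy and does not use any TopBraid or Jena API: run_inference and render_view are hypothetical stand-ins for the rule engine and for the repainting of a results view.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_inference(pairs):
    # Hypothetical stand-in for the rule engine: one derived triple per pair.
    return [(child, "hasFather", parent) for parent, child in pairs]


def render_view(rows):
    # Hypothetical stand-in for a view repaint: build one label per row.
    return [" ".join(row) for row in rows]


pairs = [(f"father{i}", f"son{i}") for i in range(1000)]
inferred, inference_s = timed(run_inference, pairs)
labels, render_s = timed(render_view, inferred)

# Reporting the phases separately shows which one dominates the total.
print(f"inference: {inference_s:.6f} s, rendering: {render_s:.6f} s")
```

Measured this way, a slow total with a fast inference phase points at the display step rather than at the rules.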

Thanks a lot for your patience - this was an interesting issue.
Holger