SM Script Runs slower and slower and slower


Tim Smith

Mar 16, 2012, 3:35:08 PM
to topbrai...@googlegroups.com
Hi,

I'm attempting to process ~250 XML files into RDF.  I created a schema for the files using XMLSpy and imported the schema into TBC using the XSD importer.  This created two .ttl files.

I created an SM script that iterates over the files using tops:files via a bind by select module.  Prior to the Bind by Select, I import the schema ontologies and my target ontology.  In the body, I import each XML file, convert it to RDF and then run a series of CONSTRUCT queries to map each file into the target ontology.  The combination of all triples generated is then saved to disk.

The script works fine if I only run through a small number of files.  However, if I try to hit all 250 at once, it just runs slower and slower and slower...  The slow part seems to be the CONSTRUCT queries.  They run fast initially but slow significantly after 10-20 files.  For every file that I have manually tested by running the CONSTRUCT query in the SPARQL view, the query has always run very fast so I do not know why performance is so poor running as an SM script.

Any suggestions?  Are there things I can do to speed this along?  Is there data that I can collect to better inform you?

My current workaround is to process each directory individually, but even that hits the problem because some directories have tens of files (not to mention the obvious hassle of changing the script for each directory: file names, base URIs, etc.).

I'm using 3.6B on win7/64 with 5G allocated to the JVM.

Thanks,

Tim

Tim Smith

Mar 16, 2012, 3:44:13 PM
to topbrai...@googlegroups.com
One small correction - I'm using an Iterate Over Select module, not Bind by Select to process each file.

Thanks,

Tim

Gokhan Soydan

Mar 16, 2012, 3:46:37 PM
to topbrai...@googlegroups.com
Tim,

If you are using an sml:IterateWhile module, then at the end of each
iteration, an sml:ApplyConstruct module with the query:

CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }

and with the "sml:replace" value set to "true" might be useful. This
flattens the increasingly deeply nested Jena graph objects by copying their
triples into a single graph object.
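
A minimal sketch of the flattening idea, written in Python rather than against Jena. It assumes each loop iteration wraps the previous graph in a new layer, so reads have to walk every layer. `NestedGraph` and `flatten` are hypothetical names for illustration, not TopBraid or Jena APIs.

```python
class NestedGraph:
    """Toy stand-in for a graph that wraps an earlier graph plus new triples."""

    def __init__(self, triples, parent=None):
        self.triples = set(triples)
        self.parent = parent

    def depth(self):
        """Number of layers a lookup has to walk through."""
        layers, graph = 1, self
        while graph.parent is not None:
            layers += 1
            graph = graph.parent
        return layers

    def find(self):
        """Yield every triple by walking the whole chain of wrapped graphs."""
        graph = self
        while graph is not None:
            yield from graph.triples
            graph = graph.parent


def flatten(graph):
    """Copy all triples into one single-layer graph (the 'replace' step)."""
    return NestedGraph(set(graph.find()))


# Simulate 20 loop iterations, each adding one layer (an assumed behavior).
graph = NestedGraph({("s0", "p", "o0")})
for i in range(1, 20):
    graph = NestedGraph({(f"s{i}", "p", f"o{i}")}, parent=graph)

flat = flatten(graph)
print(graph.depth(), flat.depth())
```

The flattened graph holds the same triples but in a single layer, so later queries no longer pay for every earlier iteration.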

If you are using sml:IterateOverSelect, you probably won't need this
module, because it flattens the nested graph objects in each loop
automatically, but you can try.

You may also try this module in other places - for example just before
entering the loop.

Gokhan

Gokhan Soydan

Mar 16, 2012, 3:47:50 PM
to topbrai...@googlegroups.com
Ahh, I saw your message after I sent mine, but my suggestion would still be valid.

Gokhan

Tim Smith

Mar 16, 2012, 4:31:40 PM
to topbrai...@googlegroups.com
Hi Gokhan,

Per your suggestion, I added the Apply Construct module to the end of the body.  It's running faster overall, but I'm still seeing the slowdown.

In addition, I've been trying different directories, and the Convert XML to RDF module is crashing (stack trace below) on one of the XML instance files.  Unfortunately, I cannot tell which one, even when running in Debug mode.  The file name is in a variable bound from the Iterate module, but I do not know how to display it on the console as the script executes.  Since there are so many files, I don't really want to set a breakpoint and manually step through until it crashes.

Is there a way to display the bound variables as the script executes?

Thanks,

Tim

java.lang.reflect.InvocationTargetException
    at org.topbraidcomposer.sparqlmotion.actions.AbstractExecuteSPARQLMotionAction$1.run(AbstractExecuteSPARQLMotionAction.java:148)
    at org.topbraidcomposer.core.util.ThreadUtil$1$1.run(ThreadUtil.java:64)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.topbraid.spin.sparqlmotion.modules.SMException: Failed to convert XML file using Semantic XML
    at org.topbraid.spin.sparqlmotion.lib.internal.ConvertXMLToRDFModule.createGraph(ConvertXMLToRDFModule.java:53)
    at org.topbraid.spin.sparqlmotion.modules.AbstractSMModule.getRDFOutput(AbstractSMModule.java:849)
    at org.topbraid.spin.sparqlmotion.engine.impl.ExecutionEngineImpl.executeModule(ExecutionEngineImpl.java:175)
    at org.topbraid.spin.sparqlmotion.engine.impl.ExecutionEngineImpl.execute(ExecutionEngineImpl.java:120)
    at org.topbraid.spin.sparqlmotion.modules.AbstractSMModule.executeSubScript(AbstractSMModule.java:292)
    at org.topbraid.spin.sparqlmotion.lib.internal.IterateOverSelectModule.access$0(IterateOverSelectModule.java:1)
    at org.topbraid.spin.sparqlmotion.lib.internal.IterateOverSelectModule$1.run(IterateOverSelectModule.java:175)
    ... 1 more
Caused by: java.util.ConcurrentModificationException
    at com.hp.hpl.jena.mem.HashCommon$BasicKeyIterator.hasNext(HashCommon.java:338)
    at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:87)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
    at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:43)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
    at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
    at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:43)
    at com.hp.hpl.jena.graph.compose.CompositionBase$2.hasNext(CompositionBase.java:99)
    at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
    at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
    at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:43)
    at com.hp.hpl.jena.graph.compose.CompositionBase$2.hasNext(CompositionBase.java:99)
    at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
    at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
    at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
    at com.hp.hpl.jena.graph.query.SimpleQueryHandler.subjectsFor(SimpleQueryHandler.java:61)
    at com.hp.hpl.jena.graph.query.SimpleQueryHandler.subjectsFor(SimpleQueryHandler.java:44)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.listSubjectsFor(ModelCom.java:1019)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.listResourcesWithProperty(ModelCom.java:1033)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.listSubjectsWithProperty(ModelCom.java:433)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getExistingURISubject(XML2RDF.java:593)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getExistingURISubject(XML2RDF.java:588)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getAnnotatedElementClass(XML2RDF.java:351)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getElementType(XML2RDF.java:504)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:141)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
    at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createDocument(XML2RDF.java:119)
    at org.topbraid.sxml.mapping.XML2RDFLoader.load(XML2RDFLoader.java:77)
    at org.topbraid.sparqlmotion.lib.convertXMLToRDF.ConvertXMLToRDFModule.load(ConvertXMLToRDFModule.java:24)
    at org.topbraid.spin.sparqlmotion.lib.internal.ConvertXMLToRDFModule.createGraph(ConvertXMLToRDFModule.java:50)
    ... 7 more

Tim Smith

Mar 16, 2012, 5:10:50 PM
to topbrai...@googlegroups.com
More investigation into the Convert XML to RDF module crashing.

This only seems to occur when running multiple threads.  As long as I only use one thread it will process the files correctly, albeit slowly.

I guess this is one of the things to watch out for when using multiple threads!
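
For readers unfamiliar with this kind of failure: the trace above ends in a ConcurrentModificationException raised while iterating a Jena graph, which is consistent with one thread reading a collection while another modifies it (that reading of the trace is an assumption, not something confirmed in this thread). A single-threaded Python analogue of the same failure, and of the snapshot workaround, might look like this. The function names are hypothetical.

```python
def unsafe_scan(index):
    """Adds to the set while iterating it; CPython raises RuntimeError."""
    for subject in index:
        index.add(subject + "-copy")


def safe_scan(index):
    """Iterates over a snapshot, so additions do not disturb the loop."""
    for subject in list(index):
        index.add(subject + "-copy")


subjects = {"urn:a", "urn:b"}

try:
    unsafe_scan(set(subjects))
    unsafe_failed = False
except RuntimeError:
    # Python's counterpart to Java's fail-fast iterator exception.
    unsafe_failed = True

safe_result = set(subjects)
safe_scan(safe_result)
print(unsafe_failed, sorted(safe_result))
```

With real threads the failure is intermittent rather than guaranteed, which would fit a crash that only shows up on some files.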

Tim

Scott Henninger

Mar 16, 2012, 5:15:58 PM
to TopBraid Suite Users
Focusing just on this part:

> Is there a way to display the bound variables as the script executes?

...you may find smf:trace to be useful.
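
The general idea behind tracing here is to log the bound file name before each conversion so that the last name printed identifies the failing file. The sketch below shows that pattern in plain Python; it is not SPARQLMotion code, and `convert` and `process_all` are hypothetical stand-ins.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("xml2rdf")


def convert(path):
    """Hypothetical converter; fails on one file to mimic the crash."""
    if path.endswith("bad.xml"):
        raise ValueError("Failed to convert XML file")
    return f"graph for {path}"


def process_all(paths):
    """Convert each file, tracing its name first and recording failures."""
    converted, failed = [], []
    for path in paths:
        log.info("processing %s", path)  # trace the bound value before use
        try:
            converted.append(convert(path))
        except ValueError as err:
            log.error("conversion failed for %s: %s", path, err)
            failed.append(path)
    return converted, failed


converted, failed = process_all(["a.xml", "bad.xml", "c.xml"])
```

Recording failures instead of stopping also lets the remaining files finish in the same run.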

-- Scott

Gokhan Soydan

Mar 16, 2012, 7:07:44 PM
to topbrai...@googlegroups.com
Tim,

You may want to try assigning a different sml:baseURI value to the sml:ConvertXMLToRDF module for each iteration.
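
One way to get a distinct base URI per file is to derive it from the file's path. The sketch below is plain Python with a hypothetical `base_uri_for` helper and a placeholder `http://example.org/xml/` root; presumably the point is to keep resources from different files from colliding, though that motivation is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def base_uri_for(xml_path, root="http://example.org/xml/"):
    """Build a distinct base URI from the file's path (hypothetical scheme)."""
    relative = PurePosixPath(xml_path).with_suffix("")
    # quote() keeps "/" by default and escapes characters unsafe in URIs.
    return root + quote(str(relative)) + "#"


uris = [base_uri_for(p) for p in ["dir1/a.xml", "dir1/b.xml", "dir2/a.xml"]]
print(uris)
```

Including the directory in the URI keeps files with the same name in different directories apart.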

Gokhan