pipeline hangs

51 views
Skip to first unread message

Marisa DeMeglio

unread,
Feb 20, 2013, 1:20:25 AM2/20/13
to daisy-pip...@googlegroups.com
So, as you have seen (off-list), we got some sample content from APH that hangs the pipeline. Here's what I get when I run this command:

./dp2 nimas-fileset-validator --i-source ~/Downloads/9780618900824NIMAS/9780618900824NIMAS.opf --x-output-dir /tmp/dpout

[DP2] Waiting for the WS to come up
[DP2] The daisy pipeline 2 WS is up!
[DP2] Job with id aabb17f0-c25c-4c64-9ceb-2473916c2e26 submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(4) - Message:(debug) Nimas fileset validator: Validating package document against RNG.

It hangs for a very long time at this point. I have confirmed that it will hang for more than 3 hours with only jetty messages continually appearing in the log. A log file is attached for this job execution (though not the 3-hour version :)

I have another, smaller NIMAS sample that causes it to hang at a different point:

[DP2] Job with id 5dbfd42e-cfa5-4f95-9ab6-09fa6d38897a submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(4) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
[WS] INFO(5) - Message:Nimas fileset validator: Checking that PDFs exist on disk.
[WS] INFO(6) - Message:(debug) Nimas fileset validator: Validating package document; combining reports.
[WS] INFO(7) - Message:(debug) Nimas fileset validator: Validating package document against schematron.
[WS] INFO(8) - Message:Nimas fileset validator: Formatting report as HTML.
[WS] INFO(9) - Message:Nimas fileset validator: Creating reports index.


Compare with another, smaller sample (included in the Nimas fileset validator test dir), which runs just fine:
./dp2 nimas-fileset-validator --i-source ../../../../../pipeline-scripts/nimas-fileset-validator/src/test/MathML_Modular_Extension_Example_2/MathML_Sample2.opf --x-output-dir /tmp/dpout 
[DP2] Job with id 3544229a-2855-4232-96b0-1d918c4f94d1 submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(4) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(5) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(6) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
[WS] INFO(7) - Message:Nimas fileset validator: Checking that PDFs exist on disk.
[WS] INFO(8) - Message:(debug) Nimas fileset validator: Validating package document; combining reports.
[WS] INFO(9) - Message:(debug) Nimas fileset validator: Validating package document against schematron.
[WS] INFO(10) - Message:Nimas fileset validator: Formatting report as HTML.
[WS] INFO(11) - Message:Nimas fileset validator: Creating reports index.
[WS] INFO(12) - Message:Nimas fileset validator: Storing reports to disk in output directory: file:/tmp/dpout
[DP2]  The job 3544229a-2855-4232-96b0-1d918c4f94d1 has been deleted from the server


Also compare with output from running directly in Oxygen, which takes 34 seconds for the bigger sample (9780618900824NIMAS.opf):

Message: Nimas fileset validator: validating files.
Message: Nimas fileset validator: Loading DTBook document.
Message: Nimas fileset validator: Validating DTBook document.
Message: Nimas fileset validator: Validating package document.
Message: (debug) Nimas fileset validator: Validating package document against RNG.
Message: Nimas fileset validator: Checking that PDFs exist on disk.
Message: (debug) Nimas fileset validator: Validating package document; combining reports.
Message: (debug) Nimas fileset validator: Validating package document against schematron.
Message: Nimas fileset validator: Formatting report as HTML.
Message: Nimas fileset validator: Creating reports index.
Message: Nimas fileset validator: Storing reports to disk in output directory: file:/tmp/dpout

Any thoughts? Let me know if you need the samples.

Marisa

daisy-pipeline_hangs.log.zip

Romain Deltour

unread,
Feb 20, 2013, 4:24:30 AM2/20/13
to daisy-pip...@googlegroups.com
Have you tried with the jobs-refactoring branch ?

Or it might be due to a missing entity in the catalogs ?

Romain.


--
You received this message because you are subscribed to the Google Groups "DAISY Pipeline Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to daisy-pipeline-...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


<daisy-pipeline_hangs.log.zip>

Daniel Weck

unread,
Feb 20, 2013, 5:06:37 AM2/20/13
to daisy-pip...@googlegroups.com
fyi (perhaps totally unrelated to marisa's issue): when executed from tobi (.net shell, dp2.exe with command line parameters), the pipeline2 validator often timeouts, like if the ruby wrapper wasn't sending the "end" signal properly when the job is finished (after a predefined timeout duration, we force-interrupt the process, and sometimes we still get a complete, albeit delayed, report from the stderr and stdout buffers).

that's the reason why i want to start using the http rest api directly, but to be fair it adds some unnecessary complexity for one-off validation jobs (because of the in/out data packaging, and the setup process). is there some high-level API or sample code we could use to query the locally-hosted web service for simple validation jobs? (we prefer non-zipped input data located in local folder)

i see some refactoring going on, is this internal or will this impact the command line and/or webservice protocol?

thank you very much, daniel

Romain.


--
You received this message because you are subscribed to the Google Groups "DAISY Pipeline Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to daisy-pipeline-...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


<daisy-pipeline_hangs.log.zip>

Jostein Austvik Jacobsen

unread,
Feb 20, 2013, 5:25:23 AM2/20/13
to daisy-pip...@googlegroups.com
When the framework is running in local mode (used by default for dp2.exe as well as the standalone Web UI distribution), you can pass absolute file URIs to the WS instead of zipping the content.

Jostein

Javier Asensio

unread,
Feb 20, 2013, 7:08:00 AM2/20/13
to daisy-pip...@googlegroups.com
Hi Marisa,

I've been playing around with the file no matter which configuration I set for the jvm in terms of memory I always run out of it [1]. You said that you can execute it in oxygen, you know exactly which version of jing and other relaxng related libraries are you using?

Daniel, as our user base it's pretty much reduced, this kind of feedback is amazingly appreciated, do you have any logs of this stalls occurring when you call dp2.exe through toby?. As you said we are working in a big refactoring. Once it's done, the interaction through the webapi should be easier in uses cases like yours.


cheers,



[1]: Exception in thread "Thread-18" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at net.sf.saxon.tree.util.FastStringBuffer.<init>(FastStringBuffer.java:35)
        at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:206)
        at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:170)
        at net.sf.saxon.tree.tiny.TinyBuilder.open(TinyBuilder.java:116)
        at net.sf.saxon.event.ProxyReceiver.open(ProxyReceiver.java:80)
        at net.sf.saxon.event.ProxyReceiver.open(ProxyReceiver.java:80)
        at com.xmlcalabash.util.TreeWriter.startDocument(Unknown Source)
        at com.xmlcalabash.library.ValidateJing$RNGErrorHandler.error(Unknown Source)
        at com.thaiopensource.xml.sax.CountingErrorHandler.error(CountingErrorHandler.java:65)
        at com.thaiopensource.relaxng.sax.PatternValidator.check(PatternValidator.java:116)
        at com.thaiopensource.relaxng.sax.PatternValidator.startElement(PatternValidator.java:38)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:501)
        at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:179)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:377)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2755)

cheers,

Javi

Marisa DeMeglio

unread,
Feb 20, 2013, 1:38:36 PM2/20/13
to daisy-pip...@googlegroups.com
Here are my library versions, from the Eclipse Oxygen plugin directory:

/Applications/eclipse/plugins/com.oxygenxml.editor_14.1.0.v2012121012/lib

Jing version 20091111
Trang version 20091111
calabash 1.0.3-94
SAXON 6.5.5 from Michael Kay
Saxon-EE 9.4.0.4J from Saxonica

I would expect Calabash uses Saxon 9x.

I don't see any out of memory exceptions in any of the logs I've saved. I'm running from this dir, using all the defaults from a maven build:
/Users/marisa/Projects/pipeline2/pipeline-assembly/target/pipeline2-1.4/daisy-pipeline/cli

Did you try it with the jobs-refactor branch?

Thanks
Marisa

Javier Asensio

unread,
Feb 20, 2013, 2:08:48 PM2/20/13
to daisy-pip...@googlegroups.com
Nope, I tried with the master. I'll try with the 1.4. Sorry about yesterday the with got busier than I thought.

cheers,

Javi

Marisa DeMeglio

unread,
Feb 20, 2013, 5:50:20 PM2/20/13
to daisy-pip...@googlegroups.com
No problem :) So where do you see the memory errors? I don't see them.

I have tried a few different combinations, and I'm noticing some differences. I'll try to summarize:

Win7, pipeline2-1.4 + new modules, first run, smaller NIMAS sample = works

Win7, pipeline2-1.4 + new modules, second run, smaller NIMAS sample = doesn't work

Mac, pipeline2-1.4 + new modules, first run, smaller NIMAS sample = doesn't work

Mac, current master branch, first run, smaller NIMAS sample = doesn't work

Where


first run vs second run = the WS is brought up on the first run; on the second run, it is already up

works = produces results in an output directory

doesn't work = hangs

smaller NIMAS sample = 0547647034NIMAS; let me know if you need it.

Marisa

Marisa DeMeglio

unread,
Feb 26, 2013, 3:22:52 PM2/26/13
to daisy-pip...@googlegroups.com
So I'm trying with the jobs-refactor branch of pipeline-framework. Here are my results (all on Mac). FYI, the pipeline was started from scratch for each test:

Small job:

./dp2 nimas-fileset-validator --i-source ../../../../../pipeline-scripts/nimas-fileset-validator/src/test/MathML_Modular_Extension_Example_2/MathML_Sample2.opf --x-output-dir /tmp/dpout 
[DP2] Waiting for the WS to come up
[DP2] The daisy pipeline 2 WS is up!
[DP2] Job with id 8512c1e4-46fb-449b-9d73-141daff96c7b submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(4) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(5) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(6) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
[WS] INFO(7) - Message:Nimas fileset validator: Checking that PDFs exist on disk.
[WS] INFO(8) - Message:(debug) Nimas fileset validator: Validating package document; combining reports.
[WS] INFO(9) - Message:(debug) Nimas fileset validator: Validating package document against schematron.
[WS] INFO(10) - Message:Nimas fileset validator: Formatting report as HTML.
[WS] INFO(11) - Message:Nimas fileset validator: Creating reports index.
[WS] INFO(12) - Message:Nimas fileset validator: Storing reports to disk in output directory: file:/tmp/dpout
[DP2] ERROR

Log, right before it reports an error:

2013-02-26 11:18:26,170 [ERROR] org.daisy.pipeline.job.Job - job finished with error state
java.lang.IllegalArgumentException: URI is not absolute
at java.net.URI.toURL(URI.java:1095) ~[na:1.7.0_09]
at org.daisy.common.xproc.calabash.CalabashXProcResult.writeTo(CalabashXProcResult.java:98) ~[na:na]
at org.daisy.pipeline.job.AbstractJobContext.writeResult(AbstractJobContext.java:191) ~[na:na]
at org.daisy.pipeline.persistence.jobs.PersistentJobContext.writeResult(PersistentJobContext.java:217) ~[na:na]
at org.daisy.pipeline.job.Job.run(Job.java:125) ~[na:na]
at org.daisy.pipeline.job.DefaultJobExecutionService$1.run(DefaultJobExecutionService.java:65) ~[na:na]
at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_09]


However, even though there was an error, the correct output was saved in /tmp/dpout.


Medium job (log attached):

/dp2 nimas-fileset-validator --i-source ~/Downloads/0547647034NIMAS/0547647034NIMAS.opf --x-output-dir /tmp/dpout
[DP2] Waiting for the WS to come up
[DP2] The daisy pipeline 2 WS is up!
[DP2] Job with id 49a80867-8e5e-49bf-80ee-66857fc03073 submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(4) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
[WS] INFO(5) - Message:Nimas fileset validator: Checking that PDFs exist on disk.
[WS] INFO(6) - Message:(debug) Nimas fileset validator: Validating package document; combining reports.
[WS] INFO(7) - Message:(debug) Nimas fileset validator: Validating package document against schematron.
[WS] INFO(8) - Message:Nimas fileset validator: Formatting report as HTML.
[WS] INFO(9) - Message:Nimas fileset validator: Creating reports index.

It got to the last step in Nimas fileset validator and then it just hung there for a minute (then I killed it with ctrl + C). The log shows repeating log messages from "org.eclipse.jetty.http.HttpParser".


Large job (log attached):

./dp2 nimas-fileset-validator --i-source ~/Downloads/9780618900824NIMAS/9780618900824NIMAS.opf --x-output-dir /tmp/dpout[DP2] 
[DP2] Waiting for the WS to come up
[DP2] The daisy pipeline 2 WS is up!
[DP2] Job with id d80b04c8-dd17-4fa6-a8dd-21872256fa31 submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(4) - Message:(debug) Nimas fileset validator: Validating package document against RNG.

For this one, it got to an intermediate step and then hung there for several minutes before I killed it. You can see the repeating messages again in the log.

Marisa

large-job-daisy-pipeline.log
medium-job-daisy-pipeline.log

Javier Asensio

unread,
Feb 26, 2013, 3:41:59 PM2/26/13
to daisy-pip...@googlegroups.com
Hi marisa,
I had it a go this morning, I'm still with the nice out of mem error.  I'm about to test it on windows. and see if I got the mem error or not. The only difference between your execution with oxygen and the pipeline seems to be saxon (oxygen uses ee and we use he), ee is suposed to be faster, but here the difference is just too big (I guess). Any of the files are querying for dtd's not included in the module? I'll try to get back to you tomorrow

cheers,

Javi



cheers,

Javi



cheers,

Javi


Romain.


./dp2 nimas-fileset-validator --i-source ~/Downloads/9780618900824NIMAS/9780618900824NIMAS.opf --x-output-dir /tmp/dpout

[DP2] Waiting for the WS to come up
[DP2] The daisy pipeline 2 WS is up!
[DP2] Job with id aabb17f0-c25c-4c64-9ceb-2473916c2e26 submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(4) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
It hangs for a very long time at this point. I have confirmed that it will hang for more than 3 hours with only jetty messages continually appearing in the log. A log file is attached for this job execution (though not the 3-hour version :)

I have another, smaller NIMAS sample that causes it to hang at a different point:

[DP2] Job with id 5dbfd42e-cfa5-4f95-9ab6-09fa6d38897a submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(4) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
[WS] INFO(5) - Message:Nimas fileset validator: Checking that PDFs exist on disk.
[WS] INFO(6) - Message:(debug) Nimas fileset validator: Validating package document; combining reports.
[WS] INFO(7) - Message:(debug) Nimas fileset validator: Validating package document against schematron.
[WS] INFO(8) - Message:Nimas fileset validator: Formatting report as HTML.
[WS] INFO(9) - Message:Nimas fileset validator: Creating reports index.
Compare with another, smaller sample (included in the Nimas fileset validator test dir), which runs just fine:
./dp2 nimas-fileset-validator --i-source ../../../../../pipeline-scripts/nimas-fileset-validator/src/test/MathML_Modular_Extension_Example_2/MathML_Sample2.opf --x-output-dir /tmp/dpout 
[DP2] Job with id 3544229a-2855-4232-96b0-1d918c4f94d1 submitted to the server
[WS] INFO(1) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(2) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(3) - Message:Nimas fileset validator: Loading DTBook document.
[WS] INFO(4) - Message:Nimas fileset validator: Validating DTBook document.
[WS] INFO(5) - Message:Nimas fileset validator: Validating package document.
[WS] INFO(6) - Message:(debug) Nimas fileset validator: Validating package document against RNG.
[WS] INFO(7) - Message:Nimas fileset validator: Checking that PDFs exist on disk.
[WS] INFO(8) - Message:(debug) Nimas fileset validator: Validating package document; combining reports.
[WS] INFO(9) - Message:(debug) Nimas fileset validator: Validating package document against schematron.
[WS] INFO(10) - Message:Nimas fileset validator: Formatting report as HTML.
[WS] INFO(11) - Message:Nimas fileset validator: Creating reports index.
[WS] INFO(12) - Message:Nimas fileset validator: Storing reports to disk in output directory: file:/tmp/dpout
[DP2]  The job 3544229a-2855-4232-96b0-1d918c4f

Marisa DeMeglio

unread,
Feb 26, 2013, 4:00:25 PM2/26/13
to daisy-pip...@googlegroups.com
Hi Javi,

The DTBook and package DTDs are in the catalog; those should be the only ones it requires.  Most likely, if it were an error like that, then we'd see issues with the small samples too. In this case, our smallest sample is as complex, content-wise, as the larger ones.

Where are you seeing the out of mem errors? I don't see them in the log or anything. Is there a dev/debug switch I can set?

I will see if I can make Oxygen use Saxon HE instead and see if that makes a difference.

I'll be on Skype tomorrow if you want to chat.

Marisa

Javier Asensio

unread,
Feb 27, 2013, 7:46:10 AM2/27/13
to daisy-pip...@googlegroups.com
Hey,

I'm an idiot, I was wondring why I didn't get any messages from the script and I happened to be using an old version of the module. Anyway, try to start the fwk manually
pipeline2 > /dev/null

The out of memory error doesn't get logged but it displays on the terminal and we get stuck on the same point. The error seems to be thrown when treating some kind of validation error in the document. How big is the err report you get in oxygen from the rng validation? Does this behaviour happen also with saxon-he in oxygen?

cheers

cheers,

Javi

Marisa DeMeglio

unread,
Mar 4, 2013, 3:49:50 PM3/4/13
to daisy-pip...@googlegroups.com
I am experiencing out of memory issues even when just validating the package doc with RNG. 

Just to describe the full NIMAS fileset validation process, here are the main steps:

1. Validate DTBook document(s) using dtbook-validator
2. Validate package document against RNG, Schematron, and also check that any referenced PDFs exist on disk.

I'm testing using the rather large files in 9780618900824NIMAS (let me know if you need a copy and I can share it off-list).

I can run dtbook-validator by itself on the fairly large (2.3MB) DTBook file many times with no memory issues

Then I made a super-simple nimas-fileset-validator, that only runs RNG validation on the OPF file (415k). If I run it twice in a row, I get an out of memory issue the second time.

Here's the RNG file we're using for package docs. I believe it came from ZedVal:

Does it look unusual to anyone? I noticed that the DTBook RNG file(s), which work just fine, do not use schematron inline, so I tried making a light version of the package doc RNG, but, not surprisingly, it still failed.

To try the very simple test yourself, replace pipeline-scripts/nimas-fileset-validator/src/main/resources/xml/nimas-fileset-validator.xml with the attached file. Run as usual:

./dp2 nimas-fileset-validator --i-source path/to/OPF 

Marisa

nimas-fileset-validator.xpl

Marisa DeMeglio

unread,
Mar 4, 2013, 3:52:07 PM3/4/13
to daisy-pip...@googlegroups.com
Oh yeah, here is a bit of the debug output around the out of mem error:

INFO  [    o.d.p.job.DefaultJobExecutionService] Starting to log to job's log file too:af8dd2f4-c65d-467d-a608-9b60d1e0f949   @o.d.p.j.DefaultJobExecutionService$1:62#run
DEBUG [          org.daisy.pipeline.push.Poster] Status changed af8dd2f4-c65d-467d-a608-9b60d1e0f949->RUNNING   @org.daisy.pipeline.push.PushNotifier:102#handleStatus
Mar 04, 2013 12:38:37 PM org.restlet.engine.log.LogFilter afterHandle
INFO: 2013-03-04 12:38:37 0:0:0:0:0:0:0:1 - 0:0:0:0:0:0:0:1 8181 POST /ws/jobs - 201 - 259 23 http://localhost:8181 Ruby -
DEBUG [d.c.x.c.DynamicXProcConfigurationFactory] Reading Calabash configuration from /Users/marisa/Projects/pipeline2/pipeline-assembly/target/pipeline2-1.4/daisy-pipeline/etc/config-calabash.xml   @o.d.c.x.c.DynamicXProcConfigurationFactory:168#loadConfigurationFile
DEBUG [    o.e.jetty.server.AsyncHttpConnection] Enabled read interest SCEP@3cbe09e0{l(/0:0:0:0:0:0:0:1:56101)<->r(/0:0:0:0:0:0:0:1:8181),d=true,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}-{AsyncHttpConnection@607a7047,g=HttpGenerator{s=4,h=0,b=0,c=-1},p=HttpParser{s=0,l=33,c=259},r=16}   @o.e.jetty.server.AsyncHttpConnection:206#reset
DEBUG [       org.eclipse.jetty.http.HttpParser] filled 0/0   @org.eclipse.jetty.http.HttpParser:276#parseNext
DEBUG [d.c.x.c.DynamicXProcConfigurationFactory] Reading Calabash configuration from /Users/marisa/Projects/pipeline2/pipeline-assembly/target/pipeline2-1.4/daisy-pipeline/etc/config-calabash.xml   @o.d.c.x.c.DynamicXProcConfigurationFactory:168#loadConfigurationFile
DEBUG [       org.eclipse.jetty.http.HttpParser] filled 124/124   @org.eclipse.jetty.http.HttpParser:276#parseNext
Mar 04, 2013 12:38:39 PM org.restlet.engine.log.LogFilter afterHandle
INFO: 2013-03-04 12:38:39 0:0:0:0:0:0:0:1 - 0:0:0:0:0:0:0:1 8181 GET /ws/jobs/af8dd2f4-c65d-467d-a608-9b60d1e0f949 msgSeq=0 200 - 0 19 http://localhost:8181 Ruby -
DEBUG [    o.e.jetty.server.AsyncHttpConnection] Enabled read interest SCEP@3cbe09e0{l(/0:0:0:0:0:0:0:1:56101)<->r(/0:0:0:0:0:0:0:1:8181),d=true,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}-{AsyncHttpConnection@607a7047,g=HttpGenerator{s=4,h=0,b=0,c=-1},p=HttpParser{s=0,l=14,c=0},r=17}   @o.e.jetty.server.AsyncHttpConnection:206#reset
DEBUG [       org.eclipse.jetty.http.HttpParser] filled 0/0   @org.eclipse.jetty.http.HttpParser:276#parseNext
DEBUG [       org.eclipse.jetty.http.HttpParser] filled 124/124   @org.eclipse.jetty.http.HttpParser:276#parseNext
Mar 04, 2013 12:38:47 PM org.restlet.engine.log.LogFilter afterHandle
INFO: 2013-03-04 12:38:47 0:0:0:0:0:0:0:1 - 0:0:0:0:0:0:0:1 8181 GET /ws/jobs/af8dd2f4-c65d-467d-a608-9b60d1e0f949 msgSeq=0 200 - 0 6156 http://localhost:8181 Ruby -
DEBUG [    o.e.jetty.server.AsyncHttpConnection] Enabled read interest SCEP@3cbe09e0{l(/0:0:0:0:0:0:0:1:56101)<->r(/0:0:0:0:0:0:0:1:8181),d=true,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}-{AsyncHttpConnection@607a7047,g=HttpGenerator{s=4,h=0,b=0,c=-1},p=HttpParser{s=0,l=14,c=0},r=18}   @o.e.jetty.server.AsyncHttpConnection:206#reset
DEBUG [       org.eclipse.jetty.http.HttpParser] filled 0/0   @org.eclipse.jetty.http.HttpParser:276#parseNext
Exception in thread "Thread-20" java.lang.OutOfMemoryError: Java heap space
at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:198)
at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:170)
at net.sf.saxon.tree.tiny.TinyBuilder.open(TinyBuilder.java:116)
at net.sf.saxon.event.ProxyReceiver.open(ProxyReceiver.java:80)
at net.sf.saxon.event.ProxyReceiver.open(ProxyReceiver.java:80)
at com.xmlcalabash.util.TreeWriter.startDocument(Unknown Source)
at com.xmlcalabash.library.ValidateJing$RNGErrorHandler.error(Unknown Source)
at com.thaiopensource.xml.sax.CountingErrorHandler.error(CountingErrorHandler.java:65)
at com.thaiopensource.relaxng.sax.PatternValidator.check(PatternValidator.java:116)
at com.thaiopensource.relaxng.sax.PatternValidator.startElement(PatternValidator.java:38)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:506)
at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:353)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2717)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at com.thaiopensource.validate.ValidationDriver.validate(ValidationDriver.java:160)
at com.xmlcalabash.library.ValidateJing.run(Unknown Source)
at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source)
at com.xmlcalabash.runtime.XTry.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source)
at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source)

<nimas-fileset-validator.xpl>

Marisa DeMeglio

unread,
Mar 6, 2013, 8:59:56 PM3/6/13
to daisy-pip...@googlegroups.com
Unfortunately the out of memory error occurs using vanilla calabash too:

XML Calabash version 1.0.8 (for Saxon 9.4.x), an XProc processor.
Running on Saxon version 9.4.0.7, HE edition.


➜ ~/Projects  calabash -isource=/Users/marisa/Downloads/9780618900824NIMAS/9780618900824NIMAS.opf -oresult=/tmp/xprocout/result.html /Users/marisa/Projects/pipeline2/pipeline-scripts/nimas-fileset-validator/src/main/resources/xml/nimas-fileset-validator.xpl
Mar 06, 2013 5:53:46 PM com.xmlcalabash.drivers.Main warning
WARNING: You didn't specify any binding for the output port 'dtbook-validation-report', its output will be discard.
Mar 06, 2013 5:53:46 PM com.xmlcalabash.drivers.Main warning
WARNING: You didn't specify any binding for the output port 'reports-index', its output will be discard.
Message: Nimas fileset validator: validating files.
Message: Nimas fileset validator: Loading DTBook document.
Message: Nimas fileset validator: Validating DTBook document.
Message: Nimas fileset validator: Validating package document.
Message: (debug) Nimas fileset validator: Validating package document against RNG.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:191)
at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:170)
at net.sf.saxon.tree.tiny.TinyBuilder.open(TinyBuilder.java:116)
at net.sf.saxon.event.ProxyReceiver.open(ProxyReceiver.java:80)
at net.sf.saxon.event.ProxyReceiver.open(ProxyReceiver.java:80)
at com.xmlcalabash.util.TreeWriter.startDocument(Unknown Source)
at com.xmlcalabash.library.ValidateJing$RNGErrorHandler.error(Unknown Source)
at com.thaiopensource.xml.sax.CountingErrorHandler.error(CountingErrorHandler.java:65)
at com.thaiopensource.relaxng.sax.PatternValidator.check(PatternValidator.java:116)
at com.thaiopensource.relaxng.sax.PatternValidator.startElement(PatternValidator.java:32)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:506)
at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:353)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2717)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at com.thaiopensource.validate.ValidationDriver.validate(ValidationDriver.java:160)
at com.xmlcalabash.library.ValidateJing.run(Unknown Source)
at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source)
at com.xmlcalabash.runtime.XTry.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source)
at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source)
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)

Marisa DeMeglio

unread,
Mar 6, 2013, 9:30:23 PM3/6/13
to daisy-pip...@googlegroups.com
Ok! If I set the heap to 1.6GB, it works :)  E.g. java -Xmx1600m <calabash stuff> . 
What does this mean for us? Is there a way to dynamically allocate heap space depending on content size? Assuming we could even draw a correlation between the two -- like I said before, a 2.3 MB DTBook presents no problems but a 400K package doc does.

Romain Deltour

unread,
Mar 7, 2013, 3:52:48 AM3/7/13
to daisy-pip...@googlegroups.com
Wow, that's pretty bad. We can definitely increase the default max heap space value, but 1.6GB seems like a very high value for a default. And a 400k document doesn't sound terrible in itself, it shouldn't make the validation choke.

We should have yet another look at the RelaxNG to see if there is a special construct there that could cause the issues, and else we should also try to profile the execution to identify a potential memory leak. It seems we have three possible candidates: Calabash, Jing and Saxon.

Romain.

Jostein Austvik Jacobsen

unread,
Mar 7, 2013, 3:59:18 AM3/7/13
to daisy-pip...@googlegroups.com
The next version of Calabash will support profiling of XProc scripts[1]. It could be very useful to identify bottlenecks in our pipelines.

Romain Deltour

unread,
Mar 7, 2013, 5:46:52 AM3/7/13
to daisy-pip...@googlegroups.com
I had a further look.  It's running out of memory while creating the error report document in the ValidateJing implementation of p:validate-with-relax-ng… there are 4888 items in the OPF and almost all of them have a bad media-type.

I now have to look if there's a mem leak somewhere or if we've just reached a legitimate limit, in which case we can set a threshold of the number of reported errors (as a patch to Calabash).

R.

Romain Deltour

unread,
Mar 7, 2013, 6:38:55 AM3/7/13
to daisy-pip...@googlegroups.com
FYI I created a simple repro:


The limit seems to be around 3000 errors on my setup (latest Calabash, ran from sources in Eclipse).

Romain.

Romain Deltour

unread,
Mar 7, 2013, 6:44:54 AM3/7/13
to daisy-pip...@googlegroups.com
And I reported the issue to Calabash:

R.

Keith Creasy

unread,
Mar 7, 2013, 7:07:24 AM3/7/13
to daisy-pip...@googlegroups.com

Just FYI. The Pipeline 1 has the same issue. It simply fails to create a complete error report after it hits a certain number of errors. I’d guess this is a limitation of jing. Would there be any way of truncating the error list if you hit many errors of the same type? Such as a failure that simply gives the failure info with an annotation that there are <nnnn> more….

 

Maybe this is too intrinsic to jing and out of your control.

 

Just thoughts.

 

Keith

Greg Kearney

unread,
Mar 7, 2013, 7:15:20 AM3/7/13
to daisy-pip...@googlegroups.com
Would it be possible to write multiple files if the list grows too large. 

Sent from my iPhone

Greg Kearney
Association for the Blind of Western Australia

Romain Deltour

unread,
Mar 7, 2013, 7:51:05 AM3/7/13
to daisy-pip...@googlegroups.com
Right now all of this depends on some code that is out of our control. It may be possible to write our own custom validation step that can be configured with a threshold value, but I'd first want to make sure the issue is not just a mere memory leak.

Romain.

Romain Deltour

unread,
Mar 7, 2013, 8:33:38 AM3/7/13
to daisy-pip...@googlegroups.com
OK, so I've identified and fixed a mem leak in Calabash [1]. Now it passes the RelaxNG validation but runs out of memory at the schematron stage:

$ calabash -i source=9780618900824NIMAS/9780618900824NIMAS.opf ../../modules/scripts/nimas-fileset-validator/src/main/resources/xml/nimas-fileset-validator.xpl 
Mar 7, 2013 2:29:55 PM com.xmlcalabash.drivers.Main warning
WARNING: You didn't specify any binding for the output port 'package-doc-validation-report', its output will be discard.
Mar 7, 2013 2:29:55 PM com.xmlcalabash.drivers.Main warning
WARNING: You didn't specify any binding for the output port 'dtbook-validation-report', its output will be discard.
Mar 7, 2013 2:29:55 PM com.xmlcalabash.drivers.Main warning
WARNING: You didn't specify any binding for the output port 'reports-index', its output will be discard.
Message: Nimas fileset validator: validating files.
Message: Nimas fileset validator: Loading DTBook document.
Message: Nimas fileset validator: Validating DTBook document.
Message: Nimas fileset validator: Validating package document.
Message: (debug) Nimas fileset validator: Validating package document against RNG.
Message: Nimas fileset validator: Checking that PDFs exist on disk.
Message: (debug) Nimas fileset validator: Validating package document; combining reports.
Message: (debug) Nimas fileset validator: Validating package document against schematron.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at java.io.StringWriter.write(StringWriter.java:77)
at net.sf.saxon.tree.tiny.CharSlice.write(CharSlice.java:217)
at net.sf.saxon.serialize.XMLEmitter.writeCharSequence(XMLEmitter.java:637)
at net.sf.saxon.serialize.XMLEmitter.writeEscape(XMLEmitter.java:728)
at net.sf.saxon.serialize.XMLEmitter.characters(XMLEmitter.java:573)
at net.sf.saxon.event.ProxyReceiver.characters(ProxyReceiver.java:186)
at net.sf.saxon.event.ComplexContentOutputter.characters(ComplexContentOutputter.java:169)
at net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:360)
at net.sf.saxon.event.ComplexContentOutputter.append(ComplexContentOutputter.java:525)
at net.sf.saxon.event.ComplexContentOutputter.append(ComplexContentOutputter.java:514)
at net.sf.saxon.expr.Expression.process(Expression.java:527)
at net.sf.saxon.query.XQueryExpression.run(XQueryExpression.java:414)
at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:323)
at com.xmlcalabash.util.S9apiUtils.serialize(Unknown Source)
at com.xmlcalabash.util.S9apiUtils.serialize(Unknown Source)
at com.xmlcalabash.util.S9apiUtils.xdmToInputSource(Unknown Source)
at com.xmlcalabash.library.ValidateJing.run(Unknown Source)
at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source)
at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source)
at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source)
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source)
at com.xmlcalabash.runtime.XChoose.run(Unknown Source)
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source)
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)

Romain.

Romain Deltour

unread,
Mar 7, 2013, 11:47:05 AM3/7/13
to daisy-pip...@googlegroups.com
The validation now works for me!

I've identified another memory performance issue in Calabash, in the S9apiUtils#xdmToInputSource method. I've committed another pull request to improve it.

I integrated the two pull requests in my own clone of Calabash and deployed a 1.0.9-SNAPSHOT version to our snapshot repo. You can give it a try by updating the version of calabash in the assembly POM.

Cheers,
Romain.
Reply all
Reply to author
Forward
0 new messages