disk I/O error - GBSSeqToTagDBPlugin

Varella, Andrea

May 23, 2016, 4:17:49 PM
to tas...@googlegroups.com

Hi all,

This is my first time using TASSEL. I am trying to run the GBSSeqToTagDBPlugin, but I keep getting the following error message:

 

disk I/O error
java.sql.SQLException: disk I/O error
        at org.sqlite.core.NativeDB.throwex(NativeDB.java:397)
        at org.sqlite.core.NativeDB._exec(Native Method)
        at org.sqlite.jdbc3.JDBC3Statement.executeUpdate(JDBC3Statement.java:116)
        at net.maizegenetics.dna.tag.TagDataSQLite.<init>(TagDataSQLite.java:98)
        at net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin.processData(GBSSeqToTagDBPlugin.java:232)
        at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:110)
        at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1631)
        at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

 

Start processing batch 1
Enzyme: PstI
Enzyme: PstI
java.lang.ArrayIndexOutOfBoundsException: 0
        at net.maizegenetics.analysis.gbs.v2.GBSUtils.initializeBarcodeTrie(GBSUtils.java:159)
        at net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin.processFastQFile(GBSSeqToTagDBPlugin.java:303)
        at net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin.lambda$processData$86(GBSSeqToTagDBPlugin.java:243)
        at net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin$$Lambda$19/231886004.accept(Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
        at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
        at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
        at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
        at net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin.processData(GBSSeqToTagDBPlugin.java:241)
        at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:110)
        at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1631)
        at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin: time: May 23, 2016 11:46:59
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin: time: May 23, 2016 11:46:59: progress: 100%

 

Does anybody know what all of this means?

I appreciate your help.

 

Andrea

 

--

Andrea C. Varella Ph.D.  

Montana State University

Plant Sciences and Plant Pathology Department

Linfield Hall - Bozeman MT

 

Lynn Carol Johnson

May 23, 2016, 5:19:36 PM
to tas...@googlegroups.com
Hi Andrea-

This error is thrown at the very beginning of processing, when TASSEL attempts to set up an SQLite instance. Are you creating the db on a disk where you have write permission? Googling shows various causes for this error, including a full /tmp directory. Also check whether the system is having memory issues.

Are you able to run anything else in TASSEL?   Are you running TASSEL standalone, and was this downloaded from the web?  Perhaps there was an installation problem.
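For anyone who lands here with the same error, the checks mentioned above (write permission, free space in /tmp) can be scripted. The following is a minimal Python sketch, not part of TASSEL. The function name `check_db_location`, the 1 GB threshold, and the probe file name are all assumptions made for illustration. It also tries a real SQLite write in the target directory, since permission bits alone do not prove that the file system accepts SQLite writes. Note that TASSEL uses the Java SQLite driver (see `org.sqlite` in the stack trace), so a successful probe here only rules out the most basic causes.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import closing


def check_db_location(db_dir, min_free_gb=1.0):
    """Report whether db_dir and the system temp directory look usable for SQLite.

    min_free_gb is an arbitrary illustrative threshold, not a TASSEL requirement.
    """
    report = {}
    for label, path in (("db_dir", db_dir), ("tmp_dir", tempfile.gettempdir())):
        free_gb = shutil.disk_usage(path).free / 1e9
        report[label] = {
            "path": path,
            "writable": os.access(path, os.W_OK),
            "free_gb": round(free_gb, 2),
            "enough_space": free_gb >= min_free_gb,
        }

    # Attempt an actual SQLite write in the target directory.
    probe = os.path.join(db_dir, "sqlite_probe.db")  # hypothetical file name
    try:
        with closing(sqlite3.connect(probe)) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS probe (id INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        report["sqlite_write"] = "ok"
    except (sqlite3.Error, OSError) as exc:
        report["sqlite_write"] = f"failed: {exc}"
    finally:
        # Remove the probe database and any leftover journal file.
        for suffix in ("", "-journal"):
            if os.path.exists(probe + suffix):
                os.remove(probe + suffix)
    return report


if __name__ == "__main__":
    import pprint
    pprint.pprint(check_db_location("."))
```

Run it from the directory where the db file is to be created, or pass that directory as the argument. If `sqlite_write` reports a failure even though `writable` is true, the problem is more likely the file system itself than the permissions.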

Thanks - Lynn

t58...@montana.edu

May 24, 2016, 2:04:22 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage

Hi Lynn,

 

I have tassel-5-standalone and TASSEL 5.0.0 on the university server. I tried to run the GBSSeqToTagDBPlugin with both of them, but I keep getting the same error message. I know I am creating the db file on a disk where I have write permission, so that should not be the problem.

I guess this is a memory problem then. I will talk to the university's IT people and see if we can make it work.

 

Thanks for your help!


Andrea

Idalia Rojas

Jan 27, 2017, 7:40:09 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Andrea,

I got the same error as you, and I'm also running this on a remote server. Did your problem turn out to be a memory issue? Like you, I believe I have permission to write to the disk.

Thanks in advance,

Idalia Rojas

Matthew Peterson

Feb 24, 2017, 7:26:10 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hello,

I recently ran Tassel Version 5.2.33 (January 12, 2017) on a run composed of 8 Illumina HiSeq 2000 lanes, with a total of 1,412,833,762 reads.

1) Allocating TASSEL 20 GB of RAM (via -Xmx) caused the GBSSeqToTagDBPlugin to fail with:

[pool-1-thread-1] ERROR net.maizegenetics.plugindef.ThreadedPluginListener - Out of Memory: GBSSeqToTagDBPlugin could not complete task:

2) Re-running with 30 GB of RAM (via -Xmx) caused the GBSSeqToTagDBPlugin to fail with:

java.sql.SQLException: [SQLITE_IOERR] Some kind of disk I/O error occurred (disk I/O error)

3) Re-running with 42 GB of RAM (via -Xmx) caused the GBSSeqToTagDBPlugin to fail with numerous:

tagInsertPS.executeBatch() 100001

Followed by:
java.sql.SQLException: disk I/O error

4) I finally re-ran with -Xmx set to 200 GB of RAM. Watching `top` showed that the GBSSeqToTagDBPlugin step used approximately 106 GB, and the plugin then completed successfully. (The machine this ran on had 32 cores and 256 GB of RAM.)

Questions:

1) Is there a protocol to determine before I start a run (a "back of the envelope" calculation) how much RAM I'll need to allocate to TASSEL (via -Xmx) to complete the GBSSeqToTagDBPlugin step successfully, e.g., based upon the number of reads in the FASTQ files I'm providing?

2) Is there a way to instruct the GBSSeqToTagDBPlugin to artificially cap how much RAM it uses, independent of what the JVM is provided via -Xmx?

3) Is there a way in the lower-memory examples above to have the plugin "fail" with a more user-friendly error (instead of an SQL exception)?
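As a rough illustration of the kind of back-of-the-envelope calculation asked about in question 1, the single data point from this run (1,412,833,762 reads, roughly 106 GB observed in `top`) can be turned into a per-read figure and scaled linearly. This Python sketch is a naive extrapolation, not a TASSEL formula: the function names and the 1.5 safety factor are assumptions, memory use may depend more on the number of distinct tags than on raw read count, and `top` reports process memory rather than Java heap alone.

```python
# Figures reported in this thread for one successful run.
OBSERVED_READS = 1_412_833_762
OBSERVED_PEAK_GB = 106


def bytes_per_read(observed_peak_gb, observed_reads):
    """Average memory per read implied by one observed run (1 GB = 1e9 bytes)."""
    return observed_peak_gb * 1e9 / observed_reads


def estimate_heap_gb(n_reads, per_read_bytes, safety_factor=1.5):
    """Linear extrapolation of memory need; safety_factor is an arbitrary margin."""
    return n_reads * per_read_bytes * safety_factor / 1e9


if __name__ == "__main__":
    rate = bytes_per_read(OBSERVED_PEAK_GB, OBSERVED_READS)
    print(f"Implied memory per read: {rate:.1f} bytes")
    # Hypothetical new run with 500 million reads.
    print(f"Suggested -Xmx for 500M reads: {estimate_heap_gb(500_000_000, rate):.0f} GB")
```

With a single observation there is no way to tell how much of the memory is fixed overhead, so the estimate should be treated as a starting value for -Xmx to be adjusted after watching an actual run.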

Thank you,
Matthew

Idalia Rojas

Feb 24, 2017, 7:48:38 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hello Matthew:

Maybe you can use the information in the TASSEL 3 manual as a reference:

"Our initial default value of the -s option was 200 million, which required 6G of available RAM, allowing the pipeline
to be run on a computer with a total of 8GB of RAM. However, as Illumina next gen sequencing technology has
improved, FASTQ files with more than 200 million good, barcoded reads have become more commonplace.
Therefore, we have now increased the default value of -s to 300 million, which might require a computer with
more than 8GB of RAM (a 16GB machine should suffice).
If the console output of the FastqToTagCountsPlugin indicates that exactly 300 million good, barcoded reads (or
whatever you set -s to) were found in one or more of the input files, then you should increase the -s parameter,
provided that your computer has enough memory. "

Best,

Idalia



Matthew Peterson

Feb 24, 2017, 7:56:00 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Idalia,

Thank you for the suggestion. I never had this memory issue using TASSEL v3, only TASSEL v5.

According to the TASSEL v5 documentation for the GBSSeqToTagDBPlugin (https://bitbucket.org/tasseladmin/tassel-5-source/wiki/Tassel5GBSv2Pipeline/GBSSeqToTagDBPlugin) there is no -s option. The TASSEL v5 documentation for the plugin references:

"Additionally, for this and the other plugins in the GBS pipeline, you may add the -Xms and -Xmx parameters to indicate the minimum and maximum amount of memory Java should use. The default values are platform specific. What you need will depend on your machine's available memory and the size of the data you intend to process."

I was able to overcome my issue by using 100+ GB of RAM, but using anything less (specified by -Xmx) causes the GBSSeqToTagDBPlugin to fail with an SQL exception. Ideally, I would like to tell the plugin to be more conservative in its RAM usage (at the expense of running more slowly), but I don't know whether that is possible, i.e., whether there is a command line option to do so.

Thank you again,
Matthew