GetTagTaxaDistFromDBPlugin failed (out of memory) with 95 lanes of Illumina data, ~3000 taxa (samples)


Liangliang Gao

Jul 10, 2017, 1:49:46 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Dear TASSEL developers,


I am using the TASSEL 5 GBSv2 pipeline to process Illumina data. I also need the GetTagTaxaDistFromDBPlugin plugin to derive per-tag depth information. A test run worked (28 samples from ~20 lanes of Illumina HiSeq data). However, when I ran the full panel (around 3,000 breeding lines and 95 lanes of Illumina data), the pipeline called SNPs without problems but failed to produce the TagTaxaDistribution output. Here is the error:


Memory Settings: -Xms64G -Xmx360G
Tassel Pipeline Arguments: -fork1 -GetTagTaxaDistFromDBPlugin -db ventri2ns.db -o ventri2ns_TagTaxaDistOutput.txt -endPlugin -runfork1
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.37  Date: April 6, 2017
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 327680 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_112
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 24
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -GetTagTaxaDistFromDBPlugin, -db, ventri2ns.db, -o, ventri2ns_TagTaxaDistOutput.txt, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin: time: Jul 8, 2017 23:23:58
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
GetTagTaxaDistFromDBPlugin Parameters
db: ventri2ns.db
o: ventri2ns_TagTaxaDistOutput.txt

size of all tags in tag table=2439235
size of all tissues in tissue table=0
size of all tags in mappingApproach table=2
size of all taxa in taxa table=3184
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin: time: Jul 8, 2017 23:24:39: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.ThreadedPluginListener - Out of Memory: GetTagTaxaDistFromDBPlugin could not complete task:
Current Max Heap Size: 327680 Mb
Use -Xmx option in start_tassel.pl or start_tassel.bat
to increase heap size. Included with tassel standalone zip.

I tried raising -Xmx from 100G to 240G to 360G; all failed. I am currently queuing a job with memory set to 800G, but that setting is close to the limit of our compute cluster, and I am not sure it will work. Meanwhile, is there a way to decode the tag depth information with custom scripts? I checked the information here: https://bitbucket.org/tasseladmin/tassel-5-source/wiki/Tassel5GBSv2Pipeline/depthsRLEExamples . Unfortunately, I am not very familiar with Java and am not sure I can follow the steps. Could I use another language (I use Python or Perl) to decode the depth information? I am hoping to dig into the sqlite3 database and generate some intermediate files from the depth information, to work around what appears to be a memory problem.
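Since the GBSv2 database is a plain SQLite file, Python's built-in `sqlite3` module can at least open it and list what it contains. The sketch below is only a starting point: it makes no assumptions about TASSEL's table or column names and simply reports whatever schema is present. The function name `describe_schema` is made up for this example, and decoding the stored depth values would still have to follow the encoding described on the wiki page linked above.

```python
import sqlite3


def describe_schema(db_path):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite file."""
    schema = {}
    # Open read-only so the GBS database cannot be modified by accident.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (table,) in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [(col[1], col[2]) for col in cols]
    finally:
        conn.close()
    return schema
```

Calling `describe_schema("ventri2ns.db")` and printing the result would show which tables hold the tags, the taxa, and the depth data before any decoding is attempted.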

Also, I do not actually need the taxa distribution for all tags; I only need it for about 1,000 particular tags.
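If a full TagTaxaDist output file can be produced at all, one low-memory option is to filter it afterwards for the tags of interest. The sketch below streams the file line by line, so only the set of wanted tags is held in memory. It assumes, without confirmation from the TASSEL documentation, that the output is tab-delimited with a header line first and the tag sequence in the first column; `filter_tag_rows` and its arguments are hypothetical names.

```python
def filter_tag_rows(in_path, out_path, wanted_tags):
    """Copy the header plus every row whose first column is in wanted_tags.

    Returns the number of data rows written.
    """
    wanted = set(wanted_tags)
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        # Assumption: the first line is a header naming the taxa columns.
        dst.write(src.readline())
        for line in src:
            # Split off only the first field; the rest of the row is not parsed.
            tag = line.split("\t", 1)[0].rstrip("\n")
            if tag in wanted:
                dst.write(line)
                kept += 1
    return kept
```

With about 1,000 tags in `wanted_tags`, memory use stays small no matter how many rows the input file has.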

Any suggestions to solve the problem would help! 

Thanks and Best Regards!

Liang




Terry Casstevens

Jul 10, 2017, 1:53:45 PM
to Tassel User Group
Did you specify it this way?

-Xmx100g

The logging says 327680 Mb, which is less than 1g
> --
> You received this message because you are subscribed to the Google Groups
> "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tassel+un...@googlegroups.com.
> To post to this group, send email to tas...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tassel/d9de336d-1830-4247-bfa2-e74a2438f546%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Liangliang Gao

Jul 10, 2017, 3:02:30 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Yes, I think that I specified it correctly. Below is the command line used:

tasselPath=$HOME/src/tassel-5-standalone/run_pipeline.pl
$tasselPath -Xms64G -Xmx360G -fork1 -GetTagTaxaDistFromDBPlugin \
    -db ${prj_name}.db \
    -o ${prj_name}_TagTaxaDistOutput.txt \
    -endPlugin -runfork1 >> z_${prj_name}_GetTagTaxaDistFromDBPlugin.out

Liangliang Gao

Jul 10, 2017, 3:09:34 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Also, isn't 327680 Mb 327.68gb?
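For reference, here is the arithmetic behind the two possible readings of that log value. Whether the "MB" in the TASSEL log means 1024 × 1024 bytes is an assumption here; under either convention the value is several hundred gigabytes.

```python
reported_mb = 327680  # "Max Available Memory Reported by JVM" from the log above

decimal_gb = reported_mb / 1000  # decimal reading: 1 GB = 1000 MB
binary_gib = reported_mb / 1024  # binary reading: 1 GiB = 1024 MiB

print(f"decimal: {decimal_gb} GB, binary: {binary_gib} GiB")
```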

Lynn Carol Johnson

Jul 10, 2017, 3:50:55 PM
to tas...@googlegroups.com

Liang –

 

While we are not currently updating methods related to GBSv2 functionality, the code is open-source.  You are welcome to create a modified version of GetTagTaxaDistFromDBPlugin that only grabs the data for a specified list of tags.

 

Lynn

Kan Bao

Jul 24, 2017, 4:54:12 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
 Hi Terry,

I also got the same error.

Memory Settings: -Xms512m -Xmx900G
Tassel Pipeline Arguments: -fork1 -GetTagTaxaDistFromDBPlugin -db GBSV2.db -o TagTaxaDistOutput.txt -endPlugin -runfork1
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.33  Date: January 12, 2017
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 819200 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_121
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 72
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -GetTagTaxaDistFromDBPlugin, -db, GBSV2.db, -o, TagTaxaDistOutput.txt, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin: time: Jul 24, 2017 14:50:26
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
GetTagTaxaDistFromDBPlugin Parameters
db: GBSV2.db
o: TagTaxaDistOutput.txt

size of all tags in tag table=743545
size of all tissues in tissue table=0
size of all tags in mappingApproach table=2
size of all taxa in taxa table=2078
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin: time: Jul 24, 2017 14:54:25: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:674)
at java.lang.StringBuilder.append(StringBuilder.java:208)
at net.maizegenetics.analysis.gbs.v2.GetTagTaxaDistFromDBPlugin.processData(GetTagTaxaDistFromDBPlugin.java:85)
at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:112)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1821)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.ThreadedPluginListener - Out of Memory: GetTagTaxaDistFromDBPlugin could not complete task: 
Current Max Heap Size: 819200 Mb
Use -Xmx option in start_tassel.pl or start_tassel.bat
to increase heap size. Included with tassel standalone zip.



OutOfMemoryError: Requested array size exceeds VM limit

The reason may be that Java arrays are indexed by int, and the maximum positive int in Java is 2^31 - 1 = 2,147,483,647. If so, increasing the memory cannot solve the problem.
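A rough size estimate supports this reading. The stack trace shows the failure inside `StringBuilder.append`, and a Java `StringBuilder` is backed by a single array, so it cannot hold more than 2^31 - 1 characters however large the heap is. The sketch below assumes (the thread does not confirm this) that the plugin accumulates one depth value per tag per taxon in that builder, and that each value costs at least two characters: one digit plus a delimiter. The tag and taxa counts are taken from the two logs in this thread.

```python
JAVA_MAX_ARRAY_LENGTH = 2**31 - 1  # upper bound on the length of any Java array


def min_output_chars(n_tags, n_taxa, chars_per_cell=2):
    """Lower bound on the characters needed for an n_tags x n_taxa depth table."""
    return n_tags * n_taxa * chars_per_cell


# (number of tags, number of taxa) as reported in the two logs
runs = {
    "GBSV2.db": (743545, 2078),
    "ventri2ns.db": (2439235, 3184),
}
for name, (n_tags, n_taxa) in runs.items():
    needed = min_output_chars(n_tags, n_taxa)
    ratio = needed / JAVA_MAX_ARRAY_LENGTH
    print(f"{name}: at least {needed:,} characters ({ratio:.1f}x the array limit)")
```

Under these assumptions both runs need more characters than a single Java array can hold, which would be consistent with the error persisting at every -Xmx value tried. Writing each row to the output file as it is produced, rather than accumulating the whole table first, would sidestep the limit.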

Any help would be appreciated.


Best

Kan

Lynn Carol Johnson

Jul 25, 2017, 7:53:12 AM
to tas...@googlegroups.com

I made a quick change.  Try with the attached jar.  You’ll need to rename the jar from sTASSEL.jar.txt to sTASSEL.jar and replace your current TASSEL jar with this one. 

 


sTASSEL.jar.txt

Kan Bao

Jul 25, 2017, 10:13:47 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Lynn,


Great!  That works.  Thank you so much for your help!


Best  


Kan

Lynn Carol Johnson

Jul 25, 2017, 10:40:47 AM
to tas...@googlegroups.com

No problem.  I have delivered the code; it will be available in the next official build.


Liangliang Gao

Jul 25, 2017, 11:50:23 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Lynn and Kan

Thank you so much for your comments and solutions. The new jar also worked for my 90 lanes of data and ~3000 breeding lines. We also developed a modified plugin that takes selected tags (not all tags) and generates taxa distribution information. That modified plugin might be useful if you have an extremely large amount of data. But so far, the standard plugin seems to work fine. Cheers!

Liang