Calculate LD, large ldWinSize, out of memory error


Liangliang Gao

Aug 23, 2019, 2:27:08 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi,

I have a genotype file with about 4 million SNPs, roughly one SNP every 100-200 bp. I wanted to calculate LD using TASSEL and set ldWinSize to 2000 (a sliding window of around 200 kb). The run failed with an out-of-memory error. The command line was:

$tasselPath -Xms2g -Xmx60g -fork1 -h $myG -ld -ldWinSize 2000 -export $myG.ld.win2000.txt > $myG.ld.win2000.out 2> $myG.ld.win2000.err
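
For scale, here is a rough back-of-the-envelope count of the site pairs this window implies. It is only a sketch: it assumes each site is paired with up to ldWinSize following sites, and the 8 bytes per pair is a hypothetical lower bound (one key per pair), not TASSEL's actual per-pair footprint. The function name is mine, not part of TASSEL.

```python
def window_pair_count(n_sites: int, window: int) -> int:
    """Count site pairs when each site is paired with up to `window` following sites.

    ASSUMPTION: this is how the sliding window is applied; the real
    TASSEL behaviour may differ in detail.
    """
    if n_sites < 0 or window < 0:
        raise ValueError("n_sites and window must be non-negative")
    # A site can never have more than n_sites - 1 partners.
    w = min(window, max(n_sites - 1, 0))
    # Full windows for most sites, minus the shortfall at the chromosome end.
    return w * n_sites - w * (w + 1) // 2


n_sites = 4_158_655      # "Number of Sites" reported in the .out file
window = 2000            # value passed to -ldWinSize
bytes_per_pair = 8       # ASSUMPTION: hypothetical lower bound per stored pair

pairs = window_pair_count(n_sites, window)
print(f"{pairs:,} site pairs")
print(f"at least {pairs * bytes_per_pair / 2**30:.1f} GiB at {bytes_per_pair} bytes per pair")
```

That comes to about 8.3 billion pairs, so even under this optimistic assumption the results would not fit in a 60 GB heap if they are all held in memory at once.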

So, is there an option to have TASSEL write parts of the LD calculation to temporary files, so that it does not need so much memory?
I noticed that when fitting the LD decay curve using the method of Remington et al. 2001 (PNAS 98:11479), the decay rate (defined by r2 dropping below 0.2) decreases with smaller sliding window sizes. That is why I thought I should use a large sliding window of 2000 SNPs. Also, the genotype file is based on WGS (10-30x coverage of wild wheat); let me know if you think I should do more SNP filtering before the LD analysis.
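
For reference, the curve I mean can be sketched as below. This is the Hill and Weir (1988) expectation of r2 as it is commonly written for this kind of decay fit, so please check it against the paper before relying on it. Here C is the product of a per-bp recombination parameter and the distance in bp, and n is the number of sampled chromosomes; the function names, the example value of rho and the choice n = 265 are my own assumptions for illustration.

```python
def expected_r2(c: float, n: int) -> float:
    """Expected r2 for scaled recombination c and n sampled chromosomes
    (Hill and Weir 1988 form, as commonly quoted)."""
    a = (10 + c) / ((2 + c) * (11 + c))
    b = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n * (2 + c) * (11 + c))
    return a * b


def decay_distance(rho_per_bp: float, n: int, threshold: float = 0.2) -> float:
    """Distance in bp at which the expected r2 falls to `threshold`."""
    if expected_r2(0.0, n) <= threshold:
        return 0.0
    # Grow the upper bound until the curve is below the threshold.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        if expected_r2(hi, n) <= threshold:
            break
        hi *= 2
    else:
        raise ValueError("threshold is never reached for this n")
    # Plain bisection on c between lo and hi.
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_r2(mid, n) > threshold:
            lo = mid
        else:
            hi = mid
    return hi / rho_per_bp


# ASSUMPTION: rho is a made-up example value; n = 265 treats each taxon
# as one sampled chromosome.
print(decay_distance(rho_per_bp=1e-4, n=265))
```

In a real analysis rho would be estimated by fitting expected_r2 to the observed pairwise r2 values, and the decay distance read off from the fitted curve.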

[Attachment: Picture1.png]

The .out file:
Memory Settings: -Xms2g -Xmx60g
Tassel Pipeline Arguments: -fork1 -h CM008374.1.vcf.ipkmm.vcf.hmp.txt.format.owwc.hmp.txt -ld -ldWinSize 2000 -export CM008374.1.vcf.ipkmm.vcf.hmp.txt.format.owwc.hmp.txt.ld.win2000.txt
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.52  Date: March 21, 2019
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 54613 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_192
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 16
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -h, CM008374.1.vcf.ipkmm.vcf.hmp.txt.format.owwc.hmp.txt, -ld, -ldWinSize, 2000, -export, CM008374.1.vcf.ipkmm.vcf.hmp.txt.format.owwc.hmp.txt.ld.win2000.txt]
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:8
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
FileLoadPlugin Parameters
format: Hapmap
sortPositions: false
keepDepth: true

[pool-1-thread-1] INFO net.maizegenetics.analysis.data.FileLoadPlugin - Start Loading File: CM008374.1.vcf.ipkmm.vcf.hmp.txt.format.owwc.hmp.txt time: Aug 23, 2019 11:42:9
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:47: progress: 0%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:47: progress: 10%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:47: progress: 20%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:47: progress: 30%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:47: progress: 40%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:48: progress: 50%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:48: progress: 60%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:48: progress: 70%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:48: progress: 80%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:48: progress: 90%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:48: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.data.FileLoadPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
[pool-1-thread-1] INFO net.maizegenetics.analysis.data.FileLoadPlugin - Finished Loading File: CM008374.1.vcf.ipkmm.vcf.hmp.txt.format.owwc.hmp.txt time: Aug 23, 2019 11:42:49
Genotype Table Name: CM008374.1.vcf.ipkmm.vcf.hmp.txt.format
Number of Taxa: 265
Number of Sites: 4158655
Sites x Taxa: 1102043575
Chromosomes...
7: start site: 0 (11294) last site: 4158654 (644694373) total: 4158655

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.data.FileLoadPlugin: time: Aug 23, 2019 11:42:49
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.popgen.LinkageDisequilibriumPlugin: time: Aug 23, 2019 11:42:49: progress: 0%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.popgen.LinkageDisequilibriumPlugin: time: Aug 23, 2019 12:45:29: progress: 100%
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.ThreadedPluginListener - Out of Memory: FileLoadPlugin could not complete task:
Current Max Heap Size: 58392 Mb
Use -Xmx option in start_tassel.pl or start_tassel.bat
to increase heap size. Included with tassel standalone zip.


The error file:
java.lang.OutOfMemoryError: Java heap space
    at cern.colt.map.OpenLongObjectHashMap.rehash(Unknown Source)
    at cern.colt.map.OpenLongObjectHashMap.put(Unknown Source)
    at net.maizegenetics.analysis.popgen.LinkageDisequilibrium.calculateBitLDForHaplotype(LinkageDisequilibrium.java:243)
    at net.maizegenetics.analysis.popgen.LinkageDisequilibrium.run(LinkageDisequilibrium.java:159)
    at net.maizegenetics.analysis.popgen.LinkageDisequilibriumPlugin.processDatum(LinkageDisequilibriumPlugin.java:134)
    at net.maizegenetics.analysis.popgen.LinkageDisequilibriumPlugin.performFunction(LinkageDisequilibriumPlugin.java:114)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1925)
    at net.maizegenetics.plugindef.AbstractPlugin.fireDataSetReturned(AbstractPlugin.java:1826)
    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:119)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1925)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)