Tassel pipeline for MLM is too slow


Vicky

Apr 13, 2021, 7:01:51 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi,
I am running a TASSEL pipeline for MLM; however, the data set is large, with around 3 million SNPs and 800 individuals. The pipeline runs but got stuck at 70% progress for around 48 hours. I cancelled the command and ran it again, but it got stuck again at the same 70% progress level.

This is the script:
/tassel-5-standalone/run_pipeline.pl -Xmx80g -Xms50g -configFile /variantfilter1/mlm_tutorial_config.xml -mlmOutputFile /variantfilter1/mlm_results

The end of the log:

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.filter.FilterSiteBuilderPlugin: time: Apr 13, 2021 12:35:7
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.filter.FilterSiteBuilderPlugin: time: Apr 13, 2021 12:35:7: progress: 100%
[Thread-11] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.IntersectionAlignmentPlugin: time: Apr 13, 2021 12:35:7: progress: 100%
[Thread-13] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:35:7
[Thread-13] INFO net.maizegenetics.matrixalgebra.Matrix.DoubleMatrixFactory - TasselBlas library for system-specific BLAS/LAPACK not found. Using system-independent EJML for DoubleMatrix operations.
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:35:11: progress: 0%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:36:43: progress: 10%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:38:0: progress: 20%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:39:9: progress: 30%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:40:17: progress: 40%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:41:25: progress: 50%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:42:1: progress: 60%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.WeightedMLMPlugin: time: Apr 13, 2021 12:42:19: progress: 70%


Is there a problem with my data, or is the run slow because of the big data set?
How can I run it faster? Otherwise I will be waiting endlessly here for >100 traits.
Thanks,

Vinod,




Terry Casstevens

Apr 13, 2021, 7:08:17 AM
to Tassel User Group
Are you using the latest update of TASSEL?

Looks like it gets to 70% quickly?

Will you add -debug to the command?




Vicky

Apr 13, 2021, 9:15:43 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Thanks Terry for the reply.
I am using TASSEL 5.2.72; I cloned it recently.
Yes, you are right: it runs very quickly up to 70%, but after that it gets stuck without producing any error.
I ran it again with -debug and it has again been at 70% for the last half hour; I don't know whether it will get past this hurdle this time.
Thanks,

Vinod,

Terry Casstevens

Apr 13, 2021, 11:17:17 AM
to Tassel User Group
I know that having invariant sites causes problems with MLM, but that usually throws an Exception.  Have you filtered out invariant sites?

Manoj Kumar

Apr 13, 2021, 12:31:34 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Did you try filtering with parameters like MAF, Max-MAF and Site Min Count?

Vicky

Apr 13, 2021, 4:28:41 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry,
I first filtered the VCF file using MAF 0.05 along with some other filters and then used it for GWAS, so I think there should be no invariant sites. What else could be the reason?
I've also changed the memory allocation to -Xms5g -Xmx100g. The GUI works with the same files but is too slow: one chromosome finishes in 3 days.
Thanks,

Vinod,

Vicky

Apr 13, 2021, 4:46:44 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
One thing I forgot to mention is that the VCF file contains missing sites, and I don't know whether this is what is causing the issue.
Thanks,

Vinod,
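
A quick way to check the two points raised above (invariant sites and missing calls) before launching MLM is to summarize the VCF per site. The following is only a minimal sketch: the function name `vcf_site_summary` is made up for illustration, and it assumes an uncompressed VCF in which GT is the first FORMAT sub-field. It prints chromosome, position, number of samples with a missing genotype, and whether fewer than two distinct alleles were observed.

```shell
#!/usr/bin/env bash
# Sketch: per-site summary of an uncompressed VCF.
# Assumption: GT is the first FORMAT sub-field of every sample column.
# vcf_site_summary is a hypothetical helper, not part of TASSEL.
vcf_site_summary() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                         # skip meta and header lines
    {
      missing = 0; nalleles = 0
      split("", seen)                     # clear the set of observed alleles
      for (i = 10; i <= NF; i++) {        # sample columns start at column 10
        split($i, f, ":")
        gt = f[1]
        if (gt ~ /\./) { missing++; continue }   # count a missing call once per sample
        n = split(gt, a, "[/|]")          # alleles are separated by / or |
        for (j = 1; j <= n; j++)
          if (!(a[j] in seen)) { seen[a[j]] = 1; nalleles++ }
      }
      status = (nalleles < 2) ? "invariant" : "variant"
      print $1, $2, missing, status
    }' "$1"
}

# Example use (file name is a placeholder):
#   vcf_site_summary filtered.vcf | awk '$4 == "invariant"' | head
```

Sites reported as `invariant` after sample filtering are the ones Terry's question is about; a large `missing` count at many sites would suggest looking at the site minimum count filter as well.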

Terry Casstevens

Apr 13, 2021, 8:42:04 PM
to Tassel User Group
Tell me exactly what you do in the GUI that works for you. Also, send me the logging from the GUI.

Vicky

Apr 14, 2021, 1:19:27 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry,
I am using the same files in the Linux GUI (launched with: tassel-5-standalone/start_tassel.pl -Xmx90g). The GUI is still running and I don't know how to generate logging information from it, but some logging from the GUI is available:

Memory Settings: -Xms512m -Xmx90g
[main] INFO net.maizegenetics.tassel.PreferencesDialog - TasselPrefs: default locale set: English (United States)
[main] INFO net.maizegenetics.tassel.TASSELMainFrame - TASSELMainFrame: addThirdPartyMenus: adding: /phg/tassel_menu.xml
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.72  Date: April 8, 2021
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 92160 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 11.0.10
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 160


But this is exceptionally slow and impractical for repeated data analysis.

The filtered VCF, kinship matrix and PCA files, all generated using TASSEL, are fed into the Linux pipeline. I thought I could run GWAS faster using the Linux pipeline.

I hope you find a clue soon.

Terry Casstevens

Apr 14, 2021, 8:09:08 AM
to Tassel User Group
If you can make it work from the GUI (even slowly), I should be able to tell you how to replicate that from the command line.  That's why I was asking the steps you take in the GUI. See this document to see how to get logging from the GUI.

Vicky

Apr 14, 2021, 9:04:10 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry,
Thanks for letting me know about logging. I am attaching the logging information here. Please have a look and let me know if it can go faster. Can I also change the debug level in the GUI while the program is running?
Thanks,

Vinod,
TASSEL_5_Logging.txt

Terry Casstevens

Apr 14, 2021, 10:27:18 AM
to Tassel User Group
Is this the command you're trying to get to work?
/tassel-5-standalone/run_pipeline.pl -Xmx80g -Xms50g -configFile /variantfilter1/mlm_tutorial_config.xml -mlmOutputFile /variantfilter1/mlm_results

Can you send the config file?  Will you add -debug and send all the logging?  You don't need to wait until it finishes.   Just up to the point it hangs.

Vicky

Apr 14, 2021, 11:24:38 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry,
I tried various combinations of -Xmx and -Xms, but every time it gets stuck at the same level. It looks like the problem is somewhere else, and it is not throwing any error.
I am attaching the logging file from the Linux pipeline. Yes, I added -debug to the new script, and you can see it in the logging file.
Thanks,
Vinod,



Tassel_pipeline.txt

Peter Bradbury

Apr 14, 2021, 1:34:39 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
It looks like you are using optimal compression, which could be causing the problem. Try using no compression. Optimal compression takes a lot of time with big data sets and may be what is causing MLM to hang. If that does not solve the problem, try running one chromosome at a time to see if one specific chromosome is causing the problem. The SNPs are tested one at a time, so running chromosomes individually should not affect the results, though it will take longer in total.
Peter 
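
The two suggestions above (no compression, one chromosome at a time) could be scripted roughly as follows. This is a sketch under several assumptions, not a tested recipe: the per-chromosome genotype files (`chr1.vcf`, ...), the trait, PCA and kinship file names, and the chromosome list are hypothetical placeholders; the original run used a config XML whose contents are not shown in this thread, so the fork/combine wiring here follows the general pattern of the TASSEL 5 pipeline documentation and should be checked against your version. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: MLM per chromosome with compression turned off.
# All input file names are hypothetical placeholders.
TASSEL=/tassel-5-standalone/run_pipeline.pl   # path from the original command
TRAITS=traits.txt        # hypothetical phenotype file
PCS=pca.txt              # hypothetical PCA covariate file
KINSHIP=kinship.txt      # hypothetical kinship matrix file

# Print the pipeline command for one chromosome.
# Assumed flags: -mlmVarCompEst P3D and -mlmCompressionLevel None.
build_mlm_cmd() {
  local chr="$1"
  echo "$TASSEL -Xms50g -Xmx80g" \
    "-fork1 -importGuess chr${chr}.vcf" \
    "-fork2 -importGuess $TRAITS" \
    "-fork3 -importGuess $PCS" \
    "-fork4 -k $KINSHIP" \
    "-combine5 -input1 -input2 -input3 -intersect" \
    "-combine6 -input5 -input4" \
    "-mlm -mlmVarCompEst P3D -mlmCompressionLevel None" \
    "-export mlm_chr${chr}" \
    "-runfork1 -runfork2 -runfork3 -runfork4"
}

DRY_RUN="${DRY_RUN:-1}"        # set DRY_RUN=0 to actually launch TASSEL
for chr in 1 2 3; do           # hypothetical chromosome names
  cmd=$(build_mlm_cmd "$chr")
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    # Unquoted on purpose: the placeholder paths contain no spaces.
    $cmd > "mlm_chr${chr}.log" 2>&1
  fi
done
```

Keeping one log file per chromosome makes it easier to see whether a single chromosome is the one that hangs.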
