TASSEL 4: MLM Compression Failed


Matthew H.

Feb 23, 2015, 9:10:14 PM
to tas...@googlegroups.com
Hello TASSEL community!

I am using TASSEL 4.3.13 to run MLM analyses.
I'm trying to automate the process as an undergrad in my lab, so it may be a while before I can answer more technical questions about our goals.

The problem I am having is extremely similar to that of M. Enders, who encountered a problem with MLM compression as seen here.
I have several data sets of varying levels of completeness. When I run the analysis on my half-complete data set (phenotype data for approximately 180/286 individuals in the panel) there are no issues whatsoever and MLM runs without a hitch.

When I run MLM on a complete data set (285/286), though, I get "Compression failed for g = 250"!
Part of my automation includes importing pre-joined and pre-filtered genotype data, as well as a pre-generated kinship, but even when I run the analysis from scratch in the GUI (my goal is to automate the pipeline) I get the same failure pattern.

I'm having issues getting the Java console to show up, so I can't provide more detailed error reports, but sometimes when I change the data import I get "Compression failed for g = 358" instead. The message shows up only once. In the pipeline, MLM appears to continue running (the error message appears between 0% and 10%) until 80%, at which point it hangs. In the GUI, the MLM progress bar reaches two-thirds full, then resets to empty and never refills.

Here are my flags for the pipeline MLM:
-fork1 -h $top/COREFILES/maize_snp/FilteredGenome.hmp.txt.gz -fork2 -k $top/COREFILES/maize_snp/kinship.txt -fork3 -r $top/runs/$RunName/data/$RunDataName -combine4 -input1 -input3 -intersect -combine5 -input2 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel Optimum -mlmOutputFile $top/runs/$RunName/data/$RunResultsName.txt -runfork1 -runfork2 -runfork3

The $Variables are because I'm automating the procedure in Perl. I will also add that setting compression to zero in the pipeline results in a very long hang. I'm not sure if it just gets slower or breaks entirely.
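For anyone scripting a similar run, here is a minimal sketch of how the flag list above could be assembled programmatically. It is written in Python rather than Perl purely for illustration. The function name `build_mlm_args` and the example argument values are hypothetical; only the flags and paths quoted in the post are used, and the sketch builds the argument list without launching TASSEL.

```python
def build_mlm_args(top, run_name, run_data_name, run_results_name):
    """Return the pipeline flags quoted in the post as a list of strings.

    The four parameters stand in for the Perl variables $top, $RunName,
    $RunDataName and $RunResultsName. Nothing here is a TASSEL API; it is
    plain string assembly.
    """
    core = f"{top}/COREFILES/maize_snp"
    data = f"{top}/runs/{run_name}/data"
    return [
        # Fork 1: genotypes, fork 2: kinship, fork 3: phenotypes.
        "-fork1", "-h", f"{core}/FilteredGenome.hmp.txt.gz",
        "-fork2", "-k", f"{core}/kinship.txt",
        "-fork3", "-r", f"{data}/{run_data_name}",
        # Intersect genotypes with phenotypes, then add the kinship matrix.
        "-combine4", "-input1", "-input3", "-intersect",
        "-combine5", "-input2", "-input4",
        "-mlm",
        "-mlmVarCompEst", "P3D",
        "-mlmCompressionLevel", "Optimum",
        "-mlmOutputFile", f"{data}/{run_results_name}.txt",
        "-runfork1", "-runfork2", "-runfork3",
    ]


# Hypothetical example values, not taken from the thread.
args = build_mlm_args("/proj", "run1", "pheno.txt", "mlm_out")
print(" ".join(args))
```

Keeping the flags in a list rather than one long string makes it easier to change a single option, such as the compression level, between runs.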

Can anyone shed light on this frustrating problem?
M. Enders had a similar problem, but I don't think it was solved for him.

Many thanks,
Matthew Hill

Terry Casstevens

Feb 23, 2015, 10:18:34 PM
to Tassel User Group
Is there a reason you aren't using Tassel 5?
> --
> You received this message because you are subscribed to the Google Groups
> "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tassel+un...@googlegroups.com.
> To post to this group, send email to tas...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tassel/3482730d-c75c-4767-8bda-10622bf910cf%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Matthew H.

Feb 23, 2015, 10:36:41 PM
to tas...@googlegroups.com
According to my lab manager, Tassel 5 MLM doesn't work with the line names in our association panel in the format we have most of them in. I believe he has some phenotype data sets with the reformatted names, but I'm working with Tassel 4 for now since I don't know what the reformatting entails. If this is an issue that's been fixed in Tassel 5 I can ask him for the 5-compatible line names and try it that way.

Terry Casstevens

Feb 23, 2015, 10:41:47 PM
to Tassel User Group
I don't think there would be reformatting needed for Tassel 5. I'd be
interested in knowing the details, as it'd be our priority to fix any
problems with Tassel 5.

Matthew H.

Feb 23, 2015, 10:48:39 PM
to tas...@googlegroups.com
I don't know all the details. I'll talk to our lab manager tomorrow and see if he knows anything more. I know I'll be able to attach a comparison of the names we use for Tassel 4 and Tassel 5. Using our previous names does throw an error with MLM; I'll have more info on that tomorrow afternoon.

 Do you have any idea what might be causing the compression error? The most significant fact to me is that the half-empty data set seems to work. 

For your information, I am running MLM on an entire genome (2.2M SNPs) and a panel of 286 lines. I union-joined the chromosomes and filter-aligned them through the GUI, then exported them so I could simply reload them instead of re-joining and re-filtering each iteration. Similarly, I saved and exported the kinship file from the filtered genotype data. I'm on Windows 8, and the MLM uses about 10GB of RAM on my system.

Terry Casstevens

Feb 23, 2015, 10:51:55 PM
to Tassel User Group
My colleague could answer your question better. If you have logging
messages, please send them. If you run from the command line, you should
get those. Generally, I think folks use an intersect join as well.
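One way to keep the command-line output for a report like this is to redirect it to a file. The following is a minimal sketch in Python that assumes nothing about TASSEL itself: `run_and_log` is a hypothetical helper, and `cmd` is whatever launcher and flags you already use.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd (a list of strings) and write stdout and stderr to log_path.

    Returns the process exit code. stderr is merged into stdout so that
    messages stay in the order they were printed.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        completed = subprocess.run(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,  # keep the log even when the command fails
        )
    return completed.returncode
```

A call would look like `run_and_log(my_command_list, "mlm_run.log")`, where both names are placeholders; the resulting file can then be attached to a post.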

Matthew H.

Feb 23, 2015, 10:54:53 PM
to tas...@googlegroups.com
I noticed that about the intersect join. The genotype/phenotype data are intersect-joined, but I'll have to ask about the initial union join of chromosomes 1-10.
I've had trouble getting Tassel to give me logs via the Java console, but I'll keep working on it.
I'll be in touch tomorrow when I have more information.
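
For readers unfamiliar with the two join types mentioned in this exchange, the difference can be sketched with plain sets of taxa names. This is only an illustration of the set semantics and not TASSEL's implementation, which joins whole data tables; the function names and the `LINE_X`/`LINE_Y` names are made up for the example.

```python
def union_join(*taxa_lists):
    """Taxa present in at least one input."""
    return set().union(*taxa_lists)


def intersect_join(*taxa_lists):
    """Taxa present in every input."""
    sets = [set(taxa) for taxa in taxa_lists]
    return set.intersection(*sets) if sets else set()


# Illustrative names only: A619 and A632 appear in the thread,
# LINE_X and LINE_Y are invented placeholders.
genotype_taxa = ["A619", "A632", "LINE_X"]
phenotype_taxa = ["A619", "LINE_Y"]

print(sorted(union_join(genotype_taxa, phenotype_taxa)))
print(sorted(intersect_join(genotype_taxa, phenotype_taxa)))
```

A union join keeps taxa that are missing from one of the inputs, whereas an intersect join drops any taxon that lacks either genotype or phenotype data.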

Terry Casstevens

Feb 23, 2015, 10:56:41 PM
to Tassel User Group
Logging has been improved in Tassel 5.

Peter Bradbury

Feb 24, 2015, 10:11:50 AM
to tas...@googlegroups.com
Matthew,

Except in rare cases, we will not be fixing issues in Tassel 4, so you will need to resolve the name formatting issues with Tassel 5 first. As Terry said, there is no reason that we know of that names that worked with version 4 would not work with version 5. So, you should work with Terry to resolve that first. Then if Tassel 5 has the compression problem, we can address that. 

Peter


Matthew H.

Feb 24, 2015, 9:00:20 PM
to tas...@googlegroups.com
Hi Peter and Terry,

Our lab is using the partially imputed maize haplotype data for the association panel. I believe it is the latest version from Panzea.

In Tassel 4, our trait data could be used with line names formatted as "A619" or "A632," for example. 
In Tassel 5, the same names must be referenced fully: 

A619:C08L7ACXX:6:250047940
A619:MRG:2:250039765
A632:C08L7ACXX:6:250047950
A632:MRG:2:250039775

This is a relatively simple fix, but our old data sets must all be converted to the new format for use in Tassel 5.
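
A conversion along these lines could start from a lookup table built from the full names themselves. The sketch below is in Python and assumes the short name is everything before the first colon, which matches the four examples above but is not confirmed as a general rule; `short_name` and `build_name_map` are hypothetical helpers.

```python
from collections import defaultdict

# The four full names quoted above.
FULL_NAMES = [
    "A619:C08L7ACXX:6:250047940",
    "A619:MRG:2:250039765",
    "A632:C08L7ACXX:6:250047950",
    "A632:MRG:2:250039775",
]


def short_name(full_name):
    """Assumed rule: the short name is the text before the first colon."""
    return full_name.split(":", 1)[0]


def build_name_map(full_names):
    """Map each short name to every full name that starts with it."""
    name_map = defaultdict(list)
    for full in full_names:
        name_map[short_name(full)].append(full)
    return dict(name_map)


name_map = build_name_map(FULL_NAMES)
for short, fulls in name_map.items():
    print(short, "->", fulls)
```

Note that each short name here maps to two full names, so a conversion script has to decide which one a given trait record belongs to; the sketch only lists the candidates and leaves that choice open.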

I'm testing MLM on Tassel 5 now. The run seems to be taking a while through the GUI, so I'll report back when that finishes.
When I use the pipeline, MLM gets to 80% completion and then writes empty output files and a non-empty "residuals" file with some data.
I'm unsure if this is an error of some kind or if the rest of the MLM takes a long time.

A couple more questions:
Are the pipeline flags -t and -r for loading of phenotype data identical, or are there any differences in their interpretations?
Finally, should I be using -export or -mlmOutputFile? The MLM pipeline tutorial uses the former, but the latter gives results akin to those I get by using the GUI.

I wish I had Java logs, but I'm struggling with Java on this machine at the moment.

Matthew H.

Feb 24, 2015, 10:38:11 PM
to tas...@googlegroups.com
After a little tweaking, it looks like MLM is up and running just fine now. I'll be using -mlmOutputFile since its output reads into MS Excel more cleanly.
The only remaining issue is that of the AP line names.
It won't take me long at all to write a quick script to convert our data sets, but it would be nice if Tassel supported the old, non-verbose line names.

Thanks for your advice to switch to Tassel 5!