Usage of -ExcludeLastTrait in Population Structure

Matthew H.

Mar 3, 2015, 12:09:51 PM
to tas...@googlegroups.com
Hi Tassel Users,

What is the function of removing the last column of phenotype data from a population structure for use in MLM?
I ask so I can be sure I am using it correctly.

The MLM tutorial structure file has the lines, for example:
<Trait> Q1 Q2 Q3
4226 0.071 0.917 0.012

While the actual AP population structure file from PanZea has analogous lines:
StiffStalk NonStiffStalk Tropical Sweet Popcorn
4226 0.071 0.917 0.012 0 0

The tutorial uses -ExcludeLastTrait with the former file.

If I am using the latter file in my MLM, does this mean I should run -ExcludeLastTrait three times to approach how the data in the tutorial are treated?
That's my real question; my uncertainty about what the flag does underlies it.

Thanks,
Matthew Hill

Matthew H.

Mar 3, 2015, 12:41:46 PM
to tas...@googlegroups.com
From another thread:

"The method used to solve MLM requires that the columns of the design matrix be independent. Dropping one of the structure covariates usually solves that problem."

So my understanding is that columns must be deleted until the remaining ones are independent (add up to less than one).
For the remaining columns to add up to less than one, I will indeed have to delete three columns of data from the population structure file, either manually or by repeating the -ExcludeLastTrait flag three times. Is this correct?

Peter Bradbury

Mar 3, 2015, 1:10:24 PM
to tas...@googlegroups.com
Correct. However, if you have sweet corn and popcorn lines in your data set, so that there are some non-zero values in the sweet and popcorn columns, removing one column will be sufficient.

Peter
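
To illustrate the row-sum reasoning above: if the structure values in every row add up to one, the columns together reproduce an intercept column, so (assuming the model includes an intercept) they are not independent of it. The Python sketch below is not part of Tassel; it only mimics dropping trailing columns from the single row quoted earlier in the thread, and the helper names are hypothetical. Independence is a property of whole columns across all taxa, so one row can only hint at the problem.

```python
def drop_last_columns(row, n):
    """Return the row without its last n values (mimics repeating -ExcludeLastTrait n times)."""
    return list(row[:len(row) - n])


def sums_to_one(row, tol=1e-6):
    """True if the values add up to one within a small floating-point tolerance."""
    return abs(sum(row) - 1.0) < tol


# Row for taxon 4226 as quoted in the thread:
# StiffStalk, NonStiffStalk, Tropical, Sweet, Popcorn
q_4226 = [0.071, 0.917, 0.012, 0.0, 0.0]

for n in range(4):
    kept = drop_last_columns(q_4226, n)
    status = "sums to 1" if sums_to_one(kept) else "sums to less than 1"
    print(n, "column(s) dropped:", kept, status)
```

For this row the sum stays at one until three columns are dropped, because the Sweet and Popcorn values are both zero.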

Matthew H.

Mar 3, 2015, 1:16:44 PM
to tas...@googlegroups.com
Thank you for the clarification.
Upon importing the matrix with 

-q $dir/popstructure.txt -ExcludeLastTrait -ExcludeLastTrait -ExcludeLastTrait 
(this also occurs with -r and any number of -ExcludeLastTrait flags)

I get this error message from FileLoadPlugin: "Unrecognized format for a phenotype."
Of course, this isn't a phenotype; it's population structure data! While I understand that structure data can be loaded as phenotype data, how do I make Tassel treat this file correctly?

Peter Bradbury

Mar 3, 2015, 2:37:55 PM
to tas...@googlegroups.com
I cannot duplicate your problem. -q with -ExcludeLastTrait works fine for me with the mdp_population_structure.txt file from the tutorial data set. There may actually be a formatting problem with the file you are trying to import. Population structure (and factors and covariates in general) is treated as a type of phenotype, hence the misleading error message.

Peter

Matthew H.

Mar 3, 2015, 2:51:28 PM
to tas...@googlegroups.com
Hi Peter,

Sorry, I should have been clearer: the file that returns the error is the PanZea Q matrix showing population structure data for the maize association mapping panel, found here.
Looking at it again, I think I understand the problem. The first line contains a value for the number of columns, in this case "5".
When -ExcludeLastTrait removes the last column, perhaps that "5" is not changed to "4," and thus Tassel tries to read a column that does not exist.

I am heading back to my desktop shortly; I'll let you know if altering that value changes anything.

Peter Bradbury

Mar 3, 2015, 3:04:22 PM
to tas...@googlegroups.com
Matthew,

That file was created a long time ago (in 2005). I think Tassel version 2 was current then. The format has not been supported for some time. Just change the headers to look like mdp_population_structure.txt.

Peter
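
As a rough illustration of the header change suggested above, the Python sketch below writes rows under a `<Trait>` header line like the one quoted from the tutorial file at the top of the thread. The function name and the tab delimiter are assumptions, and the thread does not show whether mdp_population_structure.txt has further header lines; if it does, copy them from that file rather than relying on this sketch.

```python
def with_trait_header(column_names, data_rows):
    """Build structure-file lines with a '<Trait>' header row, as quoted in the thread."""
    lines = ["\t".join(["<Trait>"] + list(column_names))]
    for taxon, values in data_rows:
        # One line per taxon: name first, then its structure values.
        lines.append("\t".join([taxon] + [str(v) for v in values]))
    return lines


# Example row taken from the tutorial excerpt quoted in the thread.
rows = [("4226", [0.071, 0.917, 0.012])]
for line in with_trait_header(["Q1", "Q2", "Q3"], rows):
    print(line)
```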

Matthew H.

Mar 3, 2015, 3:28:06 PM
to tas...@googlegroups.com
Thanks, that seems to have worked! Unfortunately, I now get the following error:

[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.FileLoadPlugin: progress: 100%
[Thread-13] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.IntersectionAlignmentPlugin: progress: 100%
[Thread-15] INFO net.maizegenetics.matrixalgebra.Matrix.DoubleMatrixFactory - TasselBlas library for system-specific BLAS/LAPACK not found. Using system-independent EJML for DoubleMatrix operations.
[Thread-15] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.association.MLMPlugin: progress: 100%
java.lang.ArrayIndexOutOfBoundsException: 0
        at net.maizegenetics.taxa.distance.DistanceMatrix.getDistance(DistanceMatrix.java:207)
        at net.maizegenetics.analysis.association.CompressedMLMusingDoubleMatrix.calculateDistanceFromKin(CompressedMLMusingDoubleMatrix.java:741)
        at net.maizegenetics.analysis.association.CompressedMLMusingDoubleMatrix.computeZKZ(CompressedMLMusingDoubleMatrix.java:547)
        at net.maizegenetics.analysis.association.CompressedMLMusingDoubleMatrix.solve(CompressedMLMusingDoubleMatrix.java:198)
        at net.maizegenetics.analysis.association.MLMPlugin.performFunction(MLMPlugin.java:193)
        at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1444)
        at net.maizegenetics.plugindef.AbstractPlugin.fireDataSetReturned(AbstractPlugin.java:1351)
        at net.maizegenetics.plugindef.AbstractPlugin.fireDataSetReturned(AbstractPlugin.java:1367)
        at net.maizegenetics.analysis.data.CombineDataSetsPlugin.performFunction(CombineDataSetsPlugin.java:63)
        at net.maizegenetics.analysis.data.CombineDataSetsPlugin.dataSetReturned(CombineDataSetsPlugin.java:138)
        at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)

My Pipeline arguments (passed to the command line from perl) are:

-fork1 -h $top/COREFILES/maize_snp/FilteredGenome.hmp.txt.gz -fork2 -r $top/runs/$RunName/$RunDataName -fork3 -q $top/COREFILES/maize_snp/popstructure.txt -fork4 -k $top/COREFILES/maize_snp/kinship.txt -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel Optimum -mlmOutputFile $top/runs/$RunName/ -runfork1 -runfork2 -runfork3 -runfork4

FilteredGenome.hmp.txt.gz is a union-joined, site-filtered export of all 10 partially imputed maize chromosome hapmap files from AllZeaGBSv2.3.
popstructure.txt is the aforementioned population structure file with the sweet corn and popcorn columns removed, as there were some nonzero values for lines also represented in our phenotype data.
kinship.txt is a Tassel-generated kinship matrix of the FilteredGenome file.

The first two files and the phenotype data are intersected, and the result is run in MLM against the kinship matrix.

Peter Bradbury

Mar 3, 2015, 4:06:57 PM
to tas...@googlegroups.com
This is a bit more challenging. The problem appears to be that when Tassel creates a subset of the kinship matrix that matches the taxa with non-missing phenotype values, it ends up with nothing. I have no idea why that is happening. You might try using the GUI version of Tassel to make sure you get something reasonable after joining the first three files.

Peter
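
Outside the GUI, one way to sanity-check a join is to compare the taxon names in the input files directly. The Python sketch below is not a Tassel feature; it assumes whitespace-delimited lines with the taxon name in the first column and header lines that start with `<`, and the sample names and values are hypothetical.

```python
def taxa_names(lines):
    """Collect first-column names from data lines, skipping blank and '<...>' header lines."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields and not fields[0].startswith("<"):
            names.add(fields[0])
    return names


# Hypothetical file contents, for illustration only.
phenotype_lines = ["<Trait> trait1", "lineA 1.2", "lineB 3.4"]
structure_lines = ["<Trait> Q1 Q2", "lineB 0.4 0.5", "lineC 0.1 0.8"]

shared = taxa_names(phenotype_lines) & taxa_names(structure_lines)
print(len(shared), "taxa in common:", sorted(shared))
```

If the shared set is empty, an intersect join of those two files would be expected to produce an empty table.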

Matthew H.

Mar 3, 2015, 4:20:07 PM
to tas...@googlegroups.com
I think I've figured it out.

Line names for Tassel 5 phenotype data are formatted like this (e.g., for line 33-16):
33-16:C08L7ACXX:6:250047984 0.321907981
33-16:MRG:2:250039809 0.321907981

The old population structure file, however, uses the older, short names:
33-16 0.014 0.972 0.014

Thus, when the intersection-join occurs, the result is a blank table.
With regard to reformatting the population structure file, would the updated format

33-16:C08L7ACXX:6:250047984 0.014 0.972 0.014
33-16:MRG:2:250039809 0.014 0.972 0.014

be correct?

Matthew H.

Mar 3, 2015, 8:53:24 PM
to tas...@googlegroups.com
For anyone else who finds this topic, my aforementioned reformatting of the population structure file worked.
Thanks for your help Peter!
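
For a large file, the renaming described in this thread could be scripted rather than done by hand. The Python sketch below is one possible approach, not something Tassel provides: it assumes the short name is everything before the first colon of the full name, and it copies each structure row to every full name that shares that prefix. The function name is hypothetical; the example values are the ones quoted in the thread.

```python
def expand_structure_rows(structure_rows, full_names):
    """Copy each short-name structure row to every full taxon name with that prefix."""
    by_short = {}
    for name in full_names:
        # Text before the first ':' is treated as the short line name.
        by_short.setdefault(name.split(":", 1)[0], []).append(name)
    expanded = []
    for short, values in structure_rows:
        for full in by_short.get(short, []):
            expanded.append((full, values))
    return expanded


full_names = ["33-16:C08L7ACXX:6:250047984", "33-16:MRG:2:250039809"]
structure_rows = [("33-16", ["0.014", "0.972", "0.014"])]

for full, values in expand_structure_rows(structure_rows, full_names):
    print(full, *values)
```

Structure rows whose short name matches no full name are silently dropped in this sketch, so it is worth comparing row counts before and after.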