Did my run fail at the sort step?


czq...@hotmail.com

Nov 5, 2017, 7:18:32 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage

Dear all,


I am running the sort step in TASSEL. It finished in about 3 minutes on 4 GB of data, but the log file reports an error, shown below.



Memory Settings: -Xms512m -Xmx400g

Tassel Pipeline Arguments: -debug -SortGenotypeFilePlugin -inputFile GS_Fullsib_delete15badseq_indel_biallelic_GQ6_DP2_MAF0.01_Maxmiss0.9.vcf.recode.vcf -outputFile GS_Fullsib_delete15badseq_indel_biallelic_GQ6_DP2_MAF0.01_Maxmiss0.9_sorted_debug.vcf -fileType VCF

[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.38  Date: July 13, 2017

[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 364089 MB

[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_131

[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux

[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 13

[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -SortGenotypeFilePlugin, -inputFile, GS_Fullsib_delete15badseq_indel_biallelic_GQ6_DP2_MAF0.01_Maxmiss0.9.vcf.recode.vcf, -outputFile, GS_Fullsib_delete15badseq_indel_biallelic_GQ6_DP2_MAF0.01_Maxmiss0.9_sorted_debug.vcf, -fileType, VCF, -runfork1]

net.maizegenetics.analysis.data.SortGenotypeFilePlugin

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.data.SortGenotypeFilePlugin: time: Nov 5, 2017 12:41:48

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 

SortGenotypeFilePlugin Parameters

inputFile: GS_Fullsib_delete15badseq_indel_biallelic_GQ6_DP2_MAF0.01_Maxmiss0.9.vcf.recode.vcf

outputFile: GS_Fullsib_delete15badseq_indel_biallelic_GQ6_DP2_MAF0.01_Maxmiss0.9_sorted_debug.vcf

fileType: VCF


[pool-1-thread-1] ERROR net.maizegenetics.dna.map.PositionListBuilder - validateOrdering: Position      Chr:MA_10       Pos:24775       Name:SMA_10_24775       Variants:G/C    MAF:NaN Ref:G and Position      Chr:MA_5        Pos:86572       Name:SMA_5_86572        Variants:C/T    MAF:NaN Ref:C out of order.

BuilderFromVCF data timing 35.1043s 

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.data.SortGenotypeFilePlugin: time: Nov 5, 2017 12:44:35

[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.data.SortGenotypeFilePlugin: time: Nov 5, 2017 12:44:35: progress: 100%

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.data.SortGenotypeFilePlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.



Any suggestions?


Cheers

Chen

Terry Casstevens

Nov 5, 2017, 10:02:27 AM
to Tassel User Group
I suspect that your output file was created correctly. That error
message shows which positions were out of order; it does not indicate
a problem with the sort.
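
One way to confirm this independently is to scan the sorted output and look for the first pair of adjacent records that breaks the ordering. The following is a minimal sketch, not TASSEL code: the function name `first_out_of_order` is made up for illustration, and the default comparison (chromosome name as a plain string, then numeric position) is an assumption that may differ from the ordering TASSEL applies internally. Pass a different `key` if your chromosome names need another ordering.

```python
from typing import Callable, Iterable, Optional, Tuple

Site = Tuple[str, int]  # (chromosome name, position)


def first_out_of_order(
    vcf_lines: Iterable[str],
    key: Callable[[Site], object] = lambda site: site,
) -> Optional[Tuple[Site, Site]]:
    """Return the first adjacent pair of sites that breaks the ordering, or None.

    ASSUMPTION: by default sites are compared as (chromosome string, position).
    This is not necessarily the ordering TASSEL uses internally.
    """
    previous: Optional[Site] = None
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos = line.split("\t", 2)[:2]  # CHROM and POS are the first two columns
        current = (chrom, int(pos))
        if previous is not None and key(current) < key(previous):
            return previous, current
        previous = current
    return None
```

To use it, open the sorted file and pass the file handle, for example `first_out_of_order(open("sorted.vcf"))`, where the file name is a placeholder. A result of `None` means no adjacent pair violated the chosen ordering.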

czq...@hotmail.com

Nov 5, 2017, 12:29:03 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry, 

Thanks. I would like to ask one more question. After this sort, I imputed the genotypes using LDKNNi imputation, but I find that some missing values still remain after imputation.

Is that possible?

Cheers
Chen

Terry Casstevens

Nov 5, 2017, 2:18:28 PM
to Tassel User Group
I don't know the details of that function. Maybe someone else will
reply. But I don't think it necessarily fills in every missing value.
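
To see how many calls are still missing after imputation, the genotype fields of the output VCF can be counted directly. This is a rough sketch, independent of TASSEL: the function name `count_missing_calls` is made up for illustration, and it assumes a tab-delimited VCF in which `GT` is the first FORMAT key and a missing allele is written as `.`. A call with at least one missing allele (for example `./.` or `./1`) is counted as missing.

```python
from typing import Iterable, Tuple


def count_missing_calls(vcf_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (missing_calls, total_calls) summed over all sample columns.

    ASSUMPTION: GT is the first FORMAT key and "." marks a missing allele.
    """
    missing = total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue  # record carries no genotype calls
        for sample in fields[9:]:
            genotype = sample.split(":", 1)[0]
            alleles = genotype.replace("|", "/").split("/")
            total += 1
            if "." in alleles:
                missing += 1  # at least one allele is missing
    return missing, total
```

Running it on the file before and after imputation gives a quick comparison of how many calls were filled in.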

czq...@hotmail.com

Nov 5, 2017, 2:36:31 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry,

I saw that a paper published in Nature Genetics mentions that LD-KNN imputation may not impute all genotypes in a single pass. At least in that paper, the LD-KNN imputation was run iteratively several times until the user's requirement was met.
Below is the link to that paper:
https://www.nature.com/ng/journal/v46/n7/full/ng.3007.html

Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism

I am not sure how it works in Tassel. Hopefully someone could answer this.
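
The iterative idea could be sketched roughly as below. This is a generic loop, not TASSEL's or the paper's implementation: `impute_once` is a hypothetical stand-in for whatever single imputation pass is being used, genotypes are represented as a simple list of rows with `None` for a missing call, and the stopping rules (a missing-rate threshold, a round limit, and stopping when a round makes no progress) are assumptions chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Genotypes = List[List[Optional[int]]]  # None marks a missing call


def missing_fraction(genotypes: Genotypes) -> float:
    """Fraction of calls that are missing (0.0 for an empty matrix)."""
    total = sum(len(row) for row in genotypes)
    if total == 0:
        return 0.0
    missing = sum(call is None for row in genotypes for call in row)
    return missing / total


def impute_until(
    genotypes: Genotypes,
    impute_once: Callable[[Genotypes], Genotypes],  # hypothetical single pass
    max_missing: float = 0.0,
    max_rounds: int = 10,
) -> Tuple[Genotypes, int]:
    """Repeat impute_once until the missing rate is low enough or progress stops."""
    rounds = 0
    current = missing_fraction(genotypes)
    while current > max_missing and rounds < max_rounds:
        genotypes = impute_once(genotypes)
        rounds += 1
        updated = missing_fraction(genotypes)
        if updated >= current:
            break  # this round filled nothing, so further rounds will not help
        current = updated
    return genotypes, rounds
```

The loop returns the final genotypes together with the number of rounds performed, so some calls can legitimately remain missing if a round stops making progress.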

Cheers
Chen