Alteration of data in VCF file after Sort Genotype File Plugin


Alexander Berry

Aug 9, 2016, 2:20:34 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
This question was posted 2 years ago by another user, but there was no response: https://groups.google.com/forum/#!topic/tassel/tMlOV440Zm8

I am using the GATK pipeline to call SNPs, and I end up with a VCFv4.2 file. Before I can load it into Tassel, it needs to be sorted, so I use the Sort Genotype File plugin. The plugin sorts the file correctly, but it also completely alters the data in the sample columns. For example, it turns this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TC002 TC002-2
KB222877.1 1258 . G A 3254.20 . AC=4;AF=1.00;AN=4;DP=85;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=42.00;QD=30.89;SOR=0.716 GT:AD:DP:GQ:PL 1/1:0,50:50:99:1912,149,0 1/1:0,35:35:99:1369,105,0
KB222877.1 1528 . A G 3415.20 . AC=4;AF=1.00;AN=4;DP=90;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.98;QD=28.85;SOR=0.738 GT:AD:DP:GQ:PL 1/1:0,43:43:99:1661,129,0 1/1:0,47:47:99:1781,140,0
KB222877.1 2973 . A C 3037.20 . AC=4;AF=1.00;AN=4;DP=79;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=42.00;QD=26.67;SOR=0.941 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1557,120,0 1/1:0,39:39:99:1507,117,0

into this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TC002 TC002-2
KB222877.1 1258 SKB222877.1_1258 G A . PASS AC=4;AF=1.00;AN=4;DP=85;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=42.00;QD=30.89;SOR=0.716;DP=114 GT:AD:DP:GQ:PL 1/1:53,0:53:99:0,159,255 1/1:45,0:45:99:0,135,255
KB222877.1 1528 SKB222877.1_1528 A G . PASS AC=4;AF=1.00;AN=4;DP=90;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.98;QD=28.85;SOR=0.738;DP=100 GT:AD:DP:GQ:PL 1/1:53,0:53:99:0,159,255 1/1:47,0:47:99:0,141,255
KB222877.1 2973 SKB222877.1_2973 A C . PASS AC=4;AF=1.00;AN=4;DP=79;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=42.00;QD=26.67;SOR=0.941;DP=74 GT:AD:DP:GQ:PL 1/1:15,0:15:99:0,45,255 1/1:19,0:19:99:0,57,255




The input is VCFv4.2 and Tassel outputs VCFv4.0. I don't know whether this is part of the problem.
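To pin down exactly which per-sample fields differ between the two files, the records can be compared site by site. The following is a minimal Python sketch, not part of Tassel or GATK; the function names `parse_vcf_records` and `diff_sample_fields` are hypothetical, and it assumes whitespace-delimited records with no spaces inside any field.

```python
def parse_vcf_records(lines):
    """Map (CHROM, POS) -> {sample: {FORMAT key: value}}.

    Assumption: fields are separated by whitespace (tabs or spaces)
    and no field contains embedded whitespace.
    """
    samples = []
    records = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split()
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        keys = fields[8].split(":")  # e.g. GT:AD:DP:GQ:PL
        records[(fields[0], int(fields[1]))] = {
            name: dict(zip(keys, value.split(":")))
            for name, value in zip(samples, fields[9:])
        }
    return records


def diff_sample_fields(before_lines, after_lines):
    """List (site, sample, key, old, new) for every per-sample value that changed."""
    before = parse_vcf_records(before_lines)
    after = parse_vcf_records(after_lines)
    changes = []
    for site in sorted(before.keys() & after.keys()):
        for sample, old in before[site].items():
            new = after[site].get(sample, {})
            for key, old_value in old.items():
                if new.get(key) != old_value:
                    changes.append((site, sample, key, old_value, new.get(key)))
    return changes
```

Run on the first record above, this reports that GT and GQ are unchanged for both samples while AD, DP, and PL all differ.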

Thanks for any help,

Alex

Terry Casstevens

Aug 9, 2016, 3:48:05 PM
to Tassel User Group
I think at least one problem is that the depth information isn't being
sorted along with the positions.

I'm already working on this as part of a bigger effort to handle site
scores (i.e. depth) correctly when transforming genotype data. At
this point, I don't have an estimate of when it will be fixed.
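
Until that is fixed, one possible workaround is to sort the VCF outside Tassel, moving each record as a whole line so the per-sample depth fields stay attached to their positions. The following is a minimal Python sketch, not Tassel code; the function name `sort_vcf_lines` is hypothetical, and ordering chromosomes as plain strings is an assumption that may not match the order Tassel expects.

```python
def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by (CHROM, POS).

    Each record is moved as an intact line, so sample columns are never
    rewritten. Assumption: chromosome names are compared as plain strings.
    """
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line.strip() and not line.startswith("#")]

    def site_key(line):
        # First two whitespace-delimited fields are CHROM and POS.
        chrom, pos = line.split(None, 2)[:2]
        return (chrom, int(pos))  # compare positions numerically

    return header + sorted(body, key=site_key)
```

The input can come from `open(path).readlines()` and the result can be written back with `writelines`, since each line keeps its original newline.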