Duplicate contradicting genotypes in MNPs

Keren Carss

Apr 27, 2017, 6:46:48 AM
to Platypus Users
Hi,

We have run Platypus on a trio and are having problems with certain variants in the output. Here is an example, two lines from the output VCF:

1       804107  .       TCCCTAGAGACAA   CCACTAGAAACAT   367     PASS    BRF=0.26;FR=0.1667;HP=3;HapScore=2;MGOF=47;MMLQ=41;MQ=51.56;NF=4;NR=0;PP=367;QD=110.117;SC=CAGAACACAATCCCTAGAGAC;SbPval=1;Source=Platypus;TC=76;TCF=48;TCR=28;TR=4;WE=804127;WS=804097  GT:GL:GOF:GQ:NR:NV      0/0:0,-9.03,-300:8:90:28:0      0/0:0,-8.13,-300:7:81:28:0      1/0:-30.2,0,-151.88:47:99:20:4
1       804115  .       G       A       2965    PASS    BRF=0.26;FR=0.8333;HP=2;HapScore=2;MGOF=69;MMLQ=37;MQ=50.49;NF=44;NR=24;PP=2965;QD=20;SC=AATCCCTAGAGACAACCTACC;SbPval=0.51;Source=Platypus;TC=69;TCF=45;TCR=24;TR=68;WE=804127;WS=804097        GT:GL:GOF:GQ:NR:NV      1/1:-102.8,-7.8,0:8:78:26:25    1/1:-102.4,-7.77,0:7:78:27:27   1/1:-63.9,-4.52,0:69:45:16:16

So the first variant is an MNP that overlaps the SNP in the second line. We are trying to decompose and normalise MNPs so that we can annotate them against other data sources, so we would like to express this MNP as four SNPs: 1:804107T>C, 1:804109C>A, 1:804115G>A (the same SNP as in the second line above) and 1:804119A>T. The problem is that the first line says individuals 1 and 2 are homozygous reference and individual 3 is heterozygous for 1:804115G>A, whereas the second line says all three individuals are homozygous for that same SNP.
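
For readers following along, the decomposition described above can be sketched in Python. This is only an illustration: `decompose_mnp` is a hypothetical helper, not part of Platypus, and it assumes REF and ALT have the same length (a pure MNP with no indels). It splits the alleles only; it makes no decision about how the MNP's genotypes should be carried over to each SNP, which is the actual difficulty raised here.

```python
def decompose_mnp(chrom, pos, ref, alt):
    """Split an equal-length REF/ALT pair into its constituent SNPs.

    Hypothetical helper for illustration only. Assumes no indels,
    i.e. len(ref) == len(alt). Positions are 1-based, as in VCF.
    Returns a list of (chrom, pos, ref_base, alt_base) tuples.
    """
    if len(ref) != len(alt):
        raise ValueError("REF and ALT must have equal length for an MNP")
    snps = []
    for offset, (r, a) in enumerate(zip(ref, alt)):
        if r != a:  # identical bases are not variants, so skip them
            snps.append((chrom, pos + offset, r, a))
    return snps


# The MNP record from the first VCF line above
snps = decompose_mnp("1", 804107, "TCCCTAGAGACAA", "CCACTAGAAACAT")
for chrom, pos, r, a in snps:
    print(f"{chrom}:{pos}{r}>{a}")
# 1:804107T>C
# 1:804109C>A
# 1:804115G>A
# 1:804119A>T
```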

Has anyone else come across this? Is there a simple solution?

Many thanks, Keren




Andy Rimmer

May 4, 2017, 3:40:14 AM
to Keren Carss, Platypus Users
Hi Keren,

This looks like a bug in the way Platypus generates the VCF output. The genotypes for the first variant are for the whole MNP, so you have to be careful when breaking it down into individual SNPs. To avoid this kind of problem, the MNP should have been reported together with the overlapping SNP in a single VCF record. The way I would interpret this is that individuals 1 and 2 do not have the long MNP at all but do have the SNP, while individual 3 is heterozygous for the MNP and heterozygous for the haplotype with just the SNP, and so has 2 copies of the SNP (one from the MNP and one from the single SNP). So the genotypes are not necessarily wrong, but they are confusing. VCF does not really handle reporting of long haplotypes very well.
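
The interpretation above can be written out as a small Python sketch. It is not something Platypus does itself: `haplotype_counts` is a hypothetical helper, and it rests on the assumption stated in the interpretation, namely that the SNP record counts every haplotype carrying the SNP allele, including haplotypes that carry the full MNP. It also assumes biallelic records and called genotypes.

```python
def haplotype_counts(mnp_gt, snp_gt):
    """Return (mnp_haplotypes, snp_only_haplotypes, ref_haplotypes).

    Hypothetical helper. Assumes the SNP genotype counts all haplotypes
    with the SNP allele, whether or not they also carry the MNP.
    """
    mnp_alleles = mnp_gt.replace("|", "/").split("/")
    snp_alleles = snp_gt.replace("|", "/").split("/")
    if "." in mnp_alleles or "." in snp_alleles:
        raise ValueError("missing genotype calls are not handled")
    mnp = mnp_alleles.count("1")        # haplotypes carrying the whole MNP
    snp_total = snp_alleles.count("1")  # haplotypes carrying the SNP allele
    if mnp > snp_total:
        raise ValueError("MNP genotype implies more SNP copies than reported")
    return mnp, snp_total - mnp, len(snp_alleles) - snp_total


# (MNP genotype, SNP genotype) for the three samples in the two records
trio = [("0/0", "1/1"), ("0/0", "1/1"), ("1/0", "1/1")]
for i, (mnp_gt, snp_gt) in enumerate(trio, start=1):
    print(i, haplotype_counts(mnp_gt, snp_gt))
# 1 (0, 2, 0)
# 2 (0, 2, 0)
# 3 (1, 1, 0)
```

Under this reading, a decomposed 1:804115G>A call would keep the 1/1 genotypes from the SNP record, while the other three SNPs would take their genotypes from the MNP record.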

It's also worth scanning the BAM file at this location. Individual 3 has only 4 reads out of 20 supporting the MNP, which may be because the MNP is not always aligned correctly in the BAM, or could indicate that the MNP is actually just an unfortunate pileup of errors around that site.
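
The "4 reads out of 20" figure comes from the NR and NV entries in the sample column (reads covering the site and reads containing the variant, respectively). A minimal Python sketch for pulling that fraction out of a sample field follows; `alt_read_fraction` is a hypothetical helper and assumes a biallelic record, where NR and NV each hold a single integer.

```python
def alt_read_fraction(format_field, sample_field):
    """Fraction of reads supporting the variant, computed as NV / NR.

    Hypothetical helper. Assumes a biallelic record, so that NR and NV
    are single integers rather than comma-separated lists.
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    nr = int(values["NR"])  # reads covering the variant location
    nv = int(values["NV"])  # reads containing the variant
    return nv / nr if nr else float("nan")


# Individual 3 in the MNP record
frac = alt_read_fraction("GT:GL:GOF:GQ:NR:NV", "1/0:-30.2,0,-151.88:47:99:20:4")
print(frac)  # 0.2
```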

Kind regards,
Andy

--
Dr Andrew (Andy) Rimmer