Hello Gustavo,
thank you very much for your quick reply.
I have been trying to do as suggested: I determined the "minimal discrepancy of reference and alternative" for all entries in the VCF and then summed the AC values of repeated variants.
However, I have now found cases where AN is not consistent, e.g. for the variant chr2:160698700-160698700 A->C (again hg19).
Here I found > 3 entries:
2 160698700 . ATATA CTATA . . AF=0.0001896;AC=5;AN=26378;END=160698704;DS=GS000016444|ISB_founders-Nge3
2 160698700 . ATATAT CTATAT . . AF=0.0000379;AC=1;AN=26378;END=160698705;DS=GS000010327
2 160698700 . ATATATA CTATATA . . AF=0.0000322;AC=5;AN=155504;END=160698706;DS=GS000015891|ISB_founders-Nge3
In this case AN varies between the entries, whereas from what I understood before, I assumed only AC should differ. I am sorry, but it is still not completely clear to me how these values are generated from the various sources, whether this observation is something you would expect, and if so, how to handle it.
Thanks again for your help!
Kind regards,
Florentine
The source repeatedly mentioned (ISB_founders-Nge3) is a collection of individuals; each individual genome would have just one version of the representation of the variant. Therefore it should be safe to sum the counts. AN refers to how many individuals have coverage over the locus, therefore it doesn't depend on the list of sources in which variation is observed.

Best,
-- Gustavo
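
For anyone following the thread, here is a minimal Python sketch of the procedure discussed above: reduce each entry to its minimal REF/ALT representation, then sum AC over entries that collapse to the same variant. It is not Kaviar's own code. The function names (`minimal_representation`, `parse_info`, `merge_records`) are hypothetical, the trimming rule (shared trailing bases first, then shared leading bases) is an assumption about what "minimal discrepancy" means, and only single-ALT records are handled. Because the thread does not settle which AN to use when entries disagree, the sketch only collects the distinct AN values instead of choosing one.

```python
from collections import defaultdict


def minimal_representation(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Trim shared trailing bases first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, shifting the position accordingly.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def parse_info(info):
    """Turn 'AF=...;AC=...;AN=...' into a dict of strings."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def merge_records(lines):
    """Sum AC per minimal variant and collect the distinct AN values seen."""
    merged = defaultdict(lambda: {"AC": 0, "AN": set()})
    for line in lines:
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split()[:8]
        fields = parse_info(info)
        key = (chrom, *minimal_representation(int(pos), ref, alt))
        merged[key]["AC"] += int(fields["AC"])
        merged[key]["AN"].add(int(fields["AN"]))
    return dict(merged)


# The three entries quoted in the question above.
records = [
    "2 160698700 . ATATA CTATA . . AF=0.0001896;AC=5;AN=26378;END=160698704;DS=GS000016444|ISB_founders-Nge3",
    "2 160698700 . ATATAT CTATAT . . AF=0.0000379;AC=1;AN=26378;END=160698705;DS=GS000010327",
    "2 160698700 . ATATATA CTATATA . . AF=0.0000322;AC=5;AN=155504;END=160698706;DS=GS000015891|ISB_founders-Nge3",
]

result = merge_records(records)
for key, counts in result.items():
    print(key, counts["AC"], sorted(counts["AN"]))
```

All three entries collapse to the same A->C variant at position 160698700, so their AC values are added together, while both AN values are kept so the discrepancy stays visible.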