(somatic) variant frequency

I.J. Nijman

Sep 26, 2014, 11:15:30 AM
to strelka...@googlegroups.com
I know this has come up before, but I'm still not sure what the best approach is for calculating the allele frequency of calls made by Strelka, and would appreciate a pointer.

Cheers,
Ies Nijman

Chris Saunders

Sep 29, 2014, 10:52:58 AM
to strelka...@googlegroups.com
Hi Ies,

Right now, strelka outputs the basecall counts for both the tumor and normal samples in the VCF, although parsing them out is currently a fair bit of work. These basecall counts are contained in the AU, CU, GU, and TU fields (a full description of the output format is here: https://sites.google.com/site/strelkasomaticvariantcaller/home/somatic-variant-output). I recommend using just the tier1 counts (the first value in each comma-separated list) for frequency estimation.
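
For illustration, here is a minimal sketch of the tier1 approach for a single VCF record. It assumes each sample column holds colon-separated values matching the FORMAT keys, that AU/CU/GU/TU each hold "tier1,tier2", and that only the first ALT allele is of interest. The function names and the example record are hypothetical, not part of strelka. Dividing by the total of all four tier1 counts is one common choice of denominator; using ref + alt counts only is another.

```python
from typing import Dict, List, Optional


def tier1_counts(format_keys: str, sample_values: str) -> Dict[str, int]:
    """Return tier1 basecall counts {'A': n, 'C': n, 'G': n, 'T': n} for one sample."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    # Each of AU, CU, GU, TU holds "tier1,tier2"; keep only the first (tier1) value.
    return {base: int(fields[base + "U"].split(",")[0]) for base in "ACGT"}


def allele_frequency(counts: Dict[str, int], alt: str) -> Optional[float]:
    """Fraction of tier1 basecalls supporting `alt`; None if there is no coverage."""
    total = sum(counts.values())
    if total == 0:
        return None
    return counts[alt] / total


def vaf_from_vcf_line(line: str) -> List[Optional[float]]:
    """Per-sample allele frequency of the first ALT allele, in sample-column order."""
    cols = line.rstrip("\n").split("\t")
    alt = cols[4].split(",")[0]  # assumption: only the first ALT allele is considered
    fmt = cols[8]
    return [allele_frequency(tier1_counts(fmt, sample), alt) for sample in cols[9:]]


# Hypothetical record: normal sample first, tumor second.
example = (
    "chr1\t100\t.\tG\tA\t.\tPASS\t.\tDP:FDP:SDP:SUBDP:AU:CU:GU:TU"
    "\t30:0:0:0:0,0:0,0:30,31:0,0"
    "\t40:0:0:0:10,11:0,0:30,32:0,0"
)
print(vaf_from_vcf_line(example))  # [0.0, 0.25]
```

Here the tumor has 10 tier1 "A" calls out of 40 in total, giving a frequency of 0.25. Since these counts come straight from the VCF, they carry the same low-quality-basecall caveat as the raw AU/CU/GU/TU fields.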

These raw basecall counts are not ideal for estimating variant frequencies because, with strelka's default settings, they may include some low-quality basecalls. There's a custom option you can set in the strelka config file so that it only outputs basecalls above a certain quality threshold for each sample, described here:

https://sites.google.com/site/strelkasomaticvariantcaller/home/faq#TOC-I-m-using-the-somatic-SNV-allele-counts-in-the-Strelka-VCF-output-for-downstream-analysis.-These-counts-include-even-very-low-quality-basecalls-how-do-I-remove-them-

…this will provide AU,CU,GU,TU counts which are more appropriate for estimating allele frequency.


-Chris