I would like to maximize my sensitivity so that I can then feed the
results into mutationseq. I am processing human genomes, so I need to
do some reasonable filtering just to make my set manageable. Of course
I want to keep the SNVs where AA_AB and AA_BB have the highest
probability, but is there a recommended level to filter at?
thanks,
Richard
There are a couple of approaches I can think of for extracting a large
but manageable set of somatic mutations from the output.
1) Set some large number, n, you want in advance. Then extract the top
n somatic mutations.
2) Set the somatic probability threshold very low. Unfortunately, a
given threshold will yield a different number of somatic mutations
from case to case. However, going as low as 0.01 has led to
reasonably small datasets for me in the past.
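A minimal sketch of option 2), assuming the somatic probability of a
call is taken as p_AA_AB + p_AA_BB and that each output row has already
been parsed into a record. The Call type, its field names, and the
function names are hypothetical and not part of the tool's output
format.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    """One parsed output row (hypothetical layout, for illustration only)."""
    chrom: str
    pos: int
    p_aa_ab: float
    p_aa_bb: float


def somatic_probability(call: Call) -> float:
    # Assumption: the somatic probability is the sum of the two
    # somatic genotype probabilities.
    return call.p_aa_ab + call.p_aa_bb


def filter_by_threshold(calls: Iterable[Call],
                        threshold: float = 0.01) -> Iterator[Call]:
    """Yield calls whose somatic probability is at least `threshold`."""
    for call in calls:
        if somatic_probability(call) >= threshold:
            yield call
```

Because this is a generator, it streams over the output and never holds
the whole genome's worth of calls in memory.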
You can implement option 1) very efficiently if you incrementally
build a sorted list (sorted by p_AA_AB, p_AA_BB) as you scan the
output.
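As a rough sketch of option 1), the following keeps only the best n
calls while scanning. It uses a bounded min-heap instead of a literally
sorted list, which gives the same result with less bookkeeping. The
record layout (chrom, pos, p_AA_AB, p_AA_BB), the function name, and
ranking by the sum of the two probabilities are all assumptions.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record layout: (chrom, pos, p_AA_AB, p_AA_BB)
Record = Tuple[str, int, float, float]


def top_n_somatic(records: Iterable[Record], n: int) -> List[Record]:
    """Return the n records with the highest p_AA_AB + p_AA_BB, best first."""
    if n <= 0:
        return []
    heap: list = []  # min-heap of (score, arrival index, record)
    for i, rec in enumerate(records):
        score = rec[2] + rec[3]  # assumed ranking: sum of somatic probabilities
        item = (score, i, rec)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif score > heap[0][0]:
            # The new record beats the weakest one kept so far: swap it in.
            heapq.heapreplace(heap, item)
    return [rec for _, _, rec in sorted(heap, reverse=True)]
```

Memory use is bounded by n regardless of how many positions are
scanned, and each record costs at most O(log n) work.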
I hope that helps.
Andy
What do you think of checking AA_AB+AA_BB > 0.5?
Richard
Cheers,
Andy