Hi Sangwoo,
1. Short answer: JointSNVMix is much more specific, and a little less
sensitive, than the approach you describe.
Longer answer. The main difference between JointSNVMix and running
SNVMix as described is that the latter ignores correlation between
samples. As a result, two independent SNVMix runs combined post-hoc
tend to predict a large number of false positives, though they also
predict more true positives. In practice, the large number of false
positives from running SNVMix was what motivated the development of
JointSNVMix.
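To make the correlation point concrete, here is a toy Python sketch.
It is not code from JointSNVMix or SNVMix; the function names and the
posterior values are made up for illustration, and genotype pairs are
written as (normal, tumour). It compares the somatic probability you
get by multiplying the two per-sample posteriors with the one read
directly off a joint posterior.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")


def marginals(joint_post):
    """Collapse a joint (normal, tumour) posterior into per-sample posteriors."""
    normal = {g: 0.0 for g in GENOTYPES}
    tumour = {g: 0.0 for g in GENOTYPES}
    for (g_normal, g_tumour), p in joint_post.items():
        normal[g_normal] += p
        tumour[g_tumour] += p
    return normal, tumour


def somatic_prob_independent(normal_post, tumour_post):
    """P(normal = AA, tumour != AA) when the samples are treated as independent."""
    return normal_post["AA"] * (1.0 - tumour_post["AA"])


def somatic_prob_joint(joint_post):
    """The same event, read directly off the joint posterior."""
    return joint_post[("AA", "AB")] + joint_post[("AA", "BB")]


# Hypothetical posterior: the site is either homozygous reference in both
# samples or a germline heterozygote in both, and we cannot tell which.
joint_post = {pair: 0.0 for pair in product(GENOTYPES, repeat=2)}
joint_post[("AA", "AA")] = 0.5
joint_post[("AB", "AB")] = 0.5

normal_post, tumour_post = marginals(joint_post)
p_indep = somatic_prob_independent(normal_post, tumour_post)
p_joint = somatic_prob_joint(joint_post)
print(p_indep, p_joint)
```

With these made-up numbers the independent combination assigns a
somatic probability of 0.25 to a site that the joint posterior says is
never somatic, which is the kind of false positive described above.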
Running SNVMix and combining the results post-hoc is included in the
JointSNVMix software. You can use the commands snv_mix_one/snv_mix_two
in place of joint_snv_mix_one/joint_snv_mix_two when training and
classifying. Be aware that these commands use different priors and
parameters files; both are included in the program's config/ folder
and are called indep_params.cfg and indep_priors.cfg.
2. Short answer: the software does not explicitly model heterogeneity.
However, training the model should let the parameters adapt to this
situation. Every sample we have used the software on has some level of
normal contamination, so I would say the software can handle this
situation provided training is run.
Long answer. In principle what you are describing would lead to a
landscape of allelic frequencies (#ref_bases/depth) in the tumour with
more than three modes. To understand this, remember that if it were
pure tumour we would see three peaks for (AA, AB, BB), ignoring copy
number. If we are now in the situation you describe then we would have
peaks for (AA_AA, AA_AB, AA_BB, ..., BB_BB). This might not be a huge
issue, however. Consider a position which is AA_AB. For the program to
work, all that is required is that the position is significantly more
likely to match the AB cluster in the tumour than either the AA or BB
clusters.
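A rough numerical sketch of that argument in Python follows. The
cluster means (0.99, 0.5, 0.01), the 30% normal fraction, and the
depth of 40 are all assumptions chosen for illustration, not values
taken from the software, and AA_AB is read here as normal cells AA
mixed with tumour cells AB, ignoring copy number.

```python
import math

# Assumed reference-allele fractions for the three tumour clusters.
CLUSTER_MU = {"AA": 0.99, "AB": 0.5, "BB": 0.01}


def mixed_ref_fraction(normal_gt, tumour_gt, normal_fraction):
    """Expected reference-allele fraction when normal cells contaminate the tumour."""
    return (normal_fraction * CLUSTER_MU[normal_gt]
            + (1.0 - normal_fraction) * CLUSTER_MU[tumour_gt])


def binomial_log_lik(ref_count, depth, mu):
    """Log binomial probability of ref_count reference reads out of depth."""
    return (math.log(math.comb(depth, ref_count))
            + ref_count * math.log(mu)
            + (depth - ref_count) * math.log(1.0 - mu))


depth = 40
expected = mixed_ref_fraction("AA", "AB", normal_fraction=0.3)
ref_count = round(expected * depth)

# Score the observed count against each of the three tumour clusters.
log_liks = {g: binomial_log_lik(ref_count, depth, mu)
            for g, mu in CLUSTER_MU.items()}
best = max(log_liks, key=log_liks.get)
print(expected, ref_count, best)
```

Under these assumptions the AA_AB peak sits between the AA and AB
clusters, but the AB cluster is still by far the most likely match,
so the position would be assigned to AB in the tumour.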
I do consider this a limitation along with not explicitly handling
copy number, and would like to address it in the future.
One simple idea I have is to expand the number of parameters used in
the binomial densities. At present, there are only 6 parameters for
the binomial densities, 3 for the normal and 3 for the tumour. This is
accomplished by sharing, say, the tumour parameter for the genotype AB
across the classes AA_AB, AB_AB, BB_AB. A simple way to handle
heterogeneity in the tumour might be to change this setup so that
there are 9 parameters for the tumour, one per joint genotype class.
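As a sketch of what that change would look like, the Python below
contrasts the shared layout with an expanded one. This is only an
illustration of the idea, not the software's data structures; the
names and all numeric values are hypothetical, and in practice the
values would come from training.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")
# Joint classes written as (normal genotype, tumour genotype).
JOINT_CLASSES = list(product(GENOTYPES, repeat=2))

# Current idea: 3 tumour parameters, shared across joint classes that
# have the same tumour genotype (placeholder values).
shared_tumour_mu = {"AA": 0.99, "AB": 0.5, "BB": 0.01}


def tumour_mu_shared(joint_class):
    """Look up the tumour parameter using only the tumour genotype."""
    _, tumour_gt = joint_class
    return shared_tumour_mu[tumour_gt]


# Proposed idea: 9 tumour parameters, one per joint class. Start from
# the shared values, then let each class take its own value.
expanded_tumour_mu = {jc: tumour_mu_shared(jc) for jc in JOINT_CLASSES}
# Hypothetical trained value: AA_AB shifts toward the reference allele
# because normal AA cells are mixed into the tumour sample.
expanded_tumour_mu[("AA", "AB")] = 0.65

n_shared = len({tumour_mu_shared(jc) for jc in JOINT_CLASSES})
n_expanded = len(expanded_tumour_mu)
print(n_shared, n_expanded)
```

The cost of the expanded layout is more parameters to fit, so classes
that are rarely observed would lean more heavily on their priors.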
Hopefully that helps.
Cheers,
Andy