Re-sending to the list, since I apparently forgot to hit reply-all when I originally replied to this.
Hi,
I'm not sure if I've understood correctly, so let me know if not, but I
think the problem is that half the sequences align most closely to *01,
and half are closest to *06. You're then (quite reasonably) expecting
that both of those alleles would be in the subject's germline.
So the trouble is that sequences aligning most closely to a given
allele in the imgt set is actually not a very good indicator that the
allele is in fact present in the subject. This is what I'm trying to
communicate with Table 1 and Figure 4 here -- just aligning each
sequence to the closest match in imgt results in a very large number of
spurious alleles, because SHM can make a sequence closer to a spurious
allele than to its true germline allele.
Now, that said, in your specific case the second allele could be
present, or it could not -- that's essentially the entire question that
partis germline inference is attempting to answer. One way you might
answer it by hand is to look at position 213, where those two alleles
differ: if that position is always mutated, even in otherwise entirely
unmutated sequences, that'd be an indication that the allele is truly
there. This is basically the information that partis is using to do the
fit.
As far as I understand it, tigger on the other hand only does the
new-allele fitting for potential non-imgt alleles (whereas partis also
uses the new-allele fitting to decide which imgt alleles are in/not in
the sample). Tigger decides whether imgt alleles are in the sample by
simply keeping the ones that have a prevalence higher than some
threshold, I think 1/8, without considering the likelihood that SHM is
screwing things up. And SHM can definitely cause two alleles that
differ at only one position to appear to have similar prevalences.
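
For comparison, a bare prevalence cut of the kind I'm describing could
be sketched like this. To be clear, this is not tigger's code or API,
just my understanding of the idea; the function name is made up, the
1/8 default is the value I'm guessing at above, and the allele calls
are invented:

```python
from collections import Counter


def alleles_above_threshold(best_matches, threshold=1 / 8):
    """Keep alleles whose share of the best-match calls exceeds `threshold`.

    `best_matches` is a list with one closest-allele name per sequence.
    Returns a dict of allele name -> prevalence for the alleles kept.
    """
    counts = Counter(best_matches)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        allele: n / total
        for allele, n in counts.items()
        if n / total > threshold
    }


# Hypothetical closest-match calls for 16 sequences from one gene.
calls = ["toy*01"] * 9 + ["toy*06"] * 6 + ["toy*03"] * 1

kept = alleles_above_threshold(calls)
print(kept)  # {'toy*01': 0.5625, 'toy*06': 0.375}
```

Note that nothing here asks whether the toy*06 calls are really toy*01
sequences that SHM pushed across the one differing position, which is
the weakness I mean.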
OK, that was probably a little too detailed. In practical terms, you
can get a better idea of what's going on by setting
--debug-allele-finding, which'll print some info as partis removes
uncertain alleles and then re-infers new alleles. You can also specify
--plotdir <path>, so it'll write the actual allele finding fits (look
in the subdirectory sw/allele-finding/try-0.html with a browser), as in
figures 13 and 14 in that ^ paper.