Differences in assigning

GrubaK

May 16, 2018, 12:50:54 PM
to partis
Hi,

I am comparing the output of partis and TIgGER. My data contain around 37,000 contigs. partis assigned 3239 of them to allele IGHV1-69*06, whereas TIgGER assigned 1425 to IGHV1-69*01 and 1741 to IGHV1-69*06.
Note that the two reference sequences differ at only one position.
To get a more detailed view, I built a simple MSA. In the result, 1415 contigs match allele IGHV1-69*01 (they have G at position 220) and 1697 contigs match IGHV1-69*06 (they have A at position 220).

This is confusing to me. Why does partis assign all of those contigs to the same allele? Is this an intended feature?
Can I change this behavior by adjusting the input parameters?

Thanks,
Kasia

Duncan Ralph

Jul 11, 2018, 1:12:36 PM
to GrubaK, partis
re-sending to the list email, since I apparently missed hitting reply-all when I originally replied to this.

original response:

Hi,

I'm not sure if I've understood correctly, so let me know if not, but I think the problem is that half the sequences align most closely to *01, and half are closest to *06. You're then (quite reasonably) expecting that both of those alleles would be in the subject's germline.

The trouble is that a sequence aligning most closely to a given allele in the imgt set is actually not a very good indicator that the allele is in fact present in the subject. This is what I'm trying to communicate with Table 1 and Figure 4 here: just aligning each sequence to the closest match in imgt results in a very large number of spurious alleles, because SHM makes the sequences closer to a spurious allele than to their true germline allele.

Now, that said, in your specific case, the second allele may or may not be present; that's essentially the entire question that partis germline inference is attempting to answer. One way to answer it by hand is to look at the difference between those two alleles at position 213: if that position is always mutated, even in otherwise entirely unmutated sequences, that'd be an indication that the allele is truly there. This is basically the information that partis is using to do the fit.
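To make the by-hand check concrete, here is a minimal Python sketch. It is not what partis actually runs; it just groups sequences by how many mutations they carry elsewhere and tallies the base at the differing position. The function name, the toy sequences, and the 0-based `snp_pos` are all made up for illustration, and the sequences are assumed to be already aligned to the germline with equal length.

```python
from collections import Counter

def snp_tally_by_mutation_count(seqs, germline, snp_pos):
    """Group sequences by number of mismatches to `germline` outside `snp_pos`,
    and tally the base observed at `snp_pos` within each group.
    Hypothetical helper: assumes pre-aligned, equal-length sequences."""
    tallies = {}
    for seq in seqs:
        # count mismatches everywhere except the position that separates the two alleles
        n_other = sum(
            1
            for i, (s, g) in enumerate(zip(seq, germline))
            if i != snp_pos and s != g
        )
        tallies.setdefault(n_other, Counter())[seq[snp_pos]] += 1
    return tallies

# Toy data (invented): the germline stands in for the allele with A at the SNP.
germline = "ACGTACGTAC"
snp_pos = 4  # 0-based index, illustration only
seqs = [
    "ACGTGCGTAC",  # no other mutations, G at the SNP
    "ACGTGCGTAC",  # no other mutations, G at the SNP
    "ACGTACGTAC",  # no other mutations, A at the SNP
    "ACGTGCGTAT",  # one other mutation, G at the SNP
    "TCGTACGTAC",  # one other mutation, A at the SNP
]

tallies = snp_tally_by_mutation_count(seqs, germline, snp_pos)
for n_other in sorted(tallies):
    print(n_other, dict(tallies[n_other]))
```

If a large share of the sequences with zero other mutations carry the alternative base, that points toward a real second allele. If the alternative base shows up mostly in heavily mutated sequences, SHM is the more likely explanation.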

As far as I understand it, TIgGER on the other hand only does the new-allele fitting for potential non-imgt alleles (whereas partis uses the new-allele fitting also to decide which imgt alleles are in/not in the sample). TIgGER decides whether imgt alleles are in the sample by simply keeping the ones that have a prevalence higher than some threshold, I think 1/8, without considering the likelihood that SHM is screwing things up. And SHM can definitely cause two alleles that differ by only one position to appear to have similar prevalences.
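As a rough back-of-the-envelope illustration of that last point (a simplification, not TIgGER's actual code): suppose every sequence truly comes from one allele, and the single differing position is mutated in some fraction of sequences. The sketch below computes what share of sequences would then look closest to the other allele, and compares it to a prevalence cutoff. The function names and the mutation probabilities are assumptions chosen for illustration; the 1/8 default just mirrors the value guessed above.

```python
def apparent_prevalence(p_site_mutated, p_to_other_base):
    """Share of sequences from ONE true allele that would align best to a
    neighbouring allele differing at a single position.
    p_site_mutated:  assumed probability that SHM hits the differing position
    p_to_other_base: assumed probability that the mutation yields the other allele's base"""
    return p_site_mutated * p_to_other_base

def passes_prevalence_cutoff(fraction, cutoff=1 / 8):
    """Simplified keep/drop rule: keep an allele if its apparent prevalence
    reaches the cutoff (1/8 mirrors the value mentioned in the reply)."""
    return fraction >= cutoff

# Invented mutation probabilities for the differing position, e.g. a cold spot vs. a hot spot
results = {}
for p_site in (0.1, 0.5, 0.9):
    frac = apparent_prevalence(p_site, p_to_other_base=0.5)
    results[p_site] = (frac, passes_prevalence_cutoff(frac))
    print(p_site, frac, results[p_site][1])
```

Under these made-up numbers, a frequently mutated position is enough to push an absent allele over the cutoff, which is why a prevalence threshold alone cannot separate a real one-SNP allele from SHM.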

OK that was probably a little too detailed. In practical terms, you can get a better idea of what's going on by setting --debug-allele-finding, which'll print some info as partis removes uncertain alleles and then re-infers new alleles. You can also specify a --plotdir <path>, so it'll write the actual allele finding fits (open the subdirectory sw/allele-finding/try-0.html in a browser), as in figures 13 and 14 in that ^ paper.

