Differences in assigning

GrubaK

May 16, 2018, 12:50:54 PM

to partis

Hi,

I am comparing the output of partis and TIgGER. My data contain around 37,000 contigs. partis assigned 3239 of them to allele IGHV1-69*06, whereas TIgGER assigned 1425 to IGHV1-69*01 and 1741 to IGHV1-69*06.

I would like to point out that the two reference sequences differ at only one position.
To get a more detailed view, I built a simple multiple sequence alignment (MSA). In it, 1415 contigs match allele IGHV1-69*01 (they have G at position 220) and 1697 contigs match IGHV1-69*06 (they have A at position 220).
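A tally like the one above can be sketched in a few lines of Python. This is only an illustration: the helper name `base_counts_at` and the toy sequences are hypothetical, and it assumes the contigs are already aligned so that the column of interest lines up across all of them.

```python
from collections import Counter

def base_counts_at(aligned_seqs, position):
    """Count the bases observed at a 1-based alignment column.

    Sequences too short to cover the column are skipped.
    """
    return Counter(
        seq[position - 1].upper()
        for seq in aligned_seqs
        if len(seq) >= position
    )

# Toy alignment (hypothetical data): column 5 is the one that differs.
toy_alignment = ["ACGTG", "ACGTA", "ACGTA", "acgtg", "ACG"]
counts = base_counts_at(toy_alignment, 5)
print(counts)
```

With real data the column would be 220, and the input would be the aligned contigs read from the MSA file.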

This is very confusing to me. My question is: why does partis assign all of those contigs to the same allele? Is this some kind of specific feature?

I would also like to know whether I can change this by adjusting the input parameters.

Thanks,

Kasia

Duncan Ralph

Jul 11, 2018, 1:12:36 PM

to GrubaK, partis

re-sending to the list email, since I apparently missed hitting reply-all when I originally replied to this.

original response:

Hi,

I'm not sure if I've understood correctly, so let me know if not, but I think the problem is that half the sequences align most closely to *01, and half are closest to *06. You're then (quite reasonably) expecting that both of those alleles would be in the subject's germline.

So the trouble is that sequences aligning most closely to a given allele in the imgt set is actually not a very good indicator that that allele is in fact present in the subject. This is what I'm trying to communicate with Table 1 and Figure 4 here -- just aligning each sequence to the closest match in imgt results in a very large number of spurious alleles, because somatic hypermutation (SHM) makes the sequences closer to a spurious allele than to their true germline allele.

Now, that said, in your specific case the second allele could be present, or it could not -- that's essentially the entire question that partis germline inference is attempting to answer. One way you might answer it by hand is to look at the position where those two alleles differ, position 213: if that position is always mutated, even in otherwise entirely unmutated sequences, that'd be an indication that the allele is truly there. This is basically the information that partis is using to do the fit.
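The by-hand check described above can be sketched roughly as follows. This is not the partis fit itself, only a toy version of the idea: group sequences by how many mutations they carry elsewhere, and look at how often the position of interest differs from the reference in each group. The function name, the toy reference, and the toy sequences are all hypothetical, and the sketch assumes gap-free sequences of the same length as the reference.

```python
from collections import defaultdict

def snp_fraction_by_mutation_count(ref, seqs, pos):
    """Fraction of sequences differing from ref at 0-based pos,
    keyed by the number of mismatches at all other positions."""
    # n_other_mutations -> [n_mutated_at_pos, n_total]
    tallies = defaultdict(lambda: [0, 0])
    for seq in seqs:
        if len(seq) != len(ref):
            continue  # sketch assumes equal-length, gap-free alignment
        n_other = sum(
            1 for i, (r, s) in enumerate(zip(ref, seq))
            if i != pos and r != s
        )
        tallies[n_other][0] += int(seq[pos] != ref[pos])
        tallies[n_other][1] += 1
    return {n: mut / tot for n, (mut, tot) in sorted(tallies.items())}

# Toy data (hypothetical): reference allele and five reads, checking index 3.
ref = "ACGTACGT"
reads = ["ACGTACGT", "ACGAACGT", "ACGAACGT", "ACGAACGA", "TCGTACGT"]
fractions = snp_fraction_by_mutation_count(ref, reads, 3)
print(fractions)
```

If the second allele is really present, the fraction in the zero-mutation group should be substantial. If the differences come only from SHM, it should be near zero there and grow with the mutation count.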

As far as I understand it, tigger on the other hand only does the new-allele fitting for potential non-imgt alleles (whereas partis also uses the new-allele fitting to decide which imgt alleles are or are not in the sample). Tigger decides whether imgt alleles are in the sample by simply keeping the ones that have a prevalence higher than some threshold, I think 1/8, without considering the likelihood that SHM is screwing things up. And SHM can definitely cause two alleles that differ by only one position to appear to have similar prevalences.
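To make the prevalence-threshold idea concrete, here is a minimal sketch. It is not tigger's actual implementation: the function name is hypothetical, the 1/8 value is the guess from the paragraph above, and it assumes prevalence is measured as the fraction of best-match calls among the alleles of one gene. The counts are the ones reported in the original question.

```python
from collections import Counter

def alleles_above_threshold(best_match_calls, threshold=1 / 8):
    """Keep alleles whose share of best-match calls exceeds threshold.

    Returns a dict of allele -> prevalence for the alleles that are kept.
    """
    counts = Counter(best_match_calls)
    total = sum(counts.values())
    return {
        allele: n / total
        for allele, n in counts.items()
        if n / total > threshold
    }

# Best-match calls as reported in the question (1425 vs 1741 contigs).
calls = ["IGHV1-69*01"] * 1425 + ["IGHV1-69*06"] * 1741
kept = alleles_above_threshold(calls)
print(kept)
```

Both alleles pass easily, at roughly 45% and 55%, which is why a pure prevalence cut keeps both regardless of whether the split comes from a real second allele or from SHM at that one position.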

OK that was probably a little too detailed. In practical terms, you can get a better idea of what's going on by setting --debug-allele-finding, which'll print some info as partis removes uncertain alleles and then re-infers new alleles. You can also specify a --plotdir <path>, so it'll write the actual allele finding fits (open sw/allele-finding/try-0.html under that directory in a browser), as in figures 13 and 14 in that ^ paper.
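A sketch of how such a run might be put together from Python is below. Only --debug-allele-finding and --plotdir come from this thread; the `cache-parameters` action, the `--infname` and `--parameter-dir` options, the `./bin/partis` path, and all file names are assumptions here, so check them against `partis --help` for your installed version. The snippet only builds and prints the command, it does not run it.

```python
import shlex

def partis_allele_debug_cmd(infname, parameter_dir, plotdir,
                            partis="./bin/partis"):
    """Build an argument list for a partis run with allele-finding
    debug output and plots. Action and path options are assumptions."""
    return [
        partis, "cache-parameters",        # assumed action name
        "--infname", infname,              # assumed input option
        "--parameter-dir", parameter_dir,  # assumed output option
        "--debug-allele-finding",          # from the thread
        "--plotdir", plotdir,              # from the thread
    ]

# Hypothetical file names, for illustration only.
cmd = partis_allele_debug_cmd("contigs.fa", "_output/params", "_output/plots")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the options have been confirmed.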
