cdr3_seqs issue in partis annotation output


Elizabeth Van Itallie

Feb 19, 2026, 8:39:18 PM
to partis
Hi, 
With the help of a collaborator, we identified that the CDR3 returned in "cdr3_seqs" is sometimes missing part of the actual CDR3. It looks like this happens when there are insertions, and definitely when the insertions are in the CDR3. Here is an example mouse heavy chain BCR:

input fasta seq: 
GAGGTTCACCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAACAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGATTATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTTTAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTATAATTTATTACTACGGTAGTAGCGGGGTGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA
aa sequence
EVHLQQSGAELVRPGASVKLSCTASGFNIKDDYMHWVKQRPEQGLEWIGWIDPENDYTEYASKFQGKATLTADTSSNTAYLQLSSLTSEDTAVYYCIIYYYGSSGVDYWGQGTSVTVSS

This CDR3 should be 14*3 = 42 nt. However, partis returns:
"cdr3_length": 33,
"cdr3_seqs": ["TGTATAATTTATTACTACGGTATGGACTACTGG"]

The bracketed part of the true CDR3 is missing: TGTATAATTTATTACTACGGTA[GTAGCGGGG]TGGACTACTGG. If you look at the germline gapped sequences, it looks like an insertion in the J-encoded part of the CDR3 has been inferred, and that insertion has been left out of the CDR3 that is returned.
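To make the discrepancy concrete, the two CDR3 strings quoted above can be compared directly. This is only a minimal sketch: `locate_insertion` is a hypothetical helper (not part of partis), and it assumes the two strings differ by a single contiguous insertion.

```python
def locate_insertion(full, reduced):
    """Return (offset, inserted) such that deleting `inserted` from `full`
    at `offset` yields `reduced`. Assumes one contiguous insertion."""
    if len(full) < len(reduced):
        raise ValueError("`full` must be at least as long as `reduced`")
    # Length of the shared prefix.
    prefix = 0
    while prefix < len(reduced) and full[prefix] == reduced[prefix]:
        prefix += 1
    # Length of the shared suffix, not overlapping the prefix.
    suffix = 0
    while suffix < len(reduced) - prefix and full[-1 - suffix] == reduced[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(reduced):
        raise ValueError("sequences differ by more than one contiguous insertion")
    return prefix, full[prefix:len(full) - suffix]


# CDR3 expected from the input sequence vs. the one in "cdr3_seqs".
expected_cdr3 = "TGTATAATTTATTACTACGGTAGTAGCGGGGTGGACTACTGG"
returned_cdr3 = "TGTATAATTTATTACTACGGTATGGACTACTGG"

offset, inserted = locate_insertion(expected_cdr3, returned_cdr3)
print(len(expected_cdr3), len(returned_cdr3), offset, inserted)
```

The length of the reported segment matches the run of gap characters in "gl_gap_seqs" in the output below.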

Below is the full output from running partis annotation on this single sequence as heavy chain with a black6-only mouse germline; it shouldn't be very different with the default mouse germline.

{"version-info": {"partis-git": {"commit": "99205b5da0e13d0743b4bef8cd8174ec113d8690", "n_ahead_of_tag": "1215", "tag": "0.16.0"}, "partis-yaml": 0.1}, "germline-info": {"seqs": {"j": {"IGHJ4*01": "ATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAG"}, "d": {"IGHD1-1*01": "TTTATTACTACGGTAGTAGCTAC"}, "v": {"IGHV14-4*01": "GAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGGTGATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTACA"}}, "tryp-positions": {"IGHJ4*01": 20}, "cyst-positions": {"IGHV1-69*03": 252, "IGHV1S103*01": 253, "IGHV1-74*02": 252, "IGHV1-74*03": 252, "IGHV1S122*01": 252, "IGHV7-4*03": 291, "IGHV2-6-8*01": 282, "IGHV1-62-3*02": 252, "IGHV1-64*02": 252, "IGHV1S113*01": 252, "IGHV1-18*02": 253, "IGHV1-18*03": 252, "IGHV1-55*03": 252, "IGHV1-55*02": 252, "IGHV1S113*02": 253, "IGHV1-55*04": 252, "IGHV1S111*01": 252, "IGHV1-71*01": 285, "IGHV14-4*01": 285, "IGHV1S118*01": 253, "IGHV5-6-1*01": 285, "IGHV1S120*02": 252, "IGHV12-2-1*01": 288, "IGHV1-62-1*01": 283, "IGHV1S121*01": 252, "IGHV1S20*01": 242, "IGHV13-1*02": 291, "IGHV1S100*01": 253, "IGHV1S108*01": 253, "IGHV1S112*02": 253, "IGHV1-72*05": 252, "IGHV1-72*02": 252, "IGHV1-72*03": 252, "IGHV5-9-5*01": 285, "IGHV1-42*02": 253, "IGHV1S120*01": 252, "IGHV1-53*04": 252, "IGHV1-53*03": 252, "IGHV1-53*02": 252, "IGHV1S107*01": 253, "IGHV1S21*02": 222, "IGHV1S21*01": 230}, "functionalities": {}, "locus": "igh"}, "events": [{"input_seqs": ["GAGGTTCACCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAACAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGATTATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTTTAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTATAATTTATTACTACGGTAGTAGCGGGGTGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAN"], "d_5p_del": 0, "mut_freqs": 
[0.020114942528735632], "duplicates": [[]], "vd_insertion": "TAA", "has_shm_indels": [true], "stops": [false], "d_3p_del": 20, "j_gene": "IGHJ4*01", "v_5p_del": 0, "codon_positions": {"j": 315, "v": 285}, "naive_seq": "GAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGGTGATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTATAATTTATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAG", "cdr3_length": 33, "dj_insertion": "", "j_5p_del": 0, "invalid": false, "cdr3_seqs": ["TGTATAATTTATTACTACGGTATGGACTACTGG"], "qr_gap_seqs": ["GAGGTTCACCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAACAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGATTATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTTTAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTATAATTTATTACTACGGTAGTAGCGGGGTGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAN"], "in_frames": [true], "n_mutations": [7], "fv_insertion": "", "mutated_invariants": [false], "j_3p_del": 0, "v_gene": "IGHV14-4*01", "indel_reversed_seqs": ["GAGGTTCACCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAACAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGATTATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTTTAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTATAATTTATTACTACGGTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAN"], "unique_ids": ["RBS3_d12_HAmi_P22B03"], "v_3p_del": 5, "d_per_gene_support": {"IGHD1-1*01": 1.0}, "gl_gap_seqs": 
["GAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGGTGATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTATAATTTATTACTATGCTA.........TGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAG"], "v_per_gene_support": {"IGHV14-4*01": 1.0}, "lengths": {"j": 54, "d": 3, "v": 289}, "j_per_gene_support": {"IGHJ4*01": 1.0}, "jf_insertion": "", "d_gene": "IGHD1-1*01", "regional_bounds": {"j": [295, 349], "d": [292, 295], "v": [0, 289]}}], "partitions": [{"n_procs": 1, "logprob": 0.0, "n_clusters": 1, "partition": [["RBS3_d12_HAmi_P22B03"]]}]}


Duncan Ralph

Feb 20, 2026, 9:50:27 AM
to Elizabeth Van Itallie, partis
Yes, it looks like it thinks there's an SHM indel in the J, and consequently returns the CDR3 length including this indel. This is what the annotation you included looks like with 'view-output':

[attached image: p.jpg]
You can tell it not to look for SHM indels by setting --no-indels (which just cranks the gap open penalty way up). I don't think there's an easy way to tell it to look for indels only outside of the CDR3, if that's what you're wanting. If you're wanting it to report the pre-indel CDR3 length, I'm not sure there's an easy way to do that, either. I think you'd have to just add/subtract the indel length afterwards.
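One way to do that after-the-fact correction is to read the CDR3 back out of the gapped sequences. The following is a rough sketch, not partis code: `cdr3_with_insertions` is a hypothetical helper, and it assumes that "codon_positions" index the indel-reversed sequence (consistent with 315 + 3 - 285 = 33 in the output above) and that gaps in "gl_gap_seqs"/"qr_gap_seqs" are written as "." (or "-"). The example input is a trimmed excerpt of the gapped sequences starting at the conserved cysteine, so the codon positions are shifted to 0 and 30.

```python
def cdr3_with_insertions(qr_gap, gl_gap, cys_pos, trp_pos, gap_chars=".-"):
    """Return the CDR3 as it appears in the input sequence, insertions included.

    cys_pos / trp_pos are 0-based codon start positions in the
    indel-reversed sequence (assumed meaning of "codon_positions").
    """
    if len(qr_gap) != len(gl_gap):
        raise ValueError("gapped sequences must have equal length")
    start = end = None
    reversed_index = 0  # position in the indel-reversed sequence
    for i, gl_char in enumerate(gl_gap):
        if gl_char in gap_chars:
            continue  # inserted base: in the query, absent after reversal
        if reversed_index == cys_pos:
            start = i
        if reversed_index == trp_pos + 2:
            end = i + 1
        reversed_index += 1
    if start is None or end is None:
        raise ValueError("codon positions fall outside the sequence")
    # Drop query-side gaps (deletions) so only real input bases remain.
    return "".join(c for c in qr_gap[start:end] if c not in gap_chars)


# Excerpt of the gapped sequences above, starting at the cysteine codon.
qr_gap = "TGTATAATTTATTACTACGGTAGTAGCGGGGTGGACTACTGGGGTCAAGG"
gl_gap = "TGTATAATTTATTACTATGCTA.........TGGACTACTGGGGTCAAGG"

cdr3 = cdr3_with_insertions(qr_gap, gl_gap, cys_pos=0, trp_pos=30)
print(len(cdr3), cdr3)
```

With the full annotation, the same call would take the first entries of "qr_gap_seqs" and "gl_gap_seqs" and the "v" and "j" values from "codon_positions"; treat that mapping as an assumption to verify against your own output.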
