Conditional Probability of CDR3 region given V-gene and J-gene

Yongkang Li

Jul 24, 2024, 2:41:27 PM

to partis

Hello,

Hope you are doing well!

I would like to ask whether there is any way to calculate the conditional probability of the CDR3 region given the V gene and J gene. For example, if I know the V gene, the J gene, the length of the CDR3 region, and the lengths of the V 3' and J 5' deletions, would it be possible to calculate the conditional probability of the CDR3 region directly with partis? If not, could you kindly point me to the code that calculates the likelihood for the CDR3 region? Thank you so much!

Evan

Duncan Ralph

Jul 28, 2024, 10:54:23 AM

to Yongkang Li, partis

Sorry for the delay, I've been travelling. It's been a very long time since I've been in the guts of the partis HMM, so I could be wrong, but I don't think there's any reasonably easy way to calculate the CDR3 region probability. This is mostly because partis calculates probabilities on the V, D, and J regions separately (with each insertion attached to a neighboring region), rather than on CDR/FWK regions. The CDR3 probability would therefore require bits of all three pieces, which I don't think is possible to get. If D + insertion would be good enough for you (I believe the VD insertion is included with D, and the DJ insertion with J, but I'm not sure), you might be able to get it from the existing probabilities. You'd probably want to start by looking through these options for alternative annotations and naive probability estimates, if you haven't already:

https://github.com/psathyrella/partis/blob/dev/docs/subcommands.md#annotation-uncertainties-alternative-annotations

https://github.com/psathyrella/partis/blob/dev/docs/subcommands.md#naive-rearrangement-probability-estimates

In particular the second one mentions this script, which sums probabilities in a way that I think is what you'd want to do:

https://github.com/psathyrella/partis/blob/dev/bin/get-naive-probabilities.py
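As a rough illustration of the summing idea (this is not the logic of get-naive-probabilities.py itself, just a minimal sketch of the general technique), a conditional probability can be obtained by summing joint probabilities over every rearrangement event consistent with the condition and dividing by that marginal. The `Event` fields, the `conditional_cdr3_prob` helper, the CDR3 labels, and all of the numbers below are made up for the example.

```python
from collections import namedtuple

# Hypothetical rearrangement events with made-up joint probabilities.
# Nothing here is read from partis output.
Event = namedtuple("Event", ["v_gene", "j_gene", "cdr3", "prob"])

events = [
    Event("IGHV1-46*01", "IGHJ4*02", "cdr3_a", 0.02),
    Event("IGHV1-46*01", "IGHJ4*02", "cdr3_b", 0.06),
    Event("IGHV1-46*01", "IGHJ6*02", "cdr3_a", 0.01),
    Event("IGHV3-23*01", "IGHJ4*02", "cdr3_a", 0.05),
]


def conditional_cdr3_prob(events, v_gene, j_gene, cdr3):
    """Return P(cdr3 | v_gene, j_gene) by summing joint probabilities."""
    # Keep only the events that satisfy the condition (same V and J gene).
    matching = [e for e in events if e.v_gene == v_gene and e.j_gene == j_gene]
    marginal = sum(e.prob for e in matching)
    if marginal == 0:
        raise ValueError("no events with nonzero probability match the condition")
    # Sum the joint probability of the events that also have the requested CDR3.
    joint = sum(e.prob for e in matching if e.cdr3 == cdr3)
    return joint / marginal


p = conditional_cdr3_prob(events, "IGHV1-46*01", "IGHJ4*02", "cdr3_a")
print(round(p, 3))
```

In practice the event list would have to come from whatever probabilities partis actually reports, and further conditions (deletion lengths, CDR3 length) would be added as extra filters on `matching`.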

I would also recommend running on one or two sequences with --debug 2 turned on (piping to less -RS), which prints every probability as it is calculated, for each k_v/k_d pair, etc. (see screenshot for example output).

--
You received this message because you are subscribed to the Google Groups "partis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to partis+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/partis/bdcda762-9a18-470d-be7b-8ed6541a9428n%40googlegroups.com.

p.jpg

Yongkang Li

Jul 30, 2024, 10:58:15 AM

to partis

Hello,

Thank you so much for your response!

I think D + insertion might be a good enough approximation for me. I am just a little confused about how to get it from the existing probabilities. I tried calculating alternative annotations and using the get-naive-probabilities.py script, but it looks like I don't get the D + insertion probability. Note that my input is a single sequence, and I want the D + insertion probability of that one sequence as a UCA. Therefore, I don't think I need the annotation step?

Would it be possible for you to kindly tell me how to skip the annotation step and directly calculate the probabilities of a sequence as a UCA? For example, if my sequence is CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGACGTGGGAACGGAGGGGAGTTTACTCCACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,

the V gene is IGHV1-46*01, the D gene is IGHD3-16*03, and the J gene is IGHJ4*02. How do I get the D + insertion probability of this sequence as a UCA?

Your patience and assistance would be greatly appreciated!

Thanks,

Yongkang

Duncan Ralph

Jul 31, 2024, 10:26:08 AM

to Yongkang Li, partis

Yes, I think the D+insertion probability is only in the --debug 2 output, not written to file.

The probabilities could also be calculated during partitioning, but annotation seems simpler, so I'm not sure which step you're thinking of running instead of annotation. Once you have parameters cached, running annotation with --queries `your id` seems easiest (assuming you're caching parameters on more than one sequence; otherwise --queries of course isn't necessary).
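A minimal sketch of how such an annotation command line might be assembled, for anyone scripting this. The --debug and --queries options are the ones mentioned in this thread; --infname and --parameter-dir are assumed to be the usual partis input and parameter options, so please check them against `partis annotate --help`. The file names, the query id, and the `build_annotate_command` helper are hypothetical. The code only builds and prints the command; it does not run partis.

```python
import shlex


def build_annotate_command(infname, parameter_dir, query_id=None, debug=2):
    """Build (but do not run) a partis annotate command as an argument list."""
    cmd = [
        "partis", "annotate",
        "--infname", infname,              # assumed input-file option
        "--parameter-dir", parameter_dir,  # assumed cached-parameter option
        "--debug", str(debug),             # debug level 2, as suggested in the thread
    ]
    # --queries is only needed when the input holds more than one sequence.
    if query_id is not None:
        cmd += ["--queries", query_id]
    return cmd


# Hypothetical file names and query id, for illustration only.
cmd = build_annotate_command("my-seqs.fa", "_output/my-params", query_id="my-seq-id")

# Pipe through less -RS when running by hand so the long debug lines stay readable.
print(shlex.join(cmd) + " | less -RS")
```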

I don't think there's an easy way to tell it that the sequence you're passing in is the naive sequence for its annotation (assuming this is what you mean by "as UCA"). You could try setting the emission probabilities for non-germline bases to zero in the yaml model files, but that'd require some coding.
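To make the emission-probability idea concrete, here is a toy sketch of forcing each germline state to emit only its germline base. The dictionary layout (`name`, `germline`, `emission_probs`) and the `make_germline_only` helper are invented for illustration and are not the real schema of the partis yaml model files, which would need to be inspected before writing anything like this against them.

```python
import copy

NUKES = "ACGT"


def make_germline_only(states):
    """Return a copy of the states in which every state that has a germline
    base puts all of its emission probability on that base."""
    new_states = copy.deepcopy(states)  # leave the caller's model untouched
    for state in new_states:
        germline = state.get("germline")
        if germline is None:
            # States without a germline base (e.g. insertions) are left as-is.
            continue
        state["emission_probs"] = {
            nuke: (1.0 if nuke == germline else 0.0) for nuke in NUKES
        }
    return new_states


# Toy model with made-up numbers; not taken from any real partis model file.
states = [
    {"name": "toy_gene_0", "germline": "A",
     "emission_probs": {"A": 0.94, "C": 0.02, "G": 0.02, "T": 0.02}},
    {"name": "toy_gene_1", "germline": "G",
     "emission_probs": {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05}},
    {"name": "toy_insertion", "germline": None,
     "emission_probs": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}},
]

naive_states = make_germline_only(states)
for state in naive_states:
    print(state["name"], state["emission_probs"])
```

With emissions like these, any sequence containing a non-germline base at a germline position would get probability zero, which is one way of expressing "this sequence is the naive sequence".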
