Total (forward) probability of a receptor sequence

14 views

Skip to first unread message

Jamie Oaks

unread,

Jan 26, 2023, 1:54:50 PM1/26/23

to partis

Hello,

Is it possible to get the HMM's forward probability of a single receptor sequence from the output of partis? I'm particularly interested in Tcell receptor sequences, so no need to worry about clonal families.

I see the `partition` command reports the `logprob` of partitions, but I'm not sure how that relates to the probability of a specific sequence. Is this the (log) forward probability from the HMM for the inferred ancestral sequence of the partition? If so, can I get this for each sequence of a TCR repertoire (i.e., no clustering into clonal lineages necessary)? Or, would it be simpler to use partis to generate the HMM yaml and then run `ham` directly on each TCR sequence?

Thanks!

Jamie Oaks

Duncan Ralph

unread,

Jan 26, 2023, 4:31:51 PM1/26/23

to Jamie Oaks, partis

I don't think there's a way to do it just with options, since it only runs fwd if it's partitioning, and the first step of partitioning is to collapse all seqs with identical inferred naive seqs. But it's easy to turn off that collapse with one line of code. You want to insert this:

    partition = [[q] for q in queries] # group_seqs_by_value(queries, keyfunc) 
    print partition

in utils.collapse_naive_seqs() then run this:

./bin/partis partition --infname <file>  --debug 1 --n-procs 1 |tee log|less -RS

For instance running on a test file this:

./bin/partis partition --infname test/ref-results/test/simu.yaml --n-max-queries 3 --debug 1 --n-procs 1|less -RS

will give output from which i've screenshotted the attachments:

fwd.png is what you want: the first line here is printing the singleton partition to verify that that worked, then you want each line whose first whitespace char is 'fwd', for instance:

           fwd      -91.951       [294-296)     [15-34)   1v  1d  1j   0.07s      1  5047400969516159420

which has logprob, k_v range, k_d range (both are the integral/summation bounds it used, see paper), number of v/d/j genes it integrated over, time required, number of seqs, and finally colon-separated list of UIDs. So something like "grep '[ ][ ]*fwd' log|grep -v :" should give the single-seq fwd probs. For entertainment, i've also attached a screenshot of the bit where it caches each naive seq (i.e. runs viterbi on each single seq) -- this is much easier to get, just add --debug 1, although of course it could differ since it's only over one path. Also note that the --n-procs 1 is important -- you have to force it to run logprobs on everybody the first time through, otherwise the hack won't do anything. If you have a ton of seqs you could split them manually into subsets and run separately, sorry.

--
You received this message because you are subscribed to the Google Groups "partis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to partis+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/partis/38c44815-bcff-4a62-841e-f9eda6b62b92n%40googlegroups.com.

vtb.png

fwd.png

Jamie Oaks

unread,

Jan 27, 2023, 9:42:00 AM1/27/23

to Duncan Ralph, partis

No apology necessary. Your detailed (and fast) response is very helpful.

Much appreciated, thanks!

Jamie

Reply all

Reply to author

Forward

0 new messages