HMM parameters (similar to what linearham is using)

Evan Li

Apr 9, 2024, 9:32:13 PM

to partis

Hi!

I would like to ask which file I should look at if I want to see the parameters (transition probabilities and emission probabilities) of the HMM on the UCA. Any assistance would be greatly appreciated!

Thanks,

Evan

Duncan Ralph

Apr 10, 2024, 12:41:46 PM

to acad...@yongkangl.com, partis

It's been a really long time since I thought about this, so I could be wrong, but I don't think there's an easy way to get what you're after. Although, I'm also not entirely certain that I'm thinking of the right thing as the HMM parameters "on" the UCA.

In general, there are a number of things you might expect to get out of our HMM but can't, because of the HMM "factorization" (see paper): there are separate V, D, and J HMMs. This is necessary for speed, but it means you can't get probabilities of full-sequence paths, only of paths through each region.

Anyway, this is what you can get. The parameter dir will always have HMM model files, for instance here, that have all the emission and transition probabilities, but I don't think this is what you're after, since they won't tell you anything about the UCA specifically. You can also turn the debug level up to 2 (--debug 2), which will show you most of the numbers coming out of the HMM, for instance running:

./bin/partis annotate --infname test/ref-results/test/simu.yaml --parameter-dir test/ref-results/test/parameters/simu --n-max-queries 1 --debug 2 | less -RS

will give something like:

where each block (296 21, 296 20, etc) is for a particular k_v, k_d pair (see paper), and then it shows all the probs for the seqs in each V/D/J region.

Finally, the alternative annotations described in the annotation uncertainty section of the manual may be useful.

--
You received this message because you are subscribed to the Google Groups "partis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to partis+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/partis/6057e802-d536-4210-818e-3c8b4bbe78b3n%40googlegroups.com.

Evan Li

Apr 10, 2024, 1:46:27 PM

to partis

Hi,

Thank you so much for your prompt response!

So the output in the parameter dir is about each sequence from the input file. Is this correct?

I thought partis also outputs the specific HMM parameters for the UCA (like what linearham is utilizing). If I want to acquire this information, should I look at linearham? Thank you!

Best,

Evan

Duncan Ralph

Apr 10, 2024, 4:44:51 PM

to Evan Li, partis

The files in the parameter dir are "averages", i.e. they include info from all sequences in the input file. For instance, if there are 10 input sequences with a particular V gene, the emission probabilities in that gene's HMM file will, more or less, be averages over the times each position was mutated in those 10 input sequences.
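
To make the "averages" idea concrete, here is a minimal Python sketch. It is not partis code: the function names are hypothetical, the sequences are made up, and splitting the mutated probability equally among the three non-germline bases is a simplifying assumption rather than a statement about what partis does internally.

```python
def mutation_freqs(germline, observed_seqs):
    """Fraction of observed sequences that differ from the germline at each position.

    Assumes every observed sequence is already aligned to the germline
    (same length, no indels).
    """
    if not observed_seqs:
        raise ValueError("need at least one observed sequence")
    counts = [0] * len(germline)
    for seq in observed_seqs:
        if len(seq) != len(germline):
            raise ValueError("sequences must be aligned to the germline")
        for i, (g, s) in enumerate(zip(germline, seq)):
            if s != g:
                counts[i] += 1
    return [c / len(observed_seqs) for c in counts]


def emission_table(germline, freqs, alphabet="ACGT"):
    """Per-position emission probabilities built from mutation frequencies.

    ASSUMPTION (for illustration only): the germline base gets 1 - f and the
    remaining bases share f equally.
    """
    table = []
    for g, f in zip(germline, freqs):
        other = f / (len(alphabet) - 1)
        table.append({b: (1.0 - f) if b == g else other for b in alphabet})
    return table


# Made-up toy data: one "germline" and four "observed" sequences.
germline = "ACGT"
observed = ["ACGT", "ACGA", "TCGA", "ACGT"]
freqs = mutation_freqs(germline, observed)
print(freqs)
print(emission_table(germline, freqs)[0])
```

The point is only that each number in such a table summarizes all input sequences assigned to a gene, so it says nothing about any single sequence or family.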

Ah, I guess it depends on what we mean by "HMM parameters". Linearham doesn't use anything like the individual transition/emission probabilities -- it uses the parameters defining a rearrangement event/naive sequence (v/d/j genes, insertions, deletions, etc). If you want to see exactly what linearham gets, you can just run the 'get-linearham-info' action:

./bin/partis --help

<snip>

    get-linearham-info  Write input file for linearham (to --linearham-info-fname), using a previous partis output (--outfname) file as input.

Evan Li

Apr 24, 2024, 10:20:25 PM

to partis

Hi,

Thank you so much for your reply! And sorry for the delay.

It seems tricky for me to manipulate these "HMM parameters" on my own, especially since we haven't settled on a clear definition of them yet.

Maybe it would help to clarify my problem. In the linearham paper, there is a sentence that "the computation of the emission probabilities is of interest because the hidden state probabilities in p(Y_naive) can be easily inferred using partis, which we now call \hat{p}(Y_naive)". I wonder how to infer this quantity with partis. Say, given a naive sequence, how can I get this probability? Is there a dedicated subcommand to do this?

Thank you for your patience! Any assistance would be greatly appreciated!

Best,

Evan

Duncan Ralph

Apr 25, 2024, 4:19:31 PM

to Evan Li, partis

Ah, ok that makes more sense. That is referring to the partis rearrangement/naive sequence viterbi probability. I don't think that's written to any output files at the moment, but you can get it a few ways. The easiest is probably to run annotate or partition with --debug 1, in which case the debug level is passed through to bcrham, which prints the viterbi log prob to stdout. The output will look like these:

where the first is running a bunch of single-sequence annotations, and the second is a multi-sequence annotation. The viterbi prob is the second column, and the sequence ids are the last column. In order for the multi-sequence one to be what you want, I think you'll also need to turn off subcluster annotation (see paired paper), for instance like so:

./bin/partis partition --infname test/new-results/partition-new-simu.yaml --parameter-dir test/new-results/test/parameters/simu --n-max-queries 10 --n-procs 1 --debug 1 --subcluster-annotation-size 1000

You can also pull the numbers from the bcrham output csv if you comment out the line that removes it (this will then cause partis to crash on exit when it tries to rm a non-empty dir, but that's fine).

So unfortunately I think you will have to either grep through some debug output, or modify the code a bit.
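
For the "grep through some debug output" route, a rough Python sketch follows. It relies only on the two facts stated above (the log prob is the second whitespace-separated column, the sequence ids are the last column); everything else is an assumption: the function name is hypothetical, the exact bcrham line layout is not known here, and the helper may pick up unrelated lines that happen to have a number in the second column, so check its output against the raw debug text.

```python
import re

# Matches ANSI color codes, which may be present in the debug output.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def extract_logprobs(lines):
    """Return (sequence_ids, log_prob) pairs from debug output lines.

    ASSUMPTION: a line of interest has at least three whitespace-separated
    columns, a float in the second column, and the sequence ids in the last.
    Lines that do not fit this shape are skipped.
    """
    results = []
    for line in lines:
        fields = ANSI_RE.sub("", line).split()
        if len(fields) < 3:
            continue
        try:
            logprob = float(fields[1])
        except ValueError:
            continue  # second column is not a number, so not a line we want
        results.append((fields[-1], logprob))
    return results
```

One way to use it would be to redirect the partis output to a file (say `debug.log`, a name chosen here for illustration) and then call `extract_logprobs(open("debug.log"))`.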
