Question about probabilistic mapping

18 views
Skip to first unread message

Keren Halabi

unread,
May 13, 2019, 10:51:00 AM5/13/19
to Bio++ Usage Help Forum
Hi dear team,

I would like to use the simulation free mapping algorithm by Minin and Suchard (2008), which is, to my knowledge, implemented in Bio++.
For this purpose, I would like to:
(1) Estimate analytically the mean number of substitttution for each combination of site and branch.
(2) Receive the expected dwelling time under each state in my model, for each combination of site and branch.

If I understand correctly, the class I should be using is ProbabilisticSubstitutionMapping. I see here that the default usage returns the expected number of substitution for each combination of site and branch (sounds like what I need in 1), while the extended usage returns the dwelling time per substitution type, and not state (I can derive from it what I need in 2):
"The probabilistic mapping can however be extended to contain a matrix will all types of substitutions, instead of their total number.".

I would be grateful for the following clarifications: 
(1) Am I indeed looking at the right class? the paper by Minin and Suchard is specified in ProbabiliticRewardMapping, which returns a single reward per branch - and not one for each substitution type or state.
(2) How I should be using it. Specifically, the constructor requires as input an instance of type "SubstitutionCount". In the testers of bpp-phyl, I see examples in which this instance is created in either naive approach of with Laplace. I see in the paper by Minin and Suchard that there is also an option to analytically estimate the mean number of trait changes. I would like to create a mapping that corresponds to this estimate. Is it possible to create a fitting SubsitutionCount instance for this case? if yes, How can I do that?

Many thanks in advance!
Keren

Laurent Guéguen

unread,
May 15, 2019, 12:34:38 AM5/15/19
to Bio++ Usage Help Forum
Hi Keren,

yes you are looking at the right classes, for both questions.

The counts are computed for any category of substitutions, or any category of sets of substitutions,
as well as the rewards. I would be surprised that you have to develop something new.
In bio++, these categories are called "SubstitutionRegisters".

To have an idea of what is available, you can look at mapnh program (that does the job)
examples:



Cheers,
Laurent


Keren Halabi

unread,
May 15, 2019, 5:11:36 AM5/15/19
to Bio++ Usage Help Forum
Thank you, Laurent!

In the below discussion, I refer to a stochastic mapping as an instance that consist of the time-points and types of substitutions along each branch in the phylogeny.

I had a look at testnh. If I understand correctly, the output of testnh is a mapping of the expected number of substitutions for each combination of substitution type and branch in the phylogeny. However, it doesn't provide the order of the substitutions along each branch and their corresponding rewards (i.e., expected times of transitions along each branch). However, I understand that there is a possibility in BIo++ to compute the reward for each combination of substitution type and branch. So in the case that a specific substitution type along some branch B receives count X higher than 1, the reward must be "divided" between the X substitutions along B.

Consequently, unlike the simulation based stochastic mapping, which can be translated into an explicit mapping along the phylogeny, here I may need to apply additional heuristics to derive a mapping along the phylogeny (for example, to decide how to divide a reward between multiple substitutions). Is this indeed the case?

Many thanks!
Keren

Keren Halabi

unread,
May 15, 2019, 5:33:20 AM5/15/19
to Bio++ Usage Help Forum
Hi again,

To further clarify what I meant earlier, I see that SubstitutionMapping class offers two types of mappings: probabilistic mapping and exact mapping (storing the positions of each substitution onto each branch). I am interested in obtaining analytically an exact mapping. 

Thanks!
Keren

Laurent Guéguen

unread,
May 22, 2019, 6:02:49 AM5/22/19
to Bio++ Usage Help Forum
Hi Keren,

yes indeed both are available. If you look at the code of mapnh (which is more oriented to basic mapping than testnh that has further goals), you can see how
they are obtained (for example when option splitNorm=T), through functions defined in SubstitutionMappingTools.h.

Cheers,
Laurent



Keren Halabi

unread,
May 22, 2019, 10:35:35 AM5/22/19
to Bio++ Usage Help Forum
Thank you, Laurent!

I had a look at mapnh, and this indeed appears close to what I need. What I require corresponds to case 5 of the program (or 7 - which is identical in my case since I only have one site of the trait tip data). The only difference is the statistic I need. Because I want to derive the expected dwelling times of the trait states for each branch, I need the rewards instead of the transitions count. To derive the expected dwelling times for each combination of trait state and branch, I also need to be able to set the rewards per trait state, referred to as the omegas vector in the paper by Minin and Suchard (equations 1.4 and 2.8). As stated in the paper (below equation 1.4): "...To obtain dwelling times of a mapping in a predefined set of trait states, we set the relevant entry in the omegas vector to 1 if i belongs to the set of interest and to 0 otherwise.". More specifically, say I have a two-state Markov model, and I want to estimate the expected dwelling times under states 0 and 1. I can set omega(0) to 1 and omega(1) to 0 and then compute the expected reward per branch. The yielded values correspond to the expected dwelling times in state 0 per branch. The expected dwelling times in state 1 per branch could be easily derived as the values complementing the former ones to the respective branches lengths. 
I didn't find the definition of this  this "omegas vector" in the code, so I am not sure how to set its values. Could you please let me know how I can do this?

Thanks!
Keren

Keren Halabi

unread,
Jun 2, 2019, 6:53:16 AM6/2/19
to Bio++ Usage Help Forum
Dear Laurent,

Is there any update on this question?

Many thanks!
Keren

Laurent Guéguen

unread,
Jun 6, 2019, 12:51:17 AM6/6/19
to Bio++ Usage Help Forum
Dear Keren,

sorry, I missed your message.

From mapnh, you can get the rewards directly. This is encoded in the DecompositionReward class.
The omegas are defined as an AlphabetIndex1 object.

Cheers,
Laurent


Keren Halabi

unread,
Jun 6, 2019, 4:51:57 AM6/6/19
to Bio++ Usage Help Forum
Dear Laurent,

It's not a problem. 

Thank you for the clarification. If I understand correctly, the data member index_ in the AlphabetIndex1 instance (that is given to the constructor of DecompositionReward) is the vector of omegas I referred to earlier. As such, the data member bMatrices_ in index 0 holds the diagonal matrix of the omega values that is used in DecompositionMethods::computeProducts_ (called by RewardMappingTools::computeRewardVectors), according to equation 2.11 in the paper. In this case, If I want to compute the expected dwelling times under each state in each branch (i.e., node id), all I need to do (based on the content underneath equation 1.4 in the paper) is to set the indices of the states of the model in the AlphabetIndex1 instance to 1.
Please let me know your thoughts on this.

Many thanks!
Keren

Laurent Guéguen

unread,
Jun 7, 2019, 4:15:15 AM6/7/19
to Bio++ Usage Help Forum
Dear Keren,

if you set all the states of AlphabetIndex1 to 1, you will obtain the sum of the dwelling times of all the states,
which means the length of the branch.

In your case, you have to compute successively on all states of interest, the reward with all terms in the AlphabetIndex1 set to 0, excepted one term (state of interest) set to 1.

Cheers,
laurent

Keren Halabi

unread,
Jun 10, 2019, 3:52:06 AM6/10/19
to Bio++ Usage Help Forum
Thank you, Laurent!

Now everything is clear to me.

Cheers,
Keren
Reply all
Reply to author
Forward
0 new messages