Dear all,
Thank you for providing the Canopy package workflow for analyzing tumor heterogeneity. I am currently working with it and have a question concerning the config.summary of the posterior tree evaluation step.
Within the provided code, the following explanations for the config.summary are given:
# first column: tree configuration
# second column: posterior configuration probability in the entire tree space
# third column: posterior configuration likelihood in the subtree space
I am wondering whether these explanations are actually correct. To me, it seems that the second column gives the proportion of configuration x among the configurations that remain after applying post.config.cutoff. When I change this threshold, the probabilities change with the number of tree configurations that pass it. This would mean that the probability refers to the subtree space defined by post.config.cutoff and not to the entire tree space, wouldn't it?
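To make this concrete, here is a toy sketch of the behaviour I think I am seeing. It is written in Python with made-up configuration labels and is not Canopy's actual code; `cutoff` merely plays the role of post.config.cutoff, and all variable names are my own.

```python
from collections import Counter

# Hypothetical configuration labels of 10 thinned posterior trees (made-up data).
configs = [1, 1, 1, 1, 1, 2, 2, 2, 3, 4]
cutoff = 0.2  # stands in for post.config.cutoff in this toy example

counts = Counter(configs)
total = len(configs)

# Proportion of each configuration among ALL sampled trees.
prob_all = {c: n / total for c, n in counts.items()}

# Drop configurations below the cutoff, then renormalize over the survivors only.
kept_counts = {c: n for c, n in counts.items() if n / total >= cutoff}
kept_total = sum(kept_counts.values())
prob_kept = {c: n / kept_total for c, n in kept_counts.items()}

print("entire sample:", prob_all)
print("after cutoff: ", prob_kept)
```

If the second column behaves like `prob_kept`, its values depend on the chosen cutoff, which is what I observe.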
The third column, by contrast, gives the tree likelihood, which is estimated within the MCMC sampling approach and therefore does refer to the entire tree space. The header "Mean_post_lik" for the third column of config.summary also confuses me, as the canopy.post function seems to take the maximum likelihood among all trees with the same configuration rather than the mean of their likelihoods.
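The difference between the two summaries can matter. A small Python sketch with invented log-likelihood values (again, not Canopy's code, and the names are hypothetical) shows that the per-configuration maximum and mean can even rank configurations differently:

```python
from statistics import mean

# Hypothetical (configuration, log-likelihood) pairs for sampled trees (made-up data).
trees = [(1, -100.0), (1, -104.0), (1, -102.0), (2, -98.0), (2, -110.0)]

# Group the log-likelihoods by configuration.
by_config = {}
for config, loglik in trees:
    by_config.setdefault(config, []).append(loglik)

# Two candidate summaries per configuration: maximum versus mean.
max_lik = {c: max(v) for c, v in by_config.items()}
mean_lik = {c: mean(v) for c, v in by_config.items()}

print("max per configuration: ", max_lik)
print("mean per configuration:", mean_lik)
```

In this toy data, configuration 2 has the higher maximum but configuration 1 has the higher mean, so a header saying "mean" on a column that holds maxima could be misleading.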
I would also like to know, from a statistical point of view, why the tree with the highest likelihood is chosen even if its configuration is less probable than others among the configurations that pass post.config.cutoff.
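The situation I have in mind can be sketched with made-up numbers as follows (Python, hypothetical names, not taken from Canopy): the most frequently sampled configuration and the configuration of the single highest-likelihood tree need not coincide.

```python
from collections import Counter

# Hypothetical (configuration, log-likelihood) pairs for sampled trees (made-up data).
trees = [(1, -100.0), (1, -104.0), (1, -102.0), (2, -98.0), (2, -110.0)]

# Configuration that occurs most often among the sampled trees.
modal_config = Counter(config for config, _ in trees).most_common(1)[0][0]

# Single tree with the highest log-likelihood, regardless of configuration.
best_config, best_loglik = max(trees, key=lambda tree: tree[1])

print("most frequent configuration:", modal_config)
print("configuration of highest-likelihood tree:", best_config, best_loglik)
```

Here configuration 1 is sampled most often, yet the best single tree belongs to configuration 2, which is exactly the case I am asking about.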
Many thanks to you in advance!
Kindest regards,
Caroline
On Oct 15, 2019, at 12:40 PM, Wandinger, Caroline Sophie <c.wan...@dkfz-heidelberg.de> wrote: