Hi Russell,
Thanks for your interest in Canopy. It is a bit odd that there’s such a discrepancy between the purity estimates from Canopy and those from the pathologists. I don’t think putting an upper limit on the normal fractions is a good solution. It might actually return unexpected results like the one you mentioned, where the topology was highly asymmetrical. Have you performed any sanity checks, e.g., by looking at the VAFs of the clonal point mutations or the fractional copy number estimates of the clonal CNAs? These can usually give you a good estimate of the purity.
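The two sanity checks mentioned above can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: clonal heterozygous SNVs in copy-neutral diploid regions (expected VAF = purity / 2), and a clonal single-copy deletion in an otherwise diploid genome (expected average copy number = 2 - purity). The function names and inputs are hypothetical; proportion normal is 1 - purity. Real data would also need care with copy state, subclonality, and sampling noise.

```python
from statistics import median


def purity_from_clonal_vafs(vafs):
    """Purity from VAFs of clonal heterozygous SNVs in copy-neutral diploid regions.

    Such an SNV sits on 1 of 2 copies in every tumor cell and on 0 of 2 copies in
    normal cells, so the expected VAF is purity / 2. The median damps outliers.
    """
    if not vafs:
        raise ValueError("need at least one VAF")
    return min(1.0, 2.0 * median(vafs))


def purity_from_clonal_deletion(observed_copy_number):
    """Purity from the bulk copy number of a clonal single-copy (hemizygous) deletion.

    Normal cells carry 2 copies and tumor cells 1, so the bulk average is
    2 * (1 - purity) + 1 * purity = 2 - purity.
    """
    return max(0.0, min(1.0, 2.0 - observed_copy_number))


# Hypothetical inputs for illustration only.
vafs = [0.24, 0.26, 0.25, 0.23, 0.27]
print(purity_from_clonal_vafs(vafs))     # 0.5
print(purity_from_clonal_deletion(1.5))  # 0.5
```

Both estimates here imply a proportion normal of 0.5; disagreement between them on real data is itself a useful warning sign.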
Another way to check is to use orthogonal approaches to estimate the purity. Sequenza and ABSOLUTE are two methods my collaborators use, and they seem to perform quite well.
In other words, I guess my question is: are you sure the discrepancy is not real? Maybe the pathologist’s estimates were inaccurate, or maybe the tumor tissues were collected in a different way that introduced biases.
If you have made all the aforementioned sanity checks and still find that the results of Canopy don’t make sense, we can take a look at your de-identified data.
Yuchao
From: Bonneville, Russell <Russell.B...@osumc.edu>
Sent: Tuesday, October 1, 2019 5:15 PM
To: Jiang, Yuchao <yuc...@email.unc.edu>
Cc: Roychowdhury, Sameek <Sameek.Ro...@osumc.edu>
Subject: Canopy estimation of normal contamination
Hello Dr. Jiang,
We have used Canopy extensively for subclonal analysis of research autopsy cases, and published multiple papers utilizing results from Canopy. However, we are encountering an issue with Canopy’s estimation of normal cell fractions. For several solid tumor samples from cases with low tumor mutational burden (<5 mutations/Mb) and known normal percentages ~50% (pathologist estimate), Canopy often substantially underestimates the degree of normal contamination (see table below). This does not seem to be an issue in cases with higher TMB or higher tumor content.
Currently, we are working around this by overriding Canopy’s percent normal (by patching the cell line code) so that it cannot fall more than 20% below the pathologist’s estimate for each tumor sample. We almost always find that Canopy then generates a clone with no mutations, which, when added to the normal (see third column below), yields combined percent normal estimates much more in line with the pathologist estimates. In addition, the resulting trees are often different: we tend to find more descendant relationships and fewer parallel relationships between clones when we do this.
Pathologist | Canopy | Canopy with -20% floor
0.5 | 0.734 | 0.643
0.5 | 0.067 | 0.532
0.4 | 0.665 | 0.594
0.5 | 0.564 | 0.56
0.5 | 0.53 | 0.53
0.4 | 0.335 | 0.43
0.55 | 0.413 | 0.352
0.5 | 0.131 | 0.3
0.4 | 0.023 | 0.201
0.55 | 0.547 | 0.457
0.5 | 0.601 | 0.357
0.5 | 0.306 | 0.304
0.4 | 0.393 | 0.384
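The floor described above can be checked against the table with simple arithmetic. This is a minimal sketch that assumes the "-20%" floor means 20 percentage points (floor = pathologist estimate minus 0.20), a reading consistent with the tabulated values (for example, 0.4 gives 0.201 and 0.5 gives 0.3). The helper names are hypothetical; the numbers are copied from the table.

```python
# (pathologist, Canopy, Canopy with -20% floor) proportion normal, copied from the table above.
rows = [
    (0.5, 0.734, 0.643), (0.5, 0.067, 0.532), (0.4, 0.665, 0.594),
    (0.5, 0.564, 0.56), (0.5, 0.53, 0.53), (0.4, 0.335, 0.43),
    (0.55, 0.413, 0.352), (0.5, 0.131, 0.3), (0.4, 0.023, 0.201),
    (0.55, 0.547, 0.457), (0.5, 0.601, 0.357), (0.5, 0.306, 0.304),
    (0.4, 0.393, 0.384),
]

FLOOR_OFFSET = 0.20  # assumption: absolute offset in proportion units
TOL = 1e-9           # absorbs floating-point rounding right at the boundary


def normal_floor(pathologist_estimate, offset=FLOOR_OFFSET):
    """Lowest proportion normal allowed under the patch (never below 0)."""
    return max(0.0, pathologist_estimate - offset)


def respects_floor(pathologist_estimate, estimate):
    """True if an estimate is at or above the floor for that sample."""
    return estimate + TOL >= normal_floor(pathologist_estimate)


for path, raw, floored in rows:
    print(f"{path:.2f}  floor={normal_floor(path):.2f}  "
          f"raw ok={respects_floor(path, raw)}  floored ok={respects_floor(path, floored)}")
```

Under this reading every value in the third column sits at or above its floor, while the unconstrained Canopy estimates fall below it in three samples (0.067, 0.131, and 0.023).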
Would you have any recommendations for optimizing the accuracy of Canopy clonal fractions, including percent normal estimates? We are striving to generate phylogenies that most accurately represent our input data, especially given that we have developed several downstream analyses based on them. Thank you for your time.
Sincerely,
Russell Bonneville
PhD Candidate, Roychowdhury Lab
Biomedical Sciences Graduate Program
The Ohio State University
460 W. 12th Ave., Rm. 595
Columbus, OH 43210
Lab Phone: (614) 685-5842
Hello Yuchao,
Thank you very much for your reply. We have looked at the VAF distributions (attached separately), which seem to imply either lower purity or very few truncal mutations. We have also used several orthogonal approaches to estimate purity, namely Accurity, Sclust, and Sequenza (see the table below). Note that Sequenza’s optimal solutions in this patient frequently had ploidies of 3 to 5; therefore, the table includes both Sequenza’s best solution and its alternative solution closest to ploidy 2. In two instances (T4 and T8), Accurity failed to find a solution (marked “error”).
Prop. normal | Pathologist | Canopy | Canopy with -20% floor | Accurity | Sclust | Sequenza | Sequenza ploidy 2
T1 | 0.5 | 0.734 | 0.643 | 0.74695 | 0.69 | 0.84 | 0.73
T2 | 0.5 | 0.067 | 0.532 | 0.60251 | 0.5 | 0.65 | 0.49
T3 | 0.4 | 0.665 | 0.594 | 0.79464 | 0.78 | 0.84 | 0.65
T4 | 0.5 | 0.564 | 0.56 | error | 0.04 | 0.76 | 0.52
T5 | 0.5 | 0.53 | 0.53 | 0.74239 | 0.51 | 0.76 | 0.51
T6 | 0.4 | 0.335 | 0.43 | 0.79977 | 0.75 | 0.82 | 0.74
T7 | 0.55 | 0.413 | 0.352 | 0.74384 | 0.58 | 0.79 | 0.61
T8 | 0.5 | 0.131 | 0.3 | error | 0.62 | 0.75 | 0.51
T9 | 0.4 | 0.023 | 0.201 | 0.7993 | 0.58 | 0.57 | 0.71
T10 | 0.55 | 0.547 | 0.457 | 0.84331 | 0.54 | 0.78 | 0.54
T11 | 0.5 | 0.601 | 0.357 | 0.86608 | 0.73 | 0.79 | 0.6
T12 | 0.5 | 0.306 | 0.304 | 0.6314 | 0.58 | 0.76 | 0.53
T13 | 0.4 | 0.393 | 0.384 | 0.66143 | 0.36 | 0.35 | 0.35
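One way to use the orthogonal estimates above is to compare Canopy against their consensus per sample. This is a minimal sketch with several assumptions: the consensus is the median of Accurity, Sclust, and the Sequenza ploidy-2 column (the best-ploidy Sequenza column is left out because its solutions often had ploidy 3 to 5), failed Accurity runs are skipped, and the 0.20 tolerance is a hypothetical choice rather than a recommended value. The numbers are copied from the table.

```python
from statistics import median

# Proportion normal per sample, copied from the table above.
# Columns: Canopy, Accurity, Sclust, Sequenza (ploidy-2 solution); None marks an Accurity "error".
estimates = {
    "T1": (0.734, 0.74695, 0.69, 0.73),
    "T2": (0.067, 0.60251, 0.5, 0.49),
    "T3": (0.665, 0.79464, 0.78, 0.65),
    "T4": (0.564, None, 0.04, 0.52),
    "T5": (0.53, 0.74239, 0.51, 0.51),
    "T6": (0.335, 0.79977, 0.75, 0.74),
    "T7": (0.413, 0.74384, 0.58, 0.61),
    "T8": (0.131, None, 0.62, 0.51),
    "T9": (0.023, 0.7993, 0.58, 0.71),
    "T10": (0.547, 0.84331, 0.54, 0.54),
    "T11": (0.601, 0.86608, 0.73, 0.6),
    "T12": (0.306, 0.6314, 0.58, 0.53),
    "T13": (0.393, 0.66143, 0.36, 0.35),
}

MAX_DEVIATION = 0.20  # hypothetical tolerance, not a recommended value


def orthogonal_consensus(values):
    """Median of the orthogonal estimates, ignoring failed runs (None)."""
    usable = [v for v in values if v is not None]
    return median(usable) if usable else None


def flag_discordant(table, max_dev=MAX_DEVIATION):
    """Samples where Canopy's proportion normal differs from the consensus by more than max_dev."""
    flagged = []
    for sample, (canopy, *orthogonal) in table.items():
        consensus = orthogonal_consensus(orthogonal)
        if consensus is not None and abs(canopy - consensus) > max_dev:
            flagged.append(sample)
    return flagged


print(flag_discordant(estimates))  # ['T2', 'T4', 'T6', 'T8', 'T9', 'T12']
```

With only two usable estimates (T4 and T8) the median is simply their mean, so a single outlier such as Sclust's 0.04 for T4 can decide the flag; swapping in the best-ploidy Sequenza column is a one-line change to the table.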
In addition, the highly asymmetrical topology (also attached separately) was expected with our patch, which places a lower (not an upper) limit on the proportion of normal cells. After the fraction of the zero-mutation clone 1 was added to the normal fraction (yielding the normal estimates in the “Canopy with -20% floor” column above), and clones 2 through 8 were renumbered as 1 through 7, we no longer have such an asymmetrical tree.
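The merge-and-renumber step described above can be sketched as follows. This is a minimal sketch assuming a hypothetical layout (a dict with a "normal" entry plus integer clone labels, each mapping to per-sample fractions, and a separate per-clone mutation count); it is not Canopy's own output format, and the example numbers are made up.

```python
def merge_empty_clones_into_normal(fractions, mutation_counts):
    """Fold mutation-free clones into the normal population and renumber the rest.

    fractions: dict with key "normal" plus integer clone labels 1..K, each mapping to a
        list of per-sample fractions (hypothetical layout, not Canopy's output format).
    mutation_counts: dict mapping each integer clone label to its number of mutations.
    Returns a new dict with the same layout; surviving clones are relabeled 1..K'.
    """
    normal = list(fractions["normal"])
    kept = []
    for label in sorted(k for k in fractions if k != "normal"):
        if mutation_counts[label] == 0:
            # The clone carries none of the tracked mutations, so it is treated as normal here.
            normal = [n + f for n, f in zip(normal, fractions[label])]
        else:
            kept.append(list(fractions[label]))
    merged = {"normal": normal}
    for new_label, column in enumerate(kept, start=1):
        merged[new_label] = column
    return merged


# Made-up two-sample example: clone 1 carries no mutations.
fractions = {"normal": [0.30, 0.40], 1: [0.25, 0.10], 2: [0.20, 0.30], 3: [0.25, 0.20]}
mutation_counts = {1: 0, 2: 40, 3: 15}
merged = merge_empty_clones_into_normal(fractions, mutation_counts)
print(merged)
```

Per-sample totals are unchanged by the merge; only the labels and the normal fraction move, which is why the tree shape, not the clonal composition, is what changes.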
We really appreciate your insights and your taking the time to help us with our questions!
Thanks, Russell
From: Jiang, Yuchao <yuc...@email.unc.edu>
Sent: Thursday, October 03, 2019 9:01 PM
To: Bonneville, Russell <Russell.B...@osumc.edu>
Cc: Roychowdhury, Sameek <Sameek.Ro...@osumc.edu>; canopy_phylogeny <canopy_p...@googlegroups.com>
Subject: RE: Canopy estimation of normal contamination
Hi Russell,
Apologies for my late reply. I just took a look at the plots you sent, and below are my comments, questions, and recommendations.
1) I think having 9 clones for Canopy to deconvolve stretches it to its limit. I would not go above 5.
2) For whatever reason, clones 1 and 2 in your plot are both normal clones. This is unusual and may be due to 1).
3) I would not recommend setting any cap to the normal clone fractions. This will mess up the Gibbs sampler and bias your final output.
4) You probably need to QC your data extensively. Noise in the data has a relatively large effect on the final output.
5) You can carry out sanity checks based on Canopy’s output to see if the result makes sense. For example, check whether the mutations in mut2 are mostly present in T12, T11, T10, T9, T8, T7, and T6 but absent in T13, T5, T4, and T2. This should be easy to check; you can simply look at their VAFs.
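The VAF check suggested in 5) can be scripted so it runs over every cluster. This is a minimal sketch assuming you have, for each sample, the VAFs of the mutations Canopy assigned to a cluster such as mut2; the function name, the per-sample median summary, and the 0.05 cut-off are hypothetical choices, not part of Canopy. The VAFs in the example are made up purely to exercise the function.

```python
from statistics import median

PRESENT_VAF = 0.05  # hypothetical cut-off; tune it to your depth and noise level


def check_cluster_pattern(vafs_by_sample, expected_present, expected_absent, cutoff=PRESENT_VAF):
    """Compare a cluster's median VAF per sample with the presence/absence implied by the tree.

    vafs_by_sample: dict mapping sample name -> list of VAFs for the cluster's mutations.
    Returns (sample, median VAF, expectation) tuples for samples that disagree with the tree.
    """
    mismatches = []
    for sample in expected_present:
        m = median(vafs_by_sample[sample])
        if m < cutoff:
            mismatches.append((sample, m, "expected present"))
    for sample in expected_absent:
        m = median(vafs_by_sample[sample])
        if m >= cutoff:
            mismatches.append((sample, m, "expected absent"))
    return mismatches


present = ["T12", "T11", "T10", "T9", "T8", "T7", "T6"]
absent = ["T13", "T5", "T4", "T2"]
# Made-up VAFs to exercise the function; replace with the real VAFs of the mut2 mutations.
toy = {s: [0.20, 0.25, 0.22] for s in present}
toy.update({s: [0.00, 0.01, 0.00] for s in absent})
toy["T4"] = [0.12, 0.09, 0.15]  # a deliberately discordant sample
print(check_cluster_pattern(toy, present, absent))  # [('T4', 0.12, 'expected absent')]
```

A median per sample keeps one mis-assigned mutation from deciding the call; looking at the full VAF distribution per sample is still worthwhile when a sample is flagged.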
I hope this is helpful. BTW, you haven’t sent any data (which makes things easier and safer), so don’t worry. Of course, I won’t share the plots either.
Hello Dr. Jiang,
Thank you very much for your suggestions and insights! With our -20% normal floor, we expected Canopy to identify more than one normal clone. Could you please elaborate on how setting this floor would affect or bias the Gibbs sampler, beyond restricting the search space to solutions whose percent normal estimates are no more than 20% below the pathologist’s estimate?
We would definitely appreciate any suggestions you may have to improve our QC filters. Currently, we apply strict QC, passing to Canopy only mutations with:
· At least 100x coverage in all tumor samples
· At least 20 alt-supporting reads in at least one tumor sample
· Minimum VAF of 6% in at least one sample
· VarScan2 P <= 0.05
· DANN score >= 0.96
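The filters listed above can be expressed as one predicate, which makes it easy to tighten or relax a single threshold and see how many mutations survive. This is a minimal sketch assuming a hypothetical per-mutation record (`Mutation`) holding read depth and alt-read counts per tumor sample, the VarScan2 p-value, and the DANN score; it reads "100x coverage" as at least 100x and computes VAF as alt reads / depth, both of which are assumptions. The example records are made up.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    """Fields needed for the filters above (hypothetical record layout)."""
    name: str
    depth: dict       # tumor sample -> total read depth
    alt_reads: dict   # tumor sample -> alt-supporting reads
    varscan_p: float  # VarScan2 p-value
    dann: float       # DANN score


def passes_qc(m, min_depth=100, min_alt=20, min_vaf=0.06, max_p=0.05, min_dann=0.96):
    """Apply the listed filters; defaults mirror the thresholds in the list above."""
    samples = list(m.depth)
    # Coverage must reach min_depth in every tumor sample.
    if not samples or any(m.depth[s] < min_depth for s in samples):
        return False
    # At least min_alt alt-supporting reads in at least one sample.
    if not any(m.alt_reads[s] >= min_alt for s in samples):
        return False
    # VAF (alt / depth; depth >= min_depth > 0 here) of at least min_vaf in at least one sample.
    if not any(m.alt_reads[s] / m.depth[s] >= min_vaf for s in samples):
        return False
    return m.varscan_p <= max_p and m.dann >= min_dann


m1 = Mutation("toy_a", {"T1": 150, "T2": 210}, {"T1": 30, "T2": 2}, 0.001, 0.98)
m2 = Mutation("toy_b", {"T1": 90, "T2": 300}, {"T1": 25, "T2": 40}, 0.01, 0.99)
print(passes_qc(m1), passes_qc(m2))  # True False
```

The alt-read and VAF conditions are checked independently, as in the list, so they may be satisfied by different samples; requiring them in the same sample would be a stricter variant.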
At your convenience, would it be possible to set up a phone call to further discuss Canopy clonal inference, so that we can make better use of this tool? Thank you so much for your assistance!