Questions on Canopy package


Jia, Li (NIH/NCI) [C]

Jan 2, 2018, 1:55:20 PM
to canopy_p...@googlegroups.com

Dear Sir/Madam,

Hope you had wonderful holidays!

We are interested in the R package Canopy and used Sequenza to obtain copy number. We have a few questions about running Canopy.

  1. Sequenza gives us a depth ratio. Is that depth ratio the same as the copy ratio mentioned for Canopy?
  2. We have multiple tumor regions for each patient; the case we tested has 4 samples. When building the WM and Wm matrices, we did not take the copy number regions common to the samples; instead, we used each sample’s regions individually. Should we derive the overlapping regions before running the canopy.sample function?
  3. canopy.sample seems to run forever. It has been running for 28 hours and we don’t know when it will finish. Is there a setting that could reduce the running time?

 

Here is the function:

K = 3:5 # number of subclones
numchain = 15 # number of chains with random initiations
sampchain = canopy.sample(R = R, X = X, WM = WM, Wm = Wm, epsilonM = epsilonM,
                          epsilonm = epsilonm, C = NULL, Y = Y, K = K,
                          numchain = numchain, max.simrun = 100000,
                          min.simrun = 20000, writeskip = 200,
                          projectname = projectname, cell.line = TRUE,
                          plot.likelihood = TRUE)

 

If we set simrun to a small number, say 1000, that may help. Would that affect the results?

By the way, we tried to set the number of subclones to K = 2:5, but unfortunately it doesn’t work. From the source code you provided, it is supposed to work.

Thanks so much, we appreciate the help!

Li

Yuchao Jiang

Jan 4, 2018, 9:25:33 PM
to Jia, Li (NIH/NCI) [C], canopy_p...@googlegroups.com, Gene Urrutia, Jiang, Yuchao
Hi Li,

Please see my response below. Hope that it is helpful.

Yuchao 

On Jan 2, 2018, at 1:55 PM, Jia, Li (NIH/NCI) [C] <li....@nih.gov> wrote:

  1. Sequenza gives us a depth ratio. Is that depth ratio the same as the copy ratio mentioned for Canopy?

  2. We have multiple tumor regions for each patient; the case we tested has 4 samples. When building the WM and Wm matrices, we did not take the copy number regions common to the samples; instead, we used each sample’s regions individually. Should we derive the overlapping regions before running the canopy.sample function?

Again, refer to the page above. WM and Wm are matrices of copy number regions x samples. If a copy number event is shared by multiple samples, those samples should have similar values in that event's row of WM and Wm. Overlapping refers to two events overlapping each other (e.g., a homozygous deletion nested within a heterozygous deletion), not to a duplication shared by multiple dissections. Please make sure that your input is correct.
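To make the row/column convention concrete, here is a minimal base-R sketch that checks whether a set of inputs is mutually consistent before sampling. The helper name `check_canopy_dims` and the toy matrices are hypothetical and not part of Canopy. The layout it assumes (R and X as SNAs x samples, WM and Wm as CNA regions x samples, Y as SNAs x (1 + CNA regions)) should be verified against the package documentation.

```r
# Hypothetical helper (not part of Canopy): checks that the input matrices
# agree on the number of samples, SNAs, and CNA regions.
check_canopy_dims <- function(R, X, WM, Wm, Y) {
  stopifnot(is.matrix(R), is.matrix(X), is.matrix(WM), is.matrix(Wm), is.matrix(Y))
  problems <- character(0)
  if (!identical(dim(R), dim(X))) {
    problems <- c(problems, "R and X must have the same dimensions (SNAs x samples)")
  } else if (any(R > X)) {
    problems <- c(problems, "mutant read counts (R) cannot exceed total depth (X)")
  }
  if (!identical(dim(WM), dim(Wm)))
    problems <- c(problems, "WM and Wm must have the same dimensions (CNA regions x samples)")
  if (ncol(R) != ncol(WM))
    problems <- c(problems, "SNA and CNA matrices must cover the same samples")
  # Assumed layout: one row per SNA, one column per CNA region, plus a
  # leading column for SNAs that fall outside every CNA region.
  if (nrow(Y) != nrow(R) || ncol(Y) != nrow(WM) + 1)
    problems <- c(problems, "Y must be SNAs x (1 + CNA regions)")
  problems
}

# Toy example: 3 SNAs, 2 CNA regions, 4 samples (made-up numbers).
R  <- matrix(c(10, 12, 0, 3,   5, 5, 6, 4,   0, 0, 20, 18), nrow = 3, byrow = TRUE)
X  <- matrix(50, nrow = 3, ncol = 4)
# Row 1: gain seen only in samples 3 and 4. Row 2: loss shared by all samples,
# so every sample has a similar value in that row.
WM <- matrix(c(1.0, 1.0, 1.6, 1.5,   1.0, 1.0, 1.0, 1.0), nrow = 2, byrow = TRUE)
Wm <- matrix(c(1.0, 1.0, 1.0, 1.0,   0.4, 0.5, 0.4, 0.5), nrow = 2, byrow = TRUE)
Y  <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1), nrow = 3, byrow = TRUE)

# An empty character vector means no inconsistency was found.
print(check_canopy_dims(R, X, WM, Wm, Y))
```

A check like this only catches shape errors; it cannot tell whether the copy number values themselves are on the scale Canopy expects.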

  3. canopy.sample seems to run forever. It has been running for 28 hours and we don’t know when it will finish. Is there a setting that could reduce the running time?

Here is the function:
K = 3:5 # number of subclones
numchain = 15 # number of chains with random initiations
sampchain = canopy.sample(R = R, X = X, WM = WM, Wm = Wm, epsilonM = epsilonM,
                          epsilonm = epsilonm, C = NULL, Y = Y, K = K,
                          numchain = numchain, max.simrun = 100000,
                          min.simrun = 20000, writeskip = 200,
                          projectname = projectname, cell.line = TRUE,
                          plot.likelihood = TRUE)

If we set simrun to a small number, say 1000, that may help. Would that affect the results?

Yes, it does; you cannot set it to 1000. You have to make sure the MCMC converges, and that usually takes a while. Yet it should not take 28 hours. Most likely your input or its format is wrong, judging from your first and second questions. In that case the algorithm takes a long time to converge, and you may also end up with a garbage-in, garbage-out scenario. My suggestion is to select mutations that are informative (https://github.com/yuchaojiang/Canopy/blob/master/instruction/SNA_CNA_choice.md) and to cluster your point mutations if you have many (https://github.com/yuchaojiang/Canopy/tree/master/clustering).
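One generic way to judge whether several chains have converged is the Gelman-Rubin potential scale reduction factor computed on a scalar trace such as the log-likelihood. The sketch below is a standard MCMC diagnostic written in base R, not a function provided by Canopy, and it uses simulated traces. How to extract per-chain log-likelihood traces from the `sampchain` object is not shown and would need to be worked out from the package documentation.

```r
# Generic Gelman-Rubin R-hat for one scalar quantity traced in several chains.
# 'traces' is a matrix with one row per iteration and one column per chain.
rhat <- function(traces) {
  stopifnot(is.matrix(traces), ncol(traces) >= 2, nrow(traces) >= 2)
  n <- nrow(traces)
  chain_means <- colMeans(traces)
  W <- mean(apply(traces, 2, var))     # average within-chain variance
  B <- n * var(chain_means)            # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of the variance
  sqrt(var_hat / W)
}

set.seed(1)
# Simulated stand-ins for post-burn-in log-likelihood traces from 4 chains.
mixed <- matrix(rnorm(4000, mean = -500, sd = 2), ncol = 4)
stuck <- mixed
stuck[, 1] <- stuck[, 1] - 50          # one chain sits in a different mode

print(rhat(mixed))  # close to 1 when the chains agree
print(rhat(stuck))  # well above 1 when they do not
```

Values near 1 suggest the chains agree; a value well above 1 is a sign that the run is too short or that the input makes the posterior hard to explore.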

 
By the way, we tried to set the number of subclones to K = 2:5, but unfortunately it doesn’t work. From the source code you provided, it is supposed to work.

K here includes the normal clone. Therefore, if you have only 2 subclones including normal, you don’t need deconvolution.


 

-- 
You received this message because you are subscribed to the Google Groups "canopy_phylogeny" group.
To unsubscribe from this group and stop receiving emails from it, send an email to canopy_phyloge...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/canopy_phylogeny/F63C8B6A-4D83-412F-A4AE-29CC59EC8362%40mail.nih.gov.
For more options, visit https://groups.google.com/d/optout.

Jia, Li (NIH/NCI) [C]

Jan 23, 2018, 12:25:09 PM
to Yuchao Jiang, canopy_p...@googlegroups.com, Gene Urrutia, Jiang, Yuchao

Thanks, Yuchao, for the reply.

Based on your answers, I regenerated the input files for both mutations and copy number. Before building the phylogenetic tree, we had already run a clonality analysis that uses both the mutation and copy number results to classify each mutation as clonal or subclonal. In the phylogenetic tree, we expect the clonal mutations to appear on the tree trunk, that is, near the root.

We have a few questions about the package.

  1. We followed the instructions on the website you pointed to and regenerated the input mutation list with some filter criteria, including keeping only driver gene mutations, which makes the mutation list short. Do we need to select only mutations that reside in copy number aberration regions? We tried both ways: selecting only the mutations that overlap CNAs, and selecting all mutations, including those that do not overlap CNAs. We ran two patient cases, each with multiple samples. The results were the opposite of our expectation. For one patient, using only the CNA-overlapping mutations placed the majority of clonal mutations on the tree trunk; for the other patient, using the full mutation list did. Could you explain how to set up the correct input for Canopy?
  2. In the tree, we understand that the mutations and CNAs in the mut1 group, which is near the root, should be more important, but we still need your help to understand it better. We found the TP53 mutation in mut1, but the CNA covering TP53 is in mut6. Does that mean the TP53 mutation happened earlier than the copy number aberration? Also, how should we interpret CNAs in mut1 with mutations in a later group, say mut4 or mut5?
  3. We are also confused by the clone frequencies shown in different colors in the matrix under the tree. Each row gives the clone frequencies for one sample. If two samples show similar frequencies for all clones, that means the two samples are similar and more correlated, correct? When we checked a correlation heatmap based on the copy number aberrations and mutations, these two samples were not the closest ones. Could you explain this result? The clone frequency calculation doesn’t account for tumor purity, correct?

Thanks again for the help.

Li

Yuchao Jiang

Jan 24, 2018, 3:27:35 PM
to Jia, Li (NIH/NCI) [C], canopy_p...@googlegroups.com, Gene Urrutia, Jiang, Yuchao
Hi Li,

See my response below. Hope that this helps.

We followed the instructions on the website you pointed to and regenerated the input mutation list with some filter criteria, including keeping only driver gene mutations, which makes the mutation list short. Do we need to select only mutations that reside in copy number aberration regions? We tried both ways: selecting only the mutations that overlap CNAs, and selecting all mutations, including those that do not overlap CNAs. We ran two patient cases, each with multiple samples. The results were the opposite of our expectation. For one patient, using only the CNA-overlapping mutations placed the majority of clonal mutations on the tree trunk; for the other patient, using the full mutation list did. Could you explain how to set up the correct input for Canopy?

How many point mutations are there? Also, how did you select the CNAs? You need to be stringent not only with SNAs but also with CNAs. Select those that are informative: for SNAs, those that show differential alternative allele frequencies across the different dissections of a patient; for CNAs, those that show distinct copy number profiles (e.g., duplication in region 1, loss of heterozygosity in region 2). See this page for more information: https://github.com/yuchaojiang/Canopy/blob/master/instruction/SNA_CNA_choice.md

As a concrete example, the picture below is an IGV view of ASCN calls across three sections of a glioblastoma patient. The deletion that is shared across all sections is not informative for separating the clones and will be placed on the tree trunk. The loss-of-heterozygosity event in yellow, in contrast, shows that GBM9_R1 has an additional loss, which gives rise to a new clone that is present in this section but not in the other two. Also, as you can tell, section GBM9_R1 has a lot of false positives, which need to be filtered out. Otherwise, it is just garbage in, garbage out (this will also significantly affect SNAs if you consider SNA-CNA overlap).


[Inline image 2: IGV view of ASCN calls across three sections of a glioblastoma patient]
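The idea of an "informative" CNA can be sketched in a few lines of base R. The helper `informative_cna`, the tolerance of 0.2, and the toy copy number values are all hypothetical; the numbers are not taken from the glioblastoma example. The sketch assumes WM and Wm hold major and minor copy numbers with one row per CNA region and one column per section.

```r
# Hypothetical helper: flags CNA regions whose major or minor copy number
# varies across samples by more than 'tol'. Regions that look the same in
# every sample do not help to separate clones.
informative_cna <- function(WM, Wm, tol = 0.2) {
  stopifnot(identical(dim(WM), dim(Wm)))
  spread <- function(m) apply(m, 1, function(v) max(v) - min(v))
  spread(WM) > tol | spread(Wm) > tol
}

# Two toy regions in three sections: a deletion shared by all sections and a
# loss of heterozygosity that is private to the first section.
WM <- matrix(c(1.0, 1.0, 1.0,
               1.0, 1.0, 1.0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("shared_del", "private_loh"), c("R1", "R2", "R3")))
Wm <- matrix(c(0.5, 0.5, 0.5,
               0.3, 1.0, 1.0),
             nrow = 2, byrow = TRUE, dimnames = dimnames(WM))

res <- informative_cna(WM, Wm)
print(res)
```

The flag only marks which events distinguish the sections. Whether to drop the uninformative ones, and how to filter false-positive calls first, remains a judgment for the analyst.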

I'm not sure what you meant by "the opposite to our expectation". What is the expectation? Also, can you elaborate on this: "One patient showed that only using overlapped mutations with CNAs has the majority of clonal mutations at the tree trunk, another patient showed that using all mutation list including overlapped with CNAs has the majority of clonal mutations at the tree trunk."? Each patient has a different tumor evolutionary history, so one shouldn't cross-compare two different patients. Maybe I am missing something here.


In the tree, we understand that the mutations and CNAs in the mut1 group, which is near the root, should be more important, but we still need your help to understand it better. We found the TP53 mutation in mut1, but the CNA covering TP53 is in mut6. Does that mean the TP53 mutation happened earlier than the copy number aberration? Also, how should we interpret CNAs in mut1 with mutations in a later group, say mut4 or mut5?

We see this a lot. The TP53 point mutation happens at the trunk; a CNA that happens later along a tree branch amplifies the TP53 point mutation (e.g., a duplication of the mutated allele or a loss of heterozygosity that removes the unmutated allele). However, I recommend performing some sanity checks, for example, manually inspecting the BAM files to see whether the TP53 mutations are indeed amplified in the samples that have the corresponding CNA.
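Once read counts at the mutated position have been collected per sample, the sanity check amounts to comparing variant allele frequencies between samples with and without the CNA. The sketch below uses made-up counts and sample names. It is a rough first look only, because tumor purity and clone fractions also shift the VAF.

```r
# Made-up read counts for one point mutation across four samples.
alt_reads   <- c(S1 = 18, S2 = 20, S3 = 41, S4 = 44)
total_reads <- c(S1 = 60, S2 = 62, S3 = 61, S4 = 63)
# Which samples carry the later CNA (assumed known from the copy number calls).
has_cna     <- c(S1 = FALSE, S2 = FALSE, S3 = TRUE, S4 = TRUE)

vaf <- alt_reads / total_reads
print(round(vaf, 2))

# If the CNA amplifies the mutated allele, the VAF should be higher in the
# samples where the CNA is present.
mean_with    <- mean(vaf[has_cna])
mean_without <- mean(vaf[!has_cna])
print(c(with_cna = mean_with, without_cna = mean_without))
```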

We are also confused by the clone frequencies shown in different colors in the matrix under the tree. Each row gives the clone frequencies for one sample. If two samples show similar frequencies for all clones, that means the two samples are similar and more correlated, correct?

Yes this is correct.
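A simple way to quantify "similar frequency levels at all clones" is a distance between the samples' clone compositions. This is a minimal sketch with a made-up clonal frequency matrix, stored here with clones in rows and samples in columns (an assumed layout); it is not output from Canopy.

```r
# Made-up clonal frequency matrix: rows are clones, columns are samples.
# Each column sums to 1.
P <- matrix(c(0.20, 0.22, 0.10,
              0.50, 0.48, 0.15,
              0.30, 0.30, 0.75),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("normal", "clone2", "clone3"), c("S1", "S2", "S3")))
stopifnot(all(abs(colSums(P) - 1) < 1e-8))

# Euclidean distance between samples based on their clone compositions.
d <- as.matrix(dist(t(P)))
print(round(d, 3))
```

Here S1 and S2 come out closest because their clone mixtures nearly coincide.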

When we checked a correlation heatmap based on the copy number aberrations and mutations, these two samples were not the closest ones. Could you explain this result?

Something is wrong with the Canopy analysis. You need to be careful with the input, and you need to let the chains run long enough; it's possible that the sampling hasn't converged yet. Don't change Canopy's default parameters.

The clone frequency calculation doesn’t account for tumor purity, correct?

Canopy does account for the normal cell clone (the first clone, on the leftmost branch, which doesn't have any mutations on it).
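Because the normal clone has its own frequency in each sample, an implied tumor purity can be read off as one minus that frequency. The sketch uses a made-up frequency matrix and assumes the normal clone is the first row, following the convention described above.

```r
# Made-up clonal frequencies; the first row is taken to be the normal clone.
P <- matrix(c(0.23, 0.40,
              0.47, 0.35,
              0.30, 0.25),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("normal", "clone2", "clone3"), c("S1", "S2")))

# Implied tumor purity per sample = 1 - fraction of normal cells.
purity <- 1 - P["normal", ]
print(purity)
```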

Yuchao


Jia, Li (NIH/NCI) [C]

Jan 26, 2018, 10:06:32 AM
to Jiang, Yuchao, Yuchao Jiang, canopy_p...@googlegroups.com, Gene Urrutia

Thanks, Yuchao, for your answers.

To clarify the question about the two cases we ran: we didn’t compare the two patients with each other. Instead, we ran two different scenarios for each patient. When we compared the two scenarios within each case, the results were dramatically different, and only one of them might satisfy our expectation. Because we don’t know which way is better, we checked both cases. The two scenarios for each case were described in the last email. We hoped to find a consensus across the two cases, that is, that scenario I in case I would be similar to scenario I in case II. Unfortunately, we found that scenario I in case I is similar to scenario II in case II. That is what confuses us: should we use only SNAs that reside within CNAs, or can we still include some SNAs that are not in CNAs?

  4. In Fig. 5C of your publication, the root is 4, correct? Normal is just one clone generated from root 4. If normal shows some clone frequency, for example 0.23, that means the normal cells have 23% tumor contamination, correct? In some phylogenetic tree studies, normal is used as the origin. Does Canopy do the same?

Yes, the clonal frequency for the leftmost branch is the frequency of normal cells in the bulk sample. Yes, we assume cancer cells arise from normal cells.

For question 4 that I asked in the last email, it is not very clear whether normal is the root. From the paper and the tree, it seems that number 4 is the root and normal is one of the subclones derived from the root?

Thanks again for the help.

Li

 


Jiang, Yuchao

Jan 26, 2018, 6:26:56 PM
to Jia, Li (NIH/NCI) [C], Yuchao Jiang, canopy_p...@googlegroups.com, Gene Urrutia



On Jan 25, 2018, at 11:02 AM, Jia, Li (NIH/NCI) [C] <li....@nih.gov> wrote:

Thanks, Yuchao, for your kind reply. It’s helpful.

Here is a clarification of my question in the last email.

For one patient, there are 38 mutations if we use only those that overlap CNAs, and 51 if we also include mutations that do not overlap CNAs. For another patient, there are 15 mutations that overlap CNAs, and 20 if we also include those that do not. In one patient, the CNA-overlapping mutations alone put the clonal mutations on the tree trunk, but in the other patient it was the full list (with and without CNA overlap) that put the clonal mutations on the tree trunk, which is the opposite case. We are also confused about how to generate the SNA input: we don’t know whether we should use only SNAs that overlap CNAs.

I’m still very confused about this paragraph. Why do you compare two patients? Canopy infers intratumor heterogeneity within each patient. Also, what’s your definition of a clonal mutation? Aren’t they at the tree trunk in both cases (i.e., why opposite)?

Also, you need to focus not only on SNVs but also on CNAs, as pointed out in my previous email. They are just as important to your final output, if not more.

 

We still have some other questions that need to be clarified.

  1. For the SNA input, is there a statistical criterion for evaluating the differences in SNAs across samples? You suggested a heatmap to visualize the results, but we have hundreds of patients and would like to run the program with the same fixed criteria for all of them, not based on visual inspection.

You can try the standard deviation (sd). If your list of mutations is fine-tuned, without false positives/negatives, you can throw them all in. QC is important.
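One way to turn the sd suggestion into a fixed, repeatable criterion is to compute each SNA's variant allele frequency per sample and rank SNAs by the standard deviation of that VAF across samples. The sketch below assumes this reading of "sd"; the helper `select_by_vaf_sd`, the cutoff of 0.1, and the read counts are hypothetical.

```r
# Hypothetical helper: computes the standard deviation of each SNA's VAF
# across samples and marks SNAs at or above a cutoff.
# R = mutant read counts, X = total read counts (SNAs x samples).
select_by_vaf_sd <- function(R, X, min_sd = 0.1) {
  stopifnot(identical(dim(R), dim(X)), all(X > 0))
  vaf <- R / X
  vaf_sd <- unname(apply(vaf, 1, sd))
  data.frame(sna = rownames(R), vaf_sd = vaf_sd, keep = vaf_sd >= min_sd)
}

# Made-up counts for three SNAs in three samples.
R <- matrix(c(25, 24, 26,    # similar VAF in every sample
               2, 30, 31,    # nearly absent from the first sample
              10,  0, 40),   # different in every sample
            nrow = 3, byrow = TRUE,
            dimnames = list(c("snaA", "snaB", "snaC"), c("S1", "S2", "S3")))
X <- matrix(100, nrow = 3, ncol = 3, dimnames = dimnames(R))

res <- select_by_vaf_sd(R, X)
print(res)
```

A low sd is not by itself a reason to discard a mutation: as the reply further down says, well-supported mutations seen in all samples, such as KRAS or TP53, should still be kept.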

  2. When reading your publication, it seemed that pre-clustering SNAs before MCMC may improve performance and reduce experimental noise. Should we do this step before MCMC for all runs, or does it depend on the situation? We haven’t tried this yet, and we wonder whether we should use only SNAs in the clusters and exclude those that don’t belong to any cluster.

If you see clusters of mutations in the VAF plot, you should do so. It only improves running time, and we show that there is no impact on the final posterior tree distribution as long as the MCMC sampling successfully converges. You need to run the chain long enough.

  3. We compute sample correlations using all identified SNAs or CNAs, but for Canopy we have to select SNAs and CNAs, so the sample correlation will differ from the one based on all SNAs or CNAs. The scientists in our group are concerned that selecting only SNAs with distinct VAFs may miss some important driver mutations, such as KRAS or TP53, that appear in all samples without a dramatic difference in allele frequency. Should we keep the SNAs that appear in all samples?

Yes, you should keep these. They will be clonal mutations on the first right branch. Again, QC is important to remove noise.

In Canopy, the SNAs and CNAs selected for studying tumor evolutionary history only indicate that tumor evolution is similar if two samples show similar clone frequencies, correct?

Yes, if I understand correctly.

  4. In Fig. 5C of your publication, the root is 4, correct? Normal is just one clone generated from root 4. If normal shows some clone frequency, for example 0.23, that means the normal cells have 23% tumor contamination, correct? In some phylogenetic tree studies, normal is used as the origin. Does Canopy do the same?

Yes, the clonal frequency for the leftmost branch is the frequency of normal cells in the bulk sample. Yes, we assume cancer cells arise from normal cells.

 

Thanks so much. We have a lot of questions that need your help. We like this tool, but we definitely need to understand it better.

Best,

Li


Yuchao Jiang

Jan 28, 2018, 2:12:25 AM
to Jia, Li (NIH/NCI) [C], canopy_p...@googlegroups.com, Gene Urrutia

On Jan 26, 2018, at 10:06 AM, Jia, Li (NIH/NCI) [C] <li....@nih.gov> wrote:

Thanks, Yuchao, for your answers.

To clarify the question about the two cases we ran: we didn’t compare the two patients with each other. Instead, we ran two different scenarios for each patient. When we compared the two scenarios within each case, the results were dramatically different, and only one of them might satisfy our expectation. Because we don’t know which way is better, we checked both cases. The two scenarios for each case were described in the last email. We hoped to find a consensus across the two cases, that is, that scenario I in case I would be similar to scenario I in case II. Unfortunately, we found that scenario I in case I is similar to scenario II in case II. That is what confuses us: should we use only SNAs that reside within CNAs, or can we still include some SNAs that are not in CNAs?

I thought this was comparing the two cases (i.e., patients)? Am I misunderstanding? Also, you need to make sure you have the correct input for Canopy and that you run Canopy correctly before making any meaningful interpretations.

  4. In Fig. 5C of your publication, the root is 4, correct? Normal is just one clone generated from root 4. If normal shows some clone frequency, for example 0.23, that means the normal cells have 23% tumor contamination, correct? In some phylogenetic tree studies, normal is used as the origin. Does Canopy do the same?

Yes, the clonal frequency for the leftmost branch is the frequency of normal cells in the bulk sample. Yes, we assume cancer cells arise from normal cells.

For question 4 that I asked in the last email, it is not very clear whether normal is the root. From the paper and the tree, it seems that number 4 is the root and normal is one of the subclones derived from the root?

There are no mutations or changes from the root (labeled as number 4 in Figure 5c) to the normal clone (leftmost leaf). They are the same.