canopy.sample.parallel


achao...@gmail.com

Jun 19, 2018, 10:14:37 AM
to canopy_phylogeny
Hi Jiang,
Thanks for your efforts on creating Canopy.

In section 4.5.5.2 (parallel computing), a canopy.sample.parallel() function is mentioned, which allows the chains to run simultaneously on an HPC. But I couldn't find it in either MARATHON or Canopy.
Is there any other way to make the MCMC sampling run faster? Or am I missing something in 4.5.5.2?

Thanks for your time!

Best,
Peter

Gene Urrutia

Jun 19, 2018, 10:41:23 AM
to achao...@gmail.com, canopy_phylogeny
Hi Peter, thanks for your interest in Canopy and the MARATHON pipeline.

Please download the latest Canopy from GitHub; canopy.sample.parallel.R is available there. See section 3.2 of the MARATHON notebook for installation instructions. As you mentioned, section 4.5.5.2 provides details on how to run it. Please let us know if you have additional questions.

Thanks,
Gene

--
You received this message because you are subscribed to the Google Groups "canopy_phylogeny" group.
To unsubscribe from this group and stop receiving emails from it, send an email to canopy_phylogeny+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/canopy_phylogeny/463ddbea-bc87-48fc-b23e-584648b90168%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

achao...@gmail.com

Jun 19, 2018, 11:05:11 AM
to canopy_phylogeny
Hi Gene,
Thank you so much for your quick response! I will go ahead and download the newest version of Canopy.

Thanks,
Peter


achao...@gmail.com

Jun 19, 2018, 12:44:13 PM
to canopy_phylogeny
Hi Gene,
When I run source("https://bioconductor.org/biocLite.R"), I get a message asking me to install the latest version of R (3.5.0), but the R on my HPC server is 3.4.4. Is this the reason I failed to install the latest version of Canopy by following section 3.2?

Thanks!
Peter


Gene Urrutia

Jun 19, 2018, 1:37:48 PM
to 黄一舟, canopy_phylogeny
Hi Peter, sorry that didn't work.

I don't think the error you mentioned is the issue. Canopy is downloaded independently of Bioconductor. Also, I just installed MARATHON and Canopy successfully using R 3.4.3 on our HPC as a test.

Could you please try the following code and send me the error message?

install.packages("devtools")
library(devtools)
devtools::install_github("yuchaojiang/Canopy/package")

Thanks,
Gene



achao...@gmail.com

Jun 19, 2018, 3:38:29 PM
to canopy_phylogeny
Hi Gene,
Thank you so much for your response. Your code works perfectly, and I can now see the canopy.sample.parallel() function. I'm not sure why the following didn't give me canopy.sample.parallel(), though:
install.packages(c("Canopy", "falcon", "falconx", "devtools"))
source("https://bioconductor.org/biocLite.R")
biocLite("WES.1KG.WUGSC")
devtools::install_github(c("yuchaojiang/CODEX/package", "yuchaojiang/CODEX2/package", "yuchaojiang/Canopy/package", "zhouzilu/iCNV", "yuchaojiang/MARATHON/package"))

Sorry for the late response. I tried to set up R 3.5.0 on the HPC and not only failed to do so but also disabled my previous version (R 3.4.4) :3
But everything is back on track now. I will let you know if more questions come up later.

Best,
Peter
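
A quick way to confirm which installation is active is to check whether the function exists in the installed package's namespace. The sketch below is only illustrative: has_function() is a hypothetical helper, not part of Canopy, and the only Canopy-specific name it uses is the function name mentioned in this thread.

```r
# Hypothetical helper: TRUE if package `pkg` is installed and defines `fn`.
has_function <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  exists(fn, envir = asNamespace(pkg), inherits = FALSE)
}

# Check for the function discussed in this thread.
if (has_function("Canopy", "canopy.sample.parallel")) {
  message("canopy.sample.parallel() is available in Canopy ",
          as.character(utils::packageVersion("Canopy")))
} else {
  message("canopy.sample.parallel() not found; the GitHub version may be needed.")
}
```

Running this after each installation route shows which one actually provides the function.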



achao...@gmail.com

Jun 28, 2018, 10:39:52 AM
to canopy_phylogeny
Hi Gene,
When I use canopy.sample.parallel on the HPC, I sometimes get this error: "Error in unserialize(node$con) : error reading from connection". Maybe the MCMC sampling process exceeded the memory of the node? Or is canopy.sample.parallel spawning too many nodes on the HPC?
I used options(mc.cores = 24) to limit the cores, but I still get the error sometimes.
The annoying thing is that it does not always happen, and when it does, the error shows up at the very end of the process:

Sample in tree space with 5 subclones

Sample in tree space with 6 subclones

Sample in tree space with 7 subclones

Error in unserialize(node$con) : error reading from connection


My data sets usually have two samples with 200-300 SNVs and 30-40 CNVs, and the parameters I used for canopy.sample are:
#K = 4:6; numchain = 20; epsilonM = epsilonm = 0.01 (the default); C = NULL; max.simrun = 100000; min.simrun = 10000; writeskip = 200.

I would appreciate any recommendations about my process.

Thanks!

Best,
Peter
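
One thing that can be checked independently of Canopy is how many workers the node can actually support. The sketch below is a rough illustration only: choose_workers() is a hypothetical helper, and the thread does not establish whether canopy.sample.parallel honours the mc.cores option, so treat that line as an assumption.

```r
# Hypothetical helper: choose a worker count no larger than the number of
# chains, the cores reported for the node, or a user-chosen cap.
choose_workers <- function(numchain, cap = 24L) {
  cores <- parallel::detectCores()
  if (is.na(cores)) cores <- 1L  # detectCores() can return NA
  max(1L, min(as.integer(numchain), as.integer(cores), as.integer(cap)))
}

n_workers <- choose_workers(numchain = 20)
# Same option as used above; whether the sampler reads it is an assumption.
options(mc.cores = n_workers)
n_workers
```

Note that on a scheduled cluster the cores granted to a job may be fewer than detectCores() reports, so the scheduler's allocation is the safer upper bound.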



Gene Urrutia

Jun 28, 2018, 11:17:55 AM
to 黄一舟, canopy_phylogeny
Hi Peter, 

I agree, this seems like a memory issue. Could you please try running with writeskip set higher, maybe 2000? This will reduce the amount of memory needed and should still be sufficient for downstream processing. Essentially, it will write every 2000th iteration instead of every 200th, so about 10x fewer samples are stored.

Thanks,
Gene
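
The arithmetic behind the 10x figure can be sketched with the parameters quoted earlier in this thread. This is an upper bound only, assuming every chain runs all the way to max.simrun and one tree is stored per writeskip iterations; stored_trees() is a hypothetical helper for illustration.

```r
# Parameters quoted earlier in this thread.
max.simrun <- 100000
numchain   <- 20
K          <- 4:6

# Upper bound on stored trees, assuming every chain runs to max.simrun and
# one tree is kept every `writeskip` iterations.
stored_trees <- function(writeskip) {
  per_chain <- max.simrun %/% writeskip
  c(per_chain = per_chain, total = per_chain * numchain * length(K))
}

stored_trees(200)   # per_chain 500, total 30000
stored_trees(2000)  # per_chain  50, total  3000
```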



achao...@gmail.com

Jun 28, 2018, 4:32:24 PM
to canopy_phylogeny
Thanks Gene!
I am still confused about a few things:
1) It's not clear to me how to select the burnin and thin parameters in canopy.post() and canopy.BIC(). Could you offer some hints?
2) I am thinking about using canopy.sample.cluster with SNVs pre-clustered by BIC to see whether it gives better results. But it seems that canopy.sample.cluster does not have a parallel version. Am I missing something?

3) I am confused about the meaning of the diagnostic plot on the right. I know the left plot is used to examine the stability of the chains. But what does the right plot mean? Why is only one chain shown on my plot?

Sorry for so many questions! I will keep exploring this amazing package.

Peter


Gene Urrutia

Jun 29, 2018, 7:50:39 AM
to 黄一舟, canopy_phylogeny

Hi Peter,

1) It's not clear to me how to select the burnin and thin parameters in canopy.post() and canopy.BIC(). Could you offer some hints?

-In the canopy.sample() step, results were written for every [writeskip]th iteration. canopy.BIC() and canopy.post() then thin those results further: the first [burnin] iterations are removed and, of the remainder, every [thin]th iteration is kept. So writeskip and thin essentially do the same thing, and their effects multiply. The only advice is not to overthin; otherwise the functions will not run.


2) I am thinking about using canopy.sample.cluster with SNVs pre-clustered by BIC to see whether it gives better results. But it seems that canopy.sample.cluster does not have a parallel version. Am I missing something?

-There is no parallel version.


3) I am confused about the meaning of the diagnostic plot on the right. I know the left plot is used to examine the stability of the chains. But what does the right plot mean? Why is only one chain shown on my plot?

-The right plot is a zoomed view of the left. All chains appear stable on the left because the scale is so large, but the zoomed image on the right gives more detail and makes stability easier to judge. Make sure writeskip here matches your setting in canopy.sample(). You can change the yRange setting to expand the view so that more chains are visible. In your case, only chain 3 reached the highest likelihood among all chains. We also see that chain 3 has not increased over the past 20K iterations, which indicates stability.

Thanks,
Gene
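
The interaction between writeskip, burnin, and thin described in answer 1 can be sketched in base R without calling Canopy. The burnin and thin values below are illustrative only, and the sketch assumes burnin is counted in stored samples rather than raw MCMC iterations, which this thread does not spell out.

```r
writeskip  <- 200     # from the canopy.sample settings quoted in this thread
max.simrun <- 100000
burnin     <- 100     # illustrative value, counted in stored samples
thin       <- 5       # illustrative value

# Iteration numbers at which a sample was written.
stored <- seq(writeskip, max.simrun, by = writeskip)

# Drop the first `burnin` stored samples, then keep every `thin`-th one.
post_burn <- stored[-seq_len(burnin)]
kept <- post_burn[seq(thin, length(post_burn), by = thin)]

length(stored)        # 500
length(kept)          # 80
unique(diff(kept))    # 1000 = writeskip * thin
```

With writeskip raised to 2000, the same burnin and thin would leave nothing to keep, which is the overthinning case to avoid.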


achao...@gmail.com

Jul 25, 2018, 2:54:26 PM
to canopy_phylogeny
Hi Gene,
Thanks for the previous help.
I have another question, about new somatic calls that emerged in the relapsed tumor sample but were not called in the primary tumor. To include a SNP that is not shared by both tumor samples, I assume its VAF in the primary tumor is 0: I assign 0 to the R matrix entry (the alt-allele read depth in the primary tumor) and 10 to the X matrix entry (the total read depth). I tried different total read depths, such as 1, 10, 20, and 25, and found that even though the VAF is always 0, the resulting clone pattern differs. Both the VAF and CCF results in the outputTree change when the total depth changes.

I am not sure whether my strategy is OK. Do you have any recommendations for this situation?
Thanks!

Best,
Peter 


Gene Urrutia

Jul 26, 2018, 8:44:17 AM
to 黄一舟, canopy_phylogeny
Hi Peter,

The total read depth should be calculated from the sequencing alignments. We recommend generating the VCF file by calling SNPs across all samples simultaneously. The VCF file should then have the same entries for all samples.

Thanks,
Gene
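
A small worked example may show why a made-up total depth changes the results even when the alt count is always 0. It assumes a simple binomial model for read counts, which is an illustrative assumption here and not a statement of Canopy's exact likelihood; the VAF of 0.1 is hypothetical.

```r
# Probability of observing 0 alternate reads out of `depth` total reads
# if the true VAF were `vaf` (binomial model; illustrative assumption).
depth <- c(1, 10, 20, 25)   # total depths tried earlier in this thread
vaf   <- 0.1                # hypothetical true VAF, for illustration only

p_zero <- dbinom(0, size = depth, prob = vaf)
round(setNames(p_zero, depth), 3)
#     1    10    20    25
# 0.900 0.349 0.122 0.072
```

Under this model, 0 of 1 reads is weak evidence that the mutation is absent, while 0 of 25 reads is much stronger evidence, so the chosen depth carries information. That is consistent with the advice to take the depth from the alignments.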

