Re: multiple overlap regions in canopy


Jiang, Yuchao

Nov 15, 2016, 10:45:07 PM
to 黎航, canopy_phylogeny
Hi Dr Li,

I am not sure what you mean by "multiple overlap regions." Do you mean multiple overlapping CNA events? Is each line below a CNA event that you found?

I need more clarification from you to fully answer your question, but I think the bottom of the following page addresses it. Please also read through the other common questions there ;-)
https://github.com/yuchaojiang/Canopy

Please let me know if you have any questions.

Best,
Yuchao


On Nov 15, 2016, at 10:07 PM, 黎航 <lih...@webmail.hzau.edu.cn> wrote:

Dear Dr. Jiang,
I'm doing clonal evolution research using Canopy on whole-genome sequencing data. I have a problem when there are multiple overlapping regions on one chromosome, for example:
1    chr1:1-1000
2    chr1:500-1500
3    chr1:2000-3000
4    chr1:2500-3500
How should I specify matrix C? As

       1  2  3  4
chr1   1  1  1  1

or

       1  2  3  4
chr1   1  1  2  2

Best Regards
Thanks

Hang Li




 

Jiang, Yuchao

Nov 15, 2016, 11:06:49 PM
to 黎航, canopy_p...@googlegroups.com

Okay, you can refer to the bottom of our GitHub page. Each column of C is a CNA event, so in your case the columns are CNA1, CNA2, CNA3 and CNA4. Each row of C is a CNA region, and for the rows you can use either of the two versions shown below.

 

A simple version, where you focus only on the overlapped regions:

                    CNA1  CNA2  CNA3  CNA4
chr1:500-1000          1     1     0     0
chr1:2500-3000         0     0     1     1
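The simple version can be derived mechanically from the interval calls. The sketch below is in Python for readability (Canopy itself is an R package) and is not part of Canopy. The input format is an assumption: a list of hypothetical `(name, chrom, start, end)` tuples with 1-based inclusive coordinates. The `simple_c` name is also illustrative. Each event is mapped to its intersection with every event it overlaps, so each column of C ends up with a single 1.

```python
# Sketch only: builds the "simple" C (overlapped regions only) from interval calls.
# Hypothetical input: (name, chrom, start, end) with 1-based inclusive coordinates.
Event = tuple[str, str, int, int]


def simple_c(events: list[Event]) -> dict[str, dict[str, int]]:
    """Return {region: {event: 0/1}} with exactly one 1 per event (column)."""
    region_of: dict[str, str] = {}
    for name, chrom, start, end in events:
        lo, hi = start, end
        for other, o_chrom, o_start, o_end in events:
            # Two inclusive intervals overlap when each starts before the other ends.
            if other != name and o_chrom == chrom and o_start <= end and start <= o_end:
                lo, hi = max(lo, o_start), min(hi, o_end)
        if lo > hi:
            raise ValueError(f"{name}: its overlaps share no common region")
        region_of[name] = f"{chrom}:{lo}-{hi}"
    rows = list(dict.fromkeys(region_of.values()))  # unique regions, first-seen order
    return {row: {name: int(region_of[name] == row) for name in region_of} for row in rows}


calls = [("CNA1", "chr1", 1, 1000), ("CNA2", "chr1", 500, 1500),
         ("CNA3", "chr1", 2000, 3000), ("CNA4", "chr1", 2500, 3500)]
for region, row in simple_c(calls).items():
    print(region, *row.values())
# chr1:500-1000 1 1 0 0
# chr1:2500-3000 0 0 1 1
```

This handles pairwise overlaps like the ones in this example. If an event's overlaps share no common stretch (for example, A overlaps B and B overlaps C, but A and C are disjoint), the sketch raises an error rather than guessing.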

 

Or, more completely:

                    CNA1  CNA2  CNA3  CNA4
chr1:1-499             1     0     0     0
chr1:500-1000          1     1     0     0
chr1:1001-1500         0     1     0     0
chr1:2000-2499         0     0     1     0
chr1:2500-3000         0     0     1     1
chr1:3001-3500         0     0     0     1
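Building the complete version comes down to three steps. First, cut each chromosome at every event start and just after every event end. Then record which events cover each resulting segment. Finally, drop gaps that no event covers, such as chr1:1501-1999 here. The Python sketch below does this. It assumes the same hypothetical `(name, chrom, start, end)` input, and `complete_c` is an illustrative name, not Canopy code.

```python
# Sketch only: builds the "complete" C by cutting each chromosome at every breakpoint.
# Hypothetical input: (name, chrom, start, end) with 1-based inclusive coordinates.
Event = tuple[str, str, int, int]


def complete_c(events: list[Event]) -> dict[str, dict[str, int]]:
    """Return {segment: {event: 0/1}}; an event spanning k segments gets k ones."""
    matrix: dict[str, dict[str, int]] = {}
    for chrom in dict.fromkeys(ev[1] for ev in events):  # chromosomes in input order
        on_chrom = [ev for ev in events if ev[1] == chrom]
        # Segments begin at each event start and right after each event end.
        cuts = sorted({ev[2] for ev in on_chrom} | {ev[3] + 1 for ev in on_chrom})
        for seg_start, next_cut in zip(cuts, cuts[1:]):
            seg_end = next_cut - 1
            row = {name: int(ch == chrom and start <= seg_start and seg_end <= end)
                   for name, ch, start, end in events}
            if any(row.values()):  # skip gaps that no event covers
                matrix[f"{chrom}:{seg_start}-{seg_end}"] = row
    return matrix


calls = [("CNA1", "chr1", 1, 1000), ("CNA2", "chr1", 500, 1500),
         ("CNA3", "chr1", 2000, 3000), ("CNA4", "chr1", 2500, 3500)]
for region, row in complete_c(calls).items():
    print(region, *row.values())
# chr1:1-499 1 0 0 0
# chr1:500-1000 1 1 0 0
# chr1:1001-1500 0 1 0 0
# chr1:2000-2499 0 0 1 0
# chr1:2500-3000 0 0 1 1
# chr1:3001-3500 0 0 0 1
```

Because every start and every end + 1 is a cut point, each event either covers a whole segment or none of it. The containment test is therefore exact.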

 

 

For these two different inputs, your WM and Wm will have correspondingly different dimensions, since they need one row per CNA region (row of C). Make sure the dimensions agree before you proceed.
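A small consistency check along these lines can catch a mismatch before sampling. This is a Python sketch, not Canopy's own validation. It assumes that C, WM and Wm are held as dicts keyed by CNA region, with WM/Wm mapping each region to one value per sample. The `check_dims` helper is hypothetical, and the copy numbers below are placeholders, not real data.

```python
# Sketch only: C, WM and Wm as dicts keyed by CNA region (an assumed representation).
Matrix = dict[str, dict[str, int]]  # region -> {event: 0/1}
Copies = dict[str, list[float]]     # region -> one copy number per sample


def check_dims(C: Matrix, WM: Copies, Wm: Copies) -> None:
    """Raise ValueError unless WM and Wm have exactly the rows of C, in the same
    order, and every row has the same number of samples."""
    regions = list(C)
    for label, copies in (("WM", WM), ("Wm", Wm)):
        if list(copies) != regions:
            raise ValueError(f"{label} rows {list(copies)} != C rows {regions}")
    n_samples = {len(values) for values in (*WM.values(), *Wm.values())}
    if len(n_samples) > 1:
        raise ValueError(f"rows disagree on the number of samples: {sorted(n_samples)}")


C = {"chr1:500-1000": {"CNA1": 1, "CNA2": 1, "CNA3": 0, "CNA4": 0},
     "chr1:2500-3000": {"CNA1": 0, "CNA2": 0, "CNA3": 1, "CNA4": 1}}
WM = {"chr1:500-1000": [2.0, 2.0], "chr1:2500-3000": [2.0, 3.0]}  # placeholder values
Wm = {"chr1:500-1000": [0.0, 1.0], "chr1:2500-3000": [1.0, 1.0]}  # placeholder values
check_dims(C, WM, Wm)  # passes: the same two regions in C, WM and Wm
```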

 

Cheers,

Yuchao

 

 

From: 黎航 [mailto:lih...@webmail.hzau.edu.cn]
Sent: Tuesday, November 15, 2016 10:46 PM
To: Jiang, Yuchao <yuc...@wharton.upenn.edu>
Subject: Re:Re: multiple overlap regions in canopy

 

Yes, I mean that I found 4 CNAs on chr1: CNA1 overlaps with CNA2, and CNA3 overlaps with CNA4. How can I represent that in matrix C in Canopy?
Thanks



soroosh....@gmail.com

Nov 19, 2016, 11:43:56 AM
to canopy_phylogeny, lih...@webmail.hzau.edu.cn, yuc...@wharton.upenn.edu
Hi Yuchao,

From what I understand (and from the code), matrix C should have only one "1" in each column, which is intuitively reasonable. However, the second table has more than one. Am I missing something?

On a different note, I've been experimenting with Canopy and so far the results haven't been what I expected. For example, Canopy places BRAF in subclones of very low cellular prevalence in BRAF-mutant melanoma samples of very high purity. I think it has to do with the way I create the C matrix, as I am still not 100% clear about the distinction between "CNA region" and "CNA event" in matrix C. It looks like, if C is left NULL, Canopy creates a square matrix (second version above) with one row and one column per non-overlapping region (colnames(C) = rownames(C) = rownames(WM)). So is it correct to say that the set of CNA regions is a superset (union) of all CNV events?

Also, how do you think the choice between the two versions of the C matrix above affects the final results? Is the second more accurate, and possibly more computationally intensive, or not necessarily? Thank you very much.

Thank you,

Soroush

Yuchao Jiang

Nov 21, 2016, 12:59:16 PM
to soroosh....@gmail.com, canopy_phylogeny, lih...@webmail.hzau.edu.cn, Jiang, Yuchao
Hi Soroush,

The CNVs in the example overlap, which is why some columns of the C matrix contain more than one 1. I would suggest using the first table, which is simpler and has only a single 1 in each column of C. With the first table, we essentially use only the information from the regions where two CNVs overlap.
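The rule discussed here, that each column of C should hold a single 1, is easy to check before running Canopy. The helpers below are hypothetical Python, not part of Canopy. They are applied to the two tables for the chr1 example, with each table written as region -> list of 0/1 values in the order CNA1 to CNA4.

```python
# Sketch only (not Canopy code): find events that appear in more than one region.
Matrix = dict[str, dict[str, int]]  # region -> {event: 0/1}
EVENTS = ["CNA1", "CNA2", "CNA3", "CNA4"]
SIMPLE = {"chr1:500-1000": [1, 1, 0, 0], "chr1:2500-3000": [0, 0, 1, 1]}
COMPLETE = {"chr1:1-499": [1, 0, 0, 0], "chr1:500-1000": [1, 1, 0, 0],
            "chr1:1001-1500": [0, 1, 0, 0], "chr1:2000-2499": [0, 0, 1, 0],
            "chr1:2500-3000": [0, 0, 1, 1], "chr1:3001-3500": [0, 0, 0, 1]}


def as_matrix(table: dict[str, list[int]]) -> Matrix:
    """Attach event names to each row of 0/1 values."""
    return {region: dict(zip(EVENTS, values)) for region, values in table.items()}


def multi_region_events(C: Matrix) -> list[str]:
    """Columns of C holding more than one 1, i.e. events split across regions."""
    events = list(next(iter(C.values())))
    return [e for e in events if sum(row[e] for row in C.values()) > 1]


print(multi_region_events(as_matrix(SIMPLE)))    # []
print(multi_region_events(as_matrix(COMPLETE)))  # ['CNA1', 'CNA2', 'CNA3', 'CNA4']
```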

For your question on CNV events versus CNV regions, please refer to the bottom of our GitHub page https://github.com/yuchaojiang/Canopy for clarification of the C matrix. If you have not observed overlapping CNAs, just leave C=NULL. If you do have overlapping CNVs and still need help, please reply with your CNA and SNA calls and I can look into it. The easiest way is to run save.image(file='debug.rda') and send me the .rda file. It wouldn't make sense for BRAF to be placed in a subclone with very small cellular frequency, and yes, this is very likely due to a misspecification of the input.
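When C is left NULL, Soroush's reading is that Canopy builds a square matrix over the CNA regions (colnames(C) = rownames(C) = rownames(WM)). The Python sketch below illustrates that reading. It assumes the default is an identity matrix, so each region is its own event; this is not confirmed in this thread, and `default_c` is a hypothetical helper, not Canopy's R code. Under that assumption, regions and events coincide whenever no CNAs overlap, which is why leaving C=NULL is fine in that case.

```python
# Sketch only: assumes the C = NULL default is an identity matrix whose row and
# column names are the CNA regions (the rownames of WM).
def default_c(wm_regions: list[str]) -> dict[str, dict[str, int]]:
    """Each CNA region is its own CNA event: 1 on the diagonal, 0 elsewhere."""
    return {row: {col: int(row == col) for col in wm_regions} for row in wm_regions}


for region, row in default_c(["chr1:500-1000", "chr1:2500-3000"]).items():
    print(region, *row.values())
# chr1:500-1000 1 0
# chr1:2500-3000 0 1
```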

Cheers,
Yuchao



--
Yuchao Jiang (yuc...@mail.med.upenn.edu)
PhD student in Genomics and Computational Biology
University of Pennsylvania

--
You received this message because you are subscribed to the Google Groups "canopy_phylogeny" group.
To unsubscribe from this group and stop receiving emails from it, send an email to canopy_phylogeny+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/canopy_phylogeny/d80e1452-e215-4218-8256-cfc76e281b43%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Lorraine Soudade

Apr 19, 2018, 7:59:27 AM
to canopy_phylogeny
Hello Yuchao,

I have generated my C matrix to match my data exactly, following your example matrix earlier in this thread. However, Canopy does not accept a C matrix with a column containing more than one "1". Here are my data (C and Wm):

C:

                           CNA_1  CNA_2  CNA_3  CNA_4  CNA_5  CNA_6  CNA_7  CNA_8  CNA_9 CNA_10 CNA_11 CNA_12 CNA_13
chr1:4423527-18001663          1      0      0      0      0      0      0      0      0      0      0      0      0
chr1:68137446-115780946        0      1      0      0      0      0      0      0      0      0      0      0      0
chr2:195960514-228028019       0      0      1      0      0      0      0      0      0      0      0      0      0
chr3:29375111-56947844         0      0      0      1      0      0      0      0      0      0      0      0      0
chr3:56947844-58670824         0      0      0      1      1      0      0      0      0      0      0      0      0
chr3:58670824-69058142         0      0      0      1      0      0      0      0      0      0      0      0      0
chr4:150660451-181286732       0      0      0      0      0      1      0      0      0      0      0      0      0
chr6:137674-170736909          0      0      0      0      0      0      1      0      0      0      0      0      0
chr9:27075398-38964558         0      0      0      0      0      0      0      1      0      0      0      0      0
chr12:55657-24971442           0      0      0      0      0      0      0      0      1      0      0      0      0
chr14:16097157-106882543       0      0      0      0      0      0      0      0      0      1      0      0      0
chr23:251157-909319            0      0      0      0      0      0      0      0      0      0      1      1      0
chr23:909319-156025116         0      0      0      0      0      0      0      0      0      0      1      0      1

Wm:

                           B00JAJB  B00JAJC
chr1:4423527-18001663            1    0.658
chr1:68137446-115780946          1    0.634
chr2:195960514-228028019         1    0.669
chr3:29375111-56947844           1    0.681
chr3:56947844-58670824       0.694    0.681
chr3:58670824-69058142           1    0.681
chr4:150660451-181286732         1    0.694
chr6:137674-170736909        0.843        1
chr9:27075398-38964558           1    0.683
chr12:55657-24971442             1    0.698
chr14:16097157-106882543     0.846        1
chr23:251157-909319          0.755    0.722
chr23:909319-156025116       0.755    0.772

How would you format these inputs so that each column has only one "1" without losing information (so that Canopy runs without errors)?
Thank you,

Gene Urrutia

Apr 25, 2018, 11:04:06 AM
to Lorraine Soudade, canopy_phylogeny
Hi Lorraine,

Thanks again for your interest in Canopy and the Marathon pipeline.

Yes, you are correct: canopy.sample will not run when a column of C has more than one 1. This can happen when one CNV event partially overlaps another CNV.

Some possible approaches would be to:

1) Simplify the regions so that each event has only one region. Please see the example Yuchao presented earlier in this thread, where the simple version focuses on the overlapped regions.
2) Further QC the CNV events. For example, you could check that the chr23 events are not due to a sex difference between the samples (in males, the single X chromosome can be called as a deletion). IGV is a good tool for visualization.
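Approach 1 can be applied to the C matrix above by following Yuchao's simple version. For each event that spans several regions, keep only the region it shares with another event, then drop rows that end up empty. This is a hypothetical Python sketch (the `simplify` helper is not part of Canopy). It deliberately leaves ambiguous cases for manual review rather than guessing:

```python
# Sketch only (not part of Canopy): one way to apply "approach 1" to a C matrix.
Matrix = dict[str, dict[str, int]]  # region -> {event: 0/1}


def simplify(C: Matrix) -> tuple[Matrix, list[str]]:
    """For each event spread over several regions, keep only the region it shares
    with other events; drop rows left empty. Return (new C, unresolved events)."""
    C = {region: dict(row) for region, row in C.items()}  # copy; don't mutate input
    unresolved = []
    for event in list(next(iter(C.values()))):
        rows = [r for r in C if C[r][event]]
        if len(rows) < 2:
            continue
        shared = [r for r in rows if sum(C[r].values()) > 1]
        if len(shared) == 1:
            for r in rows:
                C[r][event] = int(r == shared[0])
        else:
            unresolved.append(event)  # ambiguous: leave for manual review / QC
    return {r: row for r, row in C.items() if any(row.values())}, unresolved


# Lorraine's C from this thread, written as region -> events with a 1.
HITS = {
    "chr1:4423527-18001663": ["CNA_1"],
    "chr1:68137446-115780946": ["CNA_2"],
    "chr2:195960514-228028019": ["CNA_3"],
    "chr3:29375111-56947844": ["CNA_4"],
    "chr3:56947844-58670824": ["CNA_4", "CNA_5"],
    "chr3:58670824-69058142": ["CNA_4"],
    "chr4:150660451-181286732": ["CNA_6"],
    "chr6:137674-170736909": ["CNA_7"],
    "chr9:27075398-38964558": ["CNA_8"],
    "chr12:55657-24971442": ["CNA_9"],
    "chr14:16097157-106882543": ["CNA_10"],
    "chr23:251157-909319": ["CNA_11", "CNA_12"],
    "chr23:909319-156025116": ["CNA_11", "CNA_13"],
}
NAMES = [f"CNA_{i}" for i in range(1, 14)]
C = {region: {n: int(n in hits) for n in NAMES} for region, hits in HITS.items()}

simple, unresolved = simplify(C)
print(len(C), len(simple), unresolved)  # 13 11 ['CNA_11']
```

On this C, CNA_4 is reduced to chr3:56947844-58670824, the region it shares with CNA_5, and the other two chr3 rows are dropped. CNA_11 is reported because it shares a region with both CNA_12 and CNA_13. That is the chr23 case where the QC in approach 2 applies. Any rows dropped from C must also be dropped from WM and Wm.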

Thanks, and please let us know if you have any further questions.

Best,
Gene

