Regarding the copy number states and further processing


Sam Padmanabhuni

Jan 19, 2015, 8:42:59 AM
to aroma-af...@googlegroups.com
Dear AromaAffymetrix Team,

First of all, thank you very much for such a detailed vignette on how to perform the CNV analysis. 

I am Sam, a PhD student in genetics, working on CNV analysis of data from the CytoScan HD array. I have read the vignettes on CRMAv2 and non-paired CBS, and I copied the commands and ran them in R.

However, I have a few questions regarding CbsModel and GladModel as segmentation algorithms:

1. It is mentioned that copy number states are not calculated by CbsModel segmentation. How do I find out from the CbsModel output whether a segment is a loss or a gain? In other words, can this output be passed to other algorithms to estimate the copy number state?

2. I have looked into the GLAD model, and it is mentioned that it was developed for aCGH, but my data is not from aCGH. Can it still be used to calculate copy number states for the data I am working on?

3. Also, do you have a vignette on how to run CRMAv2 and CBS on the CytoScan HD array? This would be really helpful.

Thank you,

Best,
Sam.


Chengyu Liu

Jan 20, 2015, 4:38:27 AM
to aroma-af...@googlegroups.com
Hi,

On Monday, January 19, 2015 at 3:42:59 PM UTC+2, Sam Padmanabhuni wrote:
> 1. It is mentioned that copy number states are not calculated by CbsModel segmentation. How do I find out from the CbsModel output whether a segment is a loss or a gain? In other words, can this output be passed to other algorithms to estimate the copy number state?

As far as I know, the output of CBS is the relative copy number. It does not directly tell you the copy number states.

> 2. I have looked into the GLAD model, and it is mentioned that it was developed for aCGH, but my data is not from aCGH. Can it still be used to calculate copy number states for the data I am working on?

GLAD can calculate copy number states for Affymetrix arrays, although I have not used it before.

> 3. Also, do you have a vignette on how to run CRMAv2 and CBS on the CytoScan HD array? This would be really helpful.

It is the same as for other chip types: prepare the input as required (there is a vignette).


BTW, I am also working on CytoScan HD. What kind of analysis are you going to do? Do you have paired or non-paired samples? Maybe we have something in common and can discuss.

Br,
C.Y





Sam Padmanabhuni

Jan 20, 2015, 10:20:35 AM
to aroma-af...@googlegroups.com
Hi,

Thanks for the clarification.

I am working on finding segments of duplication/deletion that are present only in patients and not in controls. My samples are non-paired.

From my literature search, it seems best to call CNVs with several different software tools to obtain a comprehensive list before doing association analysis. For this reason, I need to know whether each segment is a gain or a loss of DNA.

When I tried GLAD on just 3 samples, it took more than 30 minutes to finish.

I don't know how to incorporate the segments from CBS into my analysis. Please let me know if you have any ideas on how to solve this.

Thanks,

Best Regards,
Sam.

Chengyu Liu

Jan 20, 2015, 11:01:46 AM
to aroma-af...@googlegroups.com
Hi Sam,

I am doing similar work to yours. I also need to identify regions which are amplified or deleted. I have paired samples.
There are quite a few different ways to define gain and loss of a segment. It is a tricky question.

> From my literature search, it seems best to call CNVs with several different software tools to obtain a comprehensive list before doing association analysis. For this reason, I need to know whether each segment is a gain or a loss of DNA.

I did not get your point.

> When I tried GLAD on just 3 samples, it took more than 30 minutes to finish.

My experience is that CBS is faster than GLAD. When I ran GLAD with 4 samples, it took like two or more to finish them.

> I don't know how to incorporate the segments from CBS into my analysis. Please let me know if you have any ideas on how to solve this.

You can replace the GLAD model with the CBS model (cns <- CbsModel(dsT, dsN), where dsN is the average of all the controls).
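
To illustrate what using the average of all controls as the reference means, here is a minimal base R sketch. It does not call aroma.affymetrix at all; the signal values and the names `controls`, `patient` and `reference` are made up for illustration, and a plain per-locus mean is assumed for the averaging (a median would be a more outlier-resistant choice).

```r
# Toy signals (rows = loci, columns = control samples); values are invented.
controls <- cbind(
  ctrl1 = c(1000, 1100,  900, 1050),
  ctrl2 = c( 950, 1000, 1000,  950),
  ctrl3 = c(1050,  900, 1100, 1000)
)
# Toy signals for one patient sample at the same four loci.
patient <- c(1010, 500, 1480, 990)

# Common reference: per-locus average across all controls.
reference <- rowMeans(controls)

# Relative copy number on the log2 scale; 0 means "same as the reference".
log2_ratio <- log2(patient / reference)

# Relative copy number on the natural scale, assuming the reference is diploid.
relative_cn <- 2 * patient / reference

print(round(cbind(reference, log2_ratio, relative_cn), 2))
```

The point of the sketch is that the result is only relative to the reference: a log2 ratio near 0 means "same as the average control", not necessarily two copies.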


Do you need to identify copy number alterations (CNA), or just copy number variants (CNV)? I need to identify CNAs, not CNVs. For now I do not know how. Do you also know how to map amplified or deleted regions to genes? If you know something about it, I would be happy to hear.

Br,
C.Y

Sam Padmanabhuni

Jan 20, 2015, 11:52:42 AM
to aroma-af...@googlegroups.com
Hi Liu,

It is good to know that someone is doing similar work to mine.

I was going through 2-3 papers which argue that, to get a comprehensive list of CNVs, it is better to keep only CNVs that are called by two or more CNV calling algorithms. I have seen this in some other recent papers too. Please let me know if you want links to the papers I am talking about. I do not have them at hand right now, but I can email you the links.
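
As a rough illustration of the "called by two or more algorithms" idea, the base R sketch below intersects two toy call sets. Everything in it is an assumption made for illustration: the column names, the coordinates, the function name `consensus_calls`, and the rule itself (same chromosome, same type, at least one base of overlap). The papers mentioned above may use a different criterion, such as a minimum reciprocal overlap.

```r
# Two toy call sets from two hypothetical CNV callers.
calls_a <- data.frame(
  chromosome = c(1, 1, 3),
  start = c(100, 5000, 200),
  end   = c(900, 6000, 800),
  type  = c("loss", "gain", "gain"),
  stringsAsFactors = FALSE
)
calls_b <- data.frame(
  chromosome = c(1, 3, 7),
  start = c(400, 900, 100),
  end   = c(1200, 1500, 300),
  type  = c("loss", "gain", "loss"),
  stringsAsFactors = FALSE
)

consensus_calls <- function(a, b) {
  # Pair every call in `a` with every call in `b` that is on the same
  # chromosome and has the same type.
  pairs <- merge(a, b, by = c("chromosome", "type"), suffixes = c(".a", ".b"))
  # Keep only pairs whose intervals overlap by at least one base.
  hit <- pairs$start.a <= pairs$end.b & pairs$start.b <= pairs$end.a
  pairs <- pairs[hit, , drop = FALSE]
  # Report the shared (intersected) region of each overlapping pair.
  data.frame(
    chromosome = pairs$chromosome,
    type  = pairs$type,
    start = pmax(pairs$start.a, pairs$start.b),
    end   = pmin(pairs$end.a, pairs$end.b),
    stringsAsFactors = FALSE
  )
}

consensus <- consensus_calls(calls_a, calls_b)
print(consensus)
```

Note that this only works once each caller reports a gain or loss label per segment, which is why the segment state is needed before the call sets can be combined.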



On Tuesday, January 20, 2015 at 5:01:46 PM UTC+1, Chengyu Liu wrote:
> You can replace the GLAD model with the CBS model (cns <- CbsModel(dsT, dsN), where dsN is the average of all the controls).

I was actually thinking about this. Wow, this solves my problem. Thanks a lot, mate, for this information.

> Do you need to identify copy number alterations (CNA), or just copy number variants (CNV)? I need to identify CNAs, not CNVs. For now I do not know how. Do you also know how to map amplified or deleted regions to genes? If you know something about it, I would be happy to hear.

I am lost here. Is there a difference between CNA and CNV?




Best Regards,
Sam.  

Chengyu Liu

Jan 21, 2015, 10:21:04 AM
to aroma-af...@googlegroups.com
Hi, Sam,

No thanks, I don't need the reference papers. 


On Tuesday, January 20, 2015 at 6:52:42 PM UTC+2, Sam Padmanabhuni wrote:
> I was actually thinking about this. Wow, this solves my problem. Thanks a lot, mate, for this information.

Excellent~!

> I am lost here. Is there a difference between CNA and CNV?

I am sure they are different. CNA refers to somatic copy number variants, and CNV refers to germline copy number variants. Once you have reference samples, the results you get are CNAs.

Sam Padmanabhuni

Jan 21, 2015, 11:05:35 AM
to aroma-af...@googlegroups.com
Hi Liu,



On Wednesday, January 21, 2015 at 4:21:04 PM UTC+1, Chengyu Liu wrote:
> You can replace the GLAD model with the CBS model (cns <- CbsModel(dsT, dsN), where dsN is the average of all the controls).

I have tried this and it works well, but in the end I need to know whether there is a gain or a loss at each segment. I will use the GLAD model to get gain or loss at a segment. My samples and controls are completely unrelated, so I am a little doubtful whether I am doing this right. I also found some other algorithms that can work on the segments produced by the CBS model; I am still looking into them.

> CNA refers to somatic copy number variants, and CNV refers to germline copy number variants. Once you have reference samples, the results you get are CNAs.

Then I am also looking for CNA. What other software have you tried on data from the CytoScan HD array?
Best,
Sam. 

Chengyu Liu

Jan 22, 2015, 3:42:37 AM
to aroma-af...@googlegroups.com
Hi,
 
> I have tried this and it works well, but in the end I need to know whether there is a gain or a loss at each segment. I will use the GLAD model to get gain or loss at a segment. My samples and controls are completely unrelated, so I am a little doubtful whether I am doing this right. I also found some other algorithms that can work on the segments produced by the CBS model; I am still looking into them.

I think you can use GLAD to call gain and loss. CBS, however, does not return gain or loss, only segments. If you use CBS, you have to call gain or loss yourself (or use other tools such as GISTIC).

> Then I am also looking for CNA. What other software have you tried on data from the CytoScan HD array?

Like you, I used aroma to preprocess, segmented using CBS, and called gain or loss manually. The simplest way is to use a threshold to define gain or loss. If I remember correctly, one of the TCGA papers in Nature used a fixed threshold to define gain and loss. Maybe you can check that.
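
The fixed-threshold approach can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the function name `call_segments`, the toy segment table and its column names are hypothetical, and the cutoffs of +0.3 and -0.3 on the log2-ratio scale are placeholders, not the values from any particular paper.

```r
# Label each segment by comparing its mean log2 ratio with fixed cutoffs.
# The default cutoffs are illustrative placeholders only.
call_segments <- function(seg_mean, gain = 0.3, loss = -0.3) {
  ifelse(seg_mean >= gain, "gain",
         ifelse(seg_mean <= loss, "loss", "neutral"))
}

# Toy segment table, as one might assemble from a segmentation result.
segments <- data.frame(
  chromosome = c(1, 1, 8, 17),
  start = c(1, 2500001, 1, 1),
  end   = c(2500000, 9000000, 4000000, 7500000),
  mean  = c(0.02, -0.45, 0.38, -0.10)
)

segments$call <- call_segments(segments$mean)
print(segments)
```

Segments with a missing mean come back as `NA` rather than being forced into a class. In practice the cutoffs would need tuning to the noise level of the array and to the expected tumor content of the samples.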

Br,

Henrik Bengtsson

Jan 22, 2015, 2:36:59 PM
to aroma-affymetrix
Hi guys,

here is some late feedback on this discussion:

* When talking about copy numbers, it is important to always be very
clear and distinguish between whether we talk about normal/germline
CNs or tumor CNs. The former take integer CN levels (0, 1, 2, 3,
...), whereas for tumors we very rarely observe pure homogeneous tumor
cells, which is why we only measure and observe non-integer CN levels.
Hopefully, we observe at least discrete CN levels in tumors, but one
should never expect integer levels.

* aCGH: a historical term often used as a synonym for total copy
numbers. For example, some say "aCGH analysis" when they really mean
"total copy-number analysis". aCGH stands for array-CGH, or in full
'array comparative genomic hybridization'. This refers to the older
generation two-color/two-channel arrays where a test and a reference
sample were labelled with two different dyes and "competitively"
hybridized to the same array and the same probes. I recommend to stop
using this term and instead use "total copy number", total CN, or
"TCN" (when it's clear). By being explicit about "total", you're
also explicitly contrasting it to "parent-specific" CNs (which you can
do if you have SNP data).

* CNA: Copy-Number Aberration. This term can be applied to both tumor
and germline samples. In tumors you expect non-integer CN levels. In
germline/normals you expect integer CN levels (0, 1, 2, 3, ...).

* CNP: Copy-Number Polymorphism. This term applies to copy-number
differences in relationship to a population. This also implies we're
talking about germline genomes. In other words, CNPs are also integer
CN levels (0, 1, 2, 3, ...). CNPs are used to specify, say, "2% of
the Europeans have a 1 copy deletion of length 1.0-1.5 Mb on Chr 3 at
124.5Mb". CNPs are for segment deletions and gains what SNPs are for
nucleotide polymorphisms. The term CNP is rare. It is much more
common to hear/see "CNV".

* CNV: Copy-Number Variation. Ideally the word "variation" refers to
"polymorphism" and therefore the term CNV should be used only to refer
to CNPs. I don't know if there is a formal definition, but I find it
unfortunate to see CNV being used when CNA should be used. By my
books, CNV only takes integer CN levels (0, 1, 2, 3, ...). The term
CNV should never be used to refer to CN levels in tumors.

* Calling total CN levels is very hard in tumors, and as the first
above point alludes to, it may not even be a well defined problem.
For instance, imagine you have a tumor sample with 5% tumor cells and
95% normal cells, and that those tumor cells all have a deletion
on Chr 2. Then, at what point do you consider that sample itself to
have a deletion on Chr 2? Are you after the sample/tissue itself, or
are you after those 5% tumor cells? What if you have a heterogeneous
mix of tumor cells? The more precisely you can specify your question,
the easier it is for you to decide what approach forward (may)
work and what doesn't work. Here "work" can also be read as "make
sense".

* The first and most important task for almost all segmentation
methods is to *segment* the genome, that is, identify at what genomic
locations the observed DNA (tumor, normal or a mix) changes in CN
level. Together, these locations, aka "change points", define how the
genome can be "partitioned" into segments with equal CN levels, such
that when we look at a particular segment, we can assume that all
genomic locations within that segment have the same underlying genomic
composition (e.g. gain, loss, loss in 5% of the cells, etc.). CBS,
GLAD, and many other methods, segment the genome this way as a first
step.

* A common task after having decided on the segments (partitioning of
the genome), is to decide on what is going on within each segment.
Not all methods do this. For instance, CBS "only" provides you with
the change points. GLAD on the other hand does both the segmentation
and then also provides a method for calling. Theoretically, there is
nothing preventing you from using the GLAD *calling* algorithm using
the segmentation found by CBS. Unfortunately, I don't think it is
straightforward to do that in practice; at least you have to coerce
one data format into one that GLAD understands.

* GLAD does not scale well with the number of loci, because its
computational complexity is ~O(n^2), unless things have changed since.
In 2007, I tried to predict GLAD's processing time when we were using
the Affymetrix 500K chips and the GenomeWideSNP_5 and GenomeWideSNP_6
were starting to come out. A GWS6 chip would basically take days to
segment. See attached PNG for a table.

* CBS is much faster as an algorithm. Also, the implementation in the
DNAcopy package has been made even faster over time. There was a
major speedup back in 2009, cf.
http://aroma-project.org/benchmarks/DNAcopy_v1.19.2-speedup/
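
The mixed-sample example in the list above (5% tumor cells carrying a one-copy deletion, 95% normal cells) can be worked through numerically. The base R sketch below is a simplified illustration, not part of aroma.affymetrix: the function name `expected_tcn` is hypothetical, and it assumes a simple linear mixture in which normal cells have two copies and all tumor cells share the same copy number.

```r
# Expected total copy number (TCN) of a sample that mixes tumor and normal
# cells, assuming a linear mixture and diploid normal cells.
expected_tcn <- function(tumor_fraction, tumor_cn, normal_cn = 2) {
  tumor_fraction * tumor_cn + (1 - tumor_fraction) * normal_cn
}

# A one-copy deletion (tumor CN = 1) at several tumor fractions.
fractions <- c(0.05, 0.25, 0.50, 1.00)
tcn <- expected_tcn(fractions, tumor_cn = 1)

# The same values as log2 ratios relative to a diploid reference.
log2_ratio <- log2(tcn / 2)

print(data.frame(tumor_fraction = fractions,
                 tcn = tcn,
                 log2_ratio = round(log2_ratio, 3)))
```

With 5% tumor cells the expected total CN is 1.95, a log2 ratio of roughly -0.04, which a fixed calling threshold would most likely treat as no change. Only a pure tumor sample reaches the integer level of 1 (log2 ratio -1).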

Over and for now

Henrik
> --
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-af...@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>
> ---
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to aroma-affymetr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
BengtssonH_20070601-Complexity of GLAD.png

Sam Padmanabhuni

Jan 23, 2015, 4:58:08 AM
to aroma-af...@googlegroups.com, h...@biostat.ucsf.edu
Hi Henrik,

Thank you very much for the information; it has clarified a lot of my doubts.

Best,
Sam.

Chengyu Liu

Feb 4, 2015, 3:31:55 AM
to aroma-af...@googlegroups.com, h...@biostat.ucsf.edu
Hi Sam,

I would like to discuss something about the CytoScan HD array. After preprocessing, did you find that there are chromosomes 24 and 25?

Br,
Chengyu

Sam Padmanabhuni

Feb 4, 2015, 3:37:46 AM
to aroma-af...@googlegroups.com
Hi Chengyu,

Yes, I do have chromosomes 23, 24 and 25. We are only interested in CNAs on the autosomes, so we are not going to include the X and Y chromosomes in further analysis.
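
Restricting a segment table to the autosomes is a one-line filter in base R. The sketch below uses a made-up table with hypothetical column names; it keeps chromosome codes 1 to 22 and drops every higher code, whatever that code denotes.

```r
# Toy segment table with numeric chromosome codes, including codes above 22.
segments <- data.frame(
  chromosome = c(1, 8, 22, 23, 24, 25),
  seg_mean   = c(0.01, 0.42, -0.35, -0.90, 0.05, 0.10)
)

# Keep autosomes only (codes 1 to 22).
autosomal <- segments[segments$chromosome %in% 1:22, , drop = FALSE]
print(autosomal)
```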

Best,
Sam.



Chengyu Liu

Feb 4, 2015, 8:45:33 AM
to aroma-af...@googlegroups.com
Thanks Sam for sharing the information.
But I don't understand why there are chromosomes 24 and 25 (autosomal chromosomes 1-22, X (23), Y (24)).
Are you using any package other than aroma.affymetrix?
Are you interested in total copy number or allele-specific copy number analysis?

Now I am working on allele-specific copy number analysis, but I am stuck at the steps where copy number alterations are called and LOH is identified. Do you have any suggestions?
How about you?
Br,
C.Y

Sam Padmanabhuni

Feb 4, 2015, 8:53:14 AM
to aroma-af...@googlegroups.com
Hi Chengyu,

On 4 February 2015 at 15:45, Chengyu Liu <chengyu...@gmail.com> wrote:
> Thanks Sam for sharing the information.
> But I don't understand why there are chromosomes 24 and 25 (autosomal chromosomes 1-22, X (23), Y (24)).

Yes, 23 is chromosome X and 24 is chromosome Y. Chromosome 25 is for the pseudo-autosomal regions in X.

> Are you using any package other than aroma.affymetrix?

Yes, I am using ChAS to find CNAs.

> Are you interested in total copy number or allele-specific copy number analysis?

I am interested in allele-specific copy number analysis.

> Now I am working on allele-specific copy number analysis, but I am stuck at the steps where copy number alterations are called and LOH is identified. Do you have any suggestions?

ChAS actually does a lot of analysis along with calling CNVs; LOH is one of them. It is probably better if you can use that.

> How about you?

We are not concerned about LOH yet. Maybe in the future.

Best,
Sam.