Interpretation of results for total CN analysis

marco

Jul 22, 2008, 6:55:46 AM

to aroma.affymetrix

Dear All,

I wonder if there is any place where a legend and some explanation of
the graphical results can be found.

1. There is a \sigma_\Delta on top of the plots. What does this refer
to, and how should it be interpreted? I looked in the cited
paper but did not find any obvious explanation there.

2. From my understanding, green shows the raw results and black the
modelled results?

3. Are the black lines always plotted on top of the green ones, so
they cannot be hidden by the green ones? I have a lot of black spots,
but it seems that they are mostly plotted for relative CN > 1 or < -1.

4. I have run two 6.0 arrays and looked at the CNs. The arrays are
generated from two monozygotic twins, so they should be identical up to
somatic mutations of the tissue used.
I find some 50-100 tiny CN regions per chromosome, but it seems that
for most of those the relative CN is positive. Shouldn't one
expect the relative CN to be distributed symmetrically around zero?

Best Regards

Marco

Henrik Bengtsson

Jul 22, 2008, 4:00:54 PM

to aroma-af...@googlegroups.com

Hi Marco,

On Tue, Jul 22, 2008 at 3:55 AM, marco <marco...@gmail.com> wrote:
>
> Dear All,
>
> I wonder if there is any place where a legend and some explanation of
> the graphical results can be found.

Apart from the source code (and asking here), that information is
currently not available anywhere.

>
> 1. There is a \sigma_\Delta on top of the plots. What does this refer
> to, and how should it be interpreted? I looked in the cited
> paper but did not find any obvious explanation there.

The \sigma_\Delta is an estimate of the standard deviation of the raw
CNs on that particular chromosome. The \Delta indicates that a robust
first-order difference estimator has been used, which has the property
of being robust against the presence of CN aberrations.
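
To illustrate the idea, here is a minimal sketch in base R of one common form of such an estimator: the MAD of the differences between neighbouring loci, rescaled by sqrt(2). The helper name sigmaDelta and the simulated data are assumptions made for this example; this is not the aroma.affymetrix source code, which may differ in detail.

```r
# Hypothetical helper (NOT the aroma.affymetrix implementation):
# robust SD estimate based on first-order differences.
sigmaDelta <- function(x) {
  x <- x[is.finite(x)]   # drop NA/NaN/Inf loci
  d <- diff(x)           # differences between neighbouring loci
  # For independent noise with SD sigma, each difference has SD sigma*sqrt(2).
  # mad() is scaled to estimate the SD under Gaussian noise.
  mad(d) / sqrt(2)
}

set.seed(1)
# Simulated log-ratios: noise with SD 0.2 plus one gained region
x <- rnorm(5000, mean = 0, sd = 0.2)
x[2001:3000] <- x[2001:3000] + 0.58

sd(x)          # inflated by the gained region
sigmaDelta(x)  # stays close to the noise level
```

A change in CN level only affects the two differences at the region boundaries, which is why the estimate is barely influenced by aberrations, whereas the ordinary standard deviation is.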

>
> 2. From my understanding green is the raw and black is the modelled
> results ?

Since I know the source code and you refer to "\sigma_\Delta" above, I
guess that you are running the CbsModel segmentation, correct? For
the CbsModel, yes, the raw CNs are plotted as green dots and the CN
regions found by CBS are plotted as black lines.

>
> 3. Are the black lines always plotted on top of the green ones, so
> they cannot be hidden by the green ones? I have a lot of black spots,
> but it seems that they are mostly plotted for relative CN > 1 or < -1.

The raw CNs are always plotted first. CN regions are plotted on top
of that. The "black spots" (should be lines) are probably extremely
short regions identified by CBS. Have you tried to zoom in?

When you say "CN >1 or <-1", do you mean they are extreme? It sounds
like you might have quite noisy data. If so, many of those regions
are likely to be false (positives). It is hard to say more since you
don't say how many samples you have, what you use as a reference, what
processing steps you've done and so on.

>
> 4. I have run two 6.0 arrays and looked at the CNs. The arrays are
> generated from two monozygotic twins, so they should be identical up to
> somatic mutations of the tissue used.
> I find some 50-100 tiny CN regions per chromosome, but it seems that
> for most of those the relative CN is positive. Shouldn't one
> expect the relative CN to be distributed symmetrically around zero?

Are you doing a paired or non-paired comparison of the two twins? I
need to know more about your samples and the number of samples you
have to give any decent comments on this. I wouldn't expect the CN
errors to be perfectly symmetrical. Remember, we are working on the
log-scale and the tail goes to negative infinity as the underlying CN
goes to zero. It does not go to positive infinity as the CN increases
(because there is a saturation effect and there are no truly infinite
CNs).
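
A small numeric illustration of this asymmetry, assuming for simplicity a fixed reference with two copies (in a paired analysis the reference is the other sample, so the numbers are only indicative):

```r
# Relative CN on the log2 scale, assuming a reference with CN = 2
cn <- c(0, 1, 2, 3, 4)
logRatio <- log2(cn / 2)
names(logRatio) <- cn
print(logRatio)
# Losing one copy (2 -> 1) moves the log-ratio by a full unit downwards,
# gaining one copy (2 -> 3) moves it by less than a unit upwards,
# and a complete loss (CN = 0) maps to -Inf.
```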

Next time, please be more specific about what analysis you have done. That
makes it easier for anyone to be more specific in their answers.

Cheers

Henrik

marco

Jul 22, 2008, 4:57:26 PM

to aroma.affymetrix

Dear Henrik

thanks for the answers.

To complete the picture:

I have 10 discordant monozygotic twin pairs genotyped on Affy 6.0 that
I am using for this experiment.
I am interested in the CN differences within each of the 10 pairs of
discordant twins.

From the vignette, I understand that this is achieved by the
following:

########################
cdf <- AffymetrixCdfFile$fromChipType("GenomeWideSNP_6",
                                      tags="Full")
cs <- AffymetrixCelSet$fromName("", cdf=cdf)
acc <- AllelicCrosstalkCalibration(cs)
csC <- process(acc, verbose=verbose)
plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE,
                shift=+300)
fit(plm, verbose=verbose)
ces <- getChipEffectSet(plm)
fln <- FragmentLengthNormalization(ces)
cesN <- process(fln, verbose=verbose)

## paired analysis twins 1 and 2
ces1 <- extract(cesN, 11) # twin 1
ces2 <- extract(cesN, 13) # twin 2
cbs <- CbsModel(ces1, ces2)
ce <- ChromosomeExplorer(cbs)
print(ce)
process(ce, chromosomes=c(1:22), verbose=verbose)

.....

etc etc for the other 9 pairs

############################

Do you think this is the right way to go, and am I using
aroma.affymetrix correctly?
Any suggestion here is very welcome; I am pretty much a beginner in this
area.

Some related questions:

1. From your paper I do not understand whether only the SNP probe sets are
used or all the probe sets (i.e. also the non-polymorphic probes that
are on the Affy array).
2. "n" on the side of the plots is the number of loci for the specific
chromosome?

Cheers

Marco

Henrik Bengtsson

Jul 22, 2008, 5:53:09 PM

to aroma-af...@googlegroups.com

Hi.

On Tue, Jul 22, 2008 at 1:57 PM, marco <mazu...@gmail.com> wrote:
>
> Dear Henrik
>
> thanks for the answers.
>
> To complete the picture:
>
> I have 10 discordant monozygotic twin pairs genotyped on Affy 6.0 that
> I am using for this experiment.
> I am interested in the CN differences within each of the 10 pairs of
> discordant twins.
>
> From the vignette, I understand that this is achieved by the
> following:
>
> ########################
> cdf <- AffymetrixCdfFile$fromChipType("GenomeWideSNP_6",
> tags="Full")
> cs <- AffymetrixCelSet$fromName("", cdf=cdf)

Ay ay ay... your data set name is empty, i.e. "". That is not good, and
it is as much my fault as yours. In the next release I will have
aroma.affymetrix protect against this mistake and give an error. It
looks like you have your data in:

rawData/GenomeWideSNP_6/

and not

rawData/<data set name>/GenomeWideSNP_6/

Please read the online 'User Guide' carefully, especially Section
'Where to put raw data' on Page 'Structure of data set directories'.

> acc <- AllelicCrosstalkCalibration(cs)
> csC <- process(acc, verbose=verbose)
> plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE,
> shift=+300)
> fit(plm, verbose=verbose)
> ces <- getChipEffectSet(plm)
> fln <- FragmentLengthNormalization(ces)
> cesN <- process(fln, verbose=verbose)

Otherwise it looks alright.

>
> ## paired analysis twins 1 and 2
> ces1 <- extract(cesN, 11) # twin 1
> ces2 <- extract(cesN, 13) # twin 2
> cbs <- CbsModel(ces1, ces2)
> ce <- ChromosomeExplorer(cbs)
> print(ce)
> process(ce, chromosomes=c(1:22), verbose=verbose)
>
> .....
>
> etc etc for the other 9 pairs

Instead of setting up the CbsModel (and the ChromosomeExplorer) for
every pair of *arrays*, set it up for paired *sets* of arrays, e.g. if
arrays 1 & 2, 3 & 4, 7 & 8, and 11 & 13 are twins do:

cesA <- extract(cesN, c(1,3,7,11))
cesB <- extract(cesN, c(2,4,8,13))
cbs <- CbsModel(cesA, cesB)

and process as above.
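
As a sketch of how the two index vectors could be kept in sync for all pairs, one option is a small pairing table. The data frame pairs and its column names are made up for this illustration; the indices are the example ones above, and the aroma.affymetrix calls are left as comments because they need the cesN object from the script.

```r
# Hypothetical pairing table: one row per twin pair, holding array indices in cesN
pairs <- data.frame(twinA = c(1, 3, 7, 11),
                    twinB = c(2, 4, 8, 13))

idxA <- pairs$twinA
idxB <- pairs$twinB

# Sanity checks: same number of arrays on both sides, no array used twice
stopifnot(length(idxA) == length(idxB))
stopifnot(!anyDuplicated(c(idxA, idxB)))

# Then, as in the snippet above (requires aroma.affymetrix and cesN):
# cesA <- extract(cesN, idxA)
# cesB <- extract(cesN, idxB)
# cbs  <- CbsModel(cesA, cesB)
```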

>
> ############################
>
> Do you think this is the right way to go, and am I using
> aroma.affymetrix correctly?
> Any suggestion here is very welcome; I am pretty much a beginner in this
> area.
>
> Some related questions:
>
> 1. From your paper I do not understand whether only the SNP probe sets are
> used or all the probe sets (i.e. also the non-polymorphic probes that
> are on the Affy array).

The Bengtsson et al. 2008 ("CRMA") paper only covers the 500K chip
set, i.e. there is no discussion on non-polymorphic loci (CN probes)
in that paper. The CN probes were introduced in GWS5 and GWS6. Both
SNPs & CN probes are processed in the above script.

> 2. "n" on the side of the plots is the number of loci for the specific
> chromosome?

The "n=..." annotation is the number of *finite* raw CNs on the
chromosome. Sometimes you get a few NAs and those are not counted.
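
In base R terms, the count corresponds to something like the following, where rawCn is a made-up vector of raw CNs for one chromosome:

```r
# Hypothetical raw CN values for one chromosome, including non-finite entries
rawCn <- c(0.12, -0.05, NA, 0.30, -Inf, 0.02, NaN)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so only real values are counted
n <- sum(is.finite(rawCn))
n
```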