Trimming option at copy number analysis

Cheng Li

Dec 29, 2008, 2:22:15 AM

to dChip Software

Posted: 24 May 2007 09:34 am Post subject: Trimming option at copy
number analysis

--------------------------------------------------------------------------------

Hi Cheng,

We have some basic questions regarding the trimming aspect of SNP data
analysis:

First, is the trimming used only to determine the "raw" copy numbers?
That is to say, once the raw copy numbers are obtained, we assume that
no trimming is applied when generating the median-smoothed or HMM
"inferred" copy numbers. Is it correct that the smoothed values are
generated from all probes across an individual's genome, regardless of
any particular probe's trim status?

Specific to median smoothing: how are the first and the last
Window_Size/2 SNPs being smoothed?

Many thanks.

Posted: 24 May 2007 02:18 pm Post subject:

--------------------------------------------------------------------------------

> First, is the trimming used only to determine the "raw" copy numbers?

Yes. For each SNP, after some samples are trimmed, the remaining samples
are used to determine the mean and variation of the raw signals (of 2
copies), which are used for computing raw copy numbers. Trimming is
not used in HMM or median smoothing.
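
As a rough illustration of the answer above, the following Python sketch computes a trimmed reference for one SNP and converts signals to raw copy numbers. It is not dChip's code. The function names, the convention that the percentage is split evenly between the two tails, the round-down rule, and the formula raw copy = 2 * signal / reference mean are all assumptions made for the example.

```python
import math
import statistics


def trimmed_reference(signals, trim_percent):
    """Return (mean, stdev) of one SNP's signals after trimming both tails.

    Assumption: trim_percent is the total percentage removed, split evenly
    between the low and high ends and rounded down per end.
    """
    ordered = sorted(signals)
    per_end = math.floor(len(ordered) * trim_percent / 100 / 2)
    kept = ordered[per_end:len(ordered) - per_end]
    return statistics.mean(kept), statistics.stdev(kept)


def raw_copy_numbers(signals, trim_percent):
    """Scale each sample's signal so the trimmed mean corresponds to 2 copies."""
    ref_mean, _ = trimmed_reference(signals, trim_percent)
    return [2 * s / ref_mean for s in signals]


# One SNP across ten hypothetical samples; the last one carries an amplification.
snp_signals = [980, 1010, 1000, 990, 1020, 1005, 995, 1015, 985, 2000]
copies = raw_copy_numbers(snp_signals, trim_percent=20)
print([round(c, 2) for c in copies])
```

With 20% trimming on ten samples, this sketch drops the lowest and the highest signal, so the amplified sample does not inflate the 2-copy reference.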

> Specific to median smoothing: how are the first and the last Window_Size/2 SNPs being smoothed?

It's reflected at the boundary. E.g. at window size 9, SNP 3's median
will be computed from these SNPs (2, 1, 1, 2, 3, 4, 5, 6, 7).
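
The reflection rule in this answer can be sketched in Python as follows. The left-boundary behaviour reproduces the example given (SNP 3, window size 9). Treating the right boundary symmetrically is an assumption, and the function names are hypothetical rather than anything from dChip.

```python
import statistics


def reflected_window(i, n, window_size):
    """1-based SNP indices used to smooth SNP i among n SNPs.

    Positions that fall off either end are mirrored back across the boundary,
    with the edge SNP repeated (position 0 maps to SNP 1, -1 to SNP 2, ...).
    Assumes an odd window_size whose half-width does not exceed n.
    """
    half = window_size // 2
    indices = []
    for p in range(i - half, i + half + 1):
        if p < 1:
            p = 1 - p            # reflect at the left boundary
        elif p > n:
            p = 2 * n + 1 - p    # reflect at the right boundary (assumed symmetric)
        indices.append(p)
    return indices


def median_smooth(raw, window_size):
    """Median-smooth a list of raw copy numbers with reflected boundaries."""
    n = len(raw)
    return [
        statistics.median(raw[j - 1] for j in reflected_window(i, n, window_size))
        for i in range(1, n + 1)
    ]


print(reflected_window(3, 20, 9))  # [2, 1, 1, 2, 3, 4, 5, 6, 7]
```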

Posted: 24 May 2007 02:21 pm Post subject:

--------------------------------------------------------------------------------

--- In dc...@yahoogroups.com, "Rehab Abdel Rahman" <rahman_rehab@...>
wrote:

Dear Dr Cheng,
The trimming value can be either negative or positive. I wonder what a
negative trimming value means (in chromosome/copy number analysis).

Thanks for your help,
Rehab

--- In dc...@yahoogroups.com, "Cheng Li" <cli@...> wrote:

Hi Rehab,

If you specify "% of samples trimmed" as a negative value (e.g. -20),
only the normal samples (specified as 2 for "Ploidy(numeric)" in the
sample information file), instead of all samples, are trimmed by this
percent and used for computing the signal distribution of copy 2.

Cheng

Posted: 11 Jul 2007 12:45 pm Post subject:

--------------------------------------------------------------------------------

RE: [dChip] trimmed analysis
Hi,

Specifying the trimmed % at any step after “Open group” is fine,
preferably at “Analysis/Chromosome/Options”. Note this value is used
only at “Analysis/Chromosome” to compute the signal mean for copy number
analysis. LOH and copy number are also computed at
“Analysis/Chromosome”.

If you don’t have normal samples, you don’t need a sample info file with
a “Ploidy” column. “Gender” has a default value of “Female”, so it’s OK
to omit this column.

Cheng

From: dc...@yahoogroups.com [mailto:dc...@yahoogroups.com] On Behalf
Of arrayprofile
Sent: Wednesday, May 31, 2006 12:24 PM
To: dc...@yahoogroups.com
Subject: [dChip] trimmed analysis

Hi cheng,

My 100k SNP arrays do not have normal samples, all are cancer
samples, so I understand I should use trimmed analysis. Should I
specify the trimmed analysis ("Options/Chromosome/% of samples
trimmed" to be 10 to use trimmed analysis) when I use "Analysis/Open
group" the very first time to read in CEL files? or should I do it
when I normalize it using invariant set ("Analysis/Normalize")? or
should I do it at the time of calculating MBEI ("Analysis/Model-based
expression")? I have read in all the CEL files and normalize them
without specifying trimmed analysis, I plan to use that option in the
next step of calculating MBEI, hopefully this is ok.

Another relevant question is at which step (read in CEL files,
normalize or MBEI), the copy number and LOH are calculated?

Also, since I don't have any normal sample and all my samples are
female, do I still need to provide a sample information file with
Gender and Ploidy(numeric) columns?

Thanks

Posted: 17 Jul 2007 12:25 pm Post subject: Question on trimmed
method for copy number calculation

--------------------------------------------------------------------------------

Fri Jun 2, 2006 5:30 pm
Hi Jacob,

Your concern is justified. When a region has copy change in the same
direction in most samples, this method will likely miss the region.

In such a case having at least one normal sample will be useful, so in a
first-round analysis one can pair this normal with every tumor in the
array list file and check “Options/Use paired normal as reference” to
get raw copy numbers and assess whether such regions are likely.

Cheng
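
The concern confirmed in this reply can be seen with a small numeric example. The Python sketch below uses made-up signals and assumes that signal scales linearly with copy number and that raw copy = 2 * signal / trimmed reference mean. Neither assumption is taken from dChip itself, and `trimmed_mean` is a hypothetical helper.

```python
import statistics


def trimmed_mean(values, per_end):
    """Mean after dropping per_end values from each tail."""
    ordered = sorted(values)
    return statistics.mean(ordered[per_end:len(ordered) - per_end])


# Hypothetical signals at one SNP: 1000 per 2 copies, so 1500 means 3 copies.
# Eight of ten tumors share the same gain; only two are diploid here.
signals = [1500] * 8 + [1000] * 2

reference = trimmed_mean(signals, per_end=1)
raw_copies = [2 * s / reference for s in signals]

print(round(reference, 1))
print([round(c, 2) for c in raw_copies])
```

Because most samples carry the gain, the trimmed reference sits close to the 3-copy signal. The gained samples then come out near 2 copies, and the two diploid samples look like losses.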

From: dc...@yahoogroups.com [mailto:dc...@yahoogroups.com] On Behalf
Of Zhang, Jacob
Sent: Friday, June 02, 2006 1:31 PM
To: dc...@yahoogroups.com
Subject: [dChip] Question on trimmed method for copy number
calculation

Dr Li:

I have a question. Suppose we have a bunch of SNP data for some type
of tumors, we want to find out copy number abnormalities for this type
of tumor. When we do not have the normal reference samples, I
understand we can use the trimmed method to calculate the raw copy
numbers. As I understand it, for each SNP, the algorithm will throw
away the same amount of sample data on each side of lower and higher
ends, then use the remaining data to calculate the normal reference
signals. Here comes my question. When we are expecting to see some
areas where copy numbers are all higher or lower than normal 2, does
this method constitute a danger of being unable to display these areas
with uniformly low or high copy numbers across all the samples?

Thanks,

Jacob

Thu Jun 29, 2006 4:07 pm
Hi Josh,

Using the trimmed mean method is fine. Specifying all tumors as “normal”
is similar to using the trimmed mean with a very small threshold.

Given 12 samples, I think copy changes of more than 4 copies would very
likely be identified. You can attach your copy number view image so I
can check.

Cheng

From: dc...@yahoogroups.com [mailto:dc...@yahoogroups.com] On Behalf
Of Joshua Herbeck
Sent: Wednesday, June 07, 2006 6:18 PM
To: dc...@yahoogroups.com
Subject: [dChip] 500K copy number questions

Hi,

I am trying to estimate copy number using 500K SNP data. I don't
have reference "normal" samples. I am unclear what the best procedure
is.

Under "Copy number analysis" in the manual it is stated, "If no normal
is available, one may specify all tumor samples having
"Ploidy(numeric)" as 2 (see below) to make conservative estimate of
copy number changes."

But it is also stated, "If there are no normal samples, you do not
need a sample information file with "Ploidy" column."

and also, "If no sample is specified as 2 for "Ploidy(numeric)",
specify "Options/Chromosome/% of samples trimmed" to be > 1 to use
trimmed analysis."

I have been doing the following: Not including a Ploidy column in my
sample information file, and running trimmed analysis with various
%trimmed values >0. I see no copy number changes (greater than 4
copies, at least) at all in my data set (12 samples). Perhaps this is
a question larger than just about dChip operation, but: Can I be
confident in this result, given my small N and my lack of known normal
samples?

Thanks, and sorry if these are naive questions.

Josh

Tue May 16, 2006 3:26 am
Hi Natalie,

Ideally you want to have normal samples in each batch so tumor samples
can be combined with the normals in the same batch for analysis. You
may try the trimmed mean method for the samples in each batch
(specified by the array list file) to see if it reduces the noise level
in the raw copy numbers of individual samples.

Cheng Li
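
To make the per-batch idea in this reply concrete, here is a minimal Python sketch that computes a separate reference for each batch at one SNP. The function name, the batch labels, the data, and the formula raw copy = 2 * signal / batch reference mean are illustrative assumptions, not dChip internals.

```python
from collections import defaultdict
import statistics


def per_batch_raw_copies(signals, batches, per_end=0):
    """Raw copy numbers for one SNP, using a separate reference per batch.

    signals: one signal per sample; batches: the batch label of each sample.
    per_end: how many samples to drop from each tail within a batch.
    """
    by_batch = defaultdict(list)
    for s, b in zip(signals, batches):
        by_batch[b].append(s)

    reference = {}
    for b, values in by_batch.items():
        ordered = sorted(values)
        reference[b] = statistics.mean(ordered[per_end:len(ordered) - per_end])

    return [2 * s / reference[b] for s, b in zip(signals, batches)]


# Two hypothetical batches whose overall intensity differs by about 30%.
signals = [1000, 1020, 980, 1300, 1330, 1270]
batches = ["A", "A", "A", "B", "B", "B"]
print([round(c, 2) for c in per_batch_raw_copies(signals, batches)])
```

With a single pooled reference, every sample in batch B would appear gained and every sample in batch A lost. Referencing each batch against itself removes that offset in this toy example.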

From: Natalie Twine
Sent: Thursday, April 20, 2006 11:32 AM
To: Cheng Li
Subject: dealing with batch effect

Dear Cheng,

I have attached a screen print of chromosome 20 showing all of my
samples (raw copy numbers displayed). It is quite clear that there is a
batch effect between 4 groups of samples, which were processed at
different times. I have performed normalisation across all the samples
and am comparing the HMM and median-smoothing predicted copy numbers to
gauge real copy number changes. However, I was wondering whether you can
suggest any analysis function I could use to counteract this batch
effect?

Natalie

Mon Jan 23, 2006 4:18 pm
You may use this trimming method:

http://www.dchip.org/copy.htm#trimmed

Cheng

From: dc...@yahoogroups.com [mailto:dc...@yahoogroups.com] On Behalf
Of ALFHM
Sent: Monday, January 23, 2006 2:36 PM
To: dc...@yahoogroups.com
Subject: [dChip] copy number polymorphisms

Hi!!

I am trying to analyze copy number polymorphisms in a population of
normal samples, but I am not quite sure if there is a method for
inferring the copy number of these samples without comparing against
another reference.

All my cases are diploid and normal; I just want to get copy number info
to find potential polymorphisms in a set of 50K arrays.

Any help will be very useful!!!
Thanks

Sun Dec 4, 2005 7:50 pm
Hi Natalie,

1. You can uncheck "Chromosome/Average curve".

2. This phenomenon is common when normal and tumor samples are not from
the same batch or the same experiment. Is this the case for you? If so
you may try opening only tumor samples in a group, and then use the
"trimmed" method:
http://www.dchip.org/copy.htm#trimmed

Cheng

From: Natalie Twine
Sent: Thursday, December 01, 2005 1:17 PM
To: Cheng Li
Subject: copy number questions

hi cheng,

sorry for the bombardment of questions recently. it is a steep
learning curve!

a couple of questions about copy number.

1. How do I change the blue curve on the right hand side of the copy
number chromosome view from showing the average of the copy number to
the curve for each sample depending on which one i select with the
mouse?

2. My inferred copy number (using median smoothing) data for the tumor
samples is a fair bit more noisy than the normals i am using
(unpaired, there are 8 normals). The copy number seems to vary in
blocks of markers across all samples. I have attached a screenshot of
chromosome 5 to show you this. I am seeing this across all chromosomes
for the tumor samples, but not for the normals. Is this usual? Is
there any way I can make the data less noisy? It is difficult to
identify real copy number changes when the data is noisy.

Thanks
Natalie

Posted: 18 Feb 2008 08:55 pm Post subject:

--------------------------------------------------------------------------------

Usually you should specify all normal samples as Ploidy(Numeric) of 2
and female samples as “Gender” F (male is M), so that chromosome X
copy is properly handled.

If trimmed % is positive, all samples in “open group” will be used for
trimmed analysis, and negative trimmed % will compute trimmed mean
using Ploidy(Numeric) 2 samples only. If there is no ploidy column,
all samples are regarded as ploidy missing, but trimmed analysis can
still be used.

Cheng
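
The sign convention described in this reply can be summarized in a short Python sketch. The function name and the use of `None` for a missing ploidy value are hypothetical. The zero case is not described in this thread, so the sketch refuses it rather than guessing.

```python
def reference_sample_indices(trim_percent, ploidy):
    """Pick which samples feed the trimmed mean, following the rule above.

    trim_percent: the "% of samples trimmed" setting (the sign matters).
    ploidy: one entry per sample; 2 marks a normal sample, None means the
            value (or the whole column) is missing.
    Returns (indices of candidate samples, percentage to trim).
    """
    if trim_percent == 0:
        raise ValueError("the zero case is not covered by this sketch")
    if trim_percent < 0:
        # Negative: trim among the Ploidy(Numeric) == 2 samples only.
        candidates = [i for i, p in enumerate(ploidy) if p == 2]
    else:
        # Positive: trim among all samples in the open group.
        candidates = list(range(len(ploidy)))
    return candidates, abs(trim_percent)


ploidy = [2, None, 2, None, None]
print(reference_sample_indices(-20, ploidy))  # ([0, 2], 20)
print(reference_sample_indices(20, ploidy))   # ([0, 1, 2, 3, 4], 20)
```

When the ploidy column is absent, every entry is missing, so a positive percentage still selects all samples, which matches the statement that trimmed analysis can still be used.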

________________________________________
From: Shailender Nagpal
Sent: Monday, February 18, 2008 7:47 PM
To: 'Cheng Li'
Subject: RE: bioinformatics

Hi Cheng,

While loading 250k Sty samples, we specify a “sample information
file”, with Ploidy(Numeric) column set at 2 for female samples, so
that they are used to create the reference distribution. When we
do chromosome analysis, is this column used? What if we also specify
trimmed analysis parameter at 5% or -5%. What if we don’t specify the
ploidy column?

Your input would be very useful! Thanks.

Shailender

Posted: 18 Sep 2008 04:45 am Post subject: copy number, trimming
and batch effect

--------------------------------------------------------------------------------

Hi,

I am trying to analyse copy numbers in a dataset of ca. 400 normal
samples on Affy 250K Sty chips.

I first tested the analysis disregarding any batch effects. Because of
memory limits, I performed normalizing & MBEI for the samples in four
100-sample sets (using the same standard sample for all four sets),
and then copy number inference in five 80-sample sets (with 20%
trimming and HMM). In a few dozen samples within these sets, the
results showed a clear batch effect that corresponded to the
hybridization date of the arrays.

Now I would like to correct the batch effect, preferably by defining a
RefBatch column in the sample info file, but I have a number of
questions:

(1) Do I also have to define the corresponding batch structure in the
array list file using standard separators? The software obviously
recognizes the batches given in the array list file, because the log
says "Use sample information column 'RefBatch' (...) There are 5
reference batches for copy number analysis. Default batch (16 samples
with no 'RefBatch' value specified) (...) Batch 'D2' (16 samples)
(...)", etc., but later it also says "Only found 1 sample group; use
standardize separators to divide samples in 'Tools/array list file'".
Can I just ignore the latter?

(2) Do I have to redo the Normalize & MBEI step to get the batch
effect corrected? If yes, will the step use the RefBatch information
from the sample information file, i.e. is it possible to perform the
step for 100-sample sets as before, or does the step have to be
performed for one batch (10-25 samples) at a time?

(3) Is the raw copy number calculation performed in the MBEI step or
in the Analysis/Chromosome step? Specifically: if I want to experiment
on different trim percentages, do I have to redo MBEI for every
percentage?

(4) What happens if the trimming percent would result in less than one
sample (e.g., 10% on 10 samples => 0.5 samples in each end)? In
general, how is the number of samples to be trimmed rounded (e.g., 1.5
at each end)?

(5) Is it possible to export raw copy numbers?

Your comments would be very helpful. Thanks,
Elina

PS. I don't know if it is an intentional feature, but the RefBatch
column based correction approach seemed to produce strange results (a
constant, high copy number over all samples and SNPs) unless there
were some samples whose RefBatch was blank. This happened with samples
where the batch information was not used in normalizing & MBEI.
