-- Forwarded old discussion --
Posted: 09 Aug 2007 04:26 pm Post subject: Reference file for LOH
--------------------------------------------------------------------------------
I have a couple of simple questions regarding LOH and CN analysis.
When I run LOH analysis using the HMM CONSIDERING HAPLOTYPE in
"Option/Chromosome/Inferred LOH method", I get different results
depending on whether I use only the small number of tumor samples or
also include a set of control samples (although not matched
controls). Why would this be?
Is there a way to use a reference sample to estimate the
heterozygosity at each marker when using the HMM WITHOUT considering
haplotype?
How is the Average heterozygosity rate used in the HMM?
Thanks for your help.
Andrew Skol
Section of Genetic Medicine
University of Chicago
Posted: 09 Aug 2007 05:11 pm Post subject:
--------------------------------------------------------------------------------
If the difference is large, most likely it's due to a recent bug when
using control genotype data:
The "HMM considering haplotype" method considers the dependence
between two adjacent SNPs as well as the heterozygosity (Het) rate. If
the 1st SNP is not informative for determining the Het/Hom status of
the 2nd marker, the probability of the 2nd marker being Het given
retention is set to its Het rate.
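
As a rough illustration of the fallback rule described above (not dChip's actual code), here is a minimal Python sketch. The function names, the genotype coding ("AA", "AB", "BB", "NoCall"), and the `cond_het_prob` argument, which stands in for whatever adjacent-SNP probability the haplotype model would supply, are all assumptions made for this example.

```python
def het_rate_from_calls(calls):
    """Fraction of heterozygous (AB) calls among non-missing reference calls.

    Assumes calls are coded "AA", "AB", "BB" or "NoCall" (hypothetical coding).
    """
    called = [c for c in calls if c != "NoCall"]
    if not called:
        raise ValueError("no usable genotype calls for this marker")
    return sum(1 for c in called if c == "AB") / len(called)


def prob_het_given_retention(het_rate, prev_informative, cond_het_prob=None):
    """P(marker is Het | retention) under the rule quoted in the post.

    If the previous SNP is not informative, fall back to the marker's own
    Het rate. Otherwise use a conditional probability derived from the
    adjacent SNP (cond_het_prob is a placeholder, not a dChip quantity).
    """
    if not 0.0 <= het_rate <= 1.0:
        raise ValueError("het_rate must be between 0 and 1")
    if not prev_informative:
        return het_rate
    if cond_het_prob is None:
        raise ValueError("cond_het_prob is required when the previous SNP is informative")
    return cond_het_prob
```

For example, a marker with reference calls AA, AB, AB, BB and one NoCall has a Het rate of 0.5, and that value is used directly whenever the neighbouring SNP carries no information.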
Posted: 09 Aug 2007 06:29 pm Post subject:
--------------------------------------------------------------------------------
I've installed the newest version, but still seem to have an issue.
When I run the LOH analysis on only tumor samples using the HapMap
genotype file for reference (from
http://biosun1.harvard.edu/complab/dchip/copy.htm),
I find a very large number of deletions. In contrast, when I run the
tumor sample together with a set of 20 normal controls, I have far
fewer deletions. I've attached screen shots of both including the
options window. Any thoughts you have would be greatly appreciated.
It might be relevant that I'm also using a sample file. For
example, these are the first several lines:
Array                                Sample   Gender  Ploidy(numeric)
07-0243_KO-Nsp                       07-0243  Male
07-0245_KO-Nsp                       07-0245  Male
07-0246_KO-Nsp                       07-0246  Male
07-0247_KO-Nsp                       07-0247  Male
07-0248_KO-Nsp                       07-0248  Female
07-0249_KO-Nsp                       07-0249  Male
NA10851_FinNsp_vR1_579813_A1_1_SC2   NA10851  Male    2
NA10855_FinNsp_vR1_579548_A4_1_SC1   NA10855  Female  2
NA10863_FinNsp_vR1_579812_A10_1_SC7  NA10863  Female  2
I don't recall seeing anything in the documentation that says this
would affect the analysis, but thought I should mention it.
One final question: Is dChip able to perform a statistical test for
the presence of either LOH or CN < 2?
Thanks again for your help.
Andrew
Posted: 09 Aug 2007 06:50 pm Post subject:
--------------------------------------------------------------------------------
It seems that in the 2nd figure, the array list used doesn't have
Standardize separators to divide the samples. As a result, the 1st and
2nd samples are regarded as paired normal and tumor, and the resulting
"LOH" calls are in fact mismatching genotypes between the two.
There are no tests or p-values assigned to LOH/CN. The inferred LOH
data are probabilities of LOH between 0 and 1.
Posted: 10 Aug 2007 10:25 am Post subject:
--------------------------------------------------------------------------------
I don't have paired sample data so I don't believe I should be using
the standardize separator.
I have an alternative set of samples that I have dcp files for which I
would like to use as the reference set for the LOH analysis. If I was
performing a CN analysis I believe I would include a sample file which
listed the individuals in this reference sample with ploidy 2, and
list the tumor samples with a blank ploidy column. Then I would set up
an array that included the tumor samples only. I thought that perhaps
this same setup would work for an LOH analysis (I expect many folks
that would perform a CN analysis would like to perform a LOH analysis
using a common reference set). However, this does not seem to be the
case.
How would I go about performing an LOH analysis using a reference set
for which I have dcp files?
Thanks again for your help. It's greatly appreciated.
Andrew
Posted: 10 Aug 2007 02:16 pm Post subject:
--------------------------------------------------------------------------------
You need to use Standardize separators to specify that they are
unpaired tumor samples, e.g.
tumor1
---Standardize---
tumor2
---Standardize---
tumor3
When no reference genotype file is used and "HMM considering
haplotype" is specified, dcp files with "Ploidy" 2 will be used as
reference genotypes.
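
For anyone building such an array list by script, a minimal Python sketch follows. It only reproduces the layout shown in the example above (one array name per line with a `---Standardize---` line between unpaired tumors). The function name is made up for illustration, and whether dChip expects anything more in the file is not covered here.

```python
def build_array_list(tumor_arrays, separator="---Standardize---"):
    """Return array-list text with a separator line between unpaired tumors."""
    if not tumor_arrays:
        return ""
    # Join names so that a separator line sits between consecutive samples.
    return ("\n" + separator + "\n").join(tumor_arrays) + "\n"


if __name__ == "__main__":
    # Prints the same layout as the example in the post above.
    print(build_array_list(["tumor1", "tumor2", "tumor3"]), end="")
```

The output can be saved as the array list file and opened in dChip in place of a hand-edited list.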
Posted: 13 Aug 2007 06:16 pm Post subject:
--------------------------------------------------------------------------------
Thank you again. I have one more question on this topic.
I'm now performing a CN analysis on the same set of data (6 tumor, 20
normal). I believe I now have to use a sample file so that I can
specify that the normal samples are ploidy of 2. I used the same array
file such that each tumor cell was separated by --Standardize--.
However, when I run the analysis dChip tells me that "All samples are
in one batch" and that "26 samples with extreme signals are trimmed".
My understanding was that by specifying the array file and sample file
as I did that dChip would know to use the 20 normal samples as the
control set. However, this does not appear to be the case. Could you
please tell me how to let dChip know that the 20 ploidy 2 samples are
the control set?
Thank you again for your help.
Andrew
Posted: 13 Aug 2007 07:08 pm Post subject:
--------------------------------------------------------------------------------
Make sure you have a "Ploidy(numeric)" column in the sample info file
and that "Options/% samples trimmed" is 0. You can attach the dChip
output messages from "Analysis/Chromosome" if this doesn't work.
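
A quick way to check the first point before running dChip is to parse the sample info file and list which arrays carry ploidy 2. The Python sketch below assumes the file is tab-delimited with an "Array" column, as in the excerpt posted earlier in this thread. The function name is hypothetical and nothing here calls dChip itself.

```python
import csv
import io


def reference_samples(sample_info_text, ploidy_column="Ploidy(numeric)"):
    """Return the Array names whose ploidy column equals 2.

    Raises ValueError if the ploidy column is missing from the header.
    Assumes tab-delimited text with an "Array" column (an assumption
    based on the excerpt in this thread).
    """
    reader = csv.DictReader(io.StringIO(sample_info_text), delimiter="\t")
    if ploidy_column not in (reader.fieldnames or []):
        raise ValueError("missing column: " + ploidy_column)
    refs = []
    for row in reader:
        # Tumor rows may leave the ploidy field blank or omit it entirely.
        value = (row.get(ploidy_column) or "").strip()
        if value and float(value) == 2.0:
            refs.append(row["Array"])
    return refs
```

If the returned list is empty, or the ValueError is raised, the sample info file is the first thing to fix.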
From: dc...@yahoogroups.com [mailto:dc...@yahoogroups.com] On Behalf Of coldrecd
Sent: Friday, November 03, 2006 5:03 PM
To: dc...@yahoogroups.com
Subject: [dChip] Slow reading of reference genotype
Cheng Li:
I've been using dChip for copy number and LOH analysis with Affy 500K
SNP data. I figured out that keeping the genotype .txt files in "cdf
order" allows them to be read much more quickly. Would the same apply
to the haplotype data? The files Nsp_consensus.txt and
Sty_consensus.txt load very slowly, and their order is numerical by
ProbeSetID.
Chris
Posted: 12 Jul 2007 12:36 pm Post subject: Slow reading of reference genotype file
--------------------------------------------------------------------------------
Chris,
Yes, it is best to keep them in the same order as the cdf file
(“Tools/export expression data” can show the order).
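
One way to do the reordering outside dChip is sketched below in Python. It assumes the genotype or haplotype table has already been split into a header row and data rows whose first field is the ProbeSetID, and that `cdf_order` is the list of probe set IDs taken from an exported file. These names and this layout are assumptions for illustration, not a description of dChip's file format.

```python
def reorder_by_cdf(header, rows, cdf_order):
    """Return [header] + rows sorted to follow cdf_order.

    rows: list of lists, where row[0] is the ProbeSetID.
    Rows whose ID is absent from cdf_order are kept at the end
    in their original relative order.
    """
    by_id = {row[0]: row for row in rows}
    ordered = [by_id[pid] for pid in cdf_order if pid in by_id]
    known = set(cdf_order)
    leftovers = [row for row in rows if row[0] not in known]
    return [header] + ordered + leftovers
```

Keeping unmatched rows at the end rather than dropping them makes it easy to spot probe sets that are missing from the exported order.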
Cheng