Copy number generation from SNP arrays

Subarna Sinha

Oct 14, 2011, 9:10:10 PM

to dchip-s...@googlegroups.com

Dear Cheng,

I have a few questions about using dChip to infer copy number from Affymetrix SNP 6.0 arrays for matched tumor-normal samples.

1. I have SNP 6.0 data for about 200 matched tumor and normal samples, and I wasn't sure whether I could analyze all pairs at once. So I first analyzed 1 pair (say, [a_tumor, a_normal]) and then a small set of 10 pairs (which also contains [a_tumor, a_normal]). The copy number results for [a_tumor] differ depending on whether 1 pair or 10 pairs are used in the analysis. Is this difference expected? If so, what do you recommend for running the 200 samples? Will I be able to run all 200 samples in dChip in one run?

2. Related to question 1: when you have matched tumor-normal samples, do you use information from other sample pairs to derive the copy number of a given tumor-normal pair? My understanding from the manual was that you shouldn't, but maybe I am wrong.

3. Which inference method would you recommend for 1 paired sample versus multiple paired samples? I tried both 'Median Smoothing' and 'HMM' and found that they perform differently for 1 paired sample versus 10 paired samples.

4. I often get fractional copy numbers with both HMM and Median Smoothing, but mainly with Median Smoothing. Is that expected? I was expecting copy numbers to be integers, but I am getting values like 0.04, 15.19, etc., which seems really strange.

5. Is there a description of how the output of the copy number analysis should be interpreted?
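
As a generic illustration of question 4, the Python sketch below shows how a ratio-based raw copy number followed by a running median can yield non-integer values. It is not dChip's actual algorithm: the assumption of 2 copies in the matched normal, the window size, the function names, and all signal values are made up for illustration.

```python
from statistics import median


def raw_copy_number(tumor_signal, normal_signal):
    """Per-SNP raw copy number, assuming 2 copies in the matched normal."""
    return [2.0 * t / n for t, n in zip(tumor_signal, normal_signal)]


def running_median(values, window=3):
    """Running median over a centered window, truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Hypothetical signal intensities for 5 consecutive SNPs (not real data).
tumor = [100, 150, 310, 290, 305]
normal = [100, 100, 100, 100, 100]

raw = raw_copy_number(tumor, normal)
smoothed = running_median(raw, window=3)
print(raw)
print(smoothed)
```

Because the raw values are ratios of measured intensities, both the raw and the smoothed estimates are continuous; integer copy numbers would only appear after an explicit rounding or state-calling step.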

I am using the latest dChip version. Below, you can find the dChip parameters I am using.

CDF_FILE=C:\Users\subar\COMMON\Project\GenomeWideSNP_6.cdf
READ_DAT=0
READ_CEL=1
READ_DCP=0
DATA_PATH=C:\Users\subar\COMMON\Project\sample_subset
WORKING_DIR=C:\Users\subar\COMMON\Software\DCHIP
GOSURFER_DIR=C:\Users\subar\COMMON\Project
USE_UNNORM=0
MAS5_SIGNAL=0
OPTION_PAGE=3
SAMPLE_INFO_FILE=sample_info_subset.txt
GENE_NAME_FILE=
GENOME_INFO_FILE=C:\Users\subar\COMMON\Project\combined_genome_wide_snp6.txt
REF_GENE_FILE=None C:\Users\subar\COMMON\Project\refFlat.txt
CYTOBAND_FILE=
MASK_FILE=
DETECT_SINGLE_OUTLIER=1
NORM_SMOOTH_METHOD=0
PROBE_SEQ_FILE=
BACKGROUND_METHOD=1
NO_REP_ARRAY_OUTLIER=0
ALTERNATE_TWO_VIEW=0
WHICH_ONLINE_DATABASE=0
DIRECT_GO_ONLINE_DATABASE=0
MODEL_METHOD=1
DIST_MEASURE=0
LINKAGE_METHOD=0
OUTLIER_IN_RANGE=1
USE_CV_FILTER=1
CV_LOWER=0.5
CV_UPPER=1000
USE_CALL_FILTER=1
FILTER_PRESENCE=20
USE_EXPR_FILTER=0
EXPR_FILTER_VALUE=20
EXPR_FILTER_PCT=50
USE_REP_FILTER=0
REP_FILTER_LOW=0
REP_FILTER_HIGH=0.5
FILTER_INPUT_LIST=
FILTER_INPUT_USE=1
FILTER_OUTPUT_LIST=C:\Users\subar\COMMON\Software\DCHIP\dChip association.xls
ANOVA_FACTOR=None (overall score)
ANOVA_PVALUE=0.05
CLUSTER_GENE=1
CLUSTER_SAMPLE=0
ARRAY_LIST_FILE=C:\Users\subar\COMMON\Software\DCHIP\dChip_array_list.txt
USE_STD_SEP=1
STANDARDIZE_COL=0
EXTERNAL_DATA_FILE=
MBEI_MEMORY=500
SHOW_PROFILE=0
ADD_NEW_COLOR=1
STANDARDIZE_ROW=1
GREEN_RED=0
DISPLAY_RANGE=5
SAMPLE_NAME_VISIBLE=1
SIG_SAMPLE_PVALUE=0.01
SIG_GENE_PVALUE=0.001
STORE_DISTANCE=1
CLUSTER_SHOW_PROBE_SET=0
GENE_LIST_FILE=
ALLOW_MISSING=0
HAS_STD=0
HAS_BOTH=1
SKIP_ROW_END=1
EXPORT_FILE_FORMAT=0
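
When comparing the 1-pair run against the 10-pair run, it may help to confirm that the two parameter sets differ only where intended (for example, in the data path or sample info file). The following is a minimal Python sketch, assuming the settings are available as plain KEY=VALUE text as listed above; the function names `parse_dchip_params` and `diff_params` are hypothetical helpers and not part of dChip.

```python
def parse_dchip_params(text):
    """Parse KEY=VALUE lines into a dict; keys with no value map to ''."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")  # split on the first '=' only
        params[key.strip()] = value.strip()
    return params


def diff_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Two small excerpts in the same format as the listing above.
# The second sample info file name is a hypothetical example.
run_one_pair = "READ_CEL=1\nMODEL_METHOD=1\nSAMPLE_INFO_FILE=sample_info_subset.txt\nMASK_FILE=\n"
run_ten_pairs = "READ_CEL=1\nMODEL_METHOD=1\nSAMPLE_INFO_FILE=sample_info_ten.txt\nMASK_FILE=\n"

differences = diff_params(parse_dchip_params(run_one_pair),
                          parse_dchip_params(run_ten_pairs))
print(differences)
```

Splitting on the first `=` only keeps Windows paths and values such as `None (overall score)` intact.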
