Re: bug hunt in CDF

69 views

Skip to first unread message

Mark Robinson

unread,

Apr 2, 2008, 4:57:14 AM4/2/08

to Alistair Chalk, aroma-af...@googlegroups.com

Alistair.

I've copied this to the aroma.affymetrix google group, in case someone else has this problem.

I *think* the CDF is ok. I don't get this error.

I think the problem is that you have processed (BG adjusted, quantile normalized) the data previously with a different CDF and then try to invoke a PLM, where it first looks for unestimated units ... and there it finds that the number of probes doesn't match.

Two things to do ....

1. use force=TRUE in your 'fit' and 'process' calls ... that will start from scratch and overwrite the CEL files for the normalized data, chip effects, etc.

2. use tags ... with tags, you can process the same dataset with a bunch of different CDFs ... for example, the code below does BG adjustment, normalization on the 'main' set of probesets, but then does fits on the Ensembl 47 or 49 and stores the results in separate locations ...

# - - - - - - - - - - - - - - - - - - - - - - -

# setup dataset and chip names

# - - - - - - - - - - - - - - - - - - - - - - -

projectName <- "colon_affy"

chipType <- "HuEx-1_0-st-v2"

# - - - - - - - - - - - - - - - - - - - - - - -

# define CDF files (main + ensembl)

# - - - - - - - - - - - - - - - - - - - - - - -

cdfMain <- AffymetrixCdfFile$fromChipType(paste(chipType, "main,A20070914,EP", sep=","))

cdfEnsb47 <- AffymetrixCdfFile$fromChipType(paste(chipType, "U-Ensembl47,G-Affy", sep=","))

cdfEnsb49 <- AffymetrixCdfFile$fromChipType(paste(chipType, "U-Ensembl49,G-Affy", sep=","))

csColon <- AffymetrixCelSet$fromName(projectName, chipType=chipType)

setCdf(csColon, cdfMain)

# - - - - - - - - - - - - - - - - - - - - - - -

# background correction + quantile normalization

# - - - - - - - - - - - - - - - - - - - - - - -

bcColon <- RmaBackgroundCorrection(csColon,tag=c("*","main"),addJitter=T)

csBCColon <- process(bcColon,verbose=verbose)

qnColon <- QuantileNormalization(csBCColon, typesToUpdate="pm")

csNColon <- process(qnColon,verbose=verbose)

# - - - - - - - - - - - - - - - - - - - - - - -

# fit probe level models

# - - - - - - - - - - - - - - - - - - - - - - -

setCdf(csNColon, cdfEnsb49)

plmColon <- ExonRmaPlm(csNColon, mergeGroups=TRUE,tag=c("*","ens49"))

fit(plmColon, verbose=verbose)

setCdf(csNColon, cdfEnsb47)

plmColon <- ExonRmaPlm(csNColon, mergeGroups=TRUE,tag=c("*","ens47"))

fit(plmColon, verbose=verbose)

.... this will fit PLMs based on both CDFs and you could then go and read chip effects separately .... note that under the hood this will store data in the following directories for:

unix88 509 % ll plmData/ | grep colon_affy

drwxr-x--- 3 mrobinson lab0605 4096 Apr 2 19:52 colon_affy,RBC,main,QN,RMA,merged,ens47

drwxr-x--- 3 mrobinson lab0605 4096 Apr 2 19:34 colon_affy,RBC,main,QN,RMA,merged,ens49

unix88 510 % ll probeData/ | grep colon_affy

drwxr-x--- 3 mrobinson lab0605 4096 Apr 1 10:29 colon_affy,RBC,main

drwxr-x--- 3 mrobinson lab0605 4096 Apr 1 10:41 colon_affy,RBC,main,QN

Let me know if that doesn't work.

Cheers,

Mark

On 02/04/2008, at 7:00 PM, Alistair Chalk wrote:

Hi Mark,

The CDF broke the script - ideas? :)

> cdf <- AffymetrixCdfFile$fromChipType(chipType, tags="U-Ensembl49,G-Affy") #was core,U-...
> print(cdf)
AffymetrixCdfFile:
Path: annotationData/chipTypes/HuEx-1_0-st-v2
Filename: HuEx-1_0-st-v2,U-Ensembl49,G-Affy.cdf
Filesize: 44.04MB
File format: v4 (binary; XDA)
Chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy
Dimension: 2560x2560
Number of cells: 6553600
Number of units: 22035
Cells per unit: 297.42
Number of QC units: 1
RAM: 0.00MB
>
> cs_osc_control <- AffymetrixCelSet$fromName("OSCControlClassification",cdf=cdf)
> print(cs_osc_control)

AffymetrixCelSet:
Name: OSCControlClassification
Tags:
Path: rawData/OSCControlClassification/HuEx-1_0-st-v2
Chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy
Number of arrays: 67
Names: 10002002.1, 10003001.1, ..., huex_wta_thyroid_C
Time period: 2005-03-03 12:59:11 -- 2007-12-13 16:49:42
Total file size: 4231.77MB
RAM: 0.08MB
>

> bc_osc_control <- RmaBackgroundCorrection(cs_osc_control)
> csBC_osc_control <- process(bc_osc_control,verbose=verbose)
20080402 17:57:34|Background correcting data set...
20080402 17:57:34| Already background corrected
20080402 17:57:34|Background correcting data set...done
> qn_osc_control <- QuantileNormalization(csBC_osc_control, typesToUpdate="pm")
> print(qn_osc_control)
QuantileNormalization:
Data set: OSCControlClassification
Input tags: RBC
Output tags: RBC,QN
Number of arrays: 67 (4231.77MB)
Chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy
Algorithm parameters: (subsetToUpdate: NULL, typesToUpdate: chr "pm", subsetToAvg: NULL, typesToAvg: chr "pm", .targetDistribution: NULL)
Output path: probeData/OSCControlClassification,RBC,QN/HuEx-1_0-st-v2
Is done: TRUE
RAM: 0.00MB
> # OSC control
> csN_osc_control <- process(qn_osc_control, verbose=verbose)
20080402 17:57:46|Quantile normalizing data set...
20080402 17:57:46| Already normalized
20080402 17:57:46|Quantile normalizing data set...done

> fit(plm_osc_control, verbose=verbose)
20080402 17:58:02|Fitting model of class ExonRmaPlm:...
ExonRmaPlm:
Data set: OSCControlClassification
Chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy
Input tags: RBC,QN
Output tags: RBC,QN,RMA,merged
Parameters: (probeModel: chr "pm"; shift: num 0; flavor: chr "affyPLM"; treatNAsAs: chr "weights").
Path: plmData/OSCControlClassification,RBC,QN,RMA,merged/HuEx-1_0-st-v2
RAM: 0.00MB
20080402 17:58:02| Identifying non-estimated units...
20080402 17:58:02| Getting chip-effect set from data set...
20080402 17:58:02|   Retrieving monocell CDF...
20080402 17:58:02|    Monocell chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy,monocell
20080402 17:58:02|    Locating monocell CDF...
20080402 17:58:02|     Pathname: annotationData/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,U-Ensembl49,G-Affy,monocell.CDF
20080402 17:58:02|    Locating monocell CDF...done
20080402 17:58:02|   Retrieving monocell CDF...done
20080402 17:58:02|   Retrieving chip-effects from data set...
20080402 17:58:02|    Data set: OSCControlClassification
20080402 17:58:02|    Retrieving chip-effect #1 of 67 (10002002.1)...
20080402 17:58:02|     Defining chip-effect file...
20080402 17:58:02|      Pathname: plmData/OSCControlClassification,RBC,QN,RMA,merged/HuEx-1_0-st-v2/10002002.1,chipEffects.CEL
Error in list("fit(plm_osc_control, verbose = verbose)" = <environment>, :

[2008-04-02 17:58:02] Exception: The specified CDF structure is not compatible with the CEL file. The number of cells do not match: 327184 != 332929
at throw(Exception(...))
at throw.default("The specified CDF structure is not compatible with the CEL f
at throw("The specified CDF structure is not compatible with the CEL file. The
at setCdf.AffymetrixCelFile(res, cdf)
at setCdf(res, cdf)
at method(static, ...)
at clazz$fromDataFile(df, path = path, name = name, cdf = cdf, ..., verbose =
at method(static, ...)
at clazz$fromDataSet(dataSet = ds, path = getPath(this), cdf = cdfMono, verbos
at getChipEffectSet.ProbeLevelModel(this, verbose = verbose)
at NextMethod("getChipEffectSet", this, ...)
at getChipEffectSet.ExonRmaPlm(this, verbose = verbose)
at getChipEffectSet(this, verbose = verbose)
at findUnitsTodo.ProbeLevelModel(this, verbose = less(verbose))
at findUnitsTodo(this, verbose = less(verbose))
at fit.ProbeLevelModel(plm_osc_control, verb

[s2611722@ettin R_scripts]$ ls -al ../annotationData/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,U-Ensembl49,G-Affy*
-rw-r--r-- 1 s2611722 s2611722 46182259 Apr 2 14:13 ../annotationData/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,U-Ensembl49,G-Affy.cdf
-rw-rw-r-- 1 s2611722 s2611722 33329489 Apr 2 17:44 ../annotationData/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,U-Ensembl49,G-Affy,monocell.CDF
[

-------------------------------------
Alistair Chalk, Ph.D.

Research Fellow
Systems Biology Program
The Eskitis Institute for Cell and Molecular Therapies
Griffith University
Brisbane Innovation Park
Don Young Road
Brisbane, QLD 4111, Australia
http://informatics-eskitis.griffith.edu.au
http://www.eskitis.org.au/research/systems/systems.html

Office: +61 (7) 373 54411
Fax: +61 (7) 373 54255

Mark Robinson

unread,

Apr 2, 2008, 5:13:22 AM4/2/08

to aroma-af...@googlegroups.com

One clarification ...

> Two things to do ....
>
> 1. use force=TRUE in your 'fit' and 'process' calls ... that will
> start from scratch and overwrite the CEL files for the normalized
> data, chip effects, etc.
>
> 2. use tags ... with tags, you can process the same dataset with a
> bunch of different CDFs ... for example, the code below does BG
> adjustment, normalization on the 'main' set of probesets, but then
> does fits on the Ensembl 47 or 49 and stores the results in
> separate locations ...

I guess I should say you should do 1 OR 2, you don't need to do
both. There is no need to do 1 unless you really want to overwrite
your previous analysis. #2 is probably the preferred since it gives
you more flexibility (i.e. of running separate analysis and not
overwriting anything).