I am trying to run GcRmaBackgroundCorrection on HTA 2.0 arrays. I built my own probeflat file, which I think is correct (but I may be wrong). After running this code I get the following error:
> bcgc <- GcRmaBackgroundCorrection(cs, tag=c("*","r11"),type="affinities");
> csBCgc <- process(bcgc,verbose=verbose,ram=40,safe =F);
Loading required namespace: gcrma
20180804 14:58:26|Background correcting data set...
20180804 14:58:27| Background correcting data set...
20180804 14:58:33| Already background corrected for "optical" effects
20180804 14:58:33| Background correcting data set...done
20180804 14:58:38| Computing probe affinities (independent of data)...
(More stuff)
20180804 14:58:45| Retrieving probe-sequence data...
20180804 14:58:45| Chip type (full): EP_HTA-2_0,r
20180804 14:58:45| Locating probe-tab file...
20180804 14:58:45| Chip type: EP_HTA-2_0
AffymetrixProbeTabFile:
Name: EP_HTA-2_0
Tags:
Full name: EP_HTA-2_0
Pathname: annotationData/chipTypes/EP_HTA-2_0/EP_HTA-2_0_probe_tab
File size: 496.13 MiB (520234631 bytes)
Number of data rows: NA
Columns [6]: 'unitName', 'probeXPos', 'probeYPos', 'probeInterrogationPosition', 'probeSequence', 'targetStrandedness'
Number of text lines: NA
AffymetrixCdfFile:
Path: annotationData/chipTypes/EP_HTA-2_0
Filename: EP_HTA-2_0,r.cdf
File size: 140.18 MiB (146990714 bytes)
Chip type: EP_HTA-2_0,r
File format: v4 (binary; XDA)
Dimension: 2572x2680
Number of cells: 6892960
Number of units: 97482
Cells per unit: 70.71
Number of QC units: 0
20180804 14:58:45| Locating probe-tab file...done
20180804 14:58:45| Validating probe-tab file against CDF...
20180804 14:58:46| Number of records read: 1
20180804 14:58:46| Data read:
'data.frame': 1 obs. of 1 variable:
$ unitName: chr "TC01000001.hg_1"
20180804 14:58:46| Unit name:
chr "TC01000001.hg_1"
20180804 14:58:46| Unit index: 1
probeXPos probeYPos probeSequence
1 84 1383 GGGGAAGGGCATGCCTGGCATCACC
20180804 14:58:46| (x,y):
[1] 84 1383
20180804 14:58:46| Validating probe-tab file against CDF...done
20180804 14:58:46| Reading (x,y,sequence) data...
20180804 14:59:48| Reading (x,y,sequence) data...done
20180804 14:59:48| Validating (x,y) against CDF dimension...
20180804 14:59:48| CDF dimension:
nbrOfRows nbrOfColumns
2572 2680
[2018-08-04 14:59:48] Exception: Detected probe x position out of range [0,2572]: annotationData/chipTypes/EP_HTA-2_0/EP_HTA-2_0_probe_tab
at #08. getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose)
- getProbeSequenceData.AffymetrixCdfFile() is in environment 'aroma.affymetrix'
at #07. getProbeSequenceData(this, safe = safe, verbose = verbose)
- getProbeSequenceData() is in environment 'aroma.affymetrix'
at #06. computeAffinities.AffymetrixCdfFile(cdf, ..., verbose = less(verbose))
- computeAffinities.AffymetrixCdfFile() is in environment 'aroma.affymetrix'
at #05. computeAffinities(cdf, ..., verbose = less(verbose))
- computeAffinities() is in environment 'aroma.affymetrix'
at #04. calculateAffinities.GcRmaBackgroundCorrection(this, verbose = less(verbose))
- calculateAffinities.GcRmaBackgroundCorrection() is in environment 'aroma.affymetrix'
at #03. calculateAffinities(this, verbose = less(verbose))
- calculateAffinities() is in environment 'aroma.affymetrix'
at #02. process.GcRmaBackgroundCorrection(bcgc, verbose = verbose, ram = 40,
safe = F)
- process.GcRmaBackgroundCorrection() is in environment 'aroma.affymetrix'
at #01. process(bcgc, verbose = verbose, ram = 40, safe = F)
- process() is in environment 'aroma.core'
Error: Detected probe x position out of range [0,2572]: annotationData/chipTypes/EP_HTA-2_0/EP_HTA-2_0_probe_tab
20180804 14:59:48| Validating (x,y) against CDF dimension...done
20180804 14:59:48| Retrieving probe-sequence data...done
20180804 14:59:48| Reading probe-sequence data...done
20180804 14:59:48| Computing GCRMA probe affinities...done
20180804 14:59:48| Computing probe affinities (independent of data)...done
20180804 14:59:48|Background correcting data set...done
So it seems that there are probes with "x position out of range [0,2572]". To my surprise, I found that there indeed are.
I checked this against the unsupported HTA-2_0-r1-exon_cdf.zip file downloaded from Affymetrix (after changing periods to commas), and the resulting df data.frame contains df$x values outside the range 0:2572. It seems that the dimensions of the array are somehow switched, so the sanity check in getProbeSequenceData.AffymetrixCdfFile throws an error.
That is, even in the official (albeit unsupported) CDF from Affy, there are probes with an x position larger than 2572, and the dimensions seem to be switched. I was wondering whether this is some kind of bug: Affy arrays used to be square, so the ranges for x and y were identical.
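To make the suspicion concrete, here is a minimal sketch in base R. It is not aroma.affymetrix code: check_xy_range() is a hypothetical helper, the coordinates are toy values (only the first pair is taken from the log above), and it assumes the usual convention that x runs along columns and y along rows, both zero-based. It counts out-of-range coordinates under that convention and under the swapped one.

```r
# Hypothetical helper (not part of aroma.affymetrix): count out-of-range
# probe coordinates under two interpretations of the array dimensions.
check_xy_range <- function(df, nbr_of_rows, nbr_of_columns) {
  # Number of values outside the zero-based range [0, n - 1]
  out_of_range <- function(v, n) sum(v < 0 | v >= n)
  c(
    x_vs_columns = out_of_range(df$x, nbr_of_columns),  # assumed convention
    y_vs_rows    = out_of_range(df$y, nbr_of_rows),     # assumed convention
    x_vs_rows    = out_of_range(df$x, nbr_of_rows),     # swapped dimensions
    y_vs_columns = out_of_range(df$y, nbr_of_columns)   # swapped dimensions
  )
}

# Toy coordinates for illustration; (84, 1383) is the probe shown in the log,
# the other two are made up to sit near the edges of a 2572 x 2680 array.
df <- data.frame(x = c(84, 2600, 2679), y = c(1383, 10, 2571))

# Dimensions as reported in the log: nbrOfRows = 2572, nbrOfColumns = 2680
res <- check_xy_range(df, nbr_of_rows = 2572, nbr_of_columns = 2680)
print(res)
```

With these toy values, x is in range when compared against the number of columns but not when compared against the number of rows, which is the pattern I would expect if the check mixes up the two dimensions on a non-square array.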
Is there any way to circumvent this sanity check, since everything else seems to be working properly?
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] future_1.8.0 limma_3.34.9 prodlim_2018.04.18
[4] RBGL_1.54.0 graph_1.56.0 Matrix_1.2-14
[7] stringr_1.3.1 MASS_7.3-50 igraph_1.2.1
[10] SGSeq_1.12.0 SummarizedExperiment_1.8.1 DelayedArray_0.4.1
[13] matrixStats_0.53.1 Rsamtools_1.30.0 Biostrings_2.46.0
[16] XVector_0.18.0 GenomicFeatures_1.30.3 AnnotationDbi_1.40.0
[19] Biobase_2.38.0 GenomicRanges_1.30.2 GenomeInfoDb_1.14.0
[22] IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
[25] aroma.light_3.8.0 aroma.affymetrix_3.1.1 aroma.core_3.1.2
[28] R.devices_2.15.1 R.filesets_2.12.1 R.utils_2.6.0
[31] R.oo_1.22.0 affxparser_1.50.0 R.methodsS3_1.7.1
[34] BiocInstaller_1.28.0
loaded via a namespace (and not attached):
[1] httr_1.3.1 RMySQL_0.10.14 splines_3.4.2
[4] bit64_0.9-7 assertthat_0.2.0 affy_1.56.0
[7] blob_1.1.1 GenomeInfoDbData_1.0.0 gcrma_2.50.0
[10] yaml_2.1.19 progress_1.2.0 globals_0.11.0
[13] RSQLite_2.1.0 lattice_0.20-35 RUnit_0.4.31
[16] digest_0.6.15 preprocessCore_1.40.0 XML_3.98-1.11
[19] pkgconfig_2.0.1 biomaRt_2.34.2 listenv_0.7.0
[22] zlibbioc_1.24.0 aroma.apd_0.6.0 affyio_1.48.0
[25] lava_1.6.1 BiocParallel_1.12.0 survival_2.42-3
[28] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
[31] R.cache_0.13.0 R.rsp_0.42.0 tools_3.4.2
[34] prettyunits_1.0.2 hms_0.4.2 compiler_3.4.2
[37] rlang_0.2.1 grid_3.4.2 RCurl_1.95-4.10
[40] rstudioapi_0.7 R.huge_0.9.0 bitops_1.0-6
[43] base64enc_0.1-3 DNAcopy_1.52.0 codetools_0.2-15
[46] DBI_0.8 PSCBS_0.63.0 R6_2.2.2
[49] GenomicAlignments_1.14.1 rtracklayer_1.38.3 future.apply_0.2.0
[52] bit_1.1-12 stringi_1.2.2 Rcpp_0.12.17