GcRmaBackgroundCorrection problem with HTA2.0


Angel Rubio

Aug 4, 2018, 11:16:03 AM
to aroma.affymetrix
Dear Henrik,

I am trying to run GcRmaBackgroundCorrection on HTA2.0 arrays. I built my own probeflat file, which I believe is correct (though I may be wrong). Running the code below produces the following error:



> bcgc <- GcRmaBackgroundCorrection(cs, tag=c("*","r11"),type="affinities");
> csBCgc <- process(bcgc,verbose=verbose,ram=40,safe =F);
Loading required namespace: gcrma
20180804 14:58:26|Background correcting data set...
20180804 14:58:27| Background correcting data set...
20180804 14:58:33|  Already background corrected for "optical" effects
20180804 14:58:33| Background correcting data set...done
20180804 14:58:38| Computing probe affinities (independent of data)...
(More stuff)
20180804 14:58:45|    Retrieving probe-sequence data...
20180804 14:58:45|     Chip type (full): EP_HTA-2_0,r
20180804 14:58:45|     Locating probe-tab file...
20180804 14:58:45|      Chip type: EP_HTA-2_0
      AffymetrixProbeTabFile:
      Name: EP_HTA-2_0
      Tags: 
      Full name: EP_HTA-2_0
      Pathname: annotationData/chipTypes/EP_HTA-2_0/EP_HTA-2_0_probe_tab
      File size: 496.13 MiB (520234631 bytes)
      Number of data rows: NA
      Columns [6]: 'unitName', 'probeXPos', 'probeYPos', 'probeInterrogationPosition', 'probeSequence', 'targetStrandedness'
      Number of text lines: NA
      AffymetrixCdfFile:
      Path: annotationData/chipTypes/EP_HTA-2_0
      Filename: EP_HTA-2_0,r.cdf
      File size: 140.18 MiB (146990714 bytes)
      Chip type: EP_HTA-2_0,r
      File format: v4 (binary; XDA)
      Dimension: 2572x2680
      Number of cells: 6892960
      Number of units: 97482
      Cells per unit: 70.71
      Number of QC units: 0
20180804 14:58:45|     Locating probe-tab file...done
20180804 14:58:45|     Validating probe-tab file against CDF...
20180804 14:58:46|      Number of records read: 1
20180804 14:58:46|      Data read:
      'data.frame': 1 obs. of  1 variable:
       $ unitName: chr "TC01000001.hg_1"
20180804 14:58:46|      Unit name:
       chr "TC01000001.hg_1"
20180804 14:58:46|      Unit index: 1
        probeXPos probeYPos             probeSequence
      1        84      1383 GGGGAAGGGCATGCCTGGCATCACC
20180804 14:58:46|      (x,y):
      [1]   84 1383
20180804 14:58:46|     Validating probe-tab file against CDF...done
20180804 14:58:46|     Reading (x,y,sequence) data...
20180804 14:59:48|     Reading (x,y,sequence) data...done
20180804 14:59:48|     Validating (x,y) against CDF dimension...
20180804 14:59:48|      CDF dimension:
         nbrOfRows nbrOfColumns 
              2572         2680 
[2018-08-04 14:59:48] Exception: Detected probe x position out of range [0,2572]: annotationData/chipTypes/EP_HTA-2_0/EP_HTA-2_0_probe_tab

  at #08. getProbeSequenceData.AffymetrixCdfFile(this, safe = safe, verbose = verbose)
          - getProbeSequenceData.AffymetrixCdfFile() is in environment 'aroma.affymetrix'

  at #07. getProbeSequenceData(this, safe = safe, verbose = verbose)
          - getProbeSequenceData() is in environment 'aroma.affymetrix'

  at #06. computeAffinities.AffymetrixCdfFile(cdf, ..., verbose = less(verbose))
          - computeAffinities.AffymetrixCdfFile() is in environment 'aroma.affymetrix'

  at #05. computeAffinities(cdf, ..., verbose = less(verbose))
          - computeAffinities() is in environment 'aroma.affymetrix'

  at #04. calculateAffinities.GcRmaBackgroundCorrection(this, verbose = less(verbose))
          - calculateAffinities.GcRmaBackgroundCorrection() is in environment 'aroma.affymetrix'

  at #03. calculateAffinities(this, verbose = less(verbose))
          - calculateAffinities() is in environment 'aroma.affymetrix'

  at #02. process.GcRmaBackgroundCorrection(bcgc, verbose = verbose, ram = 40, 
              safe = F)
          - process.GcRmaBackgroundCorrection() is in environment 'aroma.affymetrix'

  at #01. process(bcgc, verbose = verbose, ram = 40, safe = F)
          - process() is in environment 'aroma.core'

Error: Detected probe x position out of range [0,2572]: annotationData/chipTypes/EP_HTA-2_0/EP_HTA-2_0_probe_tab
20180804 14:59:48|     Validating (x,y) against CDF dimension...done
20180804 14:59:48|    Retrieving probe-sequence data...done
20180804 14:59:48|   Reading probe-sequence data...done
20180804 14:59:48|  Computing GCRMA probe affinities...done
20180804 14:59:48| Computing probe affinities (independent of data)...done
20180804 14:59:48|Background correcting data set...done


So the complaint is that there are probes with an "x position out of range [0,2572]". To my surprise, I found that there indeed are!
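
One way to look at the coordinate ranges in a probe-tab file is sketched below in base R. It is only an illustration: it assumes a tab-separated file with a header line using the column names shown in the log (reduced here to four of them), and it reads a small inline sample instead of the real file. Only the first data row comes from the log; the other two rows and their unit names are made up.

```r
# Inline stand-in for a tab-separated probe-tab file. Only the first data row
# is taken from the log above; the other rows are made up for illustration.
probe_tab_text <- paste(
  "unitName\tprobeXPos\tprobeYPos\tprobeSequence",
  "TC01000001.hg_1\t84\t1383\tGGGGAAGGGCATGCCTGGCATCACC",
  "hypothetical_unit_2\t2679\t10\tACGTACGTACGTACGTACGTACGTA",
  "hypothetical_unit_3\t0\t2571\tTTTTACGTACGTACGTACGTACGTA",
  sep = "\n"
)

# Parse the sample; for a real file, pass its path instead of 'text ='.
probes <- read.delim(text = probe_tab_text, stringsAsFactors = FALSE)

# Observed coordinate ranges and the number of x positions above 2572.
x_range <- range(probes$probeXPos)
y_range <- range(probes$probeYPos)
n_x_above_2572 <- sum(probes$probeXPos > 2572)

print(list(x_range = x_range, y_range = y_range,
           n_x_above_2572 = n_x_above_2572))
```

For the real file, the `text =` argument would be replaced by the pathname reported in the log, provided the file really is laid out as assumed here.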

To check, I downloaded the unsupported HTA-2_0-r1-exon_cdf.zip file from Affymetrix, changed the periods in the CDF file name to commas, and ran

cdfGFile <- file.path(Path, "HTA-2_0,r1,exon.cdf")
system.time(df <- readCdfDataFrame(cdfGFile)) # Takes more than 30 minutes on my laptop...

The resulting df data.frame contains df$x values outside the range 0:2572. It seems that the dimensions of the array are somehow switched, which makes the sanity check in getProbeSequenceData.AffymetrixCdfFile throw an error.

So the official (albeit unsupported) CDF from Affymetrix does contain probes with an x position larger than 2572, and the dimensions seem to be switched. I was wondering whether this is some kind of bug: Affymetrix arrays used to be square, so the ranges for x and y were identical and a swap of the two dimensions would have gone unnoticed.
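
To make the suspicion concrete, here is a minimal sketch in base R. It uses the dimensions reported in the log (nbrOfRows = 2572, nbrOfColumns = 2680) and a made-up data frame standing in for the x and y columns of the readCdfDataFrame() output. The helper in_range() is a hypothetical name, and this is not the actual check inside aroma.affymetrix. It also assumes that x indexes columns and y indexes rows, both zero-based.

```r
# Dimensions as reported in the log above (nbrOfRows, nbrOfColumns).
dim_cdf <- c(nbrOfRows = 2572L, nbrOfColumns = 2680L)

# Hypothetical stand-in for the (x, y) columns of the CDF data frame:
# the four corner cells of a grid with 2680 columns and 2572 rows.
df <- data.frame(
  x = c(0L, 2679L, 0L, 2679L),
  y = c(0L, 0L, 2571L, 2571L)
)

# TRUE when every value lies in the zero-based range [0, n - 1].
in_range <- function(v, n) all(v >= 0L & v < n)

# x compared against the number of rows: fails for a non-square array.
x_vs_rows <- in_range(df$x, dim_cdf[["nbrOfRows"]])
# x compared against the number of columns: passes.
x_vs_cols <- in_range(df$x, dim_cdf[["nbrOfColumns"]])
# y compared against the number of rows: passes.
y_vs_rows <- in_range(df$y, dim_cdf[["nbrOfRows"]])

print(c(x_vs_rows = x_vs_rows, x_vs_cols = x_vs_cols, y_vs_rows = y_vs_rows))
```

On a square array both comparisons for x give the same answer, which is why a mix-up of rows and columns would only show up on a rectangular chip such as this one.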

Is there any way to circumvent this sanity check, since everything else seems to be working properly?
Best regards,

Angel

Session Info:
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] future_1.8.0               limma_3.34.9               prodlim_2018.04.18        
 [4] RBGL_1.54.0                graph_1.56.0               Matrix_1.2-14             
 [7] stringr_1.3.1              MASS_7.3-50                igraph_1.2.1              
[10] SGSeq_1.12.0               SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[13] matrixStats_0.53.1         Rsamtools_1.30.0           Biostrings_2.46.0         
[16] XVector_0.18.0             GenomicFeatures_1.30.3     AnnotationDbi_1.40.0      
[19] Biobase_2.38.0             GenomicRanges_1.30.2       GenomeInfoDb_1.14.0       
[22] IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       
[25] aroma.light_3.8.0          aroma.affymetrix_3.1.1     aroma.core_3.1.2          
[28] R.devices_2.15.1           R.filesets_2.12.1          R.utils_2.6.0             
[31] R.oo_1.22.0                affxparser_1.50.0          R.methodsS3_1.7.1         
[34] BiocInstaller_1.28.0      

loaded via a namespace (and not attached):
 [1] httr_1.3.1               RMySQL_0.10.14           splines_3.4.2           
 [4] bit64_0.9-7              assertthat_0.2.0         affy_1.56.0             
 [7] blob_1.1.1               GenomeInfoDbData_1.0.0   gcrma_2.50.0            
[10] yaml_2.1.19              progress_1.2.0           globals_0.11.0          
[13] RSQLite_2.1.0            lattice_0.20-35          RUnit_0.4.31            
[16] digest_0.6.15            preprocessCore_1.40.0    XML_3.98-1.11           
[19] pkgconfig_2.0.1          biomaRt_2.34.2           listenv_0.7.0           
[22] zlibbioc_1.24.0          aroma.apd_0.6.0          affyio_1.48.0           
[25] lava_1.6.1               BiocParallel_1.12.0      survival_2.42-3         
[28] magrittr_1.5             crayon_1.3.4             memoise_1.1.0           
[31] R.cache_0.13.0           R.rsp_0.42.0             tools_3.4.2             
[34] prettyunits_1.0.2        hms_0.4.2                compiler_3.4.2          
[37] rlang_0.2.1              grid_3.4.2               RCurl_1.95-4.10         
[40] rstudioapi_0.7           R.huge_0.9.0             bitops_1.0-6            
[43] base64enc_0.1-3          DNAcopy_1.52.0           codetools_0.2-15        
[46] DBI_0.8                  PSCBS_0.63.0             R6_2.2.2                
[49] GenomicAlignments_1.14.1 rtracklayer_1.38.3       future.apply_0.2.0      
[52] bit_1.1-12               stringi_1.2.2            Rcpp_0.12.17            
