a possible bug in `subset` function


Zhengnan Cheng

Apr 1, 2024, 1:55:27 AM
to Cardinal MSI Help
Dear Cardinal developers,
 
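I hope this message finds you well. I am writing to report a possible performance regression in the Cardinal R package: subsetting is much slower in the newest version (3.4.3) than in an older one (3.0.1). On the same file, `subsetPixels` takes about 93.6 s elapsed in 3.4.3 versus about 1.5 s in 3.0.1, and `subsetFeatures` takes about 0.53 s versus about 0.05 s.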

Here is my test code:
library(Cardinal) |> suppressWarnings() |> suppressMessages()

file <- 'test.imzML'
print('readMSIData')
system.time({
    mse <- Cardinal::readMSIData(file, attach.only = FALSE)
})
print(mse)

print('subsetPixels')
system.time({
    submse <- subsetPixels(mse, run == runNames(mse)[1])
})

print('subsetFeatures')
system.time({
    submse <- subsetFeatures(mse, mz == mz(mse)[1])
})

sessionInfo()

Output with Cardinal_3.0.1:
[1] "readMSIData"
   user  system elapsed
  2.261   0.810   3.135
An object of class 'MSContinuousImagingExperiment'
  <1779 feature, 75924 pixel> imaging dataset
    imageData(1): intensity
    featureData(0):
    pixelData(0):
    metadata(12): spectrum representation ibd binary type ... files
        name
    run(1): test
    raster dimensions: 299 x 386
    coord(2): x = 1..299, y = 1..386
    mass range:  125.0001 to 1281.6032
    centroided: TRUE
[1] "subsetPixels"
   user  system elapsed
  1.016   0.475   1.505
[1] "subsetFeatures"
   user  system elapsed
  0.046   0.000   0.047
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base    

other attached packages:
[1] Cardinal_3.0.1      S4Vectors_0.36.2    EBImage_4.40.0    
[4] BiocParallel_1.32.6 BiocGenerics_0.44.0 ProtGenerics_1.30.0

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3    MASS_7.3-58.3     mclust_6.0.0      viridisLite_0.4.1
 [5] lattice_0.21-8    jpeg_0.1-10       rlang_1.1.1       fastmap_1.1.1    
 [9] tools_4.2.3       parallel_4.2.3    grid_4.2.3        biglm_0.9-2.1    
[13] Biobase_2.58.0    nlme_3.1-162      png_0.1-8         irlba_2.3.5.1    
[17] DBI_1.1.3         cli_3.6.1         matter_2.0.1      htmltools_0.5.5  
[21] abind_1.4-5       digest_0.6.31     Matrix_1.5-4.1    htmlwidgets_1.6.2
[25] bitops_1.0-7      fftwtools_0.9-11  codetools_0.2-19  signal_0.7-7    
[29] RCurl_1.98-1.12   tiff_0.1-11       sp_1.6-1          compiler_4.2.3  
[33] locfit_1.5-9.7

Output with Cardinal_3.4.3:
[1] "readMSIData"
   user  system elapsed
  5.024   0.423   5.579
An object of class 'MSContinuousImagingExperiment'
  <1779 feature, 75924 pixel> imaging dataset
    imageData(1): intensity
    featureData(0):
    pixelData(0):
    metadata(1): parse
    run(1): test
    raster dimensions: 299 x 386
    coord(2): x = 1..299, y = 1..386
    mass range:  125.0001 to 1281.6032
    centroided: TRUE
[1] "subsetPixels"
   user  system elapsed
 92.075   1.266  93.563
[1] "subsetFeatures"
   user  system elapsed
  0.530   0.002   0.533
R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base    

other attached packages:
[1] Cardinal_3.4.3      S4Vectors_0.40.2    EBImage_4.44.0    
[4] BiocParallel_1.36.0 BiocGenerics_0.48.1 ProtGenerics_1.34.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-164       cli_3.6.2          rlang_1.1.3        DBI_1.2.2        
 [5] mclust_6.1         tiff_0.1-12        png_0.1-8          RCurl_1.98-1.14  
 [9] htmltools_0.5.8    ontologyIndex_2.12 locfit_1.5-9.9     Biobase_2.62.0    
[13] grid_4.3.2         abind_1.4-5        MASS_7.3-60.0.1    bitops_1.0-7      
[17] fastmap_1.1.1      compiler_4.3.2     codetools_0.2-20   irlba_2.3.5.1    
[21] CardinalIO_1.0.0   htmlwidgets_1.6.4  fftwtools_0.9-11   lattice_0.22-6    
[25] digest_0.6.35      signal_1.8-0       viridisLite_0.4.2  matter_2.4.1      
[29] parallel_4.3.2     magrittr_2.0.3     Matrix_1.6-5       tools_4.3.2      
[33] jpeg_0.1-10        biglm_0.9-2.1    


I found that in the old version, loading the MSI data into memory (using `attach.only = FALSE`) made subset operations much faster, but in the newest version it makes no significant difference. The same slowdown occurs with bracket slicing (`[`).
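
A minimal sketch of the equivalent bracket-slicing test, assuming `mse` is the object loaded by the test code above and that a logical index built from `run(mse)` selects the same pixels as the `subsetPixels` call. The names `keep` and `submse2` are only illustrative, and no timings are shown for this variant.

```r
library(Cardinal) |> suppressWarnings() |> suppressMessages()

# Assumption: `mse` was already loaded with readMSIData() as in the test code.
# Build a logical index with one entry per pixel, TRUE for the first run.
keep <- run(mse) == runNames(mse)[1]

print('bracket slicing')
system.time({
    # Columns of the experiment are pixels, so this subsets pixels.
    submse2 <- mse[, keep]
})
```

If the slowdown is shared by `subsetPixels` and `[`, both timings should grow in the same way between the two versions.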

Your prompt attention to this matter would be highly appreciated. Thank you for your time and effort in maintaining the package.
 
Best regards,
Zhengnan Cheng