Error with featureSubset = 'highQuality' and remove_uninformative_feature_outlier=TRUE

Miguel Cosenza

Feb 15, 2021, 4:33:50 AM
to MSstats
Dear MSstats team,

I am reporting an error that I encounter while executing the dataProcess function:

> normalized_data <- dataProcess(msts_formated_data,
+                                logTrans=2,
+                                normalization="equalizeMedians",
+                                nameStandards=NULL,
+                                address="",
+                                fillIncompleteRows=TRUE,
+                                featureSubset="highQuality",
+                                remove_uninformative_feature_outlier=TRUE,
+                                n_top_feature=3,
+                                summaryMethod="TMP",
+                                equalFeatureVar=TRUE,
+                                censoredInt="NA",
+                                cutoffCensored="minFeature",
+                                MBimpute=FALSE,
+                                remove50missing=FALSE,
+                                maxQuantileforCensored=0.999,
+                                clusters=NULL)
** Flag uninformative feature and outliers by feature selection algorithm.
Analyzing features in L channel...
Joining, by = c("protein", "feature")
Joining, by = "protein"
Joining, by = "protein"
Identifying low-coverage features...
Fitting robust linear models...
Joining, by = "protein"
Identifying outliers and calculating feature variances...
Error: Must specify either `data` or `newdata` argument.
In addition: Warning message:
10 very small variances detected, have been offset away from zero 

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.4.0  tidyr_1.1.2    readr_1.4.0    here_1.0.1     dplyr_1.0.4   
[6] MSstats_3.22.0

loaded via a namespace (and not attached):
 [1] tinytex_0.29          gtools_3.8.2          statmod_1.4.35       
 [4] minpack.lm_1.2-1      tidyselect_1.1.0      xfun_0.20            
 [7] reshape2_1.4.4        purrr_0.3.4           splines_4.0.3        
[10] lattice_0.20-41       colorspace_2.0-0      vctrs_0.3.6          
[13] generics_0.1.0        doSNOW_1.0.19         snow_0.4-3           
[16] marray_1.68.0         survival_3.2-7        rlang_0.4.10         
[19] pillar_1.4.7          nloptr_1.2.2.2        glue_1.4.2           
[22] DBI_1.1.1             plyr_1.8.6            foreach_1.5.1        
[25] lifecycle_0.2.0       munsell_0.5.0         gtable_0.3.0         
[28] caTools_1.18.1        codetools_0.2-18      parallel_4.0.3       
[31] preprocessCore_1.52.1 broom_0.7.4           Rcpp_1.0.6           
[34] KernSmooth_2.23-18    scales_1.1.1          backports_1.2.1      
[37] limma_3.46.0          lme4_1.1-26           gplots_3.1.1         
[40] hms_1.0.0             ggplot2_3.3.3         stringi_1.5.3        
[43] ggrepel_0.9.1         grid_4.0.3            rprojroot_2.0.2      
[46] bitops_1.0-6          tools_4.0.3           magrittr_2.0.1       
[49] tibble_3.0.6          crayon_1.4.1          pkgconfig_2.0.3      
[52] MASS_7.3-53           ellipsis_0.3.1        Matrix_1.2-18        
[55] data.table_1.13.6     assertthat_0.2.1      minqa_1.2.4          
[58] iterators_1.0.13      R6_2.5.0              boot_1.3-25          
[61] nlme_3.1-150          compiler_4.0.3  

Many thanks in advance for taking a look.

Best wishes,
Miguel

Mateusz Staniak

Feb 15, 2021, 5:01:08 AM
to MSstats
Hi,


thanks for letting us know about the error.
Please install the development version by running devtools::install_github("MeenaChoi/MSstats") (you will need the devtools package for this) and upgrade the dependencies at the same time.
If the problem persists, please let me know. In that case, a snippet of your data that reproduces the error would be helpful.

Kind regards,
Mateusz Staniak

Miguel Cosenza

Feb 15, 2021, 5:29:09 AM
to MSstats
Hello Mateusz,

Many thanks for taking a look. The error persists after installing the development version.

I am sending a link with the data and the script to reproduce the error via direct e-mail.

Best,
Miguel

Mateusz Staniak

Feb 15, 2021, 7:30:55 AM
to MSstats
Hi,


currently, MSstats uses the broom package to process the output of the robust linear models fitted during feature selection.
However, for two proteins in your dataset:
- sp|O15371|EIF3D_HUMAN
- sp|P37108|SRP14_HUMAN
the model is rank deficient, and the broom package returns an error in this case.
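
As a rough illustration of what "rank deficient" means here, the following sketch fits an ordinary least-squares model with base R's lm() on made-up data. This is an assumption-laden stand-in: it is not the robust model MSstats fits internally, and the data and variable names are hypothetical. It only shows how perfectly collinear predictors leave a coefficient that cannot be estimated.

```r
# Toy data (hypothetical): x2 is exactly 2 * x1, so the two predictors are
# perfectly collinear and the design matrix is rank deficient.
toy <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1),
  x1 = c(1, 2, 3, 4, 5)
)
toy$x2 <- 2 * toy$x1

# Ordinary least squares as a stand-in for the robust fit used by MSstats.
fit <- lm(y ~ x1 + x2, data = toy)

# Number of coefficients requested: intercept, x1, x2.
n_coef <- length(coef(fit))

# The fit is rank deficient when its numerical rank is below that number.
is_rank_deficient <- fit$rank < n_coef

# The coefficient that cannot be estimated is reported as NA.
print(coef(fit))
print(is_rank_deficient)
```

Downstream code that assumes every coefficient was estimated can fail on such a fit, which is consistent with the behaviour described in this message.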
We will fix this problem in the next MSstats release. I will send you results from this version (including these two proteins) in a separate e-mail.
With the current version, excluding these two proteins from your analysis should work around the problem.
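
A minimal sketch of the exclusion workaround, assuming the input table uses the standard MSstats column name ProteinName. The toy table, its intensity values, and the third protein ID are hypothetical and only make the snippet self-contained.

```r
# Proteins reported as rank deficient in this thread.
problem_proteins <- c("sp|O15371|EIF3D_HUMAN", "sp|P37108|SRP14_HUMAN")

# Toy stand-in for the real input table (hypothetical values); only the
# ProteinName column matters for the filter.
toy_input <- data.frame(
  ProteinName = c("sp|O15371|EIF3D_HUMAN", "sp|P37108|SRP14_HUMAN",
                  "sp|P00000|EXAMPLE_HUMAN", "sp|P00000|EXAMPLE_HUMAN"),
  Intensity = c(1200, 950, 30000, 28000),
  stringsAsFactors = FALSE
)

# Keep only rows whose protein is not on the exclusion list.
filtered_input <- toy_input[!toy_input$ProteinName %in% problem_proteins, ]

# The filtered table would then be passed to dataProcess() as before, e.g.
# dataProcess(filtered_input, featureSubset = "highQuality",
#             remove_uninformative_feature_outlier = TRUE)
```

With real data, the same filter would be applied to the object passed to dataProcess (msts_formated_data in the first message).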

Kind regards,
Mateusz

antje dittmann

Mar 10, 2021, 6:43:22 AM
to MSstats
Hi Mateusz,

I'm seeing the same error:

> data_outlier_removed <- dataProcess(csv, 
+                                     featureSubset = "highQuality",
+                                     remove_uninformative_feature_outlier = TRUE,
+                                     MBimpute = TRUE,
+                                     remove50missing = TRUE)
** Log2 intensities under cutoff = 17.618  were considered as censored missing values.
** Log2 intensities = NA were considered as censored missing values.
** Flag uninformative feature and outliers by feature selection algorithm.
Analyzing features in L channel...
Joining, by = c("protein", "feature")
Joining, by = "protein"
Joining, by = "protein"
Identifying low-coverage features...
Fitting robust linear models...
Joining, by = "protein"
Identifying outliers and calculating feature variances...
Error: Must specify either `data` or `newdata` argument.

Would it be possible to get a fix for this error in the development version already? 
Kind regards,

Antje

Ben Polacco

May 19, 2021, 4:01:34 PM
to MSstats
While the devs work on a real fix, I modified a forked copy that catches the error, reports the problem proteins, marks all features from those proteins as non-outliers, and lets dataProcess run to completion. You can use the result as computed, or remove the problem proteins and rerun dataProcess. To install my version:

library(devtools)
devtools::install_github("bpolacco/MSstats", ref = "hot.fix.newDataAugmentBug")
 
To test that it worked, look for the 9999 as a minor version:

> packageVersion("MSstats")
[1] ‘3.23.1.9999’
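
For readers curious about the catch-and-continue idea described above, here is a rough sketch in plain R. It is not the code from the fork: flag_outliers() is a hypothetical stand-in for the internal MSstats step, the outlier rule is invented for illustration, and the toy data is made up. It only shows the pattern of catching a per-protein error, recording the protein, and marking its features as non-outliers.

```r
# Hypothetical per-protein outlier detector; it stands in for the internal
# MSstats step and fails for one protein to mimic the reported error.
flag_outliers <- function(protein_data) {
  if (protein_data$protein[1] == "P_bad") {
    stop("Must specify either `data` or `newdata` argument.")
  }
  # Invented rule for illustration: more than twice the median intensity.
  protein_data$is_outlier <-
    protein_data$intensity > 2 * median(protein_data$intensity)
  protein_data
}

# Toy data (hypothetical): one well-behaved protein and one that fails.
toy <- data.frame(
  protein   = c("P_ok", "P_ok", "P_ok", "P_bad", "P_bad"),
  feature   = c("f1", "f2", "f3", "f1", "f2"),
  intensity = c(10, 11, 50, 8, 9),
  stringsAsFactors = FALSE
)

problem_proteins <- character(0)

results <- lapply(split(toy, toy$protein), function(protein_data) {
  tryCatch(
    flag_outliers(protein_data),
    error = function(e) {
      # Record the failing protein and treat all its features as non-outliers.
      problem_proteins <<- c(problem_proteins, protein_data$protein[1])
      protein_data$is_outlier <- FALSE
      protein_data
    }
  )
})

flagged <- do.call(rbind, results)
message("Problem proteins: ", paste(problem_proteins, collapse = ", "))
```

The reported list of problem proteins can then be used either to accept the result as computed or to filter those proteins out and rerun.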