Error in groupComparisonTMT

sudip ghosh

Jul 5, 2021, 4:52:26 AM
to MSstats
Hi MSstatsTMT team,

I am comparing two groups (3+3, no reference channel) in a TMT 6-plex experiment with 8 fractions, each run twice (16 raw files in total). The files have FAIMS annotation.
groupComparisonTMT returns the following warnings:
Warning messages:
1: Model failed to converge with 1 negative eigenvalue: -6.0e-01
2: Model failed to converge with 1 negative eigenvalue: -4.4e-01
3: Model failed to converge with 1 negative eigenvalue: -8.7e-02
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  ... :
  unable to evaluate scaled gradient

and the output looks like this:

Protein Label log2FC SE DF pvalue adj.pvalue           issue
1  Q03265  M-RR     NA NA NA     NA         NA unfittableModel
2  Q9CXU0  M-RR     NA NA NA     NA         NA unfittableModel
3  Q9EPU4  M-RR     NA NA NA     NA         NA unfittableModel
4  P41158  M-RR     NA NA NA     NA         NA unfittableModel
5  O88351  M-RR     NA NA NA     NA         NA unfittableModel
6  P20029  M-RR     NA NA NA     NA         NA unfittableModel

I imported the PSM list from a PD search. However, proteinSummarization with the 'msstats' method did not work, whereas the median method ran without error. The 'msstats' run stops with:

Joining, by = c("Run", "Channel")
Summarizing for Run : 1_1 ( 1  of  2 )
Error in `[.data.table`(raw, , require.col) : 
  j (the 2nd argument inside [...]) is a single symbol but column name 'require.col' is not found. Perhaps you intended DT[, ..require.col]. This difference to data.frame is deliberate and explained in FAQ 1.1 

Please help me figure out where it is going wrong. I have attached the annotation file.

Best regards,

Sudip
annotation.txt
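
Note for readers who hit the same `[.data.table` message: it is what data.table raises when a variable holding column names is passed as `j` without the `..` prefix. The sketch below reproduces that situation on a made-up table. The table contents and the names `selected` and `plain_call_failed` are assumptions for illustration only, and the example says nothing about where inside MSstatsTMT the original call sits.

```r
# Toy reproduction of the data.table message above; this is not MSstatsTMT code.
selected <- NULL
plain_call_failed <- NA

if (requireNamespace("data.table", quietly = TRUE)) {
  raw <- data.table::data.table(
    Run = c("1_1", "1_2"),
    Channel = c("126", "127"),
    Intensity = c(10, 20)
  )
  require.col <- c("Run", "Channel")

  # On a data.table, a bare symbol in j is looked up as a column name.
  # Recent data.table versions stop here because no column is called 'require.col'.
  plain_call_failed <- inherits(try(raw[, require.col], silent = TRUE), "try-error")

  # Either of these selects the columns named in the variable instead.
  selected <- raw[, ..require.col]
  same <- raw[, require.col, with = FALSE]
  print(selected)
}
```

On a plain data.frame, `raw[, require.col]` would have worked, which is why the message points to the data.table FAQ.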

Mateusz Staniak

Jul 5, 2021, 8:23:18 AM
to MSstats
Hi,
Please also send us your function calls (so we can see all the parameters you used) and your session info.
Best,
Mateusz

sudip ghosh

Jul 5, 2021, 10:43:25 AM
to MSstats
Hi Mateusz,

Will the log file be ok for the session info? I am attaching it, and here are the scripts:

library(MSstats)
library(MSstatsTMT)
library(tidyr)
library(dplyr)
raw.pd <- read.delim("~/../bHPRP/210617_eIF6_MLL-AF9_TMT6_FAIMS_F_with_human_EIF6_PSMs.txt")
colnames(raw.pd)
length(unique(raw.pd$Protein.Accessions))
annot.pd <- read.delim(file="PD_Annotation.txt", header=TRUE)
runs <- unique(raw.pd$Spectrum.File) # MS runs
Run_info <- data.frame(Run = runs) # initialize the run file 
Run_info$Mixture <- ""
Run_info$TechRepMixture <- ""
Run_info$Fraction <- ""
write.csv(Run_info, file = "Run_info.csv", row.names = FALSE)
Run_info_filled <- read.delim(file = "Run_info_filled.txt")
head(Run_info_filled)
channels <- c("126", "127", "128", "129", "130", "131")
mixtures <- unique(Run_info_filled$Mixture)
Group_info <- expand.grid(channels, mixtures)
colnames(Group_info) <- c("Channel", "Mixture")
Group_info$Condition <- ""
Group_info$BioReplicate <- ""
write.table(Group_info, file = "Group_info.txt", row.names = FALSE)
Group_info_filled <- read.delim(file = "Group_info_filled.txt")
annotation <- full_join(Run_info_filled, Group_info_filled)
input.pd <- PDtoMSstatsTMTFormat(input = raw.pd, annotation = annotation)
quant.pd.median.nonorm <- proteinSummarization(data = input.pd,
                                        method="Median",
                                        global_norm=FALSE,
                                        reference_norm=FALSE,
                                        remove_norm_channel = FALSE,
                                        remove_empty_channel = TRUE)
save(quant.pd.median.nonorm, file='quant.pd.median.nonorm.rda')
dataProcessPlotsTMT(data.peptide = input.pd, # PSM-level data
                    data.summarization = quant.pd.median.nonorm, # protein-level data
                    type = 'ProfilePlot', # choice of visualization, can be used for specific protein to see effect
                    width = 21,
                    height = 7,
                    which.Protein = 'P56537')
unique(quant.pd.median.nonorm$Condition)
comparison<-matrix(c(-1,1),nrow=1)
colnames(comparison)<- c("M", "RR")
row.names(comparison)<-c("RR-M")

test_median.pd <- groupComparisonTMT(data = quant.pd.median.nonorm,
                                     contrast.matrix = "pairwise",
                                     moderated = TRUE, # do moderated t test
                                     adj.method = "BH") # multiple comparison adjustment

groupComparisonPlots(data=test_median.pd, 
                     type="VolcanoPlot", 
                     logBase.pvalue=10, 
                     ProteinName=TRUE, # only for small protein number
                     address="")
msstatstmt.log
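
Note for readers: the script above builds a `comparison` matrix but then passes `contrast.matrix = "pairwise"`, so the custom contrast is never used (the pairwise output in the first post is labelled M-RR, not RR-M). A minimal sketch of building and checking such a matrix follows. The condition names "M" and "RR" come from the script; the final call is left as a comment because it needs the summarized data, and whether a custom contrast is wanted at all is for the analyst to decide.

```r
# Build a one-row contrast for RR - M; condition names are taken from the script above.
conditions <- c("M", "RR")
comparison <- matrix(c(-1, 1), nrow = 1,
                     dimnames = list("RR-M", conditions))

# Sanity check: the weights of each contrast should sum to zero.
stopifnot(all(rowSums(comparison) == 0))
print(comparison)

# The matrix only takes effect if it is passed instead of "pairwise", e.g.
# test_median.pd <- groupComparisonTMT(data = quant.pd.median.nonorm,
#                                      contrast.matrix = comparison,
#                                      moderated = TRUE, adj.method = "BH")
```

The column names should match the values returned by `unique(quant.pd.median.nonorm$Condition)`.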

Mateusz Staniak

Jul 5, 2021, 1:43:39 PM
to MSstats
Hi,
is there a small subset of your data that you could share to reproduce the error with "msstats" summary option?
Best,
Mateusz

Mateusz Staniak

Jul 5, 2021, 1:49:35 PM
to MSstats
First thing I noticed: if your dataset has 8 fractions, please label fractions from 1 to 8. Does that fix the errors?
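
Note for readers: a rough sketch of a run-level table with fractions numbered 1 to 8 within each technical replicate. It assumes one mixture with two technical replicates, as the "1_1" and "1_2" run labels in this thread suggest. The file names are placeholders; real ones have to match the Spectrum.File values in the PSM export.

```r
# Hypothetical run table: 1 mixture x 2 technical replicates x 8 fractions = 16 runs.
# File names are placeholders, not the names used in the thread.
Run_info <- expand.grid(Fraction = 1:8, TechRepMixture = 1:2)
Run_info$Mixture <- 1
Run_info$Run <- sprintf("rep%d_F%d.raw", Run_info$TechRepMixture, Run_info$Fraction)
Run_info <- Run_info[, c("Run", "Mixture", "TechRepMixture", "Fraction")]

# Each technical replicate should list every fraction from 1 to 8 exactly once.
fractions_ok <- all(tapply(Run_info$Fraction, Run_info$TechRepMixture,
                           function(f) identical(sort(f), 1:8)))
print(fractions_ok)
head(Run_info)
```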

sudip ghosh

Jul 5, 2021, 3:03:30 PM
to MSstats
Hi Mateusz,

Thank you! The fraction information is in the annotation file, under the Fraction column. Is that what you mean? There are 12 entries for each fraction (6-plex * 2 replicates), which gives 12 * 8 = 96 rows. Is that all right for the annotation file?

If I make a PD search with 2 fractions will that work as a small subset? I can keep/exclude replicate runs depending on what you suggest.

PDtoMSstatsTMTFormat printed the following messages, in case they are relevant:

** Shared PSMs (assigned in multiple proteins) are removed.
** 75 features have 1 or 2 intensities across runs and are removed.
** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
** For peptides overlapped between fractions of 1_1, use the fraction with maximal average abundance.
** For peptides overlapped between fractions of 1_2, use the fraction with maximal average abundance.
** Fractions belonging to same mixture have been combined.

Best,
Sudip
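
Note for readers: the row count in the post above can be checked with a small sketch. It assumes one mixture, two technical replicates, 8 fractions and 6 channels. The run names, the assignment of channels to conditions and the BioReplicate labels are made up for the example.

```r
# Hypothetical annotation for 1 mixture, 2 technical replicates, 8 fractions, 6 channels.
runs <- expand.grid(Fraction = 1:8, TechRepMixture = 1:2, Mixture = 1)
runs$Run <- sprintf("rep%d_F%d.raw", runs$TechRepMixture, runs$Fraction)  # placeholder names

# Which channel carries which condition is an assumption made for this example.
groups <- data.frame(
  Mixture = 1,
  Channel = c("126", "127", "128", "129", "130", "131"),
  Condition = rep(c("M", "RR"), each = 3),
  BioReplicate = paste0(rep(c("M", "RR"), each = 3), "_", 1:3)
)

# Joining on Mixture repeats the 6 channel rows for each of the 16 runs.
annotation <- merge(runs, groups, by = "Mixture")
nrow(annotation)              # 16 * 6 = 96
table(table(annotation$Run))  # every run should appear 6 times
```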

Mateusz Staniak

Jul 6, 2021, 4:14:00 AM
to MSstats
Hi,
2 fractions should be OK as long as they will allow me to reproduce the error, thanks in advance.
Best,
Mateusz

sudip ghosh

Jul 7, 2021, 6:40:59 AM
to MSstats
Hi Mateusz,

Here is the small subset. I posted it a while ago but do not see that message in the thread, so I am not sure it reached you and am sending it again. The file size may have been the problem, so I have zipped the PSM text file. I see the same set of errors as with the original dataset.

best,
Sudip
annotation.txt
PD-PSM_F1-2_MSStatsTMT_test_SG_210707.txt.zip

Mateusz Staniak

Jul 7, 2021, 7:10:50 AM
to MSstats
Hi,
thanks, I'll see what the problem is
Best,
Mateusz

thuan...@gmail.com

Jul 7, 2021, 3:02:09 PM
to MSstats
Hi Sudip,

I ran the MSstatsTMT analysis on the dataset you shared. It did not return any error with either msstats or median summarization. The output of the groupComparisonTMT() function is shown in the attached screenshot:
Screen Shot 2021-07-07 at 2.59.17 PM.png

The error seems to come from the packages that MSstatsTMT depends on. Which version of MSstatsTMT are you using? Can you update MSstatsTMT to version 2.0.X?

Best,
Ting
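
Note for readers: one way to check whether an installed version is older than the 2.0.X series mentioned above. `needs_update` is a hypothetical helper written for this note, and "2.0.0" is assumed as the lower bound of 2.0.X.

```r
# Hypothetical helper: is the installed version older than the minimum wanted?
needs_update <- function(installed, minimum = "2.0.0") {
  package_version(installed) < package_version(minimum)
}

needs_update("1.4.6")   # TRUE
needs_update("2.0.1")   # FALSE

# Report the version actually installed, if the package is present.
if (requireNamespace("MSstatsTMT", quietly = TRUE)) {
  print(packageVersion("MSstatsTMT"))
  print(needs_update(as.character(packageVersion("MSstatsTMT"))))
}

# If an update is needed, Bioconductor packages are normally installed with:
# BiocManager::install("MSstatsTMT")
```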

sudip ghosh

Jul 7, 2021, 4:58:12 PM
to MSstats
Hi Ting,

Thank you! The version was 1.4.6; I have now updated to 2.0.1 and it is working for the small dataset! However, I get slightly different numbers for the proteins:
         Protein  Label       log2FC         SE       DF       pvalue  adj.pvalue
   1: A0A075B5T6 C vs T  0.263661555 0.04219303 6.602192 0.0005361110 0.005177326
   2: A0A087WQ44 C vs T  0.105044385 0.08209535 6.602192 0.2438329476 0.351233061
   3: A0A0B4J1G0 C vs T  0.008114524 0.05931118 6.602192 0.8952566974 0.926210575
   4: A0A0B4J1J6 C vs T  0.015911199 0.05323733 6.602192 0.7742237097 0.838328561
   5: A0A0J9YUD5 C vs T  0.235656306 0.03885065 6.602192 0.0006358604 0.005684356
  ---
5772:     Q9Z315 C vs T  0.170911814 0.05394889 6.602192 0.0170254254 0.047915123
5773:     Q9Z321 C vs T  0.050562348 0.05018940 6.602192 0.3492136063 0.461937779
5774:     S4R1W5 C vs T  0.068779094 0.03962126 6.602192 0.1287330617 0.216051564
5775:        sp. C vs T -0.007060302 0.12015833 6.602192 0.9548832292 0.969658985
5776:     V9GX81 C vs T -0.079979024 0.15041997 6.602192 0.6123417430 0.707275786

Here I used test.pairwise <- groupComparisonTMT(quant.pd.msstats, moderated = TRUE).
I also could not make the volcano plot; it returns this error:
Error in getOption("MSstatsLog")("MSstats - groupComparisonPlots function") : 
  attempt to apply non-function

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] MSstatsConvert_1.2.2 dplyr_1.0.7          tidyr_1.1.3         
[4] MSstatsTMT_2.0.1     MSstats_4.0.1       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7            pillar_1.6.1         
 [3] compiler_4.1.0        BiocManager_1.30.16  
 [5] nloptr_1.2.2.2        log4r_0.3.2          
 [7] bitops_1.0-7          tools_4.1.0          
 [9] boot_1.3-28           lme4_1.1-27.1        
[11] checkmate_2.0.0       preprocessCore_1.54.0
[13] lifecycle_1.0.0       tibble_3.1.2         
[15] gtable_0.3.0          nlme_3.1-152         
[17] lattice_0.20-44       pkgconfig_2.0.3      
[19] rlang_0.4.11          Matrix_1.3-4         
[21] ggrepel_0.9.1         caTools_1.18.2       
[23] gtools_3.9.2          generics_0.1.0       
[25] vctrs_0.3.8           lmerTest_3.1-3       
[27] grid_4.1.0            tidyselect_1.1.1     
[29] glue_1.4.2            data.table_1.14.0    
[31] R6_2.5.0              marray_1.70.0        
[33] fansi_0.5.0           survival_3.2-11      
[35] limma_3.48.1          minqa_1.2.4          
[37] ggplot2_3.3.5         purrr_0.3.4          
[39] magrittr_2.0.1        backports_1.2.1      
[41] gplots_3.1.1          scales_1.1.1         
[43] ellipsis_0.3.2        splines_4.1.0        
[45] MASS_7.3-54           colorspace_2.0-2     
[47] numDeriv_2016.8-1.1   KernSmooth_2.23-20   
[49] utf8_1.2.1            stringi_1.6.2        
[51] munsell_0.5.0         crayon_1.4.1 

thuan...@gmail.com

Jul 8, 2021, 12:16:09 PM
to MSstats
MSstatsTMT version 2.0.1 made several changes, including the function output formats. See https://groups.google.com/g/msstats/c/BJ3x_m6O6os for more details.

To make the volcano plot, you can use the following code:

groupComparisonPlots(data=test.pairwise$ComparisonResult,
                     type="VolcanoPlot", 
                     logBase.pvalue=10, 
                     ProteinName=TRUE, # only for small protein number
                     address="")

-Ting
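
Note for readers: scripts that have to run under both the old and the new output format could wrap the extraction in a small helper. This is only a sketch. `get_comparison_table` is a hypothetical name, and the assumption that older releases returned the table directly is inferred from the first post in this thread.

```r
# Hypothetical helper: return the comparison table from either output shape.
get_comparison_table <- function(res) {
  if (is.data.frame(res)) {
    return(res)                      # table returned directly (assumed older behaviour)
  }
  if (is.list(res) && !is.null(res$ComparisonResult)) {
    return(res$ComparisonResult)     # 2.0.X: table stored inside a list
  }
  stop("Unrecognized groupComparisonTMT output")
}

# Toy stand-ins for the two shapes (one row copied from the output posted above).
tab <- data.frame(Protein = "A0A075B5T6", Label = "C vs T", log2FC = 0.263661555)
old_style <- tab
new_style <- list(ComparisonResult = tab)

identical(get_comparison_table(old_style), get_comparison_table(new_style))
```

The result can then be passed as the `data` argument of groupComparisonPlots regardless of the installed version.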

sudip ghosh

Jul 8, 2021, 4:02:51 PM
to MSstats
Hi Ting,

Thanks. Unfortunately, the volcano plot still gives an error:
Error in getOption("MSstatsLog")("MSstats - groupComparisonPlots function") : 
  attempt to apply non-function

I think all the packages are updated now. Here is the session info:

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] dplyr_1.0.7      tidyr_1.1.3      MSstatsTMT_2.0.1 MSstats_4.0.1   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7            pillar_1.6.1          compiler_4.1.0       
 [4] BiocManager_1.30.16   nloptr_1.2.2.2        log4r_0.3.2          
 [7] bitops_1.0-7          tools_4.1.0           boot_1.3-28          
[10] lme4_1.1-27.1         checkmate_2.0.0       preprocessCore_1.54.0
[13] lifecycle_1.0.0       tibble_3.1.2          gtable_0.3.0         
[16] nlme_3.1-152          lattice_0.20-44       pkgconfig_2.0.3      
[19] rlang_0.4.11          Matrix_1.3-4          ggrepel_0.9.1        
[22] caTools_1.18.2        gtools_3.9.2          generics_0.1.0       
[25] vctrs_0.3.8           lmerTest_3.1-3        grid_4.1.0           
[28] tidyselect_1.1.1      glue_1.4.2            data.table_1.14.0    
[31] R6_2.5.0              marray_1.70.0         fansi_0.5.0          
[34] survival_3.2-11       limma_3.48.1          minqa_1.2.4          
[37] ggplot2_3.3.5         purrr_0.3.4           magrittr_2.0.1       
[40] backports_1.2.1       gplots_3.1.1          scales_1.1.1         
[43] ellipsis_0.3.2        splines_4.1.0         MASS_7.3-54          
[46] MSstatsConvert_1.2.2  colorspace_2.0-2      numDeriv_2016.8-1.1  
[49] KernSmooth_2.23-20    utf8_1.2.1            stringi_1.6.2        
[52] munsell_0.5.0         crayon_1.4.1 


/ Sudip

Mateusz Staniak

Jul 8, 2021, 4:24:03 PM
to MSstats
Hi,

Did you run library(MSstats) before calling groupComparisonPlots?
If that doesn't help, please run MSstatsConvert::MSstatsLogsSettings(FALSE, FALSE, FALSE).
If that still doesn't solve the problem, please run devtools::install_github("Vitek-Lab/MSstats", ref = "hotfix-gc-plots").

Best,
Mateusz
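
Note for readers: the "attempt to apply non-function" error is what R reports when the value returned by getOption() is called as a function but the option is not set, so NULL is being called. The sketch below reproduces this with a made-up option name, `demo.logger`. That MSstatsLogsSettings() fixes the plot call by setting the MSstatsLog option is an inference from the error text and the outcome in this thread, not something checked against the package source.

```r
# 'demo.logger' is a made-up option name used only for this illustration.
options(demo.logger = NULL)

# With the option unset, getOption() returns NULL and calling NULL fails.
unset_msg <- tryCatch(
  getOption("demo.logger")("hello"),
  error = function(e) conditionMessage(e)
)
print(unset_msg)

# Once the option holds a function, the same call succeeds.
options(demo.logger = function(msg) invisible(msg))
set_result <- getOption("demo.logger")("hello")
print(set_result)
```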

sudip ghosh

Jul 8, 2021, 4:38:44 PM
to MSstats
Thanks Mateusz!

Running MSstatsConvert::MSstatsLogsSettings(FALSE, FALSE, FALSE) worked!
But now it shows another message:
Warning message:
ggrepel: 2056 unlabeled data points (too many overlaps). Consider increasing max.overlaps 

So I changed to ProteinName=FALSE, but that does not return any plot and does not throw an error either!

Best,
Sudip

Mateusz Staniak

Jul 8, 2021, 4:41:46 PM
to MSstats
Hi,
That is actually a warning, so I would expect the function to still produce a plot in that case. Does it? We'll check the ProteinName=FALSE option, thanks.
Best,
Mateusz

sudip ghosh

Jul 8, 2021, 4:48:57 PM
to MSstats
Sorry, that was my mistake: I did not define the path correctly, so the plot was being saved in a different location. I see the plot now. Can I change the color definition, for example so that proteins with logFC > 1 and adj.pvalue < 0.001 are shown in red?

Another question: is there a simple way to output proteinSummarization$ProteinLevelData as a matrix with the individual channels in columns and the proteins in rows? Right now all the channels are in the same column.

Best,
Sudip

thuan...@gmail.com

Jul 9, 2021, 1:20:37 PM
to MSstats
You can use the parameters sig (FDR cutoff for the adjusted p-values) and FCcutoff (fold change cutoff) to change which points are colored red in the volcano plot.
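
Note for readers: a rough sketch of how the two cutoffs combine, on a made-up results table. It assumes FCcutoff is given on the fold-change scale, so that FCcutoff = 2 corresponds to |log2FC| > 1; please check the groupComparisonPlots help page before relying on that. The protein names and numbers are invented, and the `status` column is only an illustration of the rule, not what the plotting function computes internally.

```r
# Made-up results table; column names follow the ComparisonResult output shown earlier.
res <- data.frame(
  Protein    = c("P1", "P2", "P3", "P4"),
  log2FC     = c(1.8, -1.4, 0.3, 1.2),
  adj.pvalue = c(0.0002, 0.0005, 0.0001, 0.02)
)

sig <- 0.001      # cutoff on the adjusted p-value
FCcutoff <- 2     # fold-change cutoff; assumed to act as |log2FC| > log2(2) = 1

# A protein counts as significant only if it passes both cutoffs.
res$status <- ifelse(res$adj.pvalue < sig & abs(res$log2FC) > log2(FCcutoff),
                     ifelse(res$log2FC > 0, "up", "down"),
                     "not significant")
print(res)

# The corresponding plotting call might look like this (not run here):
# groupComparisonPlots(data = test.pairwise$ComparisonResult, type = "VolcanoPlot",
#                      sig = 0.001, FCcutoff = 2, address = "")
```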

The following code generates a wide table with the channels in columns:

library(dplyr)
library(tidyr)
# quant.pd.msstats is the object returned by proteinSummarization()
data <- quant.pd.msstats$ProteinLevelData
data_wide <- data %>%
  mutate(ID = paste(Run, Channel, sep = "-")) %>% 
  select(Protein, ID, Abundance) %>% 
  spread(ID, Abundance)
head(data_wide)

-Ting
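
Note for readers: if a numeric matrix is wanted rather than a wide data frame, base R can build one directly. The sketch below uses a made-up long table with the same column names as ProteinLevelData; the abundance values are invented.

```r
# Toy long-format table with the same columns as ProteinLevelData (values are made up).
long <- data.frame(
  Protein   = rep(c("P1", "P2"), each = 4),
  Run       = rep(c("1_1", "1_2"), each = 2, times = 2),
  Channel   = rep(c("126", "127"), times = 4),
  Abundance = c(10.1, 10.4, 9.8, 10.0, 12.3, 12.1, 11.9, 12.0)
)

# One column per Run-Channel pair, one row per protein; missing pairs become NA.
# Each cell holds a single value here, so mean() simply returns it.
id <- paste(long$Run, long$Channel, sep = "-")
abundance_matrix <- tapply(long$Abundance, list(long$Protein, id), mean)

dim(abundance_matrix)
abundance_matrix
```

Proteins end up as row names, which is the layout most heatmap and clustering functions expect.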

sudip ghosh

Jul 9, 2021, 5:47:33 PM
to MSstats
Thanks Ting, it worked.

Best,
Sudip
