Annotation Issue


Montoni Bass

Jan 6, 2025, 7:46:52 PM
to MSstats
(I'm having trouble confirming that my message was posted. Sorry if multiple messages were sent.)

Hello! I'm running a multibatch 16-plex TMT experiment, but I'm having this issue:
screenshot.png

Could you please help me solve this issue?

With the best regards, 

Montoni.

Montoni Bass

Jan 6, 2025, 7:52:22 PM
to MSstats
Attached are the annotation file, the script, and the header of the evidence.txt file.



file.xlsx
msstatsTMT.txt
evidence.txt

Anthony Wu

Jan 16, 2025, 10:19:25 AM
to MSstats
Hey,

My initial guess is that in your annotation file you need to remove the extra space between "Reporter Intensity" and the channel number, i.e. "Reporter Intensity  1" (two spaces) is not the same as "Reporter Intensity 1" (one space). Also make sure the capitalization matches, i.e. "Reporter intensity" is not equal to "Reporter Intensity".
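
A quick way to spot this kind of mismatch is sketched below in base R. It is only an illustration: the two label vectors are made-up examples, and `normalize_label` is a hypothetical helper, not part of MSstats.

```r
# Hypothetical channel labels as they might appear in an annotation file.
annotation_channels <- c("Reporter Intensity  1", "Reporter intensity 2", " Reporter intensity 3")
# Hypothetical labels as they might appear in the evidence file.
evidence_channels <- c("Reporter intensity 1", "Reporter intensity 2", "Reporter intensity 3")

# Exact string comparison: labels that differ only by spacing or case show up here.
mismatched <- setdiff(annotation_channels, evidence_channels)
print(mismatched)

# Hypothetical helper: trim, collapse repeated whitespace, and lower-case a label.
normalize_label <- function(x) tolower(gsub("\\s+", " ", trimws(x)))

# TRUE means every mismatch disappears after normalization, i.e. it is only cosmetic.
only_cosmetic <- all(normalize_label(annotation_channels) %in% normalize_label(evidence_channels))
print(only_cosmetic)
```

If `only_cosmetic` is TRUE while `mismatched` is not empty, the labels differ only in spacing or capitalization and can be fixed by editing the annotation file.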

Other than that, without a subset of your evidence file, it's hard for me to diagnose your actual problem. If you still encounter issues, please send a subset of your evidence file with quantification values.

Tony

Montoni Bass

Jan 17, 2025, 10:31:01 AM
to MSstats
Dear Anthony,

Thank you for your answer! I tried re-running with the corrected name (e.g. "Reporter intensity 1"), but the issue persists. I also tried the format "Reporter.intensity.1", but neither of the two worked. I am attaching the evidence file with only a subset of the data. I made sure it also contains all the raw file names.

Thank you in advance for your help,

Montoni.
evidence.txt

Montoni Bass

Jan 23, 2025, 4:16:38 PM
to MSstats
Dear Anthony,

It seems I managed to fix the channel names in the annotation file, but I'm still facing some issues:

Warning message:
In melt.data.table(mq_input, measure.vars = channels, id.vars = c("ProteinName", :
  'measure.vars' [Reporterintensitycorrected1, Reporterintensitycorrected2, Reporterintensitycorrected3, Reporterintensitycorrected4, ...] are not all the same type. By hierarchy order, the resulting data value column will be of type 'double'. All variables that are not already of type 'double' will be coerced. Check the DETAILS in ?melt.data.table for more about coercion.

This is particularly strange because in my annotation file I am using the "Reporter intensity" columns, not the "Reporter intensity corrected" ones, so this warning can probably be ignored.

The only real issue I found is this:

INFO [2025-01-23 18:02:45] ** 'Norm' information in Condition is required for normalization. Please check it. At this moment, normalization is not performed.

But in my annotation file (attached) I did put "Norm" on the 16th channel. I am also sharing a better evidence.txt file.
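
As a sanity check on the annotation side, the base R sketch below counts how many channels are labeled "Norm" in each mixture. The toy table and its column names (`Mixture`, `Channel`, `Condition`) are assumptions for illustration, and the comparison is an exact, case-sensitive match on "Norm".

```r
# Toy annotation table; column names and values are assumed for illustration.
annotation <- data.frame(
  Mixture   = rep(c("Mixture1", "Mixture2"), each = 3),
  Channel   = rep(c("Reporter intensity 14", "Reporter intensity 15", "Reporter intensity 16"), times = 2),
  Condition = c("Ctrl", "Treated", "Norm", "Ctrl", "Treated", "norm")
)

# Number of channels whose Condition is exactly "Norm", per mixture.
norm_per_mixture <- tapply(annotation$Condition == "Norm", annotation$Mixture, sum)
print(norm_per_mixture)

# Mixtures without any exact "Norm" label (here the lower-case "norm" is not counted).
missing_norm <- names(norm_per_mixture)[norm_per_mixture == 0]
print(missing_norm)
```

A mixture listed in `missing_norm` has no reference channel labeled exactly "Norm" in this toy table; a clean result here does not rule out problems in the intensity values themselves.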

My main objective with MSstats is to normalize the results using the reference channel, align the medians, and impute missing values, and then use the protein-level data in third-party applications.


This is my code:

library(MSstatsPTM)
library(readxl)


maxq_tmt_evidence <- read.table("data/evidence_subset.txt", sep = "\t", header = TRUE)

maxq_tmt_annotation <- read_excel("data/file.xlsx")

head(maxq_tmt_evidence)
head(maxq_tmt_annotation)

msstats_format_tmt = MaxQtoMSstatsPTMFormat(evidence=maxq_tmt_evidence,
                                            annotation=maxq_tmt_annotation,
                                            fasta=('data/Musmusculus_uniprotkb_AND_reviewed_true_AND_model_o_2024_06_25.fasta'),
                                            fasta_protein_name="uniprot_ac",
                                            mod_id="\\(Phospho \\(STY\\)\\)",
                                            use_unmod_peptides=TRUE,
                                            labeling_type = "TMT",
                                            which_proteinid_ptm = "Proteins")




# Save the PTM data to a CSV file (the converter output is a list, so select the PTM element)
#write.csv(msstats_format_tmt$PTM, file = "Data/msstats_format_tmt_PTM.csv", row.names = FALSE)

# Save the PROTEIN data to a CSV file
#write.csv(msstats_format_tmt$PROTEIN, file = "Data/msstats_format_tmt_PROTEIN.csv", row.names = FALSE)

head(msstats_format_tmt$PROTEIN, n = 40)

# Summarize data using dataSummarizationPTM_TMT
summary_data_tmt <- dataSummarizationPTM_TMT(
  data = msstats_format_tmt,
  method = "msstats",
  global_norm = TRUE,
  reference_norm = TRUE,
  MBimpute = TRUE,
  verbose = TRUE
)

Thank you in advance!

With the best regards,

Montoni.


evidence_subset.txt
file.xlsx

Anthony Wu

Jan 23, 2025, 5:24:57 PM
to MSstats
Hi,

By default, the MaxQuant MSstatsPTM converter uses the ReporterIntensityCorrected columns for processing intensity values. I'm not sure how the ReporterIntensity and ReporterIntensityCorrected columns differ, but if you prefer to use ReporterIntensity, please let me know and I can look into enabling the converter to use ReporterIntensity.

Regardless, I looked at your data, and the intensities for the norm channel in the ReporterIntensityCorrected column are all <= 1 (and all <= 1 in the ReporterIntensity column, with the exception of one row), which leads to all of those values being treated as missing within MSstatsPTM. As a result, MSstatsPTM cannot normalize with respect to the norm channel, since those values are considered missing.
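
To run the same check on your own file, here is a minimal base R sketch. The toy data frame stands in for the evidence table, and the column names (as `read.table` would rename them, with dots instead of spaces) are assumptions; adjust them to your header.

```r
# Toy stand-in for a MaxQuant evidence table; column names and values are assumed.
evidence <- data.frame(
  Reporter.intensity.corrected.1  = c(15230.4, 9871.2, 0, 20411.9),
  Reporter.intensity.corrected.16 = c(1, 1, 0, 1)
)

# Fraction of rows in the assumed norm channel with an intensity of 1 or less.
norm_column <- "Reporter.intensity.corrected.16"
fraction_at_or_below_one <- mean(evidence[[norm_column]] <= 1, na.rm = TRUE)
print(fraction_at_or_below_one)

# The same fraction for every corrected reporter channel, for comparison.
channel_columns <- grep("^Reporter\\.intensity\\.corrected\\.", names(evidence), value = TRUE)
per_channel <- sapply(evidence[channel_columns], function(v) mean(v <= 1, na.rm = TRUE))
print(per_channel)
```

A fraction close to 1 for the norm channel would match what I saw in your data: almost every reference value ends up being treated as missing.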

Out of curiosity, is there any reason why the intensity values in your dataset are so low, especially for the norm channel?

Tony

Montoni Bass

Jan 23, 2025, 6:23:10 PM
to MSstats
Hello, 

Thank you for your answer! Actually, the default is fine for me. I can use Reporter intensity corrected with no problems.

Regarding the intensity level, now it all makes sense. By default, the lab I work in uses the option "Ratio to the reference channel", which makes the 16th channel equal to 1 in all cases. So, in short, I will adapt my intensity file to the ReporterIntensityCorrected column and re-run MaxQuant with no normalization at all. That will guarantee that I'm working with raw intensities only. I will report back on whether everything went fine after redoing the whole procedure!

Thank you in advance!

Montoni Bass

Feb 5, 2025, 6:43:36 PM
to MSstats
Hello! I successfully ran the code with the new annotation file after disabling the normalization in MaxQuant. However, I'm still facing some issues:

I received the following warning (50 in total):

Warning messages:
1: In max(by_score$score, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

Then, when I made a boxplot of my MSstats-format file to compare it with the normalized and imputed one, I realized that many of the samples are actually missing, as shown in the image:
msstatsformat.png

Can you please help me diagnose the problem? I am pretty sure that these samples are not NaN, because I also made a boxplot of the raw .txt file, so I don't know what the issue might be. Attached are the new subset of the evidence file and the annotation file used.

With the best regards,

Montoni.
evidence_subset.txt
file.xlsx

Anthony Wu

Feb 6, 2025, 6:00:14 PM
to MSstats
Hi,

What code are you using to generate these box plots? Is it from an MSstatsPTM function or did you use standard R plotting functions?

I ran your files through MSstatsPTM and after performing dataSummarizationPTM_TMT, I filtered the FeatureLevelData by the bioreplicate ID "Ctrl 30 min + FBS 2" (which seems to be missing based on your boxplot), but the filtered data table ended up having intensity values.  

Could you check whether your data table is also missing values for the bioreplicates that appear to be entirely missing? Below is my reanalysis, which you can follow too; it shows there were no missing values.

library(MSstatsPTM)
library(readxl)
library(tidyverse)

# Read the evidence subset and the annotation file
maxq_tmt_evidence <- read.table("evidence_subset.txt", sep = "\t", header = TRUE)
maxq_tmt_annotation <- read_excel("file.xlsx")

# Convert the MaxQuant output to MSstatsPTM format
msstats_format_tmt <- MaxQtoMSstatsPTMFormat(
  evidence = maxq_tmt_evidence, annotation = maxq_tmt_annotation,
  fasta = "idmapping.fasta", fasta_protein_name = "uniprot_ac",
  mod_id = "\\(Phospho \\(STY\\)\\)", use_unmod_peptides = TRUE,
  labeling_type = "TMT", which_proteinid_ptm = "Proteins"
)

# Summarize, then keep only the bioreplicate that looked empty in the boxplot
summarized_data <- dataSummarizationPTM_TMT(data = msstats_format_tmt)
filtered_table <- summarized_data$PTM$FeatureLevelData %>%
  filter(BioReplicate == "Ctrl 30 min + FBS 2")
head(filtered_table) # this is not empty
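
To check every bioreplicate at once instead of one at a time, something like the base R sketch below could help. The toy data frame stands in for the feature-level table; the column names `BioReplicate` and `log2Intensity` and the sample names are assumptions for illustration, so adjust them to what your table actually contains.

```r
# Toy stand-in for a feature-level table; column names and values are assumed.
feature_data <- data.frame(
  BioReplicate  = c("Ctrl 30 min + FBS 1", "Ctrl 30 min + FBS 1",
                    "Ctrl 30 min + FBS 2", "Ctrl 30 min + FBS 2"),
  log2Intensity = c(12.1, NA, 11.4, 13.0)
)

# Count the non-missing intensity values for each bioreplicate.
non_missing_counts <- tapply(!is.na(feature_data$log2Intensity),
                             feature_data$BioReplicate, sum)
print(non_missing_counts)

# Bioreplicates with no usable values at all (none in this toy example).
all_missing <- names(non_missing_counts)[non_missing_counts == 0]
print(all_missing)
```

If `all_missing` comes back empty for the real table, the gaps in the boxplot are more likely to come from the plotting step than from the data.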

Thanks,
Tony

Montoni Bass

Feb 11, 2025, 7:19:49 PM
to MSstats
Dear Anthony, 

I was using a Python script that was mistakenly ignoring some lines. Now I can see that all the samples have data. Sorry for the oversight!

Best regards,

Montoni.
