Data import using MaxQtoMSstatsPTMFormat fails


alkmetros1

Oct 27, 2021, 6:04:05 AM
to MSstats
Hello everybody,

I'm new to MSstats/MSstatsPTM and I'd like to use it for the analysis of Phospho-TMT data, which has been preprocessed using MaxQuant. I'm using TMTpro (i.e. 16 samples per plex), 4 plexes and 15 fractions per plex. The last position of the plex was used as an internal standard, using a mixture of all samples from the respective plex. There are no technical replicates.

Following the Bioconductor vignette, I tried to import the data using the MaxQtoMSstatsPTMFormat function. I started by reading in the Phospho (STY)Sites, Evidence and ProteinGroups files. However, when I call MaxQtoMSstatsPTMFormat, the process quits with the following error message:

raw.input <- MaxQtoMSstatsPTMFormat(sites.data = phospho.raw, annotation = anno, proteinGroups = pgroups, evidence = evi)

** + Contaminant, + Reverse, + Only.identified.by.site, PTMs are removed. 
Error in tstrsplit(PeptideSequence, ":", keep = 1) : 'keep' should contain integer values between 0 and 0.

I tried to backtrace the error, but was not successful. Did anybody ever encounter this issue before or does anybody have an idea how to solve it?
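
For readers who hit the same message: the bounds "between 0 and 0" suggest that tstrsplit had nothing to split, i.e. the table reaching that step had no rows. Below is a minimal sketch, assuming the tstrsplit in the error is the one from data.table. The empty vector is a hypothetical stand-in for the real PeptideSequence column, not the converter's actual data.

```r
library(data.table)

# Hypothetical stand-in: an empty PeptideSequence column, as would result
# if every row had been dropped before this step.
peptide_sequence <- character(0)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    tstrsplit(peptide_sequence, ":", keep = 1)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With at least one value, the same call succeeds and keeps the first piece.
ok <- tstrsplit("PEPTIDE:S5", ":", keep = 1)
print(ok[[1]])
```

If this is what happens inside the converter, checking the row count of the intermediate data after filtering would be a reasonable first debugging step.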

I have attached the first 20 rows of my evidence file as well as the complete annotation file. I would really appreciate it if someone could check those files to see whether I made a mistake when creating the annotation file. I struggled a bit with it, as I don't find it very straightforward, but I followed the guide provided on the MSstats website (https://msstats.org/wp-content/uploads/2021/05/Ting-AnnotationFilePreparation.pdf).

Thanks a lot for your support.

Best
Philipp
evidence_red.txt
annotation.csv

Devon Kohler

Oct 28, 2021, 9:09:49 AM
to MSstats
Hi Philipp,

Would you be able to send over example files for the Phospho (STY)Sites and ProteinGroups files? I don't see anything off in the ones you sent over, so I'd like to run the function myself and see what is causing the error.

Best,
Devon

alkmetros1

Oct 28, 2021, 6:05:19 PM
to MSstats
Hey Devon,

Thank you so much for your reply! I have attached reduced versions (first 100 rows) of the Phospho (STY)Sites and ProteinGroups files. I hope you can work with these.

Best 
Philipp

Phospho (STY)Sites_red.txt
proteinGroups_red.txt

alkmetros1

Nov 3, 2021, 5:54:55 AM
to MSstats
Hey Devon,

I just wanted to check whether you received the data and whether you have had a chance to test it.

Thank you so much for your support!

Best
Philipp

Devon Kohler

Nov 8, 2021, 10:16:57 AM
to MSstats
Hi Philipp,

Sorry this took a little longer than expected. I had to make some changes to the converter because it was expecting a different naming convention in the column names of the Phospho (STY)Sites file. I fixed this, and it now works with the data you sent over. Just make sure to update the `keyword` parameter as in the code below. I pushed these changes to Bioconductor as a bug fix, but they will take a day or two to propagate through their systems. Alternatively, feel free to download the package directly from GitHub.


Hope this fixes everything!

Best,
Devon

## Load data
annotation <- read.csv("annotation.csv")
evidence <- read.csv("evidence_red.txt", sep = "\t")

phospho_sites <- read.csv("Phospho (STY)Sites_red.txt", sep = "\t")
prot_groups <- read.csv("proteinGroups_red.txt", sep = "\t")

## Add mod into protein name
test <- MaxQtoMSstatsPTMFormat(phospho_sites, annotation, keyword = "Plex")
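
Before running the conversion, it may help to confirm that the `keyword` value actually appears in the column names of the sites file. This is a rough sketch in base R, assuming the keyword is matched against column names as the explanation above implies. The toy data frame and its column names are made up for illustration and will differ from a real MaxQuant export.

```r
# Hypothetical miniature of a Phospho (STY)Sites table; real files have
# many more columns and their intensity columns may be named differently.
phospho_toy <- data.frame(
  Proteins = c("P1", "P2"),
  Reporter.intensity.corrected.1.Plex1___1 = c(10, 20),
  Reporter.intensity.corrected.2.Plex1___1 = c(11, 21),
  check.names = FALSE
)

keyword <- "Plex"

# Columns whose names contain the keyword (literal match, not a regex).
matched <- grep(keyword, colnames(phospho_toy), value = TRUE, fixed = TRUE)
print(matched)

# Warn early if nothing matches, instead of failing later in the converter.
if (length(matched) == 0) {
  warning("No column names contain '", keyword, "'; check the keyword.")
}
```

Running the same grep on colnames(phospho_sites) shows which columns a given keyword would pick up in the real data.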

alkmetros1

Nov 9, 2021, 6:39:46 AM
to MSstats
Hey Devon,

Sorry, I accidentally sent my last answer only to you, so I'm posting it again publicly. Sorry for the inconvenience.

Thank you so much for your continuous effort and support. The import works now. The columns that should have been populated from the annotation file contained only NAs, but I was able to fix this manually. However, I still run into issues downstream when using the dataSummarizationPTM_TMT function on the combined PTM and Protein data:

PTM.sum <- dataSummarizationPTM_TMT(data = PTM.data, method = "msstats", verbose = T)

Error in checkHT(n, dx <- dim(x)) : invalid 'n' - must contain at least one non-missing element, got none.

I'm not sure whether this fits the error message, but looking at the data list (e.g. the PeptideSequence and PSM columns), I suspect there might be an issue with the naming convention during the import of the Protein data as well.

Bildschirmfoto 2021-11-09 um 12.20.29.png

Best
Philipp

Salvador Martinez

Mar 16, 2022, 7:43:00 PM
to MSstats
Hi,
I am experiencing the same issue. Does anybody know why this is happening?
Thanks!

Salva.

Salvador Martinez

Mar 16, 2022, 8:00:05 PM
to MSstats
I think I found the solution. The output of a TMT converter such as PDtoMSstatsTMTFormat is an object of class MSstatsValidated. Instead of passing that output directly to dataSummarizationPTM_TMT, convert it first with as.data.table(unclass(PTM.data)) and as.data.table(unclass(PROTEIN.data)), then pass the results as input to dataSummarizationPTM_TMT.
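
As a sketch of that workaround: the toy object below only imitates converter output by attaching the class name to a plain data frame (an S3 stand-in, not the real class from the MSstats converters), and the column names and values are made up.

```r
library(data.table)

# Hypothetical stand-in for converter output: a data frame carrying the
# MSstatsValidated class name. Real converter output has more columns.
PTM.data <- structure(
  data.frame(ProteinName = c("P1_S5", "P2_T9"), Intensity = c(100, 200)),
  class = c("MSstatsValidated", "data.frame")
)

# Drop the class attribute, then rebuild a plain data.table from the columns.
PTM.plain <- as.data.table(unclass(PTM.data))

print(class(PTM.plain))
```

The protein-level table would get the same treatment before both are passed on to dataSummarizationPTM_TMT.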
Hope it helps someone.

Salva.

alkmetros1

Mar 17, 2022, 3:31:48 PM
to MSstats
Thanks so much Salva, that's amazing! This seems to solve the issue!

Best
Philipp
