MSstatsPTM - some errors for MaxQuant label-free datasets

MATTHEW DING

Mar 28, 2025, 1:16:26 AM

to MSstats

Dear All,

We are currently learning to use the "MSstatsPTM" package and want to apply it to our lab's MaxQuant label-free output.

We are testing the provided label-free workflow on the example data maxq_lf_annotation.rda and maxq_lf_evidence.rda (from the Vitek-Lab GitHub repository: https://github.com/Vitek-Lab/MSstatsPTM/tree/devel/data). As shown in Figure 1, the first step, data format conversion with MaxQtoMSstatsPTMFormat, produces a PTM list in which the site information is incorrect. For example, for Q9Y3B9 the phosphorylated S should be at protein position 8, not 34. The problem is not limited to this protein; it affects every entry in the output ProteinName column. (We also tried the TMT example data, maxq_tmt_annotation.rda and maxq_tmt_evidence.rda, and saw the same problem.)

In addition, as shown in Figure 2, a second error occurs when we use the groupComparisonPTM function to model the summarized PTM and protein datasets (the same provided label-free MaxQuant datasets from https://github.com/Vitek-Lab/MSstatsPTM/tree/devel/data). The error reports that the GROUP values of the summarized PTM and PROTEIN data do not match each other.

Attached: Figure 1, Data format conversion problem

Attached: Figure 2, Group comparison problem

Thank you for your kind help.

Best regards

Matthew Ding

Figure-2.png

Figure-1.png

Anthony Wu

Apr 2, 2025, 5:49:44 PM

to MSstats

Hi Matthew,

Thank you for bringing this to our attention.

Quick question - could you clarify which version of MSstatsPTM you are using? For reference, the latest release is version 2.8.1. For the first issue, I ran MSstatsPTM myself and got position 8, not 34, for the phosphorylated site on Q9Y3B9.

For the second issue, the error occurs because group H100_Y0 exists in the PTM dataset but has only NA values in the PROTEIN dataset after conversion to MSstatsPTM format, so that GROUP goes missing later during data processing. There are many missing values because of the `use_unmod_peptides` parameter, which uses the unmodified peptides in the PTM enriched dataset to construct the unmodified PROTEIN dataset. Instead of relying on `use_unmod_peptides`, we generally recommend running 2 separate experiments, one PTM enriched experiment and one global proteome experiment, since PTM enriched datasets are prone to having many missing values for unmodified peptides, as seen here.

I wouldn't consider this a bug in the code. Since this dataset is only an example in our vignette, I'll adjust it so that the groups in the PTM dataset match the groups in the PROTEIN dataset, which should ease downstream processing (I will likely remove the data associated with the H100_Y0 group altogether).
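One way to spot this kind of mismatch before modeling is to compare which groups still have measured intensities in each table. The sketch below is plain Python on made-up rows, not MSstatsPTM code: the column names `Condition` and `Intensity`, the group name `GroupA`, the intensity values, and both helper functions are assumptions for illustration only. Only the group name H100_Y0 comes from this thread.

```python
import math


def groups_with_data(rows):
    """Return the Condition values that have at least one non-missing Intensity."""
    found = set()
    for row in rows:
        value = row["Intensity"]
        # Treat both None and NaN as missing measurements.
        if value is not None and not math.isnan(value):
            found.add(row["Condition"])
    return found


def unmatched_groups(ptm_rows, protein_rows):
    """List groups that have usable data in one table but not in the other."""
    ptm = groups_with_data(ptm_rows)
    protein = groups_with_data(protein_rows)
    return {
        "ptm_only": sorted(ptm - protein),
        "protein_only": sorted(protein - ptm),
    }


# Hypothetical toy rows; real converted tables have many more columns.
ptm_rows = [
    {"Condition": "GroupA", "Intensity": 1200.0},
    {"Condition": "H100_Y0", "Intensity": 950.0},
]
protein_rows = [
    {"Condition": "GroupA", "Intensity": 30000.0},
    {"Condition": "H100_Y0", "Intensity": None},
    {"Condition": "H100_Y0", "Intensity": float("nan")},
]

report = unmatched_groups(ptm_rows, protein_rows)
print(report)
```

Any group listed under `ptm_only` has no usable values on the protein side, so it would need to be dropped from the PTM table or backed by a separate global proteome run before the two can be compared.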

Thanks,

Tony
