MSstatsPTM: normalization and output file


Peter

Jan 15, 2024, 4:47:10 AM
to MSstats
Dear MSstats team,

I am currently using the MSstatsPTM package to analyze and compare PTMs enriched from different organ samples. 

My setup includes 2 biological replicates of each condition, covering 6 different tissues/organs, plus a single mixed-sample channel (total number of samples = 13). All samples were fractionated into 20 fractions. There are no technical replicates.
I have modified and unmodified samples and converted the PSM file output from Proteome Discoverer using PDtoMSstatsPTMFormat.

I wanted to ask which type of normalization you would recommend for this type of experiment.
So far, I have run the analysis with global_norm = FALSE, since protein abundances differ considerably between conditions.
I only have 1 mixture, so I do not need reference_norm for normalization across TMT mixtures. I therefore also ran the analysis with reference_norm = FALSE.
I do have the mixed sample as a reference channel, which is essentially all the other samples mixed equally and then TMT-labelled. Would it make sense to use it for reference_norm?


Secondly, I have a few features in the final output (Model$Adjusted.model and Model$PTM.model) that do not have a PTM. An example:
A2AAJ9_C916
A2ABU4
A2ABU4_149
A2ABU4_292
A2ADF7
I found a previous question in the Google group that mentioned the same issue (MSstatsPTM site collapse; 13. February 2023). Is the solution to filter out these features before using MSstatsPTM?


Thank you for your time and efforts. 
Looking forward to hearing from you.

All the best, 
Peter

Devon Kohler

Jan 15, 2024, 10:31:06 AM
to MSstats
Hi Peter,

In terms of normalization, setting global_norm = TRUE will equalize the median of all channels in your mixture (see plot below). This is a good choice if you expect the majority of your PTMs (or proteins) not to change across conditions (e.g., in a discovery experiment). If your experiment is more targeted and you expect to see many differential PTMs, then using this option could remove the biological variation you are interested in.

(attached image: norm.JPG)
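
To make the effect of equalizing channel medians concrete, here is a minimal base R sketch on made-up log2 intensities. The data frame `toy`, its column names, and the choice of target (the median of the channel medians) are assumptions for illustration only; this is not the MSstatsPTM implementation.

```r
# Made-up log2 intensities for three TMT channels (hypothetical values).
toy <- data.frame(
  Channel = rep(c("126", "127N", "127C"), each = 5),
  log2Intensity = c(9.1, 9.8, 10.0, 10.4, 11.2,    # channel 126
                    10.2, 10.9, 11.1, 11.6, 12.0,  # channel 127N
                    8.7, 9.2, 9.5, 9.9, 10.6),     # channel 127C
  stringsAsFactors = FALSE
)

# Median of each channel, and a common target to shift them to
# (assumed here: the median of the channel medians).
channel_median <- tapply(toy$log2Intensity, toy$Channel, median)
target <- median(channel_median)

# Shift each channel by a constant so that its median lands on the target.
shift <- as.numeric(channel_median[as.character(toy$Channel)]) - target
toy$normalized <- toy$log2Intensity - shift

# After the shift, every channel has the same median.
tapply(toy$normalized, toy$Channel, median)
```

Because each channel is moved by a single constant, differences within a channel are preserved, but any genuine overall abundance difference between channels is removed along with the technical one. That is the trade-off to weigh when many PTMs are expected to change.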

In terms of the proteins without modifications, the best solution is to just filter these out. In future releases we will take care of these in the converter, but for now filtering out manually is the best option. I've included some code for filtering them out below.

msstats_format$PTM = msstats_format$PTM[grepl("_", msstats_format$PTM$ProteinName),]
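
As a quick check of what this filter keeps and drops, here is a small sketch using the feature names from the question. The data frame `toy_ptm` and its `Intensity` values are hypothetical stand-ins for the PTM table in the converter output.

```r
# Hypothetical stand-in for msstats_format$PTM, using the names from the question.
toy_ptm <- data.frame(
  ProteinName = c("A2AAJ9_C916", "A2ABU4", "A2ABU4_149", "A2ABU4_292", "A2ADF7"),
  Intensity = c(1200, 3400, 560, 780, 910),  # made-up values
  stringsAsFactors = FALSE
)

# Keep only rows whose ProteinName contains "_", i.e. protein plus site.
kept <- toy_ptm[grepl("_", toy_ptm$ProteinName, fixed = TRUE), ]

# Names removed by the filter: entries with no site suffix.
dropped <- setdiff(toy_ptm$ProteinName, kept$ProteinName)

kept$ProteinName
dropped
```

Note that this relies on the underscore appearing only as the separator between protein and site. If the protein identifiers themselves contain underscores (for example UniProt entry names such as ALBU_HUMAN), unmodified proteins would pass the filter and a stricter pattern would be needed.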

Best,
Devon

Peter

Jan 17, 2024, 6:43:11 AM
to MSstats

Hi Devon, 

Thank you so much for your reply.

Yes, that is exactly why I ended up choosing global_norm = FALSE and global_norm.PTM = FALSE, as I believe there are many differential PTMs across conditions.
I guess my best choice would be to use a spike-in reference and normalize to that. Is it possible to specify reference standards as normalization factors in MSstatsPTM? Is it something that should be specified in the annotation file?

Thank you for the code. I will apply it to my script.

Thank you again and I wish you a great day.
All the best, 
Peter

Devon Kohler

Jan 17, 2024, 3:30:12 PM
to MSstats
Hi Peter,

In terms of reference standards, there is actually not a default way to do this for TMT labelled experiments. Since your experiment is a single plex, you could leverage base MSstats, which has the option, with a bit of data transformation to make sure the columns match up.

The main question you would want to ask is what you are trying to normalize. Generally, in label-free (LF) experiments, reference standards are really helpful for removing nuisance variation that stems from processes after sample preparation. For example, since LF experiments will have multiple MS runs, reference standards can remove noise stemming from the technical runs. In a single-plex TMT experiment you may not see the same level of nuisance variation (e.g. you only have one run). However, if you expect to see significant variation due to something like the tags themselves, adding a reference standard could certainly be useful.

In general we do not see many TMT experiments with reference standards, which is why we do not have an option for it. However, if you do go down this route you can always reach out for guidance on how to adjust the code.

Best,
Devon

Ivan Gregoretti

Jan 18, 2024, 10:30:21 AM
to MSstats
Hello Peter.

Now that your questions about normalisation have been answered, I have a question for you and perhaps the entire community.

The experiment in question consists of 6 conditions, each represented by 2 biological replicates.

My question:
Is it possible to statistically compare conditions when fewer than 3 replicates are available?

To the best of my understanding, the classical statistical view is that, to compare two conditions, first one needs to compute the dispersion within each condition. Computing such dispersion requires minimally 3 replicates.

Thank you.

Ivan

Devon Kohler

Jan 18, 2024, 4:27:08 PM
to MSstats
Hi Ivan,

In order to perform statistical testing we need to calculate a signal to noise ratio. To calculate the noise we need at minimum two data points. So we would be able to calculate p-values for the experiment in question. The 3 replicates rule you reference is more of a rule of thumb and not a mathematical issue.
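
The two-data-point minimum can be illustrated with a plain two-sample t-test in base R. This is a simplified stand-in, not the model MSstatsPTM fits, and the log2 abundances below are made up.

```r
# Made-up log2 abundances: 2 biological replicates per condition.
cond_a <- c(10.1, 10.5)
cond_b <- c(11.9, 12.4)

# With a single value there is no variance estimate; with two there is.
var_one <- var(cond_a[1])  # NA
var_two <- var(cond_a)     # defined

# Equal-variance two-sample t-test: signal (difference in means) over noise
# (pooled standard error), with 2 + 2 - 2 = 2 degrees of freedom.
res <- t.test(cond_b, cond_a, var.equal = TRUE)
res$statistic
res$parameter
res$p.value
```

The test runs and returns a p-value, but with only 2 degrees of freedom the variance estimate is very unstable, which is where the loss of power comes from.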

With that being said, you will most likely have very low statistical power (ability to detect true positives) associated with this experiment. We recommend running a power analysis in order to determine how many replicates you need, where you have some estimate of the variance you expect to see in your experiment. See below for an example power analysis plot. You can see that with two replicates per group you would only be able to reliably detect differential proteins with a fold change around 2. In order to detect proteins with a lower fold change you would need more replicates.

(attached image: Screenshot 2024-01-18 at 4.24.01 PM.png)
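
For a rough feel of how power depends on replicate number, here is a sketch using `power.t.test` from base R's stats package. It is a simple two-sample approximation rather than the MSstats power calculation, it ignores multiple-testing correction, and the standard deviation `assumed_sd` is a placeholder that should be replaced with an estimate from your own data.

```r
# Assumed standard deviation on the log2 scale (placeholder value).
assumed_sd <- 0.3

# Fold changes of interest; the effect size is their log2.
fold_changes <- c(1.25, 1.5, 2)

# Power of a two-sided, two-sample t-test at alpha = 0.05
# for a given number of replicates per group.
power_for <- function(n_per_group) {
  sapply(fold_changes, function(fc) {
    power.t.test(n = n_per_group, delta = log2(fc), sd = assumed_sd,
                 sig.level = 0.05)$power
  })
}

power_table <- data.frame(
  FoldChange = fold_changes,
  n2 = power_for(2),
  n3 = power_for(3),
  n5 = power_for(5)
)
power_table
```

Reading across a row shows how much power is gained by adding replicates for a given fold change; reading down a column shows which fold changes are realistically detectable at that replicate number.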

Hope this helps!

Devon

Ivan Gregoretti

Jan 19, 2024, 3:29:23 PM
to MSstats
Thanks Devon.