TMT unlabeled & 0 in Norm Channel

Logan Johnson

Feb 13, 2025, 6:47:49 PM

to MSstats

Hi MSstats Team,

I really appreciate the great software you've built. I have no issues running the workflows or tools; I just have a question about the processing steps and am looking for recommendations. After looking through the provided documentation and other conversations in this group, I didn't find any that address processing or filtering PSMs without a TMT label modification. I was also curious how MSstatsTMT handles reference channels with missing values.

I have an experiment with 10 runs with TMT10plex labels and a pooled reference with a 131C label. I generated an msstats.csv file from FragPipe 22.0 with the default TMT10plex workflow. I noticed that after the protein summarization step, some proteins are retained whose PSMs in the unprocessed data:

1) Have no TMT modification in the 'Modified.Peptide.Sequence' column or,

2) Have a '0' value in the 'Norm' channel (the '131C' channel in my scenario).

I was curious how MSstatsTMT handles these cases and whether you would recommend removing them before running 'PhilosophertoMSstatsTMTFormat'. I believe that TMT-Integrator removes unlabeled PSMs from its result file (a separate result file from msstats.csv, I know). I'd also like to know whether you would recommend removing PSMs that have a 0/NA for the pooled reference before this converter step.

My code chunk is below. I've attached a screenshot of my experiment's annotation file for plex_01. And my R session info is partially attached. I can share an example of a few of the proteins/PSM cases if that is helpful.

One other note: the msstats.csv file from FragPipe contains a column named 'Probability' that isn't recognized in the MSstats Philosopher conversion step, which reports 'PeptideProphetProbability not found in input columns.' Renaming the column solves this, but I just thought I'd share.

Thanks again!

data <- PhilosophertoMSstatsTMTFormat(input = raw_data,
                                    annotation = annot_file,
                                    protein_id_col = "Protein.ID",
                                    peptide_id_col = "Peptide.Sequence",
                                    Purity_cutoff = 0.6,
                                    PeptideProphet_prob_cutoff = 0.7,
                                    useUniquePeptide = TRUE,
                                    rmPSM_withfewMea_withinRun = TRUE,
                                    rmPeptide_OxidationM = TRUE,
                                    rmProtein_with1Feature = FALSE,
                                    verbose = TRUE,
                                    use_log_file = FALSE,
                                    summaryforMultipleRows = sum)

data <- unique(as.data.frame(data))

summarized <- MSstatsTMT::proteinSummarization(data,
                                            method = 'msstats',
                                            global_norm = TRUE,
                                            reference_norm = TRUE,
                                            remove_norm_channel = TRUE,
                                            remove_empty_channel = TRUE,
                                            verbose = TRUE,
                                            use_log_file = FALSE)
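
For reference, the two pre-filters asked about above (PSMs without a TMT tag, and PSMs with a 0/NA pooled reference) could be sketched in base R before the converter is called. This is only a sketch on a toy table: the `flag_psms` helper, the `Channel.131C` column name, and the `[230]`/`[357]` tag notation are assumptions for illustration and should be checked against the actual msstats.csv. Whether this filtering is advisable is the open question of this thread.

```r
# Toy stand-in for the FragPipe msstats.csv table. Column names and the
# modification notation are ASSUMPTIONS for illustration only.
raw_data <- data.frame(
  Peptide.Sequence = c("AAAK", "CCCK", "DDDR", "EEEK"),
  Modified.Peptide.Sequence = c("n[230]AAAK[357]", "CCCK",
                                "n[230]DDDR", "n[230]EEEK[357]"),
  Channel.126 = c(1000, 800, 500, 700),
  Channel.131C = c(1200, 900, 0, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag PSMs with no TMT tag in the modified sequence
# and PSMs whose reference channel is 0 or NA.
flag_psms <- function(df, tmt_pattern, ref_col) {
  has_tag <- grepl(tmt_pattern, df$Modified.Peptide.Sequence)
  ref <- df[[ref_col]]
  has_ref <- !is.na(ref) & ref > 0
  data.frame(unlabeled = !has_tag, missing_ref = !has_ref)
}

flags <- flag_psms(raw_data,
                   tmt_pattern = "\\[230\\]|\\[357\\]",  # assumed tag notation
                   ref_col = "Channel.131C")             # assumed column name

# Count how many PSMs each rule would remove before dropping anything.
print(colSums(flags))

# Keep only PSMs that carry a tag and have a usable reference value.
filtered <- raw_data[!flags$unlabeled & !flags$missing_ref, ]
```

Inspecting the counts first makes it easier to judge how much data each rule would remove; the filtered table would then take the place of `raw_data` as the converter's `input`.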

Screenshot 2025-02-13 at 5.33.56 PM.png

Screenshot 2025-02-13 at 5.29.01 PM.png

Anthony Wu

Feb 21, 2025, 5:36:14 PM

to MSstats

Hi,

I'm not sure if I can answer your questions in one response as I'll need to follow up with further investigation. And yes, sharing data on a few of the proteins/PSM cases would help with a follow up.

To start:

MSstatsTMT keeps proteins with no TMT modification in the 'Modified.Peptide.Sequence' column

I'm not too familiar with how the TMT modification in that column works. When there's no TMT modification in that column, what does quantification look like? If there's theoretically no TMT modification, would the quantification values all be the same for each channel, since there's no way of telling which channel the peptide signal belongs to? My hunch is that these rows should be removed, but I would like to understand how these peptides got into the report in the first place and how they are quantified.

MSstatsTMT keeps proteins with a '0' value in the 'Norm' or '131C Channel' in my scenario.

To clarify, how many mixtures are in your experiment? That is, when you say 10 runs, do you mean 10 mixtures, or 1 mixture with TMT10plex labeling? If you only have one mixture, then I don't think the reference channel is used. But if you have more than one mixture, I'll need to investigate further how MSstatsTMT handles this situation.
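
The mixture question can be checked directly from the annotation table. The sketch below uses a toy annotation with the standard MSstatsTMT `Mixture`, `Channel`, and `Condition` columns; the run names, mixture names, and conditions are made up for illustration and are not taken from the attached screenshot.

```r
# Toy annotation table with made-up values (two mixtures, three channels each).
annot_file <- data.frame(
  Run = rep(c("plex_01.raw", "plex_02.raw"), each = 3),
  Mixture = rep(c("plex_01", "plex_02"), each = 3),
  Channel = rep(c("126", "127N", "131C"), times = 2),
  Condition = rep(c("A", "B", "Norm"), times = 2),
  stringsAsFactors = FALSE
)

# Number of distinct mixtures in the experiment.
n_mixtures <- length(unique(annot_file$Mixture))

# Number of channels annotated as the "Norm" reference within each mixture.
norm_per_mixture <- tapply(annot_file$Condition == "Norm",
                           annot_file$Mixture, sum)

print(n_mixtures)
print(norm_per_mixture)
```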

We can rename the column used. I'm a little confused about why MSstatsTMT defaults to "PeptideProphetProbability". Is there another report from FragPipe that might have this column?
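
In the meantime, the rename workaround from the original post might look like the sketch below. The `rename_column` helper is hypothetical, and the target name is copied from the quoted error message; the exact name the converter expects in the input file is an assumption and may differ (for example, in punctuation).

```r
# Toy table standing in for the FragPipe msstats.csv contents.
raw_data <- data.frame(
  Peptide.Sequence = c("AAAK", "CCCK"),
  Probability = c(0.99, 0.85),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename a column only if the old name is present
# and the new name is not already taken.
rename_column <- function(df, old, new) {
  if (old %in% names(df) && !(new %in% names(df))) {
    names(df)[names(df) == old] <- new
  }
  df
}

# Target name taken from the error message; treat it as an assumption.
raw_data <- rename_column(raw_data, "Probability", "PeptideProphetProbability")
print(names(raw_data))
```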

Thanks,

Tony
