Hi,
I'll add some thoughts. Comet does not process reporter ions for TMT quant. I assume you are using some pipeline that can use Comet search results for identifying peptide sequences, inferring proteins, and doing the TMT quantitative summarization.
TMT on Orbitraps can be done two ways. All quadrupole/Orbitrap instruments (Q-Exactives and Exploris line) are limited to reporter ions present in the MS2 spectrum used for peptide identification. That has to be taken at high enough resolution to resolve the N- and C- forms of the TMT tags. The Tribrid instruments can do TMT in MS2 scans like the simpler Orbitraps, but they also support a better way to acquire more accurate reporter ion signals. This is the synchronous precursor selection (SPS) MS3 acquisition. In the SPS-MS3 instrument method, peptide identification MS2 scans are taken in the low resolution linear ion trap. A second low resolution ion trap scan repeats the peptide fragmentation and selects (typically 10) fragment ions in narrow notches. Those selected fragment ions are transferred to a collision cell for higher energy fragmentation to liberate the reporter ions with high efficiency. The reporter ions are analyzed in MS3 scans done in the Orbitrap, typically at a resolution of 50-60K. These scans, with m/z typically from 110 to 500, give the reporter ion signals for the relative quantification.
Comet searches are configured differently depending on how the TMT data was acquired. If reporter ions are being measured in the MS2 scans, those scans are high resolution and you use the high resolution fragment ion mass settings. Excluding the region with the reporter ions may improve match scoring, although there are no b- or y-ions in the reporter ion m/z region. If the SPS-MS3 method was used, the fragment ion tolerances are the low resolution setting for the ion trap. Reporter ions will be weak in the lower collision energy used for the CID scans and their m/z region does not need to be excluded.
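As a rough illustration of the two configurations described above, here is a minimal Python sketch that picks fragment ion settings by acquisition type. The parameter names (fragment_bin_tol, fragment_bin_offset, theoretical_fragment_ions, clear_mz_range) are Comet's. The numeric values are assumptions based on commonly suggested high and low resolution settings, and the cleared m/z range is a hypothetical choice for the TMTpro reporter region. Check them against the Comet documentation for your version.

```python
# Sketch only: pick Comet fragment-ion settings from the TMT acquisition type.
# Numeric values are assumptions to verify, not settings taken from the post.

def fragment_settings(acquisition: str) -> dict:
    """Return comet.params fragment settings for 'MS2' or 'SPS-MS3' TMT data."""
    if acquisition == "MS2":
        # Reporter ions are read from high resolution Orbitrap MS2 scans.
        return {
            "fragment_bin_tol": 0.02,
            "fragment_bin_offset": 0.0,
            "theoretical_fragment_ions": 0,
            # Optional: clear the reporter ion region (assumed TMTpro range).
            "clear_mz_range": (126.0, 135.5),
        }
    if acquisition == "SPS-MS3":
        # Identification MS2 scans come from the low resolution ion trap.
        return {
            "fragment_bin_tol": 1.0005,
            "fragment_bin_offset": 0.4,
            "theoretical_fragment_ions": 1,
            # 0.0 0.0 leaves the reporter ion region untouched.
            "clear_mz_range": (0.0, 0.0),
        }
    raise ValueError(f"unknown acquisition type: {acquisition!r}")


def to_params_lines(settings: dict) -> list:
    """Format a settings dict as 'name = value' lines for comet.params."""
    lines = []
    for name, value in settings.items():
        if isinstance(value, tuple):
            value = " ".join(f"{v:.4f}" for v in value)
        lines.append(f"{name} = {value}")
    return lines


if __name__ == "__main__":
    for line in to_params_lines(fragment_settings("SPS-MS3")):
        print(line)
```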
Modifications are independent of the way the TMT data was acquired. I try to do as few variable modifications as possible, typically only oxidized Met. I always specify the TMT tags (peptide N-term and on lysine) as static modifications, along with alkylated cysteine. Here is part of a comet.params file showing the variable modifications:
#
# Up to 9 variable modifications are supported
# format: <mass> <residues> <0=variable/else binary> <max_mods_per_peptide> <term_distance> <n/c-term> <required>
# e.g. 79.966331 STY 0 3 -1 0 0
#
variable_mod01 = 15.9949 M 0 3 -1 0 0
variable_mod02 = 0.0000 X 0 3 -1 0 0
variable_mod03 = 0.0000 X 0 3 -1 0 0
variable_mod04 = 0.0000 X 0 3 -1 0 0
variable_mod05 = 0.0000 X 0 3 -1 0 0
variable_mod06 = 0.0000 X 0 3 -1 0 0
variable_mod07 = 0.0000 X 0 3 -1 0 0
variable_mod08 = 0.0000 X 0 3 -1 0 0
variable_mod09 = 0.0000 X 0 3 -1 0 0
max_variable_mods_in_peptide = 5
require_variable_mod = 0
And the static modifications:
#
# additional modifications
#
add_Cterm_peptide = 0.0000
add_Nterm_peptide = 304.2071
add_Cterm_protein = 0.0000
add_Nterm_protein = 0.0000
add_G_glycine = 0.0000 # added to G - avg. 57.0513, mono. 57.02146
add_A_alanine = 0.0000 # added to A - avg. 71.0779, mono. 71.03711
add_S_serine = 0.0000 # added to S - avg. 87.0773, mono. 87.03203
add_P_proline = 0.0000 # added to P - avg. 97.1152, mono. 97.05276
add_V_valine = 0.0000 # added to V - avg. 99.1311, mono. 99.06841
add_T_threonine = 0.0000 # added to T - avg. 101.1038, mono. 101.04768
add_C_cysteine = 57.0215 # added to C - avg. 103.1429, mono. 103.00918
add_L_leucine = 0.0000 # added to L - avg. 113.1576, mono. 113.08406
add_I_isoleucine = 0.0000 # added to I - avg. 113.1576, mono. 113.08406
add_N_asparagine = 0.0000 # added to N - avg. 114.1026, mono. 114.04293
add_D_aspartic_acid = 0.0000 # added to D - avg. 115.0874, mono. 115.02694
add_Q_glutamine = 0.0000 # added to Q - avg. 128.1292, mono. 128.05858
add_K_lysine = 304.2071 # added to K - avg. 128.1723, mono. 128.09496
add_E_glutamic_acid = 0.0000 # added to E - avg. 129.1140, mono. 129.04259
add_M_methionine = 0.0000 # added to M - avg. 131.1961, mono. 131.04048
add_O_ornithine = 0.0000 # added to O - avg. 132.1610, mono 132.08988
add_H_histidine = 0.0000 # added to H - avg. 137.1393, mono. 137.05891
add_F_phenylalanine = 0.0000 # added to F - avg. 147.1739, mono. 147.06841
add_U_selenocysteine = 0.0000 # added to U - avg. 150.3079, mono. 150.95363
add_R_arginine = 0.0000 # added to R - avg. 156.1857, mono. 156.10111
add_Y_tyrosine = 0.0000 # added to Y - avg. 163.0633, mono. 163.06333
add_W_tryptophan = 0.0000 # added to W - avg. 186.0793, mono. 186.07931
add_B_user_amino_acid = 0.0000 # added to B - avg. 0.0000, mono. 0.00000
add_J_user_amino_acid = 0.0000 # added to J - avg. 0.0000, mono. 0.00000
add_X_user_amino_acid = 0.0000 # added to X - avg. 0.0000, mono. 0.00000
add_Z_user_amino_acid = 0.0000 # added to Z - avg. 0.0000, mono. 0.00000
Note that these are the masses for the newer TMTpro reagents. The 6/10/11-plex tags have a mass of 229.1629 instead.
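To make the effect of the static modifications concrete, here is a small Python sketch that adds them to a peptide's monoisotopic mass. The residue masses and modification masses are copied from the params excerpt above. The water mass and the example peptide "ACK" are assumptions added for illustration.

```python
# Sketch only: neutral monoisotopic mass of a TMT-labeled peptide using the
# static modifications shown above (tag on peptide N-term and K, alkylated C).

WATER = 18.010565  # monoisotopic H2O; value assumed here, not from the post

# Monoisotopic residue masses from the comet.params comments (subset).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00918, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04048, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

TMTPRO = 304.2071    # add_Nterm_peptide and add_K_lysine above
TMT_6_10_11 = 229.1629
ALKYLATED_C = 57.0215  # add_C_cysteine above


def labeled_peptide_mass(sequence: str, tag_mass: float = TMTPRO) -> float:
    """Sum residues, water, and the static modifications for one peptide."""
    mass = WATER + tag_mass  # one tag on the peptide N-terminus
    for residue in sequence:
        mass += MONO[residue]
        if residue == "K":
            mass += tag_mass      # one tag per lysine
        elif residue == "C":
            mass += ALKYLATED_C   # alkylated cysteine
    return mass


if __name__ == "__main__":
    print(f"{labeled_peptide_mass('ACK'):.4f}")
    print(f"{labeled_peptide_mass('ACK', TMT_6_10_11):.4f}")
```

A peptide with one lysine carries two tags, so switching between TMTpro and the 6/10/11-plex reagents shifts its mass by twice the difference between the two tag masses.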
Precursor ion settings are an area where I disagree with the rest of the proteomics community. I use wide tolerance settings (1.25 Da), with tolerances in Da rather than ppm. My pipeline is designed to work that way. If you are bored, you can read more at
https://pwilmart.github.io/blog/2021/04/22/Parent-ion-tolerance
With the more commonly used narrow tolerance searches, I recommend 50 ppm and allowing isotopic peak mis-triggers (isotope_error = 1 for Comet). 10 ppm is too narrow, as mass calibration drifts on Orbitraps can easily be in this range. If you have errors larger than 20 ppm, the instrument should have been recalibrated. The reason to use 50 ppm instead of 20 ppm is to allow incorrect matches to have larger mass errors that distinguish them from correct matches, which should have small mass errors. This adds power to post-processing classifiers like Percolator.
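For a sense of scale between the Da and ppm tolerances, here is a minimal Python sketch of the arithmetic. The function names and the carbon-13 spacing constant are assumptions of this sketch. It only checks offsets of 0 and +1 isotopic peaks, and the exact set of offsets that a given isotope_error value covers depends on the Comet version.

```python
# Sketch only: precursor mass error in ppm, with optional correction for
# triggering on a C13 isotopic peak instead of the monoisotopic peak.

C13_SPACING = 1.003355  # C13 minus C12 mass difference; assumed constant


def ppm_to_da(mass: float, ppm: float) -> float:
    """Width in Da of a ppm tolerance at a given mass."""
    return mass * ppm / 1.0e6


def ppm_error(observed: float, theoretical: float, max_isotope_error: int = 1) -> float:
    """Smallest-magnitude ppm error over 0..max_isotope_error isotope offsets."""
    best = None
    for n in range(max_isotope_error + 1):
        corrected = observed - n * C13_SPACING
        error = (corrected - theoretical) / theoretical * 1.0e6
        if best is None or abs(error) < abs(best):
            best = error
    return best


if __name__ == "__main__":
    # A 50 ppm tolerance at 2000 Da is 0.1 Da, far narrower than 1.25 Da.
    print(ppm_to_da(2000.0, 50.0))
    # A precursor picked one isotopic peak too high still matches closely.
    print(ppm_error(2000.0 + C13_SPACING, 2000.0))
```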
Protein FASTA file choices are another area where I seem to disagree with what most folks do. Getting FASTA files from UniProt is not a simple thing at all. See the top part of this blog entry to see what kind of mouse FASTA files you can get from UniProt:
https://pwilmart.github.io/blog/2020/09/19/shotgun-quantification-part2
I almost exclusively use the canonical forms of the reference proteomes (the one protein per gene options) and never add isoforms. I want as little peptide redundancy as possible when doing TMT quantification. This simplifies protein inference and deciding what peptides are usable for quant, as most peptides map to only one protein sequence. I do FASTA processing outside of the search engine. I add a set of common contaminants and then add sequence-reversed entries. I do not need to do any fancy methods to make decoys because I do wide tolerance searching. Narrow tolerance searches may require more care to make sure the decoy peptides are accurate-mass balanced with target peptides. Comet can make decoys, but does not have any internal set of common contaminants. You can add contaminants to your target mouse proteins and then use Comet options to make decoys for you. You need to make sure the decoy Comet option is compatible with the post-processing steps for the Comet search results. Some pipelines match peptide sequences to the FASTA file entries, and you will not have the decoy proteins in your FASTA file. I think Comet makes decoy peptides for each MS2 spectrum based on the target peptides that were scored.
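The FASTA preparation steps above (targets plus contaminants, then sequence-reversed decoys) can be sketched in a few lines of Python. This is a generic illustration, not the scripts from any particular pipeline. The "REV_" decoy prefix and the function names are hypothetical, and the prefix must match whatever your post-processing tools expect.

```python
# Sketch only: append sequence-reversed decoy entries to a target FASTA
# that already includes common contaminants.

def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    entries = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def add_reversed_decoys(entries: list, prefix: str = "REV_") -> list:
    """Return the targets followed by one reversed decoy per target."""
    decoys = [(prefix + header, sequence[::-1]) for header, sequence in entries]
    return entries + decoys


def format_fasta(entries: list, width: int = 60) -> str:
    """Write entries back out as FASTA text with wrapped sequence lines."""
    lines = []
    for header, sequence in entries:
        lines.append(">" + header)
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    targets = parse_fasta(">sp|P1|A_MOUSE example\nMKT\nAAR\n>CONT_1\nGG\n")
    print(format_fasta(add_reversed_decoys(targets)), end="")
```

With decoys written into the FASTA file itself, pipelines that look peptides up in the FASTA entries can find the decoy proteins, and Comet's own decoy generation would be left off.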
You could also look at MSFragger, which has developed workflows for TMT. They have tutorials for common analysis scenarios. The search is a little different than Comet. The post-processing steps might be similar to Trans-Proteomic Pipeline steps. I personally have never used MSFragger for TMT. I have my own pipeline that uses Comet for the searches and a series of Python scripts for peptide filtering, protein inference, and reporter ion processing. These shotgun TMT experiments are quite complicated to perform and to analyze. There are a lot of bench steps, the data acquisition is complicated, and the data processing involves many steps. The statistical analysis of the resulting TMT data is also very involved. The increased number of channels in TMT experiments is often used to do more complicated study designs, which need more elaborate statistical analyses.
Cheers,
Phil Wilmarth
PSR Core, Oregon Health & Science University.