Interpretation help

Erik Johnson

Oct 7, 2024, 12:33:36 AM
to Comet ms/ms db search support
Howdy!

In the Comet docs, I don't see an explanation of what each column in Comet's output corresponds to. Can someone point me to a page that helps me understand what each column in Comet's output means? Thank you!

Jimmy Eng

Oct 7, 2024, 1:47:48 PM
to Erik Johnson, Comet ms/ms db search support
Hi Erik.  I'll assume you're referring to columns in the text output format.  Here are the columns and a brief description of each.  Feel free to follow up if you want a more complete explanation for any of them.  I'll add this information to the output_txtfile parameters page.
  • scan:  The scan number of the spectrum that was searched.
  • num:  For each scan, the top N best scoring peptides are returned, controlled by the num_output_lines parameter.  This column displays the peptide order for each scan, starting at "1" to indicate the top scoring peptide, "2" to indicate the second best scoring peptide, etc.
  • charge:  The precursor charge state.
  • exp_neutral_mass:  The experimental neutral mass of the measured precursor ion.
  • calc_neutral_mass:  The calculated neutral mass of the matched peptide.
  • e_value:  The expectation value or E-value score for the peptide.  Some info in this paper on how Comet calculates E-values (although there is a correction to one detail in the paper on the calculation).
  • xcorr:  The cross correlation score for the peptide.  Here's a very nice illustration of how this score is calculated by Will Fondrie.
  • delta_cn:  The deltaCn, which is the difference in normalized cross correlation score, historically between the top hit and the next best hit.  In Comet, the deltaCn in each row is the difference in normalized cross correlation score between that hit and the next lower peptide hit.
  • sp_score:  The preliminary score which is the sum of peak intensities that match the peptide and accounts for continuity of an ion series and the length of the peptide.  I think of this as a quick/simple peptide match score that now exists only for backwards compatibility for post-search processing tools.
  • ions_matched:  Out of the total number of theoretical fragment ions being considered in the search, this is the number of those ions that were found in the experimental spectrum.
  • ions_total:  The total number of theoretical fragment ions for the peptide.
  • plain_peptide:  The raw peptide sequence.
  • modified_peptide:  The peptide sequence including previous and next amino acids as well as any variable modifications.
  • prev_aa:  In the first protein that contains this peptide, the amino acid just before (N-terminal to) the peptide.
  • next_aa:  In the first protein that contains this peptide, the amino acid just after (C-terminal to) the peptide.
  • protein:  The first protein in the database that contains the peptide.
  • protein_count:  The total number of proteins in the database that contain the peptide.
  • modifications:  An encoding of static and variable modifications in the peptide.  See the documentation in output_txtfile for explanation of this encoding.
  • retention_time_sec:  If available from the query file, this reports the retention time in seconds of the spectrum being searched.
  • sp_rank:  The rank of the preliminary score (Sp).  If the peptide results were ordered by the sp_score column, this column reports the rank order of this peptide when sorted by sp_score.  So if this peptide had the fourth highest sp_score, this column would contain a "4".
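
A minimal Python sketch of loading this text output and keeping the top hit per scan, using the scan, num, xcorr, and plain_peptide columns described above. It assumes the file is tab-delimited and that the column header row is the first row whose first field is "scan" (anything before it is skipped). The function names and the sample rows are made up for illustration, and the sample uses only a subset of the columns.

```python
import csv
import io


def read_comet_txt(text):
    """Parse tab-delimited search output into a list of dicts.

    Assumption: the header row is the first row whose first field is
    "scan"; any rows before it (e.g. a run-info line) are skipped.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    for i, row in enumerate(rows):
        if row and row[0] == "scan":
            header = row
            # Skip blank lines, which csv.reader yields as empty lists.
            return [dict(zip(header, r)) for r in rows[i + 1:] if r]
    raise ValueError("no header row starting with 'scan' found")


def top_hits(records):
    """Keep only the best-scoring peptide (num == 1) for each scan."""
    return [r for r in records if int(r["num"]) == 1]


# Made-up values for illustration only; a real file has more columns.
sample = (
    "illustrative run-info line\n"
    "scan\tnum\tcharge\txcorr\tplain_peptide\n"
    "101\t1\t2\t3.21\tPEPTIDEK\n"
    "101\t2\t2\t1.05\tPEPTLDEK\n"
    "102\t1\t3\t2.47\tSAMPLER\n"
)

for hit in top_hits(read_comet_txt(sample)):
    print(hit["scan"], hit["plain_peptide"], float(hit["xcorr"]))
```

All values come back as strings, so numeric columns such as xcorr or e_value need an explicit float() or int() conversion before sorting or filtering.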

Erik Johnson

Nov 12, 2024, 1:19:39 PM
to Comet ms/ms db search support
Hi Jimmy, 

Thanks for that info! 

I've got some more questions for you if you don't mind: given an input spectrum, I know Comet does some spectrum processing before comparing the spectrum to theoretical peptides. Two questions:
  1. Is it possible to get/output the processed spectrum that Comet uses when matching spectra to peptides?
  2. Is the "ions_total" column the number of peaks in the processed spectrum? I thought it would be the theoretical ions that the matched peptide would produce but it doesn't look to be. For instance, if we're only considering b- and y-ions and the peptide ABC, I would think the theoretical ions would be {A, BC, AB, C} and ions_total would be 4. But it doesn't look like that's right.
Thanks for the help!


Best,
Erik


Jimmy Eng

Nov 12, 2024, 3:25:46 PM
to Comet ms/ms db search support
Hi Erik.  

Regarding #1, the processed "spectrum" that Comet uses internally is a binned array of floating point numbers where the array index is the binned mass and the array value is the processed intensity for the fast cross correlation scoring.  The binned mass is based on the fragment_bin_tol and fragment_bin_offset parameters.  This pseudo "spectrum", especially after the processing for the fast cross correlation scoring, is of little use unless you use it directly to generate the cross correlation scores.  If this is what you want, intervene at line 929 of CometPreprocess.cpp and export pScoring->pfFastXcorrData[] in whatever format you'd like to see that array.  Or, if you're interested in the array before the fast cross correlation processing, dump pdTmpRawData[] at line 979 of the same file.  If this isn't what you're looking for, definitely follow up again; I'll try to assist you with whatever you're trying to accomplish.  I just don't think Comet's internal array representation of spectra is really going to be useful unless you want to learn how Comet does things.  If you want either of these arrays and aren't comfortable adding the code to export them, let me know and I'll get you a binary that does this for you.  You'll have to define what format you want these exported in.
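
A rough Python sketch of the "array index is the binned mass" idea, to make the layout concrete. The binning rule here, index = int(mass / fragment_bin_tol + (1 - fragment_bin_offset)), is an assumption about how the two parameters combine and is not stated in this thread. Keeping the largest intensity per bin is likewise an arbitrary choice for illustration; it does not reproduce Comet's intensity processing or the fast cross correlation preprocessing. The function names, parameter values, and peaks are hypothetical.

```python
def bin_index(mass, fragment_bin_tol, fragment_bin_offset):
    """Map a fragment mass to an array index.

    Assumed rule (not confirmed by the post above):
    index = int(mass / bin_width + (1 - offset))
    """
    return int(mass / fragment_bin_tol + (1.0 - fragment_bin_offset))


def bin_spectrum(peaks, fragment_bin_tol, fragment_bin_offset):
    """Build a dense array indexed by bin from (mass, intensity) pairs.

    Keeps the largest intensity seen in each bin (illustrative choice).
    """
    indices = [bin_index(m, fragment_bin_tol, fragment_bin_offset) for m, _ in peaks]
    binned = [0.0] * (max(indices) + 1)
    for idx, (_, intensity) in zip(indices, peaks):
        binned[idx] = max(binned[idx], intensity)
    return binned


# Made-up peaks and example parameter values.
peaks = [(100.0, 5.0), (100.3, 8.0), (250.7, 2.0)]
binned = bin_spectrum(peaks, fragment_bin_tol=1.0005, fragment_bin_offset=0.4)
print(len(binned), binned[100], binned[251])
```

The first two peaks fall in the same bin, so only the larger intensity survives there, which is one reason the binned array cannot be turned back into the original peak list.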

The "ions_total" value is the number of theoretical fragment ions in a peptide.  For a peptide of length N, if you're only considering b- and y-ions, this number would be 2*(N-1).  What you might be missing is that this number is also scaled by the number of fragment ion charge states that are analyzed.  So if 1+ and 2+ fragment ions are analyzed, this number would be 2*2*(N-1).  If 1+, 2+, and 3+ fragment ions are analyzed, this number would be 3*2*(N-1).  The maximum fragment ion charge state considered is controlled by the max_fragment_charge parameter.  Comet considers all fragment charge states up to 1 less than the precursor charge state or the charge defined in max_fragment_charge, whichever is less.
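
The arithmetic above can be written as a short Python sketch. The function names are hypothetical, and flooring the number of fragment charge states at 1 (so that a 1+ precursor still gets 1+ fragments) is an assumption that is not stated above.

```python
def fragment_charge_count(precursor_charge, max_fragment_charge):
    """Number of fragment charge states analyzed (1+, 2+, ...).

    Follows the rule above: up to (precursor charge - 1) or
    max_fragment_charge, whichever is less. The floor of 1 for
    singly charged precursors is an assumption.
    """
    return max(1, min(precursor_charge - 1, max_fragment_charge))


def ions_total(peptide_length, precursor_charge, max_fragment_charge, num_ion_series=2):
    """Theoretical fragment ion count: series * charge states * (N - 1).

    num_ion_series=2 corresponds to considering only b- and y-ions.
    """
    charges = fragment_charge_count(precursor_charge, max_fragment_charge)
    return num_ion_series * charges * (peptide_length - 1)


# The three-residue peptide "ABC" from the question, b- and y-ions only.
for precursor_charge in (2, 3, 4):
    print(precursor_charge, ions_total(3, precursor_charge, max_fragment_charge=3))
```

For the three-residue example this gives 4 only when a single fragment charge state is analyzed (a 2+ precursor here); a 3+ precursor gives 8 and a 4+ precursor gives 12 with max_fragment_charge set to 3.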

