Interpretation of stats from pepXML viewer


Alastair Skeffington

Dec 13, 2019, 5:29:16 AM
to spctools-discuss
Hi,

Can someone explain what's happening here, please:

Number of spectra in the .mgf file = 1856
Number of spectra after conversion to mzXML = 1856
Number of spectra in pepXML file after comet search: 1752

I then search against database A (DB-A) and database B (DB-B) and count the number of spectra in each pepXML file as the number of occurrences of "<spectrum_query spectrum=". This yields 1752 in both cases.

I then run "xinteract -Neg.pepXML -PPM -dDECOY_ -OARP eg_pepp.pepXML" on each of the files. Now when I search for the string "<spectrum_query spectrum=" I get 481 hits for DB-A and 632 hits for DB-B. The pepXML viewer then calculates the "Efficiency ID'd/searched" from these counts, so the denominator is different for each database.
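For anyone wanting to reproduce these counts without a plain string search, the following is a minimal sketch that counts `spectrum_query` elements with Python's standard XML parser. The function name and the sample document are made up for illustration; real pepXML files carry many more elements and attributes, and only the element name is taken from the post above.

```python
import io
import xml.etree.ElementTree as ET


def count_spectrum_queries(source):
    """Count <spectrum_query> elements in a pepXML file path or file-like object."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop any XML namespace prefix of the form "{uri}spectrum_query".
        if elem.tag.rsplit("}", 1)[-1] == "spectrum_query":
            count += 1
            elem.clear()  # release the element's children to keep memory low
    return count


# Tiny hypothetical pepXML-like document, for illustration only.
SAMPLE = b"""<?xml version="1.0"?>
<msms_pipeline_analysis>
  <msms_run_summary>
    <spectrum_query spectrum="eg.00001.00001.2"/>
    <spectrum_query spectrum="eg.00002.00002.2"/>
    <spectrum_query spectrum="eg.00003.00003.3"/>
  </msms_run_summary>
</msms_pipeline_analysis>
"""

print(count_spectrum_queries(io.BytesIO(SAMPLE)))
```

Running the same function on the Comet output and on the xinteract output for each database should give the numbers being compared here, assuming the files are well-formed XML.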
 
Can someone help to explain this behavior?

  • Why does the xinteract output present different numbers of records ("<spectrum_query spectrum= ...") for the two different databases, even though the number of records in the pepXML input was the same?
  • Isn't the definition of 'Spectra searched' used by Petunia misleading? It's not the number of input spectra to the search engine.
  • Presumably the loss of spectra between the input to comet and the output is due to losing those spectra with absolutely no match to the database or decoys due to some internal threshold on the score for PSMs.

Any help would be hugely appreciated!

Thanks,
Alastair

Eric Deutsch

Dec 13, 2019, 10:25:53 AM
to spctools...@googlegroups.com, Eric Deutsch

Hi Alastair, thanks for posting your questions. Although it’s possible something problematic is happening, I think the answers to your questions are:

 

1) Most search engines, including Comet, have a minimum spectrum quality threshold that a spectrum must meet to be searched at all, e.g. a minimum of 10 peaks. There are settings for this in comet.params. Also, if your database is very small (not generally recommended) and the tolerances are very narrow, there may not be any possible peptides that match the query spectrum precursor m/z within tolerance. In both cases, there is no result from the search engine, and therefore nothing in the pepXML.

 

2) My best guess here is: the default behavior of PeptideProphet is to NOT write out any spectra with probability < 0.05 to the output file so that it is smaller and faster to explore. You can turn off this behavior with the -p0 option to xinteract. This will pass through all PSMs.
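As a rough illustration of point 2, the sketch below applies a minimum-probability cutoff to two made-up lists of PSM probabilities. The 0.05 default and the effect of setting the cutoff to 0 come from the explanation above; the function name and the probability values are hypothetical, and the real filtering happens inside PeptideProphet, not in user code.

```python
def written_psms(probabilities, min_prob=0.05):
    """Return the probabilities that would survive a minimum-probability cutoff."""
    return [p for p in probabilities if p >= min_prob]


# Hypothetical probabilities for the same five spectra searched against
# two different databases; the values are invented for illustration.
db_a = [0.99, 0.40, 0.03, 0.00, 0.01]
db_b = [0.99, 0.40, 0.20, 0.06, 0.01]

print(len(written_psms(db_a)), len(written_psms(db_b)))
print(len(written_psms(db_a, min_prob=0.0)), len(written_psms(db_b, min_prob=0.0)))
```

The same input spectra can therefore yield different record counts per database, because the probabilities depend on the database. Assuming -p0 is simply added to the original command line, the rerun would look like "xinteract -Neg.pepXML -p0 -PPM -dDECOY_ -OARP eg_pepp.pepXML".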

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/09b3c5de-a9b9-46be-8b40-d502a5c17bf6%40googlegroups.com.

Jimmy Eng

Dec 13, 2019, 11:27:18 AM
to spctools...@googlegroups.com
Regarding input and output counts for Comet, as Eric mentioned, there are search parameters which affect whether or not a spectrum will be searched at all. These are: activation_method, digest_mass_range, max_precursor_charge, minimum_peaks, minimum_intensity, and remove_precursor_peak (which can affect minimum_peaks). A query spectrum that happens to be out of bounds for any of these parameters will not be analyzed (it is effectively ignored) and will not show up in the output. Any spectrum that passes these filters will have an output entry in Comet's pepXML output file, even if there's no matching peptide output. Such blank outputs can happen in cases where the database or tolerances are so small that no peptides are scored, or where the spectrum is so poor that the xcorr score is less than or equal to zero. This is why you see 1752 spectra in the pepXML after the Comet searches of both your databases: that's the number of spectra that actually get analyzed, and it is a function of the input and not of the database being searched.
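To make the pre-search filtering concrete, here is a minimal sketch of how two of the listed parameters, minimum_peaks and minimum_intensity, might decide whether a spectrum is analyzed. It assumes that peaks below minimum_intensity are discarded first and that the remaining peak count is then compared with minimum_peaks; the function, the default values, and the example spectra are illustrative only and are not Comet's actual code.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def is_searched(peaks: List[Peak],
                minimum_peaks: int = 10,
                minimum_intensity: float = 0.0) -> bool:
    """Return True if the spectrum would pass this simplified peak filter."""
    # Assumption: peaks below the intensity floor are dropped before counting.
    kept = [peak for peak in peaks if peak[1] >= minimum_intensity]
    return len(kept) >= minimum_peaks


# Three hypothetical spectra: plenty of peaks, too few peaks, only weak peaks.
rich = [(100.0 + i, 50.0) for i in range(12)]
sparse = [(100.0 + i, 50.0) for i in range(4)]
noisy = [(100.0 + i, 1.0) for i in range(12)]

spectra = [rich, sparse, noisy]
analyzed = [s for s in spectra if is_searched(s, minimum_intensity=5.0)]
print(len(spectra), len(analyzed))
```

Because a filter like this looks only at the spectrum and the parameters, the number of analyzed spectra stays the same whichever database is searched, which matches the identical count of 1752 for both databases.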


Eric Deutsch

Dec 13, 2019, 3:56:04 PM
to spctools...@googlegroups.com, Eric Deutsch

Thanks for the clarification, Jimmy!

Alastair Skeffington

Dec 16, 2019, 4:22:35 PM
to spctools-discuss
Thanks very much for your answers - that's very helpful!
Alastair


