Help with PeptideProphet statistics


ha...@uow.edu.au

May 12, 2008, 11:42:55 PM
to spctools-discuss
Hi all,

I've been using the TPP for processing large amounts of data and all
was well. I've submitted a paper using the data generated with the TPP
and a reviewer has come back with a comment I'm not sure how to deal
with. I was hoping someone who understands the statistical analysis
PeptideProphet employs could assist me answering the query raised by
the reviewer.

To fill in the blanks: I processed all the data through the TPP
Petunia interface. I used the default setting of 0.05 for "Filter out
results below this PeptideProphet probability" when running
PeptideProphet. I then looked at the statistics PeptideProphet returns
and used a peptide score so that the error level for the peptides left
in my (ProteinProphet) list was equal to the 0.05 level.

So when I wrote the manuscript, I said that peptide error level was
set at 0.05. The reviewer came back with the following reply:

"In the TPP, the default minimum peptide probability from Peptide
Prophet is 0.05. Any peptides with lower probabilities of being
correctly identified are filtered out before the Protein Prophet
analysis to reduce files sizes and speed up analysis. The estimated
probabilities are based on the relative heights of correct and
incorrect score distributions at a given discriminant function score.
These probabilities can be used to calculate “local” error (or false
discovery) rates, but they are not the same as the more commonly
quoted peptide false discovery rate. The latter quantity is the
estimated proportion of incorrect peptides among all valid peptides,
and is most easily determined using reversed or randomized databases."

My understanding is that the PeptideProphet statistics are superior to
using a reversed, or randomised database to determine FPR, hence my
desire to use the TPP (amongst other reasons).

Can somebody please explain what I am missing and how I can respond to
the reviewer? All help and suggestions are greatly appreciated.

Peter

ane...@yahoo.com

May 13, 2008, 10:35:36 AM
to spctools-discuss
The default min probability of 0.05 is not relevant; it's just for
computational convenience. What is relevant is the minimum probability
threshold that you eventually applied to your data (I would guess it
is somewhere between 0.5 and 0.9). The probabilities as computed by
PeptideProphet or ProteinProphet should be interpreted as local FDR;
for additional discussion see

H. Choi and A.I. Nesvizhskii, False discovery rates and related
statistical concepts in mass spectrometry-based proteomics. J.
Proteome Res. 7, 47-50 (2008)

The (global) FDR can be computed directly from local FDR, and is also
reported in the output files (the ROC curves). So you would want to
report the minimum probability used for filtering, as well as the
corresponding error rate (FDR) from the ROC curve.
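
To make the local-to-global relationship concrete, here is a minimal Python sketch. It assumes each probability p is the chance that the identification is correct, so 1 - p is its local error rate and the global FDR at a cut-off is the average of 1 - p over everything kept. The function name and the example values are made up for illustration; this is not code from the TPP, and the ROC output remains the number to report.

```python
def global_fdr(probabilities, min_probability):
    """Estimate the global FDR among identifications kept at a cut-off.

    Each probability p is treated as the chance the identification is
    correct, so 1 - p is its local error rate. The global FDR is the
    expected number of wrong identifications divided by the number kept.
    """
    kept = [p for p in probabilities if p >= min_probability]
    if not kept:
        return 0.0  # nothing passes the cut-off, so no errors are expected
    expected_wrong = sum(1.0 - p for p in kept)
    return expected_wrong / len(kept)


# Made-up probabilities, for illustration only.
example = [0.99, 0.95, 0.90, 0.60, 0.20]
print(global_fdr(example, 0.5))
```

Note that the global FDR at a cut-off is always lower than the local error rate at the cut-off itself, because it averages over the more confident identifications above it.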

The decoy-based method of estimating FDR is an alternative way. Just
because the decoy-based method is very easy, it does not mean that it
is necessarily better. Both methods have advantages and disadvantages,
and in fact can be combined within a single model for improved FDR
estimation; see other related papers from my group in the January
issue of JPR.

Alexey Nesvizhskii

Brendan MacLean

May 14, 2008, 10:40:02 AM
to spctools-discuss
In the simplest terms, you should clarify that you used a
PeptideProphet probability cut-off of P (the number between 0.9 and
0.5 Alexey mentions) to attain a PeptideProphet predicted FDR of 5%
(i.e. 0.05). It sounds like the reviewer, at least, may think you
used a PeptideProphet probability cut-off of 0.05, which would
certainly give you a much higher predicted FDR.
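
As a rough illustration of the difference between the two numbers, the Python sketch below searches for the lowest probability cut-off P whose predicted FDR stays at or under a target such as 0.05. It assumes the predicted FDR is the average of 1 - p over the accepted identifications; the function name and data are hypothetical, and in practice you would read P off the error table the TPP reports rather than recompute it.

```python
def cutoff_for_fdr(probabilities, target_fdr):
    """Return the lowest probability cut-off whose predicted FDR is at or
    below target_fdr, or None if even the top-scoring group exceeds it."""
    ordered = sorted(probabilities, reverse=True)
    best = None
    expected_wrong = 0.0
    for count, p in enumerate(ordered, start=1):
        expected_wrong += 1.0 - p
        # Only evaluate at the end of a run of tied probabilities, since a
        # cut-off of p accepts every identification with that probability.
        end_of_tie = count == len(ordered) or ordered[count] < p
        if end_of_tie and expected_wrong / count <= target_fdr:
            best = p
    return best


# Made-up probabilities, for illustration only.
example = [0.99, 0.95, 0.90, 0.60, 0.20]
print(cutoff_for_fdr(example, 0.05))
```

With these made-up values the cut-off that gives a 5% predicted FDR is far above 0.05, which is the distinction worth spelling out in the manuscript.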

Note that using a cut-off with PeptideProphet predicted FDR of 5% as
input to ProteinProphet does not guarantee a ProteinProphet FDR of
5%. Also, the reviewer may still prefer to see Prophet predictions
corroborated by statistics based on the use of a forward-reversed
FASTA file, which appears to me to be gaining some ground as the
standard for calculating FDR, as it is simple and transferable across
all tool-sets.
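
For comparison, a decoy-based estimate can be sketched in a few lines of Python. This assumes a search against a concatenated forward-reversed database and uses one common convention, decoy hits divided by target hits above the cut-off; other conventions exist (for example, doubling the decoy count and dividing by all hits). The function name and the data layout of (score, is_decoy) pairs are hypothetical.

```python
def decoy_fdr(hits, min_score):
    """Estimate FDR from a forward-reversed search.

    hits is a list of (score, is_decoy) pairs. Decoy matches above the
    cut-off are taken as an estimate of how many target matches above
    the cut-off are wrong.
    """
    decoys = sum(1 for score, is_decoy in hits if score >= min_score and is_decoy)
    targets = sum(1 for score, is_decoy in hits if score >= min_score and not is_decoy)
    if targets == 0:
        return 0.0  # no target hits accepted, so nothing to estimate
    return decoys / targets


# Made-up hits, for illustration only.
example = [(0.99, False), (0.95, False), (0.90, True), (0.80, False), (0.30, True)]
print(decoy_fdr(example, 0.5))
```

Running both this and the Prophet-predicted FDR at the same cut-off is one way to show a reviewer that the two estimates agree.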

--Brendan