NSP model in iProphet/ProteinProphet; model vs decoy based FDR in ProteinProphet

Rene B

Dec 18, 2013, 10:25:41 AM

to spctools...@googlegroups.com

Hi all,

I am running PeptideProphet, iProphet and ProteinProphet (TPP 4.6.3) on Q Exactive data searched with Comet, Myrimatch and OMSSA. I was wondering whether the NSP model should be disabled in ProteinProphet when it is enabled in iProphet. I got confused because Petunia seems to enable the NSP model in both iProphet and ProteinProphet by default (i.e., when xinteract runs with the -ip option).

Another question: when I compare decoy-estimated protein FDRs to ProteinProphet modelled FDRs, ProteinProphet seems a bit optimistic (a decoy-based FDR of 0.1% corresponds to a ~0.02% model FDR). This is with NSP enabled in iProphet and disabled in ProteinProphet. How should I deal with this discrepancy, i.e., should I use the decoy-based or the probability-based FDR to select a probability cutoff?

I have attached some examples for a search with myrimatch only. These are the commands I used to generate the graphs:

xinteract -Nmyrimatch.pep.xml -OAP -p0 -a%ExperimentFolder% -dDECOY0 -E%ExperimentTag% *.pep.xml

InterProphetParser myrimatch.pep.xml myrimatch.ipro.pep.xml

ProphetModels.pl -i myrimatch.ipro.pep.xml -k -r 0.25 -d "DECOY1"

ProteinProphet myrimatch.ipro.pep.xml myrimatch.prot.xml IPROPHET NONSP

ProtProphModels.pl -k -r 0.25 -d DECOY1 -i myrimatch.prot.xml

The graphs are:

myrimatch_all.ipro.pep_FDR_10pc: PeptideProphet/iProphet decoy vs model FDR, all models enabled

myrimatch_nonsp.ipro.pep_FDR_10pc: PeptideProphet/iProphet decoy vs model FDR, NSP model disabled in iProphet

myrimatch_nonsp.prot_FDR_5pc: ProteinProphet decoy vs model FDR, NSP model disabled in ProteinProphet

myrimatch_all.prot_FDR_5pc: ProteinProphet decoy vs model FDR, NSP model enabled in iProphet and ProteinProphet

Thanks in advance!

Kind regards,

Rene

myrimatch_all.ipro.pep_FDR_10pc.png

myrimatch_all.prot_FDR_5pc.png

myrimatch_nonsp.ipro.pep_FDR_10pc.png

myrimatch_nonsp.prot_FDR_5pc.png

David Shteynberg

Dec 18, 2013, 2:13:27 PM

to spctools-discuss

Hello Rene

Thanks for using the tools and double checking your work.

In my tests I have found that applying the NSP model at the iProphet step greatly improves performance at the peptide level, and applying the NSP model at the ProteinProphet step improves performance at the protein level. The two models are somewhat different, since the ProteinProphet model considers grouping information while the iProphet model doesn't. I have not found the two to interfere.

A safe and conservative approach is to use the more conservative of the two estimates: I would choose the ProteinProphet probability cutoff that gives 1% error with the decoys or 1% error with the model, whichever is more conservative.
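The "whichever is more conservative" rule can be sketched as follows. This is a minimal illustration, not TPP code: it assumes each protein (or peptide) is a hypothetical `(probability, is_decoy)` pair, takes the model error as the mean of `1 - p` over accepted non-decoy entries, and estimates the decoy error with the usual target-decoy correction for a decoy rate `r` (decoys/total). Whether ProphetModels.pl or ProtProphModels.pl use exactly these formulas is not stated here. The function returns the lowest cutoff at which both estimates stay at or below 1%.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (probability, is_decoy) for each entry.
Entry = Tuple[float, bool]


def model_fdr(entries: List[Entry], cutoff: float) -> float:
    """Model-based error: mean (1 - p) over accepted non-decoy entries."""
    probs = [p for p, decoy in entries if not decoy and p >= cutoff]
    if not probs:
        return 0.0
    return sum(1.0 - p for p in probs) / len(probs)


def decoy_fdr(entries: List[Entry], cutoff: float, r: float) -> float:
    """Decoy-based error for decoy rate r = decoys / total.

    If D decoys pass the cutoff, about D / r random matches passed in total,
    so about D * (1 - r) / r of the accepted targets are false.
    """
    targets = sum(1 for p, decoy in entries if not decoy and p >= cutoff)
    decoys = sum(1 for p, decoy in entries if decoy and p >= cutoff)
    if targets == 0:
        return 0.0
    return decoys * (1.0 - r) / r / targets


def conservative_cutoff(entries: List[Entry], r: float,
                        max_fdr: float = 0.01) -> Optional[float]:
    """Lowest cutoff at which BOTH the model and decoy estimates are <= max_fdr."""
    for cutoff in sorted({p for p, _ in entries}):
        if (model_fdr(entries, cutoff) <= max_fdr
                and decoy_fdr(entries, cutoff, r) <= max_fdr):
            return cutoff
    return None
```

With `r = 0.5` (equal numbers of target and decoy entries) the decoy estimate reduces to the familiar decoys/targets ratio; smaller decoy rates scale each observed decoy up to more estimated false targets.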

When the model tends to underestimate error at the protein or peptide level, this usually stems from underestimation at the spectrum level by PeptideProphet. It can be controlled with the CLEVEL={value} option of PeptideProphetParser (-c{value} for xinteract). Setting this to a number greater than zero, such as 0.5, 1 or 2, makes the model more conservative overall; a negative value has the opposite effect. Either way, the change carries through to the peptide and protein levels.

Also I am curious why you set decoy rate to 0.25?

Best,
David

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To post to this group, send email to spctools...@googlegroups.com.
Visit this group at http://groups.google.com/group/spctools-discuss.
For more options, visit https://groups.google.com/groups/opt_out.

Rene B

Dec 19, 2013, 3:01:53 AM

to spctools...@googlegroups.com

Hi David,

Thank you for your quick reply and suggestions. The decoy ratio is set to 0.25 as I use two sets of decoys, one for modeling and the other for validation. Each decoy set corresponds to 25% of entries in the database.

Kind regards,

Rene

On Wednesday, December 18, 2013 at 20:13:27 UTC+1, David Shteynberg wrote:

Dave Trudgian

Feb 6, 2014, 3:17:12 PM

to spctools...@googlegroups.com

David,

I just saw Rene's note about the -r 0.25 decoy ratio. I'm similarly using 2 decoy sets (50% target, 25% DECOY_1, 25% DECOY_2) but with -r 0.5. I had assumed the ratio was supposed to be specified as decoys_used/targets and there are twice as many targets as DECOY_2s in my case so -r = 0.5.

Having looked in ProphetModels.pl, I'm now not so sure. The estimate used if -r isn't supplied is pp_prob_array / pp_prob_array_decoy for hits with p<=0.02, but I'm not sure whether this is total/decoy or target/decoy.

Can you confirm which approach is correct?

It's not a huge problem for me if -r 0.5 is wrong, as I am computing and using decoy stats elsewhere, external to TPP. It would just mean that the plots being saved from ProphetModels.pl are wrong.

Thanks,

Dave Trudgian

David Shteynberg

Feb 6, 2014, 4:39:20 PM

to spctools-discuss

Hi Dave,

r is computed as Decoy / Total among hits with less than 2% probability. There is a detailed discussion of this in the iProphet paper.
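Read literally, that definition estimates r as the decoy fraction among hits the model considers almost certainly wrong. A minimal sketch, assuming hits are hypothetical `(probability, is_decoy)` pairs; it is not the ProphetModels.pl code, and the 2% threshold is the one quoted in this thread:

```python
from typing import Iterable, Tuple


def estimate_decoy_rate(hits: Iterable[Tuple[float, bool]],
                        max_prob: float = 0.02) -> float:
    """Estimate r = decoys / total among hits with probability below max_prob.

    Hits the model scores below max_prob are treated as random matches, so the
    fraction of decoys among them approximates the decoy rate.
    """
    low = [is_decoy for prob, is_decoy in hits if prob < max_prob]
    if not low:
        raise ValueError("no low-probability hits to estimate from")
    return sum(low) / len(low)
```

Note the denominator is all low-probability hits (Decoy / Total), not just the targets. If one decoy set is forced to probability 0 (DECOYPROBS disabled), those hits would have to be filtered out before this step, otherwise they inflate r.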

If you have a DB of 50% target 50% decoy and none of the decoys are discarded (which is one way to use your 50%T 25%D1 25%D2) then r = 0.5

If you discard half of the decoys, e.g. D1 is used for modelling and DECOYPROBS is disabled (in which case all D1 hits get probability 0), then all D1 hits should be excluded from the analysis by ProphetModels.pl. The remaining decoys, D2, will then constitute roughly 1/3 of the remaining database entries, and r will be roughly one third (25/75 = 0.3333). In fact, r depends not only on the protein counts but on the distinct peptides in each set of database entries. Because the original database and the decoys may contain degenerate (repeated) peptides, r will only roughly equal that percentage, and it will vary depending on the database, how the decoys are constructed, and how independent D1's decoys are from D2's.
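Both scenarios, and the effect of degenerate peptides, can be checked on a toy database. This sketch counts distinct peptides rather than proteins; the peptide labels are hypothetical placeholders, and the rules for excluded and shared peptides are illustrative assumptions, not how TPP resolves them:

```python
from typing import Iterable


def expected_decoy_rate(target_peptides: Iterable[str],
                        decoy_peptides: Iterable[str],
                        excluded_peptides: Iterable[str] = ()) -> float:
    """Rough decoy rate r = decoys / total over distinct peptides.

    Assumptions (illustrative only): non-target peptides in excluded_peptides
    (e.g. a modelling decoy set whose hits are dropped) leave the pool, and a
    decoy peptide that also occurs in the target set counts as a target.
    """
    targets = set(target_peptides)
    decoys = set(decoy_peptides) - targets - set(excluded_peptides)
    total = len(targets) + len(decoys)
    if total == 0:
        raise ValueError("empty peptide pool")
    return len(decoys) / total


# Toy database: 50% target, 25% D1, 25% D2, no shared peptides.
T = {f"T{i}" for i in range(500)}
D1 = {f"A{i}" for i in range(250)}
D2 = {f"B{i}" for i in range(250)}
print(expected_decoy_rate(T, D1 | D2))   # all decoys kept
print(expected_decoy_rate(T, D2, D1))    # D1 excluded
```

Keeping all decoys gives 0.5, and excluding D1 gives 1/3. If some D2 "decoys" duplicate target peptides, the distinct-decoy count drops and r falls below 25/75, which is the degeneracy effect described above.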

The iProphet paper carries more info on this than I can put in an email, so that's a good reference for this.

Cheers,

-David

Dave Trudgian

Feb 6, 2014, 5:20:17 PM

to spctools...@googlegroups.com

David,

Thanks for the pointer to the iProphet paper - very useful. I'd just been thinking over a coffee about r=1/3 if ProphetModels could ignore the first decoy set. Disabling DECOYPROBS on the DECOY1 set hadn't come into my head. I'd worried in the past about the degeneracy issue, but have just ignored it so far.

I have been working off the decoy probs downstream to report estimated FDRs both at model fitting (DECOY1) and on the independent set (DECOY2), with the latter used for filtering, and the former just as info for the curious. I guess I can disable DECOYPROBS and just compute FDR on the independent set, or modify ProphetModels.pl so it can ignore specified (DECOY1) sequences in its computations. That way the ProphetModels.pl output is going to be consistent with the downstream stuff.

I guess the only thing I'm left wondering is whether the ProphetModels.pl help statement might be confusing to others as well. I've always considered a 'ratio' to generally be between two distinct sets, i.e. target:decoy, rather than a subset vs. the total. Maybe it could be stated explicitly?

-r <NUM> -- Specify decoy ratio (decoy/total sequences). Will guess from P<0.001 hits if not specified.

Thanks again.

Dave T

Eric Deutsch

Feb 7, 2014, 4:59:52 PM

to spctools...@googlegroups.com, Eric Deutsch

Maybe “decoy fraction” is the right term for this concept?

David Shteynberg

Feb 7, 2014, 6:16:48 PM

to spctools-discuss, Eric Deutsch

I like to refer to it as the "decoy rate" as it is the rate at which decoys are acquired among matches drawn at random from the database.
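The "rate at which decoys are acquired among random matches" reading can be illustrated with a small simulation. This is a toy model under a loudly simplifying assumption: random matches are drawn uniformly over database entries, which real spectra do not do, so it only shows what the term means, not how TPP computes r.

```python
import random
from typing import Sequence


def simulated_decoy_rate(database: Sequence[bool], n_matches: int,
                         seed: int = 0) -> float:
    """Fraction of decoys among matches drawn uniformly at random.

    `database` holds one flag per searchable entry (True = decoy). Uniform
    sampling over entries is a simplifying assumption for illustration.
    """
    rng = random.Random(seed)
    hits = rng.choices(database, k=n_matches)
    return sum(hits) / n_matches


# 500 target entries plus the 250 D2 decoys left after excluding D1.
database = [False] * 500 + [True] * 250
print(simulated_decoy_rate(database, 100_000))
```

With D1 excluded, the simulated rate settles near 1/3, matching the 25/75 figure from earlier in the thread.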

-David

Eric Deutsch

Feb 10, 2014, 8:48:09 PM

to spctools-discuss, David Shteynberg

Sure, decoy rate sounds good, much better than the decoy ratio.

Thanks,

Eric