Confusion regarding how to calculate Total SC (ie n_instances)

Gabriel Gray

Nov 2, 2011, 3:21:36 PM

to spctools...@googlegroups.com

We're a little confused about "n_instances" (and indirectly "initial_probability" and "nsp_adjusted_probability") in the prot.xml files. Specifically, if we ran PeptideProphet, iProphet, and ProteinProphet with a MinProb of 0 (instead of the default of 0.05), would that mean that n_instances includes in its calculation the peptide-spectrum matches (PSMs) that originally had peptide probabilities (in the pep.xml file) as low as 0? If so, does that mean that simply because one peptide in one of the pep.xml files had a high probability (as indicated by the "initial_probability" field in the prot.xml file), all other occurrences of that same peptide sequence now automatically get counted in the prot.xml's "n_instances" field, even if the other occurrences of that peptide were of extremely low probability (e.g., 0.01)? Isn't this inaccurate? That is, should we perhaps have used a nonzero MinProb value (in PeptideProphet, iProphet, or ProteinProphet), such as, say, 90%?

Or would it be equally acceptable to bring in all the data (e.g., MinProb = 0), load the results into some sort of relational database, and then dynamically recalculate n_instances based on the PSMs that are NOT filtered out (e.g., "where PeptideProphet PSM Probability > 90%")? Or would that ignore the benefit of the "nsp_adjusted_probability" calculation?
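To make the database idea above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the PSMs have already been extracted from the pep.xml files into rows of (protein, peptide, probability); the table name `psm`, the column names, and the example rows are hypothetical and not part of any TPP output. It only shows the mechanics of recounting after a probability filter and says nothing about whether doing so loses the benefit of "nsp_adjusted_probability".

```python
import sqlite3

# Hypothetical in-memory table of PSMs; in practice the rows would come
# from parsing the pep.xml files (parsing is not shown here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (protein TEXT, peptide TEXT, probability REAL)")
conn.executemany(
    "INSERT INTO psm VALUES (?, ?, ?)",
    [
        ("PROT_A", "PEPTIDEK", 0.99),
        ("PROT_A", "PEPTIDEK", 0.01),  # same peptide, very low probability
        ("PROT_A", "ANOTHERR", 0.95),
        ("PROT_B", "SAMPLEK", 0.40),
    ],
)


def spectral_counts(conn, min_probability):
    """Count PSMs per protein whose probability exceeds min_probability."""
    rows = conn.execute(
        "SELECT protein, COUNT(*) FROM psm "
        "WHERE probability > ? GROUP BY protein",
        (min_probability,),
    )
    return dict(rows.fetchall())


# Recount with a 0.9 cutoff; the threshold is a user choice, not a TPP default.
counts = spectral_counts(conn, 0.9)
```

With a cutoff of 0.9, the low-probability repeat of PEPTIDEK is excluded, and a protein with no PSMs above the cutoff does not appear in the result at all. The sketch also assumes each PSM maps to a single protein; peptides shared between proteins would need a rule of their own.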

David Shteynberg

Nov 2, 2011, 6:51:18 PM

to spctools...@googlegroups.com

The n_instances is an old value that was a pre-iProphet attempt to
take into account the number of observations of a given peptide. It
simply computes the sum of probabilities of all spectra matching to a
given peptide. The statistic is calculated but in no way affects the
protein probability unless you enable the ProteinProphet INSTANCES
model, which you don't need to enable when using ProteinProphet on
iProphet data.
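
As an illustration of the description above, the following sketch sums the probabilities of all PSMs for each peptide. It follows only the one-sentence description in this reply and is not taken from the ProteinProphet source; the function name and the example PSMs are hypothetical, and any minimum-probability filtering that ProteinProphet may apply beforehand is not modeled.

```python
from collections import defaultdict


def sum_probabilities_by_peptide(psms):
    """Return, for each peptide sequence, the sum of its PSM probabilities."""
    totals = defaultdict(float)
    for peptide, probability in psms:
        totals[peptide] += probability
    return dict(totals)


# Hypothetical PSMs as (peptide sequence, probability) pairs.
psms = [
    ("PEPTIDEK", 0.99),
    ("PEPTIDEK", 0.01),  # a low-probability repeat adds only 0.01 to the sum
    ("ANOTHERR", 0.50),
]
totals = sum_probabilities_by_peptide(psms)
```

Under this reading, a repeat observation with probability 0.01 adds 0.01 to the total rather than counting as a full extra instance.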

Gabriel Gray

Nov 2, 2011, 7:26:38 PM

to spctools...@googlegroups.com

So, to derive the "total SC" for a given protein, how should that be done? Presumably, we should count all the matching peptides in the pep.xml files that are above a user-specified iProphet probability threshold while ignoring completely the "initial_probability" and "nsp_adjusted_probability" fields from the prot.xml file?
