We're a little confused about "n_instances" (and indirectly "initial_probability" and "nsp_adjusted_probability") in the prot.xml files. Specifically, if we ran PeptideProphet, iProphet, and ProteinProphet with a MinProb of 0 (instead of the default of 0.05), would that mean that n_instances includes in its calculation the peptide-spectrum matches (PSMs) which originally had peptide probabilities (in the pep.xml file) as low as 0? If so, does that mean that simply because one occurrence of a peptide in one of the pep.xml files had a high probability (as indicated by the "initial_probability" field in the prot.xml file), all other occurrences of that same peptide sequence automatically get counted in the prot.xml's "n_instances" field, even if those other occurrences were of extremely low probability (e.g., 0.01)? Isn't this inaccurate? That is, should we perhaps have used a MinProb value that is not 0 (in PeptideProphet, iProphet, or ProteinProphet), such as, say, 90%?
Or would it be equally acceptable to bring in all the data (e.g., MinProb = 0), load the results into some sort of relational database, and then dynamically recalculate n_instances based only on the PSMs that are NOT filtered out (e.g., "where PeptideProphet PSM Probability > 90%")? Or would that ignore the benefit of the "nsp_adjusted_probability" calculation?
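To make the database idea concrete, here is a rough sketch of the kind of re-count we have in mind. It assumes the PSMs have already been loaded into a table; the table name `psm`, its columns, the helper `count_instances`, and the sample rows are all made up for illustration, and we use `>=` for the cutoff. It is only a per-peptide count of the PSMs that pass the threshold, and we are not claiming it matches how ProteinProphet itself computes n_instances.

```python
import sqlite3

# Hypothetical schema: one row per PSM, holding the peptide sequence and the
# PeptideProphet probability as read from the pep.xml files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (peptide TEXT, spectrum TEXT, probability REAL)")
conn.executemany(
    "INSERT INTO psm VALUES (?, ?, ?)",
    [
        # Made-up example rows, not real results.
        ("PEPTIDEK", "run1.0001", 0.99),
        ("PEPTIDEK", "run1.0002", 0.01),
        ("PEPTIDEK", "run2.0003", 0.95),
        ("OTHERPEPR", "run1.0004", 0.40),
    ],
)


def count_instances(conn, min_prob):
    """Count PSMs per peptide, keeping only PSMs at or above min_prob."""
    rows = conn.execute(
        "SELECT peptide, COUNT(*) FROM psm "
        "WHERE probability >= ? GROUP BY peptide",
        (min_prob,),
    )
    return dict(rows.fetchall())


unfiltered = count_instances(conn, 0.0)  # every PSM is counted
filtered = count_instances(conn, 0.9)    # only PSMs with probability >= 0.9
print(unfiltered)
print(filtered)
```

With these example rows, the low-probability occurrence of PEPTIDEK (0.01) is counted in the unfiltered case but dropped once the 0.9 cutoff is applied, which is exactly the difference we are asking about.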