PeptideProphet with X!Tandem


Farshad AbdollahNia

Feb 24, 2024, 5:50:55 PM
to spctools-discuss
Hello experts,

I am having issues with X!Tandem in combination with PeptideProphet and iProphet, and I was wondering if someone could help, please.

As background, our lab has been using X!Tandem on its own with its expectation value cut-off for the past several years, which seemed to work fine with our Sciex Triple-TOF instrument. More recently, we have switched to Orbitrap instruments, which perform a lot better, and I have found that Comet, whether alone or combined with the TPP peptide validation tools (PeptideProphet/iProphet), gives us great coverage. But I have also noticed that X!Tandem results suffer a huge loss when processed with PeptideProphet/iProphet, and I am wondering if I am doing something wrong. Here is an example:

First, I understand that Tandem must be set up to output all search results for the Prophets to work. So I make sure to set the "output, results" parameter to "all", which should override the maximum valid expectation value:

2024-02-24_13-37.png
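For reference, in the X!Tandem input file this parameter is a note element; a sketch of the relevant line (verify the label spelling against your X!Tandem documentation):

```xml
<!-- Report all search results, not just those passing the e-value cut-off -->
<note type="input" label="output, results">all</note>
```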

This results in Tandem reporting ~40k valid models:

2024-02-24_13-31.png

Running the Prophets, it seems that PeptideProphet fails to properly capture and fit the two peaks of the false and valid model distributions, especially for charge 2+ and 3+ ions (by the way, is there a way to extend the graphs to fval > 10?). Perhaps because of this, I get only ~17k PSMs at a 1% error rate:

2024-02-24_13-15.png

2024-02-24_13-27.png

On the other hand, if I analyze the same data using Comet, I get well-separated score distributions and over 47k validated PSMs:

2024-02-24_13-07_1.png
2024-02-24_13-07.png

Some more surprising observations: 
  • When I set "output, results" to "valid", X!Tandem still reports about the same number of valid models (39,935). This indicates that X!Tandem does not actually underperform by 64% (17k vs 47k), but rather only by 15% (40k vs 47k).
  • If I set "output, results" to "all" and change "output, maximum valid expectation value" to "1000.0", X!Tandem reports ~104k valid models. This is confusing because it suggests that the "all" option does not override the max e-value cut-off. Either way, the PeptideProphet output (score distributions and number correct) still looks the same with these settings.

I am concerned that I am not setting up the Tandem workflow correctly, or that the tools are not actually working right. I would appreciate any feedback.

Here is a link to the analysis input files: 
If I have forgotten something or if you want the full analysis files, please let me know.

Thank you,
Farshad

David Shteynberg

Feb 26, 2024, 1:08:43 PM
to spctools...@googlegroups.com
Hello Farshad,

Thank you for using our software, asking great questions, and providing nice examples to work with! I was able to download and process your data. I made a few changes to the parameters you were using, and I added decoys to your database (although I didn't end up using them for validation). With these changes, I was able to get much higher returns of PSMs from the Tandem search.


I think the main issue is that you are setting your expect score cutoff too low, at 0.01, which essentially reduces the sensitivity and ensures that only high-scoring results are returned by Tandem. TPP is then forced to fit a bimodal model to only "very good" results, and the sensitivity is drastically reduced! I changed this to 1e6 to ensure all results are returned, so TPP can model all of them and gain sensitivity. I also changed the search setting "spectrum, parent monoisotopic mass isotope error" to "yes" and enabled semi-tryptic searching by setting "protein, cleavage semi" to "yes". Then I ran the results through TPP PeptideProphet using the DECOY entries in the database as a secondary validation, with the following result:

image.png
image.png

You can see in the ROC image on the right that, according to the decoy-estimated validation, there are just under 50k PSMs identified at an error rate of 1% (according to the PeptideProphet model, there are 48,607 PSMs at 1% error). This is in the same ballpark as the Comet results you were comparing in your email. You can further improve the validation by running iProphet on the results.
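In the X!Tandem input file, the parameter changes described above would look roughly like this (a sketch; verify the label spellings and value formats against your X!Tandem documentation):

```xml
<!-- Return all results so PeptideProphet can model both distributions -->
<note type="input" label="output, maximum valid expectation value">1e6</note>
<!-- Allow for parent isotope selection errors -->
<note type="input" label="spectrum, parent monoisotopic mass isotope error">yes</note>
<!-- Include semi-tryptic peptides in the search -->
<note type="input" label="protein, cleavage semi">yes</note>
```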

Hopefully you are able to change your parameters and observe the same effect when you run this on your system.

Cheers!
-David




Farshad AbdollahNia

Feb 26, 2024, 3:33:00 PM
to spctools...@googlegroups.com
Hi David,

Thank you for taking the time to look into this and for your helpful tips. I was able to reproduce your results by changing the settings you mentioned. But I still have some concerns:

First, are you confirming that setting "output, results" to "all" does not remove the cap on the expect score, despite what the documentation says?

Second, I was trying to do a direct side-by-side comparison with Comet. Setting the parent isotope error to "yes" makes the searches more comparable. But my Comet search did not include semi-tryptic peptides (Comet params file is attached). So, to keep the comparison fair, I only set the expect cut-off to 1e6 and the parent isotope error to "yes". With this, I get 29,877 PSMs at 1% error. More importantly, the charge 2+ fval distribution graph (below) still clearly shows a bad bimodal fit to the data. To my understanding, a correct model fit is essential for PeptideProphet to correctly estimate the probabilities.

2024-02-26_10-49.png

With semi-tryptic peptides included (as you suggested), the model still doesn't seem to capture the score distribution for the 2+ charge state very well, though perhaps somewhat better (we can't really tell without seeing the entire distribution):

2024-02-26_11-00.png

Of course, in the latter case I get 49,374 PSMs at 1% error as you did. But with semi-tryptic peptides included, Comet also improves from 47k to 51k PSMs (not shown here).

So it seems that we are still looking at a difference of 30k vs 47k between X!Tandem and Comet when semi-tryptic peptides are not included. Also, when semi-tryptic peptides are included, the jump from 30k to 49k PSMs for X!Tandem doesn't seem genuine and is not corroborated by Comet. This most likely results from PeptideProphet failing to properly model the score distribution from X!Tandem, especially for charge 2+ ions. Is this a correct conclusion?

Thanks,
Farshad


comet.params.high-high

David Shteynberg

Feb 26, 2024, 7:50:01 PM
to spctools...@googlegroups.com
Hello Farshad,

Please find my responses to your questions below.

First, are you confirming that setting "output, results" to "all" does not remove the cap on the expect score, despite what the documentation says?

Actually, no! I retested with the max expect score of 0.01 and "output, results" set to "all", and no filter was applied to the Tandem output, so "all" does keep all the PSMs in the file.

Second, I was trying to do a direct side-by-side comparison with Comet. Setting the parent isotope error to "yes" makes the searches more comparable. But my Comet search did not include semi-tryptic peptides (Comet params file is attached). So, to keep the comparison fair, I only set the expect cut-off to 1e6 and the parent isotope error to "yes". With this, I get 29,877 PSMs at 1% error. More importantly, the charge 2+ fval distribution graph (below) still clearly shows a bad bimodal fit to the data. To my understanding, a correct model fit is essential for PeptideProphet to correctly estimate the probabilities.

The 2+ fval distribution that you suggest shows a bad bimodal fit actually shows that the 2+ PSM results are not really bimodal per se, but rather a single smeared distribution across the fval range. The use of additional models in PeptideProphet from the semi-tryptic search gives the classifier some traction to still separate the two underlying distributions of positive and negative results, even for the 2+ model here.
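As an aside, the kind of two-component mixture fitting discussed here can be illustrated with a toy EM sketch in Python. This is an illustration only, not PeptideProphet's actual implementation: PeptideProphet models the negative component with an extreme value distribution rather than a second Gaussian, which is used below to keep the sketch short.

```python
# Toy EM fit of a two-Gaussian mixture to a score distribution, mimicking
# how a classifier separates correct (positive) from incorrect (negative)
# PSM scores. All names and parameters here are illustrative.
import math
import random

def em_two_gaussians(xs, iters=200):
    """Fit a two-component Gaussian mixture by EM; returns (mu_neg, mu_pos, w_pos)."""
    mu1, mu2 = min(xs), max(xs)                  # crude initialization at the extremes
    s1 = s2 = (max(xs) - min(xs)) / 4 or 1.0
    pi = 0.5                                     # mixing weight of the positive component
    for _ in range(iters):
        # E-step: responsibility of the positive component for each score
        resp = []
        for x in xs:
            p1 = (1 - pi) * math.exp(-((x - mu1) ** 2) / (2 * s1 ** 2)) / s1
            p2 = pi * math.exp(-((x - mu2) ** 2) / (2 * s2 ** 2)) / s2
            resp.append(p2 / (p1 + p2))
        # M-step: re-estimate weights, means, and stdevs from responsibilities
        n2 = sum(resp)
        n1 = len(xs) - n2
        pi = n2 / len(xs)
        mu1 = sum((1 - r) * x for r, x in zip(resp, xs)) / n1
        mu2 = sum(r * x for r, x in zip(resp, xs)) / n2
        s1 = math.sqrt(sum((1 - r) * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1) or 1e-6
        s2 = math.sqrt(sum(r * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2) or 1e-6
    return mu1, mu2, pi

random.seed(0)
# Synthetic "fval" scores: 70% incorrect PSMs near 0, 30% correct near 5
xs = [random.gauss(0, 1) for _ in range(700)] + [random.gauss(5, 1) for _ in range(300)]
mu_neg, mu_pos, w_pos = em_two_gaussians(xs)
```

When the two underlying distributions are smeared together (as in the 2+ case above), EM has little to grab onto, which is exactly why the extra models from the semi-tryptic search help.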

That said, I searched your data again with fully tryptic peptides only and parent isotope error set to "yes". I then reprocessed with PeptideProphet, enabling only the ACCMASS model in PPM mode. The model uses an extreme value distribution for the negative PSMs and a Gaussian for the positive PSMs. I used the decoys in the database as a representation of ground truth against which to compare the PeptideProphet probabilities. Results are pasted below:
image.png


image.png
image.png


So it seems that we are still looking at a difference of 30k vs 47k between X!Tandem and Comet when semi-tryptic peptides are not included. Also, when semi-tryptic peptides are included, the jump from 30k to 49k PSMs for X!Tandem doesn't seem genuine and is not corroborated by Comet. This most likely results from PeptideProphet failing to properly model the score distribution from X!Tandem, especially for charge 2+ ions. Is this a correct conclusion?


The FDR and ROC plots show that, when I included the decoy counting, there is remarkably close agreement between the PeptideProphet model estimates and the decoy-based estimates, so I would argue the results are "genuine." To further boost the performance of the classifier, I would suggest combining the Comet and Tandem analyses using iProphet to get the best possible validation, especially at the peptide level.
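The decoy-counting estimate works by treating decoy hits as a stand-in for incorrect target hits: walking down the ranked PSM list, the decoy-based FDR at each point is (roughly) decoys/targets. A minimal Python sketch of the idea (function and data names are hypothetical, and real tools often apply corrections such as (decoys + 1)/targets):

```python
# Count target PSMs accepted at a given decoy-estimated FDR threshold.
def psms_at_fdr(psms, max_fdr=0.01):
    """psms: list of (probability, is_decoy) pairs.
    Returns the largest target PSM count whose decoy-estimated FDR <= max_fdr."""
    best = 0
    targets = decoys = 0
    # Walk from the highest-probability PSM down, tracking decoy-based FDR.
    for prob, is_decoy in sorted(psms, key=lambda p: -p[0]):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = targets
    return best

# Tiny made-up example: 5 target PSMs and 2 decoys with assigned probabilities
example = [(0.99, False), (0.98, False), (0.95, False), (0.90, True),
           (0.85, False), (0.40, True), (0.30, False)]
```

Plotting model-estimated error against this decoy-estimated error at each threshold gives exactly the kind of agreement check shown in the plots above.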

Please write back if you have further questions.

Cheers!

-David






Farshad AbdollahNia

Feb 26, 2024, 10:25:46 PM
to spctools...@googlegroups.com
Hi David,

Thank you for your responses; they clarified things a lot. I agree that the results you obtained for fully tryptic peptides, i.e. the 47k PSMs and the ROC plots, look accurate/genuine and consistent with Comet. But I am having difficulty reproducing what you got. I tried to include decoys and got the message "Mixture model quality test failed for charge (2+)." from PeptideProphet, with no model fit for 2+ and anomalous ROC results (images below):
2024-02-26_18-18.png
2024-02-26_17-54.png
2024-02-26_19-10.png

I am not sure what I am doing wrong (I have successfully done analyses with decoys in the past using other samples). Would you be able to take a look, please? I have put the analysis outputs and the commands log in the folder linked below (let me know if anything is missing):

Many thanks!

Farshad

David Shteynberg

Feb 27, 2024, 6:20:04 PM
to spctools...@googlegroups.com
Hello Farshad,

It looks like the automatic quality control filter in PeptideProphet is invalidating the model in your analysis. This can be bypassed in xinteract by enabling force fitting of the model; you can also combine this with the option to discard the charge states you don't need (e.g., shown here ignoring 1+):

image.png
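On the command line, a re-run with forced model fitting might look like the following. This is a hypothetical sketch: FORCEDISTR and DECOY=DECOY are the option keywords that appear in this thread, and ACCMASS/PPM enable the accurate-mass model in PPM mode; verify the exact option names against your TPP version's help output.

```shell
# Hypothetical sketch: re-run PeptideProphet directly on an existing
# interact.pep.xml with forced mixture-model fitting. Requires a TPP install.
CMD="PeptideProphetParser interact.pep.xml ACCMASS PPM DECOY=DECOY FORCEDISTR"
echo "$CMD"   # run with: eval "$CMD"
```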
  

However, my analysis was slightly different from yours: I did not specify the decoy at the xinteract modeling step, and in my analysis the quality filter was not triggered. Instead of specifying the decoys on the xinteract page for use by PeptideProphet, I used the Decoy Peptide Validation tool. This tool compares the model probabilities and corresponding error rates from PeptideProphet or iProphet to decoy-based error calculations.
image.png

Cheers!
-David



Farshad AbdollahNia

Feb 27, 2024, 10:58:00 PM
to spctools...@googlegroups.com
Thank you, David! I activated the bypass (force fitting) and I think I am one step closer to solving the problem, but not fully there. The graph now shows a fitted model for charge 2+, but the number of PSMs doesn't change (it was ~18k and it still is). The ROC plot, this time generated using Decoy Peptide Validation, still looks bad. The warning is still present in the output log, so I am wondering whether the quality filter is still being applied to the results. What do you think? (My Tandem outputs and the FASTA containing decoys are included in the shared folder, in case that helps with diagnosis.)

image.png

image.png

image.png
image.png

Thanks,
Farshad


David Shteynberg

Feb 28, 2024, 1:31:43 PM
to spctools-discuss
Hello again Farshad,

I have downloaded and checked your interact.pep.xml file, and it appears you are still using the option DECOY=DECOY and no FORCEDISTR option. So it seems this is the interact file from one of your older analyses. I recommend renaming the file (or moving the old file somewhere else) and trying again with the recommended settings. Right now, the problem of low ID rates is still due to the invalidation of the charge 2+ spectrum model.

Thanks!
-David

Farshad AbdollahNia

Feb 28, 2024, 2:42:49 PM
to spctools...@googlegroups.com
Hi David,

Sorry, I hadn't uploaded the results from the latest analysis last night. I have put those files into the folder "forcedistr" within the parent shared folder. Just to be sure, I then deleted all the analysis files in the TPP workspace and ran everything again this morning. The output looks similar and is uploaded in "forcedistr_2". The command logs are also included. The remaining inputs (FASTA and Tandem params) are the same as in the parent folder.

Something unusual I notice is that PeptideProphet attempts to produce ROC plots even without the decoy option being selected and before running Decoy Peptide Validation separately (the uploaded files include the latter step, but I can provide some without it). This is surprising because the only clue that decoys exist would be in the protein names. Does PeptideProphet try to auto-detect them?

Thanks again for taking the time to look into this.

Farshad


Farshad AbdollahNia

Feb 28, 2024, 2:55:32 PM
to spctools...@googlegroups.com
Please disregard the second paragraph in my last email. I just tried again with a different output file name and there was no ROC plot from PeptideProphet alone, so I was probably seeing them due to browser caching. Everything else looks as before (low PSM count and a bad ROC curve after decoy validation).

Thanks,
Farshad

David Shteynberg

Feb 28, 2024, 3:44:11 PM
to spctools...@googlegroups.com
Hello Farshad,

I noticed that you are running version 6.1.0 of the software. When I ran the latest released version (6.3.3), the number of PSMs returned was much higher:
image.png
image.png

Can you please update your version and try again?

Thanks,
-David

Farshad AbdollahNia

Feb 29, 2024, 4:33:18 PM
to spctools...@googlegroups.com
Hi David,

Today I got a chance to update to 6.3.3 and now I am able to reproduce your results -- quite a significant change since 6.1.0! This resolves the issue.

Once again, I really appreciate your help and the time you and others put into developing TPP and helping the community. We will make sure to acknowledge this in any publications.

Thank you,
Farshad

