PeptideProphet showing all zero probabilities


sudarshan kumar

Jul 23, 2024, 7:56:19 AM
to spctools-discuss
Hi,
Can you please explain why this happens?
When I search with X!Tandem, it shows all the good hits I expect from the sample.
But when I run PeptideProphet, every result has a probability of 0.
I tried merging four files and also running them individually, but in all cases the probability comes out as 0.

Where is the problem?

Best

David Shteynberg

Jul 23, 2024, 1:16:20 PM
to spctools-discuss
Hello Sudarshan,

Although it is not possible to tell exactly what happened here without more information, the usual suspects are:

1.  Data Quality
2.  Incorrect Database
3.  Incorrect Parameters

I just ran a test of X!Tandem locally on the ISB18 dataset.  The search ran and the TPP validation generated PSMs with positive probability values.  If you are able to provide more information about your specific analysis or post your dataset online for download I would be happy to take a closer look.

Cheers!
-David





sudarshan kumar

Jul 24, 2024, 3:45:20 AM
to spctools...@googlegroups.com
Thank you so much, David.
Please find the screenshot of what I am getting.
I am also sharing the file for you to check.
Best regards,
Sud




--
-------------------------------------------------------------------
The real voyage of discovery consists not in seeking new lands but seeing with new eyes. — Marcel Proust

Dr. Sudarshan Kumar
(Fulbright-Nehru Fellow)
(B.V.Sc.& A.H., M.V.Sc., PhD.)
Sr. Scientist
Animal Biotechnology Center
(Proteomics and Cell Biology Lab.)
National Dairy Research Institute Karnal, 132001
Haryana, India
Contact No 09254912456
URL www.ndri.res.in

New Microsoft Word Document.docx

sudarshan kumar

Jul 24, 2024, 5:19:52 AM
to spctools...@googlegroups.com

Please find the mzML file shared through Drive.

David Shteynberg

Jul 24, 2024, 10:38:51 AM
to spctools-discuss
Can you compress the mzML file, the search database, and your search together and reshare?  I will need at least those files to try to reproduce your process.

Cheers!

David Shteynberg

Jul 24, 2024, 10:50:04 AM
to David Shteynberg, spctools-discuss
Hello again Sud!

From your screenshots I could tell you are running version 6.3.2 of the software.  I think the issue you are seeing may already be fixed in the latest release, 7.1.0.  Are you able to update and try the latest version?

Thanks!
-David

sudarshan kumar

Jul 25, 2024, 1:47:05 AM
to spctools...@googlegroups.com
OK, let me switch to TPP 7.1.0 and run the test again. I will let you know.
Best regards,
Sud

sudarshan kumar

Jul 25, 2024, 2:31:32 AM
to spctools...@googlegroups.com
Hi David,
I tried 7.1.0 as well, and the problem is still the same for both the Tandem-searched and the Comet-searched data: every hit returned has a probability of 0, and all of them map to irrelevant proteins.
1. Both searches (Comet and Tandem) give me all good hits.
2. But PeptideProphet returns only bad hits with 0 probability (undesirable proteins identified).

sudarshan kumar

Jul 25, 2024, 7:12:06 AM
to spctools...@googlegroups.com
Hi, David,
Can you please also share the 2024 version of the comet params file, or let me know where I can download it?
Regards
Sud

sudarshan kumar

Jul 25, 2024, 7:15:28 AM
to spctools...@googlegroups.com
When I use the 2023 comet params in TPP 7.1.0, it aborts the search, saying that my comet is the 2023 version.

sudarshan kumar

Jul 25, 2024, 8:14:23 AM
to spctools...@googlegroups.com
swissprotkb_taxonomy_id_9913_AND_model_or_2023_08_04.fasta.txt

David Shteynberg

Jul 25, 2024, 10:18:30 AM
to spctools-discuss
Also required are your search parameters and database.

David Shteynberg

Jul 25, 2024, 1:38:20 PM
to David Shteynberg, spctools-discuss
Hello Sud,

Thank you for sharing the problematic dataset!  I was able to download the data, search it with both Comet and Tandem, and generate some non-zero probability results.  Because so few of the results in this data are correct, about 2% of the PSMs, it is difficult for PeptideProphet to pick out the correct ones, especially without the aid of some decoy true negatives.  Still, I was able to get a few PSMs using your basic analysis database and selecting the NEGGAMMA distribution for the negatives, and a few more by lowering the c-level to 0.5 (or 0).
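
To illustrate why a small fraction of correct PSMs makes the modeling hard, here is a minimal sketch of the posterior probability in a two-component mixture model. It is not PeptideProphet's actual code: the Gaussian score distributions, their parameters, and the function names are assumptions chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct,
                      pos_mean=3.0, pos_sd=1.0,   # hypothetical "correct" score model
                      neg_mean=0.0, neg_sd=1.0):  # hypothetical "incorrect" score model
    """Bayes posterior that a PSM with this score is correct."""
    pos = prior_correct * normal_pdf(score, pos_mean, pos_sd)
    neg = (1.0 - prior_correct) * normal_pdf(score, neg_mean, neg_sd)
    return pos / (pos + neg)


# Compare a balanced dataset with one where only ~2% of PSMs are correct.
for prior in (0.5, 0.02):
    probs = [round(posterior_correct(s, prior), 3) for s in (1.0, 2.0, 3.0)]
    print(f"prior={prior}: {probs}")
```

With the same score, the posterior drops sharply when the prior fraction of correct PSMs falls to 2%, so only the very best-scoring matches keep a high probability. In a real analysis the prior and the distributions are themselves estimated from the data, which is harder still when the positive component is this small.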

You should always be careful when selecting minimum probability thresholds, or the false positives can pile up.  Here, you can see I added a few decoys to your database, which is highly recommended as a way to get another estimate of the remaining errors and biases in the data.  When adding decoys you might also consider adding common contaminant proteins to the database (I have not done that in this case).  Adding decoys also allows you to take advantage of the semi-parametric modeling in PeptideProphet, which can be more sensitive and extract more correct results from the dataset.  Here are the parametric NEGGAMMA model results:

image.png

As you can see, the NEGGAMMA models struggle to pull together more than about 30 correct PSMs:

image.png

And when I run iProphet on this result these map to only about 26 peptide sequences:

PastedGraphic-3.png




In the following optimized analysis I added two sets of de Bruijn randomized decoys to your database.  I then did an X!Tandem search (with refinement enabled to boost speed and sensitivity).  Then I used the TPP PeptideProphet with DECOY0 as the “known” decoy, DECOY1 as the “unknown” decoy, the semi-parametric model, a bandwidth of 3, and a clevel of 2.  PeptideProphet models and PSM-level results:

PastedGraphic-2.png


I also enabled iProphet to obtain these peptide-level models and results on the data you sent:

PastedGraphic-1.png



As you can see, even with the “optimized” analysis, at most we can identify about 100 PSMs and about 80 peptides, in this mzML file containing 4117 spectra, which are all 2+ charge.
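
As a rough illustration of how decoys give an independent estimate of the remaining errors, the sketch below counts decoy hits above a probability threshold. It assumes a single decoy set about the same size as the target database, decoy protein names that start with a `DECOY` prefix, and hypothetical (probability, protein) pairs. It is not how the TPP computes its own error rates.

```python
def decoy_fdr(psms, min_prob, decoy_prefix="DECOY"):
    """Estimate the FDR among target PSMs that pass min_prob.

    psms: iterable of (probability, protein_name) pairs.
    Assumes decoys and targets are equally likely to receive an incorrect
    match, so the decoy count approximates the number of false targets.
    Returns None when no target PSM passes the threshold.
    """
    passing = [(p, name) for p, name in psms if p >= min_prob]
    decoys = sum(1 for _, name in passing if name.startswith(decoy_prefix))
    targets = len(passing) - decoys
    if targets == 0:
        return None
    return min(1.0, decoys / targets)


# Hypothetical PSMs, for illustration only.
example_psms = [
    (0.99, "PROT_A"),
    (0.95, "PROT_B"),
    (0.92, "DECOY0_PROT_X"),
    (0.40, "PROT_C"),
    (0.10, "DECOY1_PROT_Y"),
]
print(decoy_fdr(example_psms, min_prob=0.9))
```

With two decoy sets, as in the analysis above, the decoy count would need to be scaled to the relative size of the decoy and target databases before it is compared with the target count.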

Incidentally, I did this analysis on a Mac.  I was also able to comet search and process this dataset on my Android phone, where I have set up the TPP, although I don’t recommend you try that at this time ;)

Cheers,
-David


On Jul 25, 2024, at 7:22 AM, David Shteynberg <david.sh...@isbscience.org> wrote:

Please create a new comet params file for the comet search in the new release, which has an updated comet. You can do this from the Files tab in Petunia by creating a new file.
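
To check which Comet release a params file was written for before starting a search, a small sketch like the one below may help. It assumes the file begins with a header comment of the form `# comet_version 2023.01 rev. 2`; the sample text and the function name are hypothetical. Comet itself can also write a fresh template with its `-p` option (it creates `comet.params.new`), although the name and location of the Comet binary inside a TPP installation may differ.

```python
import re

# Assumed header format: "# comet_version <version> rev. <n>"
VERSION_LINE = re.compile(r"^#\s*comet_version\s+(\S+)")


def params_version(text):
    """Return the version from the first non-blank line, or None if absent."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            match = VERSION_LINE.match(stripped)
            return match.group(1) if match else None
    return None


# Hypothetical file contents, for illustration only.
sample = "# comet_version 2023.01 rev. 2\ndatabase_name = my_database.fasta\n"
print(params_version(sample))
```

If the reported version does not match the Comet bundled with the installed TPP, generating a new params file and copying over the settings you changed is likely safer than editing the version line by hand.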

sudarshan kumar

Jul 26, 2024, 2:25:09 AM
to spctools...@googlegroups.com
Hi David, 
Thank you for doing the analysis at your end. 
1. I see that in the first image, "interact neggamma. pepxml", all the hits are assigned to the correct proteins. I know this because I was expecting these proteins from the sample.

2. I wonder why the model for the +2 charge fits the observed ions to the negative model.

3. Your observation that "As you can see, even with the “optimized” analysis, at most we can identify about 100 PSMs and about 80 peptides, in this mzML file containing 4117 spectra, which are all 2+ charge" may be correct given the stringency of the statistical tools applied. But it would be interesting to see whether these 80 qualified peptides belong to the same proteins as indicated in the raw comet or tandem search. My point is that the comet search without prophet control shows very correct hits.

4. I am not able to find a 2024 version of comet.param, even when I try to create a new one in TPP 7.1.0. If I copy an old comet.param (from 2023), the search aborts.

5. I am not able to do it through the command prompt either. Kindly send a copy of the 2024 comet.param file (as a text file) to my email, so that I can edit it as needed.

Best regards,
Sud


David Shteynberg

Jul 26, 2024, 4:51:34 PM
to spctools-discuss
Hello Sud,

1. I see that in the first image "interact neggamma. pepxml" all the hits are assigned to correct proteins. I know it as I was expecting these proteins from the sample. 

I am glad this meets your expectations, but the goal is to give your PSMs high probabilities when they are correct and low probabilities when they are not.  Sometimes the results might not meet your expectations. For example, some of your spectra might come from peptides of common contaminant proteins, but unless your database includes those proteins, any such spectra will be matched to an incorrect protein.


2. I wonder why the model for +2 charge is fitting the observed ions to negative model 

This is because the positive distribution is tiny compared to the negative distribution.  Here it is when we zoom in:

image.png

3. Your observation that "As you can see, even with the “optimized” analysis, at most we can identify about 100 PSMs and about 80 peptides, in this mzML file containing 4117 spectra, which are all 2+ charge" may be correct given the stringency of wide tools of statistics applied. But it will be interesting to see- are these qualified 80 peptides belong to the same proteins as indicated in the raw comet or tandem search? My statement- comet search without prophet control - shows very correct hits.

I ran ProteinProphet on the “optimized” analysis.  Here is the full set of proteins mapped by the peptide IDs:
PastedGraphic-1.png

Let me know if any questions remain.

Cheers!
-David
