Process PEAKS pepxml in TPP


Sergio Ciordia

Oct 9, 2024, 6:39:46 PM
to spctools-discuss
Hi,

I have been using TPP for some time, mainly to validate results from various search engines with PeptideProphet and to generate spectral libraries that I use in other programs. My lab is now using PEAKS (v12), and the output is really good. Would it be possible to add to TPP an analysis pipeline for PEAKS data, starting from the pep.xml file the program generates?
I know PEAKS is commercial software, but I would be very grateful if you could consider supporting it as you do Sequest or Mascot. I don't think it would be complicated, since PEAKS already produces pep.xml output; it would only need to be compatible with XInteract so that peptide-spectrum matches can be validated.

If necessary I can provide a sample pep.xml file.

Thank you very much.

Best regards,
Sergio

David Shteynberg

Oct 9, 2024, 6:46:06 PM
to spctools-discuss
Hello Sergio,

Thank you for your email.  As you know PEAKS is not a search engine that we have integrated in the TPP, mainly because we have not had any requests for this feature before your email.  It is something that can be done with a bit of work and testing, but unfortunately there is currently no funding for us to continue this work.  I wish I had a more satisfying answer to give you, but perhaps, if you can forward your sample pep.xml file, we can do this if more funding becomes available.

Best,
-David


Sergio Ciordia

Oct 9, 2024, 6:59:19 PM
to spctools...@googlegroups.com
Hi David,

Thank you for your quick and honest response. I am sorry to hear that you have no funds to continue TPP implementation. I hope this problem will be solved.
Nevertheless, I appreciate your willingness to evaluate a sample file. This file contains what the PEAKS team calls ‘Peptide output in pepxml’. I hope it is adequate, but in any case I don't think it will be a problem to get suitable output for TPP if necessary. The other available format is mzIdentML, in case that is of interest.

LINK: pepXML PEAKS sample

Thanks again for your reply.

Best regards,
Sergio

--------------------------
Sergio Ciordia Higuera
Proteomics Facility
National Center for Biotechnology
C\Darwin, 3
Universidad Autónoma de Madrid
Cantoblanco
28049 Madrid (Spain)
Phone: +34 91 585 4540 / 4695
Fax: +34 91 585 4506



David Shteynberg

Oct 9, 2024, 7:34:11 PM
to spctools-discuss
Thank you Sergio!  Would you mind also sending the mzML data and the sequence database that goes along with these search results from PEAKS?

Best,
-David

Sergio Ciordia

Oct 10, 2024, 6:52:39 AM
to spctools...@googlegroups.com
Thank you David. You are right. This new link contains the 3 files you need: mzML, FASTA and pep.xml.

Link: PEAKS Dataset

I hope you can do something. Thanks anyway for your concern.


Best regards,
Sergio


David Shteynberg

Oct 10, 2024, 1:40:08 PM
to spctools...@googlegroups.com
Hello Sergio,

Thanks for sending this. After taking a look I have another request.  The database you searched against seems to not contain any entrapment DECOYS to help independently validate any computed scores or probabilities.  Are you able to search this data against a database containing some decoys?  You can use the TPP decoy generator to create decoys (we have been using deBruijn randomized sequences), or should I create it and give you the database to search?

Best,
-David

Sergio Ciordia

Oct 10, 2024, 1:48:07 PM
to spctools...@googlegroups.com
Hi David,

Several commercial packages, such as PEAKS and Proteome Discoverer, take the target database directly and generate the target-decoy database internally. The problem is that when the search finishes, the software usually removes the decoy entries from the final results, which is why they don't appear in the list.

I understand, then, that we would need the same ‘pep.xml’ file but containing all the target and decoy entries. Is that what you need?

Best,
Sergio


David Shteynberg

Oct 10, 2024, 1:54:21 PM
to spctools...@googlegroups.com
Hello Sergio,

Preferably the database will contain entrapment decoys that are not known to the search algorithm as decoys, and can then be used to evaluate the performance of the search independently.  So yes, the results should contain both targets and "unknown" decoys.  Is that possible?

Thanks!
-David

Sergio Ciordia

Oct 10, 2024, 4:29:35 PM
to spctools...@googlegroups.com
As far as I know, PEAKS does not allow ‘decoy’ data to be exported. I'm going to ask them whether it can be obtained somehow, but I doubt it (at least in pep.xml format). Since I am going to ask them anyway, do you think it would be better to get the PSM-level export, or is the peptide-level information enough (both target and decoy)?

Thanks for everything David, I'll let you know as soon as I hear back from the support team.


Best,
Sergio


David Shteynberg

Oct 10, 2024, 4:33:41 PM
to spctools...@googlegroups.com
I think we have a bit of a misunderstanding here. I am not looking for decoys that are "known" to PEAKS; I want to include entrapment decoys that are "unknown" to PEAKS and known to us as true negatives. Then we can utilize the true negatives to estimate error rates. The true negative entrapment decoys should not be revealed to the search algorithm, just like the false positives among the target sequences are not known to the algorithm, but represent the error we are trying to control.
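
As a rough illustration of the idea (not TPP's actual implementation), the sketch below estimates the error rate among target hits from the number of entrapment hits. The function name, the `DECOY` prefix, and the `target_to_decoy_ratio` parameter are all assumptions made for this example; the ratio stands for the size of the target database relative to the entrapment portion.

```python
def entrapment_fdr(proteins, decoy_prefix="DECOY", target_to_decoy_ratio=1.0):
    """Estimate the false discovery rate among target PSMs.

    proteins: one protein accession per accepted PSM.
    decoy_prefix: hypothetical marker identifying entrapment entries.
    target_to_decoy_ratio: assumed size of the target database divided by
        the size of the entrapment database (1.0 means equal sizes).
    """
    n_entrapment = sum(1 for p in proteins if p.startswith(decoy_prefix))
    n_target = len(proteins) - n_entrapment
    if n_target == 0:
        return 0.0
    # Each entrapment hit is a known false match; scale it to estimate how
    # many equally wrong matches are hiding among the target hits.
    estimated_false_targets = n_entrapment * target_to_decoy_ratio
    return min(1.0, estimated_false_targets / n_target)


if __name__ == "__main__":
    hits = ["sp|P00001|TARGET"] * 98 + ["DECOY_0001"] * 2
    print(f"estimated FDR: {entrapment_fdr(hits):.4f}")
```

The estimate is only independent if the search engine treats the entrapment entries like any other target sequence, which is the point of keeping them "unknown".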

I hope this makes sense.

-David

Sergio Ciordia

Oct 10, 2024, 4:47:17 PM
to spctools...@googlegroups.com
OK, I think I understand you now. You are asking me to create a database containing the target and decoy entries and repeat the search. Then, in the output that PEAKS gives us, we could locate the ‘unknown’ decoys. Is this what you need?

Sergio


Sergio Ciordia

Oct 10, 2024, 5:44:08 PM
to spctools...@googlegroups.com
Hi David,

I have anticipated your answer and I think this is what you asked for. I have used TPP to generate the DECOY database using these settings:

image.png

I have rerun the PEAKS search and uploaded the new file ‘PEAKS_target_decoy.pep.xml’ and the decoy database to the shared folder.

Link: PEAKS Dataset

See what you think of the output.

Best,
Sergio



David Shteynberg

Oct 11, 2024, 8:18:00 PM
to spctools-discuss
Hello Sergio,

After having a much closer look at this data I am having some doubts regarding the data as it is reported in the PEAKS pepXML output.  First of all I noticed that the mass differences and the calculated neutral masses were off in the pepXML export, for example:

PastedGraphic-1.png

So I modified my tools to allow recomputing the calculated neutral masses and mass differences.  Here is this entry after I run my tool:

PastedGraphic-2.png

Although the calculated neutral mass is now correct, the mass difference for this PSM is quite large, and possibly represents another modification in this peptide that is not annotated by PEAKS for this PSM.  The mass of 16 Da matches oxidation and I see many of these types of peptides, with a massdiff of 16 and containing a Methionine.  


This leads me to suspect a bug in the pepXML export of these PSMs from PEAKS.
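
A minimal sketch of this kind of check, assuming standard monoisotopic residue masses and the convention that the mass difference is the observed precursor neutral mass minus the calculated one. This is not the tool used above; the function names and the 0.02 Da tolerance are made up for illustration, and any static modifications (for example on cysteine) would have to be passed in by the caller.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565      # H2O added for the peptide termini
OXIDATION = 15.994915  # one oxygen atom


def calc_neutral_mass(peptide, mod_masses=()):
    """Calculated neutral mass: residues + water + annotated modifications."""
    return sum(MONO[aa] for aa in peptide) + WATER + sum(mod_masses)


def mass_diff(precursor_neutral_mass, peptide, mod_masses=()):
    """Observed minus calculated neutral mass (assumed sign convention)."""
    return precursor_neutral_mass - calc_neutral_mass(peptide, mod_masses)


def may_be_unannotated_oxidation(massdiff, peptide, tol=0.02):
    """Flag PSMs whose mass difference matches one oxygen and contain Met."""
    return "M" in peptide and abs(massdiff - OXIDATION) <= tol


if __name__ == "__main__":
    observed = calc_neutral_mass("PEPTIDEM") + OXIDATION
    d = mass_diff(observed, "PEPTIDEM")
    print(round(d, 4), may_be_unannotated_oxidation(d, "PEPTIDEM"))
```

Passing the oxidation mass in `mod_masses` brings the difference back to about zero, which is what a correctly annotated PSM should show.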

I was able to get quite good results running your mzML file through comet.  These are the comet modifications I used in my search:

PastedGraphic-3.png

PastedGraphic-4.png


Although I am uncertain that this will fix all of the issues I am seeing in this result, is it possible to rerun PEAKS using a set of modifications similar to the one I used here for comet?

Another concern was that out of over 26,000 PSMs in the pepXML file only 48 are hits to the DECOY portion of the database.  This seems a bit low to me, and I am not sure why that is.
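
For anyone who wants to repeat this count on their own file, here is a small sketch that tallies top-ranked hits and decoy hits in a pepXML file using only the Python standard library. It assumes decoy proteins carry a `DECOY` prefix in the `protein` attribute of `search_hit`; the function name and the prefix are assumptions, so adjust them to the database actually searched.

```python
import xml.etree.ElementTree as ET


def count_decoy_hits(pepxml_source, decoy_prefix="DECOY"):
    """Return (total_top_hits, decoy_top_hits) for a pepXML file.

    pepxml_source: a file path or a binary file-like object.
    decoy_prefix: assumed prefix marking decoy protein accessions.
    """
    total = decoys = 0
    for _, elem in ET.iterparse(pepxml_source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "search_hit" and elem.get("hit_rank", "1") == "1":
            total += 1
            if elem.get("protein", "").startswith(decoy_prefix):
                decoys += 1
        elif tag == "spectrum_query":
            elem.clear()  # release memory once a spectrum is fully processed
    return total, decoys


if __name__ == "__main__":
    import sys
    n_total, n_decoy = count_decoy_hits(sys.argv[1])
    print(f"{n_decoy} decoy hits out of {n_total} top-ranked PSMs")
```

For scale, 48 decoy hits among roughly 26,000 PSMs is under 0.2% of the reported matches.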


Here is the summary from the comet + TPP analysis of this run:
image.png

Comet + PeptideProphet is finding about 24000 correct PSMs at an error-rate less than 1%.

Running iProphet on the data boosts the PSM number higher, especially at the lower error-rates:

PastedGraphic-5.png


To summarize, comet+PeptideProphet+iProphet is finding about 24000 correct PSMs, mapping to almost 21000 peptide sequences in this file:

PastedGraphic-6.png


Perhaps if the PEAKS issues can be solved, a similar number can be found using that software, but for now the results coming out of PEAKS are not close.

Best,
-David







Sergio Ciordia

Oct 12, 2024, 3:42:07 AM
to spctools...@googlegroups.com
Thank you very much, David, for such a thorough analysis and for your time and effort. I agree that there seems to be a bug in the pepXML export. I'm going to write to support to see if they can fix it (and fast). Since I'm writing to them anyway, what exactly would we need for the analysis with TPP? For example:

- Should the file contain all the PSMs or only the peptides?
- Should the file contain the decoys? That way, would the ‘unknown decoys’ strategy no longer be necessary?
- They have to fix the bug in the calculated neutral masses and mass differences.
- Should the pepXML file be filtered by FDR? This last point seems important to me because the file I gave you appears to be filtered at 1% FDR at the PSM level (this can't be changed without repeating the search in PEAKS with a less strict value). If the pepXML file has to be exported with an FDR value, what should it be for the TPP analysis?

Let's see if we can get it to work. Thanks again for all your help in trying to implement PEAKS in TPP and all the comparative analysis with Comet.


Best regards,
Sergio


David Shteynberg

Oct 12, 2024, 11:52:12 PM
to spctools-discuss
Hello Sergio,

My answers follow each question:

- Should the file contain all the PSMs or only the peptides?

If we are trying to replicate how TPP processes other search engines then yes, collect all PSMs without filtering and let the TPP process the pepXML.

- Should the file contain the decoys? That way, would the ‘unknown decoys’ strategy no longer be necessary?

The results should contain the ‘unknown decoys’. Any ‘known decoys’ that are exposed to the search engine are no longer reliable as independent indicators of false-positive results and should be excluded (ignored) from further FDR estimates.

- They have to fix the bug in the calculated neutral masses and mass differences.

Yes!

- Should the pepXML file be filtered by FDR? This last point seems important to me because the file I gave you appears to be filtered at 1% FDR at the PSM level (this can't be changed without repeating the search in PEAKS with a less strict value). If the pepXML file has to be exported with an FDR value, what should it be for the TPP analysis?

No. You can have all results exported from PEAKS (maximum FDR, without filtering) and let TPP establish the thresholds for FDR. You can then apply TPP-based FDR thresholds without rerunning the analysis (assuming the minimum probability for reported PSMs is set to zero).
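
To illustrate why unfiltered output matters, here is a hedged sketch of choosing a probability cutoff after the fact. It assumes each PSM has a probability of being correct, and it estimates the error rate of an accepted set as the sum of (1 - p) divided by the number of accepted PSMs. The function name is hypothetical and this is a simplified stand-in, not the TPP code.

```python
def min_prob_for_fdr(probabilities, max_fdr=0.01):
    """Find the lowest probability cutoff whose accepted set meets max_fdr.

    Returns (cutoff, n_accepted), or (None, 0) if no cutoff qualifies.
    The model-based FDR of a set is sum(1 - p) / len(set).
    """
    probs = sorted(probabilities, reverse=True)
    best = (None, 0)
    expected_false = 0.0
    for n, p in enumerate(probs, start=1):
        expected_false += 1.0 - p
        # Only evaluate at the end of a run of equal probabilities, so that
        # a cutoff never splits PSMs that share the same value.
        last_of_tie = n == len(probs) or probs[n] != p
        if last_of_tie and expected_false / n <= max_fdr:
            best = (p, n)
    return best


if __name__ == "__main__":
    cutoff, accepted = min_prob_for_fdr([1.0, 0.99, 0.99, 0.5], max_fdr=0.01)
    print(cutoff, accepted)
```

Because the cutoff is computed from the full list, a different `max_fdr` can be applied later without repeating the search, which only works if low-probability PSMs were not removed before export.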

Best,
-David



Sergio Ciordia

Oct 13, 2024, 8:08:39 AM
to spctools...@googlegroups.com
Thank you David, everything is clear. I will write to the PEAKS support team tomorrow.

Best,
Sergio

