SearchGUI / PeptideShaker testing within GalaxyP (April 2015 tests).


Pratik Jagtap

Apr 26, 2015, 11:15:15 AM
to harald.barsnes, Marc Vaudel, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Timothy Griffin, Thomas McGowan


Hello Harald and Marc, 


Hello after a long silence!


We have started testing SearchGUI / PeptideShaker within GalaxyP and have a few observations and questions. We would greatly appreciate your input.


What works: MS-GF+ and OMSSA searches within SearchGUI / PeptideShaker.

History: https://galaxyp.msi.umn.edu/u/pjagtap/h/gcc-workshop-raw-mzml-mascotmgf-sgomsgf-ps

Workflow: https://galaxyp.msi.umn.edu/u/pjagtap/w/copy-of-raw-mzml-ppmgf-sg-ps

What does not work: X!Tandem generates an output after the SearchGUI search, but the resulting PeptideShaker output is empty. We are following up internally on which parameters are needed to make this work.


We will appreciate your answers to the following questions / observations:


a) Where can we find documentation on PeptideShaker? What do the terms "validated" and "doubtful" mean with reference to protein ID? What does "Minimum confidence required…Mw plot" mean?


b) There is some confusion about the two parameters in SearchGUI within GalaxyP (see attached images) concerning the use of decoy databases, and we would like some clarity on how best to use them.

Shouldn't the first one be sufficient? The second option is redundant and confusing in our opinion.


c) In the summary output (from checking the "Certificate of Analysis" option), why are there validated spectra and peptides on two lines?

8: #Validated Peptides: 4479.0

9: #Validated Peptides: 790.0

22: #Validated PSM: 4275.0

23: #Validated PSM: 3917.0


d) In any of the outputs, is there a column that has information on search algorithm and associated score? Within PeptideShaker, can replicates be compared in the same run?


e) Can a target-only database be used for searches?


Your answers will be greatly appreciated. If it would be easier to discuss via Skype, we can do that as well.


Regards,


Pratik (on behalf of the GalaxyP testing group).


Pratik Jagtap,
Managing Director, Center for Mass Spectrometry and Proteomics,
43 Gortner Laboratory, 1479 Gortner Avenue, St. Paul, MN 55108      
PS Parameters.png

Marc Vaudel

Apr 27, 2015, 8:52:15 AM
to Pratik Jagtap, harald.barsnes, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Timothy Griffin, Thomas McGowan
Dear Pratik,

Great to hear from you, and glad to see that the workflow is falling into place :) It is not obvious to me why X!Tandem does not work when the other search engines do. You will find more information in the log created along with the X!Tandem result. I will be happy to help if I can!

> a) Where can we find documentation on PeptideShaker? What do the terms "validated" and "doubtful" mean with reference to protein ID? What does "Minimum confidence required…Mw plot" mean?
- We have most of the documentation online on the tool website. If you are looking for specific information on the parameters, please refer to the command line wiki (https://code.google.com/p/peptide-shaker/wiki/PeptideShakerCLI). There is also contextual help accessible from the interface by clicking on the question marks.
- "Validated" means that the match passed the FDR threshold. However, as you know, very poor hits sometimes still pass a stringent statistical threshold, possibly due to peculiarities of the scoring. We therefore conduct additional quality control on the matches, based for example on mass deviation or fragment ion annotation. Matches passing both the statistical validation and the quality control are named "Confident" and marked green in the interface; matches passing the statistical validation but not the quality control are named "Doubtful" and marked yellow. In general, we recommend keeping all statistically validated matches (confident and doubtful). But if you plan follow-up experiments or processing, you might want to focus on confident matches. If you are particularly interested in a doubtful match, it might be worth acquiring additional data to confirm the identification. You can find more documentation on this in chapter 1.5 of our tutorials (http://compomics.com/bioinformatics-for-proteomics/). If you click on the validation status of a match in the interface (last column of most tables), a diagnostic dialog will help you inspect the quality of the match.
- "Minimum confidence required…Mw plot" is a parameter used to generate the Mw plot in the fraction tab. It therefore has no influence on the identification results and is used only for display. It sets the minimum confidence a protein needs to be shown in this plot. I would not recommend changing this parameter.
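
To make the Confident / Doubtful distinction concrete, here is a minimal sketch of the two-stage logic described above. It is only an illustration: the field names and the thresholds (`max_ppm`, `min_annotated`) are hypothetical and are not PeptideShaker's actual quality filters.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # All three fields are hypothetical stand-ins for what a real tool tracks.
    passes_fdr: bool                    # statistical validation at the FDR threshold
    mass_error_ppm: float               # precursor mass deviation
    annotated_fragment_fraction: float  # share of fragment ions that were annotated


def validation_status(match, max_ppm=10.0, min_annotated=0.3):
    """Return 'Not Validated', 'Doubtful' or 'Confident' for one match."""
    # Stage 1: the match must pass the statistical (FDR) threshold.
    if not match.passes_fdr:
        return "Not Validated"
    # Stage 2: additional quality control on statistically validated matches.
    qc_ok = (
        abs(match.mass_error_ppm) <= max_ppm
        and match.annotated_fragment_fraction >= min_annotated
    )
    return "Confident" if qc_ok else "Doubtful"
```

Under these assumed thresholds, a match that passes the FDR threshold but has a large mass deviation comes out as "Doubtful", which corresponds to the yellow matches in the interface.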

> b) There is some confusion about the two parameters in SearchGUI within GalaxyP (see attached images) concerning the use of decoy databases, and we would like some clarity on how best to use them.
> Shouldn't the first one be sufficient? The second option is redundant and confusing in our opinion.
These decoy options do seem confusing and redundant. I am not sure which parameters they refer to. The database given to PeptideShaker should be the same as the one used for the search, so there is no point in adding decoys between the search and PeptideShaker. We recommend adding contaminants and then decoys of all target sequences when creating the FASTA file of interest, before the search. Adding decoys can be done in SearchGUI. For the sake of speed and reproducibility, we recommend the use of reversed sequences.
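
A minimal sketch of the reversed-decoy idea, assuming a plain FASTA file that already contains the target and contaminant sequences. The helper names and the `_REVERSED` tag are illustrative assumptions, not SearchGUI's implementation; in practice SearchGUI can add the decoys itself.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def add_reversed_decoys(records, decoy_tag="_REVERSED"):
    """Return the target records followed by one reversed decoy per target."""
    targets = list(records)
    decoys = []
    for header, sequence in targets:
        # Tag the accession (first word of the header) so decoys are recognisable.
        accession, _, description = header.partition(" ")
        decoy_header = accession + decoy_tag
        if description:
            decoy_header += " " + description
        decoys.append((decoy_header, sequence[::-1]))
    return targets + decoys
```

Reversal is deterministic, so the same input FASTA always gives the same decoys, which is the reproducibility point made above. The same concatenated file would then be used for both the search and PeptideShaker.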

> c) In the summary output (from checking the "Certificate of Analysis" option), why are there validated spectra and peptides on two lines?
When validating matches, PeptideShaker separates peptides and PSMs into categories based on features such as charge and modifications in order to improve the identification rate. It seems that the names of the different categories are not exported correctly in the Certificate of Analysis. It should be something like "# phosphorylated peptides" and "# non modified peptides". I will look into this.

> d) In any of the outputs, is there a column that has information on search algorithm and associated score? Within PeptideShaker, can replicates be compared in the same run?
- There is no information on the algorithm score in the standard reports, but you can create a report including it via the GUI, and that report will then be available via the command line on the same computer. If you like, I can also extend the current default, or create a report specifically for Galaxy.
- If you create a project with all replicates together, you will not be able to distinguish them afterwards. Also, I fear that the score will be biased toward the highly abundant proteins. I would thus recommend processing replicates separately and merging the results subsequently.
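
As a rough sketch of the "process separately, merge afterwards" suggestion: assuming each replicate has already been reduced to a list of validated protein accessions (the replicate names and accessions below are made up), the merge can be as simple as recording which replicates each protein was seen in.

```python
from collections import defaultdict


def merge_replicates(replicates):
    """Map each protein accession to the replicates in which it was validated.

    `replicates` is a dict of replicate name -> iterable of accessions.
    """
    seen = defaultdict(list)
    for name, accessions in replicates.items():
        # set() guards against an accession listed twice in one replicate.
        for accession in sorted(set(accessions)):
            seen[accession].append(name)
    return dict(seen)


merged = merge_replicates({
    "rep1": ["P1", "P2"],
    "rep2": ["P2", "P3"],
})
# Proteins found in every replicate can then be selected explicitly.
in_all = sorted(acc for acc, reps in merged.items() if len(reps) == 2)
```

Because each replicate is validated on its own, the replicate of origin is preserved, which is exactly what is lost when all replicates go into a single project.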

> e) Can a target-only database be used for searches?
It is possible to use target-only databases, but really not recommended. In that case, only one search engine can be used, peptide and protein scores will not make sense, and protein inference grouping and PTM localization scoring will be impaired. Finally, no error rate (confidence/FDR) will be calculated.
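
To illustrate why no error rate can be calculated without decoys, here is a minimal sketch of a simple target-decoy estimate (decoy hits divided by target hits above a score threshold). This is a textbook-style approximation under the assumption that a higher score is better; it is not necessarily how PeptideShaker computes its own statistics.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among PSMs scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs; higher scores are better.
    Returns None when no target PSM passes the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return None
    return decoys / targets


# Hypothetical scores: three targets and one decoy pass a threshold of 7.
psms = [(10, False), (9, False), (8, True), (7, False), (6, True)]
estimate = fdr_at_threshold(psms, 7)
```

With a target-only database every `is_decoy` flag is False, so the estimate is zero at any threshold and carries no information about the real error rate.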

I hope this answers your questions. Don't hesitate to write again if anything remains unclear, or if new issues pop up :)

Best regards,

Marc


Ira Cooke

Apr 27, 2015, 6:58:16 PM
to Marc Vaudel, Pratik Jagtap, harald.barsnes, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Timothy Griffin, Thomas McGowan
Hi Marc, 

Many thanks for this detailed email, and thanks to Pratik for asking.  

I can comment on a couple of your points. 

>> b) There is some confusion on the two parameters in SearchGUI within GalaxyP (see attached images) about use of decoy databases and we should try to seek some clarity on how to optimally use these.
>> Should not the first one be sufficient ? The second option is redundant and confusing in our opinion.
> These decoy options seem confusing and redundant indeed. I am not sure to which parameters they refer to. The database given to PeptideShaker should be the same as the one used for the search. It will be useless to add decoys between the search and PeptideShaker. We recommend adding contaminants and then decoy of all target sequences when creating the fasta file of interest, before the search. Adding decoys can be done in SearchGUI. For the sake of speed and reproducibility, we recommend the use of reverse sequences.

I totally agree with this and just have not got around to removing these redundant options from the wrapper. I’ve created an issue in our wrapper repo for this and will try to fix it soon.



>> d) In any of the outputs, is there a column that has information on search algorithm and associated score? Within PeptideShaker, can replicates be compared in the same run?
> - There is no information on the algorithm score in the standard reports, but you can create a report via the gui including these, and it will be later on available via command line on this computer. If you like I can also extend the current default, or create a report for galaxy specifically.
> - If you create a project with all replicates together, you will not be able to distinguish them afterwards. Also, I fear that the score will be biased toward the highly abundant proteins. I would thus recommend processing replicates separately and merge the results subsequently.

Please let us know if you do implement a new command line option to support this and we will add it to the wrapper. 

Cheers
Ira


--
You received this message because you are subscribed to the Google Groups "Galaxy for Proteomics" group.
To post to this group, send email to gal...@umn.edu.
Visit this group at http://groups.google.com/a/umn.edu/group/galaxyp/.
To view this discussion on the web visit https://groups.google.com/a/umn.edu/d/msgid/galaxyp/CAE1e1dvAkJFApDt6abkFz9puGJLao9vnVNgGgxq06q27u_cHJQ%40mail.gmail.com.

To unsubscribe from this group and stop receiving emails from it, send an email to galaxyp+u...@umn.edu.

Ira Cooke

Apr 27, 2015, 7:52:26 PM
to Pratik Jagtap, harald.barsnes, Marc Vaudel, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Timothy Griffin, Thomas McGowan
Hi Pratik, 

Thanks for sending all this detail through.  

I should say that in my testing I’ve run some searches in which I see PSMs from all search engines that were run (OMSSA, X!Tandem, MS-GF+, Comet in my case) … but sometimes I also see (as you do) that X!Tandem hits do not show up. For me it seems to depend on the dataset, but I cannot pin down the exact circumstances.

Not sure if that is helpful ... if I get time to investigate properly I’ll let you know what I find.

Cheers
Ira
 


Tim Griffin

Apr 27, 2015, 10:40:37 PM
to Ira Cooke, Pratik Jagtap, harald.barsnes, Marc Vaudel, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Thomas McGowan
Ira and all -- on the X!Tandem issue, we discovered we were missing a library update within our implementation for X!Tandem. After this update it looks like X!Tandem is running and working. We're now doing more testing and learning the ins and outs of the platform. Our next plan is to link it to some of our other tools for proteogenomics, but we are not there yet.

- Tim
--
Tim Griffin, Ph.D.
Associate Professor, and

Director, Center for Mass Spectrometry and Proteomics
University of Minnesota
Dept. of Biochemistry, Molecular Biology and Biophysics
6-155 Jackson Hall
321 Church Street SE
Minneapolis, MN  55455
USA

Office: 7-144 Molecular Cellular Biology (MCB)

Tel: 612-624-5249
Fax: 612-624-0432
Email: tgri...@umn.edu

https://www.cbs.umn.edu/bmbb/contacts/timothy-j-griffin
Center for Mass Spectrometry and Proteomics website: http://www.cbs.umn.edu/msp/

Pratik Jagtap

Apr 28, 2015, 3:03:47 AM
to Tim Griffin, Ira Cooke, harald.barsnes, Marc Vaudel, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Thomas McGowan
Hello Marc,

Thanks for the detailed answers and links to where we can find documentation.

Ira - thanks for your input and follow-up on the target-decoy option.

As Tim wrote, JJ worked on a library update that got X!Tandem working again.

We are trying to make it work for a database that contains UniProt human proteins, contaminant proteins and translated cDNA from Ensembl, so that it can be used for downstream proteogenomics analysis.

We will keep you updated on how future tests work.

Thanks and Regards,

Pratik


Tim Griffin

Apr 28, 2015, 3:25:27 PM
to Ira Cooke, Marc Vaudel, Pratik Jagtap, harald.barsnes, Lennart Martens, Galaxy for Proteomics, peptide...@googlegroups.com, Thomas McGowan
Thanks for the comments Ira.  I'll put a couple of comments on top of yours:

1.  As these issues come up, should we be submitting them as new issues at the GitHub repository? (I have my username and password, so I'm ready to go!)

2.  I'll also reiterate the desire to have the "SearchEngine" output as one that could be generated directly in Galaxy.  Being able to determine which hits were from which search engine is very informative so getting at this output in a more streamlined manner would be terrific.

Thanks,
- Tim