Re: PeptideShaker/SearchGui on the toolshed

Pratik Jagtap

Jun 25, 2014, 2:39:32 PM

to Bart Gottschalk, Jim Johnson, Tim Griffin, lennart...@ugent.be, Harald....@biomed.uib.no, Marc Vaudel, Björn Grüning, Ira Cooke, Galaxy for Proteomics, Shaun Stewart, Alan Blakely, Kevin Murray

Thanks Ira,

As discussed yesterday, we will request developers at MSI to install this on the development site so that Shaun and students can start testing it.

Regards,

Pratik

Pratik Jagtap,

Managing Director,

Center for Mass Spectrometry and Proteomics,

43 Gortner Laboratory
1479 Gortner Avenue
St. Paul, MN 55108

Phone: 612-624-9275

On Wed, Jun 25, 2014 at 11:40 AM, Ira Cooke <I.C...@latrobe.edu.au> wrote:

Hi All,

In case anybody wants to use it, I have updated the peptideshaker tool on both tool sheds to use recent releases from Harald and Marc. A quick test indicates that the dependency install works and a small test dataset runs OK.

The galaxyp wrapper on the main toolshed has been updated to this version, since the old wrapper was no longer working anyway.

http://toolshed.g2.bx.psu.edu/view/galaxyp/peptideshaker

Now that we have this working, it looks like we may be doing a rework of the design anyway .. splitting the task into multiple tools (working from what Gerben and Elvis have done). Marc is working to implement features in PeptideShaker and SearchGUI to support this.

Cheers

Ira

On 24 Jun 2014, at 6:47 pm, John Chilton <jmch...@gmail.com> wrote:

Hello All,

I'm super excited to see the collaboration on these tools and wrappers. I think I started with the intention to break these into separate tools but it didn't really work well with Galaxy. Galaxy makes no guarantee that input and output file locations are immutable - there are a bunch of ways to configure Galaxy to ensure this is not the case. Galaxy tools therefore work best when distinct input "files" produce a fixed number of independent outputs. Also I think not needing to keep the SearchGUI results around meant the tools consumed a lot less disk space.

Certainly - if things have evolved or my initial assumptions were wrong - I would encourage splitting out the tools. But if SearchGUI cannot produce a single independent output for consumption by PeptideShaker I would encourage keeping them combined. One could consider using Galaxy tool macros to create more focused tools that follow this pattern without duplication.

-John

Hi Marc .. and all,

I was initially very enthusiastic about splitting the tools, but I must admit I’m much less sure that it’s a good idea now. For debugging it is definitely nice to see which step things break at … so that’s a win for the split approach. On the other hand:

1. Most of the complexity of the UI will remain in the generation of identification parameters step .. so I don’t think we will gain in terms of usability there .. but we will require the user to add extra steps to their workflows which is perhaps a decrease in usability.

2. We run into issues with making sure that paths remain intact across tool running steps. It doesn’t sound like the -identification_files parameter will solve the issue at this stage because it is possible for files to be spread across multiple directories.

So for my part … I remain a bit unsure.

Cheers

Ira

On 24 Jun 2014, at 12:36 pm, Marc Vaudel <mva...@gmail.com> wrote:

Hi Gerben, Elvis and Ira!

Good to hear that things are working well!

I agree that splitting the tool sounds like the most reasonable approach, both for usage and for maintenance.

> 1) Are all (non MSGF+) MzID files automatically loaded when a directory is chosen as the -identification_files parameter?

Yes, if a directory is given all compatible files are automatically loaded.

> 2) What’s the difference between the -report and -documentation parameter for the ReportCLI?

The -report parameter exports the actual reports (tables). The -documentation parameter exports the documentation of the report (description of the content of the columns of the tables).

> 3) The choice of running different search engines, is that stored in the parameter file? In other words, can this information be retrieved by the SearchCLI from the -id_params input, or do you still need to specify that by the optional parameters (e.g.: -xtandem, -msgf, …).

No, this is not stored in the parameter file. You can run different sets of search engines with the same parameter file.
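Since the engine choice lives outside the parameter file, a wrapper has to pass it on every run. Here is a minimal Python sketch of how that part of the argument list might be assembled. Only `-id_params`, `-xtandem` and `-msgf` come from this thread; the function name, the parameter file name in the usage note, and the convention that each engine flag takes `1` or `0` are assumptions for illustration, not confirmed SearchCLI behaviour.

```python
# Engine flags named in this thread; a real wrapper would list every
# engine it supports.
KNOWN_ENGINES = ("xtandem", "msgf")


def search_engine_args(id_params_path, enabled):
    """Build the parameter-file and engine part of a SearchCLI argument list.

    `enabled` is the set of engine names the user selected for this run.
    """
    unknown = set(enabled) - set(KNOWN_ENGINES)
    if unknown:
        raise ValueError(f"unknown search engine(s): {sorted(unknown)}")

    args = ["-id_params", id_params_path]
    for engine in KNOWN_ENGINES:
        # Assumption: each engine flag takes 1 (run) or 0 (skip).
        args += [f"-{engine}", "1" if engine in enabled else "0"]
    return args
```

Calling `search_engine_args("search.parameters", {"xtandem"})` and then `search_engine_args("search.parameters", {"xtandem", "msgf"})` reuses one parameter file for two different engine sets, which is the behaviour described above.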

Will be happy to help if you have more questions. Btw I really like the "TODO: implementation of the DeNovoCLI." Eager to see all this combined :)

Best regards,

Marc

2014-06-24 17:51 GMT+02:00 Ira Cooke <I.C...@latrobe.edu.au>:

Hi Gerben,

Many thanks for the info about your work. I think we should definitely join forces to try and make this tool wrapper the best it can be. I guess splitting up the tasks is not a trivial thing due to the issues you raise, but I’m sure there are ways around that. What is your overall feeling about the benefits of splitting up the tasks? I’m guessing the main benefit is easier debugging since you can see which stage failed.

Also .. what do the others in this email list think about the approach of splitting up the tools? Is it a good idea? Should we try to adopt this approach for our tool?

In terms of fixing the paths in the output files, I think this is something we could do for the text file formats reasonably easily by a bit of find/replace magic in the wrapper. For the cps file it might be tricky though? As you say, the input files can be spread across multiple directories. Maybe there is something that could be done in peptideshaker that would allow providing a full list of paths, or a method to edit the paths in a cps file.
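For the text formats, the find/replace idea could look roughly like the Python sketch below. It assumes the wrapper already knows a mapping from each old path (for example, one inside a job working directory) to its new location; the function name and the mapping are hypothetical, and this does nothing for the binary cps file.

```python
import re


def rewrite_paths(text, path_map):
    """Replace every old path found in `text` with its new location.

    `path_map` maps old path strings to new path strings.
    """
    if not path_map:
        return text

    # Try longer paths first so that a directory path does not match
    # inside a longer file path that has its own mapping.
    ordered = sorted(path_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(path) for path in ordered))

    # A single pass over the text means replaced text is never
    # scanned again, so new paths cannot be rewritten by accident.
    return pattern.sub(lambda match: path_map[match.group(0)], text)
```

The single-pass regex is a deliberate choice over chained `str.replace` calls: with chained calls, a new path that happens to contain an old one would be rewritten twice.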

Thanks heaps for sharing your ideas :)!

Cheers

Ira

On 24 Jun 2014, at 10:29 am, Gerben Menschaert <gerben.m...@gmail.com> wrote:

Hi Harald, Marc, Ira, Björn,

We (mostly Elvis and I) are finalizing our implementation of the PeptideShaker/SearchGui toolset in Galaxy.

Our setup is to split the tasks into different stand-alone entities:

IdentificationParameterCLI (this also allows setting search-algorithm specific input parameters, PepNovo+ and DirecTag not yet implemented)

output=parameter file

Combination of SearchCLI (with optional FastaCLI) & PeptideShakerCLI

output=cps file

ReportCLI

output=csv file(s)

FollowUpCLI (MzIdentML output will be included asap, using the MzidCLI)

output depends on chosen follow up analysis

TODO: implementation of the DeNovoCLI.

Allow me to ask some specific questions:

1) Are all (non MSGF+) MzID files automatically loaded when a directory is chosen as the -identification_files parameter?

2) What’s the difference between the -report and -documentation parameter for the ReportCLI?

3) The choice of running different search engines, is that stored in the parameter file? In other words, can this information be retrieved by the SearchCLI from the -id_params input, or do you still need to specify that by the optional parameters (e.g.: -xtandem, -msgf, …).

More specifically for Ira, Björn:

4) Input spectrum files: A temporary spectra folder is created wherein symbolic links are placed to the actual mgf files (physically in ../files/xxx/.dat). As such all spectra are in the same directory. Also the .dat files are renamed to the actual display_name and get the .mgf extension in order to be compatible with the PeptideShaker/SearchGui toolset.

This temporary folder is located in the job_working_directory. So in our setup, which splits the tasks into separate tools, these links are lost by step 3 (i.e. ReportCLI).

Two possible solutions:

* A symbolic link could be created next to the ../files/xxx/yyy.dat (../galaxy_db/files/xxx/display_name.mgf), but there’s a possibility that the full set of mgf input files is spread over two directories (e.g. ../galaxy_db/files/001/ and ../galaxy_db/files/002/), thus disabling the usage of a directory for the -identification_files parameter. Am I right that it actually could be the case that a second subfolder (e.g. /002) can be created while running the same job?

* A predefined spectrum file container (directory) could be used to symlink the mgf files. What would be a nice place to create this, in the galaxy_db directory?
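The symlink staging described in point 4, applied to a single container directory as in the second option, could be sketched in Python as follows. The function name, the `(dataset_path, display_name)` input shape, and the choice to fail on duplicate display names are assumptions for illustration; where the container should live inside Galaxy is still the open question above.

```python
import os


def stage_spectra(datasets, container_dir):
    """Symlink Galaxy datasets into one directory under .mgf names.

    `datasets` is an iterable of (dataset_path, display_name) pairs.
    The dataset files may sit in different directories (e.g. 001/ and
    002/); the links all end up side by side in `container_dir`.
    Returns the list of created link paths.
    """
    os.makedirs(container_dir, exist_ok=True)
    links = []
    for dataset_path, display_name in datasets:
        # Drop any directory part of the display name, then make sure
        # the link carries the .mgf extension the tools expect.
        name = os.path.basename(display_name)
        if not name.lower().endswith(".mgf"):
            name += ".mgf"

        link_path = os.path.join(container_dir, name)
        if os.path.lexists(link_path):
            # Two datasets with the same display name would silently
            # shadow each other, so refuse rather than overwrite.
            raise FileExistsError(f"spectrum link already exists: {link_path}")

        os.symlink(os.path.abspath(dataset_path), link_path)
        links.append(link_path)
    return links
```

Because every link sits in one directory, that directory can be handed to later steps as a whole, regardless of how the underlying .dat files are spread across subfolders.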

By the way, Ira, it was nice meeting you in Baltimore.

I know you’re in Minnesota this week, also trying to test the Galaxy implementation of PeptideShaker. Tomorrow, we’ll try to send our wrappers to you, so you can try and get them in the toolshed.

Gerben & Elvis
