> 1) when running tide-search with --results=protobuf (for use with
> tide-results), is there a switch to set the output filename? I get the
> default results.tideres filename and can't see how to change the
> behavior in tide-search.
Yes, use --results_filename=<xxx>.
> 2) I'm going from Orbi .raw to tide .spectrumrecords. In order to do
> so I'm using msconvert on a Windows system to output .ms2 and then
> using your customized msconvert on my Linux system to get
> to .spectrumrecords. I also tried outputting .mzML and .mgf on the
> Windows system. All seemed to work, although the scan number was
> missing in .spectrumrecords with one of the input formats (I believe
> it was mzML). Is there a preferred workflow from Thermo .raw
> to .spectrumrecords?
Good question, but no, I don't think there's a preferred way to do
this. I used the Unix version of proteowizard's msconvert as the basis
for Tide's msconvert. If a file format works well in proteowizard, it
can work in Tide, but not all formats use a spectrum number, as you
point out. Please continue to use the method you are using now. I'll
make a note that spectrum numbering/naming is an issue for Tide
because of the variety of upstream spectrum formats. If I can make
time, I'll try to generalize Tide's provisions for numbering or naming
spectra.
> 3) not to be an ingrate, but I'm not seeing the improvement in speed
> that I was expecting relative to PD 1.3 and I'm wondering if I'm doing
> something that's slowing tide down. I don't have an exact apples to
> apples comparison since tide and PD are running on separate hardware,
> but I've tried to eliminate as many variables as possible. I'm running
> the same data file, same fasta, same search options (i.e. full
> tryptic, 1 missed, C+57, 1M+16, 1STY+80) and a pretty minimal PD
> Sequest workflow. Under those conditions, tide's run time is 110 sec
> and PD's reported run time is 520 sec. I believe the PD run time
> includes creating the protein groups, so the actual Sequest search may
> be 30-40 seconds quicker. Based on your J Prot Res paper, I was
> expecting considerably better than a 10x improvement with tide
> whereas I'm actually seeing something less than 5x.
Thanks for answering this one yourself in your follow-up mail (not
quoted here). I will make a note that ppm-based tolerance windows are
important to many (if not most) users. In the meantime, your method of
specifying a small precursor tolerance window (using the --mass_window
flag) is a good idea.
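
As a rough illustration of why a fixed window only approximates a ppm
tolerance: the absolute width implied by a ppm tolerance grows with
precursor mass, so a single fixed value is exact at only one mass. The
helper name below is hypothetical, and the sketch assumes --mass_window
takes an absolute value in Da; please check that against your build.

```python
def ppm_to_da(mass_da, ppm):
    """Convert a relative tolerance in ppm to an absolute tolerance in Da.

    Hypothetical helper, not part of Tide: 1 ppm is one millionth of the
    precursor mass.
    """
    return mass_da * ppm / 1e6


if __name__ == "__main__":
    # The same 10 ppm tolerance at several precursor masses.
    for mass in (800.0, 2000.0, 4000.0):
        print(f"{mass:7.1f} Da at 10 ppm -> +/- {ppm_to_da(mass, 10.0):.4f} Da")
```

A window sized for the heaviest precursor you care about is wider than
necessary for lighter ones, which is the cost of the fixed-window
workaround.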
> 4) Do you have any plans to develop a multi-threaded version of tide?
> It would be very useful on our hardware, as I've noticed that tide
> maxes out one core on our 32-core system while the rest sit completely
> idle. Same thing with tide-index. Seems like a waste :) If you aren't
> planning on adding multi-threading, or even in the interim if you are,
> do you have any suggestions on a strategy for parallelizing tide by
> splitting the input file and running multiple instances? I'm assuming
> that it would be more straightforward to split the data while it was
> in .ms2 format, since I don't have any tools for manipulating
> the .spectrumrecords format. Similarly, I assume it would be most
> straightforward to output to .sqt or .pep.xml and parse those files in
> order to combine them back into a single output. I should probably
> mention that the ultimate destinations for this data will be ProteoIQ,
> Scaffold and the TPP.
For now, the right way to deploy multiple CPUs or cores is indeed to
run multiple instances of Tide at the same time. Certainly some
important efficiencies (most importantly file I/O speed for the index
file) would be available with a built-in threading implementation.
However, this is not an immediate priority for me, as the method you
mention of "parallelizing tide by splitting the input file and running
multiple instances" is exactly the right thing to do for now. If your
data aren't already naturally
divided into multiple input files, then you should split the input
file (at the ms2 stage) and recombine after the search is done, as you
mention. I wasn't planning to produce tools for this, but if you need
some advice/help on this, please post again.
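
To make the splitting step concrete, here is a minimal sketch, not a
supported tool. It assumes the usual .ms2 layout: header lines begin
with "H", each spectrum begins with an "S" line, and the Z/I/D and peak
lines that follow belong to that spectrum. The function names and the
".partN" output naming are hypothetical.

```python
def split_ms2(lines, n_parts):
    """Split .ms2 text lines into n_parts contiguous chunks.

    Each chunk starts with a copy of the file header, and spectra are
    kept whole and in their original order.
    """
    header = []
    spectra = []
    current = None
    for line in lines:
        if line[:1] == "S":
            # An "S" line opens a new spectrum.
            current = [line]
            spectra.append(current)
        elif current is not None:
            # Z/I/D and peak lines belong to the open spectrum.
            current.append(line)
        elif line[:1] == "H":
            # Header lines appear before the first spectrum.
            header.append(line)
    parts = [list(header) for _ in range(n_parts)]
    for i, spectrum in enumerate(spectra):
        # Contiguous assignment keeps scan order within each part.
        parts[i * n_parts // len(spectra)].extend(spectrum)
    return parts


def split_ms2_file(path, n_parts):
    """Write path.part0, path.part1, ... (hypothetical naming scheme)."""
    with open(path) as handle:
        parts = split_ms2(handle, n_parts)
    for k, part in enumerate(parts):
        with open(f"{path}.part{k}", "w") as out:
            out.writelines(part)
```

Each part can then be converted and searched by its own Tide instance.
Since every spectrum stays intact and scan numbers are untouched, the
per-part results can be merged afterwards by scan number.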
Thanks again for your questions.
Regards,
Benjamin