Msconvert vs Trapper - Speed differences and preprocessing

bio.x2y

Feb 16, 2010, 3:52:34 PM
to spctools-discuss
Hi,

I independently used both Trapper (4.3.1) and Msconvert (pwiz 1.6.0)
to convert a 1.4 GB Agilent MassHunter ".d" file containing ~16000
spectra to mzXML.

Parameters:
$msconvert --mzXML --verbose large.d
$trapper --mzXML -v large.d large.mzXML

Msconvert took 5 hr 38 min to complete, generating a 57 GB file.
Trapper took 1 hr 26 min to complete, generating a 28 GB file.

I can understand the differences in size, given the differences in
structure and precision. However, the time difference still appears
quite high.
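
As a rough illustration of the precision point: mzXML stores each spectrum's peaks as base64-encoded, network-order (m/z, intensity) pairs at either 32-bit or 64-bit precision. The sketch below assumes the roughly 2x gap between the two output files comes mainly from that precision difference, which the thread does not confirm. The function name and the peak values are made up for the example.

```python
import base64
import struct


def encode_peaks(mz, intensity, precision=32):
    """Base64-encode interleaved (m/z, intensity) pairs as big-endian floats."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    code = "f" if precision == 32 else "d"
    # Interleave the two arrays: mz0, int0, mz1, int1, ...
    interleaved = [value for pair in zip(mz, intensity) for value in pair]
    raw = struct.pack(f">{len(interleaved)}{code}", *interleaved)
    return base64.b64encode(raw).decode("ascii")


# Hypothetical profile-like spectrum with 1000 points (illustrative values only).
mz = [100.0 + 0.01 * i for i in range(1000)]
intensity = [float(i % 50) for i in range(1000)]

size32 = len(encode_peaks(mz, intensity, 32))
size64 = len(encode_peaks(mz, intensity, 64))
print(size32, size64, size64 / size32)
```

The encoded payload doubles when going from 32-bit to 64-bit values, so a precision difference alone would account for a factor of about two, before any differences in XML structure or indexing.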

One interesting observation is that msconvert does not start writing
to the output file until 1 hr 20 min has elapsed. At that point, the
file begins filling and the progress messages start appearing in the
output. Trapper, on the other hand, starts filling the output file and
reporting progress immediately.

I have seen this occur now for two runs on two different days, so I
don't think it's related to other activity on the machine.

Perhaps msconvert is engaging in some preprocessing that isn't
strictly necessary, for Agilent ".d" files at least?

Thanks,
bio.x2y

Dave Trudgian

Feb 16, 2010, 4:22:21 PM
to spctools...@googlegroups.com
Hi,

This seems like a *really* long time for both tools even if the file
contains a lot of scans. Are you converting files over a slow network
share or similar?

We regularly convert Agilent 6520 QTOF files to mzXML using trapper and
msconvert. For trapper it takes about 27s to convert a file with 8800
scans to mzXML using centroid mode, or 8m 20s for profile mode.

These timings are on a server with Xeon 5550s and 16GB of RAM, but I
wouldn't expect conversions to take hours, even on my 2 yr old laptop.
msconvert is usually about twice as slow as trapper when converting
using 32-bit precision.
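
To put the reported timings on one scale, here is a small throughput calculation using only figures quoted in this thread. The spectrum count for the large file is the approximate "~16000" from the original post, and the acquisition mode (centroid or profile) of that file is not stated, so the comparison is indicative only.

```python
def spectra_per_second(spectra, hours=0, minutes=0, seconds=0):
    """Average conversion rate over the whole run."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return spectra / elapsed


# Timings as reported in the thread.
runs = {
    "trapper_8800_centroid": spectra_per_second(8800, seconds=27),
    "trapper_8800_profile": spectra_per_second(8800, minutes=8, seconds=20),
    "trapper_16000_vm": spectra_per_second(16000, hours=1, minutes=26),
    "msconvert_16000_vm": spectra_per_second(16000, hours=5, minutes=38),
}

for label, rate in runs.items():
    print(f"{label}: {rate:.2f} spectra/s")
```

Even the profile-mode figure on the server is several times faster per spectrum than the runs in the original post, which is consistent with the suspicion that something in the environment is slowing the conversion down.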

DT

Matthew Chambers

Feb 16, 2010, 4:26:05 PM
to spctools...@googlegroups.com
The delay in start up time for Agilent is a known issue. Unfortunately
the current Agilent API doesn't provide a way to get the list of scanIds
in a data file, nor a way to get a spectrum's scanId without getting its
data arrays. ProteoWizard is designed to support random access by
nativeID and by index to all of the data formats it supports, so it has
to enumerate all the spectra up front in order to get each of their
scanIds. On profile data, that takes a frustrating amount of time.
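
A toy sketch of why this costs so much: it assumes a reader whose only call returns the scan ID together with the full data arrays, as described above. `ToyVendorFile`, `RandomAccessList` and `stream_convert` are hypothetical names for illustration; this is not the Agilent API or ProteoWizard code.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    scan_id: int
    data: list


@dataclass
class ToyVendorFile:
    """Hypothetical stand-in: a scan ID is only available by loading the full spectrum."""
    scan_ids: list
    reads: int = 0

    def __len__(self):
        return len(self.scan_ids)

    def read_spectrum(self, index):
        self.reads += 1  # each call stands for one costly full read
        return Spectrum(self.scan_ids[index], [0.0] * 4)


def stream_convert(source):
    """Sequential converter: yields each spectrum as soon as it is read."""
    for i in range(len(source)):
        yield source.read_spectrum(i)


class RandomAccessList:
    """Random-access view: must learn every scan ID before serving any lookup."""

    def __init__(self, source):
        self.source = source
        # Full pass over the file just to map scan ID -> index.
        self.index_by_id = {
            source.read_spectrum(i).scan_id: i for i in range(len(source))
        }

    def by_scan_id(self, scan_id):
        return self.source.read_spectrum(self.index_by_id[scan_id])


ids = [101, 205, 309, 412]

streamed = ToyVendorFile(ids)
first = next(stream_convert(streamed))
reads_before_first_output_streaming = streamed.reads

indexed = ToyVendorFile(ids)
view = RandomAccessList(indexed)
reads_before_first_output_indexed = indexed.reads
spectrum = view.by_scan_id(309)

print(reads_before_first_output_streaming, reads_before_first_output_indexed)
```

The streaming path produces its first spectrum after one read, while the random-access path reads every spectrum once before it can return anything, then reads the requested one again.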

I have a feature request pending with Agilent to get a function which
provides either a list of scanIds or a spectrum without metadata.

Thanks for doing the comparison. I agree with Dave that the conversion
time sounds pretty long in both cases and I suspect a network share.

-Matt

bio.x2y

Feb 16, 2010, 4:42:13 PM
to spctools-discuss
Dave, Matt,
Thanks for the comments.

My Windows installation is running in VirtualBox on a Mac Pro, which
explains the overall sluggish pace! I guess that's part of the fun of
needing the Agilent library.

Out of curiosity, I might try running one of the jobs on my own
year-old laptop, just in case this is data related.

Cheers,
bio.x2y

Joe Slagel

Feb 17, 2010, 4:06:12 PM
to spctools...@googlegroups.com
Matt,

Thanks for the explanation. Does this mean that trapper isn't using the Agilent API? (Asking the naive trapper user question)

-Joe


--
Joe Slagel
Institute for Systems Biology
jsl...@systemsbiology.org
(206) 732-1362

Matthew Chambers

Feb 17, 2010, 4:11:51 PM
to spctools...@googlegroups.com
Hi Joe,

Msconvert and Trapper use the same API. But Trapper is a dedicated
converter, so it doesn't worry about random access. Thus it doesn't need
to do the initial enumeration, which accounts for the large difference in
run time. Pwiz is not just about conversion though. Pwiz's SeeMS tool
can directly view raw spectra like MassHunter does (except for that
blasted initialization time on profile data!).

-Matt


Robert

Feb 17, 2010, 5:35:15 PM
to spctools-discuss
Hi,

We are using an Agilent MSD/TOF for metabolomics. For us, it is
important to convert to CENTROID mzXML for further data processing;
otherwise the files become extremely large.

Greetings, Robert
