MzXML2Search problem (Agilent msconvert-ed data)

dctrud

Nov 17, 2009, 9:00:30 AM
to spctools-discuss
MzXML2Search conversion to mgf fails for me on mzXML / mzML created
using msconvert from Agilent 6520 QTOF data.

Trapper numbers spectra using an index starting at 1, whilst msconvert
uses the Agilent scan ID (which can be a very large number). MzXML2Search
conversion of the resulting file into mgf then fails once it reaches a
scan ID > 99,999.

Have seen similar problems with other programs and contacted Matt
Chambers who said that the numbering would stay the same, and that
it's better if programs which can't cope with the large numbers are
fixed. Is this possible (desirable?) for MzXML2Search and any other
TPP tools that might be affected?

I can't use Trapper as I need to extract Profile MS + Centroid MS/MS
from a dual mode file (msconvert supported). I can't directly convert
to mgf with msconvert as I need to do peak thresholding to get file
sizes down to a reasonable level.

Cheers,

DT

Matt Chambers

Nov 17, 2009, 9:29:18 AM
to spctools...@googlegroups.com
What kind of thresholding do you do? Enabling that in msconvert is
overdue - the backend code to support it is already in place.

MzXML2Search is failing because it depends on a strict DTA name scheme
with 5 digits. This is going to break with long LTQ Velos runs so it
needs to be fixed regardless of the Agilent scanId issue. It's ironic
that a Thermo file is the one to break the Thermo-centric assumptions. :P

-Matt

Dave Trudgian

Nov 17, 2009, 9:33:09 AM
to spctools...@googlegroups.com
Matt,

We just apply a (low) absolute threshold. Different values for different
instruments, but it's most critical on the Agilent and Waters QTOFs, as
without any threshold there are 1000s of peaks.
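
For illustration, absolute thresholding of this kind can be sketched in C++ roughly as below. This is only a minimal sketch: the Peak struct and the applyAbsoluteThreshold name are hypothetical and are not taken from msconvert or any TPP tool, and the cutoff value is whatever suits the instrument.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical peak representation (assumption, not a TPP/ProteoWizard type).
struct Peak {
    double mz;
    double intensity;
};

// Keep only peaks whose intensity is at or above an absolute cutoff.
std::vector<Peak> applyAbsoluteThreshold(const std::vector<Peak>& peaks,
                                         double minIntensity) {
    assert(minIntensity >= 0.0);  // a negative cutoff would keep everything
    std::vector<Peak> kept;
    std::copy_if(peaks.begin(), peaks.end(), std::back_inserter(kept),
                 [minIntensity](const Peak& p) {
                     return p.intensity >= minIntensity;
                 });
    return kept;
}
```

The cutoff is passed per call, so a different value can be used for each instrument.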

Look forward to seeing the filtering in msconvert.

DT
--
Dr. David Trudgian
Bioinformatician in Proteomics
University of Oxford

Mon-Thu: CCMP, Roosevelt Drive
Tel: (+44) (01865 2)87784

Friday : Dunn School of Pathology, S. Parks Rd.
Tel: (+44) (01865 2)75557



Brian Pratt

Nov 17, 2009, 11:00:56 AM
to spctools...@googlegroups.com
Returning to the behavior of mzXML2Search, just eyeballing the code I don't see any reason it should fail at scan numbers > 99999.  Perhaps the problem is actually downstream from mzXML2Search?  While mzXML2Search should quite happily emit 6 digit MGF scan numbers, I can imagine a consumer of MGF might not see that coming. 
 
Perhaps you could furnish an example of the msconvert output that's giving you trouble?  Eyeballing the code only gets one so far; it's ideal to see it actually running in the debugger.
 
Brian Pratt

Dave Trudgian

Nov 17, 2009, 11:44:56 AM
to spctools...@googlegroups.com
Brian,

I've just had a second look and found the reason the conversion stops at
scan 99999:

The default upper scan number in the options struct is set to 99999:

MzXML2Search.cxx:117
iLastScan = 99999;

Then just before the main loop it's used to set the value used to
terminate the main loop:

MzXML2Search.cxx:344
if (iAnalysisLastScan > options.iLastScan)
    iAnalysisLastScan = options.iLastScan;
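
Put together, the two snippets above behave roughly like the following self-contained sketch. The Options struct here is a hypothetical stand-in (only the iLastScan field and its 99999 default come from the source quoted above), and clampLastScan and kNoScanLimit are illustrative names, not identifiers from MzXML2Search.cxx.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical stand-in for the options struct; only iLastScan and its
// 99999 default are taken from the quoted source.
struct Options {
    int iLastScan = 99999;
};

// Hypothetical "effectively no limit" value: the largest int.
const int kNoScanLimit = std::numeric_limits<int>::max();

// Same effect as the clamp before the main loop: the last scan that gets
// processed is the smaller of the file's last scan and the option value.
int clampLastScan(int iAnalysisLastScan, const Options& options) {
    assert(iAnalysisLastScan >= 0);  // scan numbers are assumed non-negative
    return std::min(iAnalysisLastScan, options.iLastScan);
}
```

With the default, a file whose last scan ID is 250000 is only processed up to scan 99999; with iLastScan set to kNoScanLimit the clamp leaves the file's own last scan untouched.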


If the option default is changed to a higher value then the
outputMGF(..) function would need to be changed to write >5 digit scan
numbers. Don't know whether this would have an effect downstream :-(

If you would still like the mzML file as an example let me know where I
can upload it. It's 1.5GB gzipped.

Cheers,

DT



Brian Pratt

Nov 17, 2009, 1:52:10 PM
to spctools...@googlegroups.com
Actually it will write 6 digit scan numbers as is.  The formatting just specifies that it won't write any fewer than 5 digits. 
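
A quick way to see this minimum-width behavior is the sketch below. The "%05d" format string and the formatScan name are assumptions for illustration; the exact format string used in outputMGF may differ, but any printf-style minimum width behaves the same way for longer numbers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a scan number with a minimum of five digits, zero-padded.
// "%05d" is an assumed format string, not copied from MzXML2Search.
std::string formatScan(int scan) {
    char buf[32];
    int n = std::snprintf(buf, sizeof buf, "%05d", scan);
    assert(n > 0 && n < static_cast<int>(sizeof buf));  // fits in the buffer
    return std::string(buf);
}
```

Here formatScan(42) gives "00042", while formatScan(123456) gives "123456": the width is a lower bound, so longer numbers are written in full.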
 
I think the only code change I would want to make is to set that default upper limit to the largest possible integer value, but as there's a workaround I don't think we need to worry about it.
 
And yeah, who knows what happens downstream.  My intuition is that it won't be a problem, actually, but that's only a guess.
 
Brian

Matthew Chambers

Jun 10, 2010, 12:49:43 PM
to spctools...@googlegroups.com
Unless there are indeed downstream problems, it would be good to change
that default upper scan number since others are running into the problem
and it's a pretty obscure source of error.

Dave, the peak filtering has been in msconvert for several months, so
presumably users can now convert straight from MassHunter to MGF with
msconvert. The remaining issue is if one wanted to convert to mzXML
instead with both MS1 and MS2 scans in the file. If one only wanted to
filter MS2s (like you certainly would want to do if you kept MS1s in
profile) then that's not currently possible. It's a bit of a
command-line syntax quagmire that we need to address in our tools.

-Matt


Jimmy Eng

Jun 10, 2010, 1:10:51 PM
to spctools...@googlegroups.com
I just changed it to a really big value. (99999 was considered big many
years ago!)

