Question on mzXML and mzML

wayn...@gmail.com

Jul 1, 2008, 11:11:24 PM

to spctools-discuss

Dear all,

We are in the process of writing a program that will read from either
mzXML or mzML. What are the major differences between mzXML and mzML?
Which format is better to use, given which writers/extractors are
available for mzXML and mzML?

Thanks,

Wayne

Joshua Tasman

Jul 2, 2008, 2:11:50 PM

to spctools...@googlegroups.com

Hi Wayne,

Moving forward, I recommend that you base your code on the
ProteoWizard project ("pwiz", http://proteowizard.sourceforge.net/).
It has C++ implementations of readers and writers for both formats, and
we at the SPC will be integrating our converters with this architecture.

Regarding which format you want to use: mzML is the newer format
(released this summer) and the more feature-rich one. It's actually a
standard, approved by the HUPO-PSI committee, and we expect to see
instrument vendors support this data format directly.

mzXML is a format developed at the SPC several years ago, and it has
served the community well as it has evolved. It's less verbose than
mzML because it carries less annotation. So it might be preferable if
you want a simpler, potentially faster solution that is also easier to
validate: mzXML uses a typical XML schema, whereas mzML also needs
"business logic" semantic validation (note that this semantic validator
is already implemented as a standalone project).
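To make the annotation difference concrete, here is a minimal Python sketch (standard library only; it is not pwiz or RAMP code) that pulls the MS level out of each format. In mzXML the MS level is a plain msLevel attribute on <scan>; in mzML it is a cvParam carrying the PSI-MS accession MS:1000511 ("ms level"). The two inline documents are toy fragments trimmed far below what either schema requires, the namespace URIs are only for flavor (the code ignores them), and ms_levels and local_name are hypothetical helper names for this example.

```python
import xml.etree.ElementTree as ET

# Toy fragments, trimmed far below what either schema actually requires.
MZXML_DOC = """<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.1">
  <msRun>
    <scan num="1" msLevel="1"/>
    <scan num="2" msLevel="2"/>
  </msRun>
</mzXML>"""

MZML_DOC = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run>
    <spectrumList count="2">
      <spectrum index="0" id="scan=1">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

MS_LEVEL = "MS:1000511"  # PSI-MS CV accession for "ms level"


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def ms_levels(xml_text):
    """Return the MS level of every scan/spectrum, in document order."""
    root = ET.fromstring(xml_text)
    kind = local_name(root.tag)
    if kind == "mzXML":
        # mzXML: the value is a schema-defined attribute on <scan>;
        # iter() would also reach scans nested inside a parent scan.
        return [int(el.get("msLevel")) for el in root.iter()
                if local_name(el.tag) == "scan"]
    if kind in ("mzML", "indexedmzML"):
        # mzML: the value is a controlled-vocabulary term attached to <spectrum>.
        levels = []
        for spec in root.iter():
            if local_name(spec.tag) != "spectrum":
                continue
            for param in spec:
                if local_name(param.tag) == "cvParam" and param.get("accession") == MS_LEVEL:
                    levels.append(int(param.get("value")))
        return levels
    raise ValueError(f"not mzXML or mzML: root element is <{kind}>")


if __name__ == "__main__":
    print(ms_levels(MZXML_DOC))  # [1, 2]
    print(ms_levels(MZML_DOC))   # [1, 2]
```

A real reader also has to decode the base64-encoded peak lists and cope with many more optional elements, which is exactly the work a library like pwiz or RAMP already does for you.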

Right now, almost all data you'll see in published papers or
repositories is mzXML, but this will shift to mzML. Likewise, most
data converters (writers) currently support mzXML, but full mzML
support is in the works for all converters (from the SPC, at least).

Either way, just use pwiz and you'll automatically support both formats!

Josh

Joshua Tasman

Jul 2, 2008, 2:12:23 PM

to spctools...@googlegroups.com

PS: I think pwiz already has C# bindings as well, and there may be
talk of adding other languages such as Java and Python.

Matthew Chambers

Jul 2, 2008, 2:16:56 PM

to spctools...@googlegroups.com

Hi Wayne,

Not to beat around the bush: coding an mzML reader or writer is a
significant undertaking. It is a much more complicated format than mzXML
because of its flexibility and its integrated controlled vocabulary (CV). I recommend you use an
existing library to do the reading and writing for you, if possible. For
example, ProteoWizard (proteowizard.sourceforge.net) is one such library
for C++ or .NET programs. If you prefer an mzXML-centric view of the
data, you can use RAMP (also for C/C++). Two of the most important
differences (to my mind):
- The CV-based approach to most metadata lets most new terms and
concepts be added quickly, without changing the schema
- nativeID lets each spectrum refer back to the native spectrum in
whatever file or instrument generated the mzML
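As a rough illustration of both points, the Python sketch below (standard library only, toy input, hypothetical helper names) collects every cvParam of each spectrum into a dictionary keyed by accession, so a term added to the CV later flows through without any parser change. It keys each spectrum by its native identifier; as an assumption, it reads a separate nativeID attribute when one is present and otherwise falls back to the spectrum's id, since where the native ID lives has differed between mzML versions. The Thermo-style "controllerType=0 controllerNumber=1 scan=5" string is illustrative only.

```python
import xml.etree.ElementTree as ET

# Toy mzML fragment (heavily trimmed; not schema-valid on its own).
SAMPLE_MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run>
    <spectrumList count="1">
      <spectrum index="0" id="controllerType=0 controllerNumber=1 scan=5">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
        <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def spectrum_params(xml_text):
    """Return {native_id: {accession: (name, value)}} for every spectrum."""
    root = ET.fromstring(xml_text)
    result = {}
    for spec in root.iter():
        if local_name(spec.tag) != "spectrum":
            continue
        # Assumption: prefer a separate nativeID attribute if the file has one,
        # otherwise treat the spectrum id as the native identifier.
        native_id = spec.get("nativeID") or spec.get("id")
        params = {}
        for child in spec:
            # Generic handling: every cvParam is kept, including terms
            # that did not exist when this parser was written.
            if local_name(child.tag) == "cvParam":
                params[child.get("accession")] = (child.get("name"), child.get("value"))
        result[native_id] = params
    return result


if __name__ == "__main__":
    for native_id, params in spectrum_params(SAMPLE_MZML).items():
        print(native_id, params)
```

The design choice here is to key on accession rather than on the human-readable name, since the accession is what the CV keeps stable.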