Moving forward, I recommend that you base your code on the
ProteoWizard project ("pwiz", http://proteowizard.sourceforge.net/).
It has C++ implementations of a reader and writer for both formats. The
SPC converters (that's us) will be integrating with this architecture.

Regarding which format you want to use: mzML is the newer (released this
summer) and more feature-rich format. It's actually a standard,
approved by the HUPO-PSI committee, and we expect to see instrument
vendors directly supporting this data format.

mzXML is a format developed at the SPC several years ago, and has
served the community well as it has evolved. It's less verbose than
mzML, as it contains less annotation information. So, it might be
preferable if you want a simple, potentially faster solution that
is easier to validate: mzXML needs only typical XML schema validation,
whereas mzML also needs "business logic" semantic validation. (Note
that this validator is already implemented as a standalone project.)
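To make the schema-versus-semantic distinction concrete, here is a minimal Python sketch (standard library only). A schema can confirm that a cvParam element is well formed, but not that the right CV terms are present. The rule checked here (every spectrum must carry an "ms level" term, MS:1000511) is a toy rule picked for illustration, not one taken from the actual validator, and the XML fragment and function name are hand-written assumptions.

```python
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"
NS = {"mz": MZML_NS}
MS_LEVEL = "MS:1000511"  # PSI-MS accession for "ms level"


def spectra_missing_ms_level(xml_text):
    """Return the ids of spectra that lack an 'ms level' cvParam.

    This is a toy semantic rule: the XML can be structurally fine
    and still fail it.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for spectrum in root.iter(f"{{{MZML_NS}}}spectrum"):
        accessions = {
            p.get("accession") for p in spectrum.findall("mz:cvParam", NS)
        }
        if MS_LEVEL not in accessions:
            missing.append(spectrum.get("id"))
    return missing


# Hand-written fragment in the style of mzML (not real data).
DOC = """
<spectrumList xmlns="http://psi.hupo.org/ms/mzml" count="2">
  <spectrum index="0" id="scan=1" defaultArrayLength="0">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
  </spectrum>
  <spectrum index="1" id="scan=2" defaultArrayLength="0">
    <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
  </spectrum>
</spectrumList>
"""

print(spectra_missing_ms_level(DOC))  # ['scan=2']
```

Both spectra have the same element structure; only the check on CV content tells them apart, which is the kind of rule a plain XML schema cannot express.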

Currently, almost all data you'll see in published papers or
repositories is mzXML, but this will shift to mzML. Likewise, most
data converters (writers) currently support mzXML, but full mzML
support is in the works for all converters (from the SPC, at least).

Either way, just use pwiz and you automatically support both formats!

Josh

Not to beat around the bush: coding an mzML reader or writer is a
significant undertaking. It is a much more complicated format than mzXML
because of its flexibility and the integrated controlled vocabulary (CV). I recommend you use an
existing library to do the reading and writing for you, if possible. For
example, ProteoWizard (proteowizard.sourceforge.net) is one such library
for C++ or .NET programs. If you prefer an mzXML-centric view of the
data, you can use RAMP (also for C/C++).

Two of the most important differences between mzML and mzXML (to my mind):
- The CV-based approach to most metadata allows new terms and concepts
to be added to the format quickly
- nativeID allows reference to the native spectra from whatever file or
instrument generated the mzML
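As a rough illustration of both points, here is a small Python sketch using only the standard library. It reads cvParam elements generically, so a newly added CV term needs no change to the reader, and it splits a nativeID into its parts. The fragment is hand-written, the function names are made up for this example, and the nativeID shown assumes the space-separated key=value style (as used for Thermo files); other ID formats would need different handling. It also assumes the mzML 1.1 layout, where the native ID is carried in the spectrum's id attribute.

```python
import xml.etree.ElementTree as ET

NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hand-written fragment in the style of an mzML <spectrum> (not real data).
FRAGMENT = """
<spectrum xmlns="http://psi.hupo.org/ms/mzml" index="0"
          id="controllerType=0 controllerNumber=1 scan=42"
          defaultArrayLength="0">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
</spectrum>
"""


def cv_params(element):
    """Map accession -> (name, value) for the element's direct cvParam children.

    Nothing here is specific to any one term, so terms added to the CV
    later are picked up without touching this code.
    """
    return {
        p.get("accession"): (p.get("name"), p.get("value", ""))
        for p in element.findall("mz:cvParam", NS)
    }


def parse_native_id(native_id):
    """Split a space-separated key=value nativeID into a dict of strings."""
    return dict(part.split("=", 1) for part in native_id.split())


spectrum = ET.fromstring(FRAGMENT)
params = cv_params(spectrum)
native = parse_native_id(spectrum.get("id"))

print(params["MS:1000511"])  # ('ms level', '2')
print(native["scan"])        # 42
```

A real reader would resolve accessions against the PSI-MS CV rather than trusting the name attribute, which is part of why an existing library is the easier route.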
I'm not sure what your last question is really asking.
-Matt