I'm developing tools that work with the Tide output generated by Crux, and I'm trying to parse and use the pepXML that Tide produces. I generally much prefer this to parsing tab-delimited text files.
However, I'm running into problems parsing the pepXML generated by Tide: it produces XML that won't validate against the pepXML schema. These are the two issues I've hit so far; I'll follow up if I find others:
1) num_tol_term is occasionally set to "". This attribute must have an integer value.
2) There is more than one "modification_info" element per search hit. The XSD allows either 0 or 1 instances of this element.
For example, I'm finding the following:
<modification_info modified_peptide="LLAGLLHPGQAVSFWGCFAQM[15.99]YFFVALGITESYLLAAMSYDR">
<mod_aminoacid_mass position="21" mass="147.03"/>
</modification_info>
<modification_info modified_peptide="LLAGLLHPGQAVSFWGCFAQMYFFVALGITESYLLAAMSYDR">
<mod_aminoacid_mass position="17" mass="160.03"/>
</modification_info>
This is causing my XML parsing library a lot of trouble. The output should be:
<modification_info modified_peptide="LLAGLLHPGQAVSFWGCFAQM[15.99]YFFVALGITESYLLAAMSYDR">
<mod_aminoacid_mass position="21" mass="147.03"/>
<mod_aminoacid_mass position="17" mass="160.03"/>
</modification_info>
Note that I think this only happens when there are two different mod masses. All mods with the same mass are correctly grouped together.
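
Until this is fixed in Tide, a pre-processing pass along these lines might work around the second issue. This is only a minimal sketch in Python using the standard library's xml.etree.ElementTree, not a proposed fix for Crux itself. The function names (local_name, merge_modification_info, repair) are hypothetical, tags are matched by local name so no particular namespace URI is assumed, and empty num_tol_term values are only counted rather than replaced, since I don't know what the correct value would be.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def merge_modification_info(search_hit):
    """Fold every extra modification_info child into the first one, in place."""
    infos = [c for c in search_hit if local_name(c.tag) == "modification_info"]
    if len(infos) < 2:
        return
    first = infos[0]
    for extra in infos[1:]:
        # Move the mod_aminoacid_mass children over, then drop the empty shell.
        # The first element's modified_peptide attribute is kept unchanged
        # (assumption: it is the one carrying the annotated sequence).
        for mod in list(extra):
            first.append(mod)
        search_hit.remove(extra)


def repair(root):
    """Merge duplicate modification_info elements under every search_hit.

    Returns the number of search hits whose num_tol_term is the empty string;
    those attributes are left untouched.
    """
    # Collect the hits first so the tree is not modified while iterating it.
    hits = [e for e in root.iter() if local_name(e.tag) == "search_hit"]
    empty_num_tol_term = 0
    for hit in hits:
        merge_modification_info(hit)
        if hit.get("num_tol_term") == "":
            empty_num_tol_term += 1
    return empty_num_tol_term
```

To use it, one would load the file with ET.parse(path), call repair(tree.getroot()), and write the result back out with tree.write(...). Applied to the example above, it leaves a single modification_info element holding both mod_aminoacid_mass entries (position 21, then position 17).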