Unable to search generated mzML files


Connor Hoffmann

Aug 7, 2017, 6:45:09 PM
to Comet-ms support
Hi everybody,


My group is creating software that generates mzML files that are meant to be compatible with other software. The mzML files we generate are viewable in programs like SeeMS and PRIDE Inspector. They can also be searched by MS/MS search programs like Mascot and MSGF+. Our mzML files pass XML Schema validation as well as the mzML validation check put out by OpenMS.


The issue is that when I try to run a search on our generated file using Comet (and a few other search engines compatible with the TPP), I get the following error:

test.mzml(1) : error 2: Syntax error parsing XML.

This suggests to me that there is a formatting error in the mzML we are generating. However, the mzML files we generate can be opened and searched by other programs and meet the mzML standard requirements detailed by HUPO. Do Comet and other TPP search tools have a stricter standard for mzML files? If so, is there somewhere I can see these specifications?

I know the workaround would be to simply convert our mzML files into TPP-compatible ones with MSConvert (this works); however, if at all possible, I would like to generate natively compatible mzML documents.

I have been poring over our generated mzML files and comparing them to mzML files that are compatible with Comet, and I can see no discernible difference. Any help would be greatly appreciated! I have attached one of the generated mzML files for viewing.

Thanks! 
indexedTestSmall.mzml

Jimmy Eng

Aug 7, 2017, 7:10:15 PM
to Comet-ms support
Connor,

Comet and a lot of the TPP tools make use of the scan index, and it appears that the index offsets you're writing are wrong. In your example file, here's your index offset entry for scan 1:

      <offset idRef="scan=1">19889</offset>

If I open your mzml file and fseek to that position, it does not point to the first spectrum.  As far as I can tell, that 19889 offset points to somewhere on line 63 of your mzml file whereas it should be pointing to the first '<' character of line 54.  Go fix your scan indexing and all should hopefully be well.  Other tools that simply read the mzml file and don't make use of the scan index won't care that the scan index is broken.
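To illustrate the check described above, here is a minimal Python sketch (not Comet's or the TPP's actual code) that reads each `<offset>` entry, jumps to that byte position, and confirms that a `<spectrum>` (or `<chromatogram>`) start tag with the matching `id` begins exactly there. It assumes the usual indexedmzML layout of `<index name="spectrum">` blocks inside `<indexList>`, and it uses regular expressions rather than a real XML parser, so treat it as a quick diagnostic only. The function name `find_bad_offsets` is hypothetical.

```python
import re
from typing import List

# One <index name="spectrum"> or <index name="chromatogram"> block inside <indexList>.
INDEX_RE = re.compile(rb'<index\s+name="(\w+)"\s*>(.*?)</index>', re.DOTALL)
# Index entries such as <offset idRef="scan=1">19889</offset>
OFFSET_RE = re.compile(rb'<offset\s+idRef="([^"]*)"\s*>\s*(\d+)\s*</offset>')
# A start tag such as <spectrum index="0" id="scan=1" ...>, matched at an exact byte position.
START_TAG_RE = re.compile(rb'<(\w+)\s[^>]*?\bid="([^"]*)"')


def find_bad_offsets(data: bytes) -> List[str]:
    """Hypothetical diagnostic: return the idRefs whose stored offset does not
    point at the first '<' of the matching <spectrum>/<chromatogram> tag.

    `data` must be the raw bytes of the whole file (opened in "rb" mode),
    because indexedmzML offsets count bytes, not characters or lines.
    """
    bad = []
    for element_name, index_body in INDEX_RE.findall(data):
        for id_ref, offset in OFFSET_RE.findall(index_body):
            # Equivalent of fseek(fp, offset, SEEK_SET) and reading the tag found there.
            m = START_TAG_RE.match(data, int(offset))
            if m is None or m.group(1) != element_name or m.group(2) != id_ref:
                bad.append(id_ref.decode("utf-8"))
    return bad
```

For example, `find_bad_offsets(open("indexedTestSmall.mzml", "rb").read())` should return an empty list for a correctly indexed file; any idRef it returns is an entry whose offset does not land on its spectrum.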

(Same answer will be cross-posted to the spctools-discuss group.)

Jimmy

Jimmy Eng

Aug 7, 2017, 7:12:42 PM
to Comet-ms support
Actually, I see Mike Hoopmann already answered you on spctools-discuss and also noted that indexListOffset is set wrong.
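For the writing side, the sketch below appends an `<indexList>` whose offsets are computed from the exact bytes that will be written, and sets `<indexListOffset>` to the byte position of the `<` in `<indexList`. `add_index` is a hypothetical helper, not part of any library. It assumes the mzML part has already been serialized and encoded, it writes only a spectrum index, and it omits the SHA-1 `<fileChecksum>` element that indexed files written by msconvert typically end with.

```python
import re

# A <spectrum ...> start tag; the '\s' after the name keeps <spectrumList> from matching.
SPECTRUM_TAG_RE = re.compile(rb'<spectrum\s[^>]*?\bid="([^"]*)"')


def add_index(mzml_part: bytes) -> bytes:
    """Hypothetical helper: append <indexList>, <indexListOffset> and the closing
    </indexedmzML> tag to `mzml_part`.

    `mzml_part` is assumed to be the file's exact encoded bytes from the XML
    declaration (including the <indexedmzML> start tag) through </mzML>.
    Only a spectrum index is written, and <fileChecksum> is omitted.
    """
    # m.start() is the byte position of the '<' that opens each <spectrum> tag.
    entries = b"".join(
        b'      <offset idRef="%s">%d</offset>\n' % (m.group(1), m.start())
        for m in SPECTRUM_TAG_RE.finditer(mzml_part)
    )
    prefix = mzml_part + b"\n  "
    # The '<' of <indexList is the first byte after the prefix.
    index_list_offset = len(prefix)
    return (
        prefix
        + b'<indexList count="1">\n'
        + b'    <index name="spectrum">\n'
        + entries
        + b"    </index>\n"
        + b"  </indexList>\n"
        + b"  <indexListOffset>%d</indexListOffset>\n" % index_list_offset
        + b"</indexedmzML>\n"
    )
```

The result should be written unchanged, e.g. with `open(path, "wb")`. Computing offsets from the final byte string sidesteps a couple of plausible sources of drift, such as counting characters instead of UTF-8 bytes, or writing in text mode on Windows, where each `\n` becomes `\r\n` after the offsets were already calculated.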

Connor Hoffmann

Aug 8, 2017, 1:26:02 PM
to Comet-ms support
Hi Jimmy,

Thanks for the advice (from both you and Mike). You both diagnosed the problem rather quickly! I'll let you know when I finish redoing our indexing.

Thanks!