problem(?) with specific mzIdentML file


colin...@googlemail.com

Mar 11, 2018, 5:33:26 AM
to Pyteomics
Hi,

When I read this mzIdentML file:

ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2014/09/PXD000966/CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09.mzid.gz

and then ask for the SpectraData element using code like:

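from pyteomics import mzid as py_mzid  # presumably what the py_mzid alias refers to

mzid_reader = py_mzid.MzIdentML(mzId_path)
for sid_result in mzid_reader:
    # look up the SpectraData element referenced by this identification result
    spectra_data = mzid_reader.get_by_id(sid_result['spectraData_ref'],
                                         tag_id='SpectraData', detailed=True)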


the structure of what's returned is different from other mzIdentML files. The following:

<SpectraData id="CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09" location="blah.mzML">
  <FileFormat>
    <cvParam cvRef="MS" accession="MS:1000584" name="mzML file" value=""/>
  </FileFormat>
  <SpectrumIDFormat>
    <cvParam cvRef="MS" accession="MS:1000768" name="Thermo nativeID format" value=""/>
  </SpectrumIDFormat>
</SpectraData>

results in:

{'FileFormat': {'mzML file': {'accession': 'MS:1000584', 'value': ''}}, 'SpectrumIDFormat': {'Thermo nativeID format': {'accession': 'MS:1000768', 'value': ''}}, 'id': 'CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09', 'location': 'blah.mzML'}

as compared to the more usual (example from ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/04/PXD001885/OT_141126_16_GFP%20(F004691).mzid.gz):

<SpectraData id="OT_141126_16_GFP (F004691)" location="blah.MGF">
  <FileFormat>
    <cvParam accession="MS:1001062" name="Mascot MGF file" cvRef="PSI-MS"/>
  </FileFormat>
  <SpectrumIDFormat>
    <cvParam accession="MS:1000774" name="multiple peak list nativeID format" cvRef="PSI-MS"/>
  </SpectrumIDFormat>
</SpectraData>

resulting in:

{'FileFormat': {'name': 'Mascot MGF file', 'accession': 'MS:1001062'}, 'SpectrumIDFormat': {'name': 'multiple peak list nativeID format', 'accession': 'MS:1000774'}, 'id': 'OT_141126_16_GFP (F004691)', 'location': 'blah.MGF'}

Do others get the same behaviour, and is this expected?
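
In case it helps anyone reading both kinds of file, here is a minimal sketch of a helper that accepts either of the two dict shapes shown above and returns the term name and accession. The name cv_term is hypothetical (it is not part of Pyteomics), and it only covers the two structures quoted in this post:

```python
def cv_term(entry):
    """Return (name, accession) from either parsed cvParam shape.

    Hypothetical helper, not a Pyteomics function.
    """
    if 'name' in entry and 'accession' in entry:
        # Flat shape: {'name': ..., 'accession': ...}
        return entry['name'], entry['accession']
    # Nested shape: {<term name>: {'accession': ..., 'value': ...}}
    for name, info in entry.items():
        if isinstance(info, dict) and 'accession' in info:
            return name, info['accession']
    raise ValueError('unrecognised cvParam structure: %r' % (entry,))


# The two FileFormat values quoted above
nested = {'mzML file': {'accession': 'MS:1000584', 'value': ''}}
flat = {'name': 'Mascot MGF file', 'accession': 'MS:1001062'}

print(cv_term(nested))
print(cv_term(flat))
```

This only papers over the difference on the calling side; it does not explain why the two shapes occur.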

best wishes,
Colin

colin...@googlemail.com

Mar 11, 2018, 2:23:26 PM
to Pyteomics
I think it might be to do with the empty value attribute in the cvParam.
Other files that have this attribute behave similarly,
e.g. ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2017/06/PXD006757/160115_Jackson_NMooney_4822_Rabl2B_01.raw_20160119_Byonic.mzid.gz
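
A rough way to check a file for this, independent of Pyteomics, is to list the cvParam elements that carry value="". The sketch below uses only the standard library and works on shortened, namespace-free copies of the two snippets from my first message; a real mzIdentML file declares an XML namespace, so the tag lookup would need adjusting. The function name empty_value_params is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened copies of the two SpectraData snippets (no XML namespace here)
with_value = '''<SpectraData id="a" location="blah.mzML">
  <FileFormat>
    <cvParam cvRef="MS" accession="MS:1000584" name="mzML file" value=""/>
  </FileFormat>
</SpectraData>'''

without_value = '''<SpectraData id="b" location="blah.MGF">
  <FileFormat>
    <cvParam accession="MS:1001062" name="Mascot MGF file" cvRef="PSI-MS"/>
  </FileFormat>
</SpectraData>'''


def empty_value_params(xml_text):
    """Return accessions of cvParam elements whose value attribute is ''."""
    root = ET.fromstring(xml_text)
    # p.get('value') is None when the attribute is absent, '' when it is empty
    return [p.get('accession') for p in root.iter('cvParam')
            if p.get('value') == '']


print(empty_value_params(with_value))
print(empty_value_params(without_value))
```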

colin...@googlemail.com

Mar 12, 2018, 3:27:03 AM
to Pyteomics
OK - sorry, I don't think this is a problem with the library.

Someone else made some changes to the function _handle_param in xml.py, in our local copy of the library, and this broke things.

sorry to bother you,
thanks again for pyteomics,
Colin