Hi,
when I read this mzIdentML file:
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2014/09/PXD000966/CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09.mzid.gz
and then ask for the SpectraData element using code like:
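# assumed: py_mzid is pyteomics.mzid (from pyteomics import mzid as py_mzid)
mzid_reader = py_mzid.MzIdentML(mzId_path)
for sid_result in mzid_reader:
    # look up the SpectraData element referenced by each identification result
    spectra_data = mzid_reader.get_by_id(sid_result['spectraData_ref'],
                                         tag_id='SpectraData', detailed=True)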
the structure of what is returned differs from what I get for other mzIdentML files. The following:
<SpectraData id="CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09" location="blah.mzML">
  <FileFormat>
    <cvParam cvRef="MS" accession="MS:1000584" name="mzML file" value=""/>
  </FileFormat>
  <SpectrumIDFormat>
    <cvParam cvRef="MS" accession="MS:1000768" name="Thermo nativeID format" value=""/>
  </SpectrumIDFormat>
</SpectraData>
results in:
{'FileFormat': {'mzML file': {'accession': 'MS:1000584', 'value': ''}}, 'SpectrumIDFormat': {'Thermo nativeID format': {'accession': 'MS:1000768', 'value': ''}}, 'id': 'CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09', 'location': 'blah.mzML'}
as compared to the more usual structure (example from
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/04/PXD001885/OT_141126_16_GFP%20(F004691).mzid.gz):
<SpectraData id="OT_141126_16_GFP (F004691)" location="blah.MGF">
  <FileFormat>
    <cvParam accession="MS:1001062" name="Mascot MGF file" cvRef="PSI-MS"/>
  </FileFormat>
  <SpectrumIDFormat>
    <cvParam accession="MS:1000774" name="multiple peak list nativeID format" cvRef="PSI-MS"/>
  </SpectrumIDFormat>
</SpectraData>
resulting in:
{'FileFormat': {'name': 'Mascot MGF file', 'accession': 'MS:1001062'}, 'SpectrumIDFormat': {'name': 'multiple peak list nativeID format', 'accession': 'MS:1000774'}, 'id': 'OT_141126_16_GFP (F004691)', 'location': 'blah.MGF'}
Do others get the same behaviour, and is this expected?
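For now I can work around it with something like the sketch below, which reads the name and accession from either of the two dict shapes shown above. The helper name cv_name_and_accession is my own and is not part of any library, and it assumes these two shapes are the only ones that occur, which I have not verified.

```python
def cv_name_and_accession(entry):
    """Return (name, accession) from either dict shape shown above.

    Hypothetical helper: assumes `entry` is the value stored under
    'FileFormat' or 'SpectrumIDFormat' in the returned SpectraData dict.
    """
    if 'accession' in entry:
        # Flat shape: {'name': ..., 'accession': ...}
        return entry.get('name'), entry['accession']
    # Nested shape: {<name>: {'accession': ..., 'value': ...}}
    for name, params in entry.items():
        if isinstance(params, dict) and 'accession' in params:
            return name, params['accession']
    raise ValueError('unrecognised cvParam structure: %r' % (entry,))


# The two FileFormat values from the outputs above
nested = {'mzML file': {'accession': 'MS:1000584', 'value': ''}}
flat = {'name': 'Mascot MGF file', 'accession': 'MS:1001062'}

print(cv_name_and_accession(nested))
print(cv_name_and_accession(flat))
```

This only hides the difference on my side; it does not explain why the two files are parsed differently.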
Best wishes,
Colin