Similar code, different behaviour... I'm a bit confused

colin...@googlemail.com

Feb 9, 2018, 10:05:48 AM
to Pyteomics
Hi,

Thank you for your work on this useful library. However, there is something about it that confuses me.

We want to iterate through all the DBSequence elements, and we use the following code:

sequence_collection = next(mzid_reader.iterfind('SequenceCollection'))
for sequence in sequence_collection['DBSequence']:
    # do stuff with sequence
    pass

where mzid_reader is an instance of the MzIdentML class.

It works, and it seems to be the fastest way to do this for large files. We think that is because there can be only one SequenceCollection element, and this approach loads its contents into memory for fast iteration without necessarily reading through the entire file. (Any comments on this are most welcome.)
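The reasoning above (grab the single SequenceCollection and stop) can be illustrated with the standard library alone. This is not pyteomics code: it is a minimal sketch that assumes only `xml.etree.ElementTree.iterparse` and a tiny, made-up mzIdentML-like document with namespaces omitted. The hypothetical helper `first_element` returns the first complete element with a given tag and abandons the parse at that point.

```python
import io
import xml.etree.ElementTree as ET

# A tiny, hypothetical mzIdentML-like document (namespaces omitted for brevity).
DOC = b"""<MzIdentML>
  <SequenceCollection>
    <DBSequence id="DBSeq_1" accession="P12345"/>
    <DBSequence id="DBSeq_2" accession="Q67890"/>
  </SequenceCollection>
  <AnalysisCollection/>
  <DataCollection/>
</MzIdentML>"""


def first_element(source, tag):
    """Return the first complete element named `tag`, or None if absent.

    iterparse emits an 'end' event only after an element and all of its
    children have been parsed, so returning on the first matching 'end'
    event gives the whole subtree and stops consuming the input there.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            return elem
    return None


collection = first_element(io.BytesIO(DOC), "SequenceCollection")
accessions = [seq.get("accession") for seq in collection.findall("DBSequence")]
print(accessions)  # ['P12345', 'Q67890']
```

For a large file, everything after the closing SequenceCollection tag is never parsed (beyond the parser's read-ahead buffer), which is consistent with the speed-up described above. Real mzIdentML files use namespace-qualified tags, which this sketch ignores.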

However, if I use the same pattern to iterate through the AnalysisSoftware elements, it doesn't work as expected. If I write:

analysis_software_list = next(mzid_reader.iterfind('AnalysisSoftwareList'))

then the returned dict contains only the first AnalysisSoftware element, not a collection of them, even if there is more than one AnalysisSoftware under AnalysisSoftwareList. The behaviour seems to differ from what I get with iterfind('SequenceCollection').

Could somebody explain this, please? It may be a simple mistake or misunderstanding, as I am new to Python and to this library.

best wishes,
Colin


Joshua Klein

Feb 9, 2018, 11:12:37 AM
to pyteomics

This has to do with how the XML schema for mzIdentML is interpreted. The parser doesn’t know that AnalysisSoftware is supposed to be part of a list. To get around the issue, you can do the following:

mzid_reader.schema_info['lists'].add("AnalysisSoftware")

This will register the element as being part of a list.
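To see why registering the element changes the result, here is a deliberately simplified element-to-dict converter using only the standard library. It is not pyteomics's implementation; it only assumes the idea described above: a set of tag names known to repeat (standing in for `schema_info['lists']`). Children whose tag is in that set are collected into a list; any other child is treated as a single value, so when it repeats only one occurrence survives (the first, in this sketch, matching what Colin observed). The function `to_dict` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical AnalysisSoftwareList with two entries.
DOC = """<AnalysisSoftwareList>
  <AnalysisSoftware id="AS_mascot" name="Mascot"/>
  <AnalysisSoftware id="AS_xtandem" name="X!Tandem"/>
</AnalysisSoftwareList>"""


def to_dict(elem, lists):
    """Convert an element to a dict (simplified, hypothetical sketch).

    `lists` is a set of child tag names that may repeat; it plays the
    role of the schema information a real parser would consult.
    """
    info = dict(elem.attrib)
    for child in elem:
        value = to_dict(child, lists)
        if child.tag in lists:
            # Known repeating element: accumulate every occurrence.
            info.setdefault(child.tag, []).append(value)
        else:
            # Assumed to occur once: any later duplicates are ignored.
            info.setdefault(child.tag, value)
    return info


root = ET.fromstring(DOC)
lists = {"DBSequence"}  # AnalysisSoftware is not registered yet

before = to_dict(root, lists)
print(before["AnalysisSoftware"])  # {'id': 'AS_mascot', 'name': 'Mascot'}

lists.add("AnalysisSoftware")  # analogous to the workaround above
after = to_dict(root, lists)
print([sw["name"] for sw in after["AnalysisSoftware"]])  # ['Mascot', 'X!Tandem']
```

Because the set is modified in place, the change affects every later conversion that uses it, which is presumably why adding the tag to the reader's schema information once is enough.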




Lev Levitsky

Feb 9, 2018, 11:19:39 AM
to pyteomics, colin...@googlemail.com
Hi Colin,

as Joshua has already noted, this is not a mistake on your part. The problem is with the default MzIdentML schema information.
He also suggests an easy workaround (thank you Joshua!).

Meanwhile I have updated the default schema information in pyteomics, so if you install the latest version it should work out of the box.

Best regards,
Lev


--
Lev Levitsky
Institute for Energy Problems of Chemical Physics RAS
Laboratory of Physical and Chemical Methods for Structure Analysis
Leninsky pr. 38, bld. 2 119334 Moscow Russia
tel: +7 499 1378257 fax: +7 499 1378257, +7 499 1378258

colin...@googlemail.com

Feb 12, 2018, 5:54:30 AM
to Pyteomics
I can confirm this works. Thanks also for the speedy response!
Colin