Get more detailed information from cvParam tags in mzIdentML

lkolb...@gmail.com

Mar 14, 2018, 2:06:10 PM
to Pyteomics
Hi,

I noticed that when reading mzIdentML tags, the _handle_param function reduces the information in cvParam tags to:
a) a dictionary with the 'name' attribute as the key and the 'value' attribute as the value, or, if there is no 'value' attribute,
b) the 'name' attribute as a cvstr.

This means the function can return two different data types (a dictionary or a cvstr), which makes unexpected type errors in calling code likely.
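
Until the return types are unified, calling code can smooth over the two shapes itself. A minimal sketch, assuming an unpacked param is either a one-item dict {name: value} or a bare name string; normalize_param is a hypothetical helper, not a Pyteomics function. Since cvstr is a str subclass, the isinstance(param, str) branch also covers it:

```python
def normalize_param(param):
    """Return (name, value) for a param unpacked either as a one-item
    dict {name: value} or as a bare name string (value is then None).

    Hypothetical helper for illustration; not part of Pyteomics.
    """
    if isinstance(param, dict):
        if len(param) != 1:
            raise ValueError("expected one name/value pair, got %r" % (param,))
        # Unpack the single (name, value) item
        ((name, value),) = param.items()
        return name, value
    if isinstance(param, str):
        # A value-less param arrives as just its name
        return param, None
    raise TypeError("unexpected param type: %s" % type(param).__name__)


print(normalize_param({"search tolerance plus value": 0.25}))
print(normalize_param("a term without a value"))
```

With this in place, downstream code always receives a (name, value) tuple and only has to check whether value is None.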

My main concern is that all other attributes are ignored. Is there a way to get all attribute/value pairs from a cvParam tag as a dictionary, with each attribute name as a key and its value as the value?

e.g.
<FragmentTolerance>
<cvParam accession="MS:1001412" name="search tolerance plus value" value="0.25" cvRef="PSI-MS" unitAccession="UO:0000221" unitName="dalton" unitCvRef="UO"/>

</FragmentTolerance>

will only return the name and the value ({'FragmentTolerance': {'search tolerance plus value': 0.25}}), but I can't get unitName, unitCvRef, unitAccession or accession. Getting only the value of the tolerance without its unit is not sufficient in this case.
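
For comparison, the full attribute set of each cvParam can be read directly with Python's standard-library ElementTree, independently of Pyteomics. A minimal sketch, assuming the snippet above; the helper names local_name and cvparam_attributes are hypothetical, and namespace prefixes (present in real mzIdentML files) are stripped so the match works either way:

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<FragmentTolerance>
  <cvParam accession="MS:1001412" name="search tolerance plus value" value="0.25"
           cvRef="PSI-MS" unitAccession="UO:0000221" unitName="dalton" unitCvRef="UO"/>
</FragmentTolerance>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix, since real mzIdentML tags are namespaced."""
    return tag.rsplit("}", 1)[-1]


def cvparam_attributes(parent):
    """Return one dict per cvParam under parent, holding every XML attribute."""
    return [dict(el.attrib) for el in parent.iter() if local_name(el.tag) == "cvParam"]


root = ET.fromstring(SNIPPET.strip())
for attrs in cvparam_attributes(root):
    print(attrs["name"], attrs["value"], attrs["unitName"], attrs["unitAccession"])
```

All values come back as plain strings, so any numeric conversion (e.g. float(attrs["value"])) is left to the caller.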


Cheers,
Lars

Joshua Klein

Mar 14, 2018, 3:16:08 PM
to pyteomics

We’ve been discussing this problem with differing return value types in relation to other issues. This is particularly problematic when some vendors produce params with empty value strings regardless of whether the param has a value type.

As for your main question, that metadata can be accessed using something like the code below:

Given an mzIdentML file with the following content:

<FragmentTolerance>
    <cvParam accession="MS:1001412" name="search tolerance plus value" value="0.4" cvRef="PSI-MS" unitAccession="UO:0000221" unitName="dalton" unitCvRef="UO"/>
    <cvParam accession="MS:1001413" name="search tolerance minus value" value="0.4" cvRef="PSI-MS" unitAccession="UO:0000221" unitName="dalton" unitCvRef="UO"/>
</FragmentTolerance>

The following code will get you the data you want:

from pyteomics import mzid

reader = mzid.read("path/to/file.mzid")

# Get the piece of the document described in the snippet
gen = reader.iterfind("FragmentTolerance")
datum = next(gen)

for k, v in datum.items():
    print(k, type(k), k.accession, k.unit_accession)
    print(v, type(v), v.unit_info)

This will produce the following output:

search tolerance plus value <class 'pyteomics.auxiliary.cvstr'> MS:1001412 UO:0000221
0.4 <class 'pyteomics.auxiliary.unitfloat'> dalton
search tolerance minus value <class 'pyteomics.auxiliary.cvstr'> MS:1001413 UO:0000221
0.4 <class 'pyteomics.auxiliary.unitfloat'> dalton

We used these “augmented” value types because they let us preserve backwards compatibility.
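
A sketch of why subclassing helps here: an instance of a str or float subclass still hashes, compares and does arithmetic like the built-in type, so old code keeps working, while new code can read the extra attributes. TaggedStr and TaggedFloat below are hypothetical stand-ins, not Pyteomics' actual cvstr and unitfloat implementations:

```python
class TaggedStr(str):
    """A str that also carries a CV accession (hypothetical illustration)."""

    def __new__(cls, value, accession=None):
        obj = super().__new__(cls, value)
        obj.accession = accession
        return obj


class TaggedFloat(float):
    """A float that also carries unit information (hypothetical illustration)."""

    def __new__(cls, value, unit_info=None):
        obj = super().__new__(cls, value)
        obj.unit_info = unit_info
        return obj


key = TaggedStr("search tolerance plus value", accession="MS:1001412")
value = TaggedFloat(0.4, unit_info="dalton")
params = {key: value}

# Old code that looks the param up with a plain string still works,
# because TaggedStr inherits str's hashing and equality...
print(params["search tolerance plus value"] * 2)
# ...while new code can read the extra metadata
print(key.accession, value.unit_info)
```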



--

---
You received this message because you are subscribed to the Google Groups "Pyteomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyteomics+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

lkolb...@gmail.com

Mar 17, 2018, 12:09:52 PM
to Pyteomics
Thanks for the quick response. The code you provided worked in this case, using iterfind. But when I use get_by_id for direct access via the indexed position in the file, I only get the slimmed-down version described above (produced by _handle_param).

I made the following change to the _handle_param function; it is enabled by calling get_by_id with the additional keyword arguments detailed and accession_key. We noticed inconsistencies in the cvParam names in mzIdentML files we tested from PRIDE, so using the accession number as a key might help.

def _handle_param(self, element, **kwargs):
    """Unpacks cvParam and userParam tags into key-value pairs"""
    # unitint, unitfloat and unitstr are defined in pyteomics.auxiliary
    types = {'int': unitint, 'float': unitfloat, 'string': unitstr}
    attribs = element.attrib
    unit_info = None

    # LK edit: extract all data from cvParam
    if kwargs.get('detailed'):
        # Copy every attribute (accession, cvRef, unitName, ...) unchanged
        attr_dict = dict(attribs)

        # Key by accession instead of name if requested
        if kwargs.get('accession_key'):
            return {attribs['accession']: attr_dict}

        return {attribs['name']: attr_dict}
    # LK edit end
    # ... rest of the function


I don't know whether this functionality would be useful to you as well. It doesn't perform any of the type conversions that happen later in the function, but those could be added.
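
The effect of the detailed and accession_key options can be tried outside Pyteomics with a standalone function. This is a hypothetical sketch (unpack_param_detailed is not a Pyteomics function) operating on a standard-library ElementTree element; like the patch, it performs no type conversion:

```python
import xml.etree.ElementTree as ET


def unpack_param_detailed(element, accession_key=False):
    """Map a cvParam element to {key: {all attributes}}.

    The key is the 'accession' attribute when accession_key is True,
    otherwise the 'name' attribute. All values stay as strings.
    """
    attr_dict = dict(element.attrib)
    key = attr_dict["accession"] if accession_key else attr_dict["name"]
    return {key: attr_dict}


el = ET.fromstring(
    '<cvParam accession="MS:1001412" name="search tolerance plus value" '
    'value="0.25" cvRef="PSI-MS" unitAccession="UO:0000221" '
    'unitName="dalton" unitCvRef="UO"/>'
)
by_name = unpack_param_detailed(el)
by_accession = unpack_param_detailed(el, accession_key=True)
print(by_accession["MS:1001412"]["unitName"])
```

Keying by accession sidesteps the name inconsistencies mentioned above, since the accession identifies the CV term regardless of how its name is spelled in a given file.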

Another edit to _handle_param makes sure that empty value attributes are ignored, by changing

if 'value' in attribs:
...

to

if 'value' in attribs and attribs['value'].strip() != '':  # CC edit
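
The condition can be checked in isolation. A minimal sketch using a hypothetical has_value helper (not part of Pyteomics), which treats a missing value attribute and a blank or whitespace-only one the same way, so params written with value="" fall through to the name-only branch:

```python
def has_value(attribs):
    """True only if 'value' is present and is not blank or whitespace-only."""
    return 'value' in attribs and attribs['value'].strip() != ''


for attribs in ({'name': 'a', 'value': '0.25'},
                {'name': 'b', 'value': ''},
                {'name': 'c', 'value': '   '},
                {'name': 'd'}):
    print(attribs['name'], has_value(attribs))
```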

Joshua Klein

Mar 17, 2018, 1:21:45 PM
to pyteomics

Could you please point me to the mzIdentML file and the ID where you're getting different output from iterfind and get_by_id?



lkolb...@gmail.com

Mar 22, 2018, 9:14:44 AM
to Pyteomics
Sorry, I must have confused something there. The output from both seems to be the same.
I'm still having trouble getting accession numbers back. I'll get back to you as soon as I can give a more accurate description of what's going on.

Thanks,
Lars