How to extract the intensities of peaks with a specific m/z from an mzXML file


Liang QIAO

Jan 22, 2014, 8:49:36 AM
to spctools...@googlegroups.com
Hi All,

I want to get the intensities of peaks with a specific m/z at different retention times from hundreds of Thermo RAW files.
I have converted a RAW file into an mzXML file and opened the mzXML file in Excel to display the XML table.
In the table, I found only the lowMz for the peak with the lowest m/z, the highMz for the one with the highest m/z, and the basePeakMz for the most intense one.
However, there should be hundreds of peaks according to the peaksCount.
I attach a figure showing the XML table as displayed by Excel.
Is there a way to display all the counted peaks, with their m/z and the corresponding intensities, at each retention time?

Thank you very much for your help.

Best wishes

Liang 



Jason Winget

Jan 22, 2014, 11:03:34 AM
to spctools...@googlegroups.com
Hi Liang, this is something I've been working on recently. The peak data are base64-encoded (and in some files also zlib-compressed), so you need to decode them to generate a table of m/z values and intensities.
You will almost certainly want to do this with a script instead of in Excel.

For mzXML files, I first extract all scan events, then parse the peaks element of each scan as follows (in Python, using the base64, struct, and lxml modules):
----
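import base64
import struct

# elem is one <scan> element (parsed with lxml); NS maps the 'mzXML' prefix
# to the namespace URI used by the file.
line = elem.xpath('mzXML:peaks', namespaces=NS)[0].text
decoded = base64.standard_b64decode(line)
tmp_size = len(decoded) // 8  # number of 8-byte values (integer division)
# This is important. The data are 64-bit doubles in network byte order,
# stored as alternating m/z, intensity values (assumes precision="64").
unpack_format = "!%dd" % tmp_size

idx = 0
mz_list = []
intensity_list = []
for val in struct.unpack_from(unpack_format, decoded):
    if idx % 2 == 0:
        mz_list.append(float(val))
    else:
        intensity_list.append(float(val))
    idx += 1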
----
This gives two lists for each scan, containing the m/z values and the intensities, respectively.
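
The snippet above assumes 64-bit, uncompressed peak data. As a rough sketch of a more general decoder, the function below also handles 32-bit values and zlib compression, which the peaks element can declare through its precision and compressionType attributes. The function name and its parameters are made up for illustration, and the numbers in the round-trip check are not real data.

```python
import base64
import struct
import zlib


def decode_peaks(text, precision=64, compression="none"):
    """Decode the text of an mzXML <peaks> element into (mz_list, intensity_list)."""
    raw = base64.standard_b64decode(text)
    if compression == "zlib":
        raw = zlib.decompress(raw)
    # precision 32 means 4-byte floats, precision 64 means 8-byte doubles.
    code, width = ("f", 4) if int(precision) == 32 else ("d", 8)
    count = len(raw) // width
    # "!" selects network (big-endian) byte order.
    values = struct.unpack("!%d%s" % (count, code), raw[:count * width])
    # Values alternate m/z, intensity, m/z, intensity, ...
    return list(values[0::2]), list(values[1::2])


# Round-trip check with made-up numbers (hypothetical, not real data).
packed = struct.pack("!4d", 400.5, 1000.0, 401.5, 2500.0)
encoded = base64.standard_b64encode(zlib.compress(packed)).decode("ascii")
mz_list, intensity_list = decode_peaks(encoded, precision=64, compression="zlib")
print(mz_list, intensity_list)
```

In a real file you would read precision and compressionType from the attributes of each peaks element instead of hard-coding them.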

For mzML, I've found it easier to use the pymzml module for Python. Here is some code to get you started with that:
----
import pymzml

# scans is a collection of the scan IDs you want to print
msrun = pymzml.run.Reader(filepath)
for s in msrun:
    if s['id'] in scans:
        for mz, i in s.peaks:
            print(mz, i)
----
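
Once each scan is available as a pair of m/z and intensity lists, picking out one m/z across retention times could look roughly like the sketch below. It assumes you have also collected the retention time of each scan; the function names, the tolerance value, and the example scans are all hypothetical.

```python
def intensity_at(mz_list, intensity_list, target_mz, tol=0.01):
    """Sum the intensities of all peaks within +/- tol of target_mz."""
    return sum(i for mz, i in zip(mz_list, intensity_list)
               if abs(mz - target_mz) <= tol)


def chromatogram(scans, target_mz, tol=0.01):
    """scans: iterable of (retention_time, mz_list, intensity_list) tuples."""
    return [(rt, intensity_at(mzs, ints, target_mz, tol))
            for rt, mzs, ints in scans]


# Made-up scans for illustration only.
scans = [
    (10.0, [400.50, 401.50], [1000.0, 2500.0]),
    (11.0, [400.505, 500.00], [1200.0, 300.0]),
    (12.0, [450.00], [50.0]),
]
trace = chromatogram(scans, target_mz=400.5, tol=0.01)
for rt, intensity in trace:
    print(rt, intensity)
```

Summing within a window is only one choice; taking the maximum or the closest peak may suit some data better, and the tolerance should match the resolution of your instrument.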
Hope this gets you going.

Best,
Jason

Hannes Röst

Jan 22, 2014, 12:29:46 PM
to spctools...@googlegroups.com
Hi Liang

For easy handling of mzXML, you can also use pyopenms

import pyopenms
exp = pyopenms.MSExperiment()
pyopenms.FileHandler().loadExperiment(filepath, exp)
for spectrum in exp:
    for peak in spectrum:
        print(peak.getMZ(), peak.getIntensity())

It is based on the OpenMS package and is available through PyPI:
https://pypi.python.org/pypi/pyopenms

I hope this helps

Hannes

Jason Winget

Jan 22, 2014, 1:41:57 PM
to spctools...@googlegroups.com
Hannes:
I wish I had found pyopenms when I was trying to solve this! The unpacking of the peak/intensity values was not trivial to sort out.
Time to refactor my code...

Hannes Röst

Jan 22, 2014, 3:59:36 PM
to spctools...@googlegroups.com
Hi Jason

We completed the final release only last August and published the
paper just recently, so pyOpenMS has not been around that long. I hope it
is useful and stays useful to the community. If you have comments or
suggestions, please just send me an email :-)

Hannes

Liang QIAO

Jan 23, 2014, 5:30:38 AM
to spctools...@googlegroups.com
Dear Jason,

Thank you very much for the help.
I will try with your method.

Best regards

Liang

Liang QIAO

Jan 23, 2014, 5:31:53 AM
to spctools...@googlegroups.com
Dear Hannes,

Thank you so much for the kind help.

I will try both methods from you and Jason.

Best regards

Liang

Taejoon Kwon

Jan 23, 2014, 12:10:46 PM
to spctools...@googlegroups.com
Another option would be to use ProteoWizard to convert your RAW files (or mzXML files) to the ms1 and ms2 formats. These are plain-text formats, and I think you can interpret them intuitively. Otherwise, see the following paper: http://www.ncbi.nlm.nih.gov/pubmed/15317041
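
As a rough illustration of reading such a text file, here is a minimal sketch. It assumes a layout in which lines starting with S begin a new scan, lines starting with I carry per-scan key/value information (with the retention time under the key RTime), and the remaining lines in a scan are "m/z intensity" pairs. Please check these assumptions against your own converted files; the function name and the sample text are made up.

```python
def parse_ms1(lines):
    """Yield (retention_time, mz_list, intensity_list) per scan from MS1 text lines."""
    rt, mzs, ints, in_scan = None, [], [], False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "S":                    # assumed: start of a new scan
            if in_scan:
                yield rt, mzs, ints
            rt, mzs, ints, in_scan = None, [], [], True
        elif tag == "I":                  # assumed: per-scan key/value info
            if len(fields) >= 3 and fields[1] == "RTime":
                rt = float(fields[2])
        elif tag in ("H", "Z", "D"):      # assumed: non-peak lines, skipped
            continue
        elif in_scan:                     # assumed: peak line "m/z intensity"
            mzs.append(float(fields[0]))
            ints.append(float(fields[1]))
    if in_scan:
        yield rt, mzs, ints


# Made-up sample text (hypothetical, not from a real file).
sample = """H\tCreationDate\tmade-up example
S\t1\t1
I\tRTime\t0.25
400.5 1000.0
401.5 2500.0
S\t2\t2
I\tRTime\t0.50
400.5 1200.0
""".splitlines()
scans_ms1 = list(parse_ms1(sample))
print(scans_ms1)
```

For a real file you would pass the open file object instead of the sample lines, for example parse_ms1(open(path)).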

Taejoon