Need help reading mzML or MGF into a DataFrame

gameb...@gmail.com

Dec 21, 2018, 12:27:09 PM

to Pyteomics

Hello,

I have Waters .raw files from a developed assay that I am trying to read into a table / pandas DataFrame in Python.

I used ProteoWizard to convert the Waters .raw to .mgf.
I have been trying to use Pyteomics to initially read the .mgf, but have been unsuccessful in getting anything from my files.

The following code raises an error instead of printing the first spectrum, as if the file were empty:

from pyteomics import mgf, auxiliary

with mgf.read('C:/Users/admin001/Desktop/Advanta Analytical - Ascent Files Initial/Ascent - Waters/02 - Waters P60 Batch 1/mgf Test/170916 W3 A P60 KNW PNA 1_12.mgf') as reader:
    auxiliary.print_tree(next(reader))

ERROR:
StopIteration                             Traceback (most recent call last)
<ipython-input-19-66474771469e> in <module>()
      1 from pyteomics import mgf, auxiliary
      2 with mgf.read('C:/Users/admin001/Desktop/Advanta Analytical - Ascent Files Initial/Ascent - Waters/02 - Waters P60 Batch 1/mgf Test/170916 W3 A P60 KNW PNA 1_12.mgf') as reader:
----> 3     auxiliary.print_tree(next(reader))
      4

C:\ProgramData\Anaconda3\lib\site-packages\pyteomics\auxiliary\file_helpers.py in __next__(self)
    123     def __next__(self):
    124         # try:
--> 125         return next(self._reader)
    126         # except StopIteration:
    127         #     self.__exit__(None, None, None)

StopIteration:
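
For readers hitting the same traceback: `next()` raises StopIteration when an iterator has nothing left to yield, so here it means the reader found no spectra in the file at all. A minimal sketch of a gentler check, assuming only that the reader behaves like an ordinary Python iterator (the helper name `first_or_none` is made up for illustration and is not part of Pyteomics):

```python
def first_or_none(iterable):
    """Return the first item of an iterable, or None if it yields nothing."""
    # The two-argument form of next() returns the default instead of raising.
    return next(iter(iterable), None)


# Stand-in for a reader opened on a file that contains no spectra.
empty_reader = iter([])

first = first_or_none(empty_reader)
if first is None:
    print("No spectra found in this file")
```

With a real file, passing the open reader to the helper in place of `empty_reader` should distinguish "file has no spectra" from other failures.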

I experience the exact same problem when performing this with .mzML files and snippets.
Attached are examples of the .mzML and .mgf I am working with. This is really quite a frustrating process, as I am not an experienced programmer, but I am someone who has to get this data into a readable format. Any help would be appreciated.

170916 W3 A P60 KNW PNA 1_13.mzML

Joshua Klein

Dec 21, 2018, 1:00:03 PM

to pyteomics

Your mzML file contains only chromatograms, not spectra, because MSConvert detected you acquired an SRM experiment. I would be surprised if there was any content in the produced MGF file. To obtain spectra, you have to pass the --srmAsSpectra flag to MSConvert on the command line. This option isn’t available from the GUI.
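
Since the rest of this thread is in Python, here is a rough sketch of driving that conversion from a script. It assumes the `msconvert` executable is on the PATH; the function names and the file names in the usage note are placeholders, not anything from the original files. Only the argument list is built up front, so it can be inspected before anything is run:

```python
import subprocess


def build_msconvert_command(raw_path, out_dir, output_format="mzML"):
    """Build an msconvert argument list that writes SRM data as spectra."""
    if output_format not in ("mzML", "mgf"):
        raise ValueError("output_format must be 'mzML' or 'mgf'")
    return [
        "msconvert",           # assumed to be on the PATH
        raw_path,
        "--" + output_format,  # output format switch: --mzML or --mgf
        "--srmAsSpectra",      # write SRM as spectra, not chromatograms
        "-o", out_dir,         # output directory
    ]


def convert(raw_path, out_dir, output_format="mzML"):
    """Run the conversion; raises CalledProcessError if msconvert fails."""
    subprocess.run(build_msconvert_command(raw_path, out_dir, output_format),
                   check=True)
```

Calling `convert("run.raw", "converted")` would then write an mzML file with spectra into a `converted` directory, provided the Waters vendor reader is available to msconvert on that machine.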

Alternatively, if you want to work with those chromatograms, the mzML reader will work with those with only a small adjustment:

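from pyteomics import mzml, auxiliary

# path is the location of your .mzML file
with mzml.MzML(path) as reader:
    # SRM data are stored as chromatogram elements rather than spectra
    for chrom in reader.iterfind("chromatogram"):
        auxiliary.print_tree(chrom)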

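Neither mzML nor MGF produces data that fit directly into a DataFrame: each spectrum or chromatogram comes back as a nested dictionary with its own arrays, so it has to be flattened first.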
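
If a table is still the goal, one option is a "long" layout with one row per data point. The following is a sketch only: it assumes each chromatogram dictionary carries the keys "id", "time array" and "intensity array", and the function and column names are invented for illustration.

```python
def chromatogram_rows(chromatograms):
    """Yield one flat row per (time, intensity) point of each chromatogram."""
    for chrom in chromatograms:
        chrom_id = chrom.get("id")
        # Key names are an assumption about the parsed chromatogram dicts.
        times = chrom.get("time array", [])
        intensities = chrom.get("intensity array", [])
        for t, i in zip(times, intensities):
            yield {"chromatogram_id": chrom_id,
                   "time": float(t),
                   "intensity": float(i)}


def read_chromatograms_as_dataframe(path):
    """Read every chromatogram in an mzML file into a long-format DataFrame."""
    # Imported here so chromatogram_rows stays usable without these packages.
    import pandas as pd
    from pyteomics import mzml

    with mzml.MzML(path) as reader:
        rows = list(chromatogram_rows(reader.iterfind("chromatogram")))
    return pd.DataFrame(rows)
```

The long layout sidesteps the fact that chromatograms can have different numbers of points; grouping the resulting frame by `chromatogram_id` recovers each individual trace for plotting.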

--
You received this message because you are subscribed to the Google Groups "Pyteomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyteomics+...@googlegroups.com.
To post to this group, send email to pyte...@googlegroups.com.
Visit this group at https://groups.google.com/group/pyteomics.
To view this discussion on the web visit https://groups.google.com/d/msgid/pyteomics/0de28b42-1d21-4b57-a820-84ffbdd9ee48%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gameb...@gmail.com

Dec 21, 2018, 2:42:07 PM

to Pyteomics

You just blew this wide open for me.
I strongly believe that I can get by with reading the data as chromatograms. Enough data is present that I believe I can link the .mzML information to some end-user information. I'm hoping that I can even rebuild the chromatogram from this (which is part of my goal, in addition to presenting some of the info in a table-esque format).

Thank you for the rapid response! You've just made my Christmas!

Joshua Klein

Dec 21, 2018, 3:34:23 PM

to pyteomics

Glad to help. I think you were having a similar problem with a KNIME/OpenMS workflow a few days ago?

Please feel free to ask further questions if you run into any issues down the line, or if you're not sure how to approach a problem.


gameb...@gmail.com

Dec 21, 2018, 4:46:13 PM

to Pyteomics

Yes, I did have a similar issue. It left me with the notion that my data was fit enough to be run through statistics, and that pre/post-processing steps were likely unnecessary. Unfortunately, I am on an extraordinarily tight work deadline with a lot of moving pieces, so I don't have the luxury of solving all of the issues encountered. I would prefer a slightly slower pace, where I could do more of my own leg-work rather than haranguing the community.

FWIW, I was able to get my files run through KNIME as spectra, but I have settled on Python for statistics, because it will tie into a GUI and some other tools that we're building in Python. Processing chromatograms rather than spectra is appealing, because I believe that it creates less work, and it seems like our MassLynx assay has cleaned up the run quite a bit.