reading mgf data and visualization


Elena E

Jun 13, 2022, 9:13:38 AM
to Pyteomics
Hello,

I am quite new to Python, and I was wondering if anyone has experience with reading mgf data in Python and visualizing peaks. 
 
I would greatly appreciate it if you could share a working sample code.

Many thanks,
Elena

Lev Levitsky

Jun 13, 2022, 10:35:45 AM
to pyte...@googlegroups.com, ameso...@gmail.com
Hi!

I just updated this example in the docs; it should cover all the basics. Feel free to ask questions here or on GitHub.

Best regards,
Lev


--
Lev Levitsky
Institute for Energy Problems of Chemical Physics RAS
Laboratory of Physical and Chemical Methods for Structure Analysis
Leninsky pr. 38, bld. 2 119334 Moscow Russia
tel: +7 499 1378257 fax: +7 499 1378257, +7 499 1378258

Elena E

Jun 14, 2022, 7:12:40 AM
to Lev Levitsky, pyte...@googlegroups.com
Hi Lev,

Many thanks for this, much appreciated!

Best wishes,
Eleni

Elena E

Jun 15, 2022, 4:09:08 AM
to Lev Levitsky, pyte...@googlegroups.com
Dear Lev, 

Many thanks for the easy-to-follow example! It was extremely helpful! I was able to follow it and produce the graphics!

I have a follow-up question. The data I am playing around with is a publicly available Covid dataset, and I was wondering 1) how I could find Covid positive/negative and 2) how one includes the peak information to fit, say, a regression model.

Many thanks once again!

Best wishes,
Elena

Lev Levitsky

Jun 15, 2022, 6:01:48 AM
to Elena E, pyte...@googlegroups.com
Hi Elena,

I'm glad the examples helped! I wish I could continue being helpful, but these follow-up questions are harder for me to answer. They also definitely go beyond the scope of Pyteomics and my experience.

I'm not sure what you mean by "find Covid positive/negative" but I imagine that a public dataset should include some kind of annotation of positive and negative samples. Otherwise, I have no idea how this dataset can support the publication.
The recently adopted standard for annotation of public proteomics datasets is SDRF-Proteomics. If your dataset is annotated in this format, you can use sdrf-pipelines or just Pandas to parse the annotations, if you need to do it programmatically. Otherwise, I would just look at Supporting Information or the dataset itself for some kind of annotation. If it's in a consistent tabular format, Pandas can be used to read it in Python.
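
A minimal sketch of the pandas route, assuming the annotation is an SDRF-style tab-separated table. The column names follow SDRF-Proteomics conventions, but the rows, file names and disease values are invented for illustration, so check them against the actual dataset:

```python
import io

import pandas as pd

# Hypothetical SDRF-like table. Real files are tab-separated too,
# but have many more columns and dataset-specific values.
sdrf_text = (
    "source name\tcharacteristics[disease]\tcomment[data file]\n"
    "sample 1\tCOVID-19\trun_01.raw\n"
    "sample 2\tnormal\trun_02.raw\n"
    "sample 3\tCOVID-19\trun_03.raw\n"
)

# For a real dataset, pass the path to the SDRF file instead of StringIO.
sdrf = pd.read_csv(io.StringIO(sdrf_text), sep="\t")

# Map each data file to a binary label (1 = positive, 0 = negative).
labels = (
    sdrf.set_index("comment[data file]")["characteristics[disease]"]
    .eq("COVID-19")
    .astype(int)
    .to_dict()
)
print(labels)
```

The resulting dictionary can then be used to look up the label for each spectrum file.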

I can't say much about your second question either, but a few notes:
  • m/z and intensity values can be extracted using "m/z array" and "intensity array" keys from the dictionaries produced by Pyteomics parsers;
  • these values are stored as NumPy arrays;
  • if you need to track intensities of specific ion types across multiple spectra, then I suggest you look at spectrum_utils. Pyteomics optionally uses it for spectrum annotation (see example 4) but if you use it directly, you can access peak annotations through spectrum_utils spectrum objects;
  • for a variety of regression models in Python, look at scikit-learn.
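
The first two notes can be sketched as follows. Spectra have different numbers of peaks, so one common way to get fixed-length model input is to sum intensities in fixed m/z bins. This is only one possible featurization, not something Pyteomics prescribes. The spectra below are hand-made stand-ins for the dictionaries a Pyteomics parser yields, and the helper `bin_spectrum`, its bin range and the peak values are all hypothetical:

```python
import numpy as np

# Hand-made stand-ins for the dictionaries produced by Pyteomics parsers
# (e.g. when iterating over pyteomics.mgf.read("spectra.mgf")).
# The peak values are invented.
spectra = [
    {"m/z array": np.array([100.2, 150.7, 151.1, 299.9]),
     "intensity array": np.array([10.0, 5.0, 20.0, 1.0])},
    {"m/z array": np.array([120.0, 250.5]),
     "intensity array": np.array([7.0, 3.0])},
]


def bin_spectrum(spectrum, mz_min=100.0, mz_max=300.0, n_bins=4):
    """Sum peak intensities in equal-width m/z bins, scaled to sum to 1."""
    edges = np.linspace(mz_min, mz_max, n_bins + 1)
    binned, _ = np.histogram(
        spectrum["m/z array"],
        bins=edges,
        weights=spectrum["intensity array"],
    )
    total = binned.sum()
    return binned / total if total > 0 else binned


# One row per spectrum, one column per m/z bin.
X = np.vstack([bin_spectrum(s) for s in spectra])
print(X.shape)
```

Peaks outside the chosen m/z range are dropped. A matrix like `X`, together with one label per row, is the kind of input that scikit-learn estimators such as `sklearn.linear_model.LogisticRegression` accept in `fit(X, y)`.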
I hope it helps somewhat.

Best regards,
Lev

Elena E

Jun 15, 2022, 6:05:18 AM
to Lev Levitsky, pyte...@googlegroups.com
Dear Lev,

Thank you very much for this! This helps a lot! 
Have a lovely day!

Best,
Elena