--
---
You received this message because you are subscribed to the Google Groups "Pyteomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyteomics+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Since this came up again, I thought I’d post my current PEFF Defline parser:
import re
from collections import OrderedDict


class PEFFDeflineParser(object):
    kv_pattern = re.compile(r"\\(?P<key>\S+)=(?P<value>.+?)(?:\s(?=\\)|$)")
    detect_pattern = re.compile(r"^>?\S+:\S+")

    def __init__(self, validate=True):
        self.validate = validate

    def extract_parenthesis_list(self, text):
        chunks = []
        chunk = []
        paren_level = 0
        i = 0
        n = len(text)
        while i < n:
            c = text[i]
            i += 1
            if c == "(":
                if paren_level > 0:
                    chunk.append(c)
                paren_level += 1
            elif c == ")":
                if paren_level > 1:
                    chunk.append(c)
                paren_level -= 1
                if paren_level == 0:
                    if chunk:
                        chunks.append(chunk)
                    chunk = []
            else:
                chunk.append(c)
        chunks = list(map(''.join, chunks))
        return chunks

    def split_pipe_separated_tuple(self, text):
        parts = text.split("|")
        return parts

    def coerce_types(self, key, value):
        if "|" in value:
            value = self.split_pipe_separated_tuple(value)
            result = []
            for i, v in enumerate(value):
                result.append(self._coerce_value(key, v, i))
            return tuple(result)
        else:
            return self._coerce_value(key, value, 0)

    def _coerce_value(self, key, value, index):
        try:
            return int(value)
        except ValueError:
            pass
        try:
            return float(value)
        except ValueError:
            pass
        return str(value)

    def parse(self, line):
        if self.validate:
            match = self.detect_pattern.match(line)
            if not match:
                raise ValueError(
                    "Failed to parse {!r} using {!r}".format(
                        line, self))
        storage = OrderedDict()
        prefix = None
        db_uid = None
        if line.startswith(">"):
            line = line[1:]
        prefix, line = line.split(":", 1)
        db_uid, line = line.split(" ", 1)
        storage['Prefix'] = prefix
        storage['DbUniqueId'] = db_uid
        for key, value in self.kv_pattern.findall(line):
            if not value.startswith("(") or " (" in value:
                storage[key] = self.coerce_types(key, value)
            else:
                # multi-value
                storage[key] = [self.coerce_types(key, v) for v in self.extract_parenthesis_list(value)]
        return storage
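
As a quick illustration of how kv_pattern splits a defline into key/value pairs, here is a minimal standalone sketch. The defline string is made up for the example (it is not from a real database), and the prefix/ID splitting simply mirrors what parse() does:

```python
import re

# Same pattern as PEFFDeflineParser.kv_pattern in the class above.
kv_pattern = re.compile(r"\\(?P<key>\S+)=(?P<value>.+?)(?:\s(?=\\)|$)")

# Hypothetical defline, for illustration only.
defline = r">xx:P12345 \PName=Example protein \Length=609 \VariantSimple=(1|A)(5|G)"

# Drop the leading ">", then peel off the "prefix:id" token.
prefix, rest = defline[1:].split(":", 1)
db_uid, rest = rest.split(" ", 1)

# Each match is a (key, value) tuple; values are still raw strings here.
pairs = kv_pattern.findall(rest)
for key, value in pairs:
    print(key, "->", value)
```

A value runs until the next whitespace that is followed by a backslash (or the end of the line), which is why "Example protein" survives with its internal space.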
There are a few issues with it still, the largest being the lack of support for the index-labeled features for specifying proteoform variants. The other large issue is the lack of proper feature structure unpacking. It seemed like PEFF expects you to have parsed psi-ms.obo in order to look up the data types for some features, but instead I just settle for making tuples of roughly typed values.
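
To make "roughly typed" concrete, here is a small standalone sketch of the same int, then float, then str fallback applied to pipe-separated fields. The function names coerce_value and coerce_field are placeholders for this example, and the input strings are invented:

```python
def coerce_value(value):
    """Try int, then float, then fall back to str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return str(value)


def coerce_field(value):
    """Pipe-separated values become tuples of individually coerced items."""
    if "|" in value:
        return tuple(coerce_value(v) for v in value.split("|"))
    return coerce_value(value)


print(coerce_field("609"))       # 609
print(coerce_field("12|A|0.5"))  # (12, 'A', 0.5)
print(coerce_field("0012"))      # 12: the leading zeros are lost
```

The last call shows one way guessing can misfire without type information from a controlled vocabulary: a zero-padded identifier is silently turned into an integer.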
Hi Brett,

Thank you for writing and indicating your interest. Adding PEFF has come up, and it shouldn't be too hard to add to the existing parsers.

The current priority is implementing some structural changes to all parsers in preparation for the next release, which I want to finish in September. I will look more closely into it and try to get something done on PEFF before the release.

If you have any thoughts or suggestions, you are most welcome to share them here or in a personal email to me.

Best regards,
Lev
On Mon, Aug 6, 2018 at 1:28 AM, <brett...@gmail.com> wrote:
G'day team,
I was just wondering if there are any plans to add PEFF parsing to the already awesome library? It would seem to be a good addition to the FASTA parsing section.
It might be something we can help with if that's useful.
Cheers,
Brett
--
Lev Levitsky
Institute for Energy Problems of Chemical Physics RAS
Laboratory of Physical and Chemical Methods for Structure Analysis
Leninsky pr. 38, bld. 2 119334 Moscow Russia
tel: +7 499 1378257 fax: +7 499 1378257, +7 499 1378258
Just to follow up, we’ve put a draft version of a peff module into the 4.0 branch that builds on the new random-access FASTA mechanism. The PEFF parser can unpack and type the PEFF feature fields, and provides a mapping-like interface similar to the defline parsers in the fasta module. It can also read the header block to extract database metadata, but at the moment no additional special processing is done. I’m waiting for discussion from the specification authors before attempting to do more with feature data structures, and it’s undecided whether we should implement a first-class object for iterating over proteoforms.
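
For readers unsure what a "mapping-like interface" means in practice, here is a rough standalone sketch. DeflineRecord is a hypothetical class written for this example and is not the implementation in the 4.0 branch; the field values are invented:

```python
from collections import OrderedDict
from collections.abc import Mapping


class DeflineRecord(Mapping):
    """Hypothetical read-only wrapper around parsed defline fields."""

    def __init__(self, fields):
        # Keep fields in the order they appeared on the defline.
        self._fields = OrderedDict(fields)

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


record = DeflineRecord([("Prefix", "xx"), ("DbUniqueId", "P12345"), ("Length", 609)])

print(record["Length"])     # 609
print("PName" in record)    # False
print(record.get("PName"))  # None
print(list(record))         # ['Prefix', 'DbUniqueId', 'Length']
```

Implementing the three abstract methods of Mapping is enough to get get(), keys(), items(), and membership tests for free, so such an object can be used wherever a read-only dict is expected.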
Any feedback would be useful.