regexps to objects

andrea crotti

Jul 27, 2012, 5:36:34 AM
to python-list
I have some complex input to parse (with regexps), and I would like to
create nice objects directly from it.
The re module of course doesn't try to convert matches to any type, so I
was playing around to see if it's worth doing something like the code
below, where I assign a constructor to every regexp and build an object
from the result.

Do you think this makes sense in general? If not, how do you cope with this problem?

import re
from time import strptime
TIME_FORMAT_INPUT = '%m/%d/%Y %H:%M:%S'

def time_string_to_obj(timestring):
    return strptime(timestring, TIME_FORMAT_INPUT)


REGEXPS = {
    'num': (r'\d+', int),
    'date': (r'[0-9/]+ [0-9:]+', time_string_to_obj),
}


def reg_to_obj(reg, st):
    reg, constr = reg
    found = re.match(reg, st)
    return constr(found.group())


if __name__ == '__main__':
    print(reg_to_obj(REGEXPS['num'], '100'))
    print(reg_to_obj(REGEXPS['date'], '07/24/2012 06:23:13'))
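
One thing the code above does not handle is input that fails to match: re.match then returns None and found.group() raises AttributeError. A minimal sketch of a guarded variant follows. The helper name parse_as is hypothetical, the patterns are compiled up front, and datetime.strptime is swapped in for time.strptime so the result is a datetime object; none of these choices come from the original post.

```python
import re
from datetime import datetime

TIME_FORMAT_INPUT = "%m/%d/%Y %H:%M:%S"

# Each entry pairs a compiled pattern with the callable that converts
# the matched text into an object.
REGEXPS = {
    "num": (re.compile(r"\d+"), int),
    "date": (re.compile(r"[0-9/]+ [0-9:]+"),
             lambda s: datetime.strptime(s, TIME_FORMAT_INPUT)),
}


def parse_as(kind, text):
    """Hypothetical helper: convert text using the entry registered as kind."""
    pattern, constructor = REGEXPS[kind]
    found = pattern.match(text)
    if found is None:
        # Report a mismatch explicitly instead of failing on found.group().
        raise ValueError("%r does not match the %r pattern" % (text, kind))
    return constructor(found.group())


if __name__ == "__main__":
    print(parse_as("num", "100"))
    print(parse_as("date", "07/24/2012 06:23:13"))
```

Raising ValueError here is only one option; returning None or a default would work just as well, depending on how the caller wants to treat bad input.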

Peter Otten

Jul 27, 2012, 6:24:41 AM
to pytho...@python.org
There is an undocumented Scanner class in the re module:

>>> from datetime import datetime
>>> from re import Scanner
>>> sc = Scanner([
...     ("[0-9/]+ [0-9:]+", lambda self, s: datetime.strptime(s, "%m/%d/%Y %H:%M:%S")),
...     (r"\d+", lambda self, s: int(s)),
...     (r"\s+", lambda self, s: None)])

>>> sc.scan("07/24/2012 06:23:13")
([datetime.datetime(2012, 7, 24, 6, 23, 13)], '')
>>> sc.scan("07/24/2012 06:23:13 123")
([datetime.datetime(2012, 7, 24, 6, 23, 13), 123], '')

However:

>>> sc.scan("456 07/24/2012 06:23:13 123")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 322, in scan
    action = action(self, m.group())
  File "<stdin>", line 2, in <lambda>
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '456 07' does not match format '%m/%d/%Y %H:%M:%S'
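
The failure appears to come from the date pattern being too permissive: "[0-9/]+ [0-9:]+" also matches "456 07", so the scanner hands that text to strptime before the integer rule gets a chance. One possible way around it, sketched here under the assumption that dates always have the fixed MM/DD/YYYY HH:MM:SS layout, is to tighten the date pattern so it can only match a complete timestamp. The names DATE_PATTERN and TIME_FORMAT are illustrative and not part of the session above.

```python
import re
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"

# Assumption: every date uses exactly this digit layout, so a bare
# number followed by a space can no longer be mistaken for a date.
DATE_PATTERN = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"

scanner = re.Scanner([
    (DATE_PATTERN, lambda sc, s: datetime.strptime(s, TIME_FORMAT)),
    (r"\d+", lambda sc, s: int(s)),
    (r"\s+", lambda sc, s: None),  # whitespace yields None and is dropped
])

tokens, rest = scanner.scan("456 07/24/2012 06:23:13 123")
print(tokens)
print(repr(rest))
```

The date rule stays first in the list so that a full timestamp is still preferred over the integer rule when both could match at the same position.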

