
Multiple regex match idiom


Hrvoje Niksic

May 9, 2007, 5:00:14 AM
I often have the need to match multiple regexes against a single
string, typically a line of input, like this:

if (matchobj = re1.match(line)):
    ... re1 matched; do something with matchobj ...
elif (matchobj = re2.match(line)):
    ... re2 matched; do something with matchobj ...
elif (matchobj = re3.match(line)):
    ....

Of course, that doesn't work as written because Python's assignments
are statements rather than expressions. The obvious rewrite results
in deeply nested ifs:

matchobj = re1.match(line)
if matchobj:
    ... re1 matched; do something with matchobj ...
else:
    matchobj = re2.match(line)
    if matchobj:
        ... re2 matched; do something with matchobj ...
    else:
        matchobj = re3.match(line)
        if matchobj:
            ...

Normally I have nothing against nested ifs, but in this case the deep
nesting unnecessarily complicates the code without providing
additional value -- the logic is still exactly equivalent to the
if/elif/elif/... shown above.
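
For readers on Python 3.8 or later: assignment expressions (the := operator, PEP 572) allow the flat if/elif chain to be written almost exactly as first imagined. A minimal sketch under that version assumption; the three patterns and the classify() function are hypothetical and exist only to make the example runnable:

```python
import re

# Hypothetical patterns, chosen only for illustration.
re1 = re.compile(r"(\d+)$")
re2 = re.compile(r"(\w+)=(\w+)$")
re3 = re.compile(r"#(.*)$")

def classify(line):
    # Requires Python 3.8+: ":=" binds matchobj inside the condition itself.
    if (matchobj := re1.match(line)):
        return ("number", int(matchobj.group(1)))
    elif (matchobj := re2.match(line)):
        return ("assignment", matchobj.group(1), matchobj.group(2))
    elif (matchobj := re3.match(line)):
        return ("comment", matchobj.group(1).strip())
    return ("unknown", line)
```

Each branch tests and binds in one step, so the chain stays flat no matter how many regexes are tried. On older interpreters this is a syntax error, and one of the workarounds discussed in this thread is still needed.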

There are ways to work around the problem, for example by writing a
utility predicate that passes the match object as a side effect, but
that feels somewhat non-standard. I'd like to know if there is a
Python idiom that I'm missing. What would be the Pythonic way to
write the above code?
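
One possible shape for the "utility predicate" mentioned above is a small helper object whose match() method returns a boolean and stores the match object as a side effect. This is only a sketch: the Matcher class, the handle() function, and the two patterns are made-up names for illustration, not anything from the standard library.

```python
import re

class Matcher:
    """Hypothetical helper: remembers the last match object as a side effect."""

    def __init__(self, line):
        self.line = line
        self.matchobj = None

    def match(self, regex):
        # Store the result so the caller can read it after the test.
        self.matchobj = regex.match(self.line)
        return self.matchobj is not None

# Hypothetical patterns, chosen only for illustration.
re1 = re.compile(r"GET (\S+)")
re2 = re.compile(r"PUT (\S+) (\d+)")

def handle(line):
    m = Matcher(line)
    if m.match(re1):
        return "fetch " + m.matchobj.group(1)
    elif m.match(re2):
        return "store %s (%s bytes)" % (m.matchobj.group(1), m.matchobj.group(2))
    return None
```

The if/elif chain stays flat, at the cost of hiding the assignment inside a method call, which is the part that feels non-standard.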

Charles Sanders

May 9, 2007, 5:37:21 AM
Hrvoje Niksic wrote:
> I often have the need to match multiple regexes against a single
> string, typically a line of input, like this:
>
> if (matchobj = re1.match(line)):
> ... re1 matched; do something with matchobj ...
> elif (matchobj = re2.match(line)):
> ... re2 matched; do something with matchobj ...
> elif (matchobj = re3.match(line)):
> ....
[snip]

>
> There are ways to work around the problem, for example by writing a
> utility predicate that passes the match object as a side effect, but
> that feels somewhat non-standard. I'd like to know if there is a
> Python idiom that I'm missing. What would be the Pythonic way to
> write the above code?

Only just learning Python, but to me this seems better.
Completely untested.

re_list = [ re1, re2, re3, ... ]
for r in re_list:    # "r" rather than "re", to avoid shadowing the re module
    matchob = r.match(line)
    if matchob:
        ....
        break

Of course this only works if the "do something" is the same
for all matches. If not, maybe a function for each case,
something like

re1 = re.compile(....)
def fn1( s, m ):
    ....
re2 = ....
def fn2( s, m ):
    ....

re_list = [ (re1, fn1), (re2, fn2), ... ]

for (r,f) in re_list:
    matchob = r.match(line)
    if matchob:
        f( line, matchob )
        break

Probably better ways than this exist.
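
The (regex, function) list above can be turned into something runnable along these lines. It is a sketch, not a tested recipe: the two patterns, the handler names, and the dispatch() wrapper are invented for illustration.

```python
import re

# Hypothetical handlers; each receives the line and the match object.
def handle_number(line, m):
    return int(m.group(1)) * 2

def handle_word(line, m):
    return m.group(1).upper()

# Each entry pairs a compiled regex with the function to call on a match.
re_list = [
    (re.compile(r"(\d+)$"), handle_number),
    (re.compile(r"([a-z]+)$"), handle_word),
]

def dispatch(line):
    for r, f in re_list:
        matchob = r.match(line)
        if matchob:
            # First match wins, like the if/elif chain.
            return f(line, matchob)
    return None  # no regex matched
```

Returning from inside the loop plays the role of the break, and the order of the list decides which regex takes priority when more than one could match.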


Charles

Nick Vatamaniuc

May 9, 2007, 6:04:09 AM

Hrvoje,

To make it more elegant I would do this:

1. Put all the ...do somethings... in functions like
re1_do_something(), re2_do_something(),...

2. Create a list of (re, func) pairs, in other words:
dispatch=[ (re1, re1_do_something), (re2, re2_do_something), ... ]

3. Then do:
for regex,func in dispatch:
    if regex.match(line):
        func(...)
        break    # stop at the first match, like if/elif


Hope this helps,
-Nick Vatamaniuc

Steffen Oschatz

May 10, 2007, 6:07:36 AM

Instead of scanning the same input over and over again with different,
possibly complex, regexes and ugly-looking nested ifs, I would suggest
defining a grammar and parsing the input once, with registered hooks
for your matching expressions.

SimpleParse (http://simpleparse.sourceforge.net) with a
DispatchProcessor or pyparsing (http://pyparsing.wikispaces.com/) in
combination with setParseAction or something similar are your friends
for such a task.
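
To give a rough feel for "parse once, dispatch to hooks" without installing anything, here is a stand-in that uses only the standard re module. It is neither SimpleParse nor pyparsing and does not show their APIs: it merges the alternatives into one regex with named groups and uses the match object's lastgroup attribute to pick the hook. The rules, the hooks table, and parse_line() are all hypothetical.

```python
import re

# Hypothetical rules; the group names double as keys into the hook table.
combined = re.compile(
    r"(?P<number>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<comment>#.*)"
)

# One hook per named group; each receives the match object.
hooks = {
    "number": lambda m: ("number", int(m.group())),
    "name": lambda m: ("name", m.group()),
    "comment": lambda m: ("comment", m.group()[1:].strip()),
}

def parse_line(line):
    m = combined.match(line)  # the line is scanned once
    if m is None:
        return None
    # lastgroup names the alternative that actually matched.
    return hooks[m.lastgroup](m)
```

This only works when the alternatives are simple enough to merge into a single pattern. A real grammar library handles nesting and repetition that one regex cannot.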

Steffen
