
template strings for matching?


Joe Strout

Oct 9, 2008, 8:35:09 AM
to pytho...@python.org
Catching up on what's new in Python since I last used it a decade ago,
I've just been reading up on template strings. These are pretty
cool! However, just as a template string has some advantages over %
substitution for building a string, it seems like it would have
advantages over manually constructing a regex for string matching.

So... is there any way to use a template string for matching? I
expected something like:

templ = Template("The $object in $location falls mainly in the $subloc.")
d = templ.match(s)

and then d would either be None (if s doesn't match), or a dictionary
with values for 'object', 'location', and 'subloc'.

But I couldn't find anything like that in the docs. Am I overlooking
something?
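For comparison, here is roughly what the manually constructed regex for that one template looks like. This is a minimal sketch, assuming Python 3.4+ for re.fullmatch (so trailing text is rejected) and non-greedy fields; PATTERN and match_sentence are illustrative names only.

```python
import re

# Hand-written equivalent of the template
# "The $object in $location falls mainly in the $subloc."
PATTERN = re.compile(
    r"The (?P<object>.+?) in (?P<location>.+?) "
    r"falls mainly in the (?P<subloc>.+?)\."
)


def match_sentence(s):
    """Return a dict of field values, or None if s does not match."""
    m = PATTERN.fullmatch(s)  # the whole string must match, not just a prefix
    return m.groupdict() if m else None


print(match_sentence("The rain in Spain falls mainly in the plain."))
print(match_sentence("Something else entirely."))
```

Every literal character has to be checked for special regex meaning (the trailing '.' needs a backslash), which is the bookkeeping a Template.match method would hide.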

Thanks,
- Joe

Tino Wildenhain

Oct 9, 2008, 8:52:24 AM
to Joe Strout, pytho...@python.org

sk...@pobox.com

Oct 9, 2008, 8:59:17 AM
to Joe Strout, pytho...@python.org
Joe> templ = Template("The $object in $location falls mainly in the $subloc.")
Joe> d = templ.match(s)

Joe> and then d would either be None (if s doesn't match), or a
Joe> dictionary with values for 'object', 'location', and 'subloc'.

Joe> But I couldn't find anything like that in the docs. Am I
Joe> overlooking something?

Nope, you're not missing anything.

Skip

sk...@pobox.com

Oct 9, 2008, 9:05:02 AM
to Tino Wildenhain, pytho...@python.org

Tino> Yeah, it's a bit hard to spot:

Tino> http://docs.python.org/library/stdtypes.html#string-formatting-operations

That shows how to use the template formatting as it currently exists. To my
knowledge there is no support for the inverse operation, which is what Joe
asked about: given a string and a format string, assign the elements of the
string that correspond to the template elements to key/value pairs in a
dictionary.

Skip

Peter Otten

Oct 9, 2008, 9:20:48 AM
Joe Strout wrote:

> Catching up on what's new in Python since I last used it a decade ago,
> I've just been reading up on template strings. These are pretty
> cool!

I don't think they've gained much traction, and I expect them to be superseded
by PEP 3101 (see http://www.python.org/dev/peps/pep-3101/ )
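PEP 3101 is what became str.format, and the string module exposes its parser as string.Formatter().parse, which yields (literal_text, field_name, format_spec, conversion) tuples. A reverse matcher could be sketched on top of it as below. This is only a sketch: format_match is a hypothetical name, it assumes every field is a plain identifier such as {name} (positional or dotted fields would not be valid regex group names), and it ignores format specs and conversions.

```python
import re
from string import Formatter


def format_match(fmt, text):
    """Inverse of fmt.format(**d): return d, or None if text does not fit.

    Sketch only: fields must be plain identifiers; specs/conversions ignored.
    """
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(fmt):
        parts.append(re.escape(literal))  # literal text must match itself
        if field is not None:  # None means no replacement field follows
            parts.append("(?P<%s>.*?)" % field)  # non-greedy named group
    m = re.fullmatch("".join(parts), text, re.DOTALL)
    return m.groupdict() if m else None


print(format_match("My {pet} has {parasites}", "My dog has fleas"))
```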

> However, just as a template string has some advantages over %
> substitution for building a string, it seems like it would have
> advantages over manually constructing a regex for string matching.
>
> So... is there any way to use a template string for matching? I
> expected something like:
>
> templ = Template("The $object in $location falls mainly in the $subloc.")
> d = templ.match(s)
>
> and then d would either be None (if s doesn't match), or a dictionary
> with values for 'object', 'location', and 'subloc'.
>
> But I couldn't find anything like that in the docs. Am I overlooking
> something?

I don't think so. Here's a DIY implementation:

import re

def _replace(match):
    # group(2) is either "$" (from "$$") or a field name
    word = match.group(2)
    if word == "$":
        return "[$]"
    return "(?P<%s>.*)" % word

def extract(template, text):
    r = re.compile(r"([$]([$]|\w+))")
    r = r.sub(_replace, template)
    return re.compile(r).match(text).groupdict()


print(extract("My $$ is on the $object in $location...",
              "My $ is on the biggest bird in the highest tree..."))

As always with regular expressions, I may be missing some corner cases...

Peter

Robin Becker

Oct 9, 2008, 9:29:52 AM
to pytho...@python.org
Joe Strout wrote:
> Catching up on what's new in Python since I last used it a decade ago,
> I've just been reading up on template strings. These are pretty cool!
> However, just as a template string has some advantages over %
> substitution for building a string, it seems like it would have
> advantages over manually constructing a regex for string matching.
>
> So... is there any way to use a template string for matching? I
> expected something like:
.......

You could use something like this to record the lookups:

>>> class XDict(dict):
...     def __new__(cls,*args,**kwds):
...         self = dict.__new__(cls,*args,**kwds)
...         self.__record = set()
...         return self
...     def _record_clear(self):
...         self.__record.clear()
...     def __getitem__(self,k):
...         v = dict.__getitem__(self,k)
...         self.__record.add(k)
...         return v
...     def _record(self):
...         return self.__record
...
>>> x=XDict()
>>> x._record()
set([])
>>> x=XDict(a=1,b=2,c=3)
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>> '%(a)s %(c)s' % x
'1 3'
>>> x._record()
set(['a', 'c'])
>>>

A slight modification would allow your template match function to work even when
some keys were missing from the dict. That would allow you to see which lookups
failed as well.
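One way to read that "slight modification" is sketched below; this is an assumption about what was meant, not Robin's code. Instead of overriding __new__, the subclass defines __missing__, which dict.__getitem__ calls for absent keys in dict subclasses. The hypothetical RecordingDict records which keys were found and which were missing, and returns a visible ${key} placeholder so substitution does not raise KeyError.

```python
from string import Template


class RecordingDict(dict):
    """Hypothetical dict that records found and missing lookups."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.found = set()    # keys that were present when looked up
        self.missing = set()  # keys that were absent when looked up

    def __getitem__(self, key):
        if key in self:  # dict.__contains__, so no recursion here
            self.found.add(key)
        # For absent keys, dict.__getitem__ falls back to __missing__.
        return super().__getitem__(key)

    def __missing__(self, key):
        self.missing.add(key)
        return "${%s}" % key  # leave a visible placeholder in the output


d = RecordingDict(object="rain", location="Spain")
text = Template("The $object in $location falls mainly in the $subloc.").substitute(d)
print(text)
print(sorted(d.found), sorted(d.missing))
```

The same class also works with %-formatting, since '%(a)s' % mapping looks each key up through __getitem__.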
--
Robin Becker

Paul McGuire

Oct 9, 2008, 9:49:43 AM
Pyparsing makes building expressions with named fields pretty easy.

from pyparsing import Word, alphas, ParseException

wrd = Word(alphas)

templ = "The" + wrd("object") + "in" + wrd("location") + \
        "stays mainly in the" + wrd("subloc") + "."

tests = """\
The rain in Spain stays mainly in the plain.
The snake in plane stays mainly in the cabin.
In Hempstead, Haverford and Hampshire hurricanes hardly ever happen.
""".splitlines()
for t in tests:
    t = t.strip()
    try:
        match = templ.parseString(t)
        # named results are available as attributes...
        print(match.object)
        print(match.location)
        print(match.subloc)
        # ...or as mapping keys
        print("Fields are: %(object)s %(location)s %(subloc)s" % match)
    except ParseException:
        print("'" + t + "' is not a match.")
    print()

Read more about pyparsing at http://pyparsing.wikispaces.com.
-- Paul

Tino Wildenhain

Oct 9, 2008, 10:02:21 AM
to sk...@pobox.com, pytho...@python.org
??? Can you elaborate? I don't see the problem.

"%(foo)s" % mapping

just looks up mapping["foo"] (via __getitem__), so if you have a dictionary
with all possible values it just works. If you want to do
some fancy stuff, just subclass and override that method
appropriately.

Regards
Tino

Joe Strout

Oct 9, 2008, 10:24:21 AM
to pytho...@python.org
On Oct 9, 2008, at 7:05 AM, sk...@pobox.com wrote:

> Tino> http://docs.python.org/library/stdtypes.html#string-formatting-operations
>
> That shows how to use the template formatting as it currently exists. To my
> knowledge there is no support for the inverse operation, which is what Joe
> asked about: given a string and a format string, assign the elements of the
> string that correspond to the template elements to key/value pairs in a
> dictionary.

Right.

Well, what do y'all think? It wouldn't be too hard to write this for
myself, but it seems like the sort of thing Python ought to have built
in. It could live right on the Template class, so it doesn't add anything
new to the global namespace; it just makes this class more useful.

I took a look at PEP 3101, which is more of a high-powered string
formatter (as the title says, Advanced String Formatting), and will be
considerably more intimidating for a beginner than Template. So, even
if that goes through, perhaps Template will stick around, and being
able to use it in both directions could be quite handy.

Oh boy! Could this be my very first PEP? :)

Thanks for any opinions,
- Joe


sk...@pobox.com

Oct 9, 2008, 10:40:11 AM
to Tino Wildenhain, pytho...@python.org

Tino> ??? can you elaborate? I don't see the problem.

Tino> "%(foo)s" % mapping

Joe wants to go in the other direction. Using your example, he wants a
function which takes a string and a template string and returns a dict.
Here's a concrete example:

s = "My dog has fleas"
fmt = "My $pet has $parasites"
d = fmt_extract(fmt, s)
assert d['pet'] == 'dog'
assert d['parasites'] == 'fleas'

Skip

Joe Strout

Oct 9, 2008, 12:20:19 PM
to pytho...@python.org
Wow, this was harder than I thought (at least for a rusty Pythoneer
like myself). Here's my stab at an implementation. Remember, the
goal is to add a "match" method to Template which works like
Template.substitute, but in reverse: given a string, if that string
matches the template, then it should return a dictionary mapping each
template field to the corresponding value in the given string.

Oh, and as one extra feature, I want to support a ".greedy" attribute
on the Template object, which determines whether the matching of
fields should be done in a greedy or non-greedy manner.

------------------------------------------------------------
#!/usr/bin/python

from string import Template
import re

def templateMatch(self, s):
    # start by finding the fields in our template, and building a map
    # from field position (index) to field name.
    posToName = {}
    pos = 1
    for item in self.pattern.findall(self.template):
        # each item is a tuple where item 1 is the field name
        posToName[pos] = item[1]
        pos += 1

    # determine if we should match greedy or non-greedy
    greedy = getattr(self, 'greedy', False)

    # now, build a regex pattern to compare against s
    # (taking care to escape any characters in our template that
    # would have special meaning in regex)
    pat = self.template.replace('.', '\\.')
    pat = pat.replace('(', '\\(')
    pat = pat.replace(')', '\\)') # there must be a better way...

    if greedy:
        pat = self.pattern.sub('(.*)', pat)
    else:
        pat = self.pattern.sub('(.*?)', pat)
    p = re.compile(pat)

    # try to match this to the given string
    match = p.match(s)
    if match is None:
        return None
    out = {}
    for i in posToName.keys():
        out[posToName[i]] = match.group(i)
    return out


Template.match = templateMatch

t = Template("The $object in $location falls mainly in the $subloc.")
print(t.match("The rain in Spain falls mainly in the train."))
------------------------------------------------------------

This sort of works, but it won't properly handle $$ in the template,
and I'm not too sure whether it handles the ${fieldname} form,
either. Also, it only escapes '.', '(', and ')' in the template...
there must be a better way of escaping all characters that have
special meaning to a regex, except for '$' (which is why I can't use
re.escape on the whole template).

Probably the rest of the code could be improved too. I'm eager to
hear your feedback.
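A sketch of the "better way" asked about above, assuming Python 3.4+ for re.fullmatch: rather than escaping the whole template and then substituting, walk the matches of Template.pattern and pass only the literal text between placeholders through re.escape. Template.pattern exposes the named groups escaped, named, braced and invalid, so $$ and ${fieldname} can be handled explicitly. MatchableTemplate and its match method are hypothetical names, not part of the string module; the greedy attribute mirrors the proposal above.

```python
import re
from string import Template


class MatchableTemplate(Template):
    """Template with a hypothetical match() method, the inverse of substitute()."""

    greedy = False  # default; set t.greedy = True on an instance to change it

    def _to_regex(self):
        parts = []
        seen = set()
        last = 0
        for m in self.pattern.finditer(self.template):
            # Literal text between placeholders must match itself exactly.
            parts.append(re.escape(self.template[last:m.start()]))
            last = m.end()
            if m.group("escaped") is not None:
                # "$$" in the template stands for one literal "$".
                parts.append(re.escape(self.delimiter))
            elif m.group("invalid") is not None:
                raise ValueError("invalid placeholder at index %d" % m.start())
            else:
                # Handles both $name and ${name}.
                name = m.group("named") or m.group("braced")
                if name in seen:
                    # A repeated field must repeat the same text.
                    parts.append("(?P=%s)" % name)
                else:
                    seen.add(name)
                    body = ".*" if self.greedy else ".*?"
                    parts.append("(?P<%s>%s)" % (name, body))
        parts.append(re.escape(self.template[last:]))
        return re.compile("".join(parts), re.DOTALL)

    def match(self, s):
        """Return {field: value} if s fits the whole template, else None."""
        m = self._to_regex().fullmatch(s)
        return m.groupdict() if m else None


t = MatchableTemplate("The $object in $location falls mainly in the $subloc.")
print(t.match("The rain in Spain falls mainly in the train."))
print(t.match("Nothing to see here."))
```

With the template "$a-$b" and the input "x-y-z", the default non-greedy matching gives {'a': 'x', 'b': 'y-z'}, while setting greedy = True gives {'a': 'x-y', 'b': 'z'}. Compiling a repeated field as a backreference, so both occurrences must match the same text, is a design choice of this sketch.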

Thanks,
- Joe


MRAB

Oct 9, 2008, 5:53:12 PM

How about something like:

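import re

def placeholder(m):
    if m.group(1):
        # $name becomes a named group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ matches a literal dollar sign
        return "\\$"
    else:
        # any other text is escaped so that it matches literally
        return re.escape(m.group(3))

# group 1: $name, group 2: $$, group 3: literal text (or a lone $)
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+|\$)")

t = "The $object in $location falls mainly in the $subloc."
print(regex.sub(placeholder, t))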
