The obvious application for this is parsing large inputs without holding all
the data in memory. I just used it to rewrite all the Lepl documentation in
about 1% of the memory it would otherwise need.
Here's the matcher itself:
@trampoline_matcher_factory()
def Iterate(matcher):
    '''
    This isn't complex to implement, but conceptually it is rather odd. It
    takes a single matcher and returns a result for each match as it consumes
    the input. This means `parse_all()` is needed to retrieve the entire
    result (and there is no backtracking).

    In practice, if you have a matcher whose top level is a repeating element
    (for example, lines in a file) then you can treat the entire parser as a
    lazy iterator over the input. The obvious application is with
    `.config.low_memory()`, as this allows large output to be generated
    without consuming a large amount of memory.
    '''
    def match(support, stream):
        while True:
            # run the wrapped matcher once, then hand its result straight out
            (result, stream) = yield matcher._match(stream)
            yield (result, stream)
    return match
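To make the lazy behaviour concrete, here's a minimal sketch (my own example,
not from the Lepl docs) that splits a comma-separated string without building
the whole result up front:

from lepl import *

# each match consumes one field plus the trailing comma, if any
field = AnyBut(',')[1:,...] + Drop(Optional(','))
fields = Iterate(field)
fields.config.no_full_first_match().low_memory()
for result in fields.parse_all('a,b,c'):
    print(result[0])   # prints 'a', then 'b', then 'c'

Each step of the loop runs the matcher just far enough to produce the next
field, rather than parsing everything before the first value appears.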
and here's a snippet of how it's used:
from string import ascii_lowercase, ascii_uppercase, digits

from lepl import *

def matcher():
    # Iterate is the matcher defined above
    BQ = '`'
    BQ2 = BQ + BQ
    junk = AnyBut(BQ)[1:,...]                   # text containing no backquotes
    backquote = Literal(BQ) >> (lambda x: BQ2)  # rewrite ` as ``
    spaces = Whitespace()[1:,...]
    function = Any(ascii_uppercase + ascii_lowercase + digits + '.')[1:,...] + Optional('()')
    api_ref = '<api/redirect' + AnyBut('>')[:,...] + '>'
    link = backquote + function + Drop(spaces + api_ref) + backquote + Drop('_')
    other = Or(BQ + junk + BQ, BQ2 + junk + BQ2)
    return Iterate(junk | link | other)
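For example, on an invented line of documentation (the api reference path here
is made up for illustration) the matcher above behaves like this:

p = matcher()
p.config.no_full_first_match().low_memory()
line = 'see `Iterate() <api/redirect/lepl.matchers>`_ for details'
print(''.join(result[0] for result in p.parse_all(line)))
# prints: see ``Iterate()`` for details

so single-backquoted api links collapse to plain double-backquoted literals,
while everything else passes through unchanged.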
from os import fdopen, remove, rename
from os.path import exists
from tempfile import mkstemp

def rewrite(matcher, path, dir=None, backup='.old'):
    matcher.config.no_full_first_match().low_memory()
    parser = matcher.get_parse_file_all()
    (fd, temp) = mkstemp(dir=dir)
    output = fdopen(fd, 'w')
    with open(path) as input:
        # each iteration matches one fragment of the input and writes it out
        for line in parser(input):
            output.write(line[0])
    output.close()
    if backup:
        prev = path + backup
        if exists(prev):
            remove(prev)
        rename(path, prev)
    else:
        remove(path)
    rename(temp, path)
The rewrite function applies the matcher repeatedly to the input, writing out
the new version as it goes, working something like a conveyor belt.
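A typical call (the filename here is invented) looks like:

rewrite(matcher(), 'doc/index.rst')

which replaces doc/index.rst with the rewritten text, keeping the original
alongside as doc/index.rst.old.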
Andrew