Rewriting files in Lepl

andrew cooke

Mar 18, 2011, 8:59:10 PM
to Lepl Mailing List

This is a neat trick that I'm including in Lepl 5. You can define a matcher
(which I've called Iterate()) that yields each match (for example, each line)
as a separate result. That effectively turns Lepl's "parse_all()" into a
generator. Or rather, it turns the entire parser into a generator, so input
and output move in parallel.

The obvious application is parsing large inputs without holding all the data
in memory. And I just used it to rewrite all the Lepl documentation using
about 1% of the memory.
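
To make that concrete, here is a minimal sketch of how a caller sees it
(line_matcher, handle and the file name are just placeholders; Iterate() and
the config calls are shown in full below):

    lines = Iterate(line_matcher)
    lines.config.no_full_first_match().low_memory()
    parser = lines.get_parse_file_all()

    with open('huge-input.txt') as input:
        for result in parser(input):
            # each result is available as soon as the matching input is read
            handle(result[0])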

Here's the matcher itself:

@trampoline_matcher_factory()
def Iterate(matcher):
    '''
    This isn't complex to implement, but conceptually is rather odd.  It takes
    a single matcher and returns a result for each match as it consumes the
    input.

    This means `parse_all()` is needed to retrieve the entire result (and
    there is no backtracking).

    In practice this means that if you have a matcher whose top level is a
    repeating element (for example, lines in a file) then you can treat the
    entire parser as a lazy iterator over the input.  The obvious application
    is with `.config.low_memory()` as this allows for large output to be
    generated without consuming a large amount of memory.
    '''
    def match(support, stream):
        while True:
            # delegate to the wrapped matcher via the trampoline...
            (result, stream) = yield matcher._match(stream)
            # ...then yield its result before matching the remaining input
            yield (result, stream)
    return match

and here's a snippet of how it's used:

from os import fdopen, remove, rename
from os.path import exists
from string import ascii_lowercase, ascii_uppercase, digits
from tempfile import mkstemp

from lepl import Any, AnyBut, Drop, Literal, Optional, Or, Whitespace


def matcher():

    BQ = '`'
    BQ2 = BQ + BQ

    # any run of text containing no backquote passes through unchanged
    junk = AnyBut(BQ)[1:,...]

    # a single backquote is rewritten as a double backquote
    backquote = Literal(BQ) >> (lambda x: BQ2)
    spaces = Whitespace()[1:,...]
    function = Any(ascii_uppercase + ascii_lowercase + digits + '.')[1:,...] + Optional('()')
    api_ref = '<api/redirect' + AnyBut('>')[:,...] + '>'
    # a reST link to the API docs: keep the name, drop the target and trailing '_'
    link = backquote + function + Drop(spaces + api_ref) + backquote + Drop('_')

    # backquoted text that is not an API link is left as it is
    other = Or(BQ + junk + BQ, BQ2 + junk + BQ2)

    return Iterate(junk | link | other)
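
# As an illustration (this example line is made up, not taken from the real
# docs), the `link` rule above turns a reST reference such as
#
#     `Literal() <api/redirect/lepl.matchers.core.Literal>`_
#
# into plain inline literal markup:
#
#     ``Literal()``
#
# while `junk` and `other` copy everything else through unchanged.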

def rewrite(matcher, path, dir=None, backup='.old'):
    matcher.config.no_full_first_match().low_memory()
    parser = matcher.get_parse_file_all()
    # write the rewritten text to a temporary file alongside the original
    (fd, temp) = mkstemp(dir=dir)
    output = fdopen(fd, 'w')
    with open(path) as input:
        # each result is written out as soon as the matching input is read
        for line in parser(input):
            output.write(line[0])
    output.close()
    # keep (or discard) the original, then move the new version into place
    if backup:
        prev = path + backup
        if exists(prev):
            remove(prev)
        rename(path, prev)
    else:
        remove(path)
    rename(temp, path)

The rewrite function applies the matcher to the input line by line, writing
out the new version as it goes, so the whole thing works something like a
conveyor belt.
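
A driver for the whole documentation tree is then just a loop over the files,
something like this (the directory and glob pattern are made up for
illustration; they are not from the original script):

    from glob import glob

    for path in glob('doc/*.rst'):
        # pass dir so the temporary file lands on the same filesystem as path
        rewrite(matcher(), path, dir='doc')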

Andrew
