> --
> You received this message because you are subscribed to the Google Groups
> "lepl" group.
> To post to this group, send email to le...@googlegroups.com.
> To unsubscribe from this group, send email to
> lepl+uns...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/lepl?hl=en.
>
Please say if it's not any help.
Andrew
On Wed, 29 Dec 2010 07:30:59 -0500, andrew cooke <and...@acooke.org>
wrote:
Hope this helps. It's an interesting example, because the trampoline
matcher factory isn't well documented. I will add this to the next
release as an example.
Andrew
from logging import basicConfig, INFO
from lepl import *

basicConfig(level=INFO)

def line(matcher):
    '''Include space before and a newline after'''
    return ~Space()[:] & matcher & ~Space()[:] & ~Literal('\n')

@trampoline_matcher_factory()
def make_pair(start, end, contents):
    '''Generate a matcher that checks start and end tags.
    `start` must match the start and return the "label"
    `end` must match the end and return the same "label"
    If end fails then contents is used instead (and the matcher
    repeats).
    The return value is a pair that contains the label and
    the contents.'''
    def matcher(support, stream0):
        (match, stream1) = yield start._match(stream0)
        label = match[0]
        result = []
        while True:
            # can we end?
            try:
                (match, stream2) = yield end._match(stream1)
                if match and match[0] == label:
                    yield ([(label, result)], stream2)
                else:
                    support._info('end matched, but %s != %s' %
                                  (match[0], label))
            except StopIteration:
                support._info('failed to match end')
                pass
            # failed to end, so try matching contents instead
            (match, stream2) = yield contents._match(stream1)
            result += match
            support._info('matched %s: %s' % (match, result))
            stream1 = stream2
    return matcher

nested = Delayed()
start = line(~Literal('*') & Word())
end = line(~Literal('*end*') & Word())
# note that we force at least one match here ([1:]) to avoid repeated empty
# matches (which would give an infinite loop instead of failing)
# also, nested must go first here, or we match *bar as data
contents = nested | line(Word()[1:,~Space()[:]])
nested += make_pair(start, end, contents)

def test(result, target):
    assert result == target, str(target) + ' != ' + str(result)

# start token
test(start.parse('*foo\n'), ['foo'])
test(start.parse(' *foo \n'), ['foo'])
# end token
test(end.parse('*end*foo\n'), ['foo'])
test(end.parse(' *end*foo \n'), ['foo'])
# simple pair
test(nested.parse('*foo\n*end*foo\n'), [('foo', [])])
# simple contents
test(contents.parse('ab c\n'), ['ab', 'c'])
# nested contents
test(nested.parse(
'''*foo
ab c
*end*foo
'''), [('foo', ['ab', 'c'])])
# multiple lines
test(nested.parse(
'''*foo
ab c
p q rs tu
*end*foo
'''), [('foo', ['ab', 'c', 'p', 'q', 'rs', 'tu'])])
# multiple nestings
test(nested.parse(
'''*foo
ab c
*bar
p q rs tu
*end*bar
*end*foo
'''), [('foo', ['ab', 'c', ('bar', ['p', 'q', 'rs', 'tu'])])])
# this is consistent, but perhaps not expected.  if you don't want this, then
# you need to define contents more carefully - perhaps exclude '*' from the
# characters in Word() for example.
test(nested.parse(
'''*foo
ab c
*bar
p q rs tu
*end*baz
*end*foo
'''), [('foo', ['ab', 'c', '*bar', 'p', 'q', 'rs', 'tu', '*end*baz'])])
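
For anyone unfamiliar with the pattern `make_pair` relies on (a generator that yields a sub-matcher and gets the result sent back, or an exception if the sub-matcher fails), here is a minimal plain-Python sketch of the idea. It is NOT LEPL's implementation: `run`, `literal`, `either` and `NoMatch` are hypothetical names invented for illustration, failure is signalled with a custom `NoMatch` exception rather than the `StopIteration` seen in the code above, and only the first result is used (no backtracking).

```python
import types

class NoMatch(Exception):
    """Hypothetical failure signal (the LEPL code above catches StopIteration)."""

def run(generator):
    """Drive nested generators with an explicit stack instead of recursion."""
    stack = [generator]
    value = None   # result to send into the generator on top of the stack
    error = None   # failure to throw into the generator on top of the stack
    while stack:
        current = stack[-1]
        try:
            if error is not None:
                pending, error = error, None
                yielded = current.throw(pending)
            else:
                yielded = current.send(value)
        except (StopIteration, NoMatch):
            # the generator ended without a result: report failure to its caller
            stack.pop()
            value, error = None, NoMatch()
            continue
        if isinstance(yielded, types.GeneratorType):
            # a request to evaluate a sub-matcher: run it next
            stack.append(yielded)
            value = None
        else:
            # a result: hand it back to whoever asked for it
            stack.pop()
            current.close()
            value = yielded
    return None if error is not None else value

def literal(text):
    def match(stream):
        if stream.startswith(text):
            yield ([text], stream[len(text):])
    return match

def either(first, second):
    def match(stream):
        try:
            result = yield first(stream)
        except NoMatch:
            result = yield second(stream)
        yield result
    return match

print(run(either(literal('a'), literal('b'))('bcd')))  # -> (['b'], 'cd')
```

The `try`/`except` inside `either` plays the same role as the `try`/`except StopIteration` around `end._match` in `make_pair`: a failed sub-matcher is reported to the generator that asked for it, which can then try something else.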
On Fri, 31 Dec 2010 15:36:12 -0500, andrew cooke <and...@acooke.org>
wrote:
> Hi,
>
> Hope this helps. It's an interesting example, because the trampoline
> matcher factory isn't well documented. I will add this to the next
> release as an example.
[...]
--
ALSO there was a bug in the code I gave originally, which may not have
helped (sorry). You need a "return" as follows:
@trampoline_matcher_factory()
def make_pair(start, end, contents):
    def matcher(support, stream0):
        (match, stream1) = yield start._match(stream0)
        label = match[0]
        result = []
        while True:
            try:
                (match, stream2) = yield end._match(stream1)
                if match:
                    if match[0] == label:
                        yield ([(label, result)], stream2)
                        return  # THIS WAS MISSING
                    else:
                        # debug code to find the error in data
                        support._debug('Bad end: %s' % match[0])
            except StopIteration:
                pass
            (match, stream2) = yield contents._match(stream1)
            result += match
            stream1 = stream2
    return matcher
The problem with the input data is here (see how making the parser more
accurate will help? and also how the debug logging above helps identify
this?):
:Att:Font
Size=9
:eFont
:Txt:Font
Size=9
Bold
:eFont
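
A quick way to locate such mismatches without running the parser is a plain line scan. This is only a sketch under assumptions: `find_bad_ends` is a hypothetical helper, it assumes every marker sits on its own line, that `:eLabel` closes `:Label`, and that no start label begins with a lowercase `e` (which it would misread as an end marker).

```python
def find_bad_ends(text):
    """Return (line_number, expected_label, found_label) for each bad end marker."""
    open_labels = []   # stack of labels from start markers not yet closed
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith(':e'):
            found = line[2:]
            expected = open_labels.pop() if open_labels else None
            if found != expected:
                problems.append((number, expected, found))
        elif line.startswith(':'):
            open_labels.append(line[1:])
        # any other line is treated as contents and ignored
    return problems

sample = ":Att:Font\nSize=9\n:eFont\n:Txt:Font\nSize=9\nBold\n:eFont\n"
print(find_bad_ends(sample))
# -> [(3, 'Att:Font', 'Font'), (7, 'Txt:Font', 'Font')]
```

On the fragment above it reports that the start label is read as `Att:Font` (or `Txt:Font`) while the end marker only says `Font`.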
I am still looking at this; there may be other issues too.
Andrew
On Mon, 3 Jan 2011 06:56:22 -0200, Haroldo Stenger
<haroldo...@gmail.com> wrote:
> Hi Andrew,
>
> the grammar worked excellently. I noticed it goes slowly on rather bigger
> inputs, and I plan to run it on really big inputs. I believe the
> sluggishness has to do with slicing the contents of the 'non marker'
> lines into words, besides the 'big lookahead' involved (I don't know
> whether that should be treated differently; what is your opinion on this?).
>
> Were these lines treated as lines, not as lists of words, maybe it
> would go faster? I hope so! Also, I'm wondering if the 'line
> continuation' abstraction (although it is not explicit in this case;
> here just the end marker ends a set of lines) could somehow be applied
> to improve speed.
>
> I provide my dataset just in case it helps, which I put in a string
> in the code.
>
> btw, I used the attached code to parse.
>
> thanks for the hard work.
>
> best regards,
> Haroldo
>
> 2010/12/31 andrew cooke
>
> test(start.parse('*foo\n'), ['foo'])
> test(start.parse(' *foo \n'), ['foo'])
>
> # end token
> test(end.parse('*end*foo\n'), ['foo'])
> test(end.parse(' *end*foo \n'), ['foo'])
>
> # simple pair
> test(nested.parse('*foo\n*end*foo\n'), [('foo', [])])
>
> # simple contents
> test(contents.parse('ab c\n'), ['ab', 'c'])
# get_data, get_file and compile_ are helpers defined elsewhere (not shown here)
def make_3():
    text = get_data(1)
    fix = compile_(r'(?m)^(\s*)(?::(?:[A-Z][a-z]*)+)+?(:(?:[A-Z][a-z]*)+.*)$')
    text = fix.sub(r'\1\2', text)
    with get_file(3, 'w') as out:
        out.write(text)
on the input I found the following errors:
pl6 lepl-hg: diff ./src/lepl/_performance/dynamic/data-1.txt ./src/lepl/_performance/dynamic/data-3.txt
392c392
< :Att:Font
---
> :Font
395c395
< :Txt:Font
---
> :Font
786c786
< :Title:ColorInfo
---
> :ColorInfo
788c788
< :Rows:ColorInfo
---
> :ColorInfo
799c799
< :Title:ColorInfo
---
> :ColorInfo
801c801
< :Rows:ColorInfo
---
> :ColorInfo
811c811
< :Title:ColorInfo
---
> :ColorInfo
813c813
< :Rows:ColorInfo
---
> :ColorInfo
819c819
< :Att:Font
---
> :Font
824c824
< :Tit:Font
---
> :Font
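
The substitution can be tried in isolation on the small fragment quoted earlier. This sketch uses the standard `re` module directly and the same pattern as `make_3`; `strip_prefix` is a hypothetical name, and the sample string is typed in by hand rather than read from the data files.

```python
import re

# same pattern as in make_3: a marker made of two or more ":Camel" parts
FIX = re.compile(r'(?m)^(\s*)(?::(?:[A-Z][a-z]*)+)+?(:(?:[A-Z][a-z]*)+.*)$')

def strip_prefix(text):
    """Drop the leading ':Xxx' part of a two-part start marker, keeping indentation."""
    return FIX.sub(r'\1\2', text)

sample = ":Att:Font\nSize=9\n:eFont\n:Txt:Font\nSize=9\nBold\n:eFont\n"
print(strip_prefix(sample))
```

After the substitution the start markers read `:Font`, so they agree with the `:eFont` end markers; lines with a single `:Label`, end markers and ordinary contents are left alone.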
Andrew
from lepl import *
from string import ascii_lowercase, ascii_uppercase

def line(matcher):
    return ~Space()[:] & matcher & ~Space()[:] & ~Literal('\n')

@trampoline_matcher_factory()
def make_pair(start, end, contents):
    def matcher(support, stream0):
        (match, stream1) = yield start._match(stream0)
        label = match[0]
        result = []
        while True:
            try:
                (match, stream2) = yield end._match(stream1)
                if match:
                    if match[0] == label:
                        yield ([(label, result)], stream2)
                        return
                    else:
                        support._debug('Bad end: %s' % match[0])
            except StopIteration:
                pass
            (match, stream2) = yield contents._match(stream1)
            result += match
            stream1 = stream2
    return matcher

def base():
    # original grammar: contents are split into words
    nested = Delayed()
    start = line(~Literal(':') & Word())
    end = line(~Literal(':e') & Word())
    contents = nested | line(Word()[:,~Space()[:]])
    nested += make_pair(start, end, contents)
    return nested

def restricted():
    # labels must be CamelCase; each contents line is kept as a single string
    nested = Delayed()
    camel = Word(ascii_uppercase, ascii_lowercase)[1:, ...]
    start = line(~Literal(':') & camel)
    end = line(~Literal(':e') & camel)
    anything = AnyBut('\n')[:, ...] & ~Literal('\n')
    contents = nested | anything
    nested += make_pair(start, end, contents)
    return nested