Works fine, but it's slow! I know lepl is a bit overkill for this job (a regexp
would probably work). Is there any obvious way to improve the speed of this
parser?
I'm parsing a bunch of files. Each file has 2 lines: one line is parsed by the
first parser and the other by the second.
from lepl import *

def combine(results):
    # Merge a list of {'name': ..., 'value': ...} dicts into a single dict.
    combined = {}
    for result in results:  # a list of dicts
        combined[result['name']] = result['value']
    return combined

name = Word() > 'name'
value = Drop("'") & Word(AnyBut("'")) & Drop("'") | Word() > 'value'
arg = name & '=' & value > make_dict
spaces = ~Space()[:]

# First parser: the Namespace(...) line.
with Separator(spaces):
    sep = ','
    args = arg[1:, Drop(sep)]
    l = Drop('Namespace(') & args & Drop(')') > combine

# Second parser: the "key: value" line.
with Separator(spaces):
    val = name & Drop(':') & value > make_dict
    vals = val[1:] > combine
print(l.parse_string('''Namespace(alpha=0.10000000000000001, decim_phase=0,
delta=1.0000000000000001e-05, epsilon=0, errors=100, esnodB=13.07,
fftsize=1024, filter_type='none', freq=0, initial_error=0,
logname='test_btr_equal_coded.py.nbecker6.3204d546abd8e4b8fbf5243c8e9fc184.err',
loop_gain=0.0001, max_hours=12.0, max_iter=13, mod='16apsk', notch=False,
order=7, print_time=30, rate='8/9', si=32, size=16200, sps=2.2727270000000002,
symbol_rate=220000000.0, taps=25, tau=0, tolerance=0, wait_converge=2000,
wait_mse=None, wait_stable=1000)'''))
print(vals.parse_string('''err: 0 packet: 28343 per: 0.0 mse: -13.0492273838
tau: 0.135622039908 delay: 93 corr: 205.260337205 phase: 0.635247951126
timing: 0.0116289076632'''))
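Since the question notes that a plain regexp would probably do, here is a minimal sketch for the second ("key: value") line format only, swapping LEPL for Python's standard `re` module. `PAIR` and `parse_stats` are hypothetical names, and the pattern assumes keys are word characters and values never contain whitespace.

```python
import re

# Hypothetical pattern: one "key: value" pair, where the key is a run of
# word characters and the value runs up to the next whitespace.
PAIR = re.compile(r"(\w+):\s*(\S+)")


def parse_stats(text):
    """Return {key: value_string} for a line like 'err: 0 packet: 28343 ...'."""
    return dict(PAIR.findall(text))


stats = parse_stats("""err: 0 packet: 28343 per: 0.0 mse: -13.0492273838
tau: 0.135622039908 delay: 93 corr: 205.260337205 phase: 0.635247951126
timing: 0.0116289076632""")
```

Values come back as strings; converting them with `float()` or `int()` is left to the caller.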
If you are using a parser more than once, it is a lot more efficient to
create it once and then reuse it. In the code you posted you are doing:
l.parse_string(...)
which creates a new parser each time it is called. Instead, try:
parser = l.string_parser()
parser.parse(....)
and then, for repeated use, repeat only the second line.
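The build-once, reuse-many idea can be sketched without LEPL. In this sketch `build_parser` is a hypothetical stand-in for an expensive construction step such as `string_parser()`, and a counter records how often it runs; it is an illustration of the pattern, not LEPL's code.

```python
import re

build_count = 0  # how many times the (hypothetical) expensive build step ran


def build_parser():
    """Hypothetical stand-in for an expensive construction step like string_parser()."""
    global build_count
    build_count += 1
    pattern = re.compile(r"(\w+):\s*(\S+)")
    return lambda text: dict(pattern.findall(text))


def parse_string(text):
    """The slow pattern: a fresh parser is built on every call."""
    return build_parser()(text)


inputs = ["err: 0 packet: 28343", "err: 1 packet: 28344"]

# One build per input.
slow_results = [parse_string(text) for text in inputs]

# One build in total, then the same parser is reused for every input.
parser = build_parser()
fast_results = [parser(text) for text in inputs]
```

Both loops give the same results; the difference is that the first pays the construction cost once per file.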
Someone else here suggested caching the parser so that it would "just
work". If I remember correctly, that now exists in the latest code in
Subversion, but since the Subversion code is in the middle of a major
refactor at the moment, I would not suggest using it.
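One way such caching can work is to memoise the construction step so that a build-per-call API only builds on its first call. This is a minimal sketch using `functools.lru_cache`, assuming the grammar is hashable (here just a regex string); `cached_parser` and `parse_string` are hypothetical names and this is not how LEPL's Subversion code implements it.

```python
import functools
import re


@functools.lru_cache(maxsize=None)
def cached_parser(grammar):
    """Build the parser for `grammar` once; later calls return the same object.

    The grammar here is just a regex string, so it is hashable and can serve
    as the cache key. Illustration only, not LEPL's implementation.
    """
    compiled = re.compile(grammar)
    return lambda text: dict(compiled.findall(text))


def parse_string(grammar, text):
    """Looks like a build-per-call API, but only the first call builds."""
    return cached_parser(grammar)(text)


GRAMMAR = r"(\w+):\s*(\S+)"
first = parse_string(GRAMMAR, "err: 0 packet: 28343")
second = parse_string(GRAMMAR, "err: 1 packet: 28344")
```

`cached_parser.cache_info()` reports one miss (the build) and one hit (the reuse) after these two calls.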
More generally, there are other ways to increase the speed of LEPL,
and the current refactoring will improve performance further. But
from what you said, I think the main issue here is that parse_string
creates a parser on each call.
Cheers,
Andrew
PS I am writing the above in a bit of a hurry from a hotel room; I
haven't checked anything so please forgive any typos. Hopefully the
general idea is OK.
2010/1/30 Neal Becker <ndbe...@gmail.com>:
> --
> You received this message because you are subscribed to the Google Groups "lepl" group.
> To post to this group, send email to le...@googlegroups.com.
> To unsubscribe from this group, send email to lepl+uns...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lepl?hl=en.
Thanks. Tried it: no noticeable difference.
ah, yes.
then can you explain in more detail what is slow? perhaps provide
the code you are using for a timing test?
one other thing you can try is providing an empty Configuration() to
string_parser:
parser = l.string_parser(Configuration())
that will remove any memoisation, which can make simple parsers slower.
andrew
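Why memoisation can cost more than it saves on a simple grammar can be shown with a toy recursive parser for the "key: value" line. This sketch stores the result at every input position in a dict, packrat-style, when a table is passed in; `TOKEN` and `parse_pairs` are hypothetical, and this is not LEPL's memoiser. Because this grammar never backtracks, every position is visited once and no stored entry is ever read back, so the table is pure overhead.

```python
import re

# Hypothetical token for the "key: value" line format from the question.
TOKEN = re.compile(r"\s*(\w+):\s*(\S+)")


def parse_pairs(text, memo=None):
    """Parse 'key: value' pairs with a recursive rule, optionally memoised.

    When `memo` is a dict, the result at every input position is stored in
    it. This grammar never backtracks, so each position is visited once and
    no stored entry is reused: the table is pure bookkeeping.
    """
    def pairs_from(pos):
        if memo is not None and pos in memo:
            return memo[pos]
        m = TOKEN.match(text, pos)
        if m is None:
            result = []
        else:
            result = [(m.group(1), m.group(2))] + pairs_from(m.end())
        if memo is not None:
            memo[pos] = result
        return result

    return dict(pairs_from(0))


line = "err: 0 packet: 28343 per: 0.0 mse: -13.0492273838"
table = {}
plain = parse_pairs(line)
memoised = parse_pairs(line, memo=table)
```

Both calls return the same dict; the memoised one additionally fills `table` with one entry per position (five here) that nothing ever consults.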
I'm just watching it parse a list of files, not measuring it scientifically.
It is much faster with the change you just suggested.
ah, ok. good, that makes sense. it was doing extra work that's
pointless in your case (the idea is that it works "out of the box" for
everyone, but it's not smart enough (yet) to automatically avoid the
extra overhead when it's not needed). also, the changes i am working
on at the moment will make it faster for this kind of problem.
cheers,
andrew
First, there is something disastrously wrong with the default config.
I need to work on that; sorry, I didn't realise it was that bad.
Second, I suspect that the following will give an additional increase
in speed over the previous fix (I can't be certain, because my code
base is slightly different, but I am seeing a speedup of a factor of
10 or so):
...string_parser(Configuration(
    rewriters=[regexp_rewriter(UnicodeAlphabet.instance, False, DfaRegexp)]))
That rewrites much of the parser to be a regular expression, which
works really well for this particular case (at least in my tests).
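The gain from turning matchers into a regular expression comes largely from moving the per-character loop out of interpreted code and into a compiled matcher. This sketch contrasts a hand-written Python loop with a single `re` match for the same word token. Note that Python's `re` is a backtracking engine, not the DFA matcher (`DfaRegexp`) the rewriter selects, so this only illustrates the general idea; `word_by_hand` and `word_by_regex` are hypothetical names.

```python
import re
import string

WORD_CHARS = frozenset(string.ascii_letters + string.digits + "_")
WORD_RE = re.compile(r"[A-Za-z0-9_]+")


def word_by_hand(text, pos):
    """Interpreted style: one Python-level loop iteration per character."""
    end = pos
    while end < len(text) and text[end] in WORD_CHARS:
        end += 1
    return text[pos:end] if end > pos else None


def word_by_regex(text, pos):
    """Same token, but the per-character loop runs inside the compiled regex engine."""
    m = WORD_RE.match(text, pos)
    return m.group(0) if m else None
```

Both functions return the same token; timing them with `timeit` on your own input lines is one way to see how much the compiled version saves.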
Also, here's a slightly cleaner version of the parser (although less
general - it has a single space between args for example):
from string import ascii_letters
from lepl import *

def combine(dicts):
    # Turn a list of {'name': ..., 'value': ...} dicts into one dict.
    return dict((d['name'], d['value']) for d in dicts)

name = Word(ascii_letters + '_') > 'name'
value = (String(quote="'") | Float() | 'False' | 'True' | 'None') > 'value'
arg = (name & '=' & value) > make_dict
line = Drop('Namespace(') & arg[1:, Drop(', ')] & Drop(')') > combine
parser = line[1:, '\n']
Andrew
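For comparison with the cleaner grammar above, here is the same restricted format written directly as Python regular expressions: a single-quoted string, a float, or False/True/None as values, and exactly ", " between args. This is a sketch with Python's `re`, not LEPL; `NAME`, `VALUE`, `ARG`, `LINE` and `parse_namespace` are hypothetical names, and the float pattern is an assumption that may not accept exactly the same spellings as LEPL's `Float()`.

```python
import re

# Value forms accepted by the cleaner grammar. The float pattern is an
# assumption; LEPL's Float() may accept a slightly different set.
NAME = r"[A-Za-z_]+"
VALUE = r"'[^']*'|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?|False|True|None"
ARG = re.compile(r"(%s)=(%s)" % (NAME, VALUE))
# The whole line: Namespace( arg (", " arg)* ), one space after each comma.
ONE_ARG = r"%s=(?:%s)" % (NAME, VALUE)
LINE = re.compile(r"Namespace\(%s(?:, %s)*\)" % (ONE_ARG, ONE_ARG))


def parse_namespace(text):
    """Return {name: value_string} for one Namespace(...) line, or None if it does not match."""
    if LINE.fullmatch(text) is None:
        return None
    # Strip the quotes from string values, as Drop("'") did in the original parser.
    return {name: value[1:-1] if value.startswith("'") else value
            for name, value in ARG.findall(text)}
```

`LINE` validates the whole line first; `ARG.findall` then extracts the pairs, and because each quoted value is consumed whole, its contents are never rescanned as a name.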