slow!

Neal Becker

Jan 30, 2010, 8:39:08 AM

to le...@googlegroups.com

I'm using lepl to parse a bunch of simple strings. The format of the string
is shown in the example at the end.

Works fine, but it's slow! I know lepl is a bit overkill for this job (a regexp
probably would work). Is there any obvious way to improve the speed of this
parser?

I'm parsing a bunch of files. Each file has 2 lines, one line is parsed by the
first parser, and the other line is parsed by the 2nd parser.

from lepl import *

def combine(results):
    # Merge a list of {'name': ..., 'value': ...} dicts into a single dict.
    merged = {}
    for result in results:
        merged[result['name']] = result['value']
    return merged

name = Word() > 'name'
# A value is either a single-quoted string or a bare word.
value = Drop("'") & Word(AnyBut("'")) & Drop("'") | Word() > 'value'
arg = name & '=' & value > make_dict
spaces = ~Space()[:]
with Separator(spaces):
    sep = ','
    args = arg[1:,Drop(sep)]
    l = Drop('Namespace(') & args & Drop(')') > combine

with Separator(spaces):
    val = name & Drop(':') & value > make_dict
    vals = val[1:] > combine

print(l.parse_string('''Namespace(alpha=0.10000000000000001, decim_phase=0,
delta=1.0000000000000001e-05, epsilon=0, errors=100, esnodB=13.07,
fftsize=1024, filter_type='none', freq=0, initial_error=0,
logname='test_btr_equal_coded.py.nbecker6.3204d546abd8e4b8fbf5243c8e9fc184.err',
loop_gain=0.0001, max_hours=12.0, max_iter=13, mod='16apsk', notch=False,
order=7, print_time=30, rate='8/9', si=32, size=16200, sps=2.2727270000000002,
symbol_rate=220000000.0, taps=25, tau=0, tolerance=0, wait_converge=2000,
wait_mse=None, wait_stable=1000)'''))

print(vals.parse_string('''err: 0 packet: 28343 per: 0.0 mse: -13.0492273838
tau: 0.135622039908 delay: 93 corr: 205.260337205 phase: 0.635247951126
timing: 0.0116289076632'''))
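
Editor's sketch, since the post notes that a regexp would probably do the job: a minimal standard-library version of both parsers, with no lepl involved. The pattern names `ARG_RE` and `VAL_RE` and the two function names are made up for this illustration, and the sketch assumes quoted values never contain a single quote. All values are returned as strings.

```python
import re

# Both patterns are compiled once and reused for every line parsed.
# name=value, where value is either 'single quoted' or a bare token.
ARG_RE = re.compile(r"(\w+)=(?:'([^']*)'|([^,\s)]+))")
# name: value, separated by whitespace.
VAL_RE = re.compile(r"(\w+):\s*(\S+)")


def parse_namespace(text):
    """Return {name: value string} for a "Namespace(a=1, b='x')" line."""
    result = {}
    for match in ARG_RE.finditer(text):
        name, quoted, bare = match.groups()
        # Exactly one of the two value groups takes part in a match.
        result[name] = quoted if quoted is not None else bare
    return result


def parse_values(text):
    """Return {name: value string} for an "err: 0 packet: 28343" line."""
    return dict(VAL_RE.findall(text))
```

For example, `parse_values("err: 0 per: 0.0")` gives a dict with the keys `err` and `per`. This is less general than the lepl grammar, which is the trade-off the post already hints at.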

andrew cooke

Jan 30, 2010, 11:40:12 AM

to le...@googlegroups.com

Hi,

If you are using a parser more than once it is a lot more efficient to
create it and then use it. In the code you posted you are doing:

l.parse_string(...)

which creates a new parser each time it is called. Instead, try:

parser = l.string_parser()
parser.parse(....)

and then for repeated use, repeat the second line.

Someone else here suggested caching the parser, so that it would "just
work", and if I remember correctly that does now exist in the latest
code in subversion, but since the subversion code is in the middle of
a major refactor at the moment I would not suggest using it.

More generally, there are other ways to increase the speed of LEPL,
and the current refactoring will improve performance further. But
from what you said, I think the main issue here is that parse_string
creates a parser on each call.

Cheers,
Andrew

PS I am writing the above in a bit of a hurry from a hotel room; I
haven't checked anything so please forgive any typos. Hopefully the
general idea is OK.


Neal Becker

Jan 30, 2010, 11:58:11 AM

to le...@googlegroups.com

On Saturday 30 January 2010, andrew cooke wrote:
> Instead, try:
>
> parser = l.string_parser()
> parser.parse(....)

That would be parser(...) I believe.

Thanks. Tried it - no noticeable difference.

andrew cooke

Jan 30, 2010, 2:02:40 PM

to le...@googlegroups.com

2010/1/30 Neal Becker <ndbe...@gmail.com>:

> On Saturday 30 January 2010, andrew cooke wrote:
>> parser = l.string_parser()
>> parser.parse(....)
>>
> That would be parser(...) I believe.

ah, yes.

> Thanks. Tried it - no noticeable difference.

then can you explain in more detail what is slow? perhaps provide
the code you are using for a timing test?

one other thing you can try is providing an empty Configuration() to
string_parser:

parser = l.string_parser(Configuration())

that will remove any memoisation, which can make simple parsers slower.

andrew
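
Editor's sketch of the kind of timing test asked for above, using the standard library's `timeit`. The helper name `best_time` and its parameters are invented for this example; `parse` can be any callable that takes one string, for instance a parser built with or without `Configuration()`.

```python
import timeit


def best_time(parse, inputs, repeats=5):
    """Return the best wall-clock time in seconds to parse every input once."""
    def run():
        for text in inputs:
            parse(text)
    # Taking the minimum of several runs reduces noise from other processes.
    return min(timeit.repeat(run, number=1, repeat=repeats))
```

Running `best_time` once per configuration on the same list of lines gives numbers that can be compared directly, which is more reliable than watching the files go by.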

Neal Becker

Jan 30, 2010, 2:33:43 PM

to le...@googlegroups.com

On Saturday 30 January 2010, andrew cooke wrote:
> then can you explain in more detail what is slow? perhaps provide
> the code you are using for a timing test?
>
> one other thing you can try is providing an empty Configuration() to
> string_parser:
>
> parser = l.string_parser(Configuration())
>
> that will remove any memoisation, which can make simple parsers slower.

I'm just watching it parse a list of files, and not measuring it scientifically.
It is much faster after making this change you just suggested.


andrew cooke

Jan 30, 2010, 3:59:19 PM

to le...@googlegroups.com

2010/1/30 Neal Becker <ndbe...@gmail.com>:

>> one other thing you can try is providing an empty Configuration() to
>> string_parser:
>>
>> parser = l.string_parser(Configuration())
>>
>> that will remove memoisation, which can make simple parsers slower.
>
> I'm just watching it parse a list of files, and not measuring it scientifically.
> It is much faster after making this change you just suggested.

ah, ok. good, that makes sense. it was doing extra work that's
pointless in your case (the idea is that it works "out of the box" for
everyone, but it's not smart enough (yet) to automatically avoid the
extra overhead when it's not needed). also, the changes i am working
on at the moment will make it faster for this kind of problem.

cheers,
andrew

andrew cooke

Feb 1, 2010, 3:31:13 PM

to le...@googlegroups.com

I've been experimenting a bit more with this data and my new code.

First, there is something disastrously wrong with the default config.
I need to work on that - sorry, I didn't realise it was that bad.

Second, I suspect that the following will give an additional increase
in speed over the previous fix (I can't be certain, because my code
base is slightly different, but I am seeing a speedup of a factor of
10 or so):

...string_parser(Configuration(
    rewriters=[regexp_rewriter(UnicodeAlphabet.instance, False, DfaRegexp)]))

That rewrites much of the parser to be a regular expression, which
works really well for this particular case (at least in my tests).

Also, here's a slightly cleaner version of the parser (although less
general - it has a single space between args for example):

from string import ascii_letters
from lepl import *

def combine(dicts):
    # Merge a list of {'name': ..., 'value': ...} dicts into a single dict.
    return dict((d['name'], d['value']) for d in dicts)

name = Word(ascii_letters + '_') > 'name'
value = (String(quote="'") | Float() | 'False' | 'True' | 'None') > 'value'
arg = (name & '=' & value) > make_dict
line = Drop('Namespace(') & arg[1:,Drop(', ')] & Drop(')') > combine
parser = line[1:, '\n']

Andrew
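
Editor's sketch: the grammar above distinguishes quoted strings, floats and the literals False, True and None. Assuming the matched values come back as text (an assumption, not something checked against lepl), one way to turn them into Python objects is the standard library's `ast.literal_eval`. The helper name `to_python` is invented for this example.

```python
import ast


def to_python(text):
    """Convert a value string such as "0.1", "'none'" or "None" to a Python object."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a Python literal (for example a bare word): keep the text as-is.
        return text
```

A quoted value has to keep its quotes to be read as a string literal; text that is not a valid literal is returned unchanged.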

andrew cooke

Feb 1, 2010, 3:56:28 PM

to le...@googlegroups.com

Gah - had a bug in my test. Not 10x faster, just an extra 25%.

Andrew
