execution speed


Luca

Mar 15, 2011, 10:26:10 AM
to lepl
Hi,
I wrote a small lepl grammar to parse a dump file. It works, but it
seems quite slow (about 5 seconds) even for a few lines. I am sure that
the problem lies in the grammar I wrote... Could you please give me some
hints on how to speed it up? Thank you in advance!

Luca

Here is the grammar snippet:

SOL = Drop(LineAwareSol())
EOL = Drop(LineAwareEol())
integer = Map(Token(Integer()), int)
uletter = Token(Upper())
real = Map(Token(Real()), float)

source = '''1 G 0.0 0.0
0.0 0.0 0.0 0.0
2 G 0.0 0.0
0.0 0.0 0.0 0.0
3 G 0.0 0.0
0.0 0.0 0.0 0.0
4 G 0.0 0.0
0.0 0.0 0.0 0.0
5 G 0.0 0.0
0.0 0.0 0.0 0.0
6 G 0.0 0.0
0.0 0.0 0.0 0.0
7 G 0.0 0.0
0.0 0.0 0.0 0.0
8 G 0.0 0.0
0.0 0.0 0.0 0.0
9 G 0.0 0.0 -9.856000E-05
-1.444699E-17 1.944000E-03 0.0
10 G 0.0 0.0 -9.856000E-05
-1.427843E-17 1.944000E-03 0.0
11 G 0.0 0.0 -1.085216E-02
-2.749537E-16 1.874400E-02 0.0
12 G 0.0 0.0 -1.085216E-02
-2.748317E-16 1.874400E-02 0.0
13 G 0.0 0.0 -3.600576E-02
-6.652665E-16 3.074400E-02 0.0
14 G 0.0 0.0 -3.600576E-02
-6.717988E-16 3.074400E-02 0.0
15 G 0.0 0.0 -7.075936E-02
-8.592844E-16 3.794400E-02 0.0
16 G 0.0 0.0 -7.075936E-02
-8.537008E-16 3.794400E-02 0.0
17 G 0.0 0.0 -1.103130E-01
-9.445027E-16 4.034400E-02 0.0
18 G 0.0 0.0 -1.103130E-01
-9.538811E-16 4.034400E-02 0.0
100 G 0.0 0.0
0.0 0.0 0.0 0.0
200 G 0.0 0.0
0.0 0.0 0.0 0.0
'''

data_line = SOL & integer & uletter & Repeat(real, start=6, stop=6) & EOL
table = OneOrMore(data_line)
table.config.default_line_aware()
begin = datetime.datetime.now()
print table.parse(source)
print datetime.datetime.now() - begin

andrew cooke

Mar 15, 2011, 10:37:37 AM
to le...@googlegroups.com

Hi,

Your grammar looks fine. It's worth remembering that Lepl is written in
Python, so it is slow. However, I think your main problem is that you are
not separating the work Lepl has to do to create the parser from the time
actually spent parsing.

Lepl does a lot of work "compiling" the parser. This does take time, but it
is done just once. You can then use the parser many times.

By default the compiled parser is saved internally and reused, so you only pay
for this once, the first time you call the parser. But if you time a single
call, as in your code, the measurement includes the compilation time.

If you want to time just the parsing, then I would suggest changing your
code to:

table.config.default_line_aware()
begin = datetime.datetime.now()
parser = table.get_parse()
print 'compile time:', datetime.datetime.now() - begin

begin = datetime.datetime.now()
print table.parse(source)
print 'parse time:', datetime.datetime.now() - begin

Or alternatively:

table.config.default_line_aware()
for i in range(3):
    begin = datetime.datetime.now()
    table.parse(source)
    print 'parse time:', i, datetime.datetime.now() - begin

This should show that the first call takes longer than the rest (although it
will also reflect the effect of any caching, if you use that).
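
As a rough illustration of the same pattern without Lepl, the sketch below times
repeated calls on an object that builds its parser lazily and caches it. The
LazyMatcher class and the time_calls helper are hypothetical stand-ins written
for this example (Python 3, standard library only); they are not Lepl's API and
only mimic the "compile once, reuse afterwards" behaviour described above.

```python
import re
import time


class LazyMatcher:
    """Hypothetical stand-in: builds its parser on first use, then reuses it."""

    def __init__(self, pattern):
        self.pattern = pattern
        self._compiled = None    # cached parser, built lazily
        self.compile_count = 0   # how many times the expensive step ran

    def get_parse(self):
        # Pay the "compilation" cost only if no parser is cached yet.
        if self._compiled is None:
            self.compile_count += 1
            self._compiled = re.compile(self.pattern).findall
        return self._compiled

    def parse(self, text):
        return self.get_parse()(text)


def time_calls(matcher, text, repeats=3):
    """Return the elapsed time of each call; the first includes compilation."""
    timings = []
    for _ in range(repeats):
        begin = time.perf_counter()
        matcher.parse(text)
        timings.append(time.perf_counter() - begin)
    return timings


matcher = LazyMatcher(r'-?[0-9]+\.[0-9]+(?:E-?[0-9]+)?')
timings = time_calls(matcher, '0.0 -9.856000E-05 1.944000E-03')
for i, elapsed in enumerate(timings):
    print('parse time:', i, elapsed)
print('compilations:', matcher.compile_count)
```

Whatever the absolute numbers are on a given machine, the expensive step runs
only once, so only the first measurement can include it.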

Does that help?

Cheers,
Andrew


andrew cooke

Mar 15, 2011, 8:11:24 PM
to le...@googlegroups.com, luca.d...@gmail.com

Hi,

OK I've been looking at this in more detail after work.

While what I said is true, you have also come across a bug / issue. My
regular expression code is taking a crazy long time to compile the regexp for
floats / reals, and that is what you are using here.

I've only known about this for a few weeks, and am still thinking about
possible fixes. In the long term I have some better regexp code that I will
add to Lepl. In the shorter term I may be able to switch to Python's re
library.

In your case a simple workaround is to replace Float() with a simpler regexp.
If I do timings for Token(Float()) and
Token(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?') I get:

Timing results
--------------

With compilation: 1 parse(s), best of 3
Parse only:       100 parse(s), best of 3

Matcher              Compiling (s) | Parse only (s)
-----------------------------------+---------------
float                        3.769 | 0.000112
float no memoize             3.767 | 0.052816
regexp                       0.542 | 0.000113
regexp no memoize            0.538 | 0.052089

(this is output from a new utility in Lepl 5). As you can see, using the
alternative regexp reduces the compilation time from nearly 4s to about half
a second. And the actual parsing, in either case, is only taking about 1/20s
even without memoization.
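
To see what the simpler regexp does and does not accept, here is a small check
using Python's re library directly (Python 3, no Lepl involved). The sample
values are taken from the dump above; the rejected strings and the helper name
is_simple_float are illustrative assumptions, listed only to show that this
pattern is narrower than a general float matcher.

```python
import re

# The replacement pattern from the message above, compiled with Python's re.
FLOAT_RE = re.compile(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?')


def is_simple_float(text):
    """True if the whole string matches the simplified float pattern."""
    return FLOAT_RE.fullmatch(text) is not None


# Values as they appear in the dump file.
accepted = ['0.0', '-9.856000E-05', '1.944000E-03', '-1.103130E-01']

# Forms the simplified pattern does not cover: no decimal point, missing
# digits on one side of the point, lowercase exponent, explicit plus sign.
rejected = ['10', '1.', '.5', '1.0e-3', '+1.0']

for text in accepted + rejected:
    print(text, is_simple_float(text))
```

This is enough for the dump format shown here; input that uses any of the
rejected forms would need a broader pattern.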

Hope that helps, and sorry for not spotting this earlier,

Andrew



Luca

Mar 16, 2011, 4:46:34 AM
to lepl
Hi,

thank you very much, your solution works like a charm! :-)
I was also trying to take the compile time into account, because I
actually have two usage profiles: hundreds of small unit tests, which
I execute quite often (so they need to be fast enough, and they would
not benefit from compilation), and the real executions (one single
execution over hundreds of thousands of similar lines, so compilation
is really welcome in this case). Since the compilation time is not too
high, I think I will keep it; otherwise I think I could disable it
during tests with a flag...

I suppose that in the future lepl could possibly take advantage of
cython compilation, once cython supports things such as the "yield"
keyword and such... I have personally been able to compile most of my
code with interesting improvements ;-)

thank you again for your prompt answer and help, and for creating
lepl!

Luca