Hi,
I wrote a small lepl grammar to parse a dump file, it works but it
seems quite slow even for a few lines, I am sure that the problem lies
in the grammar I wrote... Could you please give me some hint about
enhancing the grammar speed (about 5 seconds) ? Thank you in advance!
Luca
Here is the snippet grammar :
import datetime
from lepl import *
SOL = Drop(LineAwareSol())
EOL = Drop(LineAwareEol())
integer = Map(Token(Integer()), int)
uletter = Token(Upper())
real = Map(Token(Real()), float)
source = '''1 G 0.0 0.0 0.0 0.0 0.0 0.0
2 G 0.0 0.0 0.0 0.0 0.0 0.0
3 G 0.0 0.0 0.0 0.0 0.0 0.0
4 G 0.0 0.0 0.0 0.0 0.0 0.0
5 G 0.0 0.0 0.0 0.0 0.0 0.0
6 G 0.0 0.0 0.0 0.0 0.0 0.0
7 G 0.0 0.0 0.0 0.0 0.0 0.0
8 G 0.0 0.0 0.0 0.0 0.0 0.0
9 G 0.0 0.0 -9.856000E-05 -1.444699E-17 1.944000E-03 0.0
10 G 0.0 0.0 -9.856000E-05 -1.427843E-17 1.944000E-03 0.0
11 G 0.0 0.0 -1.085216E-02 -2.749537E-16 1.874400E-02 0.0
12 G 0.0 0.0 -1.085216E-02 -2.748317E-16 1.874400E-02 0.0
13 G 0.0 0.0 -3.600576E-02 -6.652665E-16 3.074400E-02 0.0
14 G 0.0 0.0 -3.600576E-02 -6.717988E-16 3.074400E-02 0.0
15 G 0.0 0.0 -7.075936E-02 -8.592844E-16 3.794400E-02 0.0
16 G 0.0 0.0 -7.075936E-02 -8.537008E-16 3.794400E-02 0.0
17 G 0.0 0.0 -1.103130E-01 -9.445027E-16 4.034400E-02 0.0
18 G 0.0 0.0 -1.103130E-01 -9.538811E-16 4.034400E-02 0.0
100 G 0.0 0.0 0.0 0.0 0.0 0.0
200 G 0.0 0.0 0.0 0.0 0.0 0.0
'''
data_line = SOL & integer & uletter & Repeat(real, start=6, stop=6) & EOL
table = OneOrMore(data_line)
table.config.default_line_aware()
begin = datetime.datetime.now()
print table.parse(source)
print datetime.datetime.now() - begin
Your grammar looks fine. It's worth remembering that Lepl is written in Python, so it is slow. However, what I think is your main problem is that you are not separating the work Lepl has to do to create the parser from the time actually spent parsing.
Lepl does a lot of work "compiling" the parser. This does take time, but is done just once. You can then use the parser many times.
By default the compiled parser is saved internally and reused, so you only pay this cost once, the first time you call the parser. But if you time things the way your code does, the measurement includes that one-off compilation as well as the parsing.
If you want to time just the time needed to run the parser then I would suggest changing your code to:
table.config.default_line_aware()
begin = datetime.datetime.now()
parser = table.get_parse()
print 'compile time:', datetime.datetime.now() - begin
begin = datetime.datetime.now()
print table.parse(source)
print 'parse time:', datetime.datetime.now() - begin
Or alternatively:
table.config.default_line_aware()
for i in range(3):
    begin = datetime.datetime.now()
    table.parse(source)
    print 'parse time:', i, datetime.datetime.now() - begin
This should show that the first run takes longer than the rest (although it will also reflect the effect of any caching, if you use that).
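The difference between the one-off setup and the repeated parsing can be reproduced without Lepl. What follows is only a minimal sketch: `LazyParser` and `timed` are hypothetical names invented for illustration, and Python's `re.compile` stands in for the much more expensive work Lepl does when it builds a parser. The point is simply that the first call to `get_parse()` pays the setup cost, and later calls reuse the saved result.

```python
import re
import time


class LazyParser:
    """Hypothetical stand-in for a matcher that compiles itself on first use."""

    def __init__(self, pattern):
        self.pattern = pattern
        self._compiled = None      # filled in by the first call to get_parse()
        self.compile_count = 0     # how many times the setup work actually ran

    def get_parse(self):
        # Do the (potentially slow) one-off work only once, then reuse it.
        if self._compiled is None:
            self._compiled = re.compile(self.pattern)
            self.compile_count += 1
        return self._compiled.findall

    def parse(self, text):
        return self.get_parse()(text)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


table = LazyParser(r'-?[0-9]+\.[0-9]+(?:E-?[0-9]+)?')
source = '9 G 0.0 0.0 -9.856000E-05 -1.444699E-17 1.944000E-03 0.0'

_, compile_time = timed(table.get_parse)           # one-off setup cost
values, parse_time = timed(table.parse, source)    # parsing only
print('compile time:', compile_time)
print('parse time:', parse_time)
```

Timing the two steps separately like this makes it clear which number describes the grammar setup and which describes the work done per input.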
On Tue, Mar 15, 2011 at 07:26:10AM -0700, Luca wrote:
> Hi,
> I wrote a small lepl grammar to parse a dump file, it works but it
> seems quite slow even for a few lines, I am sure that the problem lies
> in the grammar I wrote... Could you please give me some hint about
> enhancing the grammar speed (about 5 seconds) ? Thank you in advance!
> --
> You received this message because you are subscribed to the Google Groups "lepl" group.
> To post to this group, send email to lepl@googlegroups.com.
> To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lepl?hl=en.
OK I've been looking at this in more detail after work.
While what I said is true, you have also come across a bug / issue. My regular expression code is taking a crazy long time to compile the regexp for floats / reals, and that is what you are using here.
I've only known about this for a few weeks, and am still thinking about possible fixes. In the long term I have some better regexp code that I will add to Lepl. In the shorter term I may be able to switch to Python's re library.
In your case a simple workaround is to replace Float() with a simpler regexp. If I do timings for Token(Float()) and Token(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?') I get:
Timing results
--------------
With compilation: 1 parse(s), best of 3
Parse only: 100 parse(s), best of 3

Matcher              Compiling | Parse only
-------------------------------------------
float                    3.769 |   0.000112 (s)
float no memoize         3.767 |   0.052816 (s)
regexp                   0.542 |   0.000113 (s)
regexp no memoize        0.538 |   0.052089 (s)
(This is output from a new utility in Lepl 5.) As you can see, using the alternative regexp reduces the compilation time from about 4s to about half a second. And the actual parsing, in either case, is only taking 1/20s.
Hope that helps, and sorry for not spotting this earlier,
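Before swapping in a narrower pattern it is worth checking that it still accepts every number format in the dump. The sketch below tries the simplified pattern quoted above against values taken from the sample data, using Python's standard `re` module. That is an assumption of convenience: Lepl's own regexp engine is separate code and may not behave identically. The list names (`samples`, `not_covered`) are made up for illustration.

```python
import re

# Pattern quoted in the message above; everything else here is illustrative.
SIMPLE_REAL = re.compile(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?')

# Values copied from the sample dump.
samples = ['0.0', '-9.856000E-05', '-1.444699E-17', '1.944000E-03', '4.034400E-02']
matched = [s for s in samples if SIMPLE_REAL.fullmatch(s)]
converted = [float(s) for s in matched]

# Forms the simplified pattern does not cover: lower-case exponent without a
# decimal point, explicit plus signs, a missing leading digit, bare integers.
not_covered = ['1e5', '+1.0', '1.0E+05', '.5', '5']
rejected = [s for s in not_covered if not SIMPLE_REAL.fullmatch(s)]

print(len(matched), 'of', len(samples), 'sample values matched')
print('rejected:', rejected)
```

If a dump ever contains one of the rejected forms, the pattern would need widening before it is used in place of the general-purpose matcher, for example as the argument to Token() in the definition of `real`.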
Thank you very much, your solution works like a charm! :-)
I was also trying to take the compile time into account, because I
actually have two usage profiles: hundreds of small unit tests which
I execute quite often (so they need to be fast enough, and they would
not benefit from compilation) and the real executions (one single
execution over hundreds of thousands of similar lines, so compilation
is really welcome in this case). Since the compilation time is not too
high, I think I will keep it; otherwise I could disable it
during tests with a flag...
I suppose that in the future lepl could possibly take advantage of
Cython compilation, once Cython supports things such as the "yield"
keyword... I have personally been able to compile most of my
code with interesting improvements ;-)
Thank you again for your prompt answer and help, and for creating
lepl!
Luca
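For the unit-test profile described above, one alternative to a flag is to build the parser once per test process and let every test share it, so the setup cost is paid a single time however many tests run. The following is a rough sketch under stated assumptions: `build_parser` is a hypothetical stand-in for the slow construction step (here just `re.compile`, not Lepl), and `shared_parser` is an invented helper that caches its result with `functools.lru_cache` from the standard library.

```python
import functools
import re

build_calls = 0  # counts how often the expensive step actually runs


def build_parser():
    """Hypothetical stand-in for the slow, one-off parser construction."""
    global build_calls
    build_calls += 1
    return re.compile(r'-?[0-9]+\.[0-9]+(?:E-?[0-9]+)?').findall


@functools.lru_cache(maxsize=None)
def shared_parser():
    # Built on first use, then reused by every later caller in this process.
    return build_parser()


def test_zero_line():
    assert shared_parser()('1 G 0.0 0.0 0.0 0.0 0.0 0.0') == ['0.0'] * 6


def test_exponent_line():
    line = '9 G 0.0 0.0 -9.856000E-05 -1.444699E-17 1.944000E-03 0.0'
    assert shared_parser()(line)[2] == '-9.856000E-05'


test_zero_line()
test_exponent_line()
print('parser built', build_calls, 'time(s)')
```

This only helps when the tests run in the same process; tests launched as separate processes would each pay the setup cost again.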
On 16 mar, 01:11, andrew cooke <and...@acooke.org> wrote:
> OK I've been looking at this in more detail after work.
> While what I said is true, you have also come across a bug / issue. My
> regular expression code is taking a crazy long time to compile the regexp for
> floats / reals, and that is what you are using here.
> I've only known about this for a few weeks, and am still thinking about
> possible fixes. In the long term I have some better regexp code that I will
> add to Lepl. In the shorter term I may be able to switch to Python's re
> library.
> In your case a simple workaround is to replace Float() with a simpler regexp.
> If I do timings for Token(Float()) and
> Token(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?') I get:
> Timing results
> --------------
> With compilation: 1 parse(s), best of 3
> Parse only: 100 parse(s), best of 3
> Matcher              Compiling | Parse only
> -------------------------------------------
> float                    3.769 |   0.000112 (s)
> float no memoize         3.767 |   0.052816 (s)
> regexp                   0.542 |   0.000113 (s)
> regexp no memoize        0.538 |   0.052089 (s)
> (this is output from a new utility in Lepl 5). As you can see, using the
> alternative regexp reduces the compilation time from 4s to half a second. And
> the actual parsing, in either case is only taking 1/20s.
> Hope that helps, and sorry for not spotting this earlier,
> Andrew