Your grammar looks fine. It's worth remembering that Lepl is written in
Python, so it is relatively slow. However, I think your main problem is that
you are not separating the work Lepl has to do to create the parser from the
time actually spent parsing.
Lepl does a lot of work "compiling" the parser. This does take time, but is
done just once. You can then use the parser many times.
By default the compiled parser is saved internally and reused, so you only pay
for this once, the first time you call the parser. But it also means that if
you time the first call, as your code does, the compilation time is included
in your measurement.
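To make the compile-once-then-reuse behaviour concrete, here is a minimal,
self-contained sketch of the pattern. LazyParser and compile_grammar are
hypothetical names for illustration only, not Lepl's API, and the expensive
"compilation" is simulated by a function that records each time it runs.

```python
class LazyParser(object):
    """Hypothetical wrapper: compiles on first use, then reuses the result."""

    def __init__(self, compile_grammar):
        self._compile_grammar = compile_grammar  # expensive, one-off step
        self._compiled = None                    # cache for the compiled parser

    def parse(self, text):
        if self._compiled is None:
            # Only the first call pays the compilation cost.
            self._compiled = self._compile_grammar()
        return self._compiled(text)


compile_calls = []


def compile_grammar():
    # Stand-in for an expensive "compilation" step; records each invocation.
    compile_calls.append(1)
    return lambda text: text.split()


parser = LazyParser(compile_grammar)
results = [parser.parse('1.5 2.5') for _ in range(3)]
```

After three calls compile_grammar has run only once, so timing the first
parse() measures compilation plus parsing, while later calls measure parsing
alone.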
If you want to measure only the time needed to run the parser, I would
suggest changing your code to:
    import datetime

    table.config.default_line_aware()
    # Time the one-off compilation on its own.
    begin = datetime.datetime.now()
    parser = table.get_parse()
    print 'compile time:', datetime.datetime.now() - begin
    # Time only the parse, reusing the parser compiled above.
    begin = datetime.datetime.now()
    print parser(source)
    print 'parse time:', datetime.datetime.now() - begin
Or alternatively:
    import datetime

    table.config.default_line_aware()
    for i in range(3):
        begin = datetime.datetime.now()
        table.parse(source)
        print 'parse time:', i, datetime.datetime.now() - begin
This should show that the first call takes longer than the rest (although it
will also reflect the effects of any caching, if you use that).
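If single runs are too noisy, a common approach is to repeat each measurement
and keep the best trial, using Python's standard timeit module. The sketch
below is only an illustration under assumptions: build_parser is a
hypothetical stand-in for compiling a grammar (it is not Lepl's API), and the
work being timed is simulated.

```python
import timeit


def best_of(func, number, repeat=3):
    """Return the fastest of `repeat` trials; each trial calls func `number` times.

    The value is the total time in seconds for one trial, not a per-call figure.
    """
    return min(timeit.repeat(func, number=number, repeat=repeat))


def build_parser():
    # Hypothetical stand-in for an expensive grammar compilation step.
    sum(range(100000))
    return lambda text: text.split()


source = '1.5 2.5 3.5'
compiled = build_parser()

# Compilation plus one parse, rebuilt from scratch in every trial.
with_compile = best_of(lambda: build_parser()(source), number=1)
# Parsing only, reusing the parser built above (100 parses per trial).
parse_only = best_of(lambda: compiled(source), number=100)

print('with compilation: %.6f s' % with_compile)
print('parse only (100 parses): %.6f s' % parse_only)
```

Taking the minimum rather than the mean is the usual choice for timeit,
because slower trials mostly reflect interference from other processes rather
than the code being measured.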
Does that help?
Cheers,
Andrew
OK, I've looked at this in more detail after work.
While what I said is true, you have also run into a bug. My regular
expression code takes an extremely long time to compile the regexp for
floats/reals, and that is what you are using here.
I've only known about this for a few weeks and am still thinking about
possible fixes. In the long term, I have some better regexp code that I will
add to Lepl. In the shorter term, I may be able to switch to Python's re
library.
In your case, a simple workaround is to replace Float() with a simpler regexp.
If I time Token(Float()) against
Token(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?') I get:
    Timing results
    --------------
    With compilation: 1 parse(s), best of 3
    Parse only: 100 parse(s), best of 3

    Matcher               Compiling (s) | Parse only (s)
    ----------------------------------------------------
    float                         3.769 | 0.000112
    float no memoize              3.767 | 0.052816
    regexp                        0.542 | 0.000113
    regexp no memoize             0.538 | 0.052089
(This is output from a new utility in Lepl 5.) As you can see, using the
alternative regexp reduces the compilation time from nearly 4 s to about half
a second. The actual parsing, in either case, takes at most about 1/20 s.
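To see which strings the simpler regexp accepts, here is a quick check using
Python's standard re module (not Lepl's own regexp engine). The pattern is the
one from the workaround, with \Z added so the whole string must match. It
needs digits on both sides of the decimal point, accepts only an uppercase E
and no leading '+', so it may accept less than Float() does; it is worth
checking against your own input.

```python
import re

# The simpler pattern from the workaround, anchored at the end with \Z.
SIMPLE_REAL = re.compile(r'\-?[0-9]+\.[0-9]+(?:E\-?[0-9]+)?\Z')


def is_simple_real(text):
    """True if the whole of `text` matches the simpler real-number pattern."""
    return SIMPLE_REAL.match(text) is not None


samples = ['3.14', '-0.5', '1.0E10', '2.5E-3', '42', '.5', '1.5e3', '+1.0']
accepted = [s for s in samples if is_simple_real(s)]
print(accepted)
```

Here '42', '.5', '1.5e3' and '+1.0' are all rejected, which is fine if your
data never contains such forms.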
Hope that helps, and sorry for not spotting this earlier,
Andrew
On Tue, Mar 15, 2011 at 07:26:10AM -0700, Luca wrote: