Matt
--
"You do not really understand something unless you can explain it to
your grandmother." -- Albert Einstein.
_______________________________________________
cl-ppcre-devel site list
cl-ppcr...@common-lisp.net
http://common-lisp.net/mailman/listinfo/cl-ppcre-devel
I might be confused, but isn't '^' what you need?
-Hans
> On Sun, Sep 28, 2008 at 14:02, Matthew D. Swank
> <akopa.gma...@gmail.com> wrote:
> > Is it possible to create a scanner that only matches at the start
> > index?
>
> I might be confused, but isn't '^' what you need?
>
Probably, I just hadn't thought of it. I could prepend a '^' to each
choice.
Thanks, Matt
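
One detail worth checking when prepending '^' to a set of choices: in a regex string, alternation binds more loosely than the anchor, so the choices need to be grouped first. The following is a minimal sketch, assuming CL-PPCRE is loaded under its usual PPCRE nickname; ANCHOR-AT-START is a hypothetical helper and the sample strings are made up for illustration.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. via (ql:quickload :cl-ppcre).

(defun anchor-at-start (regex-string)
  "Hypothetical helper: wrap REGEX-STRING in a non-capturing group and
prepend '^' so that the anchor applies to every alternative."
  (concatenate 'string "^(?:" regex-string ")"))

;; Anchored alternation: matches only at the beginning of the string.
(ppcre:scan (anchor-at-start "GET|POST") "GET /index.html") ; match starts at 0
(ppcre:scan (anchor-at-start "GET|POST") "x POST /form")    ; no match

;; Without the group, '^' binds to GET only, so POST can match anywhere.
(ppcre:scan "^GET|POST" "x POST /form")                     ; match starts at 2
```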
I tried using a construct like `(:sequence :start-anchor (:regex ,regex))
where regex is a PCRE string, but matching still takes forever (as in
I gave up after 10 min) when slurping a moderately sized file (400k).
Note that matching works fine for files under 1k, or if I break the
input up into lines for line-oriented input.
Matt
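
For reference, the parse-tree construct described above can be turned into a scanner roughly as follows. This is only a sketch, assuming CL-PPCRE is loaded; MAKE-ANCHORED-SCANNER and *DIGITS* are hypothetical names, and the regex and input strings are invented for illustration.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. via (ql:quickload :cl-ppcre).

(defun make-anchored-scanner (regex-string)
  "Hypothetical helper: embed the Perl-syntax REGEX-STRING in a parse
tree that begins with :START-ANCHOR, then compile it to a scanner."
  (ppcre:create-scanner
   `(:sequence :start-anchor (:regex ,regex-string))))

;; Illustrative scanner: one or more digits at the start of the string.
(defparameter *digits* (make-anchored-scanner "\\d+"))

(ppcre:scan *digits* "123 abc")   ; match from 0 to 3
(ppcre:scan *digits* "abc 123")   ; no match, the digits are not at the start
```

Because the embedded string is parsed as its own subtree, no extra grouping is needed around an alternation inside it.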
Show us the regex you were using and some test data and then maybe we
can help you to optimize it.
I suppose you read this?
http://weitz.de/cl-ppcre/#blabla
Edi.
Well the regexes are defined in the lexers in this file:
http://common-lisp.net/~mswank/apache-ppcre.lisp
The lexer api is in this file:
http://common-lisp.net/~mswank/cl-ppcre-lexer.lisp
Finally, the log file I'm lexing:
http://lcpug.asternix.com/pub/Main/ApacheLogProject/access.log
Compare

  (with-open-file (in "access.log")
    (let ((foo (stream-gen *apache-pcrelex-line* in)))
      (time (loop :for x := (funcall foo)
                  :unless x :return nil))))

with

  (with-open-file (in "access.log")
    (let ((foo (stream-gen *apache-pcrelex* in)))
      (time (loop :for x := (funcall foo)
                  :unless x :return nil))))

When I slurp the entire file into a string the matches seem to be
taking about a tenth of a second for each token.
Matt
Sorry, I don't have the time to read the entire application right now.
Can you boil this down to a single application of PPCRE:SCAN which is
too slow?
Thanks,
You are probably not doing the same thing in the "line oriented"
approach and the "full file in one string" approach.
With the full file in one string, if you don't take care to stop the
scan at the end of each line (assuming you want line-by-line scanning,
as your trying that approach suggests), I guess you are scanning to
the end of the full string for each line (which is certainly very
expensive).
But that's just a guess, as I've only had a very quick look at your code :-)
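
One way to rule this guess in or out is to bound each scan explicitly with SCAN's :START and :END keyword arguments. The sketch below assumes CL-PPCRE is loaded; SCAN-WITHIN-LINE and *SAMPLE* are hypothetical names and the sample text is made up.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. via (ql:quickload :cl-ppcre).

(defun scan-within-line (regex string start)
  "Hypothetical helper: scan STRING for REGEX from START, but no further
than the end of the line that START falls on."
  (let ((line-end (or (position #\Newline string :start start)
                      (length string))))
    (ppcre:scan regex string :start start :end line-end)))

;; Two made-up lines held in one string.
(defparameter *sample* (format nil "alpha 1~%beta 22~%"))

(scan-within-line "\\d+" *sample* 0)  ; finds "1" on the first line
(scan-within-line "22" *sample* 0)    ; no match, "22" is on the next line
(ppcre:scan "22" *sample*)            ; unbounded scan runs on and finds it
```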
Cheers,
Sebastien.
Well, the lexer code is line agnostic; i.e. you could replace 'end
of each line' with any old stop. What it does is adjust the start
index as it matches tokens.
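
As an illustration of the start-index approach (not the code from cl-ppcre-lexer.lisp, which is not reproduced in this thread), a loop of this shape might look like the sketch below. It assumes CL-PPCRE is loaded; TOKENIZE is a hypothetical function and the token regex is invented. It accepts a match only if it begins exactly at the current index. Note that without an anchor, SCAN still searches forward when nothing matches at that index, which is the cost the anchor is meant to avoid.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. via (ql:quickload :cl-ppcre).

(defun tokenize (regex string)
  "Hypothetical sketch: collect consecutive tokens from STRING, each of
which must begin exactly at the current start index."
  (loop with pos = 0
        while (< pos (length string))
        collect (multiple-value-bind (match-start match-end)
                    (ppcre:scan regex string :start pos)
                  ;; Reject misses, matches further along the string,
                  ;; and empty matches (which would loop forever).
                  (unless (and match-start
                               (= match-start pos)
                               (> match-end pos))
                    (error "No token at index ~D" pos))
                  (prog1 (subseq string pos match-end)
                    ;; Advance the start index past the token.
                    (setf pos match-end)))))

(tokenize "\\d+|[a-z]+|\\s+" "abc 123")   ; => ("abc" " " "123")
```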
One thing I did notice is that I read the file into an adjustable
vector, and that is the string I pass to the scanners. I suppose ppcre
has to coerce that every time a scanner runs?
Matt
Yes, scan needs a simple-string. From the SCAN docs:
target-string will be coerced to a simple string if it isn't one
already.
Cheers,
Chris Dean
Coercing the slurped file to a simple string makes things work
swimmingly.
Thanks,
Matt
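
The fix amounts to converting the buffer once, up front, instead of letting every scan do it. A minimal sketch, assuming CL-PPCRE is loaded; *BUFFER* and *TEXT* are hypothetical names and the contents are a made-up stand-in for the slurped file:

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. via (ql:quickload :cl-ppcre).

;; An adjustable vector with a fill pointer, as one might use when
;; reading a file of unknown length. It is a string, but not a simple one.
(defparameter *buffer*
  (make-array 0 :element-type 'character :adjustable t :fill-pointer 0))

;; Stand-in for reading the file contents.
(loop for ch across "GET /index.html"
      do (vector-push-extend ch *buffer*))

;; Coerce once; every later scan then receives a simple string.
(defparameter *text* (coerce *buffer* 'simple-string))

(typep *buffer* 'simple-string)   ; => NIL
(typep *text* 'simple-string)     ; => T
(ppcre:scan "^GET" *text*)        ; match starts at 0
```

With a 400k buffer and one scan per token, doing the conversion inside every call would mean copying the whole file each time, which is consistent with the slowdown reported earlier in the thread.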