On Thu, Jun 25, 2009 at 4:31 AM, Xiangjun Wu<neta...@gmail.com> wrote:
> "(\\w+)*\\@\\w+"
That's the type of regular expression that typically leads to a
combinatorial explosion in regex engines unless they use specific
"tricks" to deal with this. Recent versions of Perl are pretty clever
in this regard (they look for "floating" substrings) while CL-PPCRE
isn't, but - frankly - I don't really see the point of this. I think
this is mainly so that the regex engine looks good in benchmarks. I
definitely wouldn't call this a bug.
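
To make the "combinatorial explosion" concrete, here is a rough model in
plain Common Lisp. It is only a sketch of what a naive backtracking
matcher would do, not how CL-PPCRE is actually implemented, and
COUNT-ATTEMPTS is a made-up helper name. It counts how often such a
matcher tries (and fails) to match the "@" when (\w+)*@ is applied to a
run of N word characters that contains no "@" at all:

```lisp
;; Hypothetical helper, not part of CL-PPCRE.
;; Models a naive backtracking matcher for (\w+)*@ on a run of N word
;; characters without any "@".  Every way of splitting the characters
;; consumed so far into \w+ groups ends in one failed attempt at "@".
(defun count-attempts (n &optional (pos 0))
  "Count the failed attempts to match @ when matching resumes at POS."
  (let ((attempts 1))                     ; try @ right here, it fails
    ;; Let one more \w+ group consume LEN >= 1 characters, then recurse.
    (loop for len from 1 to (- n pos)
          do (incf attempts (count-attempts n (+ pos len))))
    attempts))

(loop for n in '(5 10 15 20)
      collect (list n (count-attempts n)))
;; => ((5 32) (10 1024) (15 32768) (20 1048576))
```

In this model the count doubles with every additional character, which
is why an input of a few dozen characters is already enough to make the
scan appear to hang.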
The question is - what do you want to achieve with this regular
expression? Can't you write it in a simpler way?
Cheers,
Edi.
_______________________________________________
cl-ppcre-devel site list
cl-ppcr...@common-lisp.net
http://common-lisp.net/mailman/listinfo/cl-ppcre-devel
> The question is - what do you want to achieve with this regular
> expression? Can't you write it in a simpler way?
Isn't this pattern pretty useful in general:
A@B
where A and B are word characters and @ is a specific non-word
character?
How else could we specify it?
[a-zA-Z0-9] doesn't seem acceptable to me since it relies on
the latin alphabet...
Leslie
--
http://www.linkedin.com/in/polzer
> On Jun 25, 9:21 pm, Edi Weitz <e...@agharta.de> wrote:
>
>> The question is - what do you want to achieve with this regular
>> expression? Can't you write it in a simpler way?
>
> Isn't this pattern pretty useful in general:
>
> A@B
>
> where A and B are word characters and @ is a specific non-word
> character?
Sure, but the original bug report was about this:
(\\w+)*\\@\\w+
I can't make any sense of this regular expression, but maybe it is
because I am lacking some skills. Maybe Wu can explain what he wants
to achieve with it?
-Hans
It should be:
(cl-ppcre:scan
 (cl-ppcre:create-scanner "(_\\w+)*\\@\\w+")
 "______________________________________"
 :start 0)
but the other examples show the same idea.
--
片云天共远永夜月同孤
Looking at this I'm not sure what this is good for.
Why would we want to match strings of the form _xxx@xxx
in a full-text indexer?
Perhaps it would be best to get rid of the whole messy
regex (of which this is only a small part) and write
a new documented one from scratch. Or use a custom
state-based tokenizer.
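
A hand-written scanner along these lines might look like the sketch
below. It assumes the tokens of interest have the form A@B, where A and
B are non-empty runs of word characters, which may not be what the
indexer really needs. WORD-CHAR-P and FIND-AT-TOKEN are names made up
for this example, and treating "alphanumeric or underscore" as a word
character is an assumption as well:

```lisp
;; Hypothetical helpers for illustration only.
(defun word-char-p (ch)
  "Assumed definition of a word character: alphanumeric or underscore."
  (or (alphanumericp ch) (char= ch #\_)))

(defun find-at-token (string &key (start 0))
  "Return start and end of the first A@B token in STRING, or NIL."
  ;; Visit each @ in turn and extend over word characters on both sides.
  (loop for at = (position #\@ string :start start)
          then (position #\@ string :start (1+ at))
        while at
        do (let* ((left-stop (position-if-not #'word-char-p string
                                              :start start :end at
                                              :from-end t))
                  (left (if left-stop (1+ left-stop) start))
                  (right (or (position-if-not #'word-char-p string
                                              :start (1+ at))
                             (length string))))
             ;; Both sides must contain at least one word character.
             (when (and (< left at) (> right (1+ at)))
               (return (values left right))))))

(find-at-token "mail _foo@bar now")
;; => 5, 13
(find-at-token (make-string 38 :initial-element #\_))
;; => NIL
```

Since it never backtracks over alternative groupings, an input
consisting only of underscores is rejected right away.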
Perhaps
(cl-ppcre:create-scanner "(_[_\\w]+)?@\\w+")
will work for your app? The problem in the original expression is that
the "+" followed by the "*" can lead to a combinatorial explosion.
If you loosen the requirement that all non-empty matches of the first
group must begin with an "_" you could have:
(cl-ppcre:create-scanner "[_\\w]*@\\w+")
Cheers,
Chris Dean