> You can find this through search:
> http://attractivechaos.github.com/plb/
>
> Go looks fine, half the memory consumption of java. It's too slow on
> regex. I read somewhere that Go designers chose to implement regex in
> a way that gives it slow performance for short regex, but constant
> performance for longer regex. This is not helpful for most real world
> situations, where we just need to execute very short regular
> expressions. "How often do you write a regex > 30 characters?" To this
> date I literally never needed them.
No. It's to have strict, reasonable upper limits on the time a regex
takes to run, relative to the length of the regex and its input. This
means better performance for certain special regexes that a typical
implementation has major difficulties with, potentially taking "longer
than the life of the universe" to handle. It means that taking regexes
from an outside source does not require trusting that source not to try
to hang you. It also means no messy "regex optimisation" stuff.
I would suggest this is preferable in many real-world situations to
improved performance; security is an issue more often than the
performance gap is.
All that said, a big part of the reason it is as slow as it is is
simply that it isn't *that* optimised yet, compared to the libraries
those languages use for regexes, which are almost universally written
in C and thus a poor metric of the languages themselves. This can be
improved without compromising any of the above.
At any rate, regex performance is not a significant measure; if you
don't like the stdlib implementation you can just use another. Unlike
in Perl and some other languages, the provided version is not baked
into the language in a way that replacements can't match.
> You can find this through search: http://attractivechaos.github.com/plb/
>
> Go looks fine, half the memory consumption of java. It's too slow on
> regex. I read somewhere that Go designers chose to implement regex in
> a way that gives it slow performance for short regex, but constant
> performance for longer regex. This is not helpful for most real world
> situations, where we just need to execute very short regular
> expressions. "How often do you write a regex > 30 characters?" To this
> date I literally never needed them.
It's not the overall length of the pattern. The problem with PCRE and friends is that they use an algorithm that can be exponential in the length of the input string.
Go's current regexp package is a simple placeholder that is, correctly, O(m+n). A new implementation is in the works, and now that SWIG is maturing, a wrapper for RE2 (also O(m+n)) is becoming plausible.
It was never the intention that Go have a slower regular expression library. A correct, linear-time one is a goal, however.
For the gazillionth time, I refer to http://swtch.com/~rsc/regexp/regexp1.html .
I'd like to reiterate the true problem: regular expressions, once the pride of Unix, became badly broken and misunderstood in the 1980s and early 1990s and have never recovered. Idiocy has prevailed. Many of the 'features' people find missing from the Go regexp package are really workarounds to overcome poorly chosen semantics of the Perl style of regexp and to avoid the exponential behavior.
You see, the (valid) complaint that Go's regexp package is slow is never offset by comments that, unlike most implementations, it's actually correct. For me, correctness is more important than speed. It's possible to be correct *and* fast, however, as Russ writes, and that will come. It's just a fair bit of work. (And to keep people happy, the semantics will need to be broken and the feature set must be needlessly expanded, but that's life.)
It all makes me ineffably sad. Once, regexps just worked. Nowadays you need to choose your implementation carefully, and you don't get to choose proper semantics.
-rob