What differences are there in between pkg "regexp" and "ext/regexp"?

455 views
Skip to first unread message

Fisher Wei

unread,
Feb 1, 2012, 2:08:19 AM2/1/12
to golan...@googlegroups.com
Hi all,

Why has golang two regexp pkg?
What differences are there in between pkg "regexp" and "ext/regexp"?

Br,
Fisher.

Jessta

unread,
Feb 1, 2012, 2:23:54 AM2/1/12
to Fisher Wei, golan...@googlegroups.com
On Wed, Feb 1, 2012 at 6:08 PM, Fisher Wei <f...@wqfw.de> wrote:
> Hi all,
>
> Why has golang two regexp pkg?
> What differences are there in between pkg "regexp" and "ext/regexp"?

One is old and one is new.
exp/regexp in the "release" is a newer regexp package that wasn't
ready to be the main regexp pkg at the time of the release.

In the weeklies "exp/regexp" is now called just "regexp" and the old
one was moved to "old/regexp"

- jessta

--
=====================
http://jessta.id.au

Fisher Wei

unread,
Feb 1, 2012, 2:27:06 AM2/1/12
to Jessta, golan...@googlegroups.com
Hi,

Thanks.

Br,
Fisher.

Fisher Wei

unread,
Feb 1, 2012, 2:40:31 AM2/1/12
to Jessta, golan...@googlegroups.com
Hi,

Sorry, and I have another question:
Did "ext/xxxx" pkgs are newer/beta version? and will be merged to root of pkg tree?

Br,
Fisher.

Kyle Lemons

unread,
Feb 1, 2012, 2:52:44 AM2/1/12
to Fisher Wei, Jessta, golan...@googlegroups.com
Sorry, and I have another question:

No need to apologize.
 
Did "ext/xxxx" pkgs are newer/beta version? and will be merged to root of pkg tree?

The exp/ subdirectory is for things that may or may not grow up to become well-formed enough to be standard library packages but which the team feel would probably be suitable additions to the stdlib if they did. It has also been used a few times for housing the development of replacement packages (template and regexp) before they are phased into the standard library.  These packages will not appear in binary distributions of releases starting with Go 1 and are not considered part of the "locked down" API.

Fisher Wei

unread,
Feb 1, 2012, 3:03:36 AM2/1/12
to Kyle Lemons, Jessta, golan...@googlegroups.com
Many thanks.

Br,
Fisher.

Johann Höchtl

unread,
Feb 1, 2012, 7:21:52 AM2/1/12
to golan...@googlegroups.com, Fisher Wei
I know that re2 has a different runtime behaviour than the old regexp and that the algorithms behind re2 are known to be generally fast
http://code.google.com/p/re2/

But frankly I was hoping that Go would do better now on regexp-dna:
http://shootout.alioth.debian.org/u64q/program.php?test=regexdna&lang=go&id=1

Is there some inherent design which prevents Go from performing better?

Johann

tux21b

unread,
Feb 1, 2012, 11:35:15 AM2/1/12
to golan...@googlegroups.com, Fisher Wei
Is there some inherent design which prevents Go from performing better?

Go's regexp package (and re2 which was implemented by the same author, Russ Cox),
both have a linear run-time behavior in the worst case. That's basically done by tracking
multiple paths simultaneously which might be, for very simple inputs, a bit slower than
the exponential backtracking algorithm used by pcre (Python, Perl, Java, etc.).

Here are some great articles by Russ Cox about the algorithm used:

There are lot's of (even common!) inputs which Go's regexp package can handle in
some microseconds, which wouldn't even complete in Python, Perl, Java, etc.
Unfortunately, the benchmarks are biased.

Also note that the Go compiler doesn't do any advanced optimization yet (in comparison
to the well-known C compilers with a long history + some manual tuned functions which
has been rewritten in assembler in those libraries).

The Go version which is used on the language shootout page is pretty old and Go is
now able to inline short functions. The heuristics for that are getting tuned constantly.

Check out the latest tip and run the benchmarks on your own. You can find the Go and
the C version in the shootout directory of the Go repository. I did some local benchmarks
last week and the Go version just took about 3x more time (not 24x as listed on the
shootout page). Just try it on your own :)

-christoph

Johann Höchtl

unread,
Feb 1, 2012, 2:18:53 PM2/1/12
to golan...@googlegroups.com, Fisher Wei


Am Mittwoch, 1. Februar 2012 17:35:15 UTC+1 schrieb tux21b:

The Go version which is used on the language shootout page is pretty old and Go is
now able to inline short functions. The heuristics for that are getting tuned constantly.

Nah, the author of the shootout pays special attention to keep Go up to date, the version in question is
6g version weekly.2012-01-27 11507
That's why I am curious and scenting an issue. However I do not have an application at hand which has a problem because of overly poor regex performance.

-christoph

Isaac Gouy

unread,
Feb 1, 2012, 9:34:49 PM2/1/12
to golang-nuts


On Feb 1, 8:35 am, tux21b <tux...@gmail.com> wrote:

> The Go version which is used on the language shootout page is pretty old
> and Go is now able to inline short functions.

FUD


> I did some local benchmarks last week and the Go version just took about
> 3x more time (not 24x as listed on the shootout page).

Is the Go source code for that program the same as that shown on the
benchmarks game web page?

Andrew Gerrand

unread,
Feb 1, 2012, 9:42:02 PM2/1/12
to Isaac Gouy, golang-nuts
On 2 February 2012 13:34, Isaac Gouy <igo...@yahoo.com> wrote:
>
>
> On Feb 1, 8:35 am, tux21b <tux...@gmail.com> wrote:
>
>> The Go version which is used on the language shootout page is pretty old
>> and Go is now able to inline short functions.
>
> FUD

That doesn't mean what you think it means.

>> I did some local benchmarks last week and the Go version just took about
>> 3x more time (not 24x as listed on the shootout page).
>
> Is the Go source code for that program the same as that shown on the
> benchmarks game web page?

Why don't you see for yourself? At the very least, the regexp package
has changed underfoot, even if the benchmark program is the same.

Andrew

Isaac Gouy

unread,
Feb 2, 2012, 1:57:09 AM2/2/12
to golang-nuts
On Feb 1, 6:42 pm, Andrew Gerrand <a...@golang.org> wrote:

> > On Feb 1, 8:35 am, tux21b <tux...@gmail.com> wrote:
>
> >> The Go version which is used on the language shootout page is pretty old
> >> and Go is now able to inline short functions.
>
> > FUD
>
> That doesn't mean what you think it means.

That isn't worth a response.


> >> I did some local benchmarks last week and the Go version just took about
> >> 3x more time (not 24x as listed on the shootout page).
>
> > Is the Go source code for that program the same as that shown on the
> > benchmarks game web page?
>
> Why don't you see for yourself?

Because tux21b should be able to identify which program tux21b
actually measured;
But I'd have to guess which program tux21b actually measured.


> At the very least, the regexp package
> has changed underfoot, even if the benchmark program is the same.

Are you suggesting that tux21b's 8x faster "local benchmarks" are
caused by improvements to the regexp package since weekly.2012-01-27
11507 ?

My guess is that tux21b just hasn't noticed that the C program in go/
test/bench/shootout is not the fastest C program on the benchmarks
game webpage.

The C program in go/test/bench/shootout is listed as only 2.5x faster
than the fastest Go program on the benchmarks game webpage.

Isaac Gouy

unread,
Feb 2, 2012, 1:52:50 PM2/2/12
to golang-nuts


On Feb 1, 8:35 am, tux21b <tux...@gmail.com> wrote:

> Check out the latest tip and run the benchmarks on your own. You can find
> the Go and the C version in the shootout directory of the Go repository.


Please note that directory only has sequential C programs, but for
some tasks has both sequential and parallel Go programs - remember to
compare the times of the sequential programs Go programs not the
parallel Go programs.

Jim Whitehead II

unread,
Feb 2, 2012, 2:08:05 PM2/2/12
to Isaac Gouy, golang-nuts
On Thu, Feb 2, 2012 at 2:34 AM, Isaac Gouy <igo...@yahoo.com> wrote:
>
>
> On Feb 1, 8:35 am, tux21b <tux...@gmail.com> wrote:
>
>> The Go version which is used on the language shootout page is pretty old
>> and Go is now able to inline short functions.
>
> FUD

I think it's pretty clear here that the author is not attempting in
any way, shape, or form, to spread 'fear', 'uncertainty', or 'doubt'.
He's simply incorrect, and basing his response on his prior knowledge
and experience. There's a world of difference between being mistaken
and responding based on a misunderstanding and purposely and what
you're suggesting here.

>> I did some local benchmarks last week and the Go version just took about
>> 3x more time (not 24x as listed on the shootout page).
>
> Is the Go source code for that program the same as that shown on the
> benchmarks game web page?

- Jim

Isaac Gouy

unread,
Feb 2, 2012, 8:31:20 PM2/2/12
to golang-nuts
On Feb 2, 11:08 am, Jim Whitehead II <jnwhi...@gmail.com> wrote:
> On Thu, Feb 2, 2012 at 2:34 AM, Isaac Gouy <igo...@yahoo.com> wrote:
>
> > On Feb 1, 8:35 am, tux21b <tux...@gmail.com> wrote:
>
> >> The Go version which is used on the language shootout page is pretty old
> >> and Go is now able to inline short functions.
>
> > FUD
>
> I think it's pretty clear here that the author is not attempting in
> any way, shape, or form, to spread 'fear', 'uncertainty', or 'doubt'.
> He's simply incorrect, and basing his response on his prior knowledge
> and experience. There's a world of difference between being mistaken
> and responding based on a misunderstanding and purposely and what
> you're suggesting here.


You seem to be claiming some insight into what tux21b was trying to do
with his post, so:

- please tell me what he was trying to achieve when he wrote
"Unfortunately, the benchmarks are biased."

- please tell me what "prior knowledge and experience" could give
someone the impression that "the Go version which is used on the
language shootout page is pretty old" when the fact of the matter is
that the Go measurements have been updated 50 times in the 2 years 2
months that Go has been included.

Nigel Tao

unread,
Feb 2, 2012, 9:38:17 PM2/2/12
to Isaac Gouy, golang-nuts
On 3 February 2012 12:31, Isaac Gouy <igo...@yahoo.com> wrote:
> You seem to be claiming some insight into what tux21b was trying to do
> with his post, so:
>
> - please tell me what he was trying to achieve when he wrote
> "Unfortunately, the benchmarks are biased."

I'm only guessing, but I suspect that the regex test used in the
shootout works well with backtracking implementations.

However, backtracking is worst case exponential. The very first
diagram in http://swtch.com/~rsc/regexp/regexp1.html shows that PCRE
performance can be many orders of magnitude worse than an NFA
implementation. Note that PCRE's time is measured in seconds, NFA's in
micros.

It would be trivial to construct a test case where PCRE would be 1000x
slower than Go's regexp, even though PCRE is written in C. "Biased"
may be too strong a word -- I don't think there is malicious intent --
but the regexp shootout does not contain such a test.


> - please tell me what "prior knowledge and experience" could give
> someone the impression that "the Go version which is used on the
> language shootout page is pretty old" when the fact of the matter is
> that the Go measurements have been updated 50 times in the 2 years 2
> months that Go has been included.

Again, for me, the term FUD connotates intentional deception, and I'd
presume that this was simply an honest mistake.

Isaac Gouy

unread,
Feb 2, 2012, 10:05:31 PM2/2/12
to golang-nuts


On Feb 2, 6:38 pm, Nigel Tao <nigel...@golang.org> wrote:
> On 3 February 2012 12:31, Isaac Gouy <igo...@yahoo.com> wrote:
>
> > You seem to be claiming some insight into what tux21b was trying to do
> > with his post, so:
>
> > - please tell me what he was trying to achieve when he wrote
> > "Unfortunately, the benchmarks are biased."
>
> I'm only guessing, but I suspect that the regex test used in the
> shootout works well with backtracking implementations.
>
> However, backtracking is worst case exponential. The very first
> diagram inhttp://swtch.com/~rsc/regexp/regexp1.htmlshows that PCRE
> performance can be many orders of magnitude worse than an NFA
> implementation. Note that PCRE's time is measured in seconds, NFA's in
> micros.
>
> It would be trivial to construct a test case where PCRE would be 1000x
> slower than Go's regexp, even though PCRE is written in C. "Biased"
> may be too strong a word -- I don't think there is malicious intent --
> but the regexp shootout does not contain such a test.
>
> > - please tell me what "prior knowledge and experience" could give
> > someone the impression that "the Go version which is used on the
> > language shootout page is pretty old" when the fact of the matter is
> > that the Go measurements have been updated 50 times in the 2 years 2
> > months that Go has been included.
>
> Again, for me, the term FUD connotates intentional deception, and I'd
> presume that this was simply an honest mistake.


I don't know anything about you, if I start publishing negative
information about you - information that I could easily confirm is
false if I bothered to check - would that be "simply an honest
mistake"?

Nigel Tao

unread,
Feb 2, 2012, 10:41:15 PM2/2/12
to Isaac Gouy, golang-nuts
On 3 February 2012 14:05, Isaac Gouy <igo...@yahoo.com> wrote:
> I don't know anything about you, if I start publishing negative
> information about you - information that I could easily confirm is
> false if I bothered to check - would that be "simply an honest
> mistake"?

My real estate agent sent me a letter earlier this week with my name
misspelled. This was easily confirmable as false, yet I still believe
that it was an honest mistake.

Kyle Lemons

unread,
Feb 2, 2012, 11:52:25 PM2/2/12
to Isaac Gouy, golang-nuts
I don't know anything about you, if I start publishing negative
information about you - information that I could easily confirm is
false if I bothered to check - would that be "simply an honest
mistake"?

I find that almost all interactions (on the internet or otherwise) go more smoothly when you give the other party the benefit of the doubt.  It's just the neighborly thing to do.

Isaac Gouy

unread,
Feb 3, 2012, 2:29:11 AM2/3/12
to golang-nuts
Why didn't you answer my question?

tux21b

unread,
Feb 3, 2012, 2:47:45 AM2/3/12
to golan...@googlegroups.com, Isaac Gouy
WTF? 10 new posts about me trying to trick you? No way!

I admit that I haven't compared the C version in Go's shootout directory with the best
C version from the shootout. I simply assumed that the C version in the repository will
be one of the better ones... Obviously, I was mistaken. Sorry for that.

Also, English isn't my mother tongue, so "biased" might be indeed a too strong word,
but I was indeed referring to the O(n) vs. O(e^n) characteristics which are imho more
important. Many thanks Nigel for getting this straight.

And about the topic itself (is anyone still interested in it? 0o) my message was mainly
that the Go compiler doesn't do a lot of optimization yet (but that's getting better,
just run the benchmark with different weeklies) and that I don't think that there are any
fundamental differences between Go's regexp and re2, because both have been written
by Russ Cox, who has a very good knowledge about regular expressions. I can still
recommend you to read all of his regex articles, which have been linked several times
now.

DisposaBoy

unread,
Feb 3, 2012, 2:51:25 AM2/3/12
to golan...@googlegroups.com
please just let the thread die. we don't need this kind of negativity here...

Andy Balholm

unread,
Feb 3, 2012, 11:27:46 AM2/3/12
to golan...@googlegroups.com, Isaac Gouy
Some users expected a big jump in performance from the old regular expression package to the new one, but actually the performance was about the same. The main difference between the two is that the new package supports many more features in its regular expression syntax.

No one ever bothered to optimize the old regexp package, because it was a temporary stopgap. The new package has algorithms that were carefully selected for consistent performance, but I don't think a lot of micro-optimization has been done on it yet. As I recall, the changeover was made as soon as the new package was feature-complete and debugged. Since then I've seen very little activity on the mailing list related to how to optimize it. Maybe there's still room to speed it up from consistently decent performance to consistently very good performance.

It will probably never win the benchmarks game—not because it's written in Go rather than C, but because it can't be a bulldozer and a Corvette at the same time. PCRE is a Corvette; it goes really fast when the conditions are good, but it's easy to bog it down. The Go regexp package is a bulldozer; no matter what terrain (horribly complicated regular expressions) you put it on, it just keeps right on going and pushes its way through.


Isaac Gouy

unread,
Feb 3, 2012, 4:38:50 PM2/3/12
to golang-nuts
On Feb 2, 11:47 pm, tux21b <tux...@gmail.com> wrote:
> WTF? 10 new posts about me trying to trick you? No way!
>
> I admit that I haven't compared the C version in Go's shootout directory
> with the best
> C version from the shootout. I simply assumed that the C version in the
> repository will
> be one of the better ones... Obviously, I was mistaken. Sorry for that.

Apology accepted.


> Also, English isn't my mother tongue, so "biased" might be indeed a too
> strong word

Apology accepted.


You also claimed "The Go version which is used on the language
shootout page is pretty old" - that's not true, why did you claim that?

Kyle Lemons

unread,
Feb 3, 2012, 5:00:03 PM2/3/12
to Isaac Gouy, golang-nuts
You also claimed "The Go version which is used on the language
shootout page is pretty old" - that's not true, why did you claim that?

Possibly because the version of the programs no longer runs cleanly?  That indicates to me that it's in need of some love.


Isaac Gouy

unread,
Feb 3, 2012, 6:42:56 PM2/3/12
to golang-nuts
On Feb 3, 2:00 pm, Kyle Lemons <kev...@google.com> wrote:
> > You also claimed "The Go version which is used on the language
> > shootout page is pretty old" - that's not true, why did you claim that?
>
> Possibly because the version of the programs no longer
> runs<http://shootout.alioth.debian.org/u32/performance.php?test=regexdna>
> cleanly?
> That indicates to me that it's in need of some love.


1) Please note the URL Johann Höchtl provided in the comment that
tux21b replied-to, and note that tux21b did read the relative
performance for those Go 6g programs that do in-fact run to
completion:

http://shootout.alioth.debian.org/u64q/program.php?test=regexdna&lang=go&id=1


2) Go 8g regex-dna program memory usage increased with Go 8g release.
2011-02-15 7463

N=500000
before: regex-dna #1 = 106992
after: regex-dna #1 = 133092
before: regex-dna #4 = 79336
after: regex-dna #4 = 91644

N=5000000
before: regex-dna #1 = 2109212
after: regex-dna #1 = out of memory
before: regex-dna #4 = 1961848
after: regex-dna #4 = out of memory


3) Please understand that when I ask tux21b directly for an
explanation of why tux21b did something, the answer to my question is
beyond your ken.

Isaac Gouy

unread,
Feb 4, 2012, 11:00:26 AM2/4/12
to golang-nuts
Someone didn't think it was okay to post the following to golang-nuts
but for some reason they thought it was okay to email it to me -

> Hey,
>
> Maybe you're having a bad day or something, but you're being a pretty
> major asshole on a public mailing list. It might be to everyone's
> benefit if you bowed out of the conversation for a little while.
>
> Regards,
> Some guy.

DisposaBoy

unread,
Feb 4, 2012, 11:57:48 AM2/4/12
to golan...@googlegroups.com
that wasn't me but... please stop with the spamming, it's getting really annoying.
Reply all
Reply to author
Forward
0 new messages