Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

code review 5971058: regexp: implement backtracking with pcre fallback (issue 5971058)

1,111 views
Skip to first unread message

brad...@golang.org

unread,
Apr 1, 2012, 12:01:38 AM4/1/12
to golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golan...@googlegroups.com (cc: golan...@googlegroups.com),

I'd like you to review this change to
https://go.googlecode.com/hg/


Description:
regexp: implement backtracking with pcre fallback

Go's current regular expression package doesn't support all
popular patterns which programmers from other languages have
come to expect. Notably, the Go regexp package doesn’t
implement the useful backreference functionality. Without
that, users can't match "catcat" or "dogdog" with (cat|dog)\1
and would need to otherwise fall back to writing multiple
lines of code, possibly involving an "if" statement.

To permit more expressiveness than Go's strictly "regular"
regular expressions, this change permits falling back to the
wildly popular PCRE library when Go would previously return a
compile error.

This permits programmers to copy-paste common regular
expressions from forum sites, stackoverflow, etc. and rest
assured that they'll work as they would have in PHP or Ruby.

Go shouldn't let ideological fundamentalism about algorithmic
purity limit user adoption.

Thanks to Florian Wiemer for his work on the pcre package
which is now promoted to the core library.

This CL doesn't yet implement Perl's (?{ code }) construct. A
future CL will bring Campher into the core, permitting calling
into Perl from Go regular expressions.

Fixes issue 3451

Please review this at http://codereview.appspot.com/5971058/

Affected files:
M src/cmd/api/goapi.go
M src/pkg/go/build/deps_test.go
M src/pkg/regexp/exec.go
A src/pkg/regexp/fallback.go
M src/pkg/regexp/find_test.go
A src/pkg/regexp/pcre/pcre.go
M src/pkg/regexp/regexp.go


Michael Jones

unread,
Apr 1, 2012, 12:06:46 AM4/1/12
to brad...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
http://swtch.com/~rsc/regexp/regexp1.html
--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

a...@golang.org

unread,
Apr 1, 2012, 12:07:49 AM4/1/12
to brad...@golang.org, golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

http://codereview.appspot.com/5971058/diff/1002/src/pkg/regexp/fallback.go
File src/pkg/regexp/fallback.go (right):

http://codereview.appspot.com/5971058/diff/1002/src/pkg/regexp/fallback.go#newcode1
src/pkg/regexp/fallback.go:1: // +build !cmd_go_bootstrap
put the build line after the copyright

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/regexp.go
File src/pkg/regexp/regexp.go (right):

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/regexp.go#newcode68
src/pkg/regexp/regexp.go:68: var fallbackCompile func(expr string, err
error) (*Regexp, error)
why fallback and not just override? If we're linking in pcre we might as
well use it. it is the industry standard after all

http://codereview.appspot.com/5971058/

brad...@golang.org

unread,
Apr 1, 2012, 12:11:45 AM4/1/12
to golan...@googlegroups.com, a...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

http://codereview.appspot.com/5971058/diff/1002/src/pkg/regexp/fallback.go#newcode1
src/pkg/regexp/fallback.go:1: // +build !cmd_go_bootstrap

On 2012/04/01 04:07:49, adg wrote:
> put the build line after the copyright

Actually I was bit by this recently, when the copyright was too long.
Russ admitted it was a bit of a bug and recommended +build lines go
before the copyright.

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/regexp.go#newcode68
src/pkg/regexp/regexp.go:68: var fallbackCompile func(expr string, err
error) (*Regexp, error)

On 2012/04/01 04:07:49, adg wrote:
> why fallback and not just override? If we're linking in pcre we might
as well
> use it. it is the industry standard after all

It helps during bootstrapping, when cgo isn't available. That's about
the only time we need a pure-Go implementation.

http://codereview.appspot.com/5971058/

dsym...@golang.org

unread,
Apr 1, 2012, 12:58:55 AM4/1/12
to brad...@golang.org, golan...@googlegroups.com, a...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
I like the approach, but what about running both native and PCRE in
parallel? There are regular expressions where each is faster, and you
could just select between whichever one finishes first. Go is all about
parallel programming, right?

http://codereview.appspot.com/5971058/

Brad Fitzpatrick

unread,
Apr 1, 2012, 1:06:03 AM4/1/12
to Andrew Gerrand, Brad Fitzpatrick, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com, dsym...@golang.org

Interesting idea.

It could even learn over time (per-*Regexp) which is fastest.

nige...@golang.org

unread,
Apr 1, 2012, 1:08:40 AM4/1/12
to brad...@golang.org, golan...@googlegroups.com, a...@golang.org, dsym...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Nice start. Any chance of some benchmarks?


http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/fallback.go
File src/pkg/regexp/fallback.go (right):

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/fallback.go#newcode51
src/pkg/regexp/fallback.go:51: var buf [4]byte
s/4/utf8.UTFMax/

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/pcre/pcre.go
File src/pkg/regexp/pcre/pcre.go (right):

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/pcre/pcre.go#newcode26
src/pkg/regexp/pcre/pcre.go:26: // This package provides access to the
Perl Compatible Regular
s/This package/Package pcre/

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/pcre/pcre.go#newcode46
src/pkg/regexp/pcre/pcre.go:46: // package and the flags defined below,
see the PCRE documentation.
Give a link to the PCRE docs?

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/pcre/pcre.go#newcode101
src/pkg/regexp/pcre/pcre.go:101: // A reference to a compiled regular
expression.
Regexp is a reference etc.

http://codereview.appspot.com/5971058/diff/3001/src/pkg/regexp/pcre/pcre.go#newcode107
src/pkg/regexp/pcre/pcre.go:107: // Number of bytes in the compiled
pattern
pcresize returns the number of etc.

Similarly with func comments below.

http://codereview.appspot.com/5971058/

David Symonds

unread,
Apr 1, 2012, 1:09:53 AM4/1/12
to Brad Fitzpatrick, Andrew Gerrand, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
On Sun, Apr 1, 2012 at 3:06 PM, Brad Fitzpatrick <brad...@golang.org> wrote:

> It could even learn over time (per-*Regexp) which is fastest.

A little neural net would train quickly for something like this. We
could SWIG wrap some of the well-studied C++ NN codebases for this
purpose.

Rob 'Commander' Pike

unread,
Apr 1, 2012, 3:35:37 AM4/1/12
to David Symonds, Brad Fitzpatrick, Andrew Gerrand, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
LGTM

Jan Mercl

unread,
Apr 1, 2012, 4:28:05 AM4/1/12
to golan...@googlegroups.com, re...@codereview-hr.appspotmail.com, brad...@golang.org
On Sunday, April 1, 2012 6:01:38 AM UTC+2, Brad Fitzpatrick wrote:
I'd like you to review this change to
https://go.googlecode.com/hg/


Description:
regexp: implement backtracking with pcre fallback

Number on the wall
explained it all ;-)

Reply all
Reply to author
Forward
0 new messages