Just a follow up. SRE2 was written before Go 1 was released. I contacted Sam Thorogood, the author of SRE2, yesterday. He was very responsive and said he would spend a fun weekend to upgrade the code to Go 1.x sometime.
In the meanwhile, I read again the paper at
http://swtch.com/~rsc/regexp/regexp[1-3].html and the source of Go's "regexp". My understanding is that "regexp" implements Thompson's algorithm to match all machine threads in lock steps "char by char" without backtracking. After reading the code, it seems to me that not much can be done to the code according to what the paper says. I am wondering if someone already has ideas to improve "regexp"'s performance. Is adding a scheduler for the threads a good idea? What simple things could be done to make it perform closer to C++ RE2?
I downloaded Go's benchmark code from
http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexdna&lang=go&id=1 ran it and profiled it using pprof. I saw a lot of time was spent on runtime.memmove and runtime.copy. (slice resizing?) I am thinking that we might cut 30~40% of the execution time by removing these memory movements. Please see the pprof info and let me know if these movements were intended.
(pprof) top10
Total: 8604 samples
3062 35.6% 35.6% 5178 60.2% regexp.(*machine).add
2474 28.8% 64.3% 2474 28.8% runtime.memmove
1537 17.9% 82.2% 2195 25.5% regexp.(*machine).step
438 5.1% 87.3% 7469 86.8% regexp.(*machine).match
311 3.6% 90.9% 311 3.6% regexp/syntax.EmptyOpContext
241 2.8% 93.7% 1970 22.9% runtime.copy
173 2.0% 95.7% 173 2.0% regexp.(*machine).alloc
122 1.4% 97.1% 122 1.4% regexp.(*inputBytes).step
118 1.4% 98.5% 118 1.4% regexp/syntax.(*Inst).MatchRune
55 0.6% 99.2% 63 0.7% MHeap_AllocLocked
go/src/pkg/runtime/memmove_amd64.s
2474 2474 28.8% runtime.memmove
go/src/pkg/runtime/slice.c
. . 236: }
. . 237:
35 35 238: if(ret == 1 && width == 1) { // common case worth about 2x to do here
. . 239: *to.array = *fm.array; // known to be a byte pointer
. . 240: } else {
59 1788 241: runtime·memmove(to.array, fm.array, ret*width);
. . 242: }
. . 243: