Not-KMP algorithm in string Count()?

94 views
Skip to first unread message

郭晓峰

unread,
Jan 18, 2012, 11:54:05 PM1/18/12
to golan...@googlegroups.com
Hi there,

On studying go lang, and found the code snippet as below in L56-L61 in
strings.go:

for i := 0; i+l <= len(s); i++ {
if s[i] == c && s[i:i+l] == sep {
n++
i += l - 1
}
}

Looks "s[i:i+l] == sep" isn't that efficient, and the code may be
called very frequently. Would like to know the eng-thought on this.

Greatly appreciate replies!

Best Regards,
Xiaofeng

Brad Fitzpatrick

unread,
Jan 19, 2012, 12:22:22 AM1/19/12
to 郭晓峰, golan...@googlegroups.com
Is the question whether s[i:i+1] == sep is efficient, or whether a better algorithm as a whole would be better?

郭晓峰

unread,
Jan 19, 2012, 1:08:03 AM1/19/12
to Brad Fitzpatrick, golan...@googlegroups.com
Hi Brad,

I mean for comparing strings, the algorithm is O(len(s) * len(sep)),
which is worse than KMP algorithm with O(len(s) + len(sep)).

I am not sure whether there are any concerns for using current implementation.

Best Regards,
Xiaofeng

Evan Shaw

unread,
Jan 19, 2012, 1:13:31 AM1/19/12
to 郭晓峰, golan...@googlegroups.com
For long strings, there are better algorithms, but most strings are
relatively short. The constant factors on fancy algorithms would hurt
the common case.

It might be reasonable to use different strategies depending on string
size. Emphasis on "might".

- Evan

Brad Fitzpatrick

unread,
Jan 19, 2012, 1:16:37 AM1/19/12
to Evan Shaw, 郭晓峰, golan...@googlegroups.com
Agreed.

Xiaofeng, feel free to contribute a KMP version, but please include benchmarks with varying sized strings as well as varying sized substrings.

In any case, I can't think of anywhere where Count is a bottleneck, but faster could be nicer.

郭晓峰

unread,
Jan 19, 2012, 2:18:40 AM1/19/12
to Brad Fitzpatrick, Evan Shaw, golan...@googlegroups.com
Sure. Will do that and send out for review later.

Think need to create benchmark for distance of "sep", and find a
balance to use "sep == s[i:i+l]" and KMP.

Best Regards,
Xiaofeng

郭晓峰

unread,
Jan 20, 2012, 1:50:20 AM1/20/12
to Brad Fitzpatrick, Evan Shaw, golan...@googlegroups.com
Hi Brad,

Found similar stuff in bytes package, and do some changes, below is
some benchmark result for existed Benchmark cases:

Current one:
bytes_test.BenchmarkCount32 5000000 731 ns/op
43.77 MB/s
bytes_test.BenchmarkCount4K 10000 113331 ns/op
36.14 MB/s
bytes_test.BenchmarkCount4M 20 116035850 ns/op
36.15 MB/s
bytes_test.BenchmarkCount64M 1 1880733000 ns/op
35.68 MB/s
bytes_test.BenchmarkCountEasy32 50000000 68.6 ns/op
466.18 MB/s
bytes_test.BenchmarkCountEasy4K 5000000 337 ns/op
12143.64 MB/s
bytes_test.BenchmarkCountEasy4M 10000 284531 ns/op
14741.11 MB/s
bytes_test.BenchmarkCountEasy64M 500 7451792
ns/op 9005.73 MB/s2™

Result with KMP support:
bytes_test.BenchmarkCount32 5000000 501 ns/op
63.77 MB/s
bytes_test.BenchmarkCount4K 50000 43069 ns/op
95.10 MB/s
bytes_test.BenchmarkCount4M 50 43907560 ns/op
95.53 MB/s
bytes_test.BenchmarkCount64M 5 703088600 ns/op
95.45 MB/s
bytes_test.BenchmarkCountEasy32 10000000 274 ns/op
116.45 MB/s
bytes_test.BenchmarkCountEasy4K 500000 7178 ns/op
570.63 MB/s
bytes_test.BenchmarkCountEasy4M 500 7187668 ns/op
583.54 MB/s
bytes_test.BenchmarkCountEasy64M 20 115955400
ns/op 578.75 MB/s

It's my first time to write for go lang, would like to make sure I
just need to follow instructions in the doc
(http://golang.org/doc/contribute.html), right?

The CL can be expected to be sent out tomorrow night. (Sorry, need
some time to familiar with hg and send out the CL) And would like to
discuss into details about how to balance the origin one and the new
one.

Best Regards,
Xiaofeng

Brad Fitzpatrick

unread,
Jan 20, 2012, 3:04:11 AM1/20/12
to 郭晓峰, Evan Shaw, golan...@googlegroups.com
Yes, http://golang.org/doc/contribute.html is the right instructions.
Reply all
Reply to author
Forward
0 new messages