Re: code review 5536082: bytes: Applied KMP algorithm for bytes count. (issue 5536082)

36 views
Skip to first unread message

郭晓峰

unread,
Jan 20, 2012, 2:40:59 PM1/20/12
to golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Hi there,

Sorry to bother. I don't think the CL is sophisticated for reviewing,
still need some changes. But have some questions related to the CL:

- Why benchmarkCount() doesn't work as I expected? Looks the Count()
code is optimized out. I need them to get a k for KMP, like below:
if (len(sep) < k) {
if s[i:i+len(sep)] == sep {
//...
}
} else {
CountKMP(...)
}

And below is the adjusted result similar to I posted yesterday. The
result for BenchmarkCountSepSize<i>() is constant as I mentioned
above:

$ gotest -bench=BenchmarkCount
rm -f _test/bytes.a
rm -f _test/bytes.agopack grc _test/bytes.a _gotest_.6 asm_amd64.6
PASSbytes_test.BenchmarkCount32 5000000 484 ns/op
66.04 MB/s
bytes_test.BenchmarkCount4K 50000 40479 ns/op
101.19 MB/sbytes_test.BenchmarkCount4M 50
41046900 ns/op 102.18 MB/sbytes_test.BenchmarkCount64M
5 662377800 ns/op 101.32 MB/s
bytes_test.BenchmarkCountOld32 5000000 746 ns/op
42.89 MB/s
bytes_test.BenchmarkCountOld4K 10000 115871 ns/op
35.35 MB/s
bytes_test.BenchmarkCountOld4M 20 118981700 ns/op
35.25 MB/s
bytes_test.BenchmarkCountOld64M 1 1902782000 ns/op
35.27 MB/s
bytes_test.BenchmarkCountEasy32 10000000 237 ns/op
134.90 MB/s
bytes_test.BenchmarkCountEasy4K 500000 7156 ns/op
572.33 MB/s
bytes_test.BenchmarkCountEasy4M 500 7171258 ns/op
584.88 MB/s
bytes_test.BenchmarkCountEasy64M 20 115735250
ns/op 579.85 MB/s
bytes_test.BenchmarkCountOldEasy32 50000000 68.7
ns/op 465.97 MB/s
bytes_test.BenchmarkCountOldEasy4K 5000000 336
ns/op 12173.58 MB/s
bytes_test.BenchmarkCountOldEasy4M 10000 285756
ns/op 14677.91 MB/s
bytes_test.BenchmarkCountOldEasy64M 500 7516010
ns/op 8928.79 MB/s
bytes_test.BenchmarkCountSepSize1 500 3528376 ns/op
bytes_test.BenchmarkCountSepSize2 500 4117214 ns/op
bytes_test.BenchmarkCountSepSize3 500 4095292 ns/op
bytes_test.BenchmarkCountSepSize4 500 4088852 ns/op
bytes_test.BenchmarkCountSepSize5 500 4091520 ns/op
bytes_test.BenchmarkCountSepSize6 500 4088948 ns/op
bytes_test.BenchmarkCountSepSize7 500 4091320 ns/op
bytes_test.BenchmarkCountSepSize8 500 4098144 ns/op
bytes_test.BenchmarkCountSepSize9 500 4088538 ns/op
bytes_test.BenchmarkCountSepSize10 500 4089418 ns/op

Best Regards,
Xiaofeng

On Fri, Jan 20, 2012 at 11:30 AM, <lam...@gmail.com> wrote:
> Reviewers: golang-dev_googlegroups.com,
>
> Message:
> Hello golan...@googlegroups.com (cc: golan...@googlegroups.com,
> golan...@googlegroups.com),
>
> I'd like you to review this change to
> https://go.googlecode.com/hg/
>
>
> Description:
> bytes: Applied KMP algorithm for bytes count.
>
> Related discussion thread:
> https://groups.google.com/group/golang-nuts/browse_thread/thread/37b126bd62a7dfcb
>
> Please review this at http://codereview.appspot.com/5536082/
>
> Affected files:
>  M src/pkg/bytes/bytes.go
>  M src/pkg/bytes/bytes_test.go
>
>

count-benchmark.log

lam...@gmail.com

unread,
Jan 20, 2012, 2:30:14 PM1/20/12
to golan...@googlegroups.com, golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

lam...@gmail.com

unread,
Jan 20, 2012, 2:34:53 PM1/20/12
to golan...@googlegroups.com, golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

lam...@gmail.com

unread,
Jan 21, 2012, 12:41:29 AM1/21/12
to golan...@googlegroups.com, brad...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Hi there,

I modified some code today, looks the result isn't that promising. The
result is attached in another CL: http://codereview.appspot.com/5554080,
I used "<MaxNInArray>.log" as the file name. (For example, 5.log means
MaxNInArray = 5 in the benchmark test.

Bytes "s" and "sep" are randomly generated, that means if MaxNInArray
increases, the possibility of "sep" in "s" is decreased. So, "sep == s"
don't need to test a lot of chars. (In most cases, 2 chars are enough)
So, most of the result can't provide promising result that KMP can
provide better result for bytes.Count().

The case I investigated yesterday is a misleading one, it tries
bytes.Count("0000...0x", "000000x"), which is a worst case for "sep ==
s", and we can see about 4-times speed-up there.

But anyway, the result is interesting, maybe we can involve Knuth for
discussing... :-)

Best Regards,
Xiaofeng

> Best Regards,
> Xiaofeng

> On Fri, Jan 20, 2012 at 11:30 AM, <mailto:lam...@gmail.com> wrote:
> > Reviewers: http://golang-dev_googlegroups.com,
> >
> > Message:
> > Hello mailto:golan...@googlegroups.com (cc:
mailto:golan...@googlegroups.com,
> > mailto:golan...@googlegroups.com),


> >
> > I'd like you to review this change to
> > https://go.googlecode.com/hg/
> >
> >
> > Description:
> > bytes: Applied KMP algorithm for bytes count.
> >
> > Related discussion thread:
> >

> > &nbsp;M src/pkg/bytes/bytes.go
> > &nbsp;M src/pkg/bytes/bytes_test.go
> >
> >

http://codereview.appspot.com/5536082/

Russ Cox

unread,
Jan 24, 2012, 3:41:22 PM1/24/12
to lam...@gmail.com, golan...@googlegroups.com, brad...@golang.org, re...@codereview-hr.appspotmail.com
Hi.

I think we should leave bytes.Count alone for now.
The algorithm can be made faster in some cases,
but they are not that common (usually the substring
being counted is very short). Adding the allocation
here slows down those cases and changes the
performance profile of bytes.Count in an important way.
And if the substring is long, that []int is going to be
very large, so the memory cost will be significant.

I encourage you to put your KMP code into an
external repository that can be installed using goinstall
(or, equivalently, the new 'go get').

In a separate package, it would be appropriate to have
a separate type representing the pre-computed []int,
and that data could be reused for multiple searches.

Thanks.
Russ

Reply all
Reply to author
Forward
0 new messages