code review 6118049: bytes: add assembly version of Equal for ARM (issue 6118049)

74 views
Skip to first unread message

da...@cheney.net

unread,
Apr 24, 2012, 11:38:55 PM4/24/12
to golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golan...@googlegroups.com,

I'd like you to review this change to
https://code.google.com/p/go


Description:
bytes: add assembly version of Equal for ARM

BenchmarkEqual32 662 159 -75.98%
BenchmarkEqual4K 76545 13719 -82.08%
BenchmarkEqual4M 90136700 23588870 -73.83%
BenchmarkEqual64M 2147483647 1419616000 -42.63%

BenchmarkEqual32 48.32 201.15 4.16x
BenchmarkEqual4K 53.51 298.56 5.58x
BenchmarkEqual4M 46.53 177.81 3.82x
BenchmarkEqual64M 27.12 47.27 1.74x

Please review this at http://codereview.appspot.com/6118049/

Affected files:
M src/pkg/bytes/asm_arm.s


Index: src/pkg/bytes/asm_arm.s
===================================================================
--- a/src/pkg/bytes/asm_arm.s
+++ b/src/pkg/bytes/asm_arm.s
@@ -27,4 +27,30 @@
RET

TEXT ·Equal(SB),7,$0
- B ·equalPortable(SB)
+ MOVW aptr+0(FP), R0
+ MOVW alen+4(FP), R1
+ MOVW bptr+12(FP), R2
+ MOVW blen+16(FP), R3
+
+ CMP R1, R3 // unequal lengths are not equal
+ B.NE _notequal
+
+ ADD R0, R1 // end
+
+_next:
+ CMP R0, R1
+ B.EQ _equal // reached the end
+ MOVBU.P 1(R0), R4
+ MOVBU.P 1(R2), R5
+ CMP R4, R5
+ B.EQ _next
+
+_notequal:
+ MOVW $0, R0
+ MOVW R0, equal+24(FP)
+ RET
+
+_equal:
+ MOVW $1, R0
+ MOVW R0, equal+24(FP)
+ RET


da...@cheney.net

unread,
Apr 24, 2012, 11:41:09 PM4/24/12
to golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
The benchmark numbers above are incorrect due to the allocator
influencing the benchmark time. I have made a possible fix to the
benchmark (at the cost of a 128mb static allocation).

http://codereview.appspot.com/6116048

Here is the same test with the new becnhmark code

BenchmarkEqual32 670 152 -77.31%
BenchmarkEqual4K 76629 12306 -83.94%
BenchmarkEqual4M 78468300 12575380 -83.97%
BenchmarkEqual64M 1284210000 201596100 -84.30%

BenchmarkEqual32 47.75 209.76 4.39x
BenchmarkEqual4K 53.45 332.83 6.23x
BenchmarkEqual4M 53.45 333.53 6.24x
BenchmarkEqual64M 52.26 332.89 6.37x

http://codereview.appspot.com/6118049/

zhai

unread,
Apr 25, 2012, 12:31:12 AM4/25/12
to da...@cheney.net, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Here is a  slight change to your code, re-ranged instructions,  maybe save 2-3 cycles in the inner loop. 5a compiles passed, but I don't have a arm box to test it.

TEXT ·Equal(SB),7,$0
MOVW aptr+0(FP), R0
MOVW alen+4(FP), R1
MOVW bptr+12(FP), R2
MOVW blen+16(FP), R3
CMP R1, R3 // unequal lengths are not equal
B.NE _notequal

CMP $0, R1
B.EQ _equal // lengths  = 0
MOVBU.NE.P 1(R0), R4
MOVBU.NE.P 1(R2), R5
_next:
CMP R4, R5
B.NE    _notequal
SUB.S   $1, R1
MOVBU.NE.P 1(R0), R4
MOVBU.NE.P 1(R2), R5
B.NE _next

_equal:
MOVW $1, R0
MOVW R0, equal+24(FP)
RET

_notequal:
MOVW $0, R0
MOVW R0, equal+24(FP)
RET

Dave Cheney

unread,
Apr 25, 2012, 2:13:03 AM4/25/12
to zhai, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Thanks for the suggestion, again the cost of those conditional
instructions appears to be slightly higher.

PASS
BenchmarkEqual32 10000000 163 ns/op 195.65 MB/s
BenchmarkEqual4K 100000 14388 ns/op 284.67 MB/s
BenchmarkEqual4M 100 14700010 ns/op 285.33 MB/s
BenchmarkEqual64M 5 234979400 ns/op 285.59 MB/s

da...@cheney.net

unread,
Apr 26, 2012, 5:13:08 PM4/26/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, re...@codereview-hr.appspotmail.com
Please take another look.

http://codereview.appspot.com/6118049/

da...@cheney.net

unread,
Apr 29, 2012, 7:32:09 PM4/29/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, re...@codereview-hr.appspotmail.com

ia...@golang.org

unread,
Apr 30, 2012, 7:32:06 PM4/30/12
to da...@cheney.net, golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, re...@codereview-hr.appspotmail.com

http://codereview.appspot.com/6118049/diff/4001/src/pkg/bytes/asm_arm.s
File src/pkg/bytes/asm_arm.s (right):

http://codereview.appspot.com/6118049/diff/4001/src/pkg/bytes/asm_arm.s#newcode30
src/pkg/bytes/asm_arm.s:30: MOVW aptr+0(FP), R0
I would move the loads of aptr and bptr to below the comparison of alen
and blen.

http://codereview.appspot.com/6118049/diff/4001/src/pkg/bytes/asm_arm.s#newcode35
src/pkg/bytes/asm_arm.s:35: CMP R1, R3 // unequal lengths are not equal
I think I make it a little smaller, along the lines of:

movw $0,r6
cmp r1,r3
b.ne _ret

add r0,r1
b _start

_next:
movbu.p 1(r0),r5
movbu.p 1(r2),r5
cmp r4,r5
b.ne _ret

_start:
cmp r0,r1
b.ne _next

movw $1,r6

_ret:
movw r6,equal+24(fp)
ret

http://codereview.appspot.com/6118049/

da...@cheney.net

unread,
Apr 30, 2012, 10:21:42 PM4/30/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Thank you for your review ian. I've tweaked it a bit more to incorporate
as many of suggestions as possible.

BenchmarkEqual32 10000000 158 ns/op 202.35
MB/s
BenchmarkEqual4K 100000 13683 ns/op 299.33
MB/s
BenchmarkEqual4M 100 12598580 ns/op 332.92
MB/s
BenchmarkEqual64M 5 201489200 ns/op 333.06
MB/s
On 2012/04/30 23:32:06, iant wrote:
> I would move the loads of aptr and bptr to below the comparison of
alen and
> blen.

Done. That shaved another 9ns (and similar %'s) from Equal32.

http://codereview.appspot.com/6118049/diff/4001/src/pkg/bytes/asm_arm.s#newcode35
src/pkg/bytes/asm_arm.s:35: CMP R1, R3 // unequal lengths are not equal
Hmm, turned out to be slower. I think it is the forward branch to
_start.

BenchmarkEqual32 10000000 259 ns/op 123.49
MB/s
BenchmarkEqual4K 100000 28641 ns/op 143.01
MB/s
BenchmarkEqual4M 50 29295640 ns/op 143.17
MB/s
BenchmarkEqual64M 5 469024800 ns/op 143.08
MB/s

da...@cheney.net

unread,
Apr 30, 2012, 10:22:00 PM4/30/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

da...@cheney.net

unread,
Apr 30, 2012, 10:27:57 PM4/30/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Patch set 6 turned out to regress, here are the performance numbers for
patch set 7.

BenchmarkEqual32 10000000 150 ns/op 212.15
MB/s
BenchmarkEqual4K 200000 12314 ns/op 332.60
MB/s
BenchmarkEqual4M 100 12597660 ns/op 332.94
MB/s
BenchmarkEqual64M 5 201660200 ns/op 332.78
MB/s

http://codereview.appspot.com/6118049/

ia...@golang.org

unread,
May 1, 2012, 1:43:16 AM5/1/12
to da...@cheney.net, golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

da...@cheney.net

unread,
May 1, 2012, 6:56:20 PM5/1/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Thank you to everyone for your comments. Once there are a few more
builders reporting in I will submit this change.

http://codereview.appspot.com/6118049/

nige...@golang.org

unread,
May 1, 2012, 7:07:56 PM5/1/12
to da...@cheney.net, golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

Dave Cheney

unread,
May 1, 2012, 7:08:16 PM5/1/12
to da...@cheney.net, golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, nige...@golang.org, re...@codereview-hr.appspotmail.com
Will fix

da...@cheney.net

unread,
May 1, 2012, 7:14:48 PM5/1/12
to golan...@googlegroups.com, qyz...@gmail.com, minu...@gmail.com, r...@golang.org, ia...@golang.org, nige...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
gofmt has made me lazy, i'll be more careful to check the formatting in
the future.
On 2012/05/01 23:07:56, nigeltao wrote:
> Mixed spaces and tabs after "CMP".

Done.

http://codereview.appspot.com/6118049/
Reply all
Reply to author
Forward
0 new messages