Re: Use generated Neon version of MemCopy() on ARM, if platform supports it. (issue 12920009)

5 views
Skip to first unread message

rodolph....@gmail.com

unread,
Apr 30, 2013, 1:08:36 PM4/30/13
to olo...@chromium.org, da...@chromium.org, mvst...@chromium.org, ha...@chromium.org, ul...@chromium.org, v8-...@googlegroups.com
comments for the neon code.

We have a similar patch in the work (stalled for now), let us know if you
are
interested in merging our efforts.


https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc
File src/arm/codegen-arm.cc (right):

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode124
src/arm/codegen-arm.cc:124: static const int kCacheLineSize = 64;
This is true on A8 and A15 but not A9.

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode155
src/arm/codegen-arm.cc:155: ASSERT(OS::kMinComplexMemCopy >= 16);
STATIC_ASSERT

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode173
src/arm/codegen-arm.cc:173: __ ldrb(lr, MemOperand(r1, 1, PostIndex),
cs);
use ldrh instead of 2 ldrb.

Same for stores.

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode179
src/arm/codegen-arm.cc:179: __ vld4(8, r1, d0, element_0, Writeback);
I am not sure why you are using vld4. Currently you are loading a byte
into d0[0], the next byte into d1[0], etc. So you are touching 4 double
registers to load 4 bytes. If you want to load a word then use ldr. It
also saves you implementing vld4 in the assembler/disasm/simulator.

If you want to avoid mixing arm/neon then you should also get rid of he
ldrb/strb above.

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode226
src/arm/codegen-arm.cc:226: __ b(&has32, hs);
If I followed correctly when you enter the has32 block, you know that 32
<= r2 < 64, so this branch will never be taken.

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode244
src/arm/codegen-arm.cc:244: __ b(&skip_copy4, ge);
ge implies N flag == V flag, shift with SetCC don't touch the overflow
flag so you probably meant to use pl (N clear).

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode251
src/arm/codegen-arm.cc:251: __ ldrb(lr, MemOperand(r1, 1, PostIndex),
cs);
ldrh, then strh below.

https://codereview.chromium.org/12920009/diff/23001/src/arm/codegen-arm.cc#newcode257
src/arm/codegen-arm.cc:257: __ bx(lr);
You can combine both operations above with:
__ pop(pc);

https://codereview.chromium.org/12920009/
Reply all
Reply to author
Forward
0 new messages