Re: Emit VMLA for multiply-add on ARM (issue 11293061)

33 views
Skip to first unread message

ha...@chromium.org

unread,
Nov 9, 2012, 7:17:28 PM11/9/12
to ul...@chromium.org, da...@chromium.org, ul...@google.com, v8-...@googlegroups.com
On 2012/11/07 09:54:03, ulan1 wrote:
> Looks good overall!

Thanks! I've uploaded a new patch. This passes all tests (except one, but
that
times out on bleeding_edge too). I also couldn't build and test it on MIPS,
because it seems that build is broken.

I've cleaned it up and think this is ready for a proper review now. I'll
look
into adding some new tests for it next week.


> > This confuses me, becaue the simulator just does:
> > set_d_register_from_double(vd, dd_value + dn_value * dm_value);
> Could it be that the simulator is using 80-bit registers to compute the
> expression and therefore is too precise?
> Maybe split it into two instructions? one computes the product and writes
> it
> into the register, and another computes the sum after reading from the
register.

Yes, this seems to fix it.


> > I'm also not entirely sure if this approach is the best.
> I not sure too, but putting it at a lower level would require support for
> instruction sequence matching, which we don't have yet.

One idea I had was to try to do this when we go from the hydrogen
representation
to lithium. That way it's easy to find the pattern. I changed the DoAdd
function
to detect that one operand is a multiplication and generate a multiply-add
there, but then I couldn't figure out how to prevent it from generating
code for
the multiplication as well.

> > Also, is it even ok to introduce a hydrogen instruction that only works
> with
> doubles?
> I think so.

OK, cool. I'll rename it to MultiplyAddD to make this more clear.


> > And I guess my #ifdef's in the hydrogen file is not ok.
> Yep, we need to implement vmla instruction for other architectures.

I don't know about availability of floating-point multiply-add instructions
on
the other architectures, but I guess we can just expand it to mul and add
when
we generate the assembly. I tried emitting two lithium instructions for one
hydrogen, but that obviously doesn't work.

I did this for mips too, but that build seems to be broken currently, so I
haven't verified that it works.


> Small nit: I noticed long lines in the code, we have 80 chars limits.

> https://codereview.chromium.org/11293061/diff/3001/src/arm/lithium-arm.cc
> File src/arm/lithium-arm.cc (right):


https://codereview.chromium.org/11293061/diff/3001/src/arm/lithium-arm.cc#newcode1321
> src/arm/lithium-arm.cc:1321: // FIXME: Not entirely sure this is how to do
it..
> This looks correct.

Great


> https://codereview.chromium.org/11293061/diff/3001/src/arm/lithium-arm.h
> File src/arm/lithium-arm.h (right):


https://codereview.chromium.org/11293061/diff/3001/src/arm/lithium-arm.h#newcode625
> src/arm/lithium-arm.h:625: class LMultiplyAdd: public
> LTemplateInstruction<1,
3,
> 0> { // FIXME: Do we need a tmp?
> > Do we need a tmp?
> Nope, we don't use tmp register in the implementation of the instruction.

Removed the comment.



https://codereview.chromium.org/11293061/diff/3001/src/arm/lithium-codegen-arm.cc
> File src/arm/lithium-codegen-arm.cc (right):


https://codereview.chromium.org/11293061/diff/3001/src/arm/lithium-codegen-arm.cc#newcode1293
> src/arm/lithium-codegen-arm.cc:1293: DwVfpRegister a =
> ToDoubleRegister(instr->a()); // FIXME: Not sure this is correct..
> Should we check for support of vmla here?
> Otherwise, this looks correct.

If I understand correctly, VMLA is supported in all versions of VFP, so I
don't
think we need to do any checks for support.

> https://codereview.chromium.org/11293061/diff/3001/src/hydrogen-instructions.h
> File src/hydrogen-instructions.h (right):


https://codereview.chromium.org/11293061/diff/3001/src/hydrogen-instructions.h#newcode3551
> src/hydrogen-instructions.h:3551: // HArithmeticBinaryOperation and
> HMathFloorOfDiv.
> It sets the output representation of this instruction.

Great. I've removed the comment.

> https://codereview.chromium.org/11293061/diff/3001/src/hydrogen.cc
> File src/hydrogen.cc (right):

> https://codereview.chromium.org/11293061/diff/3001/src/hydrogen.cc#newcode8327
> src/hydrogen.cc:8327: // FIXME: Probably need to check types here?
> Setting input representation should be sufficient. Hydrogen will introduce
> special change instructions for inputs.

OK.

https://codereview.chromium.org/11293061/

jkum...@chromium.org

unread,
Nov 12, 2012, 5:12:47 AM11/12/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, v8-...@googlegroups.com
Drive-by comment:
I don't think doing this at HGraph construction time is the right approach.
The
strongest reason for that is that the fused multiply-add depends on
instructions' representations -- which are not yet final at graph
construction
time, they are inferred later (to some extent right now, to much bigger
extent
when https://chromiumcodereview.appspot.com/10837165/ lands, and to much,
much
bigger extent in the medium-term future).

One way around this would be a separate Hydrogen phase, but that idea has
two
drawbacks: (1) an additional phase that traverses the entire graph does
have a
cost, especially when considering huge functions (hundreds of thousands of
instructions, and these do happen); and (2) Hydrogen is probably not the
right
place to put logic that has to do with platform-specific machine
instructions.

After discussing this with Sven and Ulan, we agreed that the best approach
is
probably to put this replacement step into the Lithium translation phase
(i.e.
lithium-arm.cc). When an HMul is visited that has a single use which is an
HAdd
that uses the HMul (1) as left input, or (2) as right input while the left
input
is not an HMul, just skip it (don't emit any lithium instruction for it).
When
an HAdd is visited where (1) the left input or (2) the right but not the
left
input is an HMul, emit an FMA instead instead of an addition.

https://codereview.chromium.org/11293061/

veg...@google.com

unread,
Nov 12, 2012, 12:12:53 PM11/12/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, v8-...@googlegroups.com
DBC for ia32/x64 unrelated to architecture


https://codereview.chromium.org/11293061/diff/12022/src/ia32/lithium-ia32.cc
File src/ia32/lithium-ia32.cc (right):

https://codereview.chromium.org/11293061/diff/12022/src/ia32/lithium-ia32.cc#newcode1360
src/ia32/lithium-ia32.cc:1360: LOperand* c =
UseRegisterAtStart(instr->c());
c is not used at start at all in the instruction pattern so this
constraint is incorrect.

AtStart means: no destructive operations are performed over registers
until the use.

I think it happens to work in expression like a * a + a only because
SameAsFirst accidentally creates artificial interference.

https://codereview.chromium.org/11293061/

ha...@chromium.org

unread,
Nov 13, 2012, 7:11:21 AM11/13/12
to ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, v8-...@googlegroups.com
On 2012/11/12 10:12:47, Jakob wrote:
> After discussing this with Sven and Ulan, we agreed that the best
> approach is
> probably to put this replacement step into the Lithium translation phase
> (i.e.
> lithium-arm.cc). When an HMul is visited that has a single use which is an
HAdd
> that uses the HMul (1) as left input, or (2) as right input while the left
input
> is not an HMul, just skip it (don't emit any lithium instruction for it).
> When
> an HAdd is visited where (1) the left input or (2) the right but not the
> left
> input is an HMul, emit an FMA instead instead of an addition.

Thanks! This makes sense. I've updated the patch to do the instruction
folding
during Lithium formation.

https://codereview.chromium.org/11293061/

ha...@chromium.org

unread,
Nov 13, 2012, 7:13:21 AM11/13/12
to ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, v8-...@googlegroups.com
Did this comment apply only to ia32/x64 (no longer part of the patch), or
does
it apply to ARM as well?

https://codereview.chromium.org/11293061/

sven...@chromium.org

unread,
Nov 13, 2012, 8:03:52 AM11/13/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, v8-...@googlegroups.com
Added a few style nits. @slava: Are the constraints in DoMultiplyAdd
correct?


https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.cc
File src/arm/lithium-arm.cc (right):

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.cc#newcode1361
src/arm/lithium-arm.cc:1361: if (instr->left()->IsMul()) {
Style issue: Please use a helper function here and below, it makes
things clearer, removing the need for the comment above.

LInstruction* LChunkBuilder::DoMultiplyAdd(HMul* mul, HInstruction*
summand) {
LOperand* a = UseRegisterAtStart(mul->left());
LOperand* b = UseRegisterAtStart(mul->right());
LOperand* c = UseRegister(summand);
return DefineSameAsFirst(new(zone()) LMultiplyAddD(c, a, b));
}

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h
File src/arm/lithium-arm.h (right):

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h#newcode627
src/arm/lithium-arm.h:627: LMultiplyAddD(LOperand* c, LOperand *a,
LOperand* b) {
Can we use something less cryptic for the operands like factor1, factor2
and summand? Or are a/b/c standard names in some ARM and/or GCC
documentation?

https://codereview.chromium.org/11293061/

jkum...@chromium.org

unread,
Nov 13, 2012, 8:05:21 AM11/13/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, veg...@google.com, sven...@chromium.org, v8-...@googlegroups.com
Yes, this approach looks much better.
https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.cc#newcode1370
src/arm/lithium-arm.cc:1370: if (instr->right()->IsMul()) {
Please add an ASSERT(!instr->left()->IsMul()) inside this block. That
situation is currently guaranteed, but for the benefit of future
refactorings I'd like to document that it is a necessity.

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h
File src/arm/lithium-arm.h (right):

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h#newcode627
src/arm/lithium-arm.h:627: LMultiplyAddD(LOperand* c, LOperand *a,
LOperand* b) {
I'd appreciate a comment here what the three operands are used for
("Computes c += a * b" or something like that).

Oh, and a nit: "LOperand* a"

https://codereview.chromium.org/11293061/

ha...@chromium.org

unread,
Nov 13, 2012, 9:59:58 AM11/13/12
to ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, sven...@chromium.org, v8-...@googlegroups.com
Thanks very much for the comments! New patch uploaded.
https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.cc#newcode1361
src/arm/lithium-arm.cc:1361: if (instr->left()->IsMul()) {
On 2012/11/13 13:03:52, Sven Panne wrote:
> Style issue: Please use a helper function here and below, it makes
things
> clearer, removing the need for the comment above.

> LInstruction* LChunkBuilder::DoMultiplyAdd(HMul* mul, HInstruction*
summand) {
> LOperand* a = UseRegisterAtStart(mul->left());
> LOperand* b = UseRegisterAtStart(mul->right());
> LOperand* c = UseRegister(summand);
> return DefineSameAsFirst(new(zone()) LMultiplyAddD(c, a, b));
> }

Done.

I noticed that you changed from UseRegisterAtStart to UseRegister for c.
Was that intentional? I'm still confused about what to use here. The
instruction does use the value of c, so it doesn't just need a register
that it can clobber.

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.cc#newcode1370
src/arm/lithium-arm.cc:1370: if (instr->right()->IsMul()) {
On 2012/11/13 13:05:21, Jakob wrote:
> Please add an ASSERT(!instr->left()->IsMul()) inside this block. That
situation
> is currently guaranteed, but for the benefit of future refactorings
I'd like to
> document that it is a necessity.

Done.

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h
File src/arm/lithium-arm.h (right):

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h#newcode627
src/arm/lithium-arm.h:627: LMultiplyAddD(LOperand* c, LOperand *a,
LOperand* b) {
On 2012/11/13 13:03:52, Sven Panne wrote:
> Can we use something less cryptic for the operands like factor1,
factor2 and
> summand? Or are a/b/c standard names in some ARM and/or GCC
documentation?

Done.

https://codereview.chromium.org/11293061/diff/8003/src/arm/lithium-arm.h#newcode627
src/arm/lithium-arm.h:627: LMultiplyAddD(LOperand* c, LOperand *a,
LOperand* b) {
On 2012/11/13 13:05:21, Jakob wrote:
> I'd appreciate a comment here what the three operands are used for
("Computes c
> += a * b" or something like that).

> Oh, and a nit: "LOperand* a"

Done.

https://codereview.chromium.org/11293061/

veg...@google.com

unread,
Nov 13, 2012, 10:47:47 AM11/13/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, sven...@chromium.org, v8-...@googlegroups.com
The comment applied to ia32/x64 only.
ARM's constraint looks correct because it is compiled to a single
instruction.

https://codereview.chromium.org/11293061/

jkum...@chromium.org

unread,
Nov 13, 2012, 10:50:18 AM11/13/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, veg...@google.com, sven...@chromium.org, v8-...@googlegroups.com

sven...@chromium.org

unread,
Nov 14, 2012, 3:28:36 AM11/14/12
to ha...@chromium.org, ul...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, v8-...@googlegroups.com

ul...@chromium.org

unread,
Nov 14, 2012, 3:34:54 AM11/14/12
to ha...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, sven...@chromium.org, v8-...@googlegroups.com
Thanks! LGTM, I will land it for you.

http://codereview.chromium.org/11293061/

ul...@chromium.org

unread,
Nov 14, 2012, 6:02:08 AM11/14/12
to ha...@chromium.org, da...@chromium.org, ul...@google.com, jkum...@chromium.org, veg...@google.com, sven...@chromium.org, v8-...@googlegroups.com
On 2012/11/14 08:34:54, ulan wrote:
> Thanks! LGTM, I will land it for you.

Landed as r12958.

https://chromiumcodereview.appspot.com/11293061/
Reply all
Reply to author
Forward
0 new messages