Macro-op fusion and the ABI.


Rogier Brussee

Aug 2, 2017, 3:18:50 PM
to RISC-V ISA Dev
The RV ISA is spartan. Many instructions found in other ISAs are supposed to be covered by macro-op fusion, especially if they can be expressed with instructions in the compressed (C) extension. That is all well and good, but for the hardware to recognise a sequence of instructions and fuse it, exactly that sequence has to be generated.

This suggests that at least some macro-op fusion sequences should become part of the ABI, as official recommendations aimed in particular at compiler writers.
A natural implementation of this idea would be to define asm/linker macros that the compiler can use, effectively extending the ISA.

As an extra bonus, things could be specified such that a linker or loader can safely substitute the macro with a semantically equivalent sequence that is optimal for the particular microarchitecture, including substitution with a single instruction.

Example of what I have in mind:

macro:

ZEXT %rd %rs1 imm7

default expansion
slli %rd %rs1 -imm7; srli %rd %rd -imm7

legitimate implementations

microarchitecture 1 macro-op fuses everything with -32 <= imm7 < 31
microarchitecture 2 macro-op fuses only the compressed versions with %rd == %rs1
microarchitecture 3 macro-op fuses only for imm7 == 32
microarchitecture 4 just executes the two instructions
microarchitecture 5 has a special zextw instruction. The non-default asm macro expands to this instruction for imm7 == 32.
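
One way to read the list above: every microarchitecture produces the same result for the macro, and only the number of operations it issues differs. A minimal Python sketch of that reading follows. The function name `issued_ops` and the numbering of the cores are purely illustrative assumptions, and the conditions are copied from the list as written rather than from any spec.

```python
def issued_ops(uarch, rd, rs1, imm7):
    """Return how many operations a core issues for one ZEXT macro.

    Hypothetical cost model: 1 means the pair was fused (or replaced by a
    single instruction), 2 means both shifts execute separately.
    """
    if uarch == 1:
        fused = -32 <= imm7 < 31      # range as stated in the post
    elif uarch == 2:
        fused = rd == rs1             # only the compressed forms fuse
    elif uarch in (3, 5):
        fused = imm7 == 32            # word-sized zero extension only
    else:
        fused = False                 # plain two-instruction execution
    return 1 if fused else 2


for uarch in range(1, 6):
    print(uarch, issued_ops(uarch, "a0", "a0", 32))
```

The point of the sketch is that correctness never depends on the branch taken; a core that fuses nothing is still a valid implementation.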


Rogier





Michael Clark

Aug 2, 2017, 3:32:20 PM
to Rogier Brussee, RISC-V ISA Dev
I think zext.w pseudo to match the sext.w pseudo is a good idea.

The riscv gcc compiler metadata already has a pattern match expression called shift_shift that emits the zero extension sequence; however, it is currently informal, i.e. not defined in the pseudo-operations section of the specification.

It seems logical that we could define a pseudo called zext.w to match sext.w and change binutils to accept the macro.

Unfortunately, this is unlike the strict operands situation, where binutils already accepted the canonical forms as defined in the ISA specification. In this case gcc can’t emit zext.w unless the prerequisite binutils change is in place, so it would be a much longer-term change.

Adding zext.w as a pseudo-op would not change the frozen part of the ISA; rather, it would name the sequence which is already emitted by gcc, i.e. document the current pseudos. Adding it to the spec gives implementors a guaranteed way to detect zero extension and substitute it with a micro-op; metadata in a rename table is conceivably one possible implementation.
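
To illustrate what "detecting zero extension" could mean for a front end, here is a rough Python sketch that scans an instruction stream for the shift pair and replaces it with a single internal operation. The tuple layout, the internal `zext` op and the function name are assumptions made for illustration; a real core would do this in hardware on encoded instructions.

```python
from typing import List, Tuple

# (mnemonic, rd, rs1, immediate); a simplified stand-in for decoded instructions
Insn = Tuple[str, str, str, int]


def fuse_zero_extend(stream: List[Insn], xlen: int = 64) -> List[Insn]:
    """Replace `slli rd,rs,n ; srli rd,rd,n` with one internal zext op."""
    out = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if (nxt is not None
                and cur[0] == "slli" and nxt[0] == "srli"
                and nxt[1] == cur[1]      # both write the same register
                and nxt[2] == cur[1]      # second shift reads the first result
                and nxt[3] == cur[3]):    # equal shift amounts
            # The fused op keeps the low (xlen - shamt) bits of the source.
            out.append(("zext", cur[1], cur[2], xlen - cur[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


code = [("slli", "a0", "a1", 32), ("srli", "a0", "a0", 32), ("addi", "a2", "a0", 1)]
print(fuse_zero_extend(code))
```

The match only fires on the exact adjacent pair, which is why a canonical, documented sequence matters: any other way of writing the same zero extension passes through unfused.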

If there is agreement, I don’t mind submitting a patch to riscv-isa-manual and binutils. This is in essence a small change or refinement, whereby we are documenting the implementation.

We’d probably need to wait a year (or some sufficient amount of time for people to upgrade their binutils) before we make the change in gcc. Alternatively, it could be enabled by a configure check, assuming the gcc build detects the version of binutils it is being built against; however, that might make the metadata change trickier to implement.

The first step may just be to add the pseudo to the ISA manual, as it is technically already emitted.

Michael.

Rogier Brussee

Aug 2, 2017, 6:14:27 PM
to RISC-V ISA Dev, rogier....@gmail.com


On Wednesday, 2 August 2017 at 21:32:20 UTC+2, michaeljclark wrote:


> I think zext.w pseudo to match the sext.w pseudo is a good idea.
>
> The riscv gcc compiler metadata already has a pattern match expression called shift_shift that emits the zero extension sequence; however, it is currently informal, i.e. not defined in the pseudo-operations section of the specification.
> It seems logical that we could define a pseudo called zext.w to match sext.w and change binutils to accept the macro.
>
> Unfortunately, this is unlike the strict operands situation, where binutils already accepted the canonical forms as defined in the ISA specification. In this case gcc can’t emit zext.w unless the prerequisite binutils change is in place, so it would be a much longer-term change.
> Adding zext.w as a pseudo-op would not change the frozen part of the ISA; rather, it would name the sequence which is already emitted by gcc, i.e. document the current pseudos. Adding it to the spec gives implementors a guaranteed way to detect zero extension and substitute it with a micro-op; metadata in a rename table is conceivably one possible implementation.
>
> If there is agreement, I don’t mind submitting a patch to riscv-isa-manual and binutils. This is in essence a small change or refinement, whereby we are documenting the implementation.

Yes, I chose zext as a relatively uncontroversial example. A more controversial example would be a bswap.[hwd] macro. Obviously you can define byte swapping in terms of shift and OR instructions, but a B.bswap instruction is highly desirable. Depending on your point of view, a macro is either a bad idea that only adds complexity and link time, because the number of instructions is too large to be reasonably macro-op fused, or a useful compatibility layer. Anyway, I think that extensions that punt natural primitives to "can be done with macro-op fusion" should spec the fusion sequence and define a macro.
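
For a sense of why a byte swap is a harder fusion target than a zero extension, here is a small Python model of a 32-bit byte swap built only from shifts, masks and ORs. It is a sketch of the generic technique, not the sequence any particular compiler emits; written this naively it needs roughly ten operations instead of two.

```python
def bswap32(x):
    """Reverse the byte order of a 32-bit value using shifts, ANDs and ORs."""
    x &= 0xFFFFFFFF
    return (((x << 24) & 0xFF000000) |   # byte 0 -> byte 3
            ((x << 8) & 0x00FF0000) |    # byte 1 -> byte 2
            ((x >> 8) & 0x0000FF00) |    # byte 2 -> byte 1
            ((x >> 24) & 0x000000FF))    # byte 3 -> byte 0


print(hex(bswap32(0x11223344)))
```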

 
> We’d probably need to wait a year (or some sufficient amount of time for people to upgrade their binutils) before we make the change in gcc. Alternatively, it could be enabled by a configure check, assuming the gcc build detects the version of binutils it is being built against; however, that might make the metadata change trickier to implement.


Even for the I extension, other macro-ops were proposed. I would also like a neg macro, but that is just cosmetic.

 
> The first step may just be to add the pseudo to the ISA manual, as it is technically already emitted.
 

If there is some sort of consensus, it is a good idea, yes.

Ciao
Rogier

 
> Michael.

Michael Clark

Aug 2, 2017, 6:29:51 PM
to Rogier Brussee, RISC-V ISA Dev
I noticed while sending the email that you have imm7, so that zext has a bit offset. That makes sense as it could be used for 16-bit words on RV32 or RV64, etc. I still need to work on something to measure the frequencies of such patterns.

>> We’d probably need to wait a year (or some sufficient amount of time for people to upgrade their binutils) before we make the change in gcc. Alternatively, it could be enabled by a configure check, assuming the gcc build detects the version of binutils it is being built against; however, that might make the metadata change trickier to implement.
>
> Even for the I extension, other macro-ops were proposed. I would also like a neg macro, but that is just cosmetic.

There is a neg pseudo already:

neg rd,rs         sub rd,x0,rs

It won’t show in compiler output because gcc already knows how to negate using sub, but the pseudo should be present in binutils.

The same might be true for shift_shift. I’m not sure if it is worth trying to get gcc to emit it; nevertheless the pseudo makes sense, i.e. define the zero extension pattern in the list of pseudos and make sure other compilers emit the same pattern so it can be optimised by microarchitectures.
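
As a rough picture of what a pseudo table does inside an assembler, the Python sketch below expands pseudo-instructions into base instructions. `neg` and `sext.w` follow the expansions mentioned in this thread; the `zext.w` entry is hypothetical (it is the pseudo being proposed here, fixed to a 32-bit width on RV64), and the `expand` helper is illustrative rather than how binutils is actually structured.

```python
# Map each pseudo mnemonic to a function producing its base-ISA expansion.
PSEUDOS = {
    "neg":    lambda rd, rs: [f"sub {rd},x0,{rs}"],
    "sext.w": lambda rd, rs: [f"addiw {rd},{rs},0"],
    # Hypothetical entry: the proposed pseudo, assuming RV64.
    "zext.w": lambda rd, rs: [f"slli {rd},{rs},32", f"srli {rd},{rd},32"],
}


def expand(line):
    """Expand one assembly line; non-pseudo lines pass through unchanged."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic not in PSEUDOS:
        return [line.strip()]
    rd, rs = [op.strip() for op in operands.split(",")]
    return PSEUDOS[mnemonic](rd, rs)


for src in ("neg a0,a1", "zext.w a0, a1", "add a0,a1,a2"):
    print(src, "->", expand(src))
```

Because the table produces one fixed sequence per pseudo, every tool that goes through it emits the same pattern, which is the property a fusing microarchitecture relies on.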

Michael. 



Andrew Waterman

Aug 2, 2017, 6:44:16 PM
to Rogier Brussee, RISC-V ISA Dev
Adding assembler directives seems attractive for a few common idioms,
but the cross-product of fusable instructions is huge, and beyond the
scope of what the assembler should understand. Furthermore, the
patterns that should be fused will evolve over time. So while I agree
we should have canonical representations of idioms to guide compiler
writers and hardware implementors, I don't think the assembler is the
right place for that.

Michael Clark

Aug 2, 2017, 8:25:23 PM
to Andrew Waterman, Rogier Brussee, RISC-V ISA Dev

> On 3 Aug 2017, at 10:43 AM, Andrew Waterman <wate...@eecs.berkeley.edu> wrote:
>
> Adding assembler directives seems attractive for a few common idioms,
> but the cross-product of fusable instructions is huge, and beyond the
> scope of what the assembler should understand. Furthermore, the
> patterns that should be fused will evolve over time. So while I agree
> we should have canonical representations of idioms to guide compiler
> writers and hardware implementors, I don't think the assembler is the
> right place for that.

Yes. Agree.

Imagine if we had all_pairs(RV32I.ALU, RV32I.ALU) with 2 pipelined ALUs, and patterns for dependent ops that use a temporary with a very short live span. e.g.

<alu_op> r1,r2,r3
<alu_op> r1,r1,r4

However zext.w seems logical due to the presence of sext.w, perhaps then just as an addition to the pseudo instruction table in the ISA manual.

The question is whether to match the sext.w form or, as Rogier defined it, to take a variable-width immediate, given that the same pattern could be used for zero-extending a half word or byte. Only 32-bit values (or 64-bit for RV128) are sign-extended in the ISA, so I’m not sure of the frequency of the zero-extend pattern for other immediate values. We’ll have to work on some macro-op fusion histograms…
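
As a rough idea of what such a histogram could look like, the Python sketch below counts adjacent instruction pairs in which the second instruction overwrites the register the first one just produced and also reads it, i.e. the short-lived temporary pattern described above. The trace format and the function name are assumptions for illustration; a real measurement would work on decoded traces from a simulator or from compiler output.

```python
from collections import Counter


def fusion_histogram(trace):
    """Count candidate fusion pairs in a trace.

    trace: list of (mnemonic, rd, sources) tuples, sources being a tuple
    of register names. A pair counts when the second instruction reads
    and overwrites the destination of the first.
    """
    counts = Counter()
    for first, second in zip(trace, trace[1:]):
        op1, rd1, _ = first
        op2, rd2, srcs2 = second
        if rd1 == rd2 and rd1 in srcs2:
            counts[(op1, op2)] += 1
    return counts


trace = [
    ("slli", "a0", ("a1",)),
    ("srli", "a0", ("a0",)),
    ("add",  "a2", ("a0", "a3")),
    ("slli", "a4", ("a2",)),
    ("srli", "a4", ("a4",)),
]
print(fusion_histogram(trace).most_common())
```

Extending the key with the immediate value would separate word-sized zero extensions from byte and half-word ones, which is the frequency question raised above.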

Rogier Brussee

Aug 3, 2017, 5:13:24 AM
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, 3 August 2017 at 00:44:16 UTC+2, waterman wrote:
> Adding assembler directives seems attractive for a few common idioms,
> but the cross-product of fusable instructions is huge, and beyond the
> scope of what the assembler should understand.  Furthermore, the
> patterns that should be fused will evolve over time.  So while I agree
> we should have canonical representations of idioms to guide compiler
> writers and hardware implementors,

That was by far my main point. 

I only propose that a macro/pseudo-instruction be part of the spec if it is something that
SHOULD be implemented, because the compiler will take the performance (and/or power) implications
into account, but not something that MUST be implemented, because it MUST be true that
nothing breaks if you don't.
 
> I don't think the assembler is the
> right place for that.


Quite possibly, particularly in view of the changes required in binutils, but that would be another reason for making it part of the spec.

Macros seemed like a natural extension of the idea of letting the assembler swap out instructions for their compressed versions, and macros like call or la are already in use. A macro is also meant as a software abstraction that allows you to easily swap out the fusion sequence for another sequence better suited to the architecture, e.g. by implementing it with a non-standard instruction (example: Pulpino has a zext.w instruction).
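
The "swap out the sequence per architecture" idea can be sketched as a macro expander that takes the target as a parameter. The Python below is a minimal illustration for a half-word zero extension: the function name and the target strings are hypothetical, and the `p.exthz` mnemonic is simply the one quoted elsewhere in this thread, not checked against the Pulpino documentation.

```python
def expand_zext_h(rd, rs, target="generic", xlen=32):
    """Expand a hypothetical zext.h macro for the given target."""
    if target == "pulpino":
        # Single custom instruction; mnemonic as quoted in this thread.
        return [f"p.exthz {rd},{rs}"]
    # Portable fallback: shift the low 16 bits up, then shift them back down.
    shamt = xlen - 16
    return [f"slli {rd},{rs},{shamt}", f"srli {rd},{rd},{shamt}"]


for target in ("generic", "pulpino"):
    print(target, expand_zext_h("a0", "a1", target))
```

The compiler only ever emits the macro; whether the expansion happens in the assembler, linker or loader is exactly the open question in this thread.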

Rogier Brussee

Aug 3, 2017, 6:30:23 AM
to RISC-V ISA Dev, wate...@eecs.berkeley.edu, rogier....@gmail.com


On Thursday, 3 August 2017 at 02:25:23 UTC+2, michaeljclark wrote:

>> On 3 Aug 2017, at 10:43 AM, Andrew Waterman <wate...@eecs.berkeley.edu> wrote:
>>
>> Adding assembler directives seems attractive for a few common idioms,
>> but the cross-product of fusable instructions is huge, and beyond the
>> scope of what the assembler should understand.  Furthermore, the
>> patterns that should be fused will evolve over time.  So while I agree
>> we should have canonical representations of idioms to guide compiler
>> writers and hardware implementors, I don't think the assembler is the
>> right place for that.
>
> Yes. Agree.
>
> Imagine if we had all_pairs(RV32I.ALU, RV32I.ALU) with 2 pipelined ALUs, and patterns for dependent ops that use a temporary with a very short live span. e.g.
>
>         <alu_op> r1,r2,r3
>         <alu_op> r1,r1,r4


I should have been clearer that I only propose a spec/macro for things that would have been primitives were it not for the fact that they can be naturally expressed with fusion. If you can fuse all pairs of instructions of the above type as an optimisation, that is great. The compiler will need to know that you like pairs like this, but no macro is required.
 
> However zext.w seems logical due to the presence of sext.w, perhaps then just as an addition to the pseudo instruction table in the ISA manual.
> The question is whether to match the sext.w form or, as Rogier defined it, to take a variable-width immediate, given that the same pattern could be used for zero-extending a half word or byte. Only 32-bit values (or 64-bit for RV128) are sign-extended in the ISA, so I’m not sure of the frequency of the zero-extend pattern for other immediate values. We’ll have to work on some macro-op fusion histograms…


Zero and sign extension starting from an arbitrary bit position were proposed for the B extension, and then rejected as "can be done with macro-op fusion". Maybe there should be different macros, acknowledging that sign and zero extension from the basic sizes is much more common and possibly implemented differently.

sext.b rd rs1       slli rd rs1 -8 ; srai rd rd -8       # p.extbs rd rs1 on pulpino
zext.b rd rs1       andi rd rs1 0xFF

sext.h rd rs1       slli rd rs1 -16 ; srai rd rd -16     # p.exths rd rs1 on pulpino
zext.h rd rs1       slli rd rs1 -16 ; srli rd rd -16     # p.exthz rd rs1 on pulpino

sext.w rd rs1       addiw rd rs1 0
zext.w rd rs1       slli rd rs1 -32 ; srli rd rd -32     # p.extwz rd rs1 on an imaginary 64-bit pulpino

sext.d rd rs1       addid rd rs1 0
zext.d rd rs1       slli rd rs1 -64 ; srli rd rd -64     # p.extdz rd rs1 on an imaginary 128-bit pulpino

sexti rd rs1 imm7   slli rd rs1 -imm7 ; srai rd rd -imm7
zexti rd rs1 imm7   slli rd rs1 -imm7 ; srli rd rd -imm7
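
The expansions in the table can be sanity-checked with a small Python model of the shift instructions on RV64. The sketch assumes that a negative shift amount in the table is read modulo XLEN (so -8 means XLEN - 8); that reading is an interpretation of the notation, and all helper names are illustrative.

```python
XLEN = 64                      # assumed register width (RV64)
MASK = (1 << XLEN) - 1


def slli(x, n):
    """Shift left logical, truncated to XLEN bits."""
    return (x << n) & MASK


def srli(x, n):
    """Shift right logical on an XLEN-bit value."""
    return (x & MASK) >> n


def srai(x, n):
    """Shift right arithmetic: replicate the sign bit."""
    x &= MASK
    if x >> (XLEN - 1):        # sign bit set: reinterpret as negative
        x -= 1 << XLEN
    return (x >> n) & MASK


def addiw(x, imm):
    """32-bit add whose result is sign-extended to XLEN bits."""
    low = (x + imm) & 0xFFFFFFFF
    if low >> 31:
        return low | (MASK ^ 0xFFFFFFFF)
    return low


def sexti(x, width):
    n = (-width) % XLEN        # "-width" in the table's notation
    return srai(slli(x, n), n)


def zexti(x, width):
    n = (-width) % XLEN
    return srli(slli(x, n), n)


# zext.b via shifts agrees with the andi form; sext.w agrees with addiw.
print(hex(zexti(0x1FF, 8)), hex(0x1FF & 0xFF))
print(hex(sexti(0x80000000, 32)), hex(addiw(0x80000000, 0)))
```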

ahmad othman

Dec 21, 2021, 7:57:46 AM
to RISC-V ISA Dev, Rogier Brussee, waterman
Hi,
Does GCC now have this instruction fusion feature? If not, which files should I target to enable macro-op fusion? I know that GCC now supports fused multiply-add (FMA) instructions.

Philipp Tomsich

Dec 21, 2021, 9:39:42 AM
to ahmad othman, RISC-V ISA Dev, Rogier Brussee, waterman
Ahmad,

there's a patch on the list that adds instruction fusion for RISC-V (targeting the VT-1 fusion patterns):

Thanks,
Philipp.

MitchAlsup

Dec 28, 2021, 9:56:26 PM
to RISC-V ISA Dev, Philipp Tomsich, RISC-V ISA Dev, Rogier Brussee, waterman, ahmd...@gmail.com
May I suggest that instruction-fusing is a microarchitectural tool, and should probably not become visible at the ASCII level of assembly code.
<
{in my opinion = on}
There may be machines which do not have the register ports for 3-way <integer> addition.
There may be machines for which wider fusion is de rigueur.
The choice should be left to the implementation and not propagate up to the ISA.
{in my opinion = off}

Philipp Tomsich

Dec 29, 2021, 6:16:13 AM
to MitchAlsup, RISC-V ISA Dev, Rogier Brussee, waterman, ahmd...@gmail.com
Mitch,

Given that macro-op fusion usually considers consecutive instructions only, compilers need to consider fusion-pairs in their scheduling decisions (to keep fusion-pairs together) when scheduling for a core that supports fusion.  This will necessarily expose these pairs to the assembly level (as compilers communicate with assemblers via the ASCII level of assembly) due to them being put into consecutive assembly instructions.
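
A toy Python sketch of what "keeping fusion pairs together" can mean in a scheduler: a greedy list scheduler that respects register dependencies and, among the ready instructions, prefers one that forms a fusable pair with the instruction it just emitted. This illustrates the general idea only; the instruction format and helper names are assumptions, and it is not how GCC's scheduler is implemented.

```python
def depends(later, earlier):
    """True if `later` must stay after `earlier` (RAW, WAW or WAR hazard)."""
    _, rd_l, srcs_l = later
    _, rd_e, srcs_e = earlier
    return rd_e in srcs_l or rd_e == rd_l or rd_l in srcs_e


def schedule(insns, fusable):
    """insns: list of (mnemonic, rd, sources); fusable: set of (op1, op2)."""
    remaining = list(range(len(insns)))
    order = []
    while remaining:
        # Ready: no still-unscheduled earlier instruction it depends on.
        ready = [i for i in remaining
                 if not any(depends(insns[i], insns[j])
                            for j in remaining if j < i)]
        pick = ready[0]                      # default: keep program order
        if order:
            last = insns[order[-1]]
            for i in ready:
                cand = insns[i]
                if ((last[0], cand[0]) in fusable
                        and cand[1] == last[1] and last[1] in cand[2]):
                    pick = i                 # emit the pair back to back
                    break
        order.append(pick)
        remaining.remove(pick)
    return [insns[i] for i in order]


code = [
    ("slli", "a0", ("a1",)),
    ("add",  "a2", ("a3", "a4")),           # independent, sits between the pair
    ("srli", "a0", ("a0",)),
]
for insn in schedule(code, {("slli", "srli")}):
    print(insn)
```

In the example the independent `add` is moved after the shift pair, so the two shifts end up on consecutive lines of the assembly output, which is the visibility at the assembly level described above.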

I don't really understand what the ask of your comment is… maybe you could explain?

Thanks,
Philipp.

MitchAlsup

Dec 29, 2021, 11:39:22 AM
to RISC-V ISA Dev, Philipp Tomsich, RISC-V ISA Dev, Rogier Brussee, waterman, ahmd...@gmail.com, MitchAlsup
The thing is, macro-op fusion is something that should remain in the realm of implementation.
Much like the machine architecture specifies that integer arithmetic is 2's complement, the
architecture refrains from specifying the exact nature of how 2's complement arithmetic is
to be performed; {ripple carry, Manchester carry, carry select, Kogge-Stone,...}.
<
If you believe that there will be implementations that are small enough to not make use of
macro-instruction fusing, and you believe that there will be implementations large enough
that one needs to fuse more than 2 instructions, let the implementations decide; that is, don't
put stuff in the compiler that cannot be used across the whole spectrum.

Allen Baum

Dec 29, 2021, 2:15:37 PM
to MitchAlsup, RISC-V ISA Dev, Philipp Tomsich, Rogier Brussee, waterman, ahmd...@gmail.com
I have to push back on this. There are already many examples of micro-architectural flags that compilers recognize to make them generate code that is optimal for very specific microarchitectures, even within the same ISA family. There was a lawsuit by AMD that claimed that Intel compilers would deliberately not use those optimization flags on AMD processors, which hurt their competitiveness.

So there is more than ample precedent for implementation details "leaking" into compilers - and that's fine. 
If you use the wrong flag you'll get sub-optimal - but correct - code, and that's OK; it's still architecturally compliant, and that is the most important requirement.
