On 3 Mar 2017, at 5:20 PM, Michael Clark <michae...@mac.com> wrote:
> On 3 Mar 2017, at 5:12 PM, Sober Liu <sob...@nvidia.com> wrote:
> I am not sure I fully get your idea. Do you expect both code and data to be within a 32-bit range?
Yes. A signed 32-bit offset (about +-2 GiB), as per the AUIPC+JALR pair.
> And for "a direct PC relative jump", do you expect this for static libs rather than dynamic libs?
Yes. I am thinking about the static case. I need to analyse GOT offset calls is dynamic libs. Next…
On 3 Mar 2017, at 5:39 PM, Michael Clark <michae...@mac.com> wrote:
Sorry, I meant in dynamic libs. I am presently analysing vmlinux.

I will have to think about PLT stubs to GOT offsets and lazy resolution:

    1aec0: 00018e17  auipc t3, pc + 98304
    1aec4: 880e3e03  ld    t3, -1920(t3)   # 0x0000000000032740
    1aec8: 000e0367  jalr  t1, t3, 0

By tracing AUIPC+LD+JALR a few times (after resolve has populated the GOT entry), we can assume the dynamic linker doesn’t unlink a GOT entry, and so avoid a hash table lookup for translated code. It would likely be possible, though, to make a test case that changes a GOT entry and demonstrates that the processor is not a RISC-V but rather makes some assumptions about jump targets. A translator should be able to pass tests. This would be an interesting, tricky test.
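As a rough illustration of the PLT stub above, the following C sketch recomputes the GOT slot address from the AUIPC and LD immediates and shows the kind of guard a translator might keep if it caches the resolved target. The names `pcrel_target`, `got_cache` and `got_guard_ok` are hypothetical and do not come from any translator discussed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address formed by AUIPC followed by an instruction carrying a signed
 * 12-bit offset (LD, JALR, ADDI, ...). auipc_imm is the already shifted
 * upper immediate as the disassembly prints it (a multiple of 4096). */
static inline uint64_t pcrel_target(uint64_t pc, int64_t auipc_imm, int64_t lo12)
{
    return pc + (uint64_t)auipc_imm + (uint64_t)lo12;
}

/* Hypothetical cache entry for a traced AUIPC+LD+JALR stub. */
struct got_cache {
    uint64_t got_addr;      /* address of the GOT slot the stub loads from */
    uint64_t cached_target; /* jump target observed while tracing */
};

/* The cached translation is only valid while the GOT slot still holds the
 * value seen during tracing; otherwise fall back to a normal lookup. */
static inline bool got_guard_ok(const struct got_cache *c, uint64_t current_entry)
{
    return c->cached_target == current_entry;
}
```

With the stub at 0x1aec0, `pcrel_target(0x1aec0, 98304, -1920)` gives the 0x32740 shown in the disassembly comment. A test that rewrites the GOT slot would make `got_guard_ok` fail, which is the case a translator has to handle to stay indistinguishable from real hardware.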
On 3 Mar 2017, at 6:37 PM, Michael Clark <michae...@mac.com> wrote:
> On 3 Mar 2017, at 6:36 PM, Michael Clark <michae...@mac.com> wrote:
>> On 3 Mar 2017, at 6:33 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>>> Michael Clark wrote:
>>> With parallel instruction decode, the immediate for the AUIPC+JALR jump could be decoded in one step, however the address temporary still needs to be committed to the register file for consistency.
>>> Given the register side effect is redundant in a fused decode implementation it leads to the possibility of an extension like this:
>>>     auipc zero, pc + 1576960
>>>     jalr ra, zero, -164 # <strcmp>
>>> AUIPC with rd=zero is a nop. I’m not suggesting this is a good idea; it just came to mind when considering the register side effect redundant with the fused variant. i.e. we just need to decode the immediate over two instructions.
>> No extension needed and perfectly consistent:
>>     AUIPC ra, 1576960
>>     JALR ra, ra, -164 # <strcmp>
>> In fact, this is the *only* way for AUIPC+JALR as a function call to be a valid fusion pair. The example you give is a no-op followed by an absolute jump of the type originally envisioned as an SBI call.
>> Remember: macro-op fusion requires that all side-effects of earlier instructions be clobbered by later instructions in the fusion group.
> Yes. Good idea. I like your version.
> zero was just what immediately came to mind as a no side effect version. ra is perfect.
In fact it’s such a good idea that CALL should emit it. The pseudo is hard-coded to use `t1`.
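The clobbering rule quoted above can be checked directly on the instruction encodings. The C sketch below is only an illustration, with hypothetical helper names; it uses the standard field positions of the 32-bit base encodings (opcode in bits 6:0, rd in bits 11:7, rs1 in bits 19:15) and accepts a pair only when JALR both reads and overwrites the register written by AUIPC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for the 32-bit base instruction formats. */
static inline uint32_t rv_opcode(uint32_t insn) { return insn & 0x7f; }
static inline uint32_t rv_rd(uint32_t insn)     { return (insn >> 7) & 0x1f; }
static inline uint32_t rv_rs1(uint32_t insn)    { return (insn >> 15) & 0x1f; }

enum { OPC_AUIPC = 0x17, OPC_JALR = 0x67 };

/* True when the pair can be fused with no visible address temporary:
 * JALR consumes AUIPC's rd and then overwrites it with the link value. */
static inline bool auipc_jalr_elidable(uint32_t auipc, uint32_t jalr)
{
    return rv_opcode(auipc) == OPC_AUIPC &&
           rv_opcode(jalr)  == OPC_JALR  &&
           rv_rd(auipc) != 0 &&
           rv_rs1(jalr) == rv_rd(auipc) &&
           rv_rd(jalr)  == rv_rd(auipc);
}
```

Under these assumptions `AUIPC ra, ...; JALR ra, ra, -164` (encoded as 0x00181097 and 0xF5C080E7) passes, while the PLT stub pair `auipc t3` / `jalr t1, t3` does not, because t3 stays live after the jump.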
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CA%2B%2B6G0D-L3r4OAT6kBgebAre5%2B6MNi0YJHEtq_ZZ4mm2u2eWhQ%40mail.gmail.com.
On 19 Jun 2017, at 1:09 AM, Rogier Brussee <rogier....@gmail.com> wrote:
Recently, the CALL macro has been changed in the assembly and ELF spec. I wondered whether it would not make sense to also change the TAIL macro from

1:
AUIPC t0, %pcrel_hi(symbol)
JALR ra, t0 %pcrel_hi(1b)

to

AUIPC t1, %pcrel_hi(symbol)
JALR t1 , t1 %pcrel_hi(1b)

We take t1 = x6 so as to leave callstacks alone. Sure this sets t1, but who cares: if callstacks are left alone that does no harm. It may even be useful for stack unwinding for exception handling and debugging to have a chance to know where a call came from even if it is a tail call. The main point is that hardware needs only match one pattern for call fusion, as in the absence of tail calls, the original tail-call == jump pattern would only be useful for very very long jumps in a function which should be rare enough to be worth the trouble.

On Tuesday 25 April 2017 at 02:16:42 UTC+2, michaeljclark wrote:
Excellent!
Perfect! Thank you. +1
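#include <stdio.h>

size_t add(size_t a, size_t b)
{
    return a + b;
}

int main()
{
    size_t total = 0;
    for (size_t i = 0; i < 1000; i++) {
        /* The asm variants clobber a0, a1, ra and (where used) t1, so these
         * are declared; add() is assumed to touch only a0 and a1. */
#if defined (MACRO_FUSION)
        /* AUIPC+JALR through a t1 temporary */
        __asm__ __volatile__(
            "   mv a0, %1\n"
            "   mv a1, %2\n"
            "1: auipc t1, %%pcrel_hi(add)\n"
            "   jalr ra, %%pcrel_lo(1b)(t1)\n"
            "   mv %0, a0\n"
            : "=r"(total)
            : "r"(total), "r"(i)
            : "a0", "a1", "t1", "ra", "memory");
#elif defined (MACRO_FUSION_ELISION)
        /* AUIPC+JALR through ra, so the temporary is clobbered by the link */
        __asm__ __volatile__(
            "   mv a0, %1\n"
            "   mv a1, %2\n"
            "1: auipc ra, %%pcrel_hi(add)\n"
            "   jalr ra, %%pcrel_lo(1b)(ra)\n"
            "   mv %0, a0\n"
            : "=r"(total)
            : "r"(total), "r"(i)
            : "a0", "a1", "ra", "memory");
#elif defined (MACRO_INDIRECT)
        /* AUIPC+ADDI materialises the address, then an indirect JALR */
        __asm__ __volatile__(
            "   mv a0, %1\n"
            "   mv a1, %2\n"
            "1: auipc t1, %%pcrel_hi(add)\n"
            "   addi t1, t1, %%pcrel_lo(1b)\n"
            "   jalr ra, t1\n"
            "   mv %0, a0\n"
            : "=r"(total)
            : "r"(total), "r"(i)
            : "a0", "a1", "t1", "ra", "memory");
#else
        total = add(total, i);
#endif
    }
    printf("total=%zu\n", total);
    return 0;
}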
On Sun, Jun 18, 2017 at 6:09 AM, Rogier Brussee
<rogier....@gmail.com> wrote:
> Recently, the CALL macro has been changed in the assembly and ELF spec. I
> wondered whether it would not make sense to also change the TAIL macro from
>
> 1:
> AUIPC t0, %pcrel_hi(symbol)
> JALR ra, t0 %pcrel_hi(1b)
>
> to
>
> AUIPC t1, %pcrel_hi(symbol)
> JALR t1 , t1 %pcrel_hi(1b)
>
> We take t1 = x6 so as to leave callstacks alone. Sure this sets t1, but who
> cares: if callstacks are left alone that does no harm. It may even be useful
> for stack unwinding for exception handling and debugging to have a chance to
> know where a call came from even if it is a tail call. The main point is
> that hardware needs only match one pattern for call fusion, as in the
> absence of tail calls, the original tail-call == jump pattern would only be
> useful for very very long jumps in a function which should be rare enough to
> be worth the trouble.
TAIL is currently defined as
auipc t1, ...
jalr x0, t1, ...
We already avoid using t0, to avoid messing up the call stacks.
DWARF information provides sufficient information to recover the call
graph.
Furthermore, PLTs can destroy the value in t1, so this isn't
helpful in the general case.
Finally, some low-end unpipelined implementations will execute
JAL/JALR more slowly when rd != 0.
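The remark about avoiding t0 refers to the return-address-stack hints that the user-level ISA manual attaches to JALR, where x1 (ra) and x5 (t0) count as link registers. The C sketch below encodes that hint table as I understand it; the enum and function names are made up for illustration.

```c
#include <stdbool.h>

enum ras_action { RAS_NONE, RAS_POP, RAS_PUSH, RAS_POP_THEN_PUSH };

/* x1 (ra) and x5 (t0) are the registers treated as link registers. */
static inline bool is_link_reg(unsigned r) { return r == 1 || r == 5; }

/* Return-address-stack hint implied by the rd/rs1 choice of a JALR. */
static inline enum ras_action jalr_ras_hint(unsigned rd, unsigned rs1)
{
    bool rd_link = is_link_reg(rd);
    bool rs1_link = is_link_reg(rs1);

    if (!rd_link)
        return rs1_link ? RAS_POP : RAS_NONE;
    if (!rs1_link || rd == rs1)
        return RAS_PUSH;
    return RAS_POP_THEN_PUSH;
}
```

Read this way, `jalr x0, t1` (the current TAIL) leaves the predictor stack alone, whereas `jalr x0, t0` would be taken as a return and pop it.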
Hi Rogier,

I understand what you mean regarding sharing the macro-fusion pattern now. That wasn’t clear to me before. However, I still think the JALR should avoid the redundant register write, as simple implementations won’t be able to do anything about this extra write, which they don’t have now (and that was not the case for CALL), and macro-op implementations, being more sophisticated, are more able to bear the cost of having two patterns.

It’s interesting that you point this out, as my macro-fusion pattern for CALL is as follows:

    { auipc rd=x }, { jalr rd=ra, rs1=x }

The call macro explicitly sets rd to ra, and in my implementation I need to add a second macro-op pattern match for TAIL:

    { auipc rd=x }, { jalr rd=zero, rs1=x }

This is a small price.

I still think the price of having two pattern matches in the macro-op fusion case is a better trade than giving all simple implementations the cost of the redundant write. That aside, it is novel. I had not thought about the rationale of having a single pattern match.
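A minimal sketch of what the two patterns could look like in a decoder that works on already decoded instructions. The `struct decoded` layout and all names here are hypothetical, not the actual data structures of the implementation described above.

```c
enum insn_kind { K_OTHER, K_AUIPC, K_JALR };
enum fused_op  { FUSED_NONE, FUSED_CALL, FUSED_TAIL };

/* Hypothetical decoded form: just the fields the patterns look at. */
struct decoded {
    enum insn_kind kind;
    unsigned rd;
    unsigned rs1;
};

/* { auipc rd=x }, { jalr rd=ra,   rs1=x }  -> CALL
 * { auipc rd=x }, { jalr rd=zero, rs1=x }  -> TAIL */
static inline enum fused_op match_call_tail(struct decoded a, struct decoded b)
{
    if (a.kind != K_AUIPC || b.kind != K_JALR || b.rs1 != a.rd)
        return FUSED_NONE;
    if (b.rd == 1)
        return FUSED_CALL;
    if (b.rd == 0)
        return FUSED_TAIL;
    return FUSED_NONE;
}
```

The second pattern costs one extra comparison on `rd`, which is the "small price" weighed against a redundant register write on simple implementations.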
On 21 Jun 2017, at 10:09 PM, Rogier Brussee <rogier....@gmail.com> wrote:
Hi Michael,

On Tuesday 20 June 2017 at 02:22:26 UTC+2, michaeljclark wrote:
> Hi Rogier,
> I understand what you mean regarding sharing the macro-fusion pattern now. That wasn’t clear to me, however I still think the JALR should avoid the redundant register write as simple implementations won’t be able to do anything about this extra write, which they don’t have now (and that was not the case for CALL), and macro-op implementations, being more sophisticated, are more able to bear the cost of having two patterns.
> It’s interesting that you point this out as my macro-fusion pattern for CALL is as follows:
>     { auipc rd=x }, { jalr rd=ra, rs1=x }
> The call macro explicitly sets rd to ra and in my implementation

On second thought, this pattern does not fit the (max) two-input, one-output mould of a standard RV instruction, as it has two outputs (ra, and x as a clobber) and one input. This is no problem for the CALL macro (and I guess for your software implementation), but if you insist on fixing the link register to ra and want to stay in the general mould, the fusion pattern would have to be

    { auipc rd=ra }, { jalr rd=ra, rs1=ra }
On 22 Jun 2017, at 1:33 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
Agreed.
With respect to AUIPC, redundant calculations of the higher address part can be optimised away, e.g.
1: AUIPC a0, %pcrel_hi(sym)
   LD a0, %pcrel_lo(1b)(a0) # can be fused and side effect can be elided
   ADDI a0,a0,1
2: AUIPC a1, %pcrel_hi(sym) # can’t be eliminated because a0 is lost, but can be fused
   SD a0, %pcrel_lo(2b)(a1)
The optimisation requires the load to use a register allocation where the high part of the address is preserved in another register:
1: AUIPC a1, %pcrel_hi(sym)
   LD a0, %pcrel_lo(1b)(a1) # address temporary side effect can’t be elided as it is reused later
   ADDI a0,a0,1
   SD a0, %pcrel_lo(1b)(a1)
This optimization is implementation-dependent for hardware: AUIPC/LD, ADDI, AUIPC/SD would execute in three cycles, while AUIPC, LD, ADDI, SD requires four cycles. Oddly enough, the latter sequence is faster on hardware that doesn't fuse AUIPC/LD and AUIPC/SD, since the first sequence needs five cycles without fusion. Binary translation, however, should be able to recognize the equivalence of these sequences and adjust the %pcrel_lo offsets to match.
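Adjusting a `%pcrel_lo` offset means recomputing the hi/lo split of the symbol's distance from whichever AUIPC is kept. The sketch below shows the usual split, in which the low part is a signed 12-bit value and the high part is rounded to compensate. `pcrel_split` and `struct pcrel_parts` are illustrative names, not an existing API.

```c
#include <stdint.h>

struct pcrel_parts {
    int64_t hi20; /* value placed in the AUIPC immediate field */
    int64_t lo12; /* signed 12-bit offset for the load, store or jump */
};

/* Split (symbol - pc_of_auipc) so that (hi20 << 12) + lo12 == offset. */
static inline struct pcrel_parts pcrel_split(int64_t offset)
{
    struct pcrel_parts p;
    /* Sign-extend the low 12 bits, giving a value in [-2048, 2047]. */
    p.lo12 = ((offset & 0xfff) ^ 0x800) - 0x800;
    /* The remainder is an exact multiple of 4096. */
    p.hi20 = (offset - p.lo12) / 4096;
    return p;
}
```

For the PLT stub earlier in the thread the distance is 0x32740 - 0x1aec0 = 96384, which splits into hi20 = 24 (printed as `pc + 98304`) and lo12 = -1920.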
<snip>
The important part is that millicode calls will not use AUIPC, otherwise
we would need another fusion pattern that recognizes the millicode link
register. Or we could generalize the fusion pattern for "far call" to
"far jump-and-link" as:
{ auipc rd=X }, { jalr rd=X, rs1=X }
This would have the same effects as independent AUIPC/JALR instructions,
including pushing a return stack if X is either x1 or x5.
-- Jacob
As long as one of the "common" registers x8 - x15 is used, this sequence
fits in 32-bits with RVC. An RV64 hardware implementation could
recognize it and simply clear the upper half of the register, but RV128
will need to shift by 96 bits both ways. Interestingly, C.SRLI can
encode this, but C.SLLI cannot. The best for 32-bit zero extension on
RV128 is 48-bits: either a 32-bit SLLI or two C.SLLI, followed by C.SRLI.
-- Jacob
On 22 Jun 2017, at 1:33 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
RISC-V Zero extension requires detecting two shift operations and seems to be done near use due to the ABI passing around sign extended forms. e.g. a cast to unsigned int requires zero extension like this.
SLLI a1,a1,32
SRLI a1,a1,32
As long as one of the "common" registers x8 - x15 is used, this sequence fits in 32-bits with RVC. An RV64 hardware implementation could recognize it and simply clear the upper half of the register, but RV128 will need to shift by 96 bits both ways. Interestingly, C.SRLI can encode this, but C.SLLI cannot. The best for 32-bit zero extension on RV128 is 48-bits: either a 32-bit SLLI or two C.SLLI, followed by C.SRLI.
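For reference, the SLLI/SRLI pair on RV64 computes the same value as masking with 0xFFFFFFFF. A small C sketch (function names are illustrative):

```c
#include <stdint.h>

/* What SLLI rd,rd,32 followed by SRLI rd,rd,32 computes on RV64. */
static inline uint64_t zext32_shifts(uint64_t x)
{
    return (x << 32) >> 32;
}

/* The same result expressed as an AND with a 32-bit mask. */
static inline uint64_t zext32_mask(uint64_t x)
{
    return x & UINT64_C(0xFFFFFFFF);
}
```

A fusing implementation or a translator that recognises the pair can therefore treat it as a single "clear the upper half" operation.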
On 25 Jun 2017, at 8:13 PM, Michael Clark <michae...@mac.com> wrote:
I implemented the zero-extend fusion pattern for both compressed and non-compressed opcodes (any combination). It is a single pattern because I pattern match after decompressing into canonical opcodes:

    SLLI a1,a1,32
    SRLI a1,a1,32

Now I’ve noticed new and interesting patterns. Take this one: zero-extended add register immediate (and potentially the register-register form too):

    # 0x0000000000010994 addiw a1, a1, -1
    add r9d, -1 ; 4183C1FF
    movsxd r9, r9d ; 4D63C9
    L134:
    # 0x0000000000010996 zext a1
    movzx r9, r9d ; 4D0FB7C9
    L135:
    # 0x000000000001099a addi a1, a1, 1
    add r9, 1 ; 4983C101
    L136:

You can see it has matched a compressed sequence as the PC increments by 4 on zext.

So now I am going to pattern match “zero extended 32-bit add”, a 48-bit fusion pattern:

    ADDIW a1,a1, n
    SLLI a1,a1,32
    SRLI a1,a1,32

ADDIWZ will translate to this:

    add r9d, -1 ; 4183C1FF

Given the amount of code that has int or unsigned int as loop induction variables, I wouldn’t be surprised if these patterns are relatively common.

This is inside an AES cipher so it may make a noticeable difference. A lot of ciphers use 32-bit unsigned integers.
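The reason a single 32-bit x86-64 add is enough is that sign extension followed by zero extension leaves only the low 32 bits of the sum, and a write to a 32-bit x86-64 register clears the upper half. The C sketch below models the semantics; `addiwz` is the hypothetical fused operation named in this message, not an existing instruction.

```c
#include <stdint.h>

/* RV64 ADDIW: 32-bit add, result sign-extended to 64 bits. */
static inline uint64_t addiw(uint64_t rs1, int32_t imm)
{
    uint64_t sum = (uint32_t)((uint32_t)rs1 + (uint32_t)imm);
    /* Portable sign extension of bit 31. */
    return (sum ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* SLLI 32 + SRLI 32. */
static inline uint64_t zext32(uint64_t x)
{
    return (x << 32) >> 32;
}

/* Hypothetical fused form: 32-bit add, result zero-extended. */
static inline uint64_t addiwz(uint64_t rs1, int32_t imm)
{
    return (uint32_t)((uint32_t)rs1 + (uint32_t)imm);
}
```

For any inputs, `zext32(addiw(x, n))` equals `addiwz(x, n)`, so the `movsxd` and zero-extend steps in the unfused translation are redundant once the three instructions are matched together.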
On 27 Jun 2017, at 10:22 AM, Michael Clark <michae...@mac.com> wrote:
I can confirm it is the LOOP_EXIT block because when I compile with -DAES_FULL_UNROLL the compiler does not emit the zero extension.

I’ve fully analysed it. There are two scalars that evolve in the loop, one of them is const u32 rk[] and the other is int r.

    void aes_rijndael_encrypt(const u32 rk[], int Nr, const u8 pt[16], u8 ct[16])
    {
        …
        /* Nr - 1 full rounds: */
        r = Nr >> 1;
        for (;;) {
            ROUND(1,t,s);
            rk += 8;
            if (--r == 0)
                break;
            ROUND(0,s,t);
        }
        …
    }

The loop expression has rk += 8, however SCEV has determined an expression to adjust rk in one go at loop exit, and is adjusting rk in terms of int Nr (signed 32-bit integer) and has turned it into a subtract 1 from Nr, zero extend, shift by 5. e.g.

    rk += (Nr-1) << 5;
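To see where a zero extension of this kind can come from, the sketch below compares a loop of the same shape with a closed form built from a 32-bit trip count that is widened to 64 bits. It is only an illustration under the assumption that r >= 1; the exact expression the compiler derived is not shown in the thread, and the function names are made up.

```c
#include <stdint.h>

/* Loop form: counts how many u32 elements rk advances (rk += 8 per pass).
 * Mirrors the for (;;) { ...; if (--r == 0) break; ... } shape above. */
static inline uint64_t rk_advance_loop(int r)
{
    uint64_t elems = 0;
    for (;;) {
        elems += 8;
        if (--r == 0)
            break;
    }
    return elems;
}

/* Closed form: the number of back edges taken is r - 1 as a 32-bit value,
 * which has to be zero-extended before it is used in 64-bit arithmetic. */
static inline uint64_t rk_advance_closed(int r)
{
    uint64_t back_edges = (uint32_t)(r - 1);
    return (back_edges + 1) * 8;
}
```

In bytes the advance is the element count times 4, which is where a shift by 5 (8 elements of 4 bytes) would appear.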
On 27 Jun 2017, at 12:49 PM, Andrew Waterman <and...@sifive.com> wrote:
On Mon, Jun 26, 2017 at 5:39 PM, Tommy Thorn <tommy...@esperantotech.com> wrote:
> I'm happy you found the root cause but this is concerning as I see this
> pattern quite a bit (mixing pointers with int). I wonder if a case could
> be made for an ADD[I]WZ instruction?
It's still quite premature to propose an ISA solution to this problem,
as improvements to the compiler have barely been explored. Even this
case resulted from a compiler optimization and wasn't fundamental.
Many cases I've seen are truly unnecessary.
Also, RV64I is frozen :-) Could stuff something like this into the B extension
> Even so, for an implementation without fusion, the two shifts seem an
> expensive way to zero extend when a "simple" AND could suffice (I'm assuming
> that the mask generation/loading will be hoisted out of loops).
> Of course such would be harder to fuse on.
When the zero-extension is in a loop, it makes sense to AND with a
mask. (This is 3 static instructions and 2 + n dynamic
instructions, vs. 2 and 2*n.)
However B extension could help

Constant synthesis using parameters compressed into a 12-bit immediate. Help with the hoisted case.

- 0x00000000FFFFFFFF
- 0xFFFFFFFF00000000
- 0x0000FFFF0000FFFF
- 0xFFFF0000FFFF0000
- 0xFF00FF00FF00FF00
- 0x00FF00FF00FF00FF
- 0xF0F0F0F0F0F0F0F0
- 0x0F0F0F0F0F0F0F0F
Michael Clark wrote:
> On 27 Jun 2017, at 12:49 PM, Andrew Waterman <and...@sifive.com> wrote:
>> It's still quite premature to propose an ISA solution to this problem,
>> as improvements to the compiler have barely been explored. Even this
>> case resulted from a compiler optimization and wasn't fundamental.
>> Many cases I've seen are truly unnecessary.
>> Also, RV64I is frozen :-) Could stuff something like this into the B extension
> Yes. I’m fine with them remaining as fusion patterns.
> However B extension could help
> Constant synthesis using parameters compressed into a 12-bit immediate. Help with the hoisted case.
> - 0x00000000FFFFFFFF
> - 0xFFFFFFFF00000000
> - 0x0000FFFF0000FFFF
> - 0xFFFF0000FFFF0000
> - 0xFF00FF00FF00FF00
> - 0x00FF00FF00FF00FF
> - 0xF0F0F0F0F0F0F0F0
> - 0x0F0F0F0F0F0F0F0F
How to encode these constants into 12 bits? (I would suggest using the sign bit to invert the generated mask, leaving 11 bits for choosing a pattern.)
An idea: "item width" (number of consecutive set bits in the pattern) as a power of two, so the immediates for your examples would be 5, -5, 4, -4, -3, 3, -2, 2. This leads to a very small range, needing only four bits for alternating masks with up to 128 bit item width (-7/7 would be all-bits-set/all-bits-clear on RV128). Could this even fit into RVC?
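One possible reading of this encoding, sketched in C for a 64-bit register: the magnitude of the immediate is log2 of the item width, the pattern starts with a run of set bits at bit 0, and a negative immediate inverts the result. The function name `alt_mask64` and the handling of widths of 64 bits or more are my assumptions, and the sketch makes no claim about how a real B-extension instruction would be defined.

```c
#include <stdint.h>

/* Generate an alternating mask for a 64-bit register.
 * |imm| = log2(item width in bits); imm < 0 inverts the mask. */
static inline uint64_t alt_mask64(int imm)
{
    int k = imm < 0 ? -imm : imm;
    uint64_t mask;

    if (k >= 6) {
        /* Item width of 64 bits or more: every bit of the register is set. */
        mask = ~UINT64_C(0);
    } else {
        unsigned width = 1u << k;                      /* 1..32 bits */
        uint64_t item = (UINT64_C(1) << width) - 1;    /* run of set bits */
        mask = 0;
        for (unsigned pos = 0; pos < 64; pos += 2 * width)
            mask |= item << pos;
    }
    return imm < 0 ? ~mask : mask;
}
```

With this reading the immediates 5, -5, 4, -4, -3, 3, -2, 2 reproduce the eight constants in the quoted list, in order. One gap in the scheme as sketched is that 0 and -0 are the same immediate, so the single-bit pattern has no inverted form.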