JALR: failing to understand the 'push and pop' use case

Pierre G.

Jul 13, 2018, 7:39:02 AM
to RISC-V ISA Dev
As per the definition in riscv-spec-2.2:

The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target
address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the
least-significant bit of the result to zero. The address of the instruction following the jump (pc+4)
is written to register rd.
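A minimal C++ sketch of the definition quoted above, assuming a bare RV32 hart. The `Hart` struct and `exec_jalr` function are hypothetical illustrations (not Spike's API), and misaligned-target exceptions are not modeled. Note that rs1 is read before rd is written, which is what makes `jalr x1, x1, imm` well defined:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical minimal RV32 hart state (not Spike's processor_t).
struct Hart {
    std::array<uint32_t, 32> x{};  // integer registers; x[0] always reads 0
    uint32_t pc = 0;
};

// Architectural effect of "jalr rd, rs1, imm" as defined in the spec:
// target = (rs1 + sext(imm)) with bit 0 cleared, rd = pc + 4.
// Misaligned-target exceptions are not modeled.
void exec_jalr(Hart& h, unsigned rd, unsigned rs1, int32_t imm) {
    assert(rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047);  // 12-bit signed I-immediate
    // Read rs1 before writing rd, so "jalr x1, x1, imm" uses the old x1.
    const uint32_t target =
        (h.x[rs1] + static_cast<uint32_t>(imm)) & ~uint32_t{1};
    const uint32_t link = h.pc + 4;
    if (rd != 0) h.x[rd] = link;  // writes to x0 are discarded
    h.pc = target;
}
```

Nothing in this function depends on the return-address stack; the RAS only affects how quickly a pipelined core reaches the same result.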

And later, there is Table 2.1:
rd     rs1   rs1=rd    RAS action
!link  !link   -       none
!link  link    -       pop
link   !link   -       push
link   link    0       push and pop
link   link    1       push
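The table can be read as a small decision procedure. The sketch below encodes it in C++, assuming (as the caption of Table 2.1 in the spec states) that "link" means x1 or x5; the enum and function names are made up for illustration:

```cpp
#include <cassert>

// RAS hints from Table 2.1 (names are illustrative, not from the spec).
enum class RasAction { None, Pop, Push, PushAndPop };

// "link" is true for x1 (ra) and x5 (t0), per the table's caption in the spec.
bool is_link(unsigned reg) { return reg == 1 || reg == 5; }

RasAction jalr_ras_hint(unsigned rd, unsigned rs1) {
    assert(rd < 32 && rs1 < 32);
    const bool rd_link = is_link(rd);
    const bool rs1_link = is_link(rs1);
    if (!rd_link && !rs1_link) return RasAction::None;   // plain indirect jump
    if (!rd_link) return RasAction::Pop;                  // return, e.g. jalr x0, x1, 0
    if (!rs1_link || rd == rs1) return RasAction::Push;   // call, e.g. jalr x1, x1, imm
    return RasAction::PushAndPop;  // rd and rs1 are different link registers
}
```

For `jalr x1, x5, imm` this returns the fourth case: the RAS is popped to predict the jump target, and the new return address (pc+4) is pushed in its place.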


I am able to derive the behaviour of jalr for all but the 4th case; I do not understand why it can be called 'push and pop'.
Nevertheless, here is my understanding of each of the five cases.

jalr x6, x7, imm  // neither is a link register
     new state :
          pc <= x7 + imm


jalr x6, x1, imm  // 2nd case as x6 is not link, but x1 is. This is pop.
     new state :
          pc <= x1.    // in this case imm is useless.


jalr x1, x7, imm  // rd is link, rs1 is not link. This is a push.
     new state :
          x1 <= old pc + 4
          pc <= x7 + imm


jalr x1, x5, imm  // rd is link, rs1 is link; rd!=rs1, this is push & pop
     new state :
          x1 <= old pc + 4
          pc <= x5 + imm 

jalr x1, x1, imm  // rd is link, rs1 is link; rd=rs1, this is push
     new state :
          x1 <= old pc + 4
          pc <= old x1 + imm 



I also had a look at the implementation of jalr in the Spike simulator, and it is not in line with my understanding:

reg_t rv32_jalr(processor_t* p, insn_t insn, reg_t pc)
{
  int xlen = 32;
  reg_t npc = sext_xlen(pc + insn_length( MATCH_JALR));
  // #include "insns/jalr.h" - this header file is unfolded below:
  reg_t tmp = npc;
  set_pc((RS1 + insn.i_imm()) & ~reg_t(1));
  WRITE_RD(tmp);
  trace_opcode(p,  MATCH_JALR, insn);
  return npc;
}


Could someone clarify the table I mentioned above?

Thanks.

Samuel Falvo II

Jul 13, 2018, 12:21:26 PM
to Pierre G., RISC-V ISA Dev
On Fri, Jul 13, 2018 at 4:39 AM, Pierre G. <pig...@gmail.com> wrote:
> jalr x6, x7, imm // none are link register
>
> new state :
> pc <= x7 + imm

As I understand things, the return address stack is an implementation
detail, an optimization if you will, to facilitate faster subroutine
calls and returns by priming branch prediction logic. It does not
affect the register update of Rd when executing the JAL(R)
instruction, and should prediction fail, the core *must* jump to
rs1+imm.

new state:
pc <= x7 + imm
x6 <= old pc+4

> jalr x6, x1, imm // 2nd case as x6 is not link, but x1 is. This is pop.
>
> new state :
> pc <= x1. // in this case imm is useless.

The immediate field is never useless.

When you pop the RAS stack, the intent is to predict the return
address. That is, the instruction *fetch* stage starts fetching at the
peeked[1] address, in the *hopes* that it's doing the right thing.
That prediction can be wrong; the prediction is right if, and only if,
rs1+imm == top of RAS at the time the JALR actually *executes* (i.e.,
reaches an execute stage). If the two values differ, you'll
need to invalidate the pipeline stages leading up to the JALR
execution stage, and fetch again from rs1+imm.
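A toy C++ model of the fetch-peek / execute-verify sequence described above. `RasPredictor` and its methods are hypothetical names, not the interface of Spike or any real core. The verify step computes the architectural target (rs1 + imm) & ~1 and compares it with the peeked guess, so a nonzero imm that moves the target away from the RAS entry is a misprediction rather than being ignored:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy model; not the interface of Spike or any real core.
struct RasPredictor {
    std::vector<uint32_t> stack;  // back() is the top of the RAS

    // Executed call (rd is a link register): push the return address.
    void push(uint32_t ret_addr) { stack.push_back(ret_addr); }

    // Fetch stage: non-destructive peek used as the predicted target.
    std::optional<uint32_t> peek() const {
        if (stack.empty()) return std::nullopt;
        return stack.back();
    }

    // Execute stage of a pop-hinted JALR: compute the real target, pop the
    // RAS, and report whether the speculative fetch path was correct. On a
    // false result the pipeline must be flushed and refetched from target.
    bool resolve(std::optional<uint32_t> predicted, uint32_t rs1_val,
                 int32_t imm, uint32_t& target) {
        target = (rs1_val + static_cast<uint32_t>(imm)) & ~uint32_t{1};
        if (!stack.empty()) stack.pop_back();
        return predicted.has_value() && *predicted == target;
    }
};
```

Either way, execution continues at `target`; the boolean only decides whether the already-fetched instructions can be kept.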

new state:
pc <= x1 + imm  // imm can't be useless, or you'll break compatibility with earlier non-privileged ISA specs.
x6 <= old pc + 4

________
1. A stack typically has three operations, not just two. Push and
pop we're all familiar with, and this is what happens during the
*execute* stage. A *Peek* is a non-destructive read of the top of
stack.

--
Samuel A. Falvo II

Samuel Falvo II

Jul 13, 2018, 12:23:02 PM
to Pierre G., RISC-V ISA Dev
On Fri, Jul 13, 2018 at 9:21 AM, Samuel Falvo II <sam....@gmail.com> wrote:
> ________
> 1. A stack typically has three operations, not just two. Push and
> pop we're all familiar with, and this is what happens during the
> *execute* stage. A *Peek* is a non-destructive read of the top of
> stack.

Apologies; in the context of how RISC-V uses the RAS, I should
have said there are four operations:

- Push
- Pop
- Peek
- Push & Pop, which is just an ordinary exchange operation.
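A sketch of a RAS supporting those four operations, assuming a small fixed-capacity circular buffer that overwrites its oldest entry on overflow (a common hardware choice, but an assumption here; the class is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity return-address stack (circular buffer).
class ReturnAddressStack {
public:
    explicit ReturnAddressStack(std::size_t capacity) : buf_(capacity) {
        assert(capacity > 0);
    }

    // Push: on overflow, the oldest entry is silently overwritten.
    void push(uint32_t addr) {
        top_ = (top_ + 1) % buf_.size();
        buf_[top_] = addr;
        if (count_ < buf_.size()) ++count_;
    }

    // Peek: non-destructive read of the top entry (used at fetch).
    bool peek(uint32_t& addr) const {
        if (count_ == 0) return false;
        addr = buf_[top_];
        return true;
    }

    // Pop: discard the top entry; popping an empty stack does nothing.
    void pop() {
        if (count_ == 0) return;
        top_ = (top_ + buf_.size() - 1) % buf_.size();
        --count_;
    }

    // Push & pop: pop, then push. On a non-empty stack this simply
    // replaces the top entry, i.e. an exchange.
    void pop_then_push(uint32_t addr) {
        pop();
        push(addr);
    }

    std::size_t size() const { return count_; }

private:
    std::vector<uint32_t> buf_;
    std::size_t top_ = 0;
    std::size_t count_ = 0;
};
```

The exchange leaves the depth unchanged, which is why it does not disturb the entries belonging to outer calls.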

Cesar Eduardo Barros

Jul 13, 2018, 7:22:14 PM
to Pierre G., RISC-V ISA Dev
On 13-07-2018 08:39, Pierre G. wrote:
> As per definition in riscv-spec-2.2 :
>
> The indirect jump instruction JALR (jump and link register) uses the
> I-type encoding. The target
> address is obtained by adding the 12-bit signed I-immediate to the
> register rs1, then setting the
> least-significant bit of the result to zero. The address of the
> instruction following the jump (pc+4)
> is written to register rd.
>
>
> And later, there is the Table 2.1 :
>
> rd     rs1   rs1=rd    RAS action
> !link  !link   -       none
> !link  link    -       pop
> link   !link   -       push
> link   link    0       push and pop
> link   link    1       push
>

This table mentions the effect JALR has on the "return address stack".
However, the "return address stack" has no effect on the JALR
instruction. The behavior of the JALR instruction is still as described
by the definition above.

But if the "return address stack" has no effect, why does it exist? It's
basically an optimization: the processor guesses that the value it just
got from the "return address stack" is going to be the address of the
next instruction, and starts speculatively executing from there. When
the processor gets the real address, it checks if its guess was correct.
If it wasn't, the processor says "oops, never mind", discards the
speculative work it just did, and tries again with the correct address.

Notice that the observable effect of both cases (correct guess and wrong
guess) is the same (*). The only difference is that it's a little faster
when it guesses correctly, and a little slower when it guesses wrong.

You didn't see it in the source code of the Spike simulator because it
doesn't implement this optimization, and in fact this optimization
doesn't make a lot of sense for a simulator in the first place.

(*) ...except when it isn't. The recent Spectre class of vulnerabilities
showed that what shouldn't be observable sometimes is, and an attacker
can extract the results from a computation which never happens. It's as
much of a mindscrew as it sounds.

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Rogier Brussee

Jul 14, 2018, 10:37:57 AM
to RISC-V ISA Dev, pig...@gmail.com, ces...@cesarb.eti.br
It exists so that one can write

auipc ra hi(imm32)
jalr ra ra lo(imm32)

This is effectively the same thing as the pseudo-instruction

CALL ra imm32

The difference from the regular jal instruction is that jal only has a 20-bit immediate with an implicit trailing 0. Using ra as both rs1 and rd means that the pair can be macro-op fused, and can be treated on the same footing as a jal instruction with a longer immediate, both internally in hardware and by JIT compilers in software. Indeed, it is actually the CALL pseudo-instruction used in the gas assembler. I think gas is clever enough to use jal if the immediate fits in 20+1 bits, or perhaps gcc itself or the linker does that, but let's ignore that here.
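The hi/lo split in the auipc/jalr pair has one subtlety: jalr sign-extends its 12-bit immediate, so hi must be rounded up whenever bit 11 of the offset is set. A minimal C++ sketch for RV32 (arithmetic modulo 2^32); `split_pcrel` and `call_target` are illustrative names, and the rounding shown is the usual `(offset + 0x800) >> 12` trick rather than any particular assembler's source:

```cpp
#include <cassert>
#include <cstdint>

// Immediates for "auipc ra, hi20" + "jalr ra, ra, lo12" (illustrative names).
struct CallImms {
    uint32_t hi20;  // AUIPC immediate field, 20 bits
    int32_t lo12;   // JALR immediate, always in [-2048, 2047]
};

// Split a 32-bit pc-relative offset (RV32, arithmetic modulo 2^32).
// Adding 0x800 before the shift rounds hi20 up when bit 11 of the offset is
// set, compensating for the sign extension of lo12.
CallImms split_pcrel(uint32_t offset) {
    const uint32_t hi20 = ((offset + 0x800u) >> 12) & 0xFFFFFu;
    const int32_t lo12 = static_cast<int32_t>(offset - (hi20 << 12));
    assert(lo12 >= -2048 && lo12 <= 2047);
    return {hi20, lo12};
}

// Target computed by the pair (before JALR clears bit 0).
uint32_t call_target(uint32_t pc, CallImms c) {
    const uint32_t ra = pc + (c.hi20 << 12);    // auipc ra, hi20
    return ra + static_cast<uint32_t>(c.lo12);  // jalr ra, ra, lo12
}
```

For example, an offset of 0x800 becomes hi20 = 1 and lo12 = -2048, since 0x800 itself does not fit in a signed 12-bit immediate.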


(aside:

Now that a larger body of software is available, it would be interesting to know the effect of using the last remaining open slot in the C extension for
C.jalr_ra_ra imm11 --> jalr ra ra imm11<< 1

Far calls (i.e. CALLs) are generally useful and rank high in real software (e.g. in the kernel), and this way they would shrink to a near-optimal 48 bits rather than 64 bits.
)

The original CALL pseudo-instruction was something like

auipc t2 hi(imm32)
jalr ra t2 lo(imm32)

This has the pesky visible side effect of setting t2. This means that you may have to needlessly save and restore t2 (although that is not very likely, as there are lots of temporary registers), but more importantly it makes it much harder to treat the two instructions as the nice, predictable call that this idiomatic combination is meant to be, rather than as a more expensive indirect call. The only reason not to use rs1 == rd == ra was its interaction with the return stack. Indeed, they had to change the return-stack recommendation in the spec, and the CALL pseudo-instruction, to what they are now, because originally the spec said that jalr rd ra means pop the stack, and people pointed out the advantages of using rs1 == rd == ra for a push only after version 1.0 of the I spec.

However, _apparently_ it is also useful to push and pop simultaneously for the rarer coroutine-style control flow. So they special-cased

jalr ra t0 lo(imm)
jalr t0 ra lo(imm)


which probably has a low hardware cost.
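To see why a combined pop-then-push suits coroutines, here is a toy C++ model of two coroutines A and B trading control with the two instructions above, with the immediates taken as 0. The addresses, the `SwitchModel` struct, and the assumption that the RAS already holds B's entry point are all hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: coroutines A and B hand control back and forth with
//   A: jalr t0, ra, 0   (rd = x5, rs1 = x1)
//   B: jalr ra, t0, 0   (rd = x1, rs1 = x5)
// rd and rs1 are distinct link registers, so the hint is "pop, then push".
struct SwitchModel {
    uint32_t ra = 0, t0 = 0;    // x1 and x5
    std::vector<uint32_t> ras;  // back() is the top of the RAS
    int mispredicts = 0;

    // Execute a switch JALR located at pc; rd and rs1 refer to ra or t0.
    // Returns the jump target.
    uint32_t jalr_swap(uint32_t pc, uint32_t& rd, uint32_t& rs1) {
        const uint32_t target = rs1 & ~uint32_t{1};
        // Pop: the top entry is the prediction for this jump.
        if (ras.empty() || ras.back() != target) ++mispredicts;
        if (!ras.empty()) ras.pop_back();
        // Push: remember where this coroutine resumes.
        ras.push_back(pc + 4);
        rd = pc + 4;
        return target;
    }
};
```

In this model each switch pops exactly the address the next jump needs and pushes the switching coroutine's own resume point, so the stack stays one entry deep and every switch is predicted correctly.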


See this discussion


and its outcome





On Saturday, July 14, 2018 at 01:22:14 UTC+2, Cesar Eduardo Barros wrote: