JALR - potential hazard and suggestion that JALR with odd immediate be NSE

David Horner

Jul 13, 2016, 9:04:55 PM

to RISC-V ISA Dev

Section 14.4 of the User-Level ISA, Version 2.1, describes JALR thus:

Note that the JALR instruction does not treat the 12-bit immediate as multiples of 2 bytes,
unlike the conditional branch instructions. This avoids one more immediate format in hardware.
In practice, most uses of JALR will have either a zero immediate or be paired with a LUI or
AUIPC, so the slight reduction in range is not significant.
The JALR instruction ignores the lowest bit of the calculated target address. This both
simplifies the hardware slightly and allows the low bit of function pointers to be used to store
auxiliary information. Although there is potentially a slight loss of error checking in this case,
in practice jumps to an incorrect instruction address will usually quickly raise an exception.

The "slight loss of error checking" probably refers to the validation that would occur if a function pointer were required to be on a 16-bit boundary. A random erroneous value would have a 50/50 chance of being caught before the jump, but even an erroneous even address would likely "usually quickly raise an exception".

The intent of the above section, as I read it, is that the low-order bit of a function pointer can be used without concern that JALR will use it in the address calculation.

However, this is not the case if the low-order bit of the JALR immediate is set.
The JALR will then branch to a target when the auxiliary bit is zero, and to two bytes past that target when the auxiliary bit is set.
I will call this the "double odd" effect.
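A minimal Python model of the effect, assuming XLEN=64 and the v2.1 rule that the target is rs1 plus the sign-extended 12-bit immediate with bit 0 cleared. The helper names `sext12` and `jalr_target` and the address 0x1000 are illustrative only, not taken from any spec or simulator.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def sext12(imm12: int) -> int:
    """Sign-extend a 12-bit immediate field."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def jalr_target(rs1: int, imm12: int) -> int:
    """JALR target per the spec: (rs1 + sext(imm)) with bit 0 cleared."""
    return (rs1 + sext12(imm12)) & MASK & ~1


FUNC = 0x1000  # hypothetical even function address

# Even immediate: the auxiliary bit in the pointer is ignored, as intended.
print(hex(jalr_target(FUNC, 4)), hex(jalr_target(FUNC | 1, 4)))  # 0x1004 0x1004
# Odd immediate: the auxiliary bit carries into bit 1 ("double odd").
print(hex(jalr_target(FUNC, 5)), hex(jalr_target(FUNC | 1, 5)))  # 0x1004 0x1006
```

With an odd immediate the two pointer values reach targets two bytes apart, which is exactly the behaviour described above.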

Three design decisions were necessary, and combined they raise this issue.

    a) the JALR immediate can be odd, unlike the other control-flow instructions, whose offsets are always multiples of 2 bytes.
    b) the addressing register can have an odd value (except of course x0).
        The reason is well explained in the excerpt above.
    c) RVC allows for instructions to start on a 16-bit boundary.

Decisions a and b allow a carry out of the LSB, which means the LSB of the register can advance the target address.
Decision c allows both possible target addresses to be executed.
    (Without RVC present, one or the other would have caused an alignment trap.)

This creates both a potential hazard and a potential opportunity for Trojan code.

Again, note that this behaviour only comes about as a result of the RVC specification.
The base ISAs only allow 32-bit aligned instructions, and jumps to unaligned targets raise exceptions.
Proposing an NSE designation for an ISA that is already at V2.1 is justified only because it supports RVC, which is currently only at V1.9.
Base ISA changes that are arguably required for an optimal RVC definition can be included, as has (arguably) already been done for C.JAL and C.JALR, which redefine the return address from PC+4 to PC+<instruction length>.
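To make the role of RVC concrete, here is a small sketch of which of the two "double odd" targets survive the alignment check with and without RVC. It assumes, as the post does, that the base ISA requires 4-byte aligned targets and that RVC relaxes this to 2 bytes; `jalr_outcome` is a hypothetical helper name.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def sext12(imm12: int) -> int:
    """Sign-extend a 12-bit immediate field."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def jalr_outcome(rs1: int, imm12: int, rvc: bool) -> str:
    """Target of JALR as hex, or the exception raised for a misaligned target.

    Without RVC, targets must be 4-byte aligned; with RVC, 2-byte aligned.
    """
    target = (rs1 + sext12(imm12)) & MASK & ~1
    align = 2 if rvc else 4
    if target % align:
        return "instruction-address-misaligned"
    return hex(target)


for rvc in (False, True):
    print(rvc, [jalr_outcome(0x1000 | aux, 5, rvc) for aux in (0, 1)])
```

Without RVC one of the two targets traps; with RVC both execute, which is what turns the carry into a usable (or exploitable) two-way branch.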

Marking the odd-immediate JALR as NSE would highlight that the low-order bit of a function pointer may have unintended consequences, and that its intended use for auxiliary purposes is not guaranteed.

Possible uses of this functionality:

I posted on isa-dev (and also on sw-dev, my bad, apparently):
        "enhanced subroutine invocation mode: A proposal for a significant use of the JALR odd bits "side-effect"".

Obviously, the functionality can be leveraged to some useful purpose.

Therefore, I do not recommend prohibiting the behaviour altogether.
(This could be done by redefining JALR to effectively clear both low-order bits before they are added.)
Rather, I propose that it be defined as NSE, which would highlight that even this defined behaviour cannot be relied upon unless the hardware architecture specifies it.
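The parenthetical alternative can be sketched next to the v2.1 rule. This assumes "clear both low-order bits before they are added" means clearing bit 0 of rs1 and of the sign-extended immediate; the function names are hypothetical.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def sext12(imm12: int) -> int:
    """Sign-extend a 12-bit immediate field."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def jalr_spec(rs1: int, imm12: int) -> int:
    """v2.1 behaviour: add first, then clear bit 0 (bit 0 can carry)."""
    return (rs1 + sext12(imm12)) & MASK & ~1


def jalr_no_carry(rs1: int, imm12: int) -> int:
    """Alternative: clear bit 0 of both operands before adding (no carry)."""
    return ((rs1 & ~1) + (sext12(imm12) & ~1)) & MASK


for rs1 in (0x1000, 0x1001):
    print(hex(rs1), hex(jalr_spec(rs1, 5)), hex(jalr_no_carry(rs1, 5)))
```

The two rules agree unless both low bits are set; in that one case the v2.1 rule lands two bytes further on, while the no-carry rule makes the pointer's auxiliary bit truly invisible.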

In any event, I am concerned about the hazards, and about the code that may be written to rely on this behaviour.

Hazards:

    Hardware implementation limitations on jumps into 32-bit instructions.

    (1) First, is a jump into an RV32/64I instruction allowable in all situations?
    Must this be supported by all "compliant" RVC implementations?
    These questions are relevant to RVC irrespective of this NSE proposal.

    If a 32-bit instruction is already cached, perhaps along with predecoded information such as exclusion settings, might that clash with executing an RVC instruction embedded within the same 32-bit instruction?

    A hardware implementation may determine that the instruction is best tagged using its high-order halfword. That would definitely clash with the ability to execute the RVC instruction within it.

    (2) The "double odd" JALR behaviour can be used to selectively vector between two RVC instructions (allowing fall-through or, more typically I expect, two C.J instructions).
    However, there would be a temptation to use a LUI x0,<immediate containing an RVC instruction in its upper halfword> trick.
    This is described in the "simpler" implementation in "enhanced subroutine invocation mode: A proposal for a significant use of the JALR odd bits "side-effect"".
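The layout behind the LUI x0 trick can be shown with a short encoder. This is only a sketch of the bit layout, not the encoding used in the referenced post: it assumes the standard U-type layout (imm[31:12], rd, opcode 0110111) and embeds C.NOP (0x0001) as a harmless stand-in for a real RVC instruction such as a C.J. `lui_x0_embedding` is a hypothetical name.

```python
import struct

LUI_OPCODE = 0b0110111
C_NOP = 0x0001  # C.NOP encoding, used here as a stand-in payload


def lui_x0_embedding(rvc: int) -> int:
    """Encode LUI x0,imm so that bits 31:16 of the word are the halfword `rvc`.

    imm[15:12] is left as zero and rd = x0, so executing the whole word has no
    architectural effect, while a jump to word_address + 2 executes `rvc`.
    """
    if rvc & 0b11 == 0b11:
        raise ValueError("low two bits 11 denote a 32-bit encoding, not RVC")
    imm20 = (rvc & 0xFFFF) << 4  # instruction bits 31:12
    return (imm20 << 12) | (0 << 7) | LUI_OPCODE


word = lui_x0_embedding(C_NOP)
low, high = struct.unpack("<HH", struct.pack("<I", word))  # little-endian halves
print(hex(word), hex(low), hex(high))  # 0x10037 0x37 0x1
```

Whether a given implementation fetches and executes the upper halfword correctly when it is entered at +2 is exactly the open question raised in (1).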

    If there is no guarantee that every RVC implementation properly executes such embedded sub-instructions, unreliable execution would likely ensue.

    Marking odd-offset JALR as NSE would highlight this hazard.

    Latent bugs and exploits.

    (a) Because this "double odd" feature is expected to be rarely used, it is also less likely to be extensively tested in a hardware implementation.

        The explicit awareness that an NSE designation provides would highlight the need to test this case in implementations that do support it.

    (b) Excluding the low-order bit from the adder that handles address calculation might seem a viable trade-off.
     Why encumber that block and use extra traces for a non-existent situation?
     This could be decided either intentionally or unintentionally (see (a) above).
     Such an implementation would likely describe this in its errata as a feature and not a bug.

     Defining this case as NSE encourages such design decisions to be made explicitly and intentionally.

    (c) Assuming this "double odd" behaviour works on all hardware is risky.
        Due to the two hardware implementation considerations above, it is quite possible that the behaviour is only partially implemented or absent altogether.

        Defining JALR with an odd immediate as NSE helps to address the likely variation between implementations.
        It allows the ecosystem to tolerate such plausible variants.
        It gives the producer the opportunity to legitimately claim that its implementation is feature-rich, rather than deficient.
        This informs the consumer and allows better comparison of competing products,
        especially for what would otherwise be a deficiency that most couldn't care less about.

    (d) Highlighting this "double odd" behaviour with NSE, and allowing variants that do not carry from the LSB, could be desirable for hardened and hardening systems.

        The ability to have code act in two distinctly different ways when the pointer is odd versus even allows for Trojan code.
        Software can be tested and proven correct, but with a minor change to an address pointer the code is hijacked.
        The pointer oddness could be introduced by a linkage editor and would potentially not affect any other code so linked.
        Or it could be deemed harmless, a typical "transparently hidden auxiliary bit" of the kind many applications use:
            a scan confirming that the bit is not checked anywhere in the code would pass,
            or a generalized, innocuous routine that validates the setting of the bit would justify setting it.

        Prohibiting the carry from the LSB would remove this attack vector.
        Designating it NSE would allow for a legitimate "hardened" implementation.

Michael Clark

Jul 13, 2016, 11:42:25 PM

to David Horner, RISC-V ISA Dev

I like the sound of excluding the carry and using either the odd bit from the immediate or the odd bit from the register (a logical OR of these two bits, to indicate the NSE) to trigger the requirement for a landing pad sequence indicating that the target procedure is virtual (the target of an indirect call). This has been discussed on the list already, but with no details on specific encodings.

If either bit were set (in the register at runtime, or in the immediate by the compiler at compile time), then an 'or' implementation could use this as a signal to check for a (specific) landing pad NOP instruction sequence. As you mention, the general ISA is a superset and has many NOPs, so an implementation would be free to have an erratum that caused it to fault on one 16-bit NOP and one 32-bit NOP, such that an implementer could exploit this erratum as a feature.

A strict verifier, with a stricter RVC derivative, could check that all continuations (jump targets) are 32-bit aligned and begin with a 32-bit NOP instruction (I am thinking of Google's pre-existing NaCl sandbox technique here), and, if not aligned, are preceded by a realigning 16-bit NOP.

Indirect calls to a block on a 16-bit boundary could require a (specific) 16-bit NOP and a (specific) 32-bit NOP if either of the odd bits is set and the target address excluding the odd bits is only 16-bit aligned, or a single (specific) 32-bit NOP if the target is 32-bit aligned. Basically, a realignment sequence and indirect call target marker.
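One way to picture the check an 'or' implementation might perform is the sketch below. Everything specific here is an assumption for illustration: the markers are taken to be the canonical 32-bit NOP (ADDI x0,x0,0 = 0x00000013) and C.NOP (0x0001), the target uses the no-carry rule, and `landing_pad_ok` is a hypothetical name, not a proposed encoding.

```python
import struct

NOP32 = 0x00000013  # ADDI x0, x0, 0 (hypothetical marker choice)
C_NOP = 0x0001      # C.NOP (hypothetical realigning marker)


def sext12(imm12: int) -> int:
    """Sign-extend a 12-bit immediate field."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def landing_pad_ok(mem: bytes, base: int, rs1: int, imm12: int) -> bool:
    """Check the marker sequence an 'or' implementation might demand.

    `mem` holds little-endian code bytes starting at address `base`.
    """
    indirect = (rs1 | imm12) & 1                # logical OR of the odd bits
    target = (rs1 & ~1) + (sext12(imm12) & ~1)  # no-carry target
    if not indirect:
        return True                             # ordinary jump: no marker needed
    off = target - base
    if target % 4 == 0:                         # 32-bit aligned: one 32-bit NOP
        return struct.unpack_from("<I", mem, off)[0] == NOP32
    # Only 16-bit aligned: a realigning 16-bit NOP, then the 32-bit NOP.
    return (struct.unpack_from("<H", mem, off)[0] == C_NOP
            and struct.unpack_from("<I", mem, off + 2)[0] == NOP32)


# 0x1000: NOP32 | 0x1004: 0x0000 | 0x1006: C.NOP | 0x1008: NOP32
code = struct.pack("<IHHI", NOP32, 0x0000, C_NOP, NOP32)
print(landing_pad_ok(code, 0x1000, 0x1001, 0))  # True: aligned pad
print(landing_pad_ok(code, 0x1000, 0x1006, 1))  # True: realigning pad
print(landing_pad_ok(code, 0x1000, 0x1005, 0))  # False: no marker at 0x1004
```

A real implementation would raise an exception rather than return False; the boolean just keeps the sketch testable.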

I do not understand the attacks that require alignment beyond 4-byte boundaries, given an indirect call target marker on RISC-V and a maximum instruction length of 4 bytes.

Is there any reason why a block would need to be, for example, 16-byte (128-bit) aligned, or is it only relevant that we align to the longest possible instruction in the ISA to prevent instruction-embedding attacks, assuming we mark indirect call targets?

Of course, implementations with 6- or 8-byte instructions would need correspondingly higher alignment and padding requirements for indirect jump targets.

I understand that some architectures allow an undefined number of modifier bytes, which can result in arbitrarily long instructions, but I don't understand why blocks would need to be aligned any higher than the maximum instruction length, assuming there is a marker to indicate code that is the target of an indirect call (address resolved at runtime).

The question is JALR x0,ra,0, or RET. This needs to be able to tail-call an explicit continuation, i.e. the case where ra is not PC + length(inst) but another entry point.

Perhaps, extending this scheme, we would use JALR x0,ra,1 to indicate the tail-call case (I am thinking of return as simply an indirect call to the continuation address following the procedure call that transferred control). We need to prevent regular returns from returning to a landing pad (does this raise the alignment requirement?).
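For reference, RET and the proposed JALR x0,ra,1 differ only in bit 20 of the encoding. The sketch below encodes JALR in the standard I-type format (opcode 1100111, funct3 000); `encode_jalr` is a hypothetical helper, and the "tail return" reading of an odd immediate is the speculative scheme discussed here, not ratified behaviour.

```python
JALR_OPCODE = 0b1100111


def encode_jalr(rd: int, rs1: int, imm: int) -> int:
    """Encode JALR rd, imm(rs1) in the I-type format (funct3 = 000)."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate must fit in 12 signed bits")
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register numbers must be 0..31")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | JALR_OPCODE


RA = 1  # x1
print(f"{encode_jalr(0, RA, 0):#010x}")  # RET: 0x00008067
print(f"{encode_jalr(0, RA, 1):#010x}")  # JALR x0,ra,1: 0x00108067
```

Since the link register can be any of the 31 registers, the same odd-immediate form would apply with any rs1, e.g. `encode_jalr(0, 5, 1)`.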

Note that ra could be any one of the 31 registers in the case I am thinking about, as we may like to shuffle registers, and thus not have any link-register-specific optimisations. I'm thinking of implementing a post-compilation register shuffler in a link loader.

I need to think about this more to understand it more completely.

Sent from my iPhone


David Horner

Jul 14, 2016, 8:34:54 AM

to Michael Clark, RISC-V ISA Dev

On 2016-07-13 11:42 PM, Michael Clark wrote:

> I like the sound of excluding carry and using an either or of either the odd bit from the immediate or the odd bit from the register (a logical or of these two bits to indicate the NSE) as an approach to trigger the requirement for a landing pad sequence that indicates the target procedure is virtual (the target of an indirect call).

OK, so this was not specific to JALR as such, but an extended opcode?

> This has been discussed on the list already

I searched for JALR on the list and did not find this suggestion related to it.
Can you please give me the discussion thread name? Thanks.

> but no details on specific encodings.

If JALR with an odd immediate were designated as NSE then it could be used for such an implementation.

> If either bit were set (in the register at runtime or in the immediate by the compiler at compile time) then an 'or' implementation could use this as a signal to check for a (specific) landing pad NOP instruction sequence. As you mention, the general ISA is a superset and has many NOPs so an implementation would be free to have an errata that caused it to fault on one 16-bit NOP and one 32-bit NOP such that an implementer could exploit this errata as a feature.

I believe the design and detailed specification of RISC-V were intended to avoid such ad hoc extensions and the resulting fragmentation.
The designers left room in the opcode space for implementers to "experiment" and specifically have NSEs for such purposes.

However, there is no need to designate the NOPs as faulting, as the canonical illegal instructions x'00' and x'0000' will work just as well.

> The question is JALR x0,ra,0 or RET. This needs to be able to tail call an explicit continuation i.e. the case where ra is not PC + length(inst), rather another entry point.

I would like to continue discussion of this on the original thread.
I certainly like the idea of vectored subroutine branches with "hardware" support.
Especially as it helps alleviate the concern that the two-instruction sequence used to generate addresses is very difficult for a linker to adjust.

> Perhaps extending this scheme we would use JALR x0,ra,1 to indicate the tail call case (I am thinking about return as simply an indirect call to the continuation address following the procedure call that transferred control. We need to restrict regular returns to not return to a landing pad (does this raise the alignment requirement?).

Yes, another possible, and more JALR-specific, example of what could be implemented if odd-immediate JALRs are designated NSE.

Thank you for your response.
Good thoughts.

34duc

Jul 14, 2016, 10:58:51 AM

to RISC-V ISA Dev, michae...@mac.com

On Thursday, July 14, 2016 at 8:34:54 AM UTC-4, David Horner wrote:

> I certainly like the idea of vectored subroutine branches with "hardware" support.
> Especially as it helps alleviate the concern that the two-instruction sequence used to generate addresses is very difficult for a linker to adjust.

The multiple-instruction sequences generated for various things are no problem at all for the linker to adjust.

There are different relocation types which are used to indicate to the linker which bits of the result (symbol + offset) or (symbol1 - symbol2 + offset) should be placed where in the bytes (of code or data) to which the relocation is attached.
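As an illustration of "which bits go where", here is a sketch of the split a linker performs for a LUI/ADDI pair (the HI20/LO12_I style of relocation). Because ADDI sign-extends its 12-bit immediate, the upper part is rounded by 0x800. Function names are hypothetical; this is a worked example, not linker source.

```python
def split_hi_lo(value: int) -> tuple[int, int]:
    """Split a 32-bit value into a LUI hi20 field and a signed ADDI lo12.

    ADDI sign-extends its immediate, so hi20 is rounded up by 0x800 whenever
    bit 11 of the value is set.
    """
    value &= 0xFFFFFFFF
    hi20 = ((value + 0x800) >> 12) & 0xFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000
    return hi20, lo12


def rebuild(hi20: int, lo12: int) -> int:
    """What LUI rd,hi20 followed by ADDI rd,rd,lo12 produces (low 32 bits)."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


for v in (0x12345678, 0x12345FFF, 0xFFFFF800):
    hi, lo = split_hi_lo(v)
    print(hex(v), hex(hi), lo, rebuild(hi, lo) == v)
```

The same idea applies to AUIPC pairs, with the value being the PC-relative offset rather than an absolute address.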

There is no need for the instructions to be adjacent either. They can also be interleaved with other instructions that require relocation.

The linker can also convert two-instruction combinations into single instructions if the target of the relocation is close enough (relative) or small enough (absolute), or vice versa.
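A sketch of the range test behind that kind of rewrite for calls: JAL reaches a 21-bit signed, even offset (about ±1 MiB), while AUIPC+JALR covers roughly ±2 GiB. The AUIPC+JALR bound here is deliberately conservative, and `call_form` is a hypothetical name rather than any linker's API.

```python
def call_form(pc: int, target: int) -> str:
    """Pick the shortest PC-relative call sequence for target (sketch)."""
    offset = target - pc
    if offset % 2:
        raise ValueError("RISC-V jump offsets are multiples of 2")
    if -(1 << 20) <= offset < (1 << 20):          # JAL: 21-bit signed offset
        return "jal"                               # 4 bytes
    if -(1 << 31) <= offset < (1 << 31) - 0x800:  # AUIPC + JALR (conservative)
        return "auipc+jalr"                        # 8 bytes
    raise ValueError("offset out of range for a PC-relative call")


print(call_form(0x10000, 0x10800), call_form(0x10000, 0x210000))
```

A linker that emitted AUIPC+JALR conservatively can shrink it to JAL once final addresses show the target is within range, and must do the reverse if it is not.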

Michael Clark

Jul 14, 2016, 11:59:49 AM

to David Horner, RISC-V ISA Dev

> On 15/07/2016, at 12:34 AM, David Horner <ds2h...@gmail.com> wrote:
>
> I searched for JALR on the list and did not find this suggestion related to it.
> Can you please give me the discussion thread name? Thanks.

I think the discussion was more general in terms of jumps to odd addresses targeting instruction embeddings and may not have referenced the odd bits and carry behaviour in JALR as precisely as you have outlined it in your email.

It is discussed for other architectures in Google's "NaCl Sandbox paper". I can hunt for a link to the earlier message; however, the paper should be the canonical reference, as these corner cases exist on many architectures. Google has worked on verifiers for x86, x86-64 and ARM. Their use, however, has been targeted at a "client" sandbox for native browser extensions using PPAPI and NaCl, whereas my particular interest is in repurposing this for a "server" sandbox.

With respect to my previous email, I made the erroneous statement that the landing pad NOP sequence would fault, rather than that the specific sequence would "not fault" if it is the target of an indirect call with either of these odd bits set (a logical OR of the LSBs of the register and the immediate). Regular instructions should in fact fault if the marker is not present. My brain was travelling faster than my fingers.

I believe Microsoft is investigating this with Intel using UD instructions (I can find a reference). I don't believe it is a proprietary technique, as it is used on both ARM and x86; however, it could always be implemented as some form of "pipeline erratum", as you mention, which we have to work around in the compiler, mostly just to keep the lawyers happy.

I like the semantic of the odd immediate being a compiler indication of an indirect target and the register being a runtime indication of an indirect target.

JALR x0,ra,1 is also very interesting and could be some form of TRET, or Tail Return to an indirect target. Note that JALR x0,ra,0 is the encoding of the RET pseudo-instruction, which, under the scheme we outline, would not require the landing pad to be present, being a regular call return rather than an indirect continuation (or tail call). However, it would if ra were somehow populated with an odd address, as you mentioned.

I quite like the idea of this requiring the processor to realign, and thus requiring a specific 6-byte or 4-byte NOP sequence depending on whether the rounded address is 2-byte or 4-byte aligned.

Nevertheless, you raise a very interesting case with the JALR carry vs. no-carry behaviour. I like it, especially the "no carry" semantic.

Cheers,
Michael

Michael Clark

Jul 14, 2016, 12:26:33 PM

to 34duc, RISC-V ISA Dev

Yes indeed.

I have a very simple and at this stage "quite hacky" proof of concept for post compilation relinking without explicit relocation information.

https://github.com/michaeljclark/riscv-meta/blob/master/app/riscv-compress-elf.cc

It's a starting point. Compression was a good test case, as it needs relocation of instruction pairs, and these can be placed in atypical locations by the compiler (the specification implies the LUI,x and AUIPC,x pairs are emitted together; however, I discovered this is not the case).

The code I have at present is very fresh and needs a lot more work. It doesn't relocate references outside of the code segment (I am inserting NOPs); however, I have code to repackage and relink the ELF offsets to other sections, I just have not completed it. It does relocate debug symbols, but it doesn't support DWARF; it is what I would call a POC.

There will likely be feedback from this effort on what we can emit from the compiler codegen for a more easily verifiable target, one that removes some of the corner cases I am seeing that I can't resolve statically. This would not be a typical RV target, but rather a derivative with stricter codegen constraints that remove ambiguous cases, in a similar vein to Google's NaCl.

I understand the NaCl effort has moved on to WebAssembly; however, my use case is somewhat different from NaCl's, and I see RISC-V as an "ideal target" for a number of reasons.

It may be possible to have a hardware implementation that removes the need for some of the inserted opcodes (TCAM MPU for hardware bounds checks versus insertion of masking instructions around loads and stores).
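The two options in that parenthesis can be contrasted in a few lines. This sketch assumes a NaCl-style sandbox region whose size is a power of two and whose base is aligned to that size; the constants and function names are hypothetical, and guard regions for accesses that straddle the top of the region are ignored.

```python
SANDBOX_BASE = 0x4000_0000  # hypothetical region base, aligned to its size
SANDBOX_SIZE = 0x1000_0000  # hypothetical region size, a power of two
assert SANDBOX_SIZE & (SANDBOX_SIZE - 1) == 0 and SANDBOX_BASE % SANDBOX_SIZE == 0


def mask_address(addr: int) -> int:
    """Software approach: an AND and an OR inserted before each load/store."""
    return SANDBOX_BASE | (addr & (SANDBOX_SIZE - 1))


def bounds_ok(addr: int, width: int) -> bool:
    """What a hardware bounds check (e.g. an MPU region) would test instead."""
    return SANDBOX_BASE <= addr and addr + width <= SANDBOX_BASE + SANDBOX_SIZE


stray = 0xDEAD_BEEF
print(hex(mask_address(stray)), bounds_ok(stray, 4))  # 0x4eadbeef False
```

Masking silently redirects a stray pointer into the sandbox at the cost of extra instructions per access, whereas a hardware check leaves the code unchanged and faults instead.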

There is also a presentation from some of the LLVM ASan developers on hardware support for accelerating the address sanitiser. I can try to dig up a link...