Possible extensions for IBT


Xuejie Xiao

Oct 20, 2020, 12:38:50 AM
to RISC-V ISA Dev
Hi there,

We are now building ckb-vm (https://github.com/nervosnetwork/ckb-vm), a software-based RISC-V implementation for use in the blockchain world. One requirement we have is to make sure the software running on this RISC-V ISA is secure enough for fields such as fintech.

For those reasons, we are looking for ways to enhance the security of programs running in our RISC-V environment. Some techniques we are interested in include:

* Indirect Branch Tracking: certain markers are included in the program, and indirect jumps can only jump to basic blocks that start with those markers. This paper describes the technique in more detail: https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_pappas.pdf. Note that the latest Intel CPUs also support this feature.
* Memory tagging: tags are added to memory addresses, which can help detect memory-safety issues. This article has more details: https://lwn.net/SubscriberLink/834289/a8f6c7c67ccca2c3/. The latest Linux kernel already supports this feature on the arm64 architecture.
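A toy model of the tag check may make the memory-tagging idea concrete. This is a sketch under assumptions that do not come from the post: a 4-bit tag in pointer bits 59:56 and one tag per 16-byte granule (loosely following Arm MTE), with all helper names invented for illustration. `strip_tag` plays the role that pointer masking would play in hardware.

```python
TAG_SHIFT = 56                 # assumption: tag lives in pointer bits 59:56
TAG_MASK = 0xF                 # assumption: 4-bit tags
GRANULE = 16                   # assumption: one tag per 16-byte granule
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagMismatch(Exception):
    """Models the fault raised when pointer and memory tags differ."""


def make_pointer(addr: int, tag: int) -> int:
    """Store `tag` in the otherwise-unused top bits of a 64-bit pointer."""
    return (addr & ADDR_MASK) | ((tag & TAG_MASK) << TAG_SHIFT)


def strip_tag(ptr: int) -> int:
    """Drop the tag before the pointer is used as an address."""
    return ptr & ADDR_MASK


class TaggedMemory:
    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)   # one tag per granule

    def set_tag(self, addr: int, length: int, tag: int) -> None:
        """Color every granule overlapping [addr, addr + length) with `tag`."""
        first = addr // GRANULE
        last = (addr + length + GRANULE - 1) // GRANULE
        for g in range(first, last):
            self.tags[g] = tag & TAG_MASK

    def load(self, ptr: int) -> int:
        """Load one byte, checking the pointer tag against the memory tag."""
        addr = strip_tag(ptr)
        ptr_tag = (ptr >> TAG_SHIFT) & TAG_MASK
        if self.tags[addr // GRANULE] != ptr_tag:
            raise TagMismatch(f"pointer tag {ptr_tag} != memory tag at {addr:#x}")
        return self.data[addr]
```

For example, tagging a 32-byte allocation at 0x40 with tag 3 lets `load(make_pointer(0x40, 3) + 31)` succeed, while `+ 32` runs into the next granule (still tag 0) and raises `TagMismatch`, catching the one-past-the-end overflow.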

So my questions here are:

1. Are there any existing extensions that include, or plan to include, these features?
2. If the answer to 1 is no, would anyone be interested in such an extension if we put forward a proposal for one?

We could of course build an extension tailored to our environment, but we want to check here whether anyone else shares the same interest, so that we can join efforts.

Thanks
Xuejie Xiao

Allen Baum

Oct 20, 2020, 2:22:07 AM
to Xuejie Xiao, RISC-V ISA Dev
The pointer masking proposal in the J-extension, documentation here:
might do what you want for memory tagging

I don't recall any official extension that implements portals, which is what I would call indirect branch tracking.
HP PA-RISC had support for that, via an op that branched to a special kind of page or segment, as I recall, which trampolined to the actual code.


--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/33730570-cb8e-4a3c-aa14-5a7703f784c3n%40groups.riscv.org.

Samuel Falvo II

Oct 20, 2020, 2:35:20 AM
to Allen Baum, Xuejie Xiao, RISC-V ISA Dev
IBT doesn't involve trampolines; they use special no-op instructions
which must exist at the effective address of an indirect branch. The
CALL or JMP instruction *itself* checks to make sure this no-op
instruction exists at the address specified before committing to the
jump. If it's not there, it traps, allowing the OS to intervene.
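Samuel's description can be sketched as a check performed by the jump itself, before it commits. The program is modelled as a hypothetical address-to-mnemonic map, and `endbranch` is a placeholder name for the marker no-op; neither is a real RISC-V encoding.

```python
LANDING_PAD = "endbranch"  # placeholder name for the marker no-op (hypothetical)


class ControlFlowViolation(Exception):
    """Stands in for the trap that lets the OS intervene."""

    def __init__(self, pc: int, target: int) -> None:
        super().__init__(f"indirect jump at {pc:#x} to non-marker {target:#x}")
        self.pc = pc
        self.target = target


def indirect_jump(program: dict[int, str], pc: int, target: int) -> int:
    """Return the new pc, or trap if `target` does not hold the marker.

    The check happens before the jump commits, so on a violation the
    architectural pc is still `pc`, which is what the trap reports.
    """
    if program.get(target) != LANDING_PAD:
        raise ControlFlowViolation(pc, target)
    return target
```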



--
Samuel A. Falvo II

Xuejie Xiao

Oct 20, 2020, 10:36:39 PM
to RISC-V ISA Dev, Allen Baum, RISC-V ISA Dev, Xuejie Xiao
Thanks! It indeed looks like J provides what we are looking for in pointer masking.

Xuejie Xiao

Oct 20, 2020, 10:38:27 PM
to RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao, RISC-V ISA Dev, Allen Baum
Yes, that is exactly what I was referring to. Assuming we start building a draft of such an extension, would people be interested in it? Or would it make more sense to just contribute it to J?

Allen Baum

Oct 21, 2020, 12:48:34 AM
to Xuejie Xiao, RISC-V ISA Dev, Samuel Falvo II
That's tricky and sounds really nasty to implement. Thinking out loud here:
What if there is an interrupt between the indirect branch and the special noop? What if there is a page fault on the target?
Do trap handlers all need to have a special noop? That will break the trap handlers, especially in vector mode.
You either need to have all indirect branches do this, or carve out a new opcode (which perhaps doesn't need an offset at least).

This would require a processor state bit (almost certainly in mstatus) to remember that the last instruction was a U-mode indirect branch, so that the next instruction traps if the bit is set and the hart is in U-mode, unless that instruction is the special noop. The bit is cleared when a U-mode non-indirect branch is executed.

That's pretty ugly, and I'm sure there are corner cases galore (not to mention that it would be really difficult to fit into high-end pipelines).
The HP approach is simpler: an execute-only page that contains nothing but branches to entry points. It's not as performant, but you don't need any HW changes to make it work, and it requires only a single 32b op per branch, which limits it to a ±1MB range. You can always branch to another trampoline if that range isn't good enough.

The special bit is cleared by 
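The trampoline page Allen describes is just a table of direct jumps. As a sketch, such a table can be built by encoding one `jal x0, entry` (the `j` pseudo-instruction) per 4-byte slot. The encoder follows the standard RV32I/RV64I J-type layout; `build_trampoline_table` and the one-slot-per-entry convention are illustrative assumptions. The range check is where the ±1 MiB limit comes from: JAL's offset is a signed 21-bit, 2-byte-aligned value.

```python
JAL_OPCODE = 0x6F
JAL_REACH = 1 << 20  # 21-bit signed, 2-byte-aligned offset: +/-1 MiB


def encode_jal(rd: int, offset: int) -> int:
    """Encode `jal rd, offset` in the RV32I/RV64I J-type format."""
    if offset % 2 or not -JAL_REACH <= offset < JAL_REACH:
        raise ValueError(f"offset {offset} does not fit a single JAL")
    imm = offset & 0x1FFFFF  # 21-bit two's-complement immediate
    return (
        ((imm >> 20) & 0x1) << 31      # imm[20]
        | ((imm >> 1) & 0x3FF) << 21   # imm[10:1]
        | ((imm >> 11) & 0x1) << 20    # imm[11]
        | ((imm >> 12) & 0xFF) << 12   # imm[19:12]
        | (rd & 0x1F) << 7
        | JAL_OPCODE
    )


def build_trampoline_table(base: int, entry_points: list[int]) -> list[int]:
    """Hypothetical helper: one `j entry` (jal x0, entry) per 4-byte slot at `base`."""
    return [
        encode_jal(0, entry - (base + 4 * slot))
        for slot, entry in enumerate(entry_points)
    ]
```

A `jalr` into slot i then lands on a direct `j`, so the only indirect targets callers use are the table slots themselves; an entry more than 1 MiB away needs the second-hop trampoline Allen mentions.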

Ved

Jun 15, 2021, 10:47:56 AM
to RISC-V ISA Dev, Allen Baum, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao
There would be one such flag for each mode (and a prev-copy of the flag for nesting), so whether the trap handler needs the noop depends on whether IBT is enabled in that mode. A trap that occurs after the jump has retired but before the next instruction is fetched will preserve the "waiting-for-noop" indication. When the trap handler does an SRET, the front-end, when it fetches the instruction at sepc, will cause a trap if it's not a special noop. The flag that indicates that the special noop is required is only cleared when the noop retires. The noop itself does not need to execute; it can be safely elided by the machine, as it serves no other purpose. So if there is a page fault, instruction breakpoint, or access fault at the target of the jump, the flag stays set.
These have been implemented in high-end out-of-order pipelines both by Intel (CET/IBT) and by ARM (BTI).
I did not understand the concern about vector mode.
"The bit is cleared when a U-mode non-indirect branch is executed." The bit is only set by the jump op, so non-indirect branches don't need to do anything special.


Allen Baum

Jun 15, 2021, 3:48:42 PM
to Ved, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao
It took me a while to remember this from 8 months ago. 
The functionality of this flag is similar to the debug single step bit which causes a breakpoint exception after the execution of an instruction.
 - it gets set on any indirect jump or call for which an enable control bit is set.
 - it gets cleared by the special noop instruction (which, strictly speaking, isn't a noop because it does modify hart state: the special flag).

A per-mode status bit is needed to indicate the flag state at the time of entry from that mode.
A per-mode control bit is needed to cause indirect transfers to set the status flag.
Indirect transfers, and the noop, would set (if enabled)/clear only the flag associated with the current mode.

You need to think about which modes are allowed to set/clear the enable bit,
 i.e. can it only be enabled/disabled from privilege modes higher than the mode it corresponds to, or also from the same mode?
(If only from higher modes, then it can't be used for M-mode; but if from higher or the same mode, then malware can just disable it.
 Alternatively, the bits could be implemented as W1S, so they can be set from the same level but not cleared, while still being clearable by a higher level.)
I am unaware of any other CSR bit with that mode-specific functionality.

You need to think about how this interacts with the hypervisor as well.

This will take at least one (possibly restricted-view) CSR per mode for virtualization, so a pair of bits per mode.
mstatus bits are scarce and valuable, and this would use 6 (or more with the hypervisor), which argues against putting them there.
This is security related, so putting them into mseccfg (used by the ePMP extension and perhaps others) might be more appropriate,
though that would require adding sseccfg and possibly useccfg as restricted-view versions (I don't think they currently exist).

vedvya...@gmail.com

Jun 16, 2021, 4:02:24 PM
to Allen Baum, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao

By the nature of this tracking bit, software cannot observe the 1-setting of this bit in hardware. So malware clearing it is not a concern: to clear it you would need to use an instruction that is not the special no-op, and when this tracking bit is 1, any instruction but the no-op causes a fault. Setting this bit will require the next instruction that is fetched to be the special no-op, so again there is either a fault, or, if the next instruction is the special no-op, the bit is defused and software in that mode cannot observe the 1-setting of this bit. The state of the tracking bit itself would be reported as a “prevIBT” state in mstatus and sstatus. For VS mode it would be in the hsstatus. So it’s not a concern to enable it in M-mode: the stateen for S-mode would enable it for S-mode execution, and the one for M-mode would enable it for M-mode execution. The enable may not be appropriate in mstatus; it is better placed in the “stateen” register that we are discussing for the extended ISA.

 

Ved

Allen Baum

Jun 17, 2021, 1:29:36 AM
to Ved, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao
Sorry, this isn't making sense. There is a CSR enable for each mode. If the enable for U-mode tracking is in a U-mode CSR, then a CSRRC instruction will clear it and no branching is required; no noop instruction gets anywhere near it. Malware doesn't need to observe anything; it just needs to clear the track-enable bit, and after that, no checking is performed.
Setting or clearing the enable bit doesn't need to be followed by a special noop. 
Setting the track bit does, but we aren't talking about that here.
So: the enable bit is a RW CSR bit. The state of the tracking bit is a RO CSR bit.

So I'll ask again: which modes can read and write the enable and status bits - 
 - the modes they control (so Umode tracking enable can be set and cleared by Umode or anything more privileged), or
 - modes that are higher privileged (so Umode tracking can only be enabled and disabled by S or M modes)
If the latter is the case, then there is no higher mode than M-mode, and Mmode can't be tracked
If the former is the case, then Umode malware could disable Umode tracking

vedvya...@gmail.com

Jun 17, 2021, 7:47:58 AM
to Allen Baum, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao

Good morning Allen. Sorry if I was not clear. I will try again.

 

There are two bits we are discussing – IBT-enable, IBT-tracker.

 

IBT-enable – enable branch tracking for termination using the branch-terminating noop.

  1. The enable for U-mode is a RW CSR bit that can be set/cleared by S or M mode.
  2. The enable for S-mode is a RW CSR bit that can be set/cleared by M mode.
  3. The enable for M-mode is a RW CSR bit, writable by M-mode, that is either lockable or sticky read-write (RWS).
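Ved's three write rules can be sketched as a small permission model. The class and method names are hypothetical, CSR access faults are modelled as Python `PermissionError`s, and the M-mode bit uses the sticky (RWS) option rather than the lockable one.

```python
U, S, M = 0, 1, 2  # privilege modes; larger number = more privileged


class IbtEnables:
    """Hypothetical per-mode IBT-enable bits following the three rules above."""

    def __init__(self) -> None:
        self.enable = {U: False, S: False, M: False}

    def write(self, writer: int, target: int, value: bool) -> None:
        if target == M:
            if writer != M:
                raise PermissionError("only M-mode can write the M-mode enable")
            # Sticky (RWS): once set, the bit stays set until reset.
            self.enable[M] = self.enable[M] or value
        elif writer > target:
            # U enable: S or M may write it; S enable: only M may write it.
            self.enable[target] = value
        else:
            raise PermissionError(f"mode {writer} cannot write the enable for mode {target}")
```

Under these rules U-mode code cannot clear its own enable, which addresses the CSRRC concern, and M-mode tracking does not need a more privileged mode because, once set, the bit stays set until reset.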

 

The IBT-tracker – tracks the current state of the branch tracker.

  • This bit is 1 if there was a trap between retirement of an indirect branch and retirement of a branch-terminating noop at the branch target.
  • If a trap occurs in this window, the state of the bit needs to be preserved. The behavior of the trap handler is to store the state of the bit in a prevIBT bit in the sstatus, mstatus, or vsstatus CSR, depending on which mode will handle the trap.
  • This bit is a RW bit since it may need to be context switched.
  • This bit will be restored from [m|v|h]status CSR on a [S|M]RET that resumes execution of the interrupted program.

Allen Baum

Jun 17, 2021, 12:48:30 PM
to Ved, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao
Ah, very good, much better. This answers my question. 
I would say that the IBT-Tracker bit is:
 - set on an indirect branch
 - cleared when either a branch-terminating noop is executed or a trap occurs
 - restored/loaded from one of the xprevIBT-tracker bits when a *RET instruction is executed into the associated mode x

And the per-mode xprevIBT bits are:
 - loaded from the IBT tracker when a trap* occurs in the mode x
 - cleared when a xRET is executed into the associated mode
 - can be set or cleared by a CSR op with appropriate privileges
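Allen's summary can be turned into a small state-machine sketch. The instruction mnemonics (`jalr`, `endbranch`), the `Hart` class, and the `prev_mode` map standing in for the xPP fields are assumptions for illustration; `prev_ibt` plays the role of the per-mode xprevIBT bits, and a violation is modelled as an exception rather than a real trap.

```python
U, S, M = 0, 1, 2                 # privilege modes
LANDING_PAD = "endbranch"         # placeholder for the branch-terminating noop


class IbtViolation(Exception):
    """Stands in for the trap taken when the tracker is set."""


class Hart:
    def __init__(self, enabled: set[int]) -> None:
        self.mode = M
        self.enabled = enabled                # modes whose IBT-enable bit is set
        self.tracker = False                  # IBT-tracker
        self.prev_ibt = {S: False, M: False}  # SprevIBT / MprevIBT
        self.prev_mode = {S: U, M: U}         # stands in for SPP / MPP

    def retire(self, insn: str) -> None:
        if self.tracker and insn != LANDING_PAD:
            raise IbtViolation(insn)
        if insn == LANDING_PAD:
            self.tracker = False              # cleared by the noop
        elif insn == "jalr" and self.mode in self.enabled:
            self.tracker = True               # set by an enabled indirect branch

    def trap(self, to_mode: int) -> None:
        self.prev_ibt[to_mode] = self.tracker  # xprevIBT <- tracker
        self.prev_mode[to_mode] = self.mode
        self.tracker = False                   # cleared on the trap
        self.mode = to_mode

    def xret(self) -> None:
        x = self.mode                          # SRET from S, MRET from M
        self.tracker = self.prev_ibt[x]        # tracker <- xprevIBT
        self.prev_ibt[x] = False               # xprevIBT cleared on xRET
        self.mode = self.prev_mode[x]
```

An interrupt taken right after a U-mode `jalr` therefore runs the S-mode handler untracked, and SRET re-arms the tracker so the interrupted program must still resume on a landing pad.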

* does this need to be handled differently between 
  a trap caused by encountering a non-branch-terminating noop when the IBT-tracker bit is set, and
  any other trap taken before a branch-terminating noop is encountered?
 (and, do you re-use the illegal op trap or create a new one? 
    It's as if you are treating the IBT-tracker as an extra opcode bit, and only one opcode is defined with IBT-tracker=1.
     What is unclear to me is whether the timing would work, since you might not know the state of the IBT-tracker bit until after decode of the jump target instruction.)

Will there also be a compressed version of the noop?

You might get resistance to using 3 xstatus CSR bits for this, as the number of unused bits is limited.
 (you need separate bits for each mode, because xstatus CSRs are restricted view
   - there is only one copy, and which bits are allowed to be read or written is mode-specific)
An alternative is to put them into 3 separate CSRs (I suggested mseccfg, and defining new sseccfg and vseccfg CSRs), which have plenty of unused bits.


vedvya...@gmail.com

Jun 18, 2021, 8:56:15 AM
to Allen Baum, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao

So, on the trap itself an illegal op trap would be appropriate. The prevIBT provides an indication about why it was illegal.
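A trap handler could triage such traps by combining the cause with the saved prevIBT bit. This is a sketch: exception codes 2 (illegal instruction) and 12 (instruction page fault) are standard RISC-V `mcause`/`scause` values, but `classify_trap` and the policy it encodes are assumptions.

```python
ILLEGAL_INSTRUCTION = 2  # standard mcause/scause exception code


def classify_trap(cause: int, prev_ibt: bool) -> str:
    """Hypothetical triage in a handler, given xcause and the saved xprevIBT."""
    if cause == ILLEGAL_INSTRUCTION and prev_ibt:
        # The tracker was set and the instruction at the target was not
        # the landing pad: treat it as a control-flow violation.
        return "ibt-violation"
    if cause == ILLEGAL_INSTRUCTION:
        return "illegal-instruction"
    # Anything else (an interrupt, a page fault at the target, ...) is
    # handled normally; xRET restores the tracker, so the check still happens.
    return "other"
```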

 

On the IBT-tracker itself, you would have a decode copy of the tracker and a retirement copy of the tracker in the machine, both in the in-order part of the machine. The retirement copy is used to recover the decode copy on mispredictions, so the timing for this does not look onerous. Having just decoded a jump instruction, the decoder moves its copy to 1 and proceeds to decode the predicted target of the jump; if that is not the noop, it marks that instruction to fault (if it retires). If the prediction was wrong, the decode copy is recovered from the retirement copy. The machine can also stop allocation if it did not find the noop at the predicted target of the jump, so it could prevent even speculative execution of code if the branch somehow got trained to non-branch-target sites in the program.
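The decode-copy/retirement-copy scheme can be sketched as follows. For simplicity it assumes recovery happens once the mispredicted jump retires; `Frontend`, `Uop`, and `next_tracker` are hypothetical names. The point it shows is that a wrong-path instruction marked to fault is squashed without the fault ever being taken.

```python
from dataclasses import dataclass

LANDING_PAD = "endbranch"  # placeholder for the branch-terminating noop


def next_tracker(tracker: bool, insn: str) -> bool:
    """Tracker transition shared by the decode and retirement copies."""
    if insn == LANDING_PAD:
        return False
    if insn == "jalr":
        return True
    return tracker


@dataclass
class Uop:
    insn: str
    ibt_fault: bool = False  # the fault is taken only if this uop retires


class IbtFault(Exception):
    pass


class Frontend:
    def __init__(self) -> None:
        self.decode_tracker = False   # speculative copy, used at decode
        self.retire_tracker = False   # architectural copy, updated at retire

    def decode(self, insn: str) -> Uop:
        # Mark, do not raise: this may be a wrong-path instruction.
        uop = Uop(insn, ibt_fault=self.decode_tracker and insn != LANDING_PAD)
        self.decode_tracker = next_tracker(self.decode_tracker, insn)
        return uop

    def retire(self, uop: Uop) -> None:
        if uop.ibt_fault:
            raise IbtFault(uop.insn)
        self.retire_tracker = next_tracker(self.retire_tracker, uop.insn)

    def flush(self) -> None:
        # Misprediction: younger uops are squashed, decode copy is recovered.
        self.decode_tracker = self.retire_tracker
```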

 

There should not be a compressed version of the no-op. An uncompressed instruction can have up to 20 bits of immediate, which could alias with a compressed no-op encoding (for example, if a jump lands in the middle of that instruction), and that would be unfortunate.

 

Reporting tracker state would need 2 bits:

  • SprevIBT – on an A->S or a S->S trap, the SprevIBT holds the state of the tracker before the trap. SRET will restore it from SprevIBT
  • MprevIBT – on a A->M, S->M, M->M trap, the MprevIBT holds the state of the tracker before the trap. MRET will restore it from MprevIBT

 

Vedvyas

Jeff Jacobson

Jun 18, 2021, 2:50:43 PM
to RISC-V ISA Dev, RISC-V ISA Dev, Allen Baum
I'm somewhat curious why the pointer masking proposal is part of the J (dynamically translated languages) extension.

Memory safety and control-flow integrity are mainstream topics which are important to modern computers in general, irrespective of how code is compiled. 

I'm wondering whether there is any holistic effort to address memory safety and control-flow integrity in RISC-V, or if the plan (such as it is) is to let this trickle into the architecture in a piecemeal and uncoordinated manner.

~Jeff

FangFei Yang

Jun 24, 2021, 12:51:05 AM
to RISC-V ISA Dev, Allen Baum, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao, Ved
The IBT-tracker is set when an indirect jump commits, and it prevents any other instruction except endbranch from committing. I guess it has nothing to do with the decode stage?

BTW, I'm more curious how people define an indirect jump in RISC-V. jr %x looks like a return to me, so it might not come with an endbranch; and what about jalr?

Allen Baum

Jun 25, 2021, 4:59:17 PM
to FangFei Yang, RISC-V ISA Dev, Samuel Falvo II, Xuejie Xiao, Ved
JR is just JALR with x0 as the link target; it is a special case of JALR.
Technically, ECALL is indirect, since the trap address is in an xTVEC CSR, so you can resume execution at different addresses from the same source instruction, whereas a JAL or Bxx would not.
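One way to answer FangFei's question is to classify control transfers by decoding them. The sketch below uses the return-address-stack hint convention from the RISC-V unprivileged spec (x1 and x5 are link registers), so `jr` through a non-link register is a plain indirect jump while `jalr x0, 0(x1)` (`ret`) is a return. Which classes would actually need a landing pad (here: indirect calls and jumps, not returns) is an assumption, not something the thread settled.

```python
LINK_REGS = {1, 5}   # x1 (ra) and x5 (t0), link registers in the RAS hint convention
ECALL = 0x00000073

# Assumption: only these kinds would have to land on a branch-terminating noop.
TRACKED = {"indirect-call", "indirect-jump"}


def classify(insn: int) -> str:
    """Classify a 32-bit RISC-V instruction word by control-transfer kind."""
    opcode = insn & 0x7F
    rd = (insn >> 7) & 0x1F
    funct3 = (insn >> 12) & 0x7
    rs1 = (insn >> 15) & 0x1F
    if insn == ECALL:
        return "ecall"                  # target comes from xtvec, not the instruction
    if opcode == 0x6F:
        return "direct-jump"            # JAL: pc-relative target
    if opcode == 0x63:
        return "direct-branch"          # Bxx: pc-relative target
    if opcode == 0x67 and funct3 == 0:  # JALR (JR is JALR with rd = x0)
        if rd in LINK_REGS:
            return "indirect-call"      # also covers the coroutine-swap case
        if rs1 in LINK_REGS:
            return "return"
        return "indirect-jump"
    return "not-control-transfer"


def needs_landing_pad(insn: int) -> bool:
    return classify(insn) in TRACKED
```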