Instructions to support control flow integrity

Watson Ladd

Mar 15, 2017, 11:52:32 PM
to RISC-V ISA Dev
Dear all,

I have several ideas for potentially useful instructions and extensions that can support additional mitigations. The first is COME FROM. On chips that support it, when this extension is in use, every jump target must be a COME FROM instruction. As a result, the set of gadgets available for control flow manipulation is reduced. COME FROM has no other effects. The second is a shadow stack modification, which adds three instructions (CALL, CALLR and RETURN) and two privileged-mode registers (STACKBASE and STACKLIM), and modifies the memory protection system to add a shadow access type. STACKBASE and STACKLIM are conceptually per user or kernel process.

CALL(R) has the same format as JA(R)L and the same semantics, except that it does not touch the destination register. Instead, the return address is pushed onto a stack, which may be dumped to memory between STACKBASE and STACKLIM. RETURN pops this stack. Memory accesses issued by these dumps are in shadow mode: a page with the shadow bit set may be written or read by them. If the stack size exceeds STACKLIM, an error is reported to the OS. Multiple processes can use the same shadow page because of the STACKBASE and STACKLIM registers.
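
For readers who want something concrete, the CALL/RETURN behaviour described above can be sketched as a small Python model. This is only an illustration under assumptions: the class and method names, the 8-byte slot size, the dictionary standing in for shadow pages, and the underflow fault are all hypothetical and not part of the proposal.

```python
class ShadowStackFault(Exception):
    """Stands in for the error the proposal says is reported to the OS."""


class ShadowStack:
    """Toy model of CALL/RETURN bounded by STACKBASE and STACKLIM."""

    def __init__(self, stackbase, stacklim, slot_bytes=8):
        self.stackbase = stackbase    # lowest shadow address for this process
        self.stacklim = stacklim      # first address past the allowed region
        self.slot_bytes = slot_bytes  # assumed size of one saved return address
        self.top = stackbase          # next free slot
        self.memory = {}              # stands in for pages with the shadow bit set

    def call(self, pc, target, insn_len=4):
        """CALL: push the return address instead of writing a destination register."""
        if self.top + self.slot_bytes > self.stacklim:
            raise ShadowStackFault("shadow stack overflow")
        self.memory[self.top] = pc + insn_len
        self.top += self.slot_bytes
        return target                 # new program counter

    def ret(self):
        """RETURN: pop the saved return address and jump to it."""
        if self.top == self.stackbase:
            # Underflow handling is an assumption; the proposal does not cover it.
            raise ShadowStackFault("shadow stack underflow")
        self.top -= self.slot_bytes
        return self.memory.pop(self.top)
```

Because the bounds are per-instance state, two `ShadowStack` objects with disjoint `stackbase`/`stacklim` ranges could share one backing page, which mirrors the point about multiple processes sharing a shadow page.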

I look forward to discussing these proposals further/gathering other instructions that can assist in securing today's software.

Sincerely,
Watson

Michael Clark

Mar 16, 2017, 3:29:56 AM
to Watson Ladd, RISC-V ISA Dev
Hi Watson,

On 16 Mar 2017, at 4:52 PM, Watson Ladd <watso...@gmail.com> wrote:

> Dear all,
>
> I have several ideas for potentially useful instructions and extensions that can support additional mitigations. The first is COME FROM. All jump targets must be COME FROM on chips that support it when this extension is used. As a result the available gadgets for control flow manipulation is reduced. COME FROM has no other effects. The second is a shadow stack modification which adds three instructions: CALL, CALLR and RETURN and two privileged-mode registers STACKBASE and STACKLIM as well as modify the memory protection system to add a shadow access type. STACKBASE and STACKLIM are conceptually per user or kernel process.

I like the idea of adding a COME FROM landing pad instruction whose presence is checked at the target program counter of any “indirect jump” i.e. JALR.

It is technically an Indirect Jump Target (IJT) instruction or an Indirect Jump Label (IJL). It would need to be a NOP on systems that didn’t support the instruction so that code is binary compatible. This makes one consider reserved NOP space (FENCE has space) versus reserved instructions that trigger illegal instruction traps versus NOPs that are already present in the ISA e.g. non control flow functions without side effects and zero as the destination register.

This COME FROM instruction could be implemented under models with or without a (shadow) stack. RISC-V does not technically have a stack; rather, it has a return address link register with push and pop “stack hints”. In my pseudo asm below I’m labelling all return sites as indirect jump targets, since architecturally returns are just indirect jumps with the return address register used as the target of the jump and zero as the link register.

- JAL (Jump And Link): the target address cannot be diverted, other than under a model where the attacker gets control of the target page, because the JAL target is a PC-relative immediate encoded in the instruction text. JAL would not need to check for the presence of IJT at its target. However, a jump to the return address link register (a RET macro) would need to check for its presence, so JAL would need to be followed by an IJT instruction to serve as the target of that indirect jump.

JAL ra, proc
ra: IJL

# static
proc: 
IJL # not technically required, could be optimised away if function is not extern
JALR zero, ra # ret is a pseudo instruction for jump and link register with zero for the link register

- JALR (Jump And Link Register): control flow is open to abuse if an attacker can somehow get control of the rs1 register. IJT would need to exist at the target address and at the instruction following the JALR (in a stackless model), as the latter is also the target of the indirect jump to the return address link register (RET pseudo instruction). Note I’m using pseudo asm for an inline GOT reference rather than a call to a PLT stub, which is something like what would be emitted for a shared library call with the -fno-plt option. Apparently -fno-plt is a trend we might see if the -fno-plt patches make it into GCC (or are there already).

AUIPC t1, %gotrel_hi(proc@GOT)
LD t0, %gotrel_lo(proc@GOT)(t1)
JALR ra, t0
ra: IJL

# extern
proc:
IJL # required, checked by JALR
JALR zero, ra

- AUIPC+JALR is special. If the processor starts at the program counter of the AUIPC (Add Upper Immediate Program Counter) in the AUIPC+JALR pair, and the destination register of the AUIPC is the same as the source register of the following JALR, the pair can be considered a direct jump and cannot have its control flow target address diverted, as the register is statically initialised before the jump. However, the pair can be split by entering at the JALR; under this model a split JALR would be considered a lone JALR and need a COME FROM sentinel at its target. Also, while the pair can technically be considered a direct jump, a straightforward implementation (without fusion) would still require a COME FROM sentinel, as the JALR will be treated as an independent instruction. We’ve had a discussion about this recently.

AUIPC t1, %pcrel_hi(proc)
JALR ra, %pcrel_lo(proc)(t1)
IJL

# extern
proc:
IJL # required by most implementations
JALR zero, ra

The problem areas are JALR zero, ra (return exploitation) and AUIPC + LD + JALR (writable GOT exploitation). “-fPIE -Wl,-z,relro,now -fno-plt” can mitigate the writable GOT vulnerability. Shadow control flow stacks can, as you mention, mitigate the return vulnerability. A read-only GOT should perhaps be a compiler default, but unfortunately ELF dynamic linking is designed to support runtime interposition and lazy binding (which is not great for security).

The added COME FROM instruction would prevent the ability to find arbitrary ROP gadgets anywhere in code. The only control flow diversion possible under this model would be to locations that are the target of regular call returns and exported functions.
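
The landing-pad rule can be illustrated with a toy Python check. This is a sketch under assumptions: the program is represented as a hypothetical mapping from address to mnemonic string, and `check_indirect_jump` and `ControlFlowViolation` are names invented for the example, not anything defined by RISC-V.

```python
class ControlFlowViolation(Exception):
    """Raised when an indirect jump lands on something other than a label."""


def check_indirect_jump(program, target):
    """Allow an indirect jump (JALR) only if the target holds an IJL label."""
    if program.get(target) != "IJL":
        raise ControlFlowViolation(f"no landing pad at {target:#x}")
    return target


# Hypothetical layout matching the JAL example earlier in this message.
program = {
    0x100: "JAL ra, proc",
    0x104: "IJL",             # return site, target of JALR zero, ra
    0x200: "IJL",             # proc entry
    0x204: "ADD a0, a0, a1",  # mid-function gadget candidate
    0x208: "JALR zero, ra",
}
```

With this layout a diverted return to 0x204 is rejected, while 0x104 and 0x200 remain reachable, which is the residual attack surface the paragraph above describes.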

Not labelling procedure call return targets would dictate a (shadow) control flow stack. If this memory is writable by regular STORE instructions then it would still be vulnerable to arbitrary write primitives, assuming its address could be found (is not randomised) or was not in a special “segment”.

> CALL(R) has the same format as JA(R)L and the same semantics except it does not touch the destination register. Instead the address is pushed onto a stack, that may be dumped to memory between STACKBASE and STACKLIM. RETURN pops this stack. Memory accesses issued by these dumps are in shadow mode: a page with the shadow bit set may be written or read by them. If the stack size exceeds STACKLIM an error is reported to the OS. Multiple processes can use the same shadow page because of the STACKBASE and STACKLIM registers.

RISC-V has no stack and uses a link register and there is only one indirect call instruction: JALR. RET is simply a macro that uses JALR with the return address link register as the target address (all returns are indirect). CALL is a macro for AUIPC+JALR which in some cases (mentioned above) can be regarded as a direct call.

I guess the “stack hints” could be used to maintain a shadow stack which allows detection of return address substitution. i.e. JALR zero, ra where ra differs from the entry ra (maintained in the shadow stack). I wonder if any of these returns would actually be valid. I can look into this as I could map JALR onto a stack in binary translation.

It is also possible for the compiler to use the SafeStack technique (*1) whereby control flow information is spilt separately from the variable sized stack frame. None of the RISC-V jump instructions dictate the use of a particular stack as they do on other architectures, but the “stack hints” could potentially be used for security.

RISC-V compilers can in fact save the return address in registers and may not always need to spill the return addresses onto the stack.

Regards,
Michael.



--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/77269e77-bda1-424b-8d7a-3b45d2225ad7%40groups.riscv.org.

Watson Ladd

Mar 16, 2017, 2:08:57 PM
to RISC-V ISA Dev, watso...@gmail.com
Good point. I think we could mandate labeling of return targets as well, but it would be redundant with the CALL+RETURN proposal. On the other hand this is binary-compatible in a way CALL+RETURN is not.


> RISC-V has no stack and uses a link register and there is only one indirect call instruction: JALR. RET is simply a macro that uses JALR with the return address link register as the target address (all returns are indirect). CALL is a macro for AUIPC+JALR which in some cases (mentioned above) can be regarded as a direct call.
>
> I guess the “stack hints” could be used to maintain a shadow stack which allows detection of return address substitution. i.e. JALR zero, ra where ra differs from the entry ra (maintained in the shadow stack). I wonder if any of these returns would actually be valid. I can look into this as I could map JALR onto a stack in binary translation.
 
Longjmp(3) and friends, like tail calls, would break with that proposal. I know RISC-V has no stack, which is why I am proposing different call and return instructions that use a memory-protected stack. Using the SafeStack technique doesn't provide the same level of protection, because a write-4 primitive plus an infoleak of the stack base results in exploitation. If compilers follow a stack discipline with a defined link register, we could maybe get away with adding restrictions to JALR and JAL instruction usage. But how would we differentiate between a return and a computed call?

> It is also possible for the compiler to use the SafeStack technique (*1) whereby control flow information is spilt separately from the variable sized stack frame. None of the RISC-V jump instructions dictate the use of a particular stack as they do on other architectures, but the “stack hints” could potentially be used for security.
>
> RISC-V compilers can in fact save the return address in registers and may not always need to spill the return addresses onto the stack.

Yes. I was thinking along the lines of a small hardware return-address stack to achieve similar savings, although that might be too much implementation complexity.

Michael Clark

Mar 16, 2017, 2:46:55 PM
to Watson Ladd, RISC-V ISA Dev
I thought about this more after I sent the email. Context switches. Any state will require operating system code for context switches, plus longjmp as you mention. 

There could be separate labels for call targets versus call return sites and a modified “stack hint” could be used to check for either the indirect jump target label or the indirect jump return label depending on the call direction. The rules would need to be slightly different to the “stack hints”. Always require jump target label unless rd=zero rs1=ra (which is always a return). This further limits accessibility of gadgets based on direction. The disadvantage over a shadow stack is that a return could be diverted to another return point, but even with a shadow stack a call target can be diverted to the subset of labeled call targets. The advantage of this approach is it needs no state.

These are the official RISC-V stack hints:

JALR ra, x # rd=ra push hint
JALR x, ra # rs1=ra is the pop hint

We could use these hints to control the required target label:

JALR !(zero, ra) # require IJT (Indirect Jump Target)
JALR zero, ra # require IJR (Indirect Jump Return)

e.g.

AUIPC t1, %pcrel_hi(proc)
JALR ra, %pcrel_lo(proc)(t1)
IJR

# extern
proc:
IJT
JALR zero, ra
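
The direction rule above ("always require jump target label unless rd=zero rs1=ra") is small enough to sketch in Python. The function name and the string labels are hypothetical; the register numbers follow the standard x0/x1 assignment for zero and ra.

```python
ZERO, RA = 0, 1  # x0 (zero) and x1 (ra) register numbers


def required_label(rd, rs1):
    """Label that the target of `JALR rd, rs1` must carry under the rule above."""
    if rd == ZERO and rs1 == RA:
        return "IJR"  # treated as a return: must land on a return label
    return "IJT"      # every other indirect jump: must land on a target label


def jump_allowed(rd, rs1, label_at_target):
    """True if the label found at the jump target matches the required kind."""
    return label_at_target == required_label(rd, rs1)
```

Note that under this literal reading a return through an alternate link register such as x5 would be held to the IJT rule, since only rs1=ra with rd=zero selects IJR.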

Regarding NOP space: I am not sure if we have reserved NOP space in the ISA for these types of extensions (for backwards compatibility). Some funct7 bits around FENCE in MISC-MEM would make a lot of sense as reserved NOP space, if there were some reserved opcode bits that processors should not trap on. Overriding existing instructions that are effective NOPs is not as clean. It is a control flow FENCE in a sense.


Regards,
Michael.



Jacob Bachmeyer

Mar 16, 2017, 8:19:00 PM
to Watson Ladd, RISC-V ISA Dev
I am uncertain if you are serious. I recognize "COME FROM" from
INTERCAL, although at least your proposal omits the
"action-at-a-distance" semantics INTERCAL assigns to it.

I argue that this most certainly does *not* belong in the base ISA,
although an extension could provide something similar. Multiple "safe
jump target" opcodes would be needed in any case, since most programs
will have quite a few of them and failure to separate distinct classes
will defeat the purpose of limiting control transfers. I suggest
limiting the control-flow restrictions to function entry points and a
separate "control returns here" for function calls.

RISC-V very intentionally does *not* actually have hardware support for
any kind of stack--the stack is purely a software object in RISC-V.
Instead, RISC-V uses a link register to hold the return address. This
allows call and return to avoid memory access, and some leaf functions
may avoid accessing memory entirely. The ability to use any register as
a jump target also means that programs are not required to use x1 to
store the return address, and there is already a concept of "millicode"
that uses x5 as the link register to allow function prologue/epilogue
sequences to be emitted once and shared between many functions, reducing
code size.

-- Jacob

Michael Clark

Mar 16, 2017, 9:06:48 PM
to jcb6...@gmail.com, Watson Ladd, RISC-V ISA Dev

> On 17 Mar 2017, at 1:18 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
> I am uncertain if you are serious. I recognize "COME FROM" from INTERCAL, although at least your proposal omits the "action-at-a-distance" semantics INTERCAL assigns to it.

Well, at least there is prior art.

> I argue that this most certainly does *not* belong in the base ISA, although an extension could provide something similar. Multiple "safe jump target" opcodes would be needed in any case, since most programs will have quite a few of them and failure to separate distinct classes will defeat the purpose of limiting control transfers. I suggest limiting the control-flow restrictions to function entry points and a separate "control returns here" for function calls.

I think the most salient issue is extensions that depend on using a NOP i.e. strengthening extensions that aim to be backwards binary compatible. I think the foundation should consider adding some NOP encoding space in the base ISA that can be reserved for such extensions. i.e. so that standard implementations don’t have to handle costly illegal instruction faults on hardware without the jump target and jump return labels, when running strengthened code, but still remain binary compatible (just without the additive protection).

Adding such custom NOPs (FENCE.NOP1, FENCE.NOP2, FENCE.NOP3, FENCE.NOP4), however, would probably require a major ISA version number bump, as code that used these NOPs would fault on current processors, i.e. reserve them in ISA version 3.0. I notice there are 4 bits of encoding space in funct7 and 2 bits in funct3 near to FENCE (which can also be a NOP on some implementations). It could alternatively come out of the CUSTOM opcode space, but processors would need to treat some encodings as NOPs to be able to run protected code without additional overhead.

Overloading regular NOP embeddings doesn’t seem so clean and may have unintended consequences.


Alex Elsayed

Mar 16, 2017, 10:49:04 PM
to isa...@groups.riscv.org

On Thursday, 16 March 2017 18:06:42 PDT Michael Clark wrote:

<snip>

> I think the most salient issue is extensions that depend on using a NOP i.e.
> strengthening extensions that aim to be backwards binary compatible. I
> think the foundation should consider adding some NOP encoding space in the
> base ISA that can be reserved for such extensions. i.e. so that standard
> implementations don’t have to handle costly illegal instruction faults on
> hardware without the jump target and jump return labels, when running
> strengthened code, but still remain binary compatible (just without the
> additive protection).
>
> Adding such custom NOPS (FENCE.NOP1, FENCE.NOP2, FENCE.NOP3, FENCE.NOP4)
> however would probably require a major ISA version number bump as code that
> used these NOPs would fault on current processors. i.e. reserve them in ISA
> version 3.0. I notice there are 4 bits of encoding space in funct7 and 2
> bits in funct3 near to FENCE (which also can also be a NOP on some
> implementations). It could alternatively come out of the CUSTOM opcode
> space, but processors would need to treat some encodings as NOPs to be able
> to run protected code without additional overhead.
>
> Overloading regular NOP embeddings doesn’t seem so clean and may have
> unintended consequences.

<snip>

I strongly disagree. In particular, there is a very large encoding space for NOPs in the existing encoding - ADDI x0, x0, imm12 alone supports 4,096 distinct NOPs, the same encoding space as is available for CSRs.

Throw in SLTI, SLTIU, XORI, ORI, and ANDI and the space grows even more.

Use LUI x0, imm20 and you have a _truly_ absurd encoding space.

If we can trust the RISC-V foundation to manage the CSR encoding space (which is considerably smaller), it seems entirely reasonable to trust them with managing the NOP encoding space similarly, rather than defining new and strange NOPs, which are themselves _not_ backwards-compatible as NOPs with the _released_ userspace ISA.
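
To make the size of that space concrete, here is a short Python sketch that builds the encodings in question using the standard RV32I I-type and U-type layouts. The helper names are made up for the example; the opcode values (0x13 for OP-IMM, 0x37 for LUI) are the base ISA ones.

```python
OP_IMM = 0x13       # major opcode shared by ADDI, SLTI, XORI, ORI, ANDI, ...
LUI = 0x37          # major opcode for LUI
ADDI_FUNCT3 = 0b000


def encode_i_type(opcode, funct3, rd, rs1, imm12):
    """I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm12 & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_u_type(opcode, rd, imm20):
    """U-type layout: imm[31:12] | rd | opcode."""
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | opcode


def addi_x0_x0(imm12):
    """ADDI x0, x0, imm12: writes x0, so it has no architectural effect."""
    return encode_i_type(OP_IMM, ADDI_FUNCT3, 0, 0, imm12)


def lui_x0(imm20):
    """LUI x0, imm20: likewise discards its result."""
    return encode_u_type(LUI, 0, imm20)


addi_nops = {addi_x0_x0(i) for i in range(1 << 12)}  # 4,096 distinct encodings
lui_nop_count = 1 << 20                              # 1,048,576 encodings
```

`addi_x0_x0(0)` is 0x00000013, the canonical NOP, which is one reason a later message in the thread suggests leaving the ADDI x0, x0 space alone.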


Michael Clark

Mar 16, 2017, 11:25:41 PM
to Alex Elsayed, isa...@groups.riscv.org

> On 17/03/2017, at 3:48 PM, Alex Elsayed <etern...@gmail.com> wrote:
>
> I strongly disagree. In particular, there is a very large encoding space for NOPs in the existing encoding - ADDI x0, x0, imm12 alone supports 4,096 distinct NOPs, the same encoding space as is available for CSRs.

Fair point.

Perhaps the use of these NOPs should somehow be coordinated.

Indirect call target labels for control flow integrity are becoming standard features in several ISAs.

That said, I don't think RISC-V should implement a shadow stack as has been done in other CFI implementations.

A stack-less labelling approach for JALR would be distinct. In fact, an expanded target bit space in the label could provide more protection than existing labelling approaches. The return label in a stack-less model following a JAL could, for example, encode some bits of the address of the return instruction of the target procedure (with static linking) and provide stronger protection than other CFI labelling techniques, i.e. if an attacker gets control of the link register. This is in fact where more bits in the label would be most useful: returns.


Andrew Waterman

Mar 17, 2017, 12:29:23 AM
to Alex Elsayed, RISC-V ISA Dev
I concur.

There is already an expansive RVC HINT space--including, along the
lines of what you mentioned, C.LUI x0--and I would expect the
Foundation to declare the analogous RVI instructions as HINTs, as part
of the formal adoption of the RVI spec.

Thus far, no HINTs are allocated, and I expect the RVC ones will be
especially carefully guarded.


Alex Elsayed

Mar 17, 2017, 8:12:28 PM
to RISC-V ISA Dev
Thinking about this further, there may be an interesting (if somewhat nop-
encoding-space-intensive) option for doing this sans a stack, by taking a
macro-op fusion approach.

Specifically, this:

come_from(addr, link):
    ADDI zero, link, addr[11:0]
    AUIPC zero, addr[31:12]

1.) Leaves the `ADDI zero, zero, imm12` NOP-encoding space alone
    (since the canonical NOP lives there, probably best to keep it pristine)
2.) Can express the inverse of any jump that can be expressed with AUIPC, JALR
3.) Only requires eight bits of added µarch state
    (jumped:1, overflow:1, bad:1, which:5)
4.) Has relatively simple semantics

The behavior is as follows:
1.) On jump, set jumped = 1
2.) If jumped == 1 and *pc == ADDI zero, link = !zero, imm12
    1.) Set which = index(link)
    2.) Calculate next_pc as the address of the following instruction
    3.) Calculate low12 = (imm12 + next_pc)[11:0]
    4.) If addition carried past low12,
        Set overflow = 1
    5.) Set bad = low12 != link[11:0]
3.) If jumped == 0 or *pc != ADDI zero, link = !zero, imm12
    1.) Set state = 0
4.) If which != 0 and *pc == AUIPC zero, imm20
    1.) Calculate high20 = imm20 + overflow + pc[31:12]
    2.) Set bad |= s_ext(high20) != s_ext(link[31:12])
    3.) If bad == 1
        1.) Set state = 0
        2.) TRAP
5.) If either is false
    1.) Set state = 0

Since trampolines and such should be transparent to the link register, this
can even protect jumps that happen via them. However, this will bloat every
jump target by two instructions per jump source.
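
The core comparison in this scheme can be sketched in Python, with several simplifying assumptions that differ from the step list above: addresses are 32-bit, both immediates are taken relative to the address of the ADDI (rather than tracking next_pc and pc separately), and the low half uses the conventional sign-extended %pcrel_hi/%pcrel_lo style split instead of an explicit overflow bit. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_offset(offset):
    """Split a 32-bit offset into (hi20, lo12) with a sign-extended low half."""
    offset &= MASK32
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF  # round so lo12 can be negative
    lo12 = offset & 0xFFF
    return hi20, lo12


def sign_extend_12(value):
    """Interpret a 12-bit field as a signed integer."""
    return value - 0x1000 if value & 0x800 else value


def come_from_ok(label_pc, hi20, lo12, link_value):
    """True if the address encoded in the label pair equals the link register.

    label_pc is the address of the ADDI; link_value is what the jump left in
    the link register (the address following the jump source).
    """
    expected = (label_pc + (hi20 << 12) + sign_extend_12(lo12)) & MASK32
    return expected == (link_value & MASK32)
```

A toolchain following this sketch would compute `split_offset(link_value - label_pc)` at link time for each permitted source, and the hardware would trap when `come_from_ok` is false.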

Alex Elsayed

Mar 17, 2017, 8:23:35 PM
to RISC-V ISA Dev
Minor fix - step 3 should be split, such that if
jumped == 0, state is cleared, but if jumped == 1 and the instruction does not
match, then a trap occurs.

Albert Cahalan

Mar 18, 2017, 2:17:29 AM
to Michael Clark, Alex Elsayed, isa...@groups.riscv.org
On 3/16/17, Michael Clark <michae...@mac.com> wrote:

> Indirect call target labels for control flow integrity are becoming standard
> features in several ISAs.
>
> That said, I don't think RISC-V should implement a shadow stack as has been
> done in other CFI implementations.

Consider that a separate issue. (very effective and very messy)

> A stack-less labelling approach for JALR would be distinct. In fact an
> expanded target bit space in the label could provide more protection than
> existing labelling approaches. The return label in a stack less model
> following a JAL could for example encode some bits of the address of the
> return instruction of the target procedure (with static linking) and provide
> stronger protection than other CFI labelling techniques. i.e. if an attacker
> gets control of the link register. This is in fact where more bits in the
> label would be most useful. Returns.

Letting the instructions specify the extra bits on both sides of the jump/call
is better. Languages can use a hash of the function call signature or other
state. Besides generally reducing the available targets, identical hashes
are more likely to be uninteresting to the attacker.

Calls and returns can somewhat use the same hash as long as there is at
least a way to tell which is which. Flipping all the bits would do. Some kinds
of optimization may die as a result; the returning function won't know the
correct arguments if it was reached via jumping from the tail of a function
that was called from where it is returning to.
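
A minimal Python sketch of the signature-hash idea follows. The 12-bit label width, the use of SHA-256, and the textual signature format are all assumptions made for illustration; the message above does not specify a hash function or a width.

```python
import hashlib

LABEL_BITS = 12                      # assumed width, e.g. an ADDI immediate
LABEL_MASK = (1 << LABEL_BITS) - 1


def call_label(signature):
    """Label for call sites and function entries sharing this signature."""
    digest = hashlib.sha256(signature.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") & LABEL_MASK


def return_label(signature):
    """Label for the matching return direction: the call label with all bits flipped."""
    return ~call_label(signature) & LABEL_MASK


def transfer_allowed(source_label, target_label):
    """Hardware-side check: both ends of the transfer must carry the same bits."""
    return source_label == target_label
```

Since flipping every bit always changes the value, a call label can never be mistaken for the return label of the same signature, which gives the "way to tell which is which" mentioned above.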

Michael Clark

Mar 18, 2017, 2:40:59 AM
to Albert Cahalan, Alex Elsayed, RISC-V ISA Dev, Watson Ladd
On 18 Mar 2017, at 7:17 PM, Albert Cahalan <acah...@gmail.com> wrote:

> On 3/16/17, Michael Clark <michae...@mac.com> wrote:
>
>> Indirect call target labels for control flow integrity are becoming standard
>> features in several ISAs.
>>
>> That said, I don't think RISC-V should implement a shadow stack as has been
>> done in other CFI implementations.
>
> Consider that a separate issue. (very effective and very messy)
>
>> A stack-less labelling approach for JALR would be distinct. In fact an
>> expanded target bit space in the label could provide more protection than
>> existing labelling approaches. The return label in a stack less model
>> following a JAL could for example encode some bits of the address of the
>> return instruction of the target procedure (with static linking) and provide
>> stronger protection than other CFI labelling techniques. i.e. if an attacker
>> gets control of the link register. This is in fact where more bits in the
>> label would be most useful. Returns.
>
> Letting the instructions specify the extra bits on both sides of the jump/call
> is better. Languages can use a hash of the function call signature or other
> state. Besides generally reducing the available targets, identical hashes
> are more likely to be uninteresting to the attacker.

Yes. A hash of the expected return address could be used if we used the 12 bits of the ADDI immediate:

JAL ra, proc
# 1 instruction indirect return label (hash)
ra: ADDI zero, ra, %pcrel_ret_hash(proc)

#extern
proc:
IJT # indirect jump target label
JALR zero, ra

Alex shows how a fused pair of ADDI+AUIPC can be used to specify the 32-bit relative address of the return of the called function. If I understand it correctly it is something like this:

JAL ra, proc
# 2 instruction indirect return label (32-bit PC-relative to return)
ra: ADDI zero, ra, %pcrel_ret_lo(proc)
AUIPC zero, %pcrel_ret_hi(proc)

#extern
proc:
IJT # indirect jump target label
JALR zero, ra

%pcrel_ret_hash, %pcrel_ret_hi and %pcrel_ret_lo in this pseudo asm respectively take the hash, hi and lo address of the ret instruction of the called function. Function returns would need relocation entries. It does have the disadvantage of only allowing one return path out for a function, so all paths out of a function must jump to a single return point, and as you mention, it would prevent tail calls from the target function.

In the 32-bit relative model, control can provably only return from the called function. Of course call targets need to have looser restrictions.

> Calls and returns can somewhat use the same hash as long as there is at
> least a way to tell which is which. Flipping all the bits would do. Some kinds
> of optimization may die as a result; the returning function won't know the
> correct arguments if it was reached via jumping from the tail of a function
> that was called from where it is returning to.

Yes it would disable multiple return paths, and thus combinations such as a tail call exit as well as a regular return, or multiple returns from one function.

It is certainly an interesting hardening model to consider.

Michael.

Albert Cahalan

Mar 18, 2017, 2:08:19 PM3/18/17
to Michael Clark, Alex Elsayed, RISC-V ISA Dev, Watson Ladd
On 3/18/17, Michael Clark <michae...@mac.com> wrote:
>> On 18 Mar 2017, at 7:17 PM, Albert Cahalan <acah...@gmail.com> wrote:
>> On 3/16/17, Michael Clark <michae...@mac.com> wrote:

>>> A stack-less labelling approach for JALR would be distinct. In fact an
>>> expanded target bit space in the label could provide more protection than
>>> existing labelling approaches. The return label in a stack less model
>>> following a JAL could for example encode some bits of the address of the
>>> return instruction of the target procedure (with static linking) and
>>> provide stronger protection than other CFI labelling techniques. i.e. if
>>> an attacker gets control of the link register. This is in fact where more
>>> bits in the label would be most useful. Returns.
>>
>> Letting the instructions specify the extra bits on both sides of the
>> jump/call is better. Languages can use a hash of the function call
>> signature or other state. Besides generally reducing the available
>> targets, identical hashes are more likely to be uninteresting to the
>> attacker.
>
> Yes. A hash of the expected return address could be used if we used 12-bits
> of ADDI

From an ABI perspective, this doesn't seem viable. Perhaps I misunderstood,
but it looks like you would have the caller know the address from which the
callee will return and/or the callee know the address to which it must return.
I don't like either, and wasn't suggesting either.

If the caller must know the address from which the callee will return, then ELF
would need some significant changes. Callees would need to return from one
point, and this point would need to be known by the caller. It could be implied
by being right before the function start, causing trouble like ppc64 and ia64
have. It could be implied by being a fixed offset afterward, which forces a jump
over the code. It could be at the end of a function, meaning that the address of
a function is not sufficient to call it. All of this is trouble.

If the callee must know the address to which it must return, then functions are
no longer reentrant. Functions also can't be in libraries, at least not without
stubs.

My proposal was more like this:

For a C++ function, take the CRC32 of what the name would be if all the types
were named "X". For a C function, take the CRC32 of an array that has for the
return value and each argument an enum to indicate that it is one of: void,
pointer, integer or enum, floating point, vararg "..." args, other. For other
languages, choose something appropriate for the language. This info is built
into the caller at the call site, the callee at the entry point, the callee at
the return, and the caller at a point right after the call site. There are two
control flow changes, the call and the return, which could have separate values.
One could be computed from the other, for example by flipping the bits.

The idea is that function pointers still work as long as you are not casting
them to an incompatible function type.

Using a well-known fixed hash value (probably zero) for the return path would
allow tail-call optimization and give an intermediate level of security. Tossing
aside tail-call optimization, one could just flip the bits of the value on one
path to get a different value for the other path. This would allow the call site
to specify just one value instead of two.
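A minimal sketch of the C variant of this labelling scheme, assuming one byte per type class and Python's `zlib.crc32` as the CRC32. The `Kind` enum values and the helper names are hypothetical; the proposal only says that some enum per return value and argument is hashed.

```python
import zlib
from enum import IntEnum


class Kind(IntEnum):
    """Coarse type classes from the proposal; the numeric values are assumed."""
    VOID = 0
    POINTER = 1
    INTEGER = 2   # integer or enum
    FLOAT = 3
    VARARGS = 4
    OTHER = 5


def call_label(ret, args):
    """CRC32 over the return-type class followed by each argument class."""
    return zlib.crc32(bytes([ret, *args]))


def return_label(call):
    """Derive the return-path value by flipping all 32 bits of the call value."""
    return call ^ 0xFFFFFFFF


# size_t strlen(const char *) and int puts(const char *) fall in the same class,
# so a function pointer of one type can still reach the other.
strlen_label = call_label(Kind.INTEGER, [Kind.POINTER])
puts_label = call_label(Kind.INTEGER, [Kind.POINTER])
# double sqrt(double) gets a different label.
sqrt_label = call_label(Kind.FLOAT, [Kind.FLOAT])

print(strlen_label == puts_label)
print(strlen_label == sqrt_label)
print(hex(return_label(strlen_label)))
```

Flipping the bits guarantees that a call label never equals its own return label, so an entry point cannot be mistaken for a return landing site of the same signature.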

Michael Clark

Mar 18, 2017, 5:40:02 PM3/18/17
to Albert Cahalan, Alex Elsayed, RISC-V ISA Dev, Watson Ladd
It obviously needs changes in both the compiler and binutils. The compiler changes might be bigger in scale than the linker. The RET macro that generates JALR zero, ra could emit a reloc for the callee return instruction. Binutils would need some additional relocs. It would not work with dynamic shared libraries.

PE/COFF could support this with shared libraries as the text is not mapped read-only and all callers of shared library entry points can be identified at link/load time as they are rewritten during runtime relocation (unlike ELF which just updates the GOT). i.e. there is no GOT indirection with PE/COFF.

PE/COFF function linkages might be a bit faster (at the expense of sacrificing page sharing) than the double jump to the PLT and then the GOT indirect jump (or only one jump with inline GOT references i.e. -fno-plt ; assuming gcc has that patch).

It remains to be seen whether, with a stackless approach, return landing pads should check that control comes only from the called function rather than from any return. It is an interesting model even if it does dictate one return point for each function.

Jacob Bachmeyer

Mar 18, 2017, 7:32:52 PM3/18/17
to Michael Clark, Alex Elsayed, isa...@groups.riscv.org
After thinking about this a bit, (and still unsure if we are being
trolled--"COME FROM" really bothers me) a SAFE-INDIRECT-LANDING opcode
will need a 33-bit immediate. The only way to do this is a longer
opcode, which breaks backwards compatibility, but I argue that breaking
backwards compatibility is a *feature* in a control-flow integrity
extension. CFI cannot be backwards-compatible with programs that are
not compiled for it, and a program compiled for CFI but running on an
implementation that lacks CFI will get a false sense of security.

I suggest a 64-bit JAL-alike as the indirect jump target label. By
making the label an executable jump, we preserve the ability to have
dense jump tables, but the need for a 33-bit parameter limits the range
of the jump. A major opcode in the 64-bit space (one of only 32(!))
gives us a 52-bit encoding space. The destination register (using the
ordinary rd field?) consumes 5 bits from that (and we need a destination
register to support trampoline tables). The "allowed jump mask" is 33
bits (to be explained). This leaves 14 bits for an offset, giving a
range of +/- 8KiB. This seems reasonable to me, since this permits up
to 2048 entries in a contiguous block before other measures must be
taken to reach an intermediate transfer routine. The simplest "other
measure" is to reserve a few slots in the table for regular JAL x0
instructions that forward execution to the table dispatch code.
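The bit budget above can be checked with a few lines of arithmetic. This sketch assumes the standard RISC-V length encoding, in which a 64-bit instruction is marked by a 7-bit prefix, and assumes the 14-bit field is a signed byte offset (which is what the quoted +/- 8KiB range implies). The constant names are made up for the example.

```python
INSN_BITS = 64
LENGTH_PREFIX_BITS = 7    # low bits that mark a 64-bit encoding
MAJOR_OPCODE_BITS = 5     # "one of only 32" major opcodes
RD_BITS = 5               # destination register
MASK_BITS = 33            # "allowed jump mask"
ENTRY_BYTES = INSN_BITS // 8

operand_bits = INSN_BITS - LENGTH_PREFIX_BITS - MAJOR_OPCODE_BITS
offset_bits = operand_bits - RD_BITS - MASK_BITS
reach_bytes = 1 << (offset_bits - 1)          # reach on each side of the landing
table_entries = (2 * reach_bytes) // ENTRY_BYTES

print(operand_bits, offset_bits, reach_bytes, table_entries)
```

Under these assumptions the numbers come out as stated in the message: 52 operand bits, a 14-bit offset, 8192 bytes of reach in each direction, and 2048 eight-byte landings in one contiguous span.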

The "allowed jump mask" field is 33 bits because it controls which
registers can be used to jump to a landing and whether a non-zero offset
must be present in the calling JALR. 31 bits correspond to permission
to jump to this target using each of x1 through x31, while the remaining
two bits determine whether the offset must be zero (as in function
return), must be non-zero (as in transfers to a PLT-like structure using
a base pointer), or is unrestricted. The fourth possible value is
reserved. Indirect jumps through x0 do not require a
SAFE-INDIRECT-LANDING opcode and are unrestricted because they can be
adequately controlled by controlling memory layout.

-- Jacob

Watson Ladd

Mar 18, 2017, 7:58:20 PM3/18/17
to RISC-V ISA Dev, michae...@mac.com, etern...@gmail.com, jcb6...@gmail.com


On Saturday, March 18, 2017 at 4:32:52 PM UTC-7, Jacob Bachmeyer wrote:
Michael Clark wrote:
>> On 17/03/2017, at 3:48 PM, Alex Elsayed <etern...@gmail.com> wrote:
>>
>> I strongly disagree. In particular, there is a very large encoding space for NOPs in the existing encoding - ADDI x0, x0, imm12 alone supports 4,096 distinct NOPs, the same encoding space as is available for CSRs.
>>    
>
> Fair point.
>
> Perhaps the use of these NOPs should somehow be coordinated.
>
> Indirect call target labels for control flow integrity are becoming standard features in several ISAs.
>
> That said, I don't think RISC-V should implement a shadow stack as has been done in other CFI implementations.
>
> A stack-less labelling approach for JALR would be distinct. In fact an expanded target bit space in the label could provide more protection than existing labelling approaches. The return label in a stack less model following a JAL could for example encode some bits of the address of the return instruction of the target procedure (with static linking) and provide stronger protection than other CFI labelling techniques. i.e. if an attacker gets control of the link register. This is in fact where more bits in the label would be most useful. Returns.

> After thinking about this a bit, (and still unsure if we are being
> trolled--"COME FROM" really bothers me)

It was tongue-in-cheek, but mnemonics should be memorable.

So some of the mechanisms discussed are less familiar to me. If I understand this proposal correctly, it suggests that we have a 14-bit offset pointing from the destination of a jump to the instruction that jumped there, plus a 33-bit address determining which data could have been used to jump to it. (I don't understand the bit about making the target an executable jump.)

My concern is that this doesn't protect function pointers, as there the target of a return is branched from any function the pointer could have pointed to. Obviously an attacker who overwrites a function pointer can make it go to whatever function they want, but this proposal would seem to imply that ROP is still possible by setting the return addresses to the sites of function pointer invocations. A stack-like discipline (which doesn't necessarily result in extra memory accesses) would provide a stronger guarantee.


Michael Clark

Mar 18, 2017, 7:58:51 PM3/18/17
to jcb6...@gmail.com, Alex Elsayed, isa...@groups.riscv.org
Returns are not safe. In fact they are a vector. An “untied” label after a call is essentially a gadget label. Untied as in not knowing where the return is coming from.

Stack cookies can be leaked with a read primitive. A ROP bypass finds and writes the stack cookie along with a return address of a gadget. If the stack is writable, contains variable length items and function addresses it can be used to subvert control flow. 

SafeStack mitigates this by placing variable length (arrays) on an alternate stack. i.e. only provably bounded frame accesses are allowed on the safe stack that contains control flow and unbounded accesses use an unsafe stack. The SafeStack shadow stack could theoretically be found and written to with a write primitive.

Shadow stacks that are only accessible by CALL and RETURN on a stack architecture can prove that a return is not a gadget access; however, they need context switching support and changes to longjmp, assuming a higher privilege level has control of the shadow stack bounds.

In a stackless architecture, having a return address label after a call with the program counter of the called function's RET (jalr zero, ra) instruction can completely defeat arbitrary return.

That said, from what I can tell about CFI, most people will sacrifice security for speed.

Albert Cahalan

Mar 18, 2017, 8:16:37 PM3/18/17
to jcb6...@gmail.com, Michael Clark, Alex Elsayed, isa...@groups.riscv.org
On 3/18/17, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Michael Clark wrote:
>> [somebody wrote]

>>> I strongly disagree. In particular, there is a very large encoding space
>>> for NOPs in the existing encoding - ADDI x0, x0, imm12 alone supports
>>> 4,096 distinct NOPs, the same encoding space as is available for CSRs.
>>>
>>
>> Fair point.
>>
>> Perhaps the use of these NOPs should somehow be coordinated.

I hope no badly-optimizing compiler would emit these.
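For reference, the NOP space mentioned in the quoted text can be enumerated directly from the I-type instruction format. The helper name below is made up; the field layout (12-bit immediate, rs1, funct3, rd, 7-bit opcode) and the OP-IMM opcode are from the base ISA.

```python
OPCODE_OP_IMM = 0b0010011   # major opcode shared by ADDI and friends
FUNCT3_ADDI = 0b000


def encode_addi_x0_x0(imm12):
    """Encode ADDI x0, x0, imm12 (rd = rs1 = x0, so no architectural effect)."""
    if not 0 <= imm12 < (1 << 12):
        raise ValueError("imm12 must be given as an unsigned 12-bit field")
    rd = rs1 = 0
    return (imm12 << 20) | (rs1 << 15) | (FUNCT3_ADDI << 12) | (rd << 7) | OPCODE_OP_IMM


encodings = {encode_addi_x0_x0(imm) for imm in range(1 << 12)}
print(hex(encode_addi_x0_x0(0)))   # the canonical NOP
print(len(encodings))              # number of distinct ADDI x0, x0, imm12 words
```

Any labelling scheme built on this space would have to avoid values a compiler might already emit, which is the coordination concern raised in the quoted exchange.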

> After thinking about this a bit, (and still unsure if we are being
> trolled--"COME FROM" really bothers me) a SAFE-INDIRECT-LANDING opcode

I've seen a "COME FROM" on real hardware, and this isn't it.
The SHARC DSP supports 6 simultaneously active COME FROM.
Sadly, they named it something else. It takes the form of a jump,
which could be conditional, but gives the address to jump from
rather than the address to jump to. This address is remembered.
When the address is hit and the condition is true, the jump is taken.
It's a tad like hardware breakpoints or watchpoints, but it causes a
regular jump instead of an exception. This gives the CPU the ability
to do zero-latency loops, with all the pipelining and parallelism you
could want. It's actually quite nice for stuff like the FFT.

Since we're bikeshedding, "SAFE-INDIRECT-LANDING" is quite
wordy and "SIL" is really terse. Depending on how exactly things
work I could go for "SENDCOOKIE" and "RECVCOOKIE",
"TXCOOKIE" and "RXCOOKIE", "PROPOSE" and "ACCEPT", or similar.

(set an arbitrary cookie for the next jump or the one after, do the
jump or jumps, then die unless there is a cookie-accepter that
matches the current cookie)

> The "allowed jump mask" field is 33-bits because it controls which

Bit masks for registers are looking complicated...

Jacob Bachmeyer

Mar 18, 2017, 8:27:28 PM3/18/17
to Watson Ladd, RISC-V ISA Dev, michae...@mac.com, etern...@gmail.com
Watson Ladd wrote:
> On Saturday, March 18, 2017 at 4:32:52 PM UTC-7, Jacob Bachmeyer wrote:
>
> After thinking about this a bit, (and still unsure if we are being
> trolled--"COME FROM" really bothers me)
>
>
> It was tongue-in-cheek, but mnemonics should be memorable.
Fair enough; I will call it SAFE-INDIRECT-LANDING for now, though.
The destination of an indirect jump would itself be a (direct) jump.
For the usual function entry point, jump-to-next-instruction is
perfectly acceptable. For densely-packed jump tables, each
SAFE-INDIRECT-LANDING could further transfer control to an indirect
entry routine, storing (effectively) its own address (that JALR jumped
to) in a scratch register for the entry routine to resolve further.

The 14 bit offset is where control is transferred *after* landing from
an indirect jump. The 33-bit "allowed jump mask" is a 2-bit "jump
offset control" field and a 31-bit "allowed indirect register mask"
field. Bits 1 through 31 in this field correspond to registers x1
through x31. If a SAFE-INDIRECT-LANDING is reached by JALR using x16,
for example, a fault is raised unless bit 16 in this field is set.
Similarly, JALR has an offset, and the 2-bit subfield indicates whether
a JALR that reaches this landing must have a zero offset, must have a
non-zero offset, or may have any offset.
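The landing check described above can be modelled in a few lines. This is only a sketch of the rule as stated: the names are hypothetical, the mask is kept as an integer with bit n standing for register xn, and jumps through x0 are treated as unrestricted as in the earlier message.

```python
# Model-only names for the three defined values of the 2-bit offset control.
OFFSET_ZERO, OFFSET_NONZERO, OFFSET_ANY = range(3)


def landing_permits(reg_mask, offset_rule, src_reg, jalr_offset):
    """Return True if a JALR through x<src_reg> with the given offset may
    land on a SAFE-INDIRECT-LANDING carrying this mask and offset rule."""
    if not 0 <= src_reg <= 31:
        raise ValueError("src_reg must be a register number 0..31")
    if src_reg == 0:
        return True                      # x0-based jumps are not restricted
    if not (reg_mask >> src_reg) & 1:
        return False                     # register not allowed for this landing
    if offset_rule == OFFSET_ZERO:
        return jalr_offset == 0
    if offset_rule == OFFSET_NONZERO:
        return jalr_offset != 0
    return True                          # OFFSET_ANY


# A return landing site: only "JALR x0, x1, 0" may arrive here.
RETURN_SITE = (1 << 1, OFFSET_ZERO)
# A function entry that is never called through x1.
ENTRY_POINT = (0xFFFFFFFC, OFFSET_ANY)

print(landing_permits(*RETURN_SITE, src_reg=1, jalr_offset=0))
print(landing_permits(*RETURN_SITE, src_reg=16, jalr_offset=0))
print(landing_permits(*ENTRY_POINT, src_reg=1, jalr_offset=0))
```

The last call models a ROP-style return into a function entry point, which this landing rejects because the bit for x1 is clear.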

> My concern is that this doesn't protect function pointers, as there
> the target of a return is branched from any function the pointer could
> have pointed to. Obviously an attacker who overwrites a function
> pointer can make it go to whatever function they want, but this
> proposal would seem to imply that ROP is still possible by setting the
> return addresses to the sites of
> function pointer invocations. A stack-like discipline (which doesn't
> necessarily result in extra memory accesses) would provide a stronger
> guarantee.

Remember that "RET" in RISC-V is actually "JALR x0, x1, 0"--an indirect
jump. Since the site of a function call will not (in general) be
preceded by a SAFE-INDIRECT-LANDING (unless the program does multiple
function calls in a sequence), ROP to "use of a function pointer" is not
guaranteed to be viable.

If functions are never called through x1/ra, then the function entry
point will have the bit for x1 cleared and ROP to a function entry point
immediately faults. Return-oriented programming would only be able to
use gadgets that start with a legitimate return from a function call.
On one hand this makes searching for gadgets easier, on the other hand
it does that by significantly reducing the set of possible gadgets,
hopefully by enough to make ROP infeasible.


-- Jacob

Jacob Bachmeyer

Mar 18, 2017, 8:42:39 PM3/18/17
to Michael Clark, Alex Elsayed, isa...@groups.riscv.org
Admittedly, however, it is still an improvement since those gadget labels
are now the *only* gadgets possible. This is better than permitting any
trailing subset of any function to be used as a gadget. Further, use of
x1 as the return address in RISC-V is merely a convention--a program can
use other registers to store return addresses, so this leads to 31
classes of return points, one of which is the standard ABI. Internal
functions do not actually need to use the standard ABI.

> Stack cookies can be leaked with a read primitive. A ROP bypass finds
> and writes the stack cookie along with a return address of a gadget.
> If the stack is writable, contains variable length items and function
> addresses it can be used to subvert control flow.
>
> SafeStack mitigates this by placing variable length (arrays) on an
> alternate stack. i.e. only provably bounded frame accesses are allowed
> on the safe stack that contains control flow and unbounded accesses
> use an unsafe stack. The SafeStack shadow stack could theoretically be
> found and written to with a write primitive.
>
> Shadow stacks that are only accessible by CALL and RETURN on a stack
> architecture can prove that a return is not a gadget access; however,
> they need context switching support and changes to longjmp, assuming a
> higher privilege level has control of the shadow stack bounds.
>
> In a stackless architecture, having a return address label after a
> call with the program counter of the called function's RET (jalr zero,
> ra) instruction can completely defeat arbitrary return.

Yes, at the cost of significant increases in ABI complexity and the
complete inability to use tail-calls. An alternate idea: use the fourth
value of the 2-bit subfield to indicate that the remaining 31 bits are
instead a "return cookie"--if SAFE-INDIRECT-LANDING was reached by
normal execution, this would set a hidden "return cookie" register,
while if SAFE-INDIRECT-LANDING was the target of an indirect jump, this
would compare the 31-bit value to the hidden "return cookie" register
and raise an exception unless they match. This adds only a function's
31-bit "return cookie" to the information needed to call a function and
does not complicate ASLR. (Maybe derive the return cookie from a hash
of the function name and arguments? This would also catch incorrect
function calls, although functions that implement polymorphic interfaces
would need to derive their return cookies from the interface they
implement.)

> That said, from what I can tell about CFI, most people will
> sacrifice security for speed.

And that is a limit to how thorough this can get. I argue that
"something" is still better than "nothing".


-- Jacob

Jacob Bachmeyer

Mar 18, 2017, 10:41:31 PM3/18/17
to Albert Cahalan, Michael Clark, Alex Elsayed, isa...@groups.riscv.org
Albert Cahalan wrote:
> On 3/18/17, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> After thinking about this a bit, (and still unsure if we are being
>> trolled--"COME FROM" really bothers me) a SAFE-INDIRECT-LANDING opcode
>>
>
> I've seen a "COME FROM" on real hardware, and this isn't it.
> The SHARC DSP supports 6 simultaneously active COME FROM.
> Sadly, they named it something else. It takes the form of a jump,
> which could be conditional, but gives the address to jump from
> rather than the address to jump to. This address is remembered.
> When the address is hit and the condition is true, the jump is taken.
> It's a tad like hardware breakpoints or watchpoints, but it causes a
> regular jump instead of an exception. This gives the CPU the ability
> to do zero-latency loops, with all the pipelining and parallelism you
> could want. It's actually quite nice for stuff like the FFT.
>
> Since we're bikeshedding, "SAFE-INDIRECT-LANDING" is quite
> wordy and "SIL" is really terse. Depending on how exactly things
> work I could go for "SENDCOOKIE" and "RECVCOOKIE",
> "TXCOOKIE" and "RXCOOKIE", "PROPOSE" and "ACCEPT", or similar.
>

How about dropping the "INDIRECT" and shortening to "CFI.SALA"?

> (set an arbitrary cookie for the next jump or the one after, do the
> jump or jumps, then die unless there is a cookie-accepter that
> matches the current cookie)
>

This leads to a use for the fourth mode in
SAFE-INDIRECT-LANDING/CFI.SALA. Rewriting:

A 32-bit sucfistatus ("Supervisor-controlled User Control Flow Integrity
STATUS") CSR is introduced, freely accessible by the supervisor,
containing a 1-bit indirect-jump-in-progress/"IJIP" flag, a 1-bit mode
select "cookie" flag and either a 30-bit cookie field if the "cookie"
flag is set, or a 6-bit "jump source description" field otherwise. The
"jump source description" field contains a 1-bit non-zero-offset flag
and a 5-bit jump source register field. A CFI violation trap is taken
if IJIP is set and any instruction other than CFI.SALA is executed.
JALR sets the IJIP flag unconditionally and the "jump source
description" field if the "cookie" flag is clear. CFI.SALA clears IJIP
if its condition is satisfied.

CFI.SALA is a 64-bit instruction, using a major opcode in the 64-bit
encoding space. CFI.SALA has a unique format due to its requirements.
The 52 operand bits in CFI.SALA are assigned: 2-bit mode, 5-bit
destination register, 31-bit jump source register mask, and 14-bit
control transfer offset.
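A small sketch of how the four fields could be packed into the 52 operand bits. The message gives only the field widths, so the bit positions used here (mode in the lowest bits, offset in the highest) are an assumption, as are the function and field names.

```python
# (name, width) pairs, lowest bits first. The ordering is an assumption.
FIELDS = (("mode", 2), ("rd", 5), ("src_mask", 31), ("offset", 14))
OFFSET_BITS = 14


def pack_operands(mode, rd, src_mask, offset):
    """Pack the CFI.SALA operand fields into one 52-bit integer."""
    half = 1 << (OFFSET_BITS - 1)
    if not -half <= offset < half:
        raise ValueError("offset does not fit in 14 signed bits")
    values = {"mode": mode, "rd": rd, "src_mask": src_mask,
              "offset": offset & ((1 << OFFSET_BITS) - 1)}
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_operands(word):
    """Inverse of pack_operands; sign-extends the offset field."""
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    if fields["offset"] & (1 << (OFFSET_BITS - 1)):
        fields["offset"] -= 1 << OFFSET_BITS
    return fields


word = pack_operands(mode=0b01, rd=0, src_mask=1, offset=-8)
print(word < (1 << 52), unpack_operands(word))
```

The field widths sum to 52, so the packed value always fits the operand space left after the length prefix and major opcode.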

CFI.SALA is executed as a direct JAL ("jump-and-link") using the 5-bit
destination register and 14-bit offset sign extended and added to the
program counter. The destination register may of course be x0. The
jump is unconditional. Executing CFI.SALA has conditional side-effects
on sucfistatus.

If the mode field is 2'b01, 2'b10, or 2'b11, then the "cookie" flag in
sucfistatus must be clear and the jump source register field must either
be zero or the bit indicated by the value of the jump source register
field must be set in the jump source register mask, which has bits
numbered 1 through 31, corresponding to jumps indirected from registers
x1 through x31. Additionally, if the mode is 2'b01, the non-zero-offset
flag must be clear; if the mode is 2'b10, the non-zero-offset flag must
be set; and if the mode is 2'b11, the non-zero-offset flag may have
either value. If all of these conditions are met, IJIP is cleared.

If the mode field is 2'b00, CFI.SALA functions as either PROPOSE or
ACCEPT, storing or verifying a 30-bit cookie value. The 31-bit jump
source register mask immediate is instead a 1-bit PROPOSE/ACCEPT flag
and a 30-bit cookie value. If the flag is set as PROPOSE, then the
"cookie" flag is set in sucfistatus and the 30-bit cookie value is
copied to sucfistatus. IJIP is unaltered by PROPOSE. If the flag is
set as ACCEPT, then the "cookie" flag in sucfistatus must be set and the
cookie value in sucfistatus must match the 30-bit cookie immediate. If
these conditions are met, IJIP is cleared.
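The rules above amount to a small state machine, sketched here in Python. This is a reading of the message, not a specification: the class and function names are made up, the jump performed by CFI.SALA itself is not modelled, and because the message does not say when the "cookie" flag is cleared again, the model simply leaves it set.

```python
from dataclasses import dataclass

COOKIE_MASK = (1 << 30) - 1


class CFIViolation(Exception):
    """Stands in for the CFI violation trap."""


@dataclass
class SuCfiStatus:
    ijip: bool = False            # indirect jump in progress
    cookie_flag: bool = False     # selects cookie vs. jump source description
    cookie: int = 0
    nonzero_offset: bool = False
    src_reg: int = 0


def jalr(st, rs1, offset):
    """JALR always sets IJIP; it records its source only in non-cookie mode."""
    st.ijip = True
    if not st.cookie_flag:
        st.src_reg = rs1
        st.nonzero_offset = offset != 0


def other_instruction(st):
    """Any instruction other than CFI.SALA traps while IJIP is set."""
    if st.ijip:
        raise CFIViolation("indirect jump did not land on a matching CFI.SALA")


def sala_register_mode(st, mode, src_mask):
    """CFI.SALA with mode 1, 2 or 3; src_mask bit n stands for register xn."""
    ok = (not st.cookie_flag) and (st.src_reg == 0 or (src_mask >> st.src_reg) & 1 == 1)
    if mode == 0b01:
        ok = ok and not st.nonzero_offset
    elif mode == 0b10:
        ok = ok and st.nonzero_offset
    if ok:
        st.ijip = False


def sala_propose(st, cookie):
    """Mode 0, PROPOSE: store the cookie; IJIP is left unaltered."""
    st.cookie_flag = True
    st.cookie = cookie & COOKIE_MASK


def sala_accept(st, cookie):
    """Mode 0, ACCEPT: clear IJIP only if the stored cookie matches."""
    if st.cookie_flag and st.cookie == (cookie & COOKIE_MASK):
        st.ijip = False


st = SuCfiStatus()
jalr(st, rs1=1, offset=0)                        # RET through x1
sala_register_mode(st, mode=0b01, src_mask=1 << 1)
other_instruction(st)                            # no trap: the landing matched
print(st.ijip)
```

A mismatching CFI.SALA does not trap by itself in this model; it leaves IJIP set, so the trap is taken by the first ordinary instruction that follows.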


-- Jacob

Watson Ladd

Mar 19, 2017, 12:31:31 AM3/19/17
to jcb6...@gmail.com, isa...@groups.riscv.org, Michael Clark, Albert Cahalan, Alex Elsayed
Doesn't work for nested function calls as the cookie gets clobbered. I'm not sure where you would use this mode.





Jacob Bachmeyer

Mar 20, 2017, 12:04:37 AM3/20/17
to Watson Ladd, isa...@groups.riscv.org, Michael Clark, Albert Cahalan, Alex Elsayed
It does work for nested function calls--the cookie is set just before
the return. The return sequence is: [in function epilogue] PROPOSE
<return cookie>; RET (which is "JALR x0, x1, 0"); [in caller] ACCEPT
<expected return cookie>. A nested function call will set its own
return cookie and the outer function will verify the inner function's
return cookie. The outer function will then set its own return cookie
just before returning to its caller. The only catch is that a function
ending with a tail call must use the tail-called function's return
cookie in its ABI. (In other words, tail-called functions are
potentially visible to their caller's caller.)
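The nested-call behaviour can be traced with a toy model in which Python's own call and return stand in for control flow. The cookie values, the `Hart` class, and the function names are hypothetical, and for brevity a mismatching ACCEPT raises immediately instead of leaving IJIP set for the next instruction to trap on.

```python
class CFIViolation(Exception):
    """Stands in for the CFI violation trap."""


class Hart:
    """Just enough state to follow return cookies."""

    def __init__(self):
        self.cookie = None
        self.ijip = False

    def propose(self, cookie):
        self.cookie = cookie            # PROPOSE in the function epilogue

    def ret(self):
        self.ijip = True                # RET is JALR x0, x1, 0

    def accept(self, expected):
        if self.ijip and self.cookie != expected:
            raise CFIViolation(f"expected {expected:#x}, got {self.cookie:#x}")
        self.ijip = False               # ACCEPT at the return site


# Hypothetical per-function return cookies.
COOKIE = {"inner": 0x111, "outer": 0x222}


def inner(hart):
    hart.propose(COOKIE["inner"])
    hart.ret()


def outer(hart):
    inner(hart)
    hart.accept(COOKIE["inner"])        # verify the inner function's cookie
    hart.propose(COOKIE["outer"])       # only now set outer's own cookie
    hart.ret()


def main(hart):
    outer(hart)
    hart.accept(COOKIE["outer"])


hart = Hart()
main(hart)
print(hart.ijip)
```

Because each function proposes its cookie immediately before returning, the inner call cannot clobber a value the outer function still needs.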

On a side note, SafeStack is almost trivial on RISC-V because the stack
is purely a software object. Using a second stack exclusively for user
data and reserving the primary stack for control flow incurs only a
slight cost of 1 or 2 callee-saved registers as stack and frame pointers
for the data stack, but the data frame pointer could be stored on the
control stack, reducing this to a single additional stack pointer.

-- Jacob

Watson Ladd

Mar 20, 2017, 12:20:09 AM3/20/17
to jcb6...@gmail.com, isa...@groups.riscv.org, Albert Cahalan, Michael Clark, Alex Elsayed
Ah, I see. So the return site differs per function. Nice!


> On a side note, SafeStack is almost trivial on RISC-V because the stack is
> purely a software object. Using a second stack exclusively for user data and
> reserving the primary stack for control flow incurs only a slight cost of 1
> or 2 callee-saved registers as stack and frame pointers for the data stack,
> but the data frame pointer could be stored on the control stack, reducing
> this to a single additional stack pointer.

A write-4 primitive plus infoleak to find the shadow stack base results in exploitation. Real hardware enforcement is needed to stop this, but the first idea with cookies does a lot.





Jacob Bachmeyer

Mar 20, 2017, 12:51:22 AM3/20/17
to Watson Ladd, isa...@groups.riscv.org, Albert Cahalan, Michael Clark, Alex Elsayed
Of course, that is why the cookie value is 30 bits. Programs are
expected to have many different cookie values.

> On a side note, SafeStack is almost trivial on RISC-V because the
> stack is purely a software object. Using a second stack
> exclusively for user data and reserving the primary stack for
> control flow incurs only a slight cost of 1 or 2 callee-saved
> registers as stack and frame pointers for the data stack, but the
> data frame pointer could be stored on the control stack, reducing
> this to a single additional stack pointer.
>
>
> A write-4 primitive plus infoleak to find the shadow stack base
> results in exploitation. Real hardware enforcement is needed to stop
> this, but the first idea with cookies does a lot.

There is always *some* combination that results in exploitation; the
goal is to make that combination sufficiently rare in real programs that
exploits are infeasible. Note that I am *not* suggesting a shadow stack
as I understand the term (where control flow information is copied to
and from a secondary stack)--I suggest *two* (or more) distinct stacks:
the primary stack contains only control flow and possibly non-array
local variables, while a secondary stack contains all local arrays and
possibly other (non-pointer) local variables. A third stack may be
needed for arrays of pointers, or those may be required to be on the
heap, which will need its own protections (storing the heap metadata
somewhere other than conveniently right next to user buffers would
probably be a big help). If an attacker cannot cause an overrun that
corrupts a pointer, what primitives are still possible?

This is also where restrictions on indirect jumps enter the picture--if
transfers through code pointers can only land at designated safe landing
points (CFI.SALA), then the total set of possible ROP gadgets is greatly
reduced. Requirements that particular registers be used to call
particular functions only further impede an attacker. If returns are
further bound to the type of the returned value (one consistent way to
generate return cookies, but stricter models are possible), then the set
of available ROP gadgets is further reduced. Eventually, we reach a
point where the set of ROP gadgets simply does not include something the
attacker needs and exploitation is infeasible.

There is another subtle aspect of the CFI.SALA approach: a landing
point may have any number of CFI.SALA instructions, and execution
continues if at least one of them matches. This allows multiple return
cookies to be accepted, or for different sets of registers to be
permitted for calls with and without offsets. This also allows a
function to permit a call through a particular register or with any
number of acceptable function entry cookies. These features,
particularly the option for multiple valid cookie values permit strict
cookie assignments to be used, even with polymorphic objects. (Function
A may use method Bar on objects of type C or D, but function B will only
accept objects of type C. This cookie model can enforce this example
rule, either by raising CFI violation when D::Bar returns to B or upon B
calling D::Bar if function entry cookies are used.)
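The polymorphism example can be written out as a tiny model in which a landing point is just the list of cookies its CFI.SALA instructions accept. The cookie values and names are hypothetical.

```python
# Hypothetical 30-bit return cookies for two implementations of method Bar.
COOKIE_C_BAR = 0x0C0FFEE
COOKIE_D_BAR = 0x0D0FFEE


def landing_accepts(proposed_cookie, accepted_cookies):
    """A landing point is a run of CFI.SALA ACCEPTs; one match is enough."""
    return any(proposed_cookie == cookie for cookie in accepted_cookies)


# Function A may call Bar on objects of type C or D.
LANDING_IN_A = (COOKIE_C_BAR, COOKIE_D_BAR)
# Function B only accepts objects of type C.
LANDING_IN_B = (COOKIE_C_BAR,)

print(landing_accepts(COOKIE_D_BAR, LANDING_IN_A))   # D::Bar returning to A
print(landing_accepts(COOKIE_D_BAR, LANDING_IN_B))   # D::Bar returning to B
```

The second check is the case that would raise a CFI violation: no ACCEPT at B's landing point matches the cookie proposed by D::Bar.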


-- Jacob

Stefan O'Rear

Mar 31, 2017, 1:36:11 AM3/31/17
to Watson Ladd, RISC-V ISA Dev
On Wed, Mar 15, 2017 at 8:52 PM, Watson Ladd <watso...@gmail.com> wrote:
> Dear all,

Hello and welcome to the RISC-V community!

(I am not even trying to represent the project in the following.)

> I have several ideas for potentially useful instructions and extensions that
> can support additional mitigations. The first is COME FROM. All jump targets

1. I believe the person in charge of this area is David Chisnall at Cambridge.

2. You may have noticed that the base ISA is aggressively minimal and
leaves out things like integer rotates. Krste has commented a couple
times that "the hardest part is saying no". RISC-V's design purpose
was to be simple enough for research projects. Any proposal to add
anything faces an extremely uphill battle.

3. Proposals to add complicated features are even more uphill, due to
the RISC-V microarchitectural neutrality goal.

4. I for one am looking forward to the day - maybe 100 years from now
- when C is rare enough in security-critical codebases that we don't
have to jump through hoops to deal with stack-buffer overflows.

> must be COME FROM on chips that support it when this extension is used. As a

5. "When this extension is used" is extremely vague. Minimal
implementations and non-minimal implementations need to share a
software ecosystem, so adding new hard requirements for existing
software is problematic.

6. "Jump targets" is quite a bit of microarchitectural complexity. If
you're jumping to a swapped out page, do you trap before the jump, or
trap after (and lose the chance to check the jump target)?

7. RISC-V has a design goal of being "boring", in the sense that it
imposes very few novel challenges for implementors and compiler
writers. I am not aware of any mainstream ISA that does anything like
this proposal.

8. In the spirit of RISC we prefer solutions that involve existing
instructions, or at the very least existing datapath. Rare
instructions using rare datapath are difficult to implement
efficiently.

> result the available gadgets for control flow manipulation is reduced. COME
> FROM has no other effects. The second is a shadow stack modification which
> adds three instructions: CALL, CALLR and RETURN and two privileged-mode
> registers STACKBASE and STACKLIM as well as modify the memory protection

9. We're trying to add as few privileged registers as possible to
avoid a death-of-a-thousand-cuts situation on VM context switch
latency.

> system to add a shadow access type. STACKBASE and STACKLIM are conceptually
> per user or kernel process.

10. If it's per-process, that's worse, since it affects context switch
latency for ordinary processes.

11. "Add a shadow access type" is a major violation of the boringness principle.

> CALL(R) has the same format as JA(R)L and the same semantics except it does

12. JAL uses an entire major opcode. There are only two major opcodes
remaining that are reserved for standard use. If you want one of them
for yourself, you'll need to make a very good case for it.

> not touch the destination register. Instead the address is pushed onto a
> stack, that may be dumped to memory between STACKBASE and STACKLIM. RETURN

13. There is no hardware stack on RISC-V currently. Adding a hardware
stack would be a major break from the current design language.

> pops this stack. Memory accesses issued by these dumps are in shadow mode: a
> page with the shadow bit set may be written or read by them. If the stack

14. So you want a new PTE bit. Again, we have two PTE bits reserved;
getting one of them requires a very good case, ideally for something
that has been battle-proven on commercial ISAs (20 years ago).
Alternatively, we could make an incompatible change to the PTE format
and drop support for 34-bit physical addresses on RV32; this doesn't
really improve your prospects of getting a bit though.
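
To make the PTE arithmetic in point 14 concrete, here is a minimal sketch assuming the Sv32 layout from the privileged specification (flags in bits 7:0, two RSW bits in 9:8, a 22-bit PPN in 31:10). The helper names are hypothetical and only illustrate where the 34-bit figure comes from.

```python
# Sv32 PTE layout (assumed from the privileged spec): flags in bits 7:0,
# two RSW bits in bits 9:8, and a 22-bit PPN in bits 31:10.
PAGE_OFFSET_BITS = 12
PPN_SHIFT = 10
PPN_BITS = 22
RSW_SHIFT = 8


def pte_fields(pte):
    """Split a 32-bit Sv32 PTE into (ppn, rsw, flags)."""
    ppn = (pte >> PPN_SHIFT) & ((1 << PPN_BITS) - 1)
    rsw = (pte >> RSW_SHIFT) & 0b11
    flags = pte & 0xFF
    return ppn, rsw, flags


def physical_address(pte, offset):
    """Combine a leaf PTE's PPN with a 12-bit page offset."""
    ppn, _, _ = pte_fields(pte)
    return (ppn << PAGE_OFFSET_BITS) | (offset & 0xFFF)


# The largest reachable physical address needs 22 + 12 = 34 bits.
max_pa = physical_address(0xFFFFFFFF, 0xFFF)
pa_bits = max_pa.bit_length()

# Capping physical addresses at 32 bits would need only a 20-bit PPN,
# which is where the extra PTE bits would come from.
freed_bits = PPN_BITS - (32 - PAGE_OFFSET_BITS)
```

Under these assumptions, dropping 34-bit physical addressing frees two PPN bits, at the cost of an incompatible PTE format.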

> size exceeds STACKLIM an error is reported to the OS. Multiple processes can
> use the same shadow page because of the STACKBASE and STACKLIM registers.
>
> I look forward to discussing these proposals further/gathering other
> instructions that can assist in securing today's software.

I think this specific proposal is a poor fit for the design language
and implementation realities of the RISC-V standardized baseline. I
think this proposal occupies an unhappy median position, where it is
difficult to support with existing compilers and kernels but provides
much weaker protection than research designs like CHERI and DOVER.
More attention to microarchitectural implementation issues would be an
improvement.

I appreciate your interest and contribution and welcome further
discussion on this proposal and future proposals of yours. Thank you.

-s

Michael Clark

Mar 31, 2017, 5:47:38 PM
to Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
On 31 Mar 2017, at 6:36 PM, Stefan O'Rear <sor...@gmail.com> wrote:

On Wed, Mar 15, 2017 at 8:52 PM, Watson Ladd <watso...@gmail.com> wrote:
Dear all,

Hello and welcome to the RISC-V community!

(I am not even trying to represent the project in the following.)

I have several ideas for potentially useful instructions and extensions that
can support additional mitigations. The first is COME FROM. All jump targets

1. I believe the person in charge of this area is David Chisnall at Cambridge.

2. You may have noticed that the base ISA is aggressively minimal and
leaves out things like integer rotates.  Krste has commented a couple
times that "the hardest part is saying no".  RISC-V's design purpose
was to be simple enough for research projects.  Any proposal to add
anything faces an extremely uphill battle.

The lack of an integer rotate is a conspicuous omission which will hopefully be remedied in the Bit Manipulation Extension. Rotates are trivial to implement if there is already a barrel shifter. It seems wasteful of the barrel shifter hardware to not have a bit controlling a trace that loops back to the other end of the shifter. While it only saves 4 instructions, the prevalence of rotate in cryptographic algorithms makes the savings quite substantial on crypto benchmarks. The cost of the trace is much less than the cost of the 4 instructions. I note that gcc and clang both lift rotate patterns in C into rotate instructions. The other conspicuous omission is bswap, whose absence generates pathological code on RISC-V. Given that /most/ cryptographic algorithms expect network byte order (big endian), I think bswap will be essential.
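
As a rough illustration of the patterns in question, the sketch below models a 32-bit rotate and byte swap using only shifts, masks and ORs, which is what a base-ISA sequence has to do. The function names are hypothetical, and the comments only loosely indicate the kind of instruction each step stands for.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate left built from two shifts and an OR."""
    n &= 31
    left = (x << n) & MASK32                   # shift left
    right = (x & MASK32) >> ((32 - n) & 31)    # complementary shift right
    return left | right                        # combine


def bswap32(x):
    """Byte swap built from masks, shifts and ORs."""
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))
```

A compiler that recognises the rotl32 shape can emit a single rotate where the target has one; without such an instruction each call expands to the full sequence.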

3. Proposals to add complicated features are even more uphill, due to
the RISC-V microarchitectural neutrality goal.

4. I for one am looking forward to the day - maybe 100 years from now
- when C is rare enough in security-critical codebases that we don't
have to jump through hoops to deal with stack-buffer overflows.

must be COME FROM on chips that support it when this extension is used. As a

5. "When this extension is used" is extremely vague.  Minimal
implementations and non-minimal implementations need to share a
software ecosystem, so adding new hard requirements for existing
software is problematic.

6. "Jump targets" is quite a bit of microarchitectural complexity.  If
you're jumping to a swapped out page, do you trap before the jump, or
trap after (and lose the chance to check the jump target)?

7. RISC-V has a design goal of being "boring", in the sense that it
imposes very few novel challenges for implementors and compiler
writers.  I am not aware of any mainstream ISA that does anything like
this proposal.

Watson’s proposal is pretty similar to Intel CET (Control-flow Enforcement Technology), which uses a shadow stack. It is in a mainstream ISA, albeit as a relatively new feature (June 2016). CFI has been in the literature for a while, so it’s about time the techniques were adopted in a mainstream ISA. CFI techniques seem to be quite well studied. I believe the Intel CET implementation uses NOPs for backwards compatibility:


The criticism of Intel’s CET is the number of bits used in indirect jump targets. In a stackless architecture, JALR could propose and expect a CFI cookie instruction at the target of the indirect jump, which may be the return point (the next instruction after JAL(R)) or another indirect jump target, to “tie” the indirect forward and return control flow. This was discussed at length in the thread. The number of bits represents the strength of the control flow “tie” and relates to the probability of finding an indirect target that can be used as a gadget. The return target after a JAL, if using the PC-relative address of the called function's return, can be “completely” tied (at the expense of tail calls). Note that tail calls are not possible in SysV i386 PIC ABI, so comparatively the loss of tail calls for the gain of CFI is not strictly a big loss.

Returns are interesting. I’m working on binary translation and there are similar control flow integrity issues. JALR is the only RVI instruction that I am still interpreting. I need to implement a tiny circular buffer (so we don’t have to worry about stack overflow) that uses the JAL(R) push/pop hints and contains pairs of guest code return address and translated trace return address, so I can accelerate RET with a tiny trampoline that checks these addresses, along with a slow path for when the address in the circular buffer doesn’t match, which may happen during context switching, longjmp, or deep nesting.
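
A minimal sketch of such a circular buffer, assuming a fixed number of slots and a fall-back to the slow path on any mismatch. The class and method names are hypothetical and do not describe an existing translator.

```python
class ReturnCache:
    """Fixed-size circular buffer of (guest_ret, host_ret) pairs."""

    def __init__(self, size=8):
        self.slots = [None] * size
        self.top = 0  # index of the next slot to write

    def push(self, guest_ret, host_ret):
        # Called for a JAL(R) carrying the "push" hint. Old entries are
        # overwritten, so deep nesting cannot overflow the buffer.
        self.slots[self.top] = (guest_ret, host_ret)
        self.top = (self.top + 1) % len(self.slots)

    def pop(self, guest_target):
        # Called for a JALR carrying the "pop" hint (a RET).
        self.top = (self.top - 1) % len(self.slots)
        entry = self.slots[self.top]
        self.slots[self.top] = None
        if entry is not None and entry[0] == guest_target:
            return entry[1]  # fast path: translated trace return address
        return None          # slow path: caller falls back to a full lookup
```

Returning None models the slow path; overwritten entries (deep nesting) and unexpected targets (longjmp, context switch) both end up there.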

It’s worth noting that the lack of notice to the processor during context switch and longjmp prevents using the stack hints to detect control flow tampering. However, a stackless architecture using cookies or relative addresses of returns could make a RISC-V CFI extension distinct from the prevailing mainstream implementation.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Jacob Bachmeyer

Mar 31, 2017, 9:24:39 PM
to Michael Clark, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
Michael Clark wrote:
> The criticism of Intel’s CET is the number of bits used in indirect
> jump targets. In a stackless architecture, JALR could propose and
> expect a CFI cookie instruction at the target of the indirect jump,
> which may be the return point (the next instruction after JAL(R)) or
> another indirect jump target, to “tie” the indirect forward and return
> control flow. This was discussed at length in the thread. The number
> of bits represents the strength of the control flow “tie” and relates
> to the probability of finding an indirect target that can be used as
> a gadget. The return target after a JAL, if using the PC-relative
> address of the called function's return, can be “completely” tied (at
> the expense of tail calls). Note that tail calls are not possible in
> SysV i386 PIC ABI, so comparatively the loss of tail calls for the
> gain of CFI is not strictly a big loss.

See the "sucfistatus" proposal I suggested earlier in this thread--the
use of cookies and/or multiple stacks (which would be a pure ABI change,
not affecting the hardware at all) does not preclude tail calls,
although it does make them effectively visible to the calling function
(which must ACCEPT the tail-called function's return cookie). A
slightly further improved form of that proposal can even work with
millicode--the function PROPOSEs a return cookie, then makes a direct
jump to the common epilogue millicode, which returns with its (tail)
caller's return cookie intact.


-- Jacob

Michael Clark

Mar 31, 2017, 10:56:37 PM
to jcb6...@gmail.com, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
Yes. I liked the idea in principle however I still like the idea of a NOP embedding so that the ROP mitigations can be introduced without backwards incompatible ABI changes. I disagree with the false sense of security argument. It is more about not requiring a different set of binaries to run on systems without the CFI instructions. Backwards compatibility is not particularly novel, but a NOP embedding allows armoured binaries to run on cores without the instructions being present just like Linux or Windows can run on cores without SMAP, SMEP, NX or a variety of other security features being present. I think ABI backwards compatibility is really more of a simple necessity.

Much of the same type of thought has gone into IETF’s TLS exploit mitigations remaining backwards compatible with a subset of earlier versions, and of course TLS is a non-proprietary open standard. Endpoints need to upgrade to benefit from future mitigations (in this case ROP), but backwards compatibility can be retained for endpoints that don’t yet have the mitigations deployed.

Michael

Jacob Bachmeyer

Apr 1, 2017, 7:29:51 PM
to Michael Clark, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
Michael Clark wrote:
>> On 1 Apr 2017, at 2:24 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>>
>> Michael Clark wrote:
>>
>>> The criticism of Intel’s CET is the number of bits used in indirect jump targets. In a stackless architecture, JALR could propose and expect a CFI cookie instruction at the target of the indirect jump, which may be the return point (the next instruction after JAL(R)) or another indirect jump target, to “tie” the indirect forward and return control flow. This was discussed at length in the thread. The number of bits represents the strength of the control flow “tie” and relates to the probability of finding an indirect target that can be used as a gadget. The return target after a JAL, if using the PC-relative address of the called function's return, can be “completely” tied (at the expense of tail calls). Note that tail calls are not possible in SysV i386 PIC ABI, so comparatively the loss of tail calls for the gain of CFI is not strictly a big loss.
>>>
>> See the "sucfistatus" proposal I suggested earlier in this thread--the use of cookies and/or multiple stacks (which would be a pure ABI change, not affecting the hardware at all) does not preclude tail calls, although it does make them effectively visible to the calling function (which must ACCEPT the tail-called function's return cookie). A slightly further improved form of that proposal can even work with millicode--the function PROPOSEs a return cookie, then makes a direct jump to the common epilogue millicode, which returns with its (tail) caller's return cookie intact.
>>
>
> Yes. I liked the idea in principle however I still like the idea of a NOP embedding so that the ROP mitigations can be introduced without backwards incompatible ABI changes. I disagree with the false sense of security argument. It is more about not requiring a different set of binaries to run on systems without the CFI instructions. Backwards compatibility is not particularly novel, but a NOP embedding allows armoured binaries to run on cores without the instructions being present just like Linux or Windows can run on cores without SMAP, SMEP, NX or a variety of other security features being present. I think ABI backwards compatibility is really more of a simple necessity.
>

The fundamental problem is that there is not enough space in a NOP
embedding. Note that the CFI.SALA instruction I suggested is 64 bits
long. Also, a "CFI enable" flag allows CFI-capable implementations to
run non-CFI binaries (if the discussion had continued, the next rewrite
of CFI.SALA would have reduced cookies to 29 bits to accommodate a "CFI
active" flag in sucfistatus and/or changed to cookies only), but a
program has no good way to determine if its CFI protections are actually
effective if the indirect landing is a baseline NOP. Using a new
instruction solves this problem--a binary built for CFI will run *only*
on systems that actually support CFI. (Or that at least recognize the
instructions--a supervisor could take the illegal instruction trap and
emulate CFI.SALA, at least for PROPOSE/ACCEPT--another reason to change
to cookies only and drop the "how did we get here?" mechanism.)

-- Jacob

Michael Clark

Apr 1, 2017, 7:45:39 PM
to jcb6...@gmail.com, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
In your proposal there needs to be some state held while jumping, populated during PROPOSE and checked during ACCEPT.

I can see that the instruction fetch unit is in a jumping state after executing the PROPOSE, changing program counter and then expecting to fetch an ACCEPT. If it is in this state, could the jump state machine not reset or accumulate the state from multiple NOPs, in much the same way that you propose that there could be multiple ACCEPT instructions to handle virtual functions? It is then just a matter of how the landing pad instruction(s) assemble the accept cookie.

It’s arguable whether handling another instruction width is more or less complex than populating the bits from what could be handled as “coprocessor instructions” for the “jump state machine”, that switch the fetch unit back into the normal state or cause a trap, depending on the observed cookie.

Clearly there has to be a mode where the CFI instructions are not enabled which is an essential part in your design. The argument is whether the cookies can be embedded in NOPs to allow the code to run on existing RISC-V systems.

Jacob Bachmeyer

Apr 1, 2017, 8:12:38 PM
to Michael Clark, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
Michael Clark wrote:
>
>> On 2 Apr 2017, at 11:29 AM, Jacob Bachmeyer <jcb6...@gmail.com
>> <mailto:jcb6...@gmail.com>> wrote:
>>
>> Michael Clark wrote:
>>>> On 1 Apr 2017, at 2:24 PM, Jacob Bachmeyer <jcb6...@gmail.com

This is not what I suggested. PROPOSE merely selects "cookie mode" and
sets the cookie. If CFI is active, (baseline) JALR sets IJIP. If IJIP
is set, execution of any instruction other than CFI.SALA faults. ACCEPT
checks if the cookie matches its immediate and clears IJIP if so.

In "how did we get here?" mode, JALR simply copies its rs1 to a field in
sucfistatus. Internal logic converts that value to a one-hot encoding
for comparison with the mask in CFI.SALA. If ((mask & onehot) != 0),
IJIP is cleared.

Special case: JALR with rs1==x0 does not set IJIP.

> It’s arguable whether handling another instruction width is more or
> less complex than populating the bits from what could be handled as
> “coprocessor instructions” for the “jump state machine”, that switch
> the fetch unit back into the normal state or cause a trap, depending
> on the observed cookie.

The state machine that I suggest is much simpler and is integrated into
the main execution pipeline:

(1) (optional; selects "cookie mode") CFI.SALA as PROPOSE selects
"cookie mode" and sets the cookie

(2) JALR sets IJIP and copies its rs1 to an internal register that is
visible in sucfistatus in "how did we get here?" mode

(3a) CFI.SALA in "how did we get here?" mode checks if JALR's rs1
satisfies its mask and clears IJIP if so

(3b) CFI.SALA as ACCEPT checks if a cookie is set and clears both IJIP
and "cookie mode" if its immediate matches the current cookie

(4) Executing any instruction other than CFI.SALA while IJIP is set
raises CFI violation
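
The steps above can be sketched as a small software model. This is only an illustration under assumptions: rs1 is taken to be JALR's register number (so the one-hot value is 1 << rs1), the mask check is taken to apply only when no cookie is set, and all names are hypothetical.

```python
class CFIViolation(Exception):
    """Raised where the hardware would report a CFI violation."""


class CFIState:
    def __init__(self, enabled=True):
        self.enabled = enabled   # "CFI enable" flag
        self.ijip = False        # indirect jump in progress
        self.cookie = None       # not None means "cookie mode" is selected
        self.jalr_rs1 = 0        # register number used by the last JALR

    def propose(self, cookie):
        # (1) CFI.SALA as PROPOSE: select cookie mode and set the cookie.
        self.cookie = cookie

    def jalr(self, rs1):
        # (2) JALR sets IJIP and records rs1; rs1 == x0 is the special case.
        if self.enabled and rs1 != 0:
            self.ijip = True
            self.jalr_rs1 = rs1

    def sala_mask(self, mask):
        # (3a) "how did we get here?" mode: compare one-hot rs1 with the mask.
        if self.cookie is None and mask & (1 << self.jalr_rs1):
            self.ijip = False

    def accept(self, immediate):
        # (3b) CFI.SALA as ACCEPT: a match clears IJIP and cookie mode.
        if self.cookie is not None and immediate == self.cookie:
            self.ijip = False
            self.cookie = None

    def other_instruction(self):
        # (4) Anything other than CFI.SALA while IJIP is set is a violation.
        if self.ijip:
            raise CFIViolation("instruction executed while IJIP is set")
```

In this model a non-matching CFI.SALA leaves IJIP set, so several ACCEPTs can be stacked at one landing site and the violation is only raised by the first ordinary instruction.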

I also expect that 64-bit instructions will eventually turn up in most
non-microcontroller implementations anyway. (microcontroller == akin to
PIC or AVR)

> Clearly there has to be a mode where the CFI instructions are not
> enabled which is an essential part in your design. The argument is
> whether the cookies can be embedded in NOPs to allow the code to run
> on existing RISC-V systems.

The instructions are always enabled, but their use is not enforced
unless "CFI enable" is set. ("CFI enable" was missing from the last draft.)

In fact, CFI.SALA as PROPOSE/ACCEPT provides another option: JALR can
set IJIP even if CFI is not enforced iff a cookie is currently active.
This allows a shared library, for example, to use CFI internally even if
its host program was not built for CFI.


-- Jacob

Michael Clark

Apr 1, 2017, 9:42:23 PM
to jcb6...@gmail.com, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
On 2 Apr 2017, at 12:12 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

Michael Clark wrote:

On 2 Apr 2017, at 11:29 AM, Jacob Bachmeyer <jcb6...@gmail.com <mailto:jcb6...@gmail.com>> wrote:

Michael Clark wrote:
On 1 Apr 2017, at 2:24 PM, Jacob Bachmeyer <jcb6...@gmail.com <mailto:jcb6...@gmail.com>> wrote:

Michael Clark wrote:
  
The criticism of Intel’s CET is the number of bits used in indirect jump targets. In a stackless architecture, JALR could propose and expect a CFI cookie instruction at the target of the indirect jump, which may be the return point (the next instruction after JAL(R)) or another indirect jump target, to “tie” the indirect forward and return control flow. This was discussed at length in the thread. The number of bits represents the strength of the control flow “tie” and relates to the probability of finding an indirect target that can be used as a gadget. The return target after a JAL, if using the PC-relative address of the called function's return, can be “completely” tied (at the expense of tail calls). Note that tail calls are not possible in SysV i386 PIC ABI, so comparatively the loss of tail calls for the gain of CFI is not strictly a big loss.
    
See the "sucfistatus" proposal I suggested earlier in this thread--the use of cookies and/or multiple stacks (which would be a pure ABI change, not affecting the hardware at all) does not preclude tail calls, although it does make them effectively visible to the calling function (which must ACCEPT the tail-called function's return cookie).  A slightly further improved form of that proposal can even work with millicode--the function PROPOSEs a return cookie, then makes a direct jump to the common epilogue millicode, which returns with its (tail) caller's return cookie intact.
  

Yes. I liked the idea in principle however I still like the idea of a NOP embedding so that the ROP mitigations can be introduced without backwards incompatible ABI changes. I disagree with the false sense of security argument. It is more about not requiring a different set of binaries to run on systems without the CFI instructions. Backwards compatibility is not particularly novel, but a NOP embedding allows armoured binaries to run on cores without the instructions being present just like Linux or Windows can run on cores without SMAP, SMEP, NX or a variety of other security features being present. I think ABI backwards compatibility is really more of a simple necessity.


The fundamental problem is that there is not enough space in a NOP embedding.  Note that the CFI.SALA instruction I suggested is 64 bits long. Also, a "CFI enable" flag allows CFI-capable implementations to run non-CFI binaries (if the discussion had continued, the next rewrite of CFI.SALA would have reduced cookies to 29 bits to accommodate a "CFI active" flag in sucfistatus and/or changed to cookies only), but a program has no good way to determine if its CFI protections are actually effective if the indirect landing is a baseline NOP.  Using a new instruction solves this problem--a binary built for CFI will run *only* on systems that actually support CFI.  (Or that at least recognize the instructions--a supervisor could take the illegal instruction trap and emulate CFI.SALA, at least for PROPOSE/ACCEPT--another reason to change to cookies only and drop the "how did we get here?" mechanism.)

In your proposal there needs to be some state held while jumping, populated during PROPOSE and checked during ACCEPT.

I can see that the instruction fetch unit is in a jumping state after executing the PROPOSE, changing program counter and then expecting to fetch an ACCEPT. If it is in this state, could the jump state machine not reset or accumulate the state from multiple NOPs, in much the same way that you propose that there could be multiple ACCEPT instructions to handle virtual functions? It is then just a matter of how the landing pad instruction(s) assemble the accept cookie.

This is not what I suggested.  PROPOSE merely selects "cookie mode" and sets the cookie.  If CFI is active, (baseline) JALR sets IJIP.  If IJIP is set, execution of any instruction other than CFI.SALA faults.  ACCEPT checks if the cookie matches its immediate and clears IJIP if so.

That was my understanding of your proposal. There is a state such that after an indirect jump the processor will fault if IJIP is set and a regular instruction is fetched instead of an ACCEPT instruction, which clears IJIP if the cookie matches.

I’m just arguing that PROPOSE and ACCEPT could be encoded within existing NOPs, for example by adding a shift or match bit, where n bits of a 20-bit immediate from an AUIPC NOP can be shifted into the cookie register over one or two instructions, primarily to support binary backwards compatibility.

I don’t see that adding too much complexity to a unit that can fetch 16-bit and 32-bit instructions. The implementation already has to keep cookie state over several instructions, e.g. PROPOSE, JALR (IJIP=1), ACCEPT (IJIP=0). We would add a shift/match bit to your CFI.SALA instruction, encode it within a 32-bit AUIPC NOP instruction, and require the cookie state register to have a constant-offset shifter.

2 x 32-bit AUIPC NOP instructions, with the encoding held within the 20-bit immediate, would be satisfactory. The immediate would include:

- 1-bit PROPOSE/ACCEPT
- 1-bit SHIFT/MATCH

In fact, 17 bits of label would be enough for most binaries, such that all indirect jumps and returns could be tied to unique values, so it may in fact only require 1 x 32-bit AUIPC NOP instruction instead of 2 x 32-bit AUIPC NOP instructions with the SHIFT/MATCH bit I’ve added.

sucfistatus could have a 3-bit ‘cseg’ field (cookie segments) that indicates how many 17-bit segments need to be shifted into the cookie before comparison. This would allow for 17-bit cookies (encoded in a single 32-bit NOP) all the way up to 119-bit cookies (encoded in 7 x 32-bit NOPs). A ‘cseg’ value of zero in sucfistatus would switch off CFI.
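
A minimal sketch of that accumulation, assuming each NOP contributes one 17-bit segment and that the first segment fetched ends up most significant. The shift order and the function name are assumptions for illustration only.

```python
SEG_BITS = 17
SEG_MASK = (1 << SEG_BITS) - 1
MAX_CSEG = 7  # largest value of a 3-bit 'cseg' field


def accumulate_cookie(segments, cseg):
    """Shift `cseg` 17-bit segments into a cookie, or None if CFI is off."""
    if cseg == 0:
        return None  # a 'cseg' of zero switches CFI off
    if not 1 <= cseg <= MAX_CSEG or len(segments) != cseg:
        raise ValueError("segment count does not match cseg")
    cookie = 0
    for seg in segments:
        cookie = (cookie << SEG_BITS) | (seg & SEG_MASK)
    return cookie


# Widest cookie: 7 segments of 17 bits each.
max_cookie_bits = MAX_CSEG * SEG_BITS
```

With cseg set to 1 the loop runs once and the cookie is simply the single 17-bit label.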

$ riscv64-unknown-elf-objdump -d linux-4.6.2/vmlinux | grep jalr | wc -l
    5066
$ riscv64-unknown-elf-objdump -d linux-4.6.2/vmlinux | grep ret | wc -l
   17136

How many label bits do we need? Why can’t we fit into a 32-bit NOP?
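
As a back-of-the-envelope check using the two counts above, and assuming every matched line is a site that needs its own unique label (grep matches text, so the counts are only approximate):

```python
jalr_sites = 5066    # from the `grep jalr | wc -l` output above
ret_sites = 17136    # from the `grep ret | wc -l` output above

total_sites = jalr_sites + ret_sites

# Smallest number of bits that can give every site a distinct label.
bits_needed = (total_sites - 1).bit_length()

# Labels available in a single 17-bit segment.
labels_in_17_bits = 1 << 17
fits_in_one_nop = bits_needed <= 17
```

Under that assumption the kernel image needs about 15 bits of label, which leaves some headroom within a 17-bit field.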

In "how did we get here?" mode, JALR simply copies its rs1 to a field in sucfistatus.  Internal logic converts that value to a one-hot encoding for comparison with the mask in CFI.SALA.  If ((mask & onehot) != 0), IJIP is cleared.

Special case:  JALR with rs1==x0 does not set IJIP.

It’s arguable whether handling another instruction width is more or less complex than populating the bits from what could be handled as “coprocessor instructions” for the “jump state machine”, that switch the fetch unit back into the normal state or cause a trap, depending on the observed cookie.

The state machine that I suggest is much simpler and is integrated into the main execution pipeline:

(1) (optional; selects "cookie mode") CFI.SALA as PROPOSE selects "cookie mode" and sets the cookie

(2) JALR sets IJIP and copies its rs1 to an internal register that is visible in sucfistatus in "how did we get here?" mode

(3a) CFI.SALA in "how did we get here?" mode checks if JALR's rs1 satisfies its mask and clears IJIP if so

(3b) CFI.SALA as ACCEPT checks if a cookie is set and clears both IJIP and "cookie mode" if its immediate matches the current cookie

(4) Executing any instruction other than CFI.SALA while IJIP is set raises CFI violation

I also expect that 64-bit instructions will eventually turn up in most non-microcontroller implementations anyway.  (microcontroller == akin to PIC or AVR)

If we had some 48-bit and 64-bit NOPs or other instructions specified such that many RISC-V fetch units could already handle them, then I would be more inclined towards the 64-bit instruction approach. However, I think it would be better if CFI binaries could also run on the RISC-V Base ISA.

It is technically possible by changing the way CFI.SALA accumulates its cookie bits.

Otherwise it is a surefire way to propose a CFI mechanism that doesn’t get adopted due to binary incompatibility issues.

Clearly there has to be a mode where the CFI instructions are not enabled which is an essential part in your design. The argument is whether the cookies can be embedded in NOPs to allow the code to run on existing RISC-V systems.

The instructions are always enabled, but their use is not enforced unless "CFI enable" is set.  ("CFI enable" was missing from the last draft.)

In fact, CFI.SALA as PROPOSE/ACCEPT provides another option:  JALR can set IJIP even if CFI is not enforced iff a cookie is currently active. This allows a shared library, for example, to use CFI internally even if its host program was not built for CFI.


-- Jacob


Allen Baum

Apr 1, 2017, 10:09:36 PM
to Michael Clark, jcb6...@gmail.com, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev


-Allen

On Apr 1, 2017, at 6:42 PM, Michael Clark <michae...@mac.com> wrote:


...
I’m just arguing that PROPOSE and ACCEPT could be encoded within existing NOPs, for example by adding a shift or match bit, where n bits of a 20-bit immediate from an AUIPC NOP can be shifted into the cookie register over one or two instructions, primarily to support binary backwards compatibility.

I'm just waiting for the bug reports when those sequences start crossing page boundaries or cross with interrupts...

Jacob Bachmeyer

Apr 1, 2017, 10:38:41 PM
to Michael Clark, Stefan O'Rear, Watson Ladd, RISC-V ISA Dev
Michael Clark wrote:
>> On 2 Apr 2017, at 12:12 PM, Jacob Bachmeyer <jcb6...@gmail.com
>> <mailto:jcb6...@gmail.com>> wrote:
>>
>> Michael Clark wrote:
>>>> On 2 Apr 2017, at 11:29 AM, Jacob Bachmeyer <jcb6...@gmail.com
>>>> <mailto:jcb6...@gmail.com> <mailto:jcb6...@gmail.com>> wrote:
>>>>
>>>> Michael Clark wrote:
>>>>>> On 1 Apr 2017, at 2:24 PM, Jacob Bachmeyer <jcb6...@gmail.com

Well, first we have a hard limit on cookie size: the cookie must be
exposed in sucfistatus so that it can be swapped/reloaded during context
switch. This access really should be atomic, so cookie length cannot
exceed XLEN minus however many state bits the CFI mechanism uses. For
simplicity, I suggested that sucfistatus should always be a 32-bit
register regardless of XLEN.

Second, because the cookie mechanism must be transparent across context
switch, both the current pending cookie *and* any partially loaded match
candidate must appear in CSR space. This either halves the possible
cookie length (probably to 14 bits) or requires an additional CSR for
the pending cookie fragments or limits CFI to RV64 and up. CFI.SALA as
a 64-bit instruction permits PROPOSE/ACCEPT to be atomic, even on RV32.

Third, there is a large amount of encoding space in a 64-bit instruction
and CFI.SALA as I suggest is also a direct jump (possibly conditional on
match). There is (comparatively) very little encoding space available
in U-type NOPs and I have not seen any consensus that CFI is the proper
use for it. A U-type NOP has a 20-bit immediate, so two would be needed
for a reasonable cookie; this takes exactly as much space as CFI.SALA in
the program text, but omits the option for a direct jump that is useful
for PLT-like structures. Splitting cookies into two fragments would
mean that we have CFI.HINT.PH, CFI.HINT.PL, CFI.HINT.AH, CFI.HINT.AL,
for "propose high half", "propose low half", "accept high half", "accept
low half". Effectively, this would mean we have two cookies, both of
which must match, but an acceptor that accepts multiple values cannot
tie a particular high value to a particular low value. On the other
hand, this only uses 16 bits out of the available 20, so it is probably
more palatable in terms of actually reserving meaning for those NOPs,
but is limited to PROPOSE/ACCEPT and omits the "how did we get here?"
mode, also useful for PLT-like structures.

Fourth, there is a major advantage to longer cookies: more space means
that less concern is needed about possible collisions and cookie values
are easier to generate. One option for return cookies would simply be a
CRC over the type of the first value returned. (like CRC("struct quux {
int foo; int bar; long baz; } *") or CRC("typedef some_type") and so
on) The effectiveness of this approach is inversely proportional to the
odds of a CRC match between differing types.
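
A minimal sketch of the CRC idea in the fourth point, assuming CRC-32 over the textual type signature truncated to the cookie width. The 29-bit width is taken from the figure mentioned earlier in the thread; the function name and the choice of CRC-32 are assumptions.

```python
import zlib

COOKIE_BITS = 29  # assumed cookie width


def type_cookie(type_signature, bits=COOKIE_BITS):
    """Derive a return cookie from the text of a return type."""
    crc = zlib.crc32(type_signature.encode("utf-8"))
    return crc & ((1 << bits) - 1)


struct_cookie = type_cookie("struct quux { int foo; int bar; long baz; } *")
typedef_cookie = type_cookie("typedef some_type")
```

If the truncated CRC behaves like a uniform hash, two differing types collide with probability of roughly 2^-bits, which is why longer cookies make this scheme more comfortable.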
First, implementations that do not support 64-bit instructions must
raise illegal instruction upon encountering one (CFI would simply be
incompatible with implementations that use the long instruction encoding
as custom 32-bit instructions) and a supervisor could still
trap-and-emulate.

Second, I expect that 64-bit instructions will eventually become
commonplace for reasons unrelated to CFI.

Third, a very important issue for performance is that PROPOSE/ACCEPT
should be atomic. Accumulating cookie bits is a good way to expand the
amount of state that must be swapped on *every* context switch. CFI
should not be complex enough for lazy CFI context switch to be worthwhile.


-- Jacob

Corey Richardson

Apr 15, 2017, 9:49:26 PM
to Watson Ladd, RISC-V ISA Dev
I haven't been following this thread super closely, but Galois just proposed this for x86: http://landhere.galois.com/


On Wed, Mar 15, 2017, at 23:52, Watson Ladd wrote:
> Dear all,
>
> I have several ideas for potentially useful instructions and extensions
> that can support additional mitigations. The first is COME FROM. All jump
> targets must be COME FROM on chips that support it when this extension is
> used. As a result the available gadgets for control flow manipulation is
> reduced. COME FROM has no other effects. The second is a shadow stack
> modification which adds three instructions: CALL, CALLR and RETURN and two
> privileged-mode registers STACKBASE and STACKLIM as well as modify the
> memory protection system to add a shadow access type. STACKBASE and
> STACKLIM are conceptually per user or kernel process.
>
> CALL(R) has the same format as JA(R)L and the same semantics except it does
> not touch the destination register. Instead the address is pushed onto a
> stack, that may be dumped to memory between STACKBASE and STACKLIM. RETURN
> pops this stack. Memory accesses issued by these dumps are in shadow mode:
> a page with the shadow bit set may be written or read by them. If the stack
> size exceeds STACKLIM an error is reported to the OS. Multiple processes
> can use the same shadow page because of the STACKBASE and STACKLIM
> registers.
>
> I look forward to discussing these proposals further/gathering other
> instructions that can assist in securing today's software.
>
> Sincerely,
> Watson
>