Introducing a new module-level metadata format is just another kind of IR
extension, with the disadvantage of being untyped. We should be cutting down
on these types of things, not introducing a new one [1].
So, bitset would be a property meaning that globals with the same name are appended to a string of bits in the order in which they appear, and it's the front end's job to make sure the correct order is followed in every unit, so that linking does the right thing in every corner case?
Could that be used for efficient global boolean flags, like LLVM's options? Even if you don't need the range check, you get the bit mask check for free.
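Under that reading, the bit index of any entry depends on how many bits the earlier units contributed, which is why the order has to agree everywhere. Below is a minimal C sketch of that offset computation. It assumes each translation unit contributes a fragment of known size to one appending bitset. `fragment_offsets` is a hypothetical name and is not part of any LLVM API.

```c
#include <stddef.h>

/* Hypothetical model of an appending bitset: fragment k comes from the
   k-th translation unit in link order and contributes sizes[k] bits.
   Each fragment starts at the sum of the sizes of all earlier ones. */
void fragment_offsets(const size_t *sizes, size_t n, size_t *offsets) {
    size_t pos = 0;
    for (size_t k = 0; k < n; ++k) {
        offsets[k] = pos;  /* first bit index owned by fragment k */
        pos += sizes[k];
    }
}
```

With sizes {3, 5, 2}, the fragments start at bits 0, 3 and 8. If the same units are linked as {2, 5, 3}, the later fragments start at bits 2 and 7 instead.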
I'm trying to find other uses to help you make the case for turning this into a first-class citizen, not metadata.
Cheers,
Renato
On 29 Jan 2015 11:36, "Sean Silva" <chiso...@gmail.com> wrote:
> On Thu, Jan 29, 2015 at 12:53 AM, Renato Golin <renato...@linaro.org> wrote:
>>
>> So, bitset would be a property that means : globals with the same name will append on a string of bits in the order in which they appear, and it's the job of the front end to make sure the correct order is followed in every unit, so that linking does the right job in every corner case?
>>
>> Could that be used for efficient global boolean flags, like LLVM's options? Even if you don't need the range check, you get the bit mask check for free.
>
> Maybe during LTO... in general they would need to have distinct addresses.
>
> Actually, Peter, would it be possible to prototype this with an appending i8 array that we already have in the IR? Some space overhead compared to appending bits, but could allow for a quick prototype.
This would work, and you could do the packing during your lowering pass, no?
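The packing step could look roughly like the C sketch below. It turns an appending i8 array into a packed bit vector, assuming each byte holds 0 or 1. This recovers the space the i8 layout spends, which is eight times that of packed bits. `pack_i8_bitset` and the LSB-first bit order within each byte are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing step: 'bytes' holds one entry per byte (0 or 1),
   as in an appending i8 array. 'out' must hold (n + 7) / 8 bytes.
   Entry i becomes bit (i % 8) of byte (i / 8), LSB first. */
void pack_i8_bitset(const uint8_t *bytes, size_t n, uint8_t *out) {
    for (size_t i = 0; i < (n + 7) / 8; ++i)
        out[i] = 0;
    for (size_t i = 0; i < n; ++i)
        if (bytes[i])
            out[i / 8] |= (uint8_t)(1u << (i % 8));
}
```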
Cheers,
Renato
On Jan 29, 2015, at 6:50 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
On Thu, Jan 29, 2015 at 02:22:48PM -0800, Peter Collingbourne wrote:
I've been working on a patch that implements the bitset attribute and bitset
lowering pass. I'll see if I can send it out later today.
http://reviews.llvm.org/D7288
I have a bad feeling about this... :)
--renato
Oof. That’s a deal breaker for any of the uses I was hoping for. Nuts. :(
I wanted to start by giving an explanation of what I am trying to achieve
and how I am trying to achieve it.
I am working towards introducing into LLVM a security mechanism, Forward
Control Flow Integrity (CFI), that is designed to mitigate vulnerabilities
that let an attacker corrupt vtable or function pointers in memory in order
to subvert a program's control flow.
More details are in a paper that I co-authored that was presented at USENIX
Security last year [1]. As mentioned in the paper, attackers are increasingly
relying on such techniques to subvert control flow. This is why I feel that it
is particularly important that compilers contain practical defenses against
such attacks.
One particular variant of the defense I am proposing, vtable verification,
was implemented in GCC and is described in section 3 of the paper. However,
it comes with a significant performance overhead: more than 8% on certain
Chrome browser-based benchmarks and up to around 20% on SPEC 2006 benchmarks
(see Figure 2). This is likely because it searches lists of vtables to
determine whether a given vtable is valid, which is a direct consequence of
its avoidance of techniques that depend on changing how the program is linked.
The implementation I am proposing to contribute to LLVM focuses on
performance. Building the needed data structures at link time allows us
to reduce the cost of checking the validity of a vtable pointer to a few
machine instructions and a single load from memory.
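The "few machine instructions and a single load" claim can be made concrete with the minimal C sketch below. It assumes the valid vtables sit at a fixed stride of 2^align_log2 bytes from a base address, and that the set is stored as a packed bit vector. The struct `lowered_bitset` and the functions `rotr64` and `bitset_test` are hypothetical names for illustration. They are not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a lowered bitset. */
typedef struct {
    uint64_t base;         /* address of the first candidate vtable */
    unsigned align_log2;   /* candidates are 2^align_log2 bytes apart */
    uint64_t size;         /* number of bits in the set */
    const uint8_t *bits;   /* packed bit vector, LSB first in each byte */
} lowered_bitset;

/* Rotate right; the masking keeps r == 0 well defined. */
static uint64_t rotr64(uint64_t x, unsigned r) {
    r &= 63;
    return (x >> r) | (x << ((64 - r) & 63));
}

bool bitset_test(const lowered_bitset *bs, uint64_t ptr) {
    /* Unsigned subtraction: a pointer below base wraps to a huge value. */
    uint64_t offset = ptr - bs->base;
    /* Rotating moves any misaligned low bits into the high bits, so the
       alignment check and the range check become one comparison. */
    uint64_t bit_index = rotr64(offset, bs->align_log2);
    if (bit_index >= bs->size)
        return false;
    /* The single memory load, followed by a mask. */
    return (bs->bits[bit_index / 8] >> (bit_index % 8)) & 1;
}
```

In this sketch, a pointer that is misaligned, below `base`, or past the end of the set fails the one unsigned comparison. Only in-range candidates reach the load.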
On Thu, Jan 29, 2015 at 11:04:41PM -0800, Chris Lattner wrote:
> I don’t think that adding an IR construct for this is the way to go. You’re making IR complicated for everyone to serve a very narrow use case (one that admittedly I don’t really understand).
> Also, your patch is incomplete. Presumably you have to scatter checks for “if (!GV->isBitSet())” throughout the optimizer, codegen and other things that touch globals.
I'll address these points together because I would like to explain
why extensive support for the new construct is not required, and why the
maintenance burden is not as large as it might seem (and why my patch does
not need to make such extensive changes to the optimizers).
The first point I'd like to make is that, from the point of view of optimizer
passes other than bitset lowering, the llvm.bitset.test intrinsic has opaque
semantics with regard to the contents of the globals it references, so those
passes cannot legally modify the contents of a bitset global.
The second is that any use of a bitset global other than as an argument to
the llvm.bitset.test intrinsic has undefined semantics. (This is something
that can be documented better.) This means that optimizer passes that look
through global initializers require no changes: any transformation such a
pass may perform on IR that treats a bitset global as a regular global (for
example, by taking its address or loading from it) is semantics-preserving,
because the semantics of that IR would have been undefined anyway.
With these points in mind, I'm reasonably confident that very little code
needs to care about the new flag.
(I should also point out that I know that the patch most likely works without
any other optimizer changes, because I have a work-in-progress patch that
implements the Clang side of this, and have successfully applied it to a
large C++ codebase and found real bugs in that codebase.)
Regarding codegen, I haven't implemented codegen support for bitsets yet;
the intrinsic is handled entirely by the pass. I can't imagine the changes
being very intrusive, though. For the moment, we can easily add a check that
no bitset constructs make it through to codegen.
> The fact that this affects semantics and will only work with LTO and not native linkers is also really weird to me.
I agree, which is why I plan to add support to lld and perhaps other linkers,
but we do have to start somewhere. Adding the functionality only to the
compiler seems like a reasonable first step, even if we depend on LTO to begin
with.
> Is there other precedent for that? The only cases I know that affect LTO add information that is safe to drop (e.g. TBAA etc).
There is the jumptable attribute, which has been used to implement a variant
of CFI for indirect function calls (see section 4 of the USENIX paper), and
that only works effectively with an LTO'd module. (We might end up adding
native linker support for that or something similar as well.)
Thanks,
--
Peter
[1] http://www.pcc.me.uk/~peter/acad/usenix14.pdf
Does that seem more reasonable to people?
> The one thing we need to ensure is that your metadata can be dropped by the
> optimizer and the code remains correct. I'm guessing no vtable would mean
> anything goes (no check)? That's bad security-wise, but it'll at least
> work. We may want to make sure nothing gets dropped through a debug flag,
> so that we can compile Chrome and be confident that all the checks we want
> are there.
So the flag would determine whether a missing bitset makes the test return 1
or return 0? Sounds reasonable.
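Below is a small C model of that decision, meant only to make the two outcomes concrete. A NULL pointer stands in for a bitset whose metadata was dropped, and a plain member list stands in for the real lowered bitset. The names `bitset_info`, `missing_bitset_policy` and `bitset_test_or_default` are hypothetical. In the real pipeline, this choice would presumably be made at compile time by the lowering pass, not at run time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitset: an explicit list of valid vptrs. */
typedef struct {
    const uint64_t *members;
    size_t count;
} bitset_info;

typedef enum {
    MISSING_BITSET_PASS, /* dropped metadata means "no check" (test is 1) */
    MISSING_BITSET_FAIL  /* debug flag: a lost check fails (test is 0) */
} missing_bitset_policy;

bool bitset_test_or_default(const bitset_info *bs, uint64_t ptr,
                            missing_bitset_policy policy) {
    if (bs == NULL)  /* the bitset was dropped by the optimizer */
        return policy == MISSING_BITSET_PASS;
    for (size_t i = 0; i < bs->count; ++i)
        if (bs->members[i] == ptr)
            return true;
    return false;
}
```

With `MISSING_BITSET_FAIL`, any check whose bitset went missing would trap, which is one way to get the "be confident that all the checks we want are there" behaviour for a Chrome build.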
On Tue, Feb 03, 2015 at 04:03:45PM -0800, Sean Silva wrote:
> One other thing: if this can be used for control-flow integrity, I assume
> it has to have good knowledge of the available indirect call targets. Could
> we also use this information for devirtualization?
I would expect so. If a bitset contains only one element, we should be able
to teach the lowering pass to simply test that the given pointer is equal
to that element (i.e. the only valid vptr). If the subsequent IR branches to a
trapping block when that condition is false, it should be possible for the
optimizer to deduce that the condition is true in any code that is dominated
by the branch, and from that do devirtualization.
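Below is a C-level sketch of that reasoning. It assumes a hypothetical class with one virtual method and a single valid vtable, `only_vtable`. The first function shows the checked indirect call after a single-element bitset has been lowered to an equality test. The second shows what an optimizer could derive from the dominating check. This illustrates the idea only and is not actual output of the lowering pass.

```c
#include <stdlib.h>

/* Hypothetical vtable with a single virtual method. */
typedef struct vtable {
    int (*get)(void);
} vtable;

static int impl_get(void) { return 42; }

/* The only element of this hypothetical bitset. */
static const vtable only_vtable = { impl_get };

typedef struct object {
    const vtable *vptr;
} object;

/* Checked virtual call: the single-element bitset test reduces to an
   equality comparison against the one valid vptr. */
int call_checked(const object *o) {
    if (o->vptr != &only_vtable)
        abort();               /* trapping block */
    return o->vptr->get();     /* dominated by the check above */
}

/* Devirtualized form: on this path vptr == &only_vtable, so loading the
   slot folds to impl_get and the indirect call becomes a direct one. */
int call_devirtualized(const object *o) {
    if (o->vptr != &only_vtable)
        abort();
    return impl_get();
}
```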
Thanks,
--
Peter