[LLVMdev] IR extension proposal: bitset constants

27 views
Skip to first unread message

Peter Collingbourne

unread,
Jan 27, 2015, 4:21:42 PM1/27/15
to llv...@cs.uiuc.edu
Hi all,

I would like to propose a mechanism that allows IR modules to co-operatively
build a pointer set corresponding to addresses within a given set of
globals. The specific use case I have in mind is to provide a mechanism
for a C++ program to efficiently verify (at each call site) that a vtable
pointer is in the set of valid vtable pointers for the class or its derived
classes. One way of doing this is for a toolchain component to build, for each
class, a bit set that maps to the memory region allocated for the vtables,
such that each 1 bit in the bit set maps to a valid vtable for that class,
and lay out the vtables next to each other, to minimize the total size of
the bit sets. Call sites would perform a range check followed by a load of
the appropriate bit from the bit set.

To give a concrete example, suppose we have the following classes:

struct A { virtual void f(); };
struct B : A { virtual void f(), g(); };
struct C : A { virtual void f(), h(); };

with the following vtables:

_ZTV1A = { &A::rtti, &A::f };
_ZTV1B = { &B::rtti, &B::f, &B::g };
_ZTV1C = { &C::rtti, &C::f, &C::h };

The set of valid vtables for static class A is {&_ZTV1A[1], &_ZTV1B[1],
&_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The toolchain
would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit sets
like this:

A = {0,1,0,1,0,0,1,0}
B = {0,0,0,1,0,0,0,0}
C = {0,0,0,0,0,0,1,0}

A call site (say the static type of the call is A) will check whether the
object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
*)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th entry
in A's bitset, which will be 1 if the vtable is valid for A.

We are now faced with a number of implementation questions: how do we represent
these (potentially partial) bit sets in LLVM, and how do we arrange to lay
out the globals and build the complete bit sets? Remember that classes B
and C may be in different translation units, so we do not necessarily have
a complete picture of A's bit set at A's compile time.

What I propose is that the parts of the bit sets be represented as arrays
of pointers, which will be resolved through some mechanism (either at
LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
those addresses. This mechanism would also be responsible for laying out
the referenced globals (i.e. the vtables) as close together as possible. We
introduce a new flag on globals, the so-called "bitset" flag, which causes
the global to be transformed using this mechanism. Such a global cannot be
accessed in the normal way. It may only be accessed using an intrinsic that
tests whether a given pointer is in the set.

To return to our concrete example, a translation unit defining the vtable
for A may also define the following global:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]

A translation unit defining B would define:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
@_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]

And C:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
@_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]

A later mechanism would combine the bitset globals, lay out the referenced
globals and build the bit sets. In the LTO case, the IR linker can already
do the combination step if the globals are given appending linkage. A
later compiler pass (which we will call the "bitset lowering pass") would lay out
the globals (by combining them into a single large global, and RAUWing the
original globals with aliases that gep into the large global) and build the
bit sets. In the non-LTO case, we could also invent an object-file-specific
mechanism for the backend to tell the linker to combine and build the bit sets,
for example by placing the bitset globals in a specifically named section,
and place the referenced globals in individual sections so that the linker
can rearrange them.

As previously mentioned, an intrinsic would be used to test entries in the
bit set at call sites. For example, the relevant IR for the following C++ code:

A *some_object = ...;
some_object->f();

might look like this:

%vtable = load i8** %some_object
%p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
br i1 %p, label %continue, label %trap

In the LTO case, such intrinsic calls would be lowered by the bitset lowering pass
into the appropriate range/bit set check code, which might look like this:
(pseudocode for 64-bit platform)

if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable > global_end_addr(_ZBS1A) {
%p = 0
} else {
%p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
}

In the non-LTO case, we would generate similar code, but with appropriate
relocations so that the required constants (global start/end address, bitset
address, mask, shift amount) would receive appropriate linker-determined
values.

The first step I propose to make in this direction is a patch that adds
support for the bitset flag on globals, and introduces the bitset lowering pass. A
later step would involve building backend support, and linker support in lld.

Comments appreciated.

Thanks,
--
Peter
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Sean Silva

unread,
Jan 28, 2015, 6:00:03 AM1/28/15
to Peter Collingbourne, llv...@cs.uiuc.edu
Is there any way to accomplish this that doesn't require changes in every part of the toolchain?

What you describe here sounds really efficient, but for a change this pervasive throughout the toolchain, I would at least like to see an attempt done that does not require changing the toolchain. Then we can circle back to changing the toolchain if that proves inadequate and with concrete experience informing what really is going to require a toolchain change (Maybe you've already done this? Please share.).

-- Sean Silva

Kostya Serebryany

unread,
Jan 28, 2015, 12:48:08 PM1/28/15
to Sean Silva, llv...@cs.uiuc.edu
I would start from using module-level metadata.
An IR extension might be a good idea once we show that 
  - the proposed indirect call protection mechanism is efficient and useful, and
  - there are other use cases for the IR extension. 

--kcc 

Peter Collingbourne

unread,
Jan 28, 2015, 1:41:24 PM1/28/15
to Sean Silva, llv...@cs.uiuc.edu
Hi Sean,

I agree that we should first try to accomplish this without changing every
part of the toolchain. This is why I proposed to first introduce only the IR
support and bitset lowering pass. This would provide a working implementation
that is confined to the compiler, but does require LTO.

In the long run, however, I would like to drop the dependency on LTO, in
order to permit fast incremental builds.

Once the initial approach has proven to be effective, we can think about
dropping the LTO dependency by implementing linker support. I only included
the linker implementation details to show that the design can work without
requiring LTO.

Peter

Peter Collingbourne

unread,
Jan 28, 2015, 2:01:15 PM1/28/15
to Kostya Serebryany, llv...@cs.uiuc.edu
Introducing a new module-level metadata format is just another kind of IR
extension, with the disadvantage of being untyped. We should be cutting down
on these types of things, not introducing a new one [1].

I feel that a global is the right design for this, as it represents data
that is conceptually in the "object file".

Peter

[1] https://groups.google.com/forum/#!topic/llvm-dev/u4hZUWAXQuY

Kostya Serebryany

unread,
Jan 28, 2015, 2:30:42 PM1/28/15
to Peter Collingbourne, llv...@cs.uiuc.edu
On Wed, Jan 28, 2015 at 10:57 AM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
Introducing a new module-level metadata format is just another kind of IR
extension, with the disadvantage of being untyped. We should be cutting down
on these types of things, not introducing a new one [1].

Hard to argue with that. 
module-level metadata is a simple (though not the only) way to make a prototype and prove that the approach works. 

Renato Golin

unread,
Jan 28, 2015, 8:00:24 PM1/28/15
to Peter Collingbourne, LLVM Dev

So, bitset would be a property that means : globals with the same name will append on a string of bits in the order in which they appear, and it's the job of the front end to make sure the correct order is followed in every unit, so that linking does the right job in every corner case?

Could that be used for efficient global boolean flags, like LLVM's options? Even if you don't need the range check, you get the bit mask check for free.

I'm trying to find other uses to help you in the case to make this into a first class citizen, not metadata.

Cheers,
Renato

Peter Collingbourne

unread,
Jan 28, 2015, 10:08:03 PM1/28/15
to Renato Golin, LLVM Dev
On Thu, Jan 29, 2015 at 12:53:49AM +0000, Renato Golin wrote:
> So, bitset would be a property that means : globals with the same name will
> append on a string of bits in the order in which they appear, and it's the
> job of the front end to make sure the correct order is followed in every
> unit, so that linking does the right job in every corner case?

Not quite. Each entry in a bitset global (after deduplication) represents
a 1 in a bitset in a particular region of the object file. The entries may
appear in any order, and it is the job of LTO (or the linker) to build the
actual bitset.

> Could that be used for efficient global boolean flags, like LLVM's options?
> Even if you don't need the range check, you get the bit mask check for
> free.

I don't understand exactly what you mean.

Peter

Sean Silva

unread,
Jan 29, 2015, 6:40:22 AM1/29/15
to Renato Golin, LLVM Dev
On Thu, Jan 29, 2015 at 12:53 AM, Renato Golin <renato...@linaro.org> wrote:

So, bitset would be a property that means : globals with the same name will append on a string of bits in the order in which they appear, and it's the job of the front end to make sure the correct order is followed in every unit, so that linking does the right job in every corner case?

Could that be used for efficient global boolean flags, like LLVM's options? Even if you don't need the range check, you get the bit mask check for free.

Maybe during LTO... in general they would need to have distinct addresses.

Actually, Peter, would it be possible to prototype this with an appending i8 array that we already have in the IR? Some space overhead compared to appending bits, but could allow for a quick prototype.

-- Sean Silva

Renato Golin

unread,
Jan 29, 2015, 7:20:48 AM1/29/15
to Sean Silva, LLVM Dev


On 29 Jan 2015 11:36, "Sean Silva" <chiso...@gmail.com> wrote:
>
>
>
> On Thu, Jan 29, 2015 at 12:53 AM, Renato Golin <renato...@linaro.org> wrote:
>>
>> So, bitset would be a property that means : globals with the same name will append on a string of bits in the order in which they appear, and it's the job of the front end to make sure the correct order is followed in every unit, so that linking does the right job in every corner case?
>>
>> Could that be used for efficient global boolean flags, like LLVM's options? Even if you don't need the range check, you get the bit mask check for free.
>
> Maybe during LTO... in general they would need to have distinct addresses.
>
> Actually, Peter, would it be possible to prototype this with an appending i8 array that we already have in the IR? Some space overhead compared to appending bits, but could allow for a quick prototype.

This would work, and you could make the packaging during your lowering pass, no?

Cheers,
Renato

Peter Collingbourne

unread,
Jan 29, 2015, 5:26:22 PM1/29/15
to Renato Golin, LLVM Dev
I don't think it could be made to work in all cases. Consider this class
hierarchy, with each class defined in a separate translation unit:

class A { virtual void f(); };
class B { virtual void g(); };
class C { virtual void h(); };
class D : A, B { virtual void f(), g(); };
class E : B, C { virtual void g(), h(); };

vtables:

A = { &A::rtti, &A::f };
B = { &B::rtti, &B::g };
C = { &C::rtti, &C::h };
D = { &D::rtti, &D::f, &D::rtti, &D::g };
E = { &E::rtti, &E::g, &E::rtti, &E::h };

bitsets (assuming layout A,B,C,D,E):

A = { 0,1,0,0,0,0,0,1,0,0,0,0,0,0 }
B = { 0,0,0,1,0,0,0,0,0,1,0,1,0,0 }
C = { 0,0,0,0,0,1,0,0,0,0,0,1,0,1 }
D = { 0,0,0,0,0,0,0,1,0,0,0,0,0,0 }
E = { 0,0,0,0,0,0,0,0,0,0,0,1,0,0 }

Note that because of the existence of class C, we must leave a gap in
the bitset for A between A and D. Because none of the translation units
in C's hierarchy are necessarily aware of A, they cannot know to leave a
gap. Attempting to fix this in the bitset lowering pass would just introduce
more complexity, IMO.

So I don't think it is possible (or simple, at the least) without references
to other globals in the thing we use to build the bitset.

I've been working on a patch that implements the bitset attribute and bitset
lowering pass. I'll see if I can send it out later today.

Peter Collingbourne

unread,
Jan 29, 2015, 9:53:30 PM1/29/15
to Renato Golin, LLVM Dev
On Thu, Jan 29, 2015 at 02:22:48PM -0800, Peter Collingbourne wrote:
> I've been working on a patch that implements the bitset attribute and bitset
> lowering pass. I'll see if I can send it out later today.

http://reviews.llvm.org/D7288

Jim Grosbach

unread,
Jan 29, 2015, 9:56:07 PM1/29/15
to Peter Collingbourne, llv...@cs.uiuc.edu
Hi Peter,

Please forgive if this is an obvious question, but how reasonable is it for this approach to work when not all translation units are available to be rebuilt with the new information?

I'm very interested in what you're proposing here.

Jim

Sent from my iPhone

Peter Collingbourne

unread,
Jan 29, 2015, 10:38:34 PM1/29/15
to Jim Grosbach, llv...@cs.uiuc.edu
On Thu, Jan 29, 2015 at 06:51:51PM -0800, Jim Grosbach wrote:
> Hi Peter,
>
> Please forgive if this is an obvious question, but how reasonable is it for this approach to work when not all translation units are available to be rebuilt with the new information?
>
> I'm very interested in what you're proposing here.

This technique works best if we can lay out the vtables in a specific order,
but this is not necessarily a requirement. I can imagine an extension of the
design where if the pointer is out of range we can fall back to a slow path
that compares the vtable pointer against each element of a list of valid
addresses that belong to external code.

The hard requirements (I believe) are that we have headers for all external
classes that may interact with our code, and that the external objects export
the needed vtable symbols. With those, we should be able to use Clang to
build the list of valid addresses. I believe this should work with some
implementations of the standard library, for example.

If the external library does not have all headers available, or (say) if
it exposes classes defined in an anonymous namespace, we may not be able
to identify all the valid addresses. To handle those cases we may need to
develop some kind of blacklisting mechanism.

Chris Lattner

unread,
Jan 30, 2015, 2:07:39 AM1/30/15
to Peter Collingbourne, LLVM Dev

On Jan 29, 2015, at 6:50 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:

On Thu, Jan 29, 2015 at 02:22:48PM -0800, Peter Collingbourne wrote:
I've been working on a patch that implements the bitset attribute and bitset
lowering pass. I'll see if I can send it out later today.

http://reviews.llvm.org/D7288

Hi Peter,

I don’t think that adding an IR construct for this is the way to go.  You’re making IR complicated for everyone to serve a very narrow use case (one that admittedly I don’t really understand).  The fact that this affects semantics and will only work with LTO and not native linkers is also really weird to me.  Is there other precedent for that?  The only cases I know that affect LTO add information that is safe to drop (e.g. TBAA etc).

Also, your patch is incomplete.  Presumably you have to scatter checks for “if (!GV->isBitSet())” throughout the optimizer, codegen and other things that touch globals.

-Chris

Renato Golin

unread,
Jan 30, 2015, 7:39:40 AM1/30/15
to Chris Lattner, LLVM Dev
On 30 January 2015 at 07:04, Chris Lattner <clat...@apple.com> wrote:
> Also, your patch is incomplete. Presumably you have to scatter checks for
> “if (!GV->isBitSet())” throughout the optimizer, codegen and other things
> that touch globals.

I have a bad feeling about this... :)

--renato

Jim Grosbach

unread,
Jan 30, 2015, 1:05:48 PM1/30/15
to Peter Collingbourne, llv...@cs.uiuc.edu

> On Jan 29, 2015, at 7:35 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
>
> On Thu, Jan 29, 2015 at 06:51:51PM -0800, Jim Grosbach wrote:
>> Hi Peter,
>>
>> Please forgive if this is an obvious question, but how reasonable is it for this approach to work when not all translation units are available to be rebuilt with the new information?
>>
>> I'm very interested in what you're proposing here.
>
> This technique works best if we can lay out the vtables in a specific order,
> but this is not necessarily a requirement. I can imagine an extension of the
> design where if the pointer is out of range we can fall back to a slow path
> that compares the vtable pointer against each element of a list of valid
> addresses that belong to external code.
>
> The hard requirements (I believe) are that we have headers for all external
> classes that may interact with our code, and that the external objects export
> the needed vtable symbols. With those, we should be able to use Clang to
> build the list of valid addresses. I believe this should work with some
> implementations of the standard library, for example.

Oof. That’s a deal breaker for any of the uses I was hoping for. Nuts. :(

Peter Collingbourne

unread,
Jan 30, 2015, 2:47:47 PM1/30/15
to Chris Lattner, LLVM Dev
Hi Chris,

I wanted to start by giving an explanation of what I am trying to achieve
and how I am trying to achieve it.

I am working towards introducing into LLVM a security mechanism, Forward
Control Flow Integrity (CFI), that is designed to mitigate against
vulnerabilities that allow attacks based on corrupting vtable or function
pointers in memory in order to subvert a program's control flow.

More details are in a paper that I co-authored that was presented at USENIX
Security last year [1]. As mentioned in the paper, attackers are increasingly
relying on such techniques to subvert control flow. This is why I feel that it
is particularly important that compilers contain practical defenses against
such attacks.

One particular variant of the defense I am proposing, vtable verification,
was implemented in GCC and described in section 3 of the paper, however it
comes with a significant performance overhead, more than 8% on certain Chrome
browser-based benchmarks and up to around 20% on SPEC 2006 benchmarks (see
Figure 2). This is likely due to the fact that it searches lists of vtables
to determine if a given vtable is valid. This is a direct consequence of
its avoidance of techniques that depend on changing how the program is linked.

The implementation I am proposing to contribute to LLVM focuses on
performance. Building the needed data structures at link time allows us
to reduce the cost of checking the validity of a vtable pointer to a few
machine instructions and a single load from memory.

On Thu, Jan 29, 2015 at 11:04:41PM -0800, Chris Lattner wrote:
> I don’t think that adding an IR construct for this is the way to go. You’re making IR complicated for everyone to serve a very narrow use case (one that admittedly I don’t really understand).

> Also, your patch is incomplete. Presumably you have to scatter checks for “if (!GV->isBitSet())” throughout the optimizer, codegen and other things that touch globals.

I'll address these points simultaneously because I would like to explain
why extensive support for the new construct is not required, and why the
maintenance burden is not as large as it might seem (and why my patch does
not need to make such extensive changes to the optimizers).

The first point I'd like to make is that from the point of view of optimizer
passes other than bitset lowering, the llvm.bitset.test intrinsic has opaque
semantics with regard to the content of the globals it references, so they
cannot legally modify the contents of the bitset global.

The second is that any use of a bitset global other than as an argument to
the llvm.bitset.test intrinsic has undefined semantics. (This is something
that can be documented better.) This means that any optimizer pass that
looks through global initializers does not require any changes, as any
transformation it may perform on IR that treats such globals as regular
globals (for example by taking its address or loading from it) is semantics
preserving, as the semantics of such IR would have been undefined anyway.

With these points in mind, I'm reasonably confident that very little code
needs to care about the new flag.

(I should also point out that I know that the patch most likely works without
any other optimizer changes, because I have a work-in-progress patch that
implements the Clang side of this, and have successfully applied it to a
large C++ codebase and found real bugs in that codebase.)

Regarding codegen, I haven't implemented support in codegen for bitsets yet,
the intrinsic is completely handled by the pass. I can't imagine the changes
being very intrusive though. We can easily add a check that no bitset stuff
makes it through to codegen for the moment.

> The fact that this affects semantics and will only work with LTO and not native linkers is also really weird to me.

I agree, which is why I plan to add support to lld and perhaps other linkers,
but we do have to start somewhere. Adding the functionality to the compiler
only seems like a reasonable first step, even if we depend on LTO to begin
with.

> Is there other precedent for that? The only cases I know that affect LTO add information that is safe to drop (e.g. TBAA etc).

There is the jumptable attribute, which has been used to implement a variant
of CFI for indirect function calls (see section 4 of the USENIX paper), and
that only works effectively with an LTO'd module. (We might end up adding
native linker support for that or something similar as well.)

Thanks,
--
Peter

[1] http://www.pcc.me.uk/~peter/acad/usenix14.pdf

JF Bastien

unread,
Jan 31, 2015, 2:37:53 PM1/31/15
to Peter Collingbourne, LLVM Dev
Trying to summarize all opinions expressed here: Peter is proposing an initial implementation that would only work with LTO. Folks seem put off by this implementation affecting IR without having proven itself, and having shortcomings (as Jim pointed out). Kostya proposed going through metadata (and Chris kind of did too by mentioning tbaa), but Peter points out that this will make the implementation trickier.

It sounds like going through metadata for the LTO-only proof-of-concept would be preferable, even if more complex. Peter, how more complex would that be?

As an interested user of this I'm happy with LTO, and I can help with the reviews of the code.

Peter Collingbourne

unread,
Jan 31, 2015, 5:10:34 PM1/31/15
to JF Bastien, LLVM Dev
On Sat, Jan 31, 2015 at 11:35:01AM -0800, JF Bastien wrote:
> Trying to summarize all opinions expressed here: Peter is proposing an
> initial implementation that would only work with LTO. Folks seem put off by
> this implementation affecting IR without having proven itself, and having
> shortcomings (as Jim pointed out). Kostya proposed going through metadata
> (and Chris kind of did too by mentioning tbaa), but Peter points out that
> this will make the implementation trickier.
>
> It sounds like going through metadata for the LTO-only proof-of-concept
> would be preferable, even if more complex. Peter, how more complex would
> that be?

I was just thinking about how the metadata-based design for this might work.

We could have a named metadata global containing the list of bitset entries.
Each entry would be a pair consisting of an identifier for the bitset (in
this case, the mangled name of the class) and a pointer into a global
(in this case, a valid vtable pointer).

For example, this class hierarchy:

class A { virtual void f(); };
class B : A { virtual void f(); };
class C : A { virtual void f(); };

would have these bitsets:

!llvm.bitsets = !{!0, !1, !2, !3, !4}

!0 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1A, i32 0, i32 2)}
!1 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 2)}
!2 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 2)}
!3 = !{!"1B", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 2)}
!4 = !{!"1C", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 2)}

The LLVM linker can already merge the contents of named metadata globals.

The intrinsic for reading from bitsets would have this definition:

declare i1 @llvm.bitset.test(i8*, metadata)

and would be called like this:

%x = call i1 @llvm.bitset.test(i8* %p, metadata !{!"1A"})

The disadvantage of this design is that metadata strings always have global
scope, so identically named classes in anonymous namespaces in different
translation units would receive the same bitset. (We can't just use the
vtable globals as bitset identifiers, because they may be erased by globaldce.
There are probably various things we could do to force the globals to stick
around, if we wanted to go that way.)

Does that seem more reasonable to people?

Thanks,
--
Peter

JF Bastien

unread,
Jan 31, 2015, 5:19:14 PM1/31/15
to Peter Collingbourne, LLVM Dev

That's pretty unfortunate. Can you mangle anonymous namespace class names further for your purpose? With the module name?


Does that seem more reasonable to people?

Seems reasonable as a first step. It doesn't seem like anyone is opposed to trying it out, and adding to IR if the metadata approach shows this to be a desirable feature.

The one think we need to ensure is that your metadata can be dropped by the optimizer and the code remains correct. I'm guessing no vtable would mean anything goes (not check)? That's bad security-wise, but it'll at least work. We may want to make sure nothing gets dropped through a debug flag, so that we can compile Chrome and be confident that all the checks we want are there.

Peter Collingbourne

unread,
Jan 31, 2015, 5:48:49 PM1/31/15
to JF Bastien, LLVM Dev
Yes, I forgot that we could do that. Mangling with the source file name
would probably be enough for most purposes.

> Does that seem more reasonable to people?
> >
>
> Seems reasonable as a first step. It doesn't seem like anyone is opposed to
> trying it out, and adding to IR if the metadata approach shows this to be a
> desirable feature.
>
> The one think we need to ensure is that your metadata can be dropped by the
> optimizer and the code remains correct. I'm guessing no vtable would mean
> anything goes (not check)? That's bad security-wise, but it'll at least
> work. We may want to make sure nothing gets dropped through a debug flag,
> so that we can compile Chrome and be confident that all the checks we want
> are there.

So the flag would determine whether no bitset present means return 1 or
return 0? Sounds reasonable.

JF Bastien

unread,
Jan 31, 2015, 5:55:57 PM1/31/15
to Peter Collingbourne, LLVM Dev
> The one think we need to ensure is that your metadata can be dropped by the
> optimizer and the code remains correct. I'm guessing no vtable would mean
> anything goes (not check)? That's bad security-wise, but it'll at least
> work. We may want to make sure nothing gets dropped through a debug flag,
> so that we can compile Chrome and be confident that all the checks we want
> are there.

So the flag would determine whether no bitset present means return 1 or
return 0? Sounds reasonable.

I'd keep it generating the same code, but have a debug output when the debug flag is on. My reasoning is that this wouldn't happen if your feature were a first-class IR construct, but because it's in the let's-try-it-out metadata phase it needs to "just work" when metadata is dropped. Yours being a security-oriented feature we'd want to be able to sanity-check that things compiled as expected for users who care (like Chrome). Once the feature graduates to IR instead of metadata it would be expected to always be correct security-wise (never silently drop information).

Or maybe I'm being overly cautious when it comes to metadata's ability to be dropped?

Peter Collingbourne

unread,
Jan 31, 2015, 7:46:41 PM1/31/15
to JF Bastien, LLVM Dev
Okay, makes sense.

> Or maybe I'm being overly cautious when it comes to metadata's ability to
> be dropped?

As far as I am aware, only instruction-level metadata can be dropped entirely.
Module-level metadata stays mostly intact, but references to globals in
metadata can be replaced with nulls if the global is dropped.

The current behavior suits our purposes, but I can't find anywhere where any of
this behavior is specified, and it's entirely possible that it may change. The
flag that you propose would be a useful guard against that possibility.

Pete Cooper

unread,
Jan 31, 2015, 9:00:29 PM1/31/15
to Peter Collingbourne, LLVM Dev


Sent from my iPhone
I think too much would break if global named metadata just start being dropped. So for now at least it's fair to expect it's there.

For the global variables it would be fair to add them to the llvm.compiler.used list if you want to keep them around. Anyone opting in to the bitsets it likely ok with a few extra globals lasting until link time.

Thanks
Pete

Peter Collingbourne

unread,
Feb 3, 2015, 7:31:21 PM2/3/15
to Pete Cooper, LLVM Dev
Okay, sounds good. (FWIW, I don't think it is necessary to keep the vtable
globals around. In fact, all the better if globaldce can eliminate them,
so we can avoid allocating space for them in the bitset.)

I've updated http://reviews.llvm.org/D7288 with the new version of the
implementation that uses the metadata format I proposed earlier.

Sean Silva

unread,
Feb 3, 2015, 7:37:12 PM2/3/15
to Peter Collingbourne, llv...@cs.uiuc.edu
One other thing: if this can be used for control-flow integrity, I assume it has to have good knowledge of the available indirect call targets. Could we also use this information for devirtualization?

-- Sean Silva

Peter Collingbourne

unread,
Feb 3, 2015, 7:56:32 PM2/3/15
to Sean Silva, llv...@cs.uiuc.edu
On Tue, Feb 03, 2015 at 04:03:45PM -0800, Sean Silva wrote:
> One other thing: if this can be used for control-flow integrity, I assume
> it has to have good knowledge of the available indirect call targets. Could
> we also use this information for devirtualization?

I would expect so. If a bitset contains only one element, we should be able
to teach the lowering pass to simply test that the given pointer is equal
to that element (i.e. the only valid vptr). If the later IR branches to a
trapping block if that condition is false, it should be possible for the
optimizer to deduce that the condition is true in any code that is dominated
by the branch, and from that do devirtualization.

Sean Silva

unread,
Feb 3, 2015, 8:14:48 PM2/3/15
to Peter Collingbourne, llv...@cs.uiuc.edu
On Tue, Feb 3, 2015 at 4:13 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
On Tue, Feb 03, 2015 at 04:03:45PM -0800, Sean Silva wrote:
> One other thing: if this can be used for control-flow integrity, I assume
> it has to have good knowledge of the available indirect call targets. Could
> we also use this information for devirtualization?

I would expect so. If a bitset contains only one element, we should be able
to teach the lowering pass to simply test that the given pointer is equal
to that element (i.e. the only valid vptr). If the later IR branches to a
trapping block if that condition is false, it should be possible for the
optimizer to deduce that the condition is true in any code that is dominated
by the branch, and from that do devirtualization.


Well, I was wondering about for devirtualization without necessarily needing any of the CFI stuff. We should probably aim for a design that provides a primitive that can satisfy the needs of both CFI and devirtualization.

-- Sean Silva
 

Thanks,
--
Peter

Peter Collingbourne

unread,
Feb 3, 2015, 8:35:45 PM2/3/15
to Sean Silva, llv...@cs.uiuc.edu
A client that doesn't need CFI could emit IR that looks like this:

%pred = call i1 @llvm.bitset.test(i8* %vptr, metadata !"some_bitset")
call void @llvm.assume(i1 %pred)
; make the call in the normal way

and from that we should be able to lower to the same type of IR for
single-element bitsets and make the same sort of deduction. (The lowering
pass could avoid emitting bitsets unless the call is used by something other
than a call to llvm.assume.)
Reply all
Reply to author
Forward
0 new messages