[llvm-dev] Implementing Data Flow Integrity

Fee via llvm-dev

Jul 31, 2016, 1:22:30 PM
to llvm...@lists.llvm.org
Dear all,

I want to implement a pass that provides some kind of data flow
integrity similar to Write Integrity Testing
(https://www.doc.ic.ac.uk/~cristic/papers/wit-sp-ieee-08.pdf).

This approach statically determines for each memory write the
(conservative, overapproximated) points-to set of locations that can be
written by the instruction. Further, it instruments the memory write
instruction to prevent a write to a location not in the points-to set.
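
A minimal sketch of the kind of runtime check described above, assuming a simplified model in which the static analysis has already assigned each object and each write instruction a small integer "color". The names `ColorTable` and `checkedWrite` are hypothetical, and the hash map merely stands in for whatever table layout a real implementation would use; none of this is taken from the WIT implementation or from LLVM.

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical model: the static analysis gives every memory object a color,
// and every write instruction the color of the objects it may modify.
using Color = std::uint8_t;

class ColorTable {
public:
    // Record the color of an object, e.g. when it is allocated.
    void setColor(std::uintptr_t addr, Color c) { colors_[addr] = c; }

    // Look up the color of an address; 0 means "no writable object here".
    Color colorOf(std::uintptr_t addr) const {
        auto it = colors_.find(addr);
        if (it == colors_.end())
            return 0;
        return it->second;
    }

private:
    std::unordered_map<std::uintptr_t, Color> colors_;
};

// The check that instrumentation would place in front of a store:
// the write only proceeds if the target carries the expected color.
inline void checkedWrite(const ColorTable &table, int *target, int value,
                         Color expected) {
    if (table.colorOf(reinterpret_cast<std::uintptr_t>(target)) != expected)
        throw std::runtime_error("write outside the static points-to set");
    *target = value;
}
```

In this model, a write whose instruction color matches the target's color goes through, and any other write throws before memory is touched. A real pass would emit the comparison inline and abort rather than throw.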

How can I get the points-to set, including locations from
stack/heap/static variables?
How do I approach this problem in general?
I am new to LLVM.

Thank you!

Regards,
– Fredi

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

John Criswell via llvm-dev

Aug 1, 2016, 3:59:13 PM
to Fee, llvm...@lists.llvm.org
On 7/31/16 11:48 AM, Fee via llvm-dev wrote:
> Dear all,
>
> I want to implement a pass that provides some kind of data flow
> integrity similar to Write Integrity Testing
> (https://www.doc.ic.ac.uk/~cristic/papers/wit-sp-ieee-08.pdf).


>
> This approach statically determines for each memory write the
> (conservative, overapproximated) points-to set of locations that can be
> written by the instruction. Further, it instruments the memory write
> instruction to prevent a write to a location not in the points-to set.

Correct. I would also point out that their use of Andersen's analysis
is (most likely) unnecessary. Because they unify points-to sets before
instrumenting, they are modifying the end result of the inclusion-based
analysis to be what a unification-based points-to analysis would have
computed. It is not clear to me that anything can be gained by using
inclusion-based analysis over unification-based analysis.
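
To illustrate the merging step, here is a small sketch under strong simplifying assumptions: only address-of (`p = &obj`) and copy (`dst = src`) constraints are handled, objects are plain integer IDs, and the solver is a naive fixed-point loop. `solveInclusion` and `mergeIntoClasses` are hypothetical names, not part of WIT, DSA, or LLVM.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <string>
#include <utility>
#include <vector>

using PointsTo = std::map<std::string, std::set<int>>;

// Inclusion-based solve for two constraint kinds only:
//   addrOf: p = &obj   =>  obj is in pts(p)
//   copies: dst = src  =>  pts(src) is a subset of pts(dst)
PointsTo solveInclusion(
    const std::vector<std::pair<std::string, int>> &addrOf,
    const std::vector<std::pair<std::string, std::string>> &copies) {
    PointsTo pts;
    for (const auto &c : addrOf)
        pts[c.first].insert(c.second);
    bool changed = true;
    while (changed) {  // naive fixed-point iteration
        changed = false;
        for (const auto &c : copies) {
            const std::set<int> from = pts[c.second];  // copy, in case dst == src
            std::set<int> &to = pts[c.first];
            const std::size_t before = to.size();
            to.insert(from.begin(), from.end());
            changed |= (to.size() != before);
        }
    }
    return pts;
}

// Merge objects that occur together in any points-to set, so that every
// object ends up in exactly one class. Returns the class representative
// of each object ID.
std::vector<int> mergeIntoClasses(const PointsTo &pts, int numObjects) {
    std::vector<int> parent(numObjects);
    std::iota(parent.begin(), parent.end(), 0);
    auto find = [&parent](int x) {
        while (parent[x] != x)
            x = parent[x] = parent[parent[x]];  // path halving
        return x;
    };
    for (const auto &entry : pts) {
        const std::set<int> &objs = entry.second;
        if (objs.empty())
            continue;
        const int root = find(*objs.begin());
        for (int obj : objs)
            parent[find(obj)] = root;
    }
    std::vector<int> classes(numObjects);
    for (int i = 0; i < numObjects; ++i)
        classes[i] = find(i);
    return classes;
}
```

For the constraints `p = &a; q = &b; r = p; r = q`, the inclusion-based solve keeps `pts(p) = {a}` distinct from `pts(r) = {a, b}`, but the merge puts `a` and `b` into one class. A check keyed on classes therefore treats writes through `p` and `r` alike, which is the loss of precision described above.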

>
> How can I get the points-to set, including locations from
> stack/heap/static variables?
> How do I approach this problem in general?

To the best of my knowledge, the existing LLVM alias analysis passes
only provide a mod/ref and aliasing query interface. I don't believe
they provide a shape graph or points-to sets that can be easily used.
You might want to check CFL-AA to see what it provides, but I have
personally never used it.

You could use DSA, located in the poolalloc project, which provides a
shape graph. The original SAFECode essentially did what WIT does
(except that it also protected memory reads and used a very different
run-time check mechanism, plus it could optimize away provably type-safe
checks). SAFECode used DSA's shape graphs to segregate the heap, find
points-to sets, and learn memory object type information. However, in
its current shape, you'd need to run DSA prior to most LLVM
optimizations to get good field sensitivity. Otherwise, DSA will lose
field sensitivity and provide poor precision in its results.

As I need something similar for my research work, my research group will
be working on either improving or replacing DSA. However, it'll be
a while, so if you need something now, either CFL-AA or DSA will be your
best bet.

> I am new to LLVM.

Welcome to the club.

Regards,

John Criswell

>
> Thank you!
>
> Regards,
> – Fredi


--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

Fee via llvm-dev

Aug 2, 2016, 9:03:06 PM
to John Criswell, llvm...@lists.llvm.org
Hi John Criswell,

Thank you for your helpful answer.

I think CFL Alias Analysis is not the right way to go because it seems
to avoid building whole points-to sets to be more efficient (correct me
if I am wrong). So, it would trade compile-time performance for
runtime-security, which is bad.

DSA from poolalloc seems promising, I need to check it.

Do you think that an implementation of Andersen's pointer analysis
(like https://github.com/grievejia/andersen) would work?
It seems to be not field-sensitive. Do you have any clue how hard it is
to make an existing analysis field-sensitive?

Regards
—Fredi

Daniel Berlin via llvm-dev

Aug 2, 2016, 9:12:16 PM
to Fee, llvm-dev
On Tue, Aug 2, 2016 at 5:57 PM, Fee via llvm-dev <llvm...@lists.llvm.org> wrote:
> Hi John Criswell,
>
> Thank you for your helpful answer.
>
> I think CFL Alias Analysis is not the right way to go because it seems
> to avoid building whole points-to sets to be more efficient (correct me
> if I am wrong).

It does not build points-to sets, but it does answer all the aliasing queries you can normally ask of LLVM.
It will answer all of those queries as well as any other Andersen's implementation will.

Nothing in LLVM asks or answers "what is the set of things this pointer points-to". 
 

> So, it would trade compile-time performance for
> runtime-security, which is bad.

This is not correct, in the sense that CFL-Anders will give precisely the same answers about aliasing that any other Andersen's implementation will give.
 

> DSA from poolalloc seems promising, I need to check it.
>
> Do you think that an implementation of Andersen's pointer analysis
> (like https://github.com/grievejia/andersen) would work?

That one ignores the effect of certain instructions.

> It seems to be not field-sensitive. Do you have any clue how hard it is
> to make an existing analysis field-sensitive?

For LLVM, it requires changes to the kinds of constraints one handles, to constraint building, to constraint optimization, and to constraint solving.

It is non-trivial unless you are very familiar with Andersen's analysis and its various implementation techniques.
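
As a rough illustration of what field sensitivity buys, the sketch below models only the memory abstraction, not a constraint solver. It assumes abstract locations are (object, field index) pairs and that a field-insensitive analysis collapses every field of an object onto one location. `FieldModel` and its methods are hypothetical names and do not correspond to any LLVM or Andersen-implementation API.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// An abstract memory location: an object name plus a field index.
using Location = std::pair<std::string, int>;

class FieldModel {
public:
    explicit FieldModel(bool fieldSensitive) : fieldSensitive_(fieldSensitive) {}

    // Models `obj.field = &target`.
    void storeAddress(const std::string &obj, int field,
                      const std::string &target) {
        contents_[key(obj, field)].insert(target);
    }

    // Models `p = obj.field` and returns what p may point to.
    std::set<std::string> load(const std::string &obj, int field) const {
        auto it = contents_.find(key(obj, field));
        if (it == contents_.end())
            return std::set<std::string>();
        return it->second;
    }

private:
    // A field-insensitive model collapses every field onto index 0.
    Location key(const std::string &obj, int field) const {
        return Location(obj, fieldSensitive_ ? field : 0);
    }

    bool fieldSensitive_;
    std::map<Location, std::set<std::string>> contents_;
};
```

After `s.f0 = &a; s.f1 = &b`, loading field 0 from the field-sensitive model yields only `a`, while the field-insensitive model yields both `a` and `b`. In a real solver the same distinction has to be carried through constraint generation, optimization, and solving, which is where the work described above comes from.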