Oh, I just came across Function::hasAddressTaken. Maybe I can just use that instead?
_______________________________________________ LLVM Developers mailing list llvm...@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- John Criswell Assistant Professor Department of Computer Science, University of Rochester http://www.cs.rochester.edu/u/criswell
You could conservatively assume that any function that has its address taken has a pointer to it that escapes into memory or external code.
To make things a little more accurate, you could scan the uses of any function for which hasAddressTaken() returns true and see if any of its uses escapes its function or escapes into memory or external code. I believe hasAddressTaken() returns true if the function is subjected to a cast instruction, and functions are often casted if they are used in a call that uses a different signature than the function's declared signature.
To get anything more accurate, you'll need to use alias analysis or points-to analysis. DSA tracks function pointers in the heap and can tell you whether the function is called from external code. However, DSA's accuracy currently suffers if it is run after LLVM's optimizations, and the code needs some serious TLC.
> I need to track down all pointers anywhere in a module that could be pointing to functions (because some of the optimizations I want to do, require either identifying every use of a function, or conservatively identifying when such cannot be done).
>
> A starting point is to look at all the global variables:
>
> for (auto &G : M.globals())
> for (auto &V : G.operands())
> if (auto F = dyn_cast<Function>(V))
>
> Of course, instructions can also refer to functions, both as direct calls and otherwise:
>
> for (auto &F : M) {
> for (auto &I : inst_range(F)) {
> for (auto &V : I.operands())
> if (auto F = dyn_cast<Function>(V))
Your code is reminiscent of the traversal logic in tools/verify-uselistorder.
There's a chance you might be able to reuse the ValueMapping class from that project.
> But there are other things as well, for example it seems there is something called a personality function that can be a pointer to another function, so need to add that
>
> if (F.hasPersonalityFn())
>
> It seems there are other things called prefix data and prologue data, which are pointers to constants, which could contain pointers to functions, so would need to include those as well.
>
> Am I correct in thinking that prefix data and prologue data will not be included in the global variables list, so do need special handling?
Prefix data, prologue data, and function personalities do not need to be globals.
This will require special handling. The ValueMapping class has good skeleton code for this.
> Could they be recursive? That is, could those constants contain pointers to other constants... which end up containing pointers to functions... such that none of the intermediate constant objects are in the global variable list?
I don't see how this is possible. If it is, we'd need to file a PR against verify-uselistorder.
best
vedant
DSA was built when LLVM's optimizations maintained the type information on GEP and other instructions (DSA existed before LLVM was open-source). As such, it uses LLVM's type information to aid in its type-inference which, in turn, gives it field sensitivity which, in turn, improves its accuracy. Over time, LLVM optimizations have come to modify the type information so that it is just simple byte-level indexing (as opposed to array-of-structure indexing). DSA hasn't been updated to handle that well. That is why its precision is better pre-optimization than post-optimization.
Just out of curiosity, what are you trying to do? I need call graph analysis for C/C++ code with function pointers, and so I'm writing an NSF proposal to seek funding to do that (among other enhancements to my SVA infrastructure). If it's something that would be useful to you (or other LLVM community members), it would be useful for me to know that.
On Wed, Dec 23, 2015 at 5:35 PM, John Criswell <jtcr...@gmail.com> wrote:
DSA was built when LLVM's optimizations maintained the type information on GEP and other instructions (DSA existed before LLVM was open-source). As such, it uses LLVM's type information to aid in its type-inference which, in turn, gives it field sensitivity which, in turn, improves its accuracy. Over time, LLVM optimizations have come to modify the type information so that it is just simple byte-level indexing (as opposed to array-of-structure indexing). DSA hasn't been updated to handle that well. That is why its precision is better pre-optimization than post-optimization.
Ah! I don't suppose you could point to some examples of this? E.g. a simple test program such that one could eyeball the intermediate code before and after optimization?
Just out of curiosity, what are you trying to do? I need call graph analysis for C/C++ code with function pointers, and so I'm writing an NSF proposal to seek funding to do that (among other enhancements to my SVA infrastructure). If it's something that would be useful to you (or other LLVM community members), it would be useful for me to know that.
SVA?
I'm trying to write a superoptimizer that can optimize code based on a high-level understanding of what it's actually doing, so yes, call graph analysis that can deal with function pointers does seem likely to be one of the things that will be needed.
On 12/22/15 4:45 AM, Russell Wallace via llvm-dev wrote:
Oh, I just came across Function::hasAddressTaken. Maybe I can just use that instead?
You could conservatively assume that any function that has its address taken has a pointer to it that escapes into memory or external code.
On 12/22/15 4:45 AM, Russell Wallace via llvm-dev wrote:
Oh, I just came across Function::hasAddressTaken. Maybe I can just use that instead?
You could conservatively assume that any function that has its address taken has a pointer to it that escapes into memory or external code.
I wonder if a conservative estimate of pointed-to functions would also need to include all functions with externally visible linkage?
I could imagine a few routes by which that allows such a function to have its address show up in the data being handled by the Module's own code.