[LLVMdev] Walking thru CallGraph bottom up

438 views
Skip to first unread message

Simone Atzeni

unread,
Feb 24, 2015, 2:39:50 PM2/24/15
to llv...@cs.uiuc.edu
Hi all,

I would like to create a Pass that given an IR instruction walks starting from that instruction up to the main function
to identify all the functions call that have been made to call that instruction.

Is it possible? What kind of Pass should I create?

Thanks
Best,
Simone

Simone Atzeni
simo...@gmail.com
+1 (801) 696-8373


_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

John Criswell

unread,
Feb 24, 2015, 3:33:43 PM2/24/15
to Simone Atzeni, llv...@cs.uiuc.edu
On 2/24/15 2:27 PM, Simone Atzeni wrote:
> Hi all,
>
> I would like to create a Pass that given an IR instruction walks starting from that instruction up to the main function
> to identify all the functions call that have been made to call that instruction.
>
> Is it possible? What kind of Pass should I create?

Yes, it is possible. I think a ModulePass would be most appropriate,
though a FunctionPass may be alright.

To get the call graph, you can use LLVM's CallGraph analysis. If you
need to handle function pointers more accurately than LLVM's internal
CallGraph analysis does, you can use DSA's CallGraph analysis (which has
the same interface but may only work with LLVM 3.2 and earlier LLVM
releases).

-- John T.

>
> Thanks
> Best,
> Simone
>
> Simone Atzeni
> simo...@gmail.com
> +1 (801) 696-8373
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

Simone Atzeni

unread,
Feb 25, 2015, 10:54:33 AM2/25/15
to John Criswell, llv...@cs.uiuc.edu
Thanks John.

I guess I will use a ModulePass, so when I am implementing the “runOnModule” function,
do I have to loop through all the functions, for each functions all the BasicBlocks and for each BasicBlock all the instructions
or given the Module I have to call the CallGraph directly?
Is there an example out there? I can’t find anything.

Thanks.
Simone

John Criswell

unread,
Feb 25, 2015, 11:04:26 AM2/25/15
to Simone Atzeni, llv...@cs.uiuc.edu
On 2/25/15 10:51 AM, Simone Atzeni wrote:
> Thanks John.
>
> I guess I will use a ModulePass, so when I am implementing the “runOnModule” function,
> do I have to loop through all the functions, for each functions all the BasicBlocks and for each BasicBlock all the instructions

If you know the Instruction, you can get it's basic block using
Instruction::getParent(), and then get its enclosing function using
BasicBlock::getParent().

Once you know the enclosing function, you can use the CallGraph pass to
find which functions call it, and then repeat the procedures for
functions calling that function, etc.

> or given the Module I have to call the CallGraph directly?
> Is there an example out there? I can’t find anything.

It uses the DSA CallGraph pass, but
http://llvm.org/viewvc/llvm-project/safecode/branches/release_32/lib/InsertPoolChecks/CFIChecks.cpp?revision=189030&view=markup
might provide a decent example.

Regards,

John Criswell

Simone Atzeni

unread,
Feb 26, 2015, 5:55:11 PM2/26/15
to John Criswell, llv...@cs.uiuc.edu
I think I got it and the example is pretty, however for what I understand, I can get the CallGraphNode for a given function F that has a list that represents all the functions that F is calling, but how can I get the list of the functions that are calling F? There is not a sort a similar list?

Thanks.
Simone

Kevin Hu

unread,
Feb 26, 2015, 10:17:31 PM2/26/15
to simo...@gmail.com, llv...@cs.uiuc.edu
Hi Simon,
 
From: Simone Atzeni <simo...@gmail.com>
To: John Criswell <jtcr...@gmail.com>
Cc: llv...@cs.uiuc.edu
Subject: Re: [LLVMdev] Walking thru CallGraph bottom up
Message-ID: <318EBA41-2040-4EFE...@gmail.com>
Content-Type: text/plain; charset="windows-1252"

I think I got it and the example is pretty, however for what I understand, I can get the CallGraphNode for a given function F that has a list that represents all the functions that F is calling, but how can I get the list of the functions that are calling F? There is not a sort a similar list?

After doing a simple search inside the LLVM documents, I found no so such data structures or methods for finding the callers of a CallGraphNode.

However, since your problem is to find the path from main to the target function in the CallGraph, I think a depth-first-search from the main node might work. Following is the algorithm I have in mind.

1. You can keep a list of functions in your search path, pushing to the list each time you visit a new node.
2. If you find the one you currently have is a wrong path (you reach a leaf or a visited node), you pop out all the functions in the wrong path from your list.
3. Search in another path, until you hit the node you're looking for. 
4. And then you can have a call path from main to your function in the list.

Note that you may need a DFS on the entire CallGraph if you want all possible call paths from main to your function.

IMHO, by getting a list of callers of a target function doesn't actually help simplify the problem a lot. You still need a  DFS or BFS from the node to find the main function node.

Hope this helps a little.

Apologies if you find any format, layout or etiquette problems in my mail. This is the first time I write to a mailing list.

Thanks.

---
Regards,
Kevin Hu
 
Thanks.
Simone

> On Feb 25, 2015, at 09:01, John Criswell <jtcr...@gmail.com> wrote:
>
> On 2/25/15 10:51 AM, Simone Atzeni wrote:
>> Thanks John.
>>
>> I guess I will use a ModulePass, so when I am implementing the ?runOnModule? function,

>> do I have to loop through all the functions, for each functions all the BasicBlocks and for each BasicBlock all the instructions
>
> If you know the Instruction, you can get it's basic block using Instruction::getParent(), and then get its enclosing function using BasicBlock::getParent().
>
> Once you know the enclosing function, you can use the CallGraph pass to find which functions call it, and then repeat the procedures for functions calling that function, etc.
>
>> or given the Module I have to call the CallGraph directly?
> http://www.cs.rochester.edu/u/criswell <http://www.cs.rochester.edu/u/criswell>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20150226/51760124/attachment-0001.html>
d of LLVMdev Digest, Vol 128, Issue 111
*****************************************



Yours,
Kevin Hu

John Criswell

unread,
Feb 27, 2015, 12:03:39 AM2/27/15
to Kevin Hu, simo...@gmail.com, llv...@cs.uiuc.edu
Dear Simon,

Kevin is correct; as far as I can tell, there is no method of getting the functions calling a given function.  Instead, you have to start at the main() function and search for the function using a depth-first or breadth-first search.

What may make sense is to build a new data structure that has nodes that point from callees to callers once and then use that for your queries.  That way, you're not researching the CallGraph all the time.

One important note is that the CallGraph has an "external node" which represents all unresolved calls (e.g., indirect function calls).  Technically, you need to start searching from this node as well as the main() node (as a program can call out to external code which then calls back into the program; think of qsort() as an example).

Regards,

John Criswell

Simone Atzeni

unread,
Feb 27, 2015, 12:55:17 AM2/27/15
to John Criswell, Kevin Hu, llv...@cs.uiuc.edu
Hi guys,

thank you so much for your great answers.
I kept looking for such a data structure but there is no such a thing as you confirmed.
But I think I have all the info I need thanks to your suggestions.
If I get something working I will share it!

Thanks again!
Cheers,
Simone

Simone Atzeni

unread,
Mar 2, 2015, 2:15:57 AM3/2/15
to Herbie Robinson, Kevin Hu, llv...@cs.uiuc.edu
Hi Herbie,

thanks for you answer and explanation.

>
> Also, if any of the functions are external, you are completely stuck (unless you put everything together with lld).


I am indeed having a problem regarding external function.
I my program is just one file everything work and I can access all the functions.
However, if it has multiple files I have a lot of unresolved pointers to external functions.
I am creating separately all the byte code files for each file of my program.
Then I link all of them together with llvm-link, I thought it was enough but actually I can not access the functions that are in other files even if I am link all of them together.

Do you have any idea/suggestion how to solve this problem?

Thanks!
Best,
Simone


> On Feb 28, 2015, at 14:30, Herbie Robinson <HerbieR...@verizon.net> wrote:
>
> On 2/24/15 2:27 PM, Simone Atzeni wrote:
>> Hi all,
>>
>> I would like to create a Pass that given an IR instruction walks starting from that instruction up to the main function
>> to identify all the functions call that have been made to call that instruction.
>>
>> Is it possible? What kind of Pass should I create?
>>
> Technically, that's not a pass, because a pass is an algorithm for going through all the Instructions.
>
> What you want to do is only possible to do for internal functions if there are no function variables in play.
>
> The Instruction class has a method to get to the basic block which has a method to get to the Funciton. In both cases, the method is called getParent(). Once you get the Function, you can find all the internal uses using the use chain from the Function.
>
> Start looking at the user class here:
>
> http://llvm.org/doxygen/classllvm_1_1User.html
>
> When you go through the user chain, you will find all sorts of instructions and possibly other things. If you find anything other than a call Instruction, you have hit upon a case that's more difficult to handle (related to using function pointer). If you run whatever analysis you are trying to do after all the optimization passes, value propagation may have covered some of the function pointer cases; so, you want to recurse through PHI nodes, too.
>
> Also, if any of the functions are external, you are completely stuck (unless you put everything together with lld).

John Criswell

unread,
Mar 2, 2015, 9:39:03 AM3/2/15
to Simone Atzeni, Herbie Robinson, Kevin Hu, llv...@cs.uiuc.edu
On 3/2/15 2:12 AM, Simone Atzeni wrote:
> Hi Herbie,
>
> thanks for you answer and explanation.
>
>> Also, if any of the functions are external, you are completely stuck (unless you put everything together with lld).
>
> I am indeed having a problem regarding external function.
> I my program is just one file everything work and I can access all the functions.
> However, if it has multiple files I have a lot of unresolved pointers to external functions.
> I am creating separately all the byte code files for each file of my program.
> Then I link all of them together with llvm-link, I thought it was enough but actually I can not access the functions that are in other files even if I am link all of them together.
>
> Do you have any idea/suggestion how to solve this problem?

Obviously, if you want to analyze the call graph of an entire program,
you need to collect the information from across compilation units. In
LLVM, the best way to do that is to link the bitcode files together into
a single bitcode file. This can either by done using llvm-link (which
requires changing the Makefiles of most programs) or by adding your
passes into libLTO (which is more work but allows you to do
whole-program analysis with little/no modification to program Makefiles).

Of course, if you have external library code (e.g., libc) that is not in
LLVM bitcode format, you can't analyze it using an LLVM pass. One
option is to know what these functions do (i.e., treat them as a special
case). Another option would be to try to convert them into LLVM bitcode
format using Revgen or BAP, but I'm not sure how well that works.

Most analyses understand the semantics of the C library function so that
they can treat them specially and then assume that any function that has
its address taken or has externally visible linkage can be called by
external library code. The "external node" in the CallGraph analysis
should state that it can call any function with externally visible linkage.

Regards,

John Criswell

>
> Thanks!
> Best,
> Simone
>
>
>> On Feb 28, 2015, at 14:30, Herbie Robinson <HerbieR...@verizon.net> wrote:
>>
>> On 2/24/15 2:27 PM, Simone Atzeni wrote:
>>> Hi all,
>>>
>>> I would like to create a Pass that given an IR instruction walks starting from that instruction up to the main function
>>> to identify all the functions call that have been made to call that instruction.
>>>
>>> Is it possible? What kind of Pass should I create?
>>>
>> Technically, that's not a pass, because a pass is an algorithm for going through all the Instructions.
>>
>> What you want to do is only possible to do for internal functions if there are no function variables in play.
>>
>> The Instruction class has a method to get to the basic block which has a method to get to the Funciton. In both cases, the method is called getParent(). Once you get the Function, you can find all the internal uses using the use chain from the Function.
>>
>> Start looking at the user class here:
>>
>> http://llvm.org/doxygen/classllvm_1_1User.html
>>
>> When you go through the user chain, you will find all sorts of instructions and possibly other things. If you find anything other than a call Instruction, you have hit upon a case that's more difficult to handle (related to using function pointer). If you run whatever analysis you are trying to do after all the optimization passes, value propagation may have covered some of the function pointer cases; so, you want to recurse through PHI nodes, too.
>>
>> Also, if any of the functions are external, you are completely stuck (unless you put everything together with lld).


--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

Simone Atzeni

unread,
Mar 2, 2015, 11:01:04 AM3/2/15
to John Criswell, llv...@cs.uiuc.edu, Kevin Hu, Herbie Robinson

Obviously, if you want to analyze the call graph of an entire program, you need to collect the information from across compilation units.  In LLVM, the best way to do that is to link the bitcode files together into a single bitcode file.  This can either by done using llvm-link (which requires changing the Makefiles of most programs) or by adding your passes into libLTO (which is more work but allows you to do whole-program analysis with little/no modification to program Makefiles).

Hi John,

this is actually what I am doing.
I have a simple program that has a main that call 2 functions.
Those 2 functions are in two different .c files.
So I create the BC file with clang for the 3 files.
I link them together with llvm-link and then I disassemble it.
However when I run my pass it still can’t see the external calls.

Why is that?

Thanks.
Simone
Reply all
Reply to author
Forward
0 new messages