[llvm-dev] Existing studies on the benefits of pointer analysis


Jia Chen via llvm-dev

Mar 15, 2016, 4:38:52 PM
to llvm...@lists.llvm.org
Dear llvm devs,

tl;dr: What prevents llvm from switching to a fancier pointer analysis?

Currently, there exists a variety of general-purpose alias analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and cfl-aa. However, only the first three are actually turned on when invoking clang with -O2 or -O3 (please correct me if I'm wrong about this).

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis. Some are field-sensitive, some are field-based, some are flow-sensitive, some are context-sensitive. Even the flow-insensitive ones may be inclusion-style (-andersen-aa) or equality-style (-steens-aa and -ds-aa). These algorithms are often backed by rich theoretical frameworks as well as preliminary evaluations that demonstrate their superior precision and/or performance.
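
To make the inclusion-vs-equality distinction concrete, here is a minimal C sketch (my own illustration, not taken from any particular paper) of the classic case where an inclusion-style analysis stays precise and an equality-style one does not:

    /* Classic precision gap between equality-based (steens) and
       inclusion-based (andersen) points-to analysis. */
    int main(void) {
        int a = 0, b = 0;
        int *p = &a;    /* pts(p) = {a} */
        int *q = &b;    /* pts(q) = {b} */
        int *r = p;     /* andersen: pts(r) grows to include pts(p) = {a} */
        r = q;          /* andersen: pts(r) = {a, b}; p and q stay disjoint */
        /* steens instead unifies points-to sets on each assignment, so p,
           q, and r collapse into one equivalence class, and the analysis
           must conclude that p and q may point to the same object even
           though they never do. */
        return *p + *q + *r;
    }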

Given such an abundance of pointer analyses that seem to be much better in the research land, why do real-world compiler infrastructures like llvm still rely on those three simple (and ad-hoc) ones to perform IR optimization? Based on my understanding (and again please correct me if I am wrong):

(1) The minor reason: those "better" algorithms are very hard to implement in a robust way and nobody seems to be interested in trying to write and maintain them.
(2) The major reason: it's not clear whether those "better" algorithms are actually better for llvm. More precise pointer analyses tend to slow down compile time a lot while contributing too little to the optimization passes that use them. The benefit one gets from a more precise analysis may not justify the compile-time or the maintenance cost.

So my question here is: what kind(s) of precision really justify the cost and what kinds do not? Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes? Could there potentially be more improvement in pointer analysis precision without adding too much compile-time/maintenance cost? Have the precision/performance tradeoffs been fully explored before?

Any pointers will be much appreciated. No pun intended :)

PS1: To be more concrete, what I am looking for is not some black-box information like "we switched from basic-aa to cfl-aa and observed 1% improvement at runtime". I believe white-box studies such as "the licm pass failed to hoist x instructions because -tbaa is not flow sensitive" are much more interesting for understanding the problem here.
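
As a made-up example of the flavor of interaction I mean (a sketch, not from an actual measurement):

    /* Whether licm can hoist the reload of *n out of this loop depends
       entirely on what the AA can prove: the float stores to a[i] must
       not modify *n. basic-aa cannot prove that for two arbitrary
       incoming pointers, but tbaa can, because under C's strict-aliasing
       rules a store to a float cannot modify an int object. */
    void scale(float *a, int *n) {
        for (int i = 0; i < *n; i++)
            a[i] *= 2.0f;
    }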

PS2: If no such evaluation exists in the past, I'd be happy to do that myself and report back my findings if anyone here is interested.

--
Best Regards,

--
Jia Chen
Department of Computer Science
University of Texas at Austin

Christian Convey via llvm-dev

Mar 21, 2016, 10:32:19 AM
to Jia Chen, llvm-dev
Hi Jia,

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis.

For at least some published AA algorithms, there may be some uncertainty about software patents and/or copyright.  

At one point I was interested in the status of this AA implementation by Lian Li et al.  When I contacted Lian to ask if there was any chance of getting it into LLVM, IIRC she said that her employer wouldn't promise to relinquish all possible claims it had on that algorithm's IP.  So unfortunately, at least in the U.S., an algorithm being published in an academic journal doesn't remove all legal risk associated with using it.

Also, speaking from my own experience, even when an AA algorithm seems to be described in great detail in some piece of literature (e.g., a phd thesis), there can still be a lot of details which are glossed over, or which seem clear when reading the document but which get a lot more confusing when one tries to actually implement it.  

That can make implementing such an algorithm take far longer than one would expect based on just reading the document.  (It's also an argument in favor of requiring academic papers which describe the behavior of a software implementation to actually include a working copy of the source code, IMHO.)

So my question here is: what kind(s) of precision really justify the cost and what kinds do not? Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?

At one point I discussed this with Daniel Berlin, but I'm having trouble finding a record of the conversation.  IIRC, he said that he once threw a huge amount of computing power at doing a full context-sensitive AA on some software, and the speedup he observed in the resulting program was underwhelming (10-15%?).

I can't remember if that was with GCC or LLVM.  That result is a data point, although it may not say much about how much additional speedup could be realized if the algorithms which use the AA results were themselves adapted to capitalize on fully context-sensitive, flow-sensitive, hula-dancer-on-the-dashboard AA results.

Cheers,
Christian

Jia Chen via llvm-dev

Mar 21, 2016, 11:57:10 AM
to Christian Convey, llvm-dev
Hi Christian,

Thank you so much for the reply! Please see my comments inline.


On 03/21/2016 09:32 AM, Christian Convey wrote:
Hi Jia,

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis.

For at least some published AA algorithms, there may be some uncertainty about software patents and/or copyright.  

At one point I was interested in the status of this AA implementation by Lian Li et al.  When I contacted Lian to ask if there was any chance of getting it into LLVM, IIRC she said that her employer wouldn't promise to relinquish all possible claims it had on that algorithm's IP.  So unfortunately, at least in the U.S., an algorithm being published in an academic journal doesn't remove all legal risk associated with using it.
This is news to me. Thanks for sharing it.


Also, speaking from my own experience, even when an AA algorithm seems to be described in great detail in some piece of literature (e.g., a phd thesis), there can still be a lot of details which are glossed over, or which seem clear when reading the document but which get a lot more confusing when one tries to actually implement it.  

That can make implementing such an algorithm take far longer than one would expect based on just reading the document.  (It's also an argument in favor of requiring academic papers which describe the behavior of a software implementation to actually include a working copy of the source code, IMHO.)
My personal experience also coincides. And even if the paper does come with an artifact or source code, it is usually a proof-of-concept implementation with lots of important real-world corner cases ignored.

So my question here is: what kind(s) of precision really justify the cost and what kinds do not? Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?

At one point I discussed this with Daniel Berlin, but I'm having trouble finding a record of the conversation.  IIRC, he said that he once threw a huge amount of computing power at doing a full context-sensitive AA on some software, and the speedup he observed in the resulting program was underwhelming (10-15%?).
I kind of expect that. As you mentioned later, most optimization passes work in a context-insensitive manner (i.e. they won't clone a function and optimize differently on different clones). Context sensitivity on the pointer analysis side is probably not going to help a lot if the client cannot fully capitalize on it. In the setting of compiler optimization, my guess is that flow sensitivity, field sensitivity, the heap model, and external library annotations are the four aspects that are likely to matter.
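
For instance, field sensitivity already matters on something as small as this sketch:

    /* A field-insensitive analysis collapses s.p and s.q into a single
       abstract location with pts = {a, b}, so *s.p and *s.q may alias.
       A field-sensitive one keeps pts(s.p) = {a} and pts(s.q) = {b} and
       proves the two loads below disjoint. */
    struct S { int *p; int *q; };
    int a, b;
    struct S s = { &a, &b };
    int loads(void) { return *s.p + *s.q; }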

I did some preliminary experiments with licm on c programs over the last weekend. I chose licm because intuitively that's the optimization that could have the biggest performance impact. The results suggested that tbaa, cfl-aa, scev-aa and globals-aa yield very little additional benefit over basic-aa. Again, both the methodology and benchmark selection are very immature and the results need to be double-checked, but my hope is that by looking at how aa algorithms and their clients interact I may be able to get some hints on what kind of aa a compiler really wants.

I can't remember if that was with GCC or LLVM.  That result is a data point, although it may not say much about how much additional speedup could be realized if the algorithms which use the AA results were themselves adapted to capitalize on fully context-sensitive, flow-sensitive, hula-dancer-on-the-dashboard AA results.

Cheers,
Christian


Daniel Berlin via llvm-dev

Mar 21, 2016, 12:06:03 PM
to Jia Chen, llvm-dev
On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev <llvm...@lists.llvm.org> wrote:
Dear llvm devs,

tl;dr: What prevents llvm from switching to a fancier pointer analysis?

Nothing.
 

Currently, there exists a variety of general-purpose alias analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and cfl-aa. However, only the first three are actually turned on when invoking clang with -O2 or -O3 (please correct me if I'm wrong about this).

This is correct.
Eventually, i hope george will have time to get back to CFL-AA and turn it on by default.
 

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis. Some are field-sensitive, some are field-based, some are flow-sensitive, some are context-sensitive. Even the flow-insensitive ones may be inclusion-style (-andersen-aa) or equality-style (-steens-aa and -ds-aa). These algorithms are often backed by rich theoretical frameworks as well as preliminary evaluations that demonstrate their superior precision and/or performance.

CFL-AA is a middle ground between steens and anders, can be easily made field and context sensitive, etc.
 

Given such an abundance of pointer analyses that seem to be much better in the research land, why do real-world compiler infrastructures like llvm still rely on those three simple (and ad-hoc) ones to perform IR optimization?

Time and energy.
 
Based on my understanding (and again please correct me if I am wrong):

(1) The minor reason: those "better" algorithms are very hard to implement in a robust way and nobody seems to be interested in trying to write and maintain them.

This is false.  Heck, at the time i implemented it in GCC, field-sensitive andersen's analysis was unknown in production compilers.  That's why i'm thanked in all the papers - i did the engineering work to make it fast and reliable.

 
(2) The major reason: it's not clear whether those "better" algorithms are actually better for llvm. More precise pointer analyses tend to slow down compile time a lot while contributing too little to the optimization passes that use them. The benefit one gets from a more precise analysis may not justify the compile-time or the maintenance cost.


CFL-AA is probably the right trade-off here. You can stop at any time and have correct answers, you can be as lazy as you like.
etc.

The reality is i think you overlook the realistic answer:

3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They spend their time on lower-hanging fruit until that lower hanging fruit is gone.

IE For the moment, CFL-AA and SCEV-AA and ... are not the thing holding llvm back.

 

So my question here is: what kind(s) of precision really justify the cost and what kinds do not?

Depends entirely on your applications.
 
Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?
Yes.
Chris did many years ago, and i've done one more recently.
 
Could there potentially be more improvement in pointer analysis precision without adding too much compile-time/maintenance cost?
Yes.

Have the precision/performance tradeoffs been fully explored before?
Yes 

Any pointers will be much appreciated. No pun intended :)

PS1: To be more concrete, what I am looking for is not some black-box information like "we switched from basic-aa to cfl-aa and observed 1% improvement at runtime". I believe white-box studies such as "the licm pass failed to hoist x instructions because -tbaa is not flow sensitive" are much more interesting for understanding the problem here.

White-box studies are very application specific, and often very pass specific.
 

PS2: If no such evaluation exists in the past, I'd be happy to do that myself and report back my findings if anyone here is interested.
I don't think any of the world is set up to make that valuable.

Nothing takes advantage of context sensitivity, flow sensitivity, etc.
 

Daniel Berlin via llvm-dev

Mar 21, 2016, 12:13:09 PM
to Christian Convey, llvm-dev, Jia Chen
On Mon, Mar 21, 2016 at 7:32 AM, Christian Convey via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi Jia,

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis.

For at least some published AA algorithms, there may be some uncertainty about software patents and/or copyright.  

At one point I was interested in the status of this AA implementation by Lian Li et al.  When I contacted Lian to ask if there was any chance of getting it into LLVM, IIRC she said that her employer wouldn't promise to relinquish all possible claims it had on that algorithm's IP.  So unfortunately, at least in the U.S., an algorithm being published in an academic journal doesn't remove all legal risk associated with using it.

I wouldn't worry about this part.  I'm a pointer analysis guy and an IP lawyer.
I'm pretty careful about what algorithms we end up with in LLVM :)
 

Also, speaking from my own experience, even when an AA algorithm seems to be described in great detail in some piece of literature (e.g., a phd thesis), there can still be a lot of details which are glossed over, or which seem clear when reading the document but which get a lot more confusing when one tries to actually implement it.  


Yes, i had a blog post on this one, which was basically titled "most pointer analysis research is bullshit".  People mostly do research implementations, and ignore little things like "the effect of external function calls" (or worse "stub all of them"), and yes, implementing those things significantly changes the time bounds.  Or they do things like tell me that field-sensitivity slows nothing down because they are working on java, where you can't take the address of fields :)

Over the years, you get a good eye for what will end up practical.

GCC's implementation of andersen's, which uses hardekopf's research and work, is *very* fast in both intra- and inter-procedural mode, field sensitive, and handles all issues.
  
Adding context sensitivity to it would be expensive, however.


That can make implementing such an algorithm take far longer than one would expect based on just reading the document.  (It's also an argument in favor of requiring academic papers which describe the behavior of a software implementation to actually include a working copy of the source code, IMHO.)

Yes.
This is one of the reasons i always liked ben's research so much. He published all code and benchmarks.

Note also that a lot of them do have source code, you just have to look really hard ;)
 

So my question here is: what kind(s) of precision really justify the cost and what kinds do not? Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?

At one point I discussed this with Daniel Berlin, but I'm having trouble finding a record of the conversation.  IIRC, he said that he once threw a huge amount of computing power at doing a full context-sensitive AA on some software, and the speedup he observed in the resulting program was underwhelming (10-15%?).
Yes.  But see below.
 

I can't remember if that was with GCC or LLVM.  That result is a data point, although it may not say much about how much additional speedup could be realized if the algorithms which use the AA results were themselves adapted to capitalize on fully context-sensitive, flow-sensitive, hula-dancer-on-the-dashboard AA results.

Note however that this is going to be true of any new AA algorithm.   You have to have the infrastructure necessary to make use of it, you have to tune optimizations to use the information well, etc.

Realistically, getting the infrastructure ready and tuning it is a year or two of work, at least (for one person).

As i mentioned, at this point, there is still much lower hanging fruit.  When there isn't, i suspect we'll get back to AA.


Daniel Berlin via llvm-dev

Mar 21, 2016, 1:17:31 PM
to Jia Chen, llvm-dev


On Mon, Mar 21, 2016, 10:00 AM Jia Chen <jc...@cs.utexas.edu> wrote:
Hi Daniel,


On 03/21/2016 11:05 AM, Daniel Berlin wrote:


On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev <llvm...@lists.llvm.org> wrote:
Dear llvm devs,

tl;dr: What prevents llvm from switching to a fancier pointer analysis?

Nothing.
 

Currently, there exists a variety of general-purpose alias analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and cfl-aa. However, only the first three are actually turned on when invoking clang with -O2 or -O3 (please correct me if I'm wrong about this).

This is correct.
Eventually, i hope george will have time to get back to CFL-AA and turn it on by default.
 

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis. Some are field-sensitive, some are field-based, some are flow-sensitive, some are context-sensitive. Even the flow-insensitive ones may be inclusion-style (-andersen-aa) or equality-style (-steens-aa and -ds-aa). These algorithms are often backed by rich theoretical frameworks as well as preliminary evaluations that demonstrate their superior precision and/or performance.

CFL-AA is a middle ground between steens and anders, can be easily made field and context sensitive, etc.
 

Given such an abundance of pointer analyses that seem to be much better in the research land, why do real-world compiler infrastructures like llvm still rely on those three simple (and ad-hoc) ones to perform IR optimization?

Time and energy.
 
Based on my understanding (and again please correct me if I am wrong):

(1) The minor reason: those "better" algorithms are very hard to implement in a robust way and nobody seems to be interested in trying to write and maintain them.

This is false.  Heck, at the time i implemented it in GCC, field-sensitive andersen's analysis was unknown in production compilers.  That's why i'm thanked in all the papers - i did the engineering work to make it fast and reliable.

 
(2) The major reason: it's not clear whether those "better" algorithms are actually better for llvm. More precise pointer analyses tend to slow down compile time a lot while contributing too little to the optimization passes that use them. The benefit one gets from a more precise analysis may not justify the compile-time or the maintenance cost.


CFL-AA is probably the right trade-off here. You can stop at any time and have correct answers, you can be as lazy as you like.
etc.

Regarding CFL-AA: in my understanding, cfl-aa does not introduce a new precision tradeoff.

You can make it do what you want much more easily than with existing frameworks, in my experience.

It is merely a demand-driven way of implementing existing analyses, by extending those algorithms to track additional "pointed-to-by" information. Laziness may help with the running time of the cfl analysis when only partial points-to info is needed, but if the client wants to do a whole-program analysis and requires whole-program points-to info (which is usually true for optimizing compilers, since they will eventually examine and touch every piece of the code given to them), wouldn't cfl-aa be no different from a traditional whatever-sensitive pointer analysis?

CFL, at least when I ran the numbers, was faster at all-pairs queries than the existing analyses.



The reality is i think you overlook the realistic answer:

3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They spend their time on lower-hanging fruit until that lower hanging fruit is gone.

IE For the moment, CFL-AA and SCEV-AA and ... are not the thing holding llvm back.

I'd love to hear some examples of "lower-hanging fruit" in LLVM, especially in the area of middle-end analyses and optimizations. I thought LLVM was mature enough that any obvious opportunities for improvement in analyses and optimizations had already been taken, no?

No.
For example, gvn and pre are fairly simple implementations that miss obvious optimizations.

 

So my question here is: what kind(s) of precision really justify the cost and what kinds do not?

Depends entirely on your applications.
 
Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?
Yes.
Chris did many years ago, and i've done one more recently.

Great! Are they published somewhere? Can the data be shared somehow?

No, and sadly, no
 
Could there potentially be more improvement in pointer analysis precision without adding too much compile-time/maintenance cost?
Yes.

Have the precision/performance tradeoffs been fully explored before?
Yes 

Any pointers will be much appreciated. No pun intended :)

PS1: To be more concrete, what I am looking for is not some black-box information like "we switched from basic-aa to cfl-aa and observed 1% improvement at runtime". I believe white-box studies such as "the licm pass failed to hoist x instructions because -tbaa is not flow sensitive" are much more interesting for understanding the problem here.

White-box studies are very application specific, and often very pass specific.

And I understand that. My goal is to look for commonalities among passes and applications.

This generally just discovers things we already know, which is that certain passes have deficiencies.

However, if the existing studies you mentioned above are extensive and conclusive enough to show that the aas we have today have fully exploited almost all such commonalities, then it's probably a better idea for me to find something else to work on. But again, I'd like to see the data first.


 

PS2: If no such evaluation exists in the past, I'd be happy to do that myself and report back my findings if anyone here is interested.
I don't think any of the world is set up to make that valuable.

Nothing takes advantage of context sensitivity, flow sensitivity, etc.
I agree that nothing takes advantage of context sensitivity. But I would argue otherwise for flow sensitivity, field sensitivity, the heap model, and external function models

I'm talking infrastructure-wise: nothing in llvm can take advantage because the APIs don't exist.

. Flow sensitivity is helpful when the optimization pass itself is flow-sensitive (e.g. adce, gvn),

No api exists that they could use right now for this, and you'd have to change things to understand that answers are not valid over the entire function.

and field sensitivity is helpful when a struct contains multiple pointers. Heap sensitivity is basically what motivates Chris Lattner's PLDI'07 paper, and external function models are helpful since without them the analysis has to be extremely conservative and conclude that everything external functions touch may alias each other.

I don't disagree, this is the one to two years of work I said would be needed

Jia Chen via llvm-dev

Mar 21, 2016, 1:21:12 PM
to Daniel Berlin, llvm-dev
Hi Daniel,

On 03/21/2016 11:05 AM, Daniel Berlin wrote:


On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev <llvm...@lists.llvm.org> wrote:
Dear llvm devs,

tl;dr: What prevents llvm from switching to a fancier pointer analysis?

Nothing.
 

Currently, there exists a variety of general-purpose alias analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and cfl-aa. However, only the first three are actually turned on when invoking clang with -O2 or -O3 (please correct me if I'm wrong about this).

This is correct.
Eventually, i hope george will have time to get back to CFL-AA and turn it on by default.
 

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis. Some are field-sensitive, some are field-based, some are flow-sensitive, some are context-sensitive. Even the flow-insensitive ones may be inclusion-style (-andersen-aa) or equality-style (-steens-aa and -ds-aa). These algorithms are often backed by rich theoretical frameworks as well as preliminary evaluations that demonstrate their superior precision and/or performance.

CFL-AA is a middle ground between steens and anders, can be easily made field and context sensitive, etc.
 

Given such an abundance of pointer analyses that seem to be much better in the research land, why do real-world compiler infrastructures like llvm still rely on those three simple (and ad-hoc) ones to perform IR optimization?

Time and energy.
 
Based on my understanding (and again please correct me if I am wrong):

(1) The minor reason: those "better" algorithms are very hard to implement in a robust way and nobody seems to be interested in trying to write and maintain them.

This is false.  Heck, at the time i implemented it in GCC, field-sensitive andersen's analysis was unknown in production compilers.  That's why i'm thanked in all the papers - i did the engineering work to make it fast and reliable.

 
(2) The major reason: it's not clear whether those "better" algorithms are actually better for llvm. More precise pointer analyses tend to slow down compile time a lot while contributing too little to the optimization passes that use them. The benefit one gets from a more precise analysis may not justify the compile-time or the maintenance cost.


CFL-AA is probably the right trade-off here. You can stop at any time and have correct answers, you can be as lazy as you like.
etc.

Regarding CFL-AA: in my understanding, cfl-aa does not introduce a new precision tradeoff. It is merely a demand-driven way of implementing existing analyses, by extending those algorithms to track additional "pointed-to-by" information. Laziness may help with the running time of the cfl analysis when only partial points-to info is needed, but if the client wants to do a whole-program analysis and requires whole-program points-to info (which is usually true for optimizing compilers, since they will eventually examine and touch every piece of the code given to them), wouldn't cfl-aa be no different from a traditional whatever-sensitive pointer analysis?



The reality is i think you overlook the realistic answer:

3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They spend their time on lower-hanging fruit until that lower hanging fruit is gone.

IE For the moment, CFL-AA and SCEV-AA and ... are not the thing holding llvm back.

I'd love to hear some examples of "lower-hanging fruit" in LLVM, especially in the area of middle-end analyses and optimizations. I thought LLVM was mature enough that any obvious opportunities for improvement in analyses and optimizations had already been taken, no?
So my question here is: what kind(s) of precision really justify the cost and what kinds do not?

Depends entirely on your applications.
 
Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?
Yes.
Chris did many years ago, and i've done one more recently.
Great! Are they published somewhere? Can the data be shared somehow?
 
Could there potentially be more improvement in pointer analysis precision without adding too much compile-time/maintenance cost?
Yes.

Have the precision/performance tradeoffs been fully explored before?
Yes 

Any pointers will be much appreciated. No pun intended :)

PS1: To be more concrete, what I am looking for is not some black-box information like "we switched from basic-aa to cfl-aa and observed 1% improvement at runtime". I believe white-box studies such as "the licm pass failed to hoist x instructions because -tbaa is not flow sensitive" are much more interesting for understanding the problem here.

White-box studies are very application specific, and often very pass specific.

And I understand that. My goal is to look for commonalities among passes and applications. However, if the existing studies you mentioned above are extensive and conclusive enough to show that the aas we have today have fully exploited almost all such commonalities, then it's probably a better idea for me to find something else to work on. But again, I'd like to see the data first.

 

PS2: If no such evaluation exists in the past, I'd be happy to do that myself and report back my findings if anyone here is interested.
I don't think any of the world is set up to make that valuable.

Nothing takes advantage of context sensitivity, flow sensitivity, etc.
I agree that nothing takes advantage of context sensitivity. But I would argue otherwise for flow sensitivity, field sensitivity, the heap model, and external function models. Flow sensitivity is helpful when the optimization pass itself is flow-sensitive (e.g. adce, gvn), and field sensitivity is helpful when a struct contains multiple pointers. Heap sensitivity is basically what motivates Chris Lattner's PLDI'07 paper, and external function models are helpful since without them the analysis has to be extremely conservative and conclude that everything external functions touch may alias each other.
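
A small sketch of the external-function point (frob is a made-up external routine):

    extern int *frob(int *x);   /* external: no body and no annotation */
    int g;
    int f(int *p) {
        int *q = frob(&g);  /* with no model of frob, the analysis must
                               assume the returned pointer may point to g,
                               to p's target, or to any other escaped
                               object... */
        *q = 1;             /* ...so this store may modify both g and *p */
        return *p + g;
    }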

Renato Golin via llvm-dev

Mar 21, 2016, 1:41:05 PM
to Daniel Berlin, llvm-dev, Jia Chen
On 21 March 2016 at 17:17, Daniel Berlin via llvm-dev

<llvm...@lists.llvm.org> wrote:
>>> Has anybody done any study in the past to evaluate what kinds of features
>>> in pointer analyses will benefit what kinds of optimization passes?
>>
>> Yes.
>> Chris did many years ago, and i've done one more recently.
>>
>> Great! Are they published somewhere? Can the data be shared somehow?
>
> No, and sadly, no

This sounds like a good GSOC project.

Having the evaluation done is great, but if you can't share, then
that's pretty much useless to the community at large.

Even if a student does a less thorough evaluation, having something
out is better than having nothing, and with your expertise, I'm sure
we can get such a student doing some pretty capable analysis with
little resources.

cheers,
--renato

Jia Chen via llvm-dev

Mar 21, 2016, 2:55:57 PM
to Daniel Berlin, llvm-dev


It is merely a demand-driven way of implementing existing analyses, by extending those algorithms to track additional "pointed-to-by" information. Laziness may help with the running time of the cfl analysis when only partial points-to info is needed, but if the client wants to do a whole-program analysis and requires whole-program points-to info (which is usually true for optimizing compilers, since they will eventually examine and touch every piece of the code given to them), wouldn't cfl-aa be no different from a traditional whatever-sensitive pointer analysis?

CFL, at least when I ran the numbers, was faster at all-pairs queries than the existing analyses.

There could be many reasons for it, e.g. better implementations. Again, my point is that cfl-aa is more of an implementation strategy than a fundamentally superior approach.

Great! Are they published somewhere? Can the data be shared somehow?

No, and sadly, no
:(

I'm talking infrastructure-wise: nothing in llvm can take advantage because the APIs don't exist.

. Flow sensitivity is helpful when the optimization pass itself is flow-sensitive (e.g. adce, gvn),

No api exists that they could use right now for this, and you'd have to change things to understand that answers are not valid over the entire function.

I see what you are saying now. Sometimes flow/ctx-insensitive alias queries can benefit from a flow/ctx-sensitive analysis, yet my intuition is that such cases are likely to be rare. I could go ahead and modify those passes myself to carry out the study, but that option probably won't be too interesting to the community.

Thank you very much for pointing that out to me.

Jia Chen via llvm-dev

Mar 21, 2016, 2:56:32 PM
to Renato Golin, llvm-dev

On 21 March 2016 at 17:17, Daniel Berlin via llvm-dev
<llvm...@lists.llvm.org> wrote:
Has anybody done any study in the past to evaluate what kinds of features
in pointer analyses will benefit what kinds of optimization passes?
Yes.
Chris did many years ago, and i've done one more recently.

Great! Are they published somewhere? Can the data be shared somehow?
No, and sadly, no
This sounds like a good GSOC project.


Need any volunteers?

I'd be interested in any work that relates to pointer analysis, including this as well as the "one to two years of work" Daniel mentioned. What held me back from submitting a proposal is the concern that this kind of exploratory work, whose outcome is not guaranteed to be useful, may not be attractive enough to the LLVM devs.

Daniel Berlin via llvm-dev

Mar 21, 2016, 2:59:43 PM
to Renato Golin, llvm-dev, Jia Chen
On Mon, Mar 21, 2016 at 10:40 AM, Renato Golin <renato...@linaro.org> wrote:
On 21 March 2016 at 17:17, Daniel Berlin via llvm-dev
<llvm...@lists.llvm.org> wrote:
>>> Has anybody done any study in the past to evaluate what kinds of features
>>> in pointer analyses will benefit what kinds of optimization passes?
>>
>> Yes.
>> Chris did many years ago, and i've done one more recently.
>>
>> Great! Are they published somewhere? Can the data be shared somehow?
>
> No, and sadly, no

This sounds like a good GSOC project.

Having the evaluation done is great, but if you can't share, then
that's pretty much useless to the community at large.

Which is why i've never mentioned it or used it in the community ;)
 

Even if a student does a less thorough evaluation, having something
out is better than having nothing, and with your expertise, I'm sure
we can get such a student doing some pretty capable analysis with
little resources.


FWIW, i'm not sure this is worthwhile at this time, because we pretty much know enough of the low-hanging answers to keep someone busy with implementation work for years.

(IE we know that scev-aa would be of significant benefit to PRE and GVN, etc).

I would rather see someone spend their time getting SCEV-AA on by default or CFL-AA on by default than doing another evaluation.


Daniel Berlin via llvm-dev

Mar 21, 2016, 3:03:35 PM
to Jia Chen, llvm-dev
On Mon, Mar 21, 2016 at 11:34 AM, Jia Chen <jc...@cs.utexas.edu> wrote:


It is merely a demand-driven way of implementing existing analyses, by extending those algorithms to track additional "pointed-to-by" information. Laziness may help with the running time of the cfl analysis when only partial points-to info is needed, but if the client wants to do a whole-program analysis and requires whole-program points-to info (which is usually true for optimizing compilers, since they will eventually examine and touch every piece of the code given to them), wouldn't cfl-aa be no different from a traditional whatever-sensitive pointer analysis?

CFL, at least when I ran the numbers, was faster at all-pairs queries than the existing analyses.

There could be many reasons for it, e.g. better implementations.

FWIW: the implementations i compared against are completely state of the art and very well engineered (IE not research crap :P).
 
Again, my point is that cfl-aa is more of an implementation strategy than a fundamentally superior approach.

The first part is true, but the second part depends on your definition of "superior approach".

You can solve andersens and steensgaards and everything else using standard dataflow solvers, and that's an implementation strategy, but it will be really slow.
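
(For the record, the constraint view I mean, sketched per C statement, where pts(x) is x's points-to set:)

    void sketch(void) {
        int a = 0;
        int *p, *q, **r;
        p = &a;    /* address-of: a is in pts(p)                           */
        q = p;     /* copy:  pts(p) is a subset of pts(q)                  */
        r = &q;    /* address-of: q is in pts(r)                           */
        *r = p;    /* store: for each v in pts(r), pts(p) subset of pts(v) */
        q = *r;    /* load:  for each v in pts(r), pts(v) subset of pts(q) */
    }
    /* andersens solves these subset constraints to a fixed point;
       steensgaards replaces "subset of" with unification, which is what
       buys near-linear time at the cost of precision. */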

Part of the tradeoff is how fast something runs, and approaches that are orders of magnitude faster often change the calculus of what people do. For example, before hardekopf's work, andersens was considered too slow to be practical in a real compiler.

Now, GCC does it by default. 

So i would call that approach a superior approach :)

So saying that CFL-AA offers nothing superior in terms of approach, IMHO, misunderstands the nature of the problem. If your goal is to get precision at all costs, then yes, it's not superior.  If your goal is to get something into a production compiler, that is understandable, maintainable, can turn on and off field and context sensitivity easily, etc, then it may be a superior approach.


I'm talking infrastructure-wise: nothing in llvm can take advantage because the APIs don't exist.

. Flow sensitivity is helpful when the optimization pass itself is flow-sensitive (e.g. adce, gvn),

No api exists that they could use right now for this, and you'd have to change things to understand that answers are not valid over the entire function.

I see what you are saying now. Sometimes flow/ctx-insensitive alias queries can benefit from a flow/ctx-sensitive analysis, yet my intuition is that such cases are likely to be rare.

Yes.
 
I could go ahead and modify those passes myself to carry out the study, but that option probably won't be too interesting to the community.

Right, because then you aren't testing LLVM, you are testing LLVM with better infrastructure :)
 

Thank you very much for pointing that out to me.

Happy to ;)

Renato Golin via llvm-dev

Mar 21, 2016, 3:06:06 PM
to Daniel Berlin, llvm-dev, Jia Chen
On 21 March 2016 at 18:59, Daniel Berlin <dbe...@dberlin.org> wrote:
> Which is why i've never mentioned it or used it in the community ;)

Makes sense. :)


> I would rather see someone spend their time getting SCEV-AA on by default or
> CFL-AA on by default than doing another evaluation.

But those may not be simple enough for a GSOC; that's why I mentioned it.

The analysis could not only get us a bird's-eye view of the problem ahead,
but also introduce new developers to AA, which would make their future
work on SCEV-AA or CFL-AA easier. Kind of a teaching tool to get more
AA-savvy people.

Daniel Berlin via llvm-dev

Mar 21, 2016, 3:07:51 PM
to Renato Golin, George Burgess IV, llvm-dev, Jia Chen
On Mon, Mar 21, 2016 at 12:05 PM, Renato Golin <renato...@linaro.org> wrote:
On 21 March 2016 at 18:59, Daniel Berlin <dbe...@dberlin.org> wrote:
> Which is why i've never mentioned it or used it in the community ;)

Makes sense. :)


> I would rather see someone spend their time getting SCEV-AA on by default or
> CFL-AA on by default than doing another evaluation.

But those may not be simple enough for a GSOC; that's why I mentioned it.


CFL-AA should just be a matter of fixing performance regressions, and maybe a little bug fixing, which is hopefully easy enough.  It's already fast enough as a pass.

SCEV-AA would be harder (must make SCEV-AA faster).

The analysis could not only get us a bird's-eye view of the problem ahead,
but also introduce new developers to AA, which would make their future
work on SCEV-AA or CFL-AA easier. Kind of a teaching tool to get more
AA-savvy people.

Sure.  

cheers,
--renato

Hal Finkel via llvm-dev

Mar 21, 2016, 3:26:31 PM
to Daniel Berlin, llvm-dev, Jia Chen
From: "Daniel Berlin via llvm-dev" <llvm...@lists.llvm.org>
To: "Renato Golin" <renato...@linaro.org>, "George Burgess IV" <george.b...@gmail.com>
Cc: "llvm-dev" <llvm...@lists.llvm.org>, "Jia Chen" <jc...@cs.utexas.edu>
Sent: Monday, March 21, 2016 2:07:44 PM
Subject: Re: [llvm-dev] Existing studies on the benefits of pointer analysis



On Mon, Mar 21, 2016 at 12:05 PM, Renato Golin <renato...@linaro.org> wrote:
On 21 March 2016 at 18:59, Daniel Berlin <dbe...@dberlin.org> wrote:
> Which is why i've never mentioned it or used it in the community ;)

Makes sense. :)


> I would rather see someone spend their time getting SCEV-AA on by default or
> CFL-AA on by default than doing another evaluation.

But those may not be simple enough for a GSOC; that's why I mentioned it.


CFL-AA should just be a matter of fixing performance regressions, and maybe a little bug fixing, which is hopefully easy enough.  It's already fast enough as a pass.

My understanding from George is that there are self-hosting miscompiles if you disable all AA except for CFL-AA. This is what is preventing us from enabling it by default. George, is that right?

 -Hal

SCEV-AA would be harder (must make SCEV-AA faster).

The analysis could not only get us a birds view of the problem ahead,
but also introduce new developers to AA, which would make their future
work on SCEV-AA or CFL-AA easier. Kind of a teaching tool to get more
AA-savvy people.

Sure.  

cheers,
--renato





--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

George Burgess IV via llvm-dev

Mar 21, 2016, 4:10:55 PM
to Hal Finkel, llvm-dev, Jia Chen
As of late August 2015, putting CFL-AA behind BasicAA caused miscompiles when trying to bootstrap Clang/LLVM, yeah. It didn't seem that there were many new errors (I think it caused ~10 tests to fail, where "fail" means either segfaulting or producing the wrong output), but it did end up breaking things. I don't recall if standalone CFL-AA causes miscompiles, but I highly doubt the breakages I observed were BasicAA's fault.

WRT speed, `time make -j14` on my box (6c/12t) didn't show a meaningful increase in compile time when CFL-AA gets enabled (read: it got lost in the noise). So, I agree that it's probably fast enough at the moment; if we want to enhance it, we should focus on making it bootstrap clang+LLVM/making it more accurate.

Daniel Berlin via llvm-dev

Mar 21, 2016, 4:36:38 PM
to Jia Chen, llvm-dev
On Mon, Mar 21, 2016 at 12:26 PM, Jia Chen <jc...@cs.utexas.edu> wrote:

You can solve andersens and steensgaards and everything else using standard dataflow solvers, and that's an implementation strategy, but it will be really slow.

Part of the tradeoff is how fast something runs, and approaches that are orders of magnitude faster often change the calculus of what people do. For example, before hardekopf's work, andersens was considered too slow to be practical in a real compiler.

Now, GCC does it by default.

So i would call that approach a superior approach :)

So saying that CFL-AA offers nothing superior in terms of approach, IMHO, misunderstands the nature of the problem. If your goal is to get precision at all costs, then yes, it's not superior.  If your goal is to get something into a production compiler, that is understandable, maintainable, can turn on and off field and context sensitivity easily, etc, then it may be a superior approach.


Apparently "superior approach" was the wrong term on my part. My apologies.

No worries at all!
 
What I should have said is "an approach with superior precision". Both cfl and Ben Hardekopf's work you mentioned (which improves analysis performance by using SSA transformation as a pre-pass to eliminate easy-to-analyze pointers) can be viewed as optimizations on a standard dataflow solver,

Well, not quite. Just to be pedantic:
It does hash value numbering and CSE and some other things on the constraint graph.  

CFL is not a dataflow solver at all. It's a graph reachability solver.
Ben's work is a constraint solver.
It does not know or care about CFG's, basic blocks, etc.
Dataflow solvers are like Ryder and Landi's approach


 
but at the end of the day they do nothing more than that. 
From a client's perspective, they are no different from standard solvers except they are faster.

Yes.
 

I do acknowledge that cfl may work better in practice (although I hold different opinions about understandability and maintainability).

Sure. Having implemented tons and tons and tons of these algorithms and things, i'd argue that constraint solving tends to be easier to understand once you get it, but has limitations that are harder to overcome than in CFL land.
 
It's just that I tend to judge a pointer analysis based on the needs of a client.

Thinking about individual clients, while useful, is not always the right end game.

It pretty much does not matter if i improve GVN if it now just catches cases some other cheap pass catches anyway.
Maybe it does in some ways, but it's often not going to let you remove that other pass, or the expense of improving it is not worth the cost.

So providing context-sensitive AA to a pass so it can do an amazing job will buy you pretty much nothing if the other passes can do the same job with less info.

 
Again, I meant no offense and I apologize for my inappropriate choice of words.


It was neither offensive nor inappropriate :)

Jia Chen via llvm-dev

Mar 21, 2016, 4:39:25 PM
to Daniel Berlin, llvm-dev

> You can solve andersens and steensgaards and everything else using
> standard dataflow solvers, and that's an implementation strategy, but
> it will be really slow.
>
> Part of the tradeoff is how fast something runs, and approaches that
> are orders of magnitude faster often change the calculus of what
> people do. For example, before hardekopf's work, andersens was
> considered too slow to be practical in a real compiler.
>
> Now, GCC does it by default.
>
> So i would call that approach a superior approach :)
>
> So saying that CFL-AA offers nothing superior in terms of approach,
> IMHO, misunderstands the nature of the problem. If your goal is to get
> precision at all costs, then yes, it's not superior. If your goal is
> to get something into a production compiler, that is understandable,
> maintainable, can turn on and off field and context sensitivity
> easily, etc, then it may be a superior approach.
>
>
Apparently "superior approach" was the wrong term on my part. My apologies.
What I should have said is "an approach with superior precision". Both
cfl and Ben Hardekopf's work you mentioned (which improves analysis
performance by using SSA transformation as a pre-pass to eliminate
easy-to-analyze pointers) can be viewed as optimizations on a standard
dataflow solver, but at the end of the day they do nothing more than
that. From a client's perspective, they are no different from standard
solvers except they are faster.

I do acknowledge that cfl may work better in practice (although I hold
different opinions about understandability and maintainability). It's
just that I tend to judge a pointer analysis based on the needs of a
client. Again, I meant no offense and I apologize for my inappropriate
choice of words.

--
Best Regards,

--
Jia Chen

Philip Reames via llvm-dev

Mar 21, 2016, 9:28:45 PM
to Jia Chen, Christian Convey, llvm-dev


On 03/21/2016 08:56 AM, Jia Chen via llvm-dev wrote:
Hi Christian,

Thank you so much for the reply! Please see my comments inline.

On 03/21/2016 09:32 AM, Christian Convey wrote:
Hi Jia,

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis.

For at least some published AA algorithms, there may be some uncertainty about software patents and/or copyright.  

At one point I was interested in the status of this AA implementation by Lian Li et al.  When I contacted Lian to ask if there was any chance of getting it into LLVM, IIRC she said that her employer wouldn't promise to relinquish all possible claims it had on that algorithm's IP.  So unfortunately, at least in the U.S., an algorithm being published in an academic journal doesn't remove all legal risk associated with using it.
This is news to me. Thanks for sharing it.

Also, speaking from my own experience, even when an AA algorithm seems to be described in great detail in some piece of literature (e.g., a phd thesis), there can still be a lot of details which are glossed over, or which seem clear when reading the document but which get a lot more confusing when one tries to actually implement it.  

That can make implementing such an algorithm take far longer than one would expect based on just reading the document.  (It's also an argument in favor of requiring academic papers which describe the behavior of a software implementation to actually include a working copy of the source code, IMHO.)
My personal experience also coincides. And even if the paper does come with an artifact or source code, it is usually a proof-of-concept implementation with lots of important real-world corner cases ignored.

So my question here is: what kind(s) of precision really justify the cost and what kinds do not? Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?

At one point I discussed this with Daniel Berlin, but I'm having trouble finding a record of the conversation.  IIRC, he said that he once threw a huge amount of computing power at doing a full context-sensitive AA on some software, and the speedup he observed in the resulting program was underwhelming (10-15%?).
I kind of expect that. As you mentioned later, most optimization passes work in a context-insensitive manner (i.e. they won't clone a function and optimize differently on different clones). Context sensitivity on the pointer analysis side is probably not going to help a lot if the client cannot fully capitalize on it. In the setting of compiler optimization, my guess is that flow sensitivity, field sensitivity, the heap model, and external library annotations are the four aspects that are likely to matter.

I did some preliminary experiments with licm on c programs over the last weekend. I chose licm because intuitively that's the optimization that could have the biggest performance impact. The results suggested that tbaa, cfl-aa, scev-aa and globals-aa yield very little additional benefit over basic-aa. Again, both the methodology and benchmark selection are very immature and the results need to be double-checked, but my hope is that by looking at how aa algorithms and their clients interact I may be able to get some hints on what kind of aa a compiler really wants.
Just to chime in here, your results match my experience and expectations with LICM as well.  Between basic-aa and TBAA (specifically LLVM's implementation thereof), I haven't seen a lot of cases where an imprecision in the alias analysis prevents hoisting.

*However*, if you're interested in LICM specifically, I have *definitely* seen cases where the precision of AliasSetTracker (our grouping of AA results to prevent O(n^2) queries) spuriously prevents hoisting.  AST could use some serious attention, both from an engineering standpoint and (possibly) from a theoretical one.
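
A sketch of what I mean: may-alias is not transitive, but alias sets effectively are.

    void example(int flag) {
        int a = 0, c = 0;
        int *p = &a;
        int *r = &c;
        int *q = flag ? &a : &c;  /* q may alias p, and q may alias r */
        /* AA itself answers alias(p,q) = May, alias(q,r) = May, but
           alias(p,r) = No. An AliasSetTracker-style grouping unions
           {p, q, r} into one set, so a client like LICM ends up treating
           p and r as potential aliases anyway. */
        *q = 1;
        *p = 2;
        *r = 3;
    }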

I can't remember if that was with GCC or LLVM.  That result is a data point, although it may not say much about how much additional speedup could be realized if the algorithms which use the AA results were themselves adapted to capitalize on fully context-sensitive, flow-sensitive, hula-dancer-on-the-dashboard AA results.

Cheers,
Christian


--
Best Regards,

--
Jia Chen


Daniel Berlin via llvm-dev

Mar 21, 2016, 9:53:54 PM
to Philip Reames, llvm-dev, Jia Chen
On Mon, Mar 21, 2016 at 6:28 PM, Philip Reames via llvm-dev <llvm...@lists.llvm.org> wrote:


On 03/21/2016 08:56 AM, Jia Chen via llvm-dev wrote:
Hi Christian,

Thank you so much for the reply! Please see my comments inline.

On 03/21/2016 09:32 AM, Christian Convey wrote:
Hi Jia,

If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis.

For at least some published AA algorithms, there may be some uncertainty about software patents and/or copyright.  

At one point I was interested in the status of this AA implementation by Lian Li et al.  When I contacted Lian to ask if there was any chance of getting it into LLVM, IIRC she said that her employer wouldn't promise to relinquish all possible claims it had on that algorithm's IP.  So unfortunately, at least in the U.S., an algorithm being published in an academic journal doesn't remove all legal risk associated with using it.
This is news to me. Thanks for sharing it.

Also, speaking from my own experience, even when an AA algorithm seems to be described in great detail in some piece of literature (e.g., a phd thesis), there can still be a lot of details which are glossed over, or which seem clear when reading the document but which get a lot more confusing when one tries to actually implement it.  

That can make implementing such an algorithm take far longer than one would expect based on just reading the document.  (It's also an argument in favor of requiring academic papers which describe the behavior of a software implementation to actually include a working copy of the source code, IMHO.)
My personal experience also coincides. And even if the paper does come with an artifact or source code, it is usually a proof-of-concept implementation with lots of important real-world corner cases ignored.

So my question here is: what kind(s) of precision really justify the cost and what kinds do not? Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?

At one point I discussed this with Daniel Berlin, but I'm having trouble finding a record of the conversation.  IIRC, he said that he once threw a huge amount of computing power at doing a full context-sensitive AA on some software, and the speedup he observed in the resulting program was underwhelming (10-15%?).
I kind of expect that. As you mentioned later, most optimization passes work in a context-insensitive manner (i.e. they won't clone a function and optimize differently on different clones). Context sensitivity on the pointer analysis side is probably not going to help a lot if the client cannot fully capitalize on it. In the setting of compiler optimization, my guess is that flow sensitivity, field sensitivity, the heap model, and external library annotations are the four aspects that are likely to matter.

I did some preliminary experiments with licm on c programs over the last weekend. I chose licm because intuitively that's the optimization that could have the biggest performance impact. The results suggested that tbaa, cfl-aa, scev-aa and globals-aa yield very little additional benefit over basic-aa. Again, both the methodology and benchmark selection are very immature and the results need to be double-checked, but my hope is that by looking at how aa algorithms and their clients interact I may be able to get some hints on what kind of aa a compiler really wants.
Just to chime in here, your results match my experience and expectations with LICM as well.  Between basic-aa and TBAA (specifically LLVM's implementation thereof), I haven't seen a lot of cases where an imprecision in the alias analysis prevents hoisting.

Yeah, at best, for LICM, it's just going to tell you the best place to insert runtime checks.  LICM has a specific goal, and it's usually not AA that prevents proving something loop invariant.  Most loads/stores are also either trivially loop invariant, or impossible to prove loop invariant.

More general PRE and GVN-like opts (basically, load elimination, load hoisting, dead store elimination, store sinking) are where i expect better AA to help the most off the top of my head.  But to figure out the gains, you'd really need to instrument at runtime to figure out what the theoretical maximum gain is (IE when things are equivalent that it is not proving equivalent).


 

*However*, if you're interested in LICM specifically, I have *definitely* seen cases where the precision of AliasSetTracker (our grouping of AA results to prevent O(n^2) queries) spuriously prevents hoisting.  AST could use some serious attention, both from an engineering standpoint and (possibly) from a theoretical one.


You already know my view on this one: It's going to be remarkably hard to make AST work the way folks want it and have it be incremental and completely agnostic of anything but the AA API.

It's just really hard if not provably impossible to do this incrementally and avoid duplicate work, and get precise results and ...


On the other hand, it's pretty easy if you basically say "I provide a list of all pointers and statements I care about, you make me some sets", and you let it figure out the answers upfront.

(it's also not clear to me why AST is the right abstraction for LICM to work on top of, but that's neither here nor there :P)



Philip Reames via llvm-dev

unread,
Mar 22, 2016, 2:51:35 PM3/22/16
to Daniel Berlin, llvm-dev, Jia Chen


On 03/21/2016 06:53 PM, Daniel Berlin wrote:


On Mon, Mar 21, 2016 at 6:28 PM, Philip Reames via llvm-dev <llvm...@lists.llvm.org> wrote:


On 03/21/2016 08:56 AM, Jia Chen via llvm-dev wrote:


I did some preliminary experiments with licm on C programs over the last weekend. I chose licm because intuitively that's the optimization that could have the biggest performance impact. The results suggested that tbaa, cfl-aa, scev-aa and globals-aa yield very little additional benefit over basic-aa. Again, both the methodology and the benchmark selection are very immature and the results need to be double-checked, but my hope is that by looking at how aa algorithms and their clients interact I may be able to get some hints on what kind of aa a compiler really wants.
Just to chime in here, your results match my experience and expectations with LICM as well.  Between basic-aa and TBAA (specifically LLVM's implementation thereof), I haven't seen a lot of cases where imprecision in the alias analysis prevents hoisting.

Yeah, at best, for LICM, it's just going to tell you the best place to insert runtime checks.  LICM has a specific goal, and it's usually not AA that prevents proving something loop invariant.  Most loads/stores are also either trivially loop invariant, or impossible to prove loop invariant.
Side note: The key limiting factor for load hoisting is most often being able to prove dereferenceability.  I only mention that because the OP had asked for where other optimizer changes might help. 

*However*, if you're interested in LICM specifically, I have *definitely* seen cases where the precision of AliasSetTracker (our grouping of AA results to prevent O(n^2) queries) prevents hoisting in spurious cases.  AST could use some serious attention, both from an engineering standpoint and (possibly) from a theoretical one. 


You already know my view on this one: It's going to be remarkably hard to make AST work the way folks want it and have it be incremental and completely agnostic of anything but the AA API.

It's just really hard if not provably impossible to do this incrementally and avoid duplicate work, and get precise results and ...


On the other hand, it's pretty easy if you basically say "I provide a list of all pointers and statements I care about, you make me some sets", and you let it figure out the answers upfront.
Er, this is actually fairly close to what the code does.  It just does it in a really oddly structured manner.  But yes, I agree, this code could be radically improved. 

(it's also not clear to me why AST is the right abstraction for LICM to work on top of, but that's neither here nor there :P)
Out of curiosity, what would you suggest instead?

For context, I have a patch in my tree which falls back to O(n^2) queries for small loops.  We found this to be rather important for the optimization of IR derived from our Java frontend. 

Philip

Philip Reames via llvm-dev

unread,
Mar 22, 2016, 2:55:58 PM3/22/16
to George Burgess IV, Hal Finkel, llvm-dev, Jia Chen
It sounds more and more like "get CFL-AA turned on by default" might be a viable GSoC project for the right student.  It would require someone with existing knowledge of AA and a willingness to debug nasty problems, but it sounds like there's definitely interest in the community in seeing this happen.

If the student finished early (unlikely), they could start on SCEV-AA as well. 

Philip

Mehdi Amini via llvm-dev

unread,
Mar 22, 2016, 3:11:17 PM3/22/16
to Philip Reames, llvm-dev, Jia Chen
FWIW, I'd be happy to help mentoring such a project if a student is interested.

During the last LLVM Dev Meeting (the US one), I think better AA was number one in the list of requested features during the BoF on "Sophisticated Program Analysis on LLVM IR"  ( http://devmtg15.llvm.org/event/4VNY/sophisticated-program-analysis-on-llvm-ir ). 
CC John, I don't remember if there was a summary of the BoF posted online.

-- 
Mehdi

Jia Chen via llvm-dev

unread,
Mar 22, 2016, 5:09:33 PM3/22/16
to Philip Reames, Daniel Berlin, llvm-dev
Also out of curiosity, why does LLVM choose to expose pointer analysis information as alias-query APIs rather than APIs that let the client query points-to sets? This question has bothered me ever since I started to look into LLVM's alias analysis framework.

It seems to me that alias queries may yield less precise results than points-to queries. To put it another way, it is easy to convert points-to information into aliasing information (just check the points-to set intersection for emptiness), but the reverse is much harder and sometimes not possible at all. The alias set tracker also introduces an additional source of imprecision: if a may alias b and b may alias c, it does not follow that a may alias c, yet the AST would merge them all into one set.
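
To illustrate the first point, the conversion is just an intersection-emptiness check (a sketch over abstract memory objects, not a proposal for a concrete LLVM API):

#include <set>

// However the analysis chooses to name its abstract memory objects.
using MemObject = const void *;
using PointsToSet = std::set<MemObject>;

// Points-to -> alias: do the two sets share an object?
bool mayAlias(const PointsToSet &A, const PointsToSet &B) {
  for (MemObject O : A)
    if (B.count(O))
      return true;
  return false;
}

// The reverse direction -- reconstructing points-to sets from pairwise
// alias answers -- is not possible in general.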

It also doesn't seem like alias information is cheaper to compute (e.g. an Andersen-style analysis computes points-to sets internally anyway) or cheaper to query (e.g. the O(n^2) query problem).

Jia Chen via llvm-dev

unread,
Mar 22, 2016, 5:33:35 PM3/22/16
to Philip Reames, dbe...@dberlin.org, llvm-dev
It's something that I am certainly interested in and qualified to do.

However, the way I read Daniel's reply in this thread is: "LLVM, in its current form, is unlikely to benefit from a more precise aa". He did mention that cfl-aa is "more understandable and maintainable", and is "fast enough", but nothing is said about the benefits. There were some discussions I could find back in 2014, which basically say that cfl-aa offers no significant improvement in performance. Things may have greatly improved since then, yet it is not clear to me what the current situation is.

With the benefits being unclear, a GSoC proposal on this topic may not look motivated enough.

Daniel Berlin via llvm-dev

unread,
Mar 22, 2016, 5:41:46 PM3/22/16
to Jia Chen, llvm-dev
No pointer analysis will show any benefits until it is on by default and tuning starts :)

or everything is tuned and then it's turned on, whichever way you want to do it.


Hal Finkel via llvm-dev

unread,
Mar 22, 2016, 5:46:47 PM3/22/16
to Jia Chen, llvm-dev
----- Original Message -----
> From: "Jia Chen via llvm-dev" <llvm...@lists.llvm.org>
> To: "Philip Reames" <list...@philipreames.com>, dbe...@dberlin.org
> Cc: "llvm-dev" <llvm...@lists.llvm.org>
> Sent: Tuesday, March 22, 2016 4:33:34 PM
> Subject: Re: [llvm-dev] Existing studies on the benefits of pointer analysis
>
>
> It's something that I am certainly interested in and qualified to do.
>
> However, the way I read Daniel's reply in this thread is: "LLVM, in
> its current form, is unlikely to benefit from a more precise aa". He
> did mention that cfl-aa is "more understandable and maintainable",
> and is "fast enough", but nothing is said about the benefits. There
> was some discussions I could find back in 2014, which basically says
> that cfl-aa offers no significant improvement in performance. Things
> may got greatly improved since then, yet it is not clear to me what
> the current situation is.
>
> With the benefits being unclear, a GSoC proposal on this topic may not
> look motivated enough.
>

I think there's wide interest in getting CFL-AA into a usable state. As I recall, the percentage of noalias results during self-hosting went up by a huge amount (for example). More noalias results obviously do not guarantee better code performance (the performance could even be worse because of register allocation / scheduling deficiencies). Nevertheless, we'll never do better if we're hamstrung by poor aliasing results.

Moreover, we currently rely heavily on BasicAA, which is a huge collection of local heuristics that catches many things, but is quite slow. It is slow because it needs to do local recursive walks for each pointwise query. If CFL-AA, in addition to increasing precision, provides us with a path away from our current BasicAA design toward something with better complexity, that's a definite win.

-Hal

Chris Lattner via llvm-dev

unread,
Mar 25, 2016, 9:09:02 PM3/25/16
to Jia Chen, llvm-dev
On Mar 21, 2016, at 10:00 AM, Jia Chen via llvm-dev <llvm...@lists.llvm.org> wrote:

So my question here is: what kind(s) of precision really justify the cost and what kinds do not?

Depends entirely on your applications.
 
Has anybody done any study in the past to evaluate what kinds of features in pointer analyses will benefit what kinds of optimization passes?
Yes.
Chris did many years ago, and I've done one more recently.

Great! Are they published somewhere? Can the data be shared somehow?


My results are published here:

Sadly, they are over a decade out of date, and some things have changed since then :-)

PS2: If no such evaluation exists in the past, I'd happy to do that myself and report back my findings if anyone here is interested.
I don't think any of the world is set up to make that valuable.

Nothing takes advantage of context sensitivity, flow sensitivity, etc.
I agree that nothing takes advantage of context sensitivity. But I would argue against that claim for flow sensitivity, field sensitivity, the heap model and external function models. Flow sensitivity is helpful when the optimization pass itself is flow-sensitive (e.g. adce, gvn), and field sensitivity is helpful when a struct contains multiple pointers. Heap sensitivity is basically what motivates Chris Lattner's PLDI'07 paper, and external function models are helpful since without them the analysis has to be extremely conservative and conclude that everything the external functions touch may alias each other.
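
A tiny made-up example of the field sensitivity point:

struct S { int *p; int *q; };

void f(S &s, int *a, int *b) {
  s.p = a;
  s.q = b;
  // A field-insensitive analysis collapses s.p and s.q into a single
  // abstract location for s, so it must report *s.p and *s.q as
  // may-alias even when a and b point to distinct objects. A
  // field-sensitive analysis keeps the two fields apart.
}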

I’m still a big fan of context sensitive, flow insensitive, unification based models.  Contrary to your claim, context sensitivity *is* useful for mod-ref analysis, e.g. “can I hoist a load across this call”?  Context sensitivity improves the precision of the mod/ref set of the call.

-Chris

Daniel Berlin via llvm-dev

unread,
Mar 25, 2016, 9:26:17 PM3/25/16
to Chris Lattner, llvm-dev, Jia Chen


I’m still a big fan of context sensitive, flow insensitive, unification based models.

CFL can emulate this in the same time bound.
 
 Contrary to your claim, context sensitivity *is* useful for mod-ref analysis, e.g. “can I hoist a load across this call”?  Context sensitivity improves the precision of the mod/ref set of the call.
 
-Chris

Yeah.
It depends entirely on your goal. In reality, often what you really want is something to say "hey, I've got this pointer over here, and I really want to hoist it up here. Do something, tell me if that is possible".

(This is actually why I'm a fan of CFL-AA. You can essentially make it as expensive or as cheap as you want, and it still does really well in practice, time-wise.)

Jia Chen via llvm-dev

unread,
Mar 26, 2016, 12:05:08 AM3/26/16
to Chris Lattner, llvm-dev
On 03/25/2016 08:08 PM, Chris Lattner wrote:
I’m still a big fan of context sensitive, flow insensitive, unification based models. 

Interestingly, I sometimes find the unification approach quite unsatisfactory. What happens there is that pointers with the same "depth" too often get merged together unless they are really unrelated to each other.
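
A tiny made-up example of the merging I mean:

void g() {
  int a, b;
  int *p = &a;
  int *q = &b;
  p = q; // a unification-based analysis merges the points-to classes of p and q
  // Afterwards the analysis reports both p and q as possibly pointing
  // to {a, b}, even though q can never point to a. An inclusion-based
  // (Andersen-style) analysis would keep pts(q) = {b}.
}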

Contrary to your claim, context sensitivity *is* useful for mod-ref analysis, e.g. “can I hoist a load across this call”?  Context sensitivity improves the precision of the mod/ref set of the call.

I'm not sure about that. How often does mod-ref information change across callsites? Isn't a good context-insensitive function summary good enough?

-Jia

Jia Chen via llvm-dev

unread,
Mar 26, 2016, 12:20:29 AM3/26/16
to Daniel Berlin, llvm-dev
On 03/25/2016 08:26 PM, Daniel Berlin wrote:
>
> Yeah.
> It depends entirely on your goal. In reality, often what you really
> want is something to say "hey, i've got this pointer over here, and i
> really want to hoist it up here. Do something, tell me if that is
> possible".
>
And this is one motivation of my current research: how can various
precision dimensions of a pointer analysis be effectively driven by its
client.

> (This is actually why I'm a fan of CFL-AA. You can essentially make it
> as expensive or as cheap as you want, and it still does really well in
> practice, time-wise.)
>

Again, "making it as expensive or not expensive as you want" is not
something unique about cfl-aa. With the right tweak one can also do it
with a traditional solver. The really interesting question here is how
to find out which locations are most likely to matter and worth making
expensive.

- Jia

Daniel Berlin via llvm-dev

unread,
Mar 26, 2016, 11:02:51 AM3/26/16
to Jia Chen, llvm-dev
With no offense meant, I've been writing solvers for 10+ years.  What you are suggesting is not just a tweak.  Doing it on a per-pointer basis is neither trivial nor easy to keep sound. If you think it is, I would suggest taking GCC's implementation and trying to do it. If what you say were true, production compilers would be doing it.  But they don't.

The other problem you mention is, IMHO, not actually as interesting.  We already have traditional methods (value profiling, etc.) of knowing which things matter.  Static prediction of this has a long history of over-promising and under-delivering.

Jia Chen via llvm-dev

unread,
Mar 26, 2016, 3:28:58 PM3/26/16
to Daniel Berlin, llvm-dev
On 03/26/2016 10:02 AM, Daniel Berlin wrote:
> With no offense meant, I've been writing solvers for 10+ years. What
> you are suggesting is not just a tweak. Doing it on a per pointer
> basis is neither trivial nor easy to keep sound. If you think it is, I
> would suggest taking GCC's implementation and trying to do it. If what
> you say was true, production compilers would be doing it. But they don't.

None taken. I didn't say it is an easy problem. Pointer analysis on a
C-ish language is always hard no matter what approach one takes, let
alone tuning the precision on a per-pointer basis.


>
> The other problem you mention is, IMHO, not actually as interesting.
> We already have traditional methods (value profiling, etc) of knowing
> which things matter. Static prediction of this has a long history of
> over promise and under delivery.

I agree that it is easier to get some quick hints with profiling, yet
like any other dynamic approach, profiling has its own drawbacks:
benchmark dependence, no soundness guarantees, etc. I believe it is
not yet time to conclude that we already have a perfect solution to
this problem and to pass a death sentence on any attempt to find a
good static prediction mechanism.

Chris Lattner via llvm-dev

unread,
Mar 28, 2016, 1:37:56 AM3/28/16
to Jia Chen, llvm-dev
It changes all the time.  Here’s a trivial example, assume no inlining and no AA other than the one in question:

std::vector<int> V1 = { 1, 2, 3 };
std::vector<int> V2 = { 4, 5, 6 };

V1.pop_back();    // Mutates *this

auto length = V1.size();

V2.pop_back();    // Mutates *this

auto zero = length - V1.size();

In this case, the compiler should “obviously” be able to CSE length, allowing further simplification to substitute zero with 0.

However, with a context-insensitive AA, both &V1 and &V2 end up aliasing the “this” pointer in std::vector::pop_back.  As such, without context sensitivity, you would falsely assume that “V2.pop_back();” could modify “V1”.  This is unfortunate, particularly for OO languages that frequently use static dispatch (like C++, Swift, and others).


That said, I have no idea what you’re referring to by "context-insensitive function summary”.  If you’re talking about something context sensitive, then ya, it can handle this.  :-)

-Chris

Jia Chen via llvm-dev

unread,
Mar 28, 2016, 11:10:35 AM3/28/16
to Chris Lattner, llvm-dev


On 03/28/2016 12:37 AM, Chris Lattner wrote:
It changes all the time.  Here’s a trivial example, assume no inlining and no AA other than the one in question:

std::vector<int> V1 = { 1, 2, 3 };
std::vector<int> V2 = { 4, 5, 6 };

V1.pop_back();    // Mutates *this

auto length = V1.size();

V2.pop_back();    // Mutates *this

auto zero = length - V1.size();

In this case, the compiler should “obviously” be able to CSE length, allowing further simplification to substitute zero with 0.

However, with a context-insensitive AA, both &V1 and &V2 end up aliasing the “this” pointer in std::vector::pop_back.  As such, without context sensitivity, you would falsely assume that “V2.pop_back();” could modify “V1”.  This is unfortunate, particularly for OO languages that frequently use static dispatch (like C++, Swift, and others).


That said, I have no idea what you’re referring to by "context-insensitive function summary”.  If you’re talking about something context sensitive, then ya, it can handle this.  :-)

For the example to work here, the CSE pass itself needs to be flow-sensitive and context-sensitive. I don't think that's how most optimizations in LLVM work. If it is, then I agree with everything you said. But if it isn't, there's no point in bumping up the context sensitivity just for the pointer analysis.

As Daniel mentioned earlier in this thread, the alias analysis framework in LLVM doesn't provide any APIs for flow-sensitive or context-sensitive queries. This design choice almost eliminates any possibility for a flow-sensitive or context-sensitive pointer analysis to be useful. Strangely, the set of APIs does support 1-CFA context-sensitive mod-ref queries (so I guess one could somehow reap some context-sensitive benefits out of them after all). To me that design incoherence looks confusing, but I'm pretty sure you know much better than I do why it should work that way :)


- Jia

Daniel Berlin via llvm-dev

unread,
Mar 28, 2016, 11:26:35 AM3/28/16
to Jia Chen, llvm-dev

For the example to work here, the CSE pass itself needs to be flow-sensitive and context-sensitive. I don't think that's how most optimizations in LLVM work. If it is, then I agree with everything you said. But if it isn't, there's no point in bumping up the context sensitivity just for the pointer analysis.

As Daniel mentioned earlier in this thread, the alias analysis framework in LLVM doesn't provide any APIs for flow-sensitive or context-sensitive queries. This design choice almost eliminates any possibility for a flow-sensitive or context-sensitive pointer analysis to be useful. Strangely, the set of APIs does support 1-CFA context-sensitive mod-ref queries (so I guess one could somehow reap some context-sensitive benefits out of them after all). To me that design incoherence looks confusing, but I'm pretty sure you know much better than I do why it should work that way :)
 
In general, I would never assume someone sat down and thought *really hard* about the design of an API in a piece of software to get it right for all time. It happens, but it's actually pretty rare.  One of the most limited resources most software teams have is time, and they often use that time to get done what they need, designing APIs based on current and easily predictable future needs.

The same thing is true about LLVM's APIs.  You shouldn't assume we sat down, thought really hard about AA by meditating upon a mountain, and then designed this API.  Instead, some thought was put into it, based on current and easily predicted needs, and then people moved on.

I can't say this was the wrong choice, either, given how long it has been since someone has even come along considering implementing context-sensitive or flow-sensitive AA (and having talked to tons of people over the years, this is *not* a chicken-and-egg problem; literally nobody has had a *strong* desire). It wouldn't make much sense to try to design an API without any implementations that want to use those APIs.

As that changes, I expect the AA API will be redesigned and reworked to accommodate the needs of those things.

- Jia

Hal Finkel via llvm-dev

unread,
Mar 28, 2016, 11:51:04 AM3/28/16
to Jia Chen, llvm-dev
From: "Jia Chen via llvm-dev" <llvm...@lists.llvm.org>
To: "Chris Lattner" <clat...@apple.com>
Cc: "llvm-dev" <llvm...@lists.llvm.org>
Sent: Monday, March 28, 2016 10:10:12 AM
Subject: Re: [llvm-dev] Existing studies on the benefits of pointer analysis



On 03/28/2016 12:37 AM, Chris Lattner wrote:
It changes all the time.  Here’s a trivial example, assume no inlining and no AA other than the one in question:

std::vector<int> V1 = { 1, 2, 3 };
std::vector<int> V2 = { 4, 5, 6 };

V1.pop_back();    // Mutates *this

auto length = V1.size();

V2.pop_back();    // Mutates *this

auto zero = length - V1.size();

In this case, the compiler should “obviously” be able to CSE length, allowing further simplification to substitute zero with 0.

However, with a context-insensitive AA, both &V1 and &V2 end up aliasing the “this” pointer in std::vector::pop_back.  As such, without context sensitivity, you would falsely assume that “V2.pop_back();” could modify “V1”.  This is unfortunate, particularly for OO languages that frequently use static dispatch (like C++, Swift, and others).


That said, I have no idea what you’re referring to by "context-insensitive function summary”.  If you’re talking about something context sensitive, then ya, it can handle this.  :-)

For the example to work here, the CSE pass itself needs to be flow-sensitive and context-sensitive. I don't think that's how most optimizations in LLVM work. If it is, then I agree with everything you said. But if it isn't, there's no point in bumping up the context sensitivity just for the pointer analysis.
Can you elaborate on what you mean by flow sensitive? We have a mod/ref query interface that can return answers specific to a particular instruction/call pair. The code above could easily live in a single basic block, and if we had function attribute deduction on the 'argmemonly' attribute, we could probably do this now.
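
Concretely, the per-pair query looks something like this (a sketch against the current interface; names approximate):

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/CallSite.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Would this particular call clobber the memory this particular load read?
static bool callMayClobberLoad(AAResults &AA, ImmutableCallSite CS,
                               const LoadInst *LI) {
  return AA.getModRefInfo(CS, MemoryLocation::get(LI)) & MRI_Mod;
}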

 -Hal


As Daniel mentioned earlier in this thread, the alias analysis framework in LLVM doesn't provide any APIs for flow-sensitive or context-sensitive queries. This design choice almost eliminates any possibility for a flow-sensitive or context-sensitive pointer analysis to be useful. Strangely, the set of APIs does support 1-CFA context-sensitive mod-ref queries (so I guess one could somehow reap some context-sensitive benefits out of them after all). To me that design incoherence looks confusing, but I'm pretty sure you know much better than I do why it should work that way :)


- Jia


Jia Chen via llvm-dev

unread,
Mar 28, 2016, 12:32:45 PM3/28/16
to Hal Finkel, llvm-dev
On 03/28/2016 10:50 AM, Hal Finkel wrote:

On 03/28/2016 12:37 AM, Chris Lattner wrote:
It changes all the time.  Here’s a trivial example, assume no inlining and no AA other than the one in question:

std::vector<int> V1 = { 1, 2, 3 };
std::vector<int> V2 = { 4, 5, 6 };

V1.pop_back();    // Mutates *this

auto length = V1.size();

V2.pop_back();    // Mutates *this

auto zero = length - V1.size();

Can you elaborate on what you mean by flow sensitive? We have a mod/ref query interface that can return answers specific to a particular instruction/call pair. The code above could easily live in a single basic block, and if we had function attribute deduction on the 'argmemonly' attribute, we could probably do this now.

 -Hal

What I meant is that CSE needs to be aware of the execution order, i.e. a call that mutates V1, such as V1.pop_back(), must not sit between the two calls to V1.size() if zero is to be 0. If there are more complicated control flows, CSE needs to be able to make the same kind of argument across basic blocks.

I haven't followed LLVM development closely enough to be familiar with how LLVM handles CSE. If what I said above is exactly how it works today, then yes, we could probably do this now.

But still, there is no API that answers "do p and q alias right before this instruction x?". The same can be done for mod-ref today (if I remember correctly this wasn't even the case before the AAResults class came into existence), but not for aliasing.
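
What I have in mind is a hypothetical query like the declaration below; nothing of this shape exists in the current interface:

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/Instruction.h"

// Hypothetical flow-sensitive query: do P and Q alias right before
// program point X? (Not part of LLVM's AA API.)
llvm::AliasResult aliasAt(const llvm::MemoryLocation &P,
                          const llvm::MemoryLocation &Q,
                          const llvm::Instruction *X);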

Chris Lattner via llvm-dev

unread,
Mar 29, 2016, 8:49:39 PM3/29/16
to Jia Chen, llvm-dev
On Mar 28, 2016, at 9:32 AM, Jia Chen via llvm-dev <llvm...@lists.llvm.org> wrote:
Can you elaborate on what you mean by flow sensitive? We have a mod/ref query interface that can return answers specific to a particular instruction/call pair. The code above could easily live in a single basic block, and if we had function attribute deduction on the 'argmemonly' attribute, we could probably do this now.

 -Hal

What I meant is that CSE needs to be aware of the execution order, i.e. a call that mutates V1, such as V1.pop_back(), must not sit between the two calls to V1.size() if zero is to be 0. If there are more complicated control flows, CSE needs to be able to make the same kind of argument across basic blocks.

I haven't followed LLVM development closely enough to be familiar with how LLVM handles CSE. If what I said above is exactly how it works today, then yes, we could probably do this now.

The existing LLVM passes and infrastructure already do this.

But still, there is no API that answers "do p and q alias right before this instruction x?". The same can be done for mod-ref today (if I remember correctly this wasn't even the case before the AAResults class came into existence), but not for aliasing.

This is already handled correctly by LLVM.  The sequence in question is:

auto length = V1.size();

“V1.size()” (which is effectively a load) is available after this instruction.

V2.pop_back();    // Mutates *this

Alias analysis is queried to say “does std::vector::pop_back(&V2) mod/ref &V1”?

If it returns mod, then “length” is removed from the available set.

auto zero = length - V1.size();

The access to V1.size() is also a load; if it is in the available set, it is replaced with the SSA value for “length”.  A later simplification pass turns "length - length" into 0, allowing the original load to be removed as dead.
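
In pseudocode, the scheme is roughly the following (a sketch of the idea, not the actual EarlyCSE code; Available and replaceAllUsesAndErase are made-up helpers):

for (Instruction &I : BB) {
  if (auto *LI = dyn_cast<LoadInst>(&I)) {
    MemoryLocation Loc = MemoryLocation::get(LI);
    if (Value *Earlier = Available.lookup(Loc))
      replaceAllUsesAndErase(LI, Earlier); // CSE: reuse the earlier load
    else
      Available.insert(Loc, LI);           // this load becomes available
  } else if (auto CS = ImmutableCallSite(&I)) {
    // Kill every available location the call might write.
    Available.removeIf([&](const MemoryLocation &Loc) {
      return AA.getModRefInfo(CS, Loc) & MRI_Mod;
    });
  }
}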

-Chris

Chris Lattner via llvm-dev

unread,
Mar 29, 2016, 8:52:11 PM3/29/16
to Daniel Berlin, llvm-dev, Jia Chen
FWIW, DS-AA is context sensitive in this way, and LLVM definitely did take advantage of it, even circa 2005.  This is the whole point of chapter 4 of http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html 

-Chris
