For the last few months, since the GCC Cauldron, I've been researching and helping
plan multi-threading for GCC. I've posted my GCC ideas here:
https://docs.google.com/document/d/1po_RRgSCtRyYgMHjV0itW8iOzJXpTdHYIpC9gUMjOxk/edit
as I've heard there is interest on the LLVM side as well. I've talked to
some of the IBM folks from the Toronto area about it face to face, as they
have more middle-end and back-end knowledge than me.
Not sure if anyone is interested in my ideas or wants me to extend my
help on the LLVM side,
Nick
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
You might want to check out MLIR: its pass manager is already automatically and implicitly multithreaded.
-Chris
On 2/28/20 12:19 AM, Chris Lattner wrote:
> Hi Nicholas,
>
> You might want to check out MLIR: its pass manager is already automatically and implicitly multithreaded.
>
> -Chris
Chris,
I was aware that LLVM was moving to MLIR at some point due to this. I'm
curious as to how MLIR deals with IPO, as that's the problem I was running
into. Even if you have pipelines, what guarantees do we have about IPO, or
are there any? For example, if an analysis pass runs after a loop pass to
get good data for heuristics, does MLIR ensure that? The problem isn't
getting a pipeline so much as IPO issues in terms of rescheduling in a
pipeline or incorrectly splitting up into pipelines. I've yet to find a good
solution on the GCC side as well, and it seems that there will be issues
with this no matter what; I'm not sure what trade-offs the LLVM side is
willing to make.
The other issue is that graph data structures, to my knowledge, do not allow
insertion of multiple nodes at the same time, nor is it clear how to scale
the graphs for callers or SSA. I'm not sure if you have encapsulated the
operators and data structures in a series of classes, as that would be the
first part. The other part is figuring out how to link and interact with
build systems to launch threads from make -j or similar.
Thanks for the pointer to MLIR, though; maybe my IPO concern is not really
there, but after reading parts of it, it seems to still be an issue, if a
somewhat smaller one. Thanks for the prompt response,
Nick
Johannes,
That's fine. Unfortunately, as the GCC side knows as well, I'm a student, so
I will only be working on this in my spare time. We did get somewhere with
my paper here:
https://docs.google.com/document/d/1po_RRgSCtRyYgMHjV0itW8iOzJXpTdHYIpC9gUMjOxk/edit
I'm currently looking into the SSA trees, but we can discuss what you want
me to start researching first. The research does need to happen, including
profiling, to avoid mistakes in choosing major implementation details.
Otherwise, just ping me and the list when you're ready, or other people can,
>
> If we want to look into multi-threading for the existing LLVM
> infrastructure there are certainly possibilities. Computing (function)
> analyses results prior to CGSCC and Module passes for example. When it
> comes to transformations it is less straight forward but there is still
> interest in looking into it.
>
> I'm traveling but if you are interested we can discuss this further next
> week.
>
> Cheers,
> Johannes
On Feb 29, 2020, at 2:08 PM, David Blaikie <dbla...@gmail.com> wrote:
> I'm curious as to how MLIR deals with IPO as that's the problem I was running into.
> FWIW I believe LLVM's new pass manager (NPM) was designed with parallelism and the ability to support this situation (that MLIR doesn't? Or doesn't to the degree/way in which the NPM does). I'll leave it to folks (Chandler probably has the most context here) to provide some more detail there if they can/have time.
Historically speaking, all of the LLVM pass managers have been designed to support multithreaded compilation (check out the ancient history of the WritingAnLLVMPass doc if curious).
The problem is that LLVM has global use-def chains on constants, functions and globals, etc, so it is impractical to do this. Every “inst->setOperand” would have to be able to take locks or use something like software transactional memory techniques in their implementation. This would be very complicated and very slow.
MLIR defines this away from the beginning. This is a result of the core IR design, not the pass manager design itself.
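To make the cost concrete, here is a hypothetical miniature model (my own sketch, not LLVM's actual classes or a proposed patch): once a value's use list can be shared across threads, every operand update has to synchronize on it, even in code that never touches globals or constants.

```cpp
#include <algorithm>
#include <mutex>
#include <vector>

// Hypothetical mini-model: a value whose use list is shared process-wide,
// so every operand update must take its lock.
struct MiniValue {
  std::vector<struct MiniUser *> Uses;
  std::mutex UsesLock;
  int LockAcquisitions = 0; // counts the synchronization cost paid
};

struct MiniUser {
  MiniValue *Operand = nullptr;

  // Analogous in spirit to inst->setOperand(): every caller pays for the
  // locking once use lists may be mutated concurrently.
  void setOperand(MiniValue *V) {
    if (Operand) {
      std::lock_guard<std::mutex> G(Operand->UsesLock);
      ++Operand->LockAcquisitions;
      auto &U = Operand->Uses;
      U.erase(std::find(U.begin(), U.end(), this));
    }
    Operand = V;
    if (V) {
      std::lock_guard<std::mutex> G(V->UsesLock);
      ++V->LockAcquisitions;
      V->Uses.push_back(this);
    }
  }
};
```

The point of the sketch is the counter: even this trivial model takes a lock on every single operand write, which is exactly the across-the-board overhead being described.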
-Chris
When the number of compiles exceeds the number of cores, so that all the cores are utilized, nothing is gained by trying to multi-thread the individual compiles. In fact, loading the cores with more threads or tasks than there are cores will reduce compiling efficiency because of time slicing. And sequencing through more tasks rather than fewer, even when the cores are not overloaded, will reduce compiling efficiency because more tasks have to be loaded onto and unloaded from the cores.
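The saturation argument above can be sketched numerically. The toy heuristic below is my own framing (hypothetical names, not anyone's proposal): given `cores` hardware cores and `jobs` compiles already running in parallel from the build system, intra-compile threads only help while the cores are undersubscribed.

```cpp
#include <algorithm>

// Toy heuristic: how many worker threads a single compile could usefully
// spawn, given `cores` hardware cores and `jobs` compiles already running
// in parallel (e.g. from make -j). Once jobs >= cores, extra threads only
// add time-slicing overhead, so the answer degrades to 1.
unsigned threadsPerCompile(unsigned cores, unsigned jobs) {
  if (jobs == 0)
    jobs = 1; // treat "no parallel build" as one job
  return std::max(1u, cores / jobs);
}
```

So a lone compile on a 16-core box could use 16 threads, while 32 parallel compiles on the same box should each stay single-threaded.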
In addition to what Chris said, there's also the case of large TUs / Unity files. Given that currently the compilation of a single TU is not multi-threaded, you can get "spikes" like the one below, where only one core (out of many) is working. One minute wasted, when I have many cores available:
[screenshot omitted: build timeline showing a long single-core spike]
For this specific case, not having Unity files makes the build uniform (no spikes like the one above), but it takes 10x more time to compile (20k TUs and 25k .h files compiled, which reduce down to ~600 Unity).
To fix the spike, you then have to resort to an iterative optimization algorithm running nightly to find the best trade-off between size of unity and build times. This complicates things further.
The point being, +1 for multi-threading the compiler :-)
Alex.
From: llvm-dev <llvm-dev...@lists.llvm.org>
On behalf of Chris Lattner via llvm-dev
Sent: March 2, 2020 1:08 PM
To: Neil Nelson <nne...@infowest.com>
Cc: llvm...@lists.llvm.org
Subject: Re: [llvm-dev] Multi-Threading Compilers
Very good. I just saw that Alex is making some good points.
The TBB pages earlier are an excellent reference for multi-threading or for TBB multi-tasking. They illustrate issues that a good threading design should consider that are not commonly recognized when embarking on that design. I would say the task scheduler is their most comprehensive approach.
Clearly, I am not familiar enough with LLVM to be making any definite prescription and recommend your more knowledgeable judgment.
But in the spirit of attempting a more complete perspective, we may consider that a threading design for LLVM is not simple, and I suspect not nearly as simple as has been explored to this point (not saying that the current ideas are simple; rather, I do not fully understand them).
For example, you remarked there are use cases where a multi-threaded compile would be useful, and having a realistic appreciation for what those use cases may be is important. But then on that point, if we were to have threading on the compile to an object file, would we then overload the cores, making them slower, when using ninja to compile LLVM? How would the compile threading take into account this other use case, if that was something to consider?
Do we encumber the LLVM design with threading such that the trade-off for that design against its use-case benefits is not well justified? My sense is that if threading was judged a reasonable goal, a preliminary design would be presented after some research that would initially complement non-threading goals, a general improvement that would yield a better track for threading later. This reflects the sense that a good threading design is not simple and will impact a significant portion of LLVM. And it could well be that that is what is happening now.
How would threading impact future LLVM development? My sense is that some reasonable portion of future development would be made more difficult.
You mentioned that single object compiling would have a benefit. Here are some factors that could mitigate that benefit.
Although core speeds are not expected to increase, remaining in the 4-5 gigahertz range, IPC (instructions per cycle) is expected to increase at a slow rate. Over time there will be a natural reduction in compile time for some benchmark compile.
There is also a gain in compile time for small source files as against larger source files, and some argument could be made, with regard to design, for some maximum source file size to reduce complexity, which would also help keep compile time down. Is there a reason to expect an increase in source file size that would increase compile time?
Noting the LLVM compile test from my prior post gave an average compile time of 14*0.1532 = 2.1448 seconds, is compile time a significant or marginal issue? What would be the target for an average compile time?
Neil Nelson
I can't say anything about the GCC side, but this isn't a particularly novel aspect of the MLIR pass manager. In many ways, the pass manager is the easiest/simplest part of the multi-threading problem. The bigger problem is making sure that the rest of the compiler infrastructure is structured in a way that is thread-safe, or can be made thread-safe. This is why most of the discussion is based around how to model things like constants, global values, etc. When I made MLIR multi-threaded a year ago, a large majority of my time was spent outside of the pass manager. For a real example, I spent much more time just on multi-threaded pass timing than making the pass manager itself multi-threaded.
+Chandler & Alina for new pass manager context
On Thu, Feb 27, 2020 at 9:45 PM Nicholas Krause via llvm-dev <llvm...@lists.llvm.org> wrote:
On 2/28/20 12:19 AM, Chris Lattner wrote:
> Hi Nicholas,
>
> You might want to check out MLIR: its pass manager is already automatically and implicitly multithreaded.
>
> -Chris
Chris,
I was aware that LLVM was moving to MLIR at some point due to this.
FWIW: I don't /think/ that's quite an accurate representation of the situation. MLIR is now part of the LLVM project, but so far I haven't seen any discussions of moving the core LLVM project itself to MLIR (admittedly I'm not super clear on the details of MLIR or what that'd look like - I would've guessed that LLVM would itself be a lower level that a given MLIR stack might lower to, etc). There's been some discussion of folks interested in experimenting with using MLIR in Clang, though that's not a project goal right now.
I'm curious as to how MLIR deals with IPO as that's the problem I was running into.
FWIW I believe LLVM's new pass manager (NPM) was designed with parallelism and the ability to support this situation (that MLIR doesn't? Or doesn't to the degree/way in which the NPM does). I'll leave it to folks (Chandler probably has the most context here) to provide some more detail there if they can/have time.
Oops.. sorry, hit send too soon..
Picking on this point, perhaps a bit off-topic for the whole discussion.
> When I made MLIR multi-threaded a year ago, a large majority of my time was spent outside of the pass manager. For a real example, I spent much more time just on multi-threaded pass timing than making the pass manager itself multi-threaded.
I have recently realized that for the purpose of multi-threaded pass timing the currently existing LLVM timers are hardly suitable, since they measure per-process time instead of per-thread time (and there seem to be no portable LLVM interfaces for a per-thread time query :( ).
At a first glance it seems that in your MLIR timing examples all the times are also per-process.
How do you handle cases when half of your threads are doing something else?
And if you handle it per-thread - can you point me to the code doing that, pls :)
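For what it's worth, on POSIX systems a per-thread timer can be built on the `CLOCK_THREAD_CPUTIME_ID` clock. This is only a sketch of the idea; it is not portable everywhere, which is presumably part of why no portable LLVM interface exists.

```cpp
#include <ctime>

// Returns CPU time consumed by the *calling thread*, in seconds, or a
// negative value if the clock is unavailable. Unlike a per-process clock,
// time spent by other threads does not show up here, which is exactly the
// property needed for multi-threaded pass timing.
double threadCpuSeconds() {
  timespec TS;
  if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &TS) != 0)
    return -1.0;
  return static_cast<double>(TS.tv_sec) +
         static_cast<double>(TS.tv_nsec) * 1e-9;
}
```

A pass timer would sample this at the start and end of each pass on the thread running it, rather than sampling a process-wide clock.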
On Feb 28, 2020, at 8:56 AM, Johannes Doerfert <johannes...@gmail.com> wrote:
> On 02/28, Nicholas Krause via llvm-dev wrote:
> > Anyhow, what is the status, and what parts are we planning to move to MLIR in LLVM/Clang? I've not seen any discussion on that other than starting to plan for it.
> As far as I know, there is no (detailed/discussed/agreed upon/...) plan to move any existing functionality in LLVM-Core or Clang to MLIR. There are some people that expressed interest, and there is Chris's plan on how the transition could look.
Yep, agreed, I gave a talk a couple days ago (with Tatiana) with a proposed path forward, but explained it as one possible path. We’ll share the slides publicly in a few days after a couple things get taken care of.
> For example, you remarked there are use cases, and having a realistic
> appreciation for what use cases there may be is important, where a
> multi-threading compile would be useful.
LTO could significantly benefit.
> There is also a gain in compile time for small source files as against
> larger source files and some argument with regard to design could be
> made for some maximum source file size to reduce complexity that would
> also address keeping compile time down. Is there a reason to expect an
> increase in source file size that would increase compile time?
One data point: we see many customer codes with generated sources.
Those sources just keep growing and growing.
> Noting the LLVM compile test from my prior post gave an average compile
> time of 14*0.1532 = 2.1448 seconds, is compile time a significant or
> marginal issue? What would be the target for an average compile time?
It is significant for us. Over the years we have even added special
options just to speed up compile time for customers' develop-test-debug
cycles.
-David
Compiler folks are very busy people, as there aren't as many of us unfortunately, so no need to

Hey,
Apologies for the wait, everything right now is going crazy..
I think we should early on move this conversation to the llvm-dev list, but generally speaking we can see three options here:
1) parallelize single passes or a subset of passes that are known to not interfere, e.g. the Attributor,
2) parallelize analysis pass execution before a transformation that needs them,
3) investigate what needs to be done for a parallel execution of many passes, e.g. how can we avoid races on shared structures such as the constant pool.

I was researching this on and off for the last few months in terms of figuring out how to make the pass manager itself async. It's not easy and I'm not even
As mentioned on the other thread, the main challenge here is in the
use lists of constant values (which includes also globals and
functions). Right now, those are linked lists that are global for an
entire LLVMContext. Every addition or removal of a use of a constant
has to touch them, and doing such fine-grained locking doesn't seem
like a great idea.
So this is probably the biggest and seemingly most daunting thing
you'd have to address first, but it's feasible and seems like a good
idea to evolve LLVM IR in a direction where it ends up looking more
like MLIR and can avoid these locks.
Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.
GCC has the same issues in terms of certain core structures, so I'm not
really surprised.
>
> So this is probably the biggest and seemingly most daunting thing
> you'd have to address first, but it's feasible and seems like a good
> idea to evolve LLVM IR in a direction where it ends up looking more
> like MLIR and can avoid these locks.
Sure, that makes sense. I will see what Johannes wants to start with.
Nick
>
> Cheers,
> Nicolai
I think addressing this issue first makes sense. I would however start
by determining the actual impact of different design choices here. I
mean, do we know locks will be heavily contended? If I had to guess I'd
say most passes will not create or modify functions nor add or remove
calls. I further guess that passes which create/query llvm::Constant
values will do so for ConstantInt between -1 and 2, I mean most of the
time. This might be wrong but we should for sure check before we
redesign the entire constant handling (as MLIR did). My suggestion is to
profile first. What we want is to monitor the use-list of constants but
I'm not sure if that is easy off the top of my head. What we can do
easily is to print a message in the methods that are used to "create" a
constant, thus the constructors (of llvm::Constant) and the
ConstantXXX::get() methods. We print the pass names and these "constant
generation" messages in a run of the test suite and analyze the result.
What passes create constants, how often, which (kind of) constants, etc.
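The instrumentation described above could be as simple as a counter keyed by (pass name, constant value), bumped from a hook in the get-style methods. The sketch below is hypothetical (my own names, not actual LLVM code); in practice the hook would be called from `ConstantInt::get()` and friends and aggregated over a test-suite run.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical profiling hook: record which pass requested which constant,
// then aggregate over a test-suite run to see what passes create constants,
// how often, and which kinds.
struct ConstantGetStats {
  // (pass name, constant value) -> number of get() calls
  std::map<std::pair<std::string, int64_t>, uint64_t> Counts;

  void recordConstantGet(const std::string &PassName, int64_t Value) {
    ++Counts[{PassName, Value}];
  }

  uint64_t count(const std::string &PassName, int64_t Value) const {
    auto It = Counts.find({PassName, Value});
    return It == Counts.end() ? 0 : It->second;
  }
};
```

Dumping `Counts` at exit would directly answer the question of whether requests cluster around a handful of small integers.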
We should also determine if any pass ever walks the use list of
constants. I know we do it for global symbols but I don't know if we do it
for others. That said, I think it is sensible to distinguish global
symbols and other constants at some point because (I think) we use them
differently.
From there we decide how to move forward. Localized constants, as MLIR
has them, some locking (or similar solution), or maybe just restrictions
on the parallel execution of passes.
I hope this makes some sense.
Cheers,
Johannes
If you're talking about this class:
https://llvm.org/doxygen/classllvm_1_1Constant.html
then yes, that makes sense in terms of getting the data. The only three
questions I would ask are:
1. There are methods in pass classes for getting their names. However, the
problem would be adding a switch statement to check what we're passing into
the constructor. I'm not sure what the top-level getName etc. class is; if
there is one, I'm assuming Value or ModulePass. That makes it easier, as you
don't have to wonder about data types but can walk down the virtual
hierarchy for this. Names, at least to my knowledge, are a top-level feature
and we should just use them there. So just do:
XXX->getName(), where XXX is the top-level class(es).
2. Lock contention is rather vague. For example, we could have multiple
readers and few writers. So we would also need to figure that out in order
to choose the right locks. Thoughts?
3. Does LLVM have something like the GCC compile farm? At some point we're
going to need to test on larger machines for scaling and race conditions.
The only other thing is that LLVM seems to be late in the rc cycle. It
doesn't matter to me, but do we want to wait for LLVM 10 final to be
released, as this is new work? I don't think it matters, frankly, as this is
a long way from being in mainline anytime soon.
Thanks, and sorry if I'm mistaken,
Nick
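On the readers/writers question in point 2: if profiling showed a read-mostly access pattern, a reader/writer lock would be the natural fit. A minimal, purely illustrative sketch (not a proposed LLVM change):

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Illustrative only: a use list guarded by a reader/writer lock, assuming
// the many-readers / few-writers pattern speculated about above. Readers
// can scan the list concurrently; writers take the lock exclusively.
class GuardedUseList {
  std::vector<int> Uses;
  mutable std::shared_mutex M;

public:
  std::size_t size() const {
    std::shared_lock<std::shared_mutex> L(M); // shared: readers don't block each other
    return Uses.size();
  }
  void addUse(int U) {
    std::unique_lock<std::shared_mutex> L(M); // exclusive: writers serialize
    Uses.push_back(U);
  }
};
```

Whether this beats a plain mutex depends entirely on the measured read/write ratio, which is exactly why the profiling step comes first.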
On Mar 19, 2020, at 2:31 PM, Johannes Doerfert <johannes...@gmail.com> wrote:
> I think addressing this issue first makes sense. I would however start by determining the actual impact of different design choices here. I mean, do we know locks will be heavily contended? If I had to guess I'd say most passes will not create or modify functions nor add or remove calls.
On Mar 20, 2020, at 12:34 PM, Nicholas Krause <xero...@gmail.com> wrote:
The problem isn't constants or functions themselves, it is that they are instances of llvm::Value. Everything that walks a use/def list would have to run code that checks for this, and every call to inst->setOperand() would have to do locking or conditional locking. This would be a significant across-the-board slowdown for the compiler even when globals and constants are not involved.
-Chris
Hi Chris,
Thanks for the comments. Sure, that may be true, but I would prefer to get real data to see how the shared data actually looks. We can
argue over it at a theoretical level; however, with any real multi-threading of a code base, real data is required. I'm not sure about
setOperand() or other llvm::Value issues, but real data is more my concern. Sure, it may turn out that is the case, but I would prefer
to have data, as guessing about locking/scaling in multi-threaded code, or just assuming issues, is always bad.
Yes.
> Then yes that makes sense in terms of getting the data. The only three questions I would ask are:
> 1. There are methods in pass classes for getting their names. However the problem would be adding a switch statement to check what we're passing into the constructor. I'm not sure what the top-level getName etc. class is. Therefore if there is one, I'm assuming Value or ModulePass. This makes it easier as you don't have to wonder about data types but walk down the virtual hierarchy for this. Names, at least to my knowledge, are a top-level feature and we should just use them there. So just do:
> XXX->getName() where XXX is the top-level class(es).
You lost me. I was thinking to add a print in all the interesting
llvm::Constant classes. If you run opt (or clang) with
-debug-pass=Details you should see what is printed interleaved with
messages what passes are run. So you can attribute passes to actions on
constants. Exactly this might not work but there should be a way to do
this.
> 2. Lock contention is rather vague. For example we could have multiple readers and few writers. So we would also need to figure that out in order to choose the right locks. Thoughts?
A lot, but at the end it depends on what we are dealing with. Let's
postpone this discussion.
> 3. Does LLVM have something like the GCC compile farm? At some point we're going to need to test on larger machines for scaling and race conditions.
We have buildbots but most people have access to larger machines.
Testing and measuring clang on a large node or on multiple nodes will be
the least of our problems ;)
> The only other thing is LLVM seems to be late rc so it doesn't matter
> to me but do we want to wait for LLVM 10 final to be released as this
> is new work. I don't think it matters frankly as this is way off being
> in mainline anything soon.
Agreed. We need to gather data. Determine options forward, present those
options to the community and start a discussion, prototype options to
see how they pan out, hopefully finish the discussion, implement the
desired solution, go through review and testing, merge and enable it, in
roughly this order.
Okay, let's actually think through how practical that is. The class
hierarchy is:
Value
- Argument
- BasicBlock
- InlineAsm (huh, why is that not a constant?)
- MetadataAsValue (+ children)
- User
-- Instruction (+ children)
-- Constant
--- ConstantData (undef, token none, literals)
--- ConstantAggregate (non-literal aggregates)
--- BlockAddress
--- ConstantExpr
--- GlobalValue (+ children)
-- Operator (utility / facade, i.e. not real)
-- DerivedUser (extension point used by MemorySSA)
It seems to me that the only points of this hierarchy that are
guaranteed to be function-local are Argument and Instruction.
Everything else could end up having uses from multiple functions
and/or initializers of globals. DerivedUser seems particularly special
-- but it's only used by MemorySSA, and that appears to be
function-local?
Of the values that can have global references to them, I believe that
most could just be immortal and without use lists. I'd say this
applies to:
- InlineAsm
- MetadataAsValue
- Constant other than GlobalValue
This leaves as values with global references that _cannot_ be immortal:
- GlobalValue (+ children)
- BasicBlock
And values that will have use lists, and only local references are:
- Argument
- Instruction
So the following does seem like a feasible minimal step forward:
1. Build a mechanism to break the use dependencies for GlobalValue and
BasicBlock, i.e. allow immortal BlockAddress and ConstantGlobalValue
values while _also_ allowing us to delete GlobalValues and
BasicBlocks.
For this mechanism, we can let ourselves be inspired by
mlir::SymbolRefAttr, which uses strings for the linkage.
Alternatively (and perhaps preferably), we could use a weak_ptr-like
mechanism. This can be very efficient, since each GlobalValue and
BasicBlock only ever needs to have a single instance of
ConstantGlobalValue and BlockAddress referring to it, respectively, so
a simple back-link is sufficient.
2. Change setOperand to only update use lists of Argument and
Instruction. All other use lists will always be empty -- no special
handling in the use list accessors is required, but we should place
assertions there to assert that use lists are not accidentally used on
other value types.
How does that sound for a start?
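A toy rendering of step 2 (hypothetical names and a deliberately tiny hierarchy, not a patch): only values that are known to be function-local maintain use lists, so storing a constant operand touches no shared state at all.

```cpp
#include <vector>

// Hypothetical mini-hierarchy sketching the proposal: Argument/Instruction
// analogues keep use lists; constants are immortal and track no uses.
struct TinyValue {
  virtual ~TinyValue() = default;
  virtual bool tracksUses() const { return false; }
  std::vector<TinyValue *> Uses; // only populated when tracksUses()
};

struct TinyInstruction : TinyValue {
  bool tracksUses() const override { return true; }
  TinyValue *Operand = nullptr;

  // setOperand only maintains use lists for values that track them, so no
  // locking is ever needed when the operand is a constant.
  // (Removal from the old operand's use list is elided for brevity.)
  void setOperand(TinyValue *V) {
    Operand = V;
    if (V && V->tracksUses())
      V->Uses.push_back(this);
  }
};

struct TinyConstant : TinyValue {}; // immortal, no use list maintained
```

In the real IR the accessors would additionally assert, as proposed above, so accidental use-list walks on constants are caught rather than silently returning an empty list.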
> When creating a constant we could provide the function in which the
> constant is used. We have a ConstantExpr wrapper per function to make
> the constants local (similar to what I understand the MLIR design is).
> We would not get pointer comparison between constants used in different
> functions but I suspect the places we need this we can specialize
> accordingly.
>
> We wouldn't need to lock every setOperand call but only the ones that
> set a constant operand.
MLIR distinguishes between "attributes" and "operations". So you'd
define a constant by placing an operation (aka instruction) in each
function that has the actual constant as an attribute.
mlir::Attributes don't have use lists at all, so they're like
llvm::Metadata in that sense.
There is a certain elegance to having explicit instructions to import
constants into the llvm::Function context, but also ugliness, as IR
will then be full of "constant" instructions, unlike today. Plus,
changing all existing LLVM IR to have those is an enormous amount of
churn. I'm not totally opposed to doing this, but I'd lean towards
smaller, incremental steps.
Regardless, there are some related cleanups that seem like they would
be useful. For example, getting rid of llvm::ConstantExpr seems like a
very good idea to me, though somewhat orthogonal to the problem of
multi-threading. While we're at it, we could get rid of
llvm::ConstantAggregate and move llvm::Constant to no longer be an
llvm::User. This may impact how global value initializers work.
Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.
On Mar 25, 2020, at 12:52 AM, Doerfert, Johannes <jdoe...@anl.gov> wrote:
Some random thoughts:
If no pass ever walks the use list of a constant, except globals which
we could handle differently, we could get rid of their use-list or
overwrite their use-list interface functions to make them no-ops.
On Mar 25, 2020, at 8:35 AM, Nicolai Hähnle <nhae...@gmail.com> wrote:
> MLIR distinguishes between "attributes" and "operations". So you'd define a constant by placing an operation (aka instruction) in each function that has the actual constant as an attribute. mlir::Attributes don't have use lists at all, so they're like llvm::Metadata in that sense.
Since analyses do not modify the IR, these look like low-hanging fruit. At
least the legacy pass manager knows in advance which analyses a pass needs,
and could run all of them in parallel before the passes themselves.
Michael
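That suggestion can be sketched as a trivially parallel fan-out (illustrative only; a real pass manager would also have to handle analysis dependencies and result caching):

```cpp
#include <functional>
#include <future>
#include <vector>

// Sketch: run all (read-only) analyses a pass declared it needs in
// parallel, then join before handing the results to the serial
// transformation. Safe only because analyses do not mutate the IR.
std::vector<int>
runAnalysesInParallel(const std::vector<std::function<int()>> &Analyses) {
  std::vector<std::future<int>> Futures;
  for (const auto &A : Analyses)
    Futures.push_back(std::async(std::launch::async, A));

  std::vector<int> Results;
  for (auto &F : Futures)
    Results.push_back(F.get()); // join before the transformation runs
  return Results;
}
```

Each `int` here stands in for an arbitrary analysis result; the shape of the fan-out/join is the point, not the payload type.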
I doubt it is as simple as that, and we should differentiate more
before we make blanket statements.
> Today, you can walk the use-list of any operand to an instruction.
> This would be broken by this change,
First, you would still be able to walk the use-list of any operand. So nothing
breaks just yet.
> which would make it much easier to write buggy/incorrect compiler code
> and passes.
I argued (implicitly) above that the uses of a Constant[Int/Float/...]
are really not interesting if no-one ever walks them. Let's assume you
walk the uses and we "removed" the use list so there are none, what does
that mean? I'd say, nothing much. If you inspect the Value and see it's
a Constant you have to assume it has Aliases that denote the same value
so direct uses are not really interesting anyway. If you inspect further
and see ConstantInt/Float/... you can deal with the missing use list. If
you don't inspect the Value you cannot really make much of an empty use
list, or can you? I doubt we ever call RAUW on a ConstantInt/Float/...
Feel free to elaborate on your concerns.
Thanks,
Johannes
If I understand correctly, you are suggesting that you can walk it, but you don’t get all the uses. This is extremely dangerous, it would be better to abort.
>> which would make it much easier to write buggy/incorrect compiler code
>> and passes.
>
> I argued (implicitly) above that the uses of a Constant[Int/Float/...]
> are really not interesting if no-one ever walks them.
There is a difference between “no one ever walks them in practice” and “no one can walk them”. :-)
> Let's assume you
> walk the uses and we "removed" the use list so there are none, what does
> that mean. I'd say, nothing much. If you inspect the Value and see it's
> a Constant you have to assume it has Aliases that denote the same value
> so direct uses are not really interesting anyway. If you inspect further
> and see ConstantInt/Float/... you can deal with the missing use list. If
> you don't inspect the Value you cannot really make much of an empty use
> list, or can you? I doubt we ever call RAUW on a ConstantInt/Float/...
>
> Feel free to elaborate on your concerns.
Have you tested your thesis that no one walks the use-def chains? You could just add an assertion to the compiler (to methods like use_begin(), hasOneUse(), etc.) that aborts on constants, then run the test suite.
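A toy model of the suggested experiment (hypothetical names, not the real llvm::Value API): the use-list accessors trap on scalar constants instead of silently returning an empty list, so the test suite flags every caller that still walks them.

```python
# Toy model of the experiment: the use-list accessors assert when
# called on a scalar constant rather than returning an empty list.
class Value:
    def __init__(self, is_scalar_constant=False):
        self._uses = []
        self._is_scalar_constant = is_scalar_constant

    def add_use(self, user):
        if not self._is_scalar_constant:
            self._uses.append(user)

    def use_begin(self):
        # The experiment: walking uses of a scalar constant is assumed
        # to be meaningless, so trap it loudly.
        assert not self._is_scalar_constant, \
            "use-list walked on a scalar constant"
        return iter(self._uses)

    def has_one_use(self):
        assert not self._is_scalar_constant, \
            "use-list walked on a scalar constant"
        return len(self._uses) == 1
```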
> On Mar 25, 2020, at 5:14 PM, Doerfert, Johannes <jdoe...@anl.gov> wrote:
> >
> > Let's assume you
> > walk the uses and we "removed" the use list so there are none, what does
> > that mean. I'd say, nothing much. If you inspect the Value and see it's
> > a Constant you have to assume it has Aliases that denote the same value
> > so direct uses are not really interesting anyway. If you inspect further
> > and see ConstantInt/Float/... you can deal with the missing use list. If
> > you don't inspect the Value you cannot really make much of an empty use
> > list, or can you? I doubt we ever call RAUW on a ConstantInt/Float/...
> >
> > Feel free to elaborate on your concerns.
>
> Have you tested your thesis that no one walks the use-def chains? You
> could just add an assertion to the compiler (to methods like
> use_begin(), hasOneUse(), etc.) that aborts on constants, then run the
> test suite.
That is what I suggested a few emails back. However, I never said it is
safe because no one walks it; what I suggested is that it would
actually not impact much if we don't remember the uses of Constants
(except globals).
Agreed, though as Chris points out it's something that can and should
be tested experimentally fairly easily.
> We might be able to break the dependencies on GlobalValue without too much work directly as an IR transformation. I’ve not thought this completely through, but we might get away with a module pass that duplicate globals per function and replace all uses in a function with the copy per function before running a set of function passes. Then merge the duplicates again before running passes that inspect global uses.
That feels like a really awkward hack :(
Now you have to keep track of whether the IR is currently in the state
where function pass parallelization is okay or not, which seems rather
fragile.
It also doesn't solve the problem of Functions themselves -- those are
also GlobalValues...
On Thu, Mar 26, 2020 at 12:47 AM Chris Lattner <clat...@nondot.org> wrote:
> > There is a certain elegance to having explicit instructions to import
> > constants into the llvm::Function context, but also ugliness, as IR
> > will then be full of "constant" instructions, unlike today. Plus,
> > changing all existing LLVM IR to have those is an enormous amount of
> > churn.
>
> Yes, it increases the verbosity of the IR, this is very true. In practice, it is not a problem though.
Okay, it seems like it could be a viable long-term goal to do this.
What I'm trying to figure out is how to incrementally evolve LLVM IR,
and it feels like this change to enable "constant materialization
instructions" is extremely invasive in terms of churn in the code base
on a level comparable to opaque pointers -- and it's not strictly
necessary for the goal of multi-threading. I'm just trying to be
pragmatic here.
[snip]
> On Mar 25, 2020, at 4:14 PM, Doerfert, Johannes <jdoe...@anl.gov> wrote:
> >
> >> Today, you can walk the use-list of any operand to an instruction.
> >> This would be broken by this change,
> >
> > First, you would be able to walk the use-list of any operand. So nothing
> > breaks just yet.
>
> If I understand correctly, you are suggesting that you can walk it, but you don’t get all the uses. This is extremely dangerous, it would be better to abort.
Agreed.
> >> which would make it much easier to write buggy/incorrect compiler code
> >> and passes.
> >
> > I argued (implicitly) above that the uses of a Constant[Int/Float/...]
> > are really not interesting if no-one ever walks them.
>
> There is a difference between “no one ever walks them in practice” and “no one can walk them”. :-)
Yes, but you've got to ask yourself: what's the point of walking the
use list of a scalar constant? It feels like code that would do that
is quite likely broken to begin with and should probably abort already
today... which is exactly the experiment that you suggest below, and I
agree that doing this experiment would be a useful step in all this :)
Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it should be.
> On Mar 26, 2020, at 10:19, Nicolai Hähnle via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On Wed, Mar 25, 2020 at 5:26 PM Florian Hahn <floria...@apple.com> wrote:
>> I think one important observation is that most passes currently probably do not really care too much about use lists of various constants, beyond the bookkeeping required to maintain them.
>
> Agreed, though as Chris points out it's something that can and should
> be tested experimentally fairly easily.
>
>
>> We might be able to break the dependencies on GlobalValue without too much work directly as an IR transformation. I’ve not thought this completely through, but we might get away with a module pass that duplicate globals per function and replace all uses in a function with the copy per function before running a set of function passes. Then merge the duplicates again before running passes that inspect global uses.
>
> That feels like a really awkward hack :(
>
> Now you have to keep track of whether the IR is currently in the state
> where function pass parallelization is okay or not, which seems rather
> fragile.
>
Agreed, it is not a production solution. I think it was not really clear from my previous mail, but I see it mostly as a relatively cheap approach (in terms of work required to get it working) to get to a state where we can run experiments with function passes in parallel, get a better idea of the benefits, and turn up other issues.
> It also doesn't solve the problem of Functions themselves -- those are
> also GlobalValues…
I am not sure why not. Function passes should only rely on the information at the callsite & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
Cheers,
Florian
I feel like I should clarify this after thinking about it a bit more.
It sounds like "constant materialization instructions" are a useful
feature to have for better location information (for diagnostics and
debug info). And _introducing_ constant materialization instructions
should be quite feasible to do incrementally. For example, if Clang or
some other frontend sees a use case for adding those in order to do
more work in IR rather than on the AST, then that should be
supportable.
The thing that I think is "extremely invasive in terms of churn in the
code base" is making constant materialization instructions the _only_
representation of constants inside a function. Which means that this
path is not very viable if your goal is to enable multi-threaded
compilation.
However, if your goal is to move towards constant materialization
instructions for the other benefits that they can bring, then there is
an incremental path available that can quickly bring you gains -
something along the lines of:
1. The initial MVP: Introduce constant materialization instructions as
a concept. Add a switch to the Builder so that constants are
optionally emitted as materialization instructions, start using that
switch in at least one frontend, add/convert some initial passes to
make use of them, and add a very simple pass that strips them out
again for the remainder of the compilation pipeline.
2. Teach an increasing number of passes, analyses, matching
infrastructure, etc. about the constant materialization instructions.
3. Eventually, get to the point where the pass that strips them out
can be removed. Somewhere around that time, we can make the Builder
unconditionally emit them everywhere.
4. Flip the switch that starts aborting when old-style Constants are
used instead.
5. Cleanup any remains of old constant support e.g. in the matching
infrastructure.
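The step-1 "strip" pass could be sketched roughly like this (a toy; the instruction encoding is invented, and the real thing would of course operate on llvm::Instruction):

```python
# Toy strip pass for step 1: fold constant-materialization
# instructions back into their users and delete them, restoring the
# inline-constant form for the rest of the pipeline.
def strip_materializations(instructions):
    materialized = {}   # SSA name -> the literal it materializes
    out = []
    for op, dest, *operands in instructions:
        if op == "mat":                     # e.g. ("mat", "%c1", 42)
            materialized[dest] = operands[0]
        else:                               # rewrite operands, drop the mat
            out.append((op, dest,
                        *[materialized.get(x, x) for x in operands]))
    return out
```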
This seems viable, but very much a marathon and not a sprint, and I
feel that this is straying a bit from the topic of multi-threading.
Function passes can remove, duplicate, or just plain introduce call
sites (e.g. recognize a memset pattern), which means the use lists of
Functions can be changed during a function pass...
Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it should be.
> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhae...@gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <floria...@apple.com> wrote:
>>> It also doesn't solve the problem of Functions themselves -- those are
>>> also GlobalValues…
>>
>> I am not sure why not. Function passes should only rely on the information at the callsite & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>
> Function passes can remove, duplicate, or just plain introduce call
> sites (e.g. recognize a memset pattern), which means the use lists of
> Functions can be changed during a function pass…
Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem I think.
One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
Cheers,
Florian
Can they? I have had to make a pass a ModulePass in the past, because
it added a global to hold a cache. The global was used in only the
function being modified, but the fact that it was a global prevented the
pass from being a FunctionPass.
David
> On Mar 26, 2020, at 11:09, David Chisnall via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
>
> Can they? I have had to make a pass a ModulePass in the past, because it added a global to hold a cache. The global was used in only the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.
Maybe they should not, but I think in practice plenty of function passes insert new declarations.
For example, I think Intrinsic::getDeclaration inserts a new declaration if the requested intrinsic is not yet declared (https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Function.cpp#L1117) and a lot of function passes use it.
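To sketch the locking this would need (a toy, loosely modeled on the module symbol table; the API names are invented): the rare insertion of a new declaration takes a module lock, while per-function work stays lock-free.

```python
# Toy model: inserting a new declaration mutates module-level state,
# so that infrequent operation is guarded by a single module lock.
import threading

class Module:
    def __init__(self):
        self._symbols = {}
        self._lock = threading.Lock()

    def get_or_insert_declaration(self, name):
        # Rare operation, so one module-wide lock is unlikely to
        # become a contention point in practice.
        with self._lock:
            return self._symbols.setdefault(name, ("declare", name))
```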
Cheers,
Florian
I haven't, but I suggested a few emails back that we should gather such data
(and I was hoping Nicholas was going to do it).
Could you elaborate on what you said earlier. Why is it problematic to
remove the use-list tracking from Constants, except globals? If you
have a use case in which an always empty use list for such constants
would be problematic it would be beneficial for us to consider it sooner
than later.
Thanks,
Johannes
Oh, I see what you're saying now. Yes, agreed that that should work.
Though by the time you're done implementing the conversion from and to
that representation and made sure you've covered all the corner cases,
I'm not so sure you've really saved a lot of effort relative to just
doing it right in the first place :)
> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
Agreed.
Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it should be.
Hello everyone,
Just to add a bit of spice to the discussion about “Multi-Threading Compilers”: (sorry for just bringing high-level ideas)
We are heavy users of unity files (aka blobs or jumbos).
Unity files are a big pain, they add extra complexity, but at the same time they provide tremendous build-time reductions, 10x or more. Our game projects typically read >50,000 files during the full build of a single target, out of which 20,000 are .CPPs. The same unity target compiles only 600 unity .CPPs, which themselves aggregate all of the 20,000 initial .CPPs. Building the 20,000 TUs locally on a modern 3.7 GHz 6-core PC takes more than 2 hours 30 min. With unity files, it takes 20 minutes. Distributing it remotely on a pool of machines takes 5 min. Caching everything and rebuilding takes 45 sec.
However, we are now dependent on the order of files in the unities. If files or folders are added or removed in the codebase, the contents of a unity can change, and the cache is then invalidated for that unity CPP. And that happens quite often in production.
Unities also induce higher build times in some cases, spikes, like I was showing in a previous post in this thread. Without inspecting the AST, it is hard to determine an optimal “cutting” point when building the unity .CPPs. We can end up with unities including template-heavy .CPPs which will take a lot longer than other unity files.
If we are to discuss multi-threading, this means we are discussing compile-time performance and how compilation would scale in the future. I think we should consider supporting the functionality of unity files in the compiler (maybe behind a flag if it’s non-conformant).
While I don't know exactly how that fits into this (multi-threading) discussion, efficiently coalescing the compilation of several TUs should be the compiler's responsibility, and will likely be more efficient than doing it with a pre-build tool, like we do today.
In essence, we would provide a large number of files to Clang, let's say with the same options (the /MP flag is still WIP, I'll get back to that soon, [1]):
clang-cl /c a.cpp b.cpp c.cpp d.cpp ... /MP
We would then expect the compiler to (somehow) share tokenization, lexing, file caching, preprocessing, compilation, optimization computations, etc. across TUs, preferably in a lock-free manner. Overlapped/duplicated computations across threads, in the manner of transactions, would probably be fine, if the computations are small and if we want to avoid locks (but this needs to be profiled). Also, the recent trend of NUMA processor “tiles”, as well as HBM2 memory on-chip per “tile”, could change the way multi-threaded code is written. Perhaps state would need to be duplicated in the local NUMA memory for maximum performance. Additionally, I’m not sure how (or if) lock-based programming will scale past a few hundred, or thousand, cores in a single image without major contention. Maybe, as long as locks don’t cross NUMA boundaries. This needs to be considered in the design.
So while the discussion seems to be around multi-threading single TUs, it’d be nice to also consider the possibility of sharing state between TUs. Which maybe means retaining runtime state in global hash table(s), and possibly persisting that state on disk, or in a DB, after the compilation -- we could maybe draw a parallel with the work done by SN Systems (Program Repo, see [2]), or zapcc [3].
Thanks!
Alex.
[1] https://reviews.llvm.org/D52193
[2] https://github.com/SNSystems/llvm-project-prepo
[3] https://github.com/yrnkrn/zapcc
On 3/26/20 11:56 AM, Nicolai Hähnle via llvm-dev wrote:
> On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <floria...@apple.com> wrote:
>>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhae...@gmail.com> wrote:
>>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <floria...@apple.com> wrote:
>>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>>> also GlobalValues…
>>>> I am not sure why not. Function passes should only rely on the information at the callsite & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>>> Function passes can remove, duplicate, or just plain introduce call
>>> sites (e.g. recognize a memset pattern), which means the use lists of
>>> Functions can be changed during a function pass…
>>
>> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem I think.
> Oh, I see what you're saying now. Yes, agreed that that should work.
>
> Though by the time you're done implementing the conversion from and to
> that representation and made sure you've covered all the corner cases,
> I'm not so sure you've really saved a lot of effort relative to just
> doing it right in the first place :)
>
>
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
> Agreed.
>
> Cheers,
> Nicolai
CCing myself back into the discussion. As to Johannes's comments, I've no
problem getting data, but the discussion seems to be around implementation
and what data we actually care about. That should be discussed first, IMO.
Nick
Sure, I was just curious what you base that opinion on. Have you
done an experiment?
I haven't, but I suggested a few emails back we should gather such data
(and I was hoping Nicholas was going to do it).
Could you elaborate on what you said earlier. Why is it problematic to
remove the use-list tracking from Constants, except globals? If you
have a use case in which an always empty use list for such constants
would be problematic it would be beneficial for us to consider it sooner
than later.
> On Mar 26, 2020, at 8:26 AM, Doerfert, Johannes <jdoe...@anl.gov> wrote:
>
> On 3/26/20 5:53 AM, Florian Hahn wrote:
>>> It also doesn't solve the problem of Functions themselves -- those are
>>> also GlobalValues…
>>
>>
>> I am not sure why not. Function passes should only rely on the information at the callsite & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>
> FWIW, I think deleting and creating functions (or globals in general)
> should be hidden behind an API that would allow us to synchronize things
> in parallel runs.
>
> I don't believe there are many passes that would be affected and we
> could "easily" make this happen. Hide the constructors and destructors
> so that only the friend API can access it.
Right, this is what MLIR does. However, keep in mind that this is just one piece of the puzzle. You also have things like ValueHandles which get owned by IPA passes. The CallGraph used by the inliner persists across function pass manager runs, and needs to be correctly updatable in order to run CallGraphSCC passes.
Getting to a multithread clean optimizer is not one bug fix away - this is a multiyear journey, and I would prefer not to see big changes with unclear tradeoffs made in an effort to get a quick win.
Instead, let’s break down the problems and fix them (the right way!) one at a time. For example, it seems that the thread agrees that the overall design of constants is a problem: let's talk about (incrementally!) moving constants to the right architecture. There are good suggestions about this upthread.
-Chris
So, syntactic matching, if performed by traversing the use list, will in
the presence of constants not work "as well" as it could. How is that a
correctness issue (or why does it matter otherwise)?
> This has clear advantages and disadvantages, including:
>
> 1) It is sparse, defined over use-def chains instead of requiring
> dense scans of the IR. As such, it scales to larger functions better.
> 2) There can be a constant on the LHS and you end up scanning
> everything in the universe if you aren’t careful.
I'm unsure what you want to say here. I think/hope we don't do the
matching you mention anymore. Even if we do, you can easily make this work
just fine, I mean all of it, points 1) and 2), in the absence of use
lists for constants: only follow users of non-constants, which is the
only thing you will be able to do anyway as constants won't have users.
If all operands are constants, chances are the operation is as well, or it
is not well-suited for syntactic-only transformations anyway.
> The right fix for this to make constants have instructions and unique
> them into the entry block of the function (which is what MLIR does
> :-).
I know that MLIR does the right thing already. Nevertheless, I would
still like to talk about alternatives in a constructive way.
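For reference, the entry-block uniquing mentioned above could be sketched like this (a toy, not the MLIR implementation; all names invented): each constant becomes an ordinary instruction in the function's entry block, deduplicated through a per-function pool, so its use-list is naturally function-local.

```python
# Toy of entry-block constant uniquing: a per-function pool hands out
# one materializing instruction per distinct literal.
class Function:
    def __init__(self):
        self.entry_block = []      # materialization instructions live here
        self._constant_pool = {}   # literal -> materializing instruction

    def get_or_create_constant(self, literal):
        inst = self._constant_pool.get(literal)
        if inst is None:
            inst = ("mat", literal)
            self.entry_block.append(inst)
            self._constant_pool[literal] = inst
        return inst
```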
> That said, my point isn’t about this algorithm in particular, it is
> that use-def chains are a pervasive part of the IR and are used in
> lots of different ways. Making them work different will introduce
> bugs and inconsistencies in the compiler that only show up in weird
> cases, which is not great for the long term health of the project.
This is where I am not so sure. So far, and including the example above,
I'm still not aware of a soundness problem or drawback that would come
from constants without use-lists (except globals). Anyway, we'll gather
more data once we perform some experiments.
FTR, I'm not even sure this is a good idea but I think it is
worth/interesting to discuss and investigate anyway.
Cheers,
I think you are unfortunately both correct. As I said before, creating,
modifying, and deleting globals will most likely need explicit
synchronization. It is not too hard to implement though and unlikely to
be a performance bottleneck (IMHO).
Cheers,
Johannes
I am aware.
> You also have things like ValueHandles which get owned by IPA passes.
Why are they more problematic?
> The CallGraph used by the inline persists across function pass manager
> runs, and needs to be correctly updatable in order to run CallGraphSCC
> passes.
Sure. We make everyone use a single `CallGraphUpdater` in which we can
synchronize appropriately. Again, the argument is the same as with
Global creation/modification/deletion, it does not happen much in most
passes.
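A minimal sketch of such a synchronized updater (a toy with an invented API, not the existing CallGraphUpdater class): all edge mutations from concurrently running function passes funnel through one object that serializes them internally.

```python
# Toy: call-graph edge updates from parallel function passes are
# serialized behind a single lock inside the shared updater.
import threading

class SynchronizedCallGraph:
    def __init__(self):
        self._edges = set()
        self._lock = threading.Lock()

    def add_call(self, caller, callee):
        with self._lock:
            self._edges.add((caller, callee))

    def remove_call(self, caller, callee):
        with self._lock:
            self._edges.discard((caller, callee))

    def callees(self, caller):
        with self._lock:
            return {b for a, b in self._edges if a == caller}
```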
> Getting to a multithread clean optimizer is not one bug fix away -
> this is a multiyear journey, and I would prefer not to see big changes
> with unclear tradeoffs made in an effort to get a quick win.
I think I missed the suggestion of "big changes with unclear tradeoffs
made in an effort to get a quick win". So far I thought this was a
discussion of ideas, methods to gather data, and potential pitfalls.
> Instead, let’s break down the problems and fix them (the right way!)
> one at a time. For example, it seems that the thread agrees that the
> overall design of constants are a problem: let's talk about
> (incrementally!) moving constants to the right architecture. There
> are good suggestions about this upthread.
I'm not sure why you think anyone so far suggested anything to "fix it
all" or to do it "not-incrementally". For example, I think we are
talking about the design of constants and how it can be incrementally
improved (for different purposes). I also think there are interesting
suggestions upthread, but I'm not so sure there was any kind of
agreement on "the right architecture".
Cheers,
Johannes
On Mar 27, 2020, at 12:23 AM, Johannes Doerfert <johannes...@gmail.com> wrote:
> Getting to a multithread clean optimizer is not one bug fix away -
> this is a multiyear journey, and I would prefer not to see big changes
> with unclear tradeoffs made in an effort to get a quick win.
I think I missed the suggestion of "big changes with unclear tradeoffs
made in an effort to get a quick win". So far I thought this was a
discussion of ideas, methods to gather data, and potential pitfalls.
> Instead, let’s break down the problems and fix them (the right way!)
> one at a time. For example, it seems that the thread agrees that the
> overall design of constants are a problem: let's talk about
> (incrementally!) moving constants to the right architecture. There
> are good suggestions about this upthread.
I'm not sure why you think anyone so far suggested anything to "fix it
all" or to do it "not-incrementally". For example, I think we are
talking about the design of constants and how it can be incrementally
improved (for different purposes). I also think there are interesting
suggestions upthread, but I'm not so sure there was any kind of
agreement on "the right architecture".
That's why I'm also wary of the idea of just having use lists empty
for certain types without any other special handling. However, I would
argue that if Value::use_begin() etc. contain an assertion that fails
when called on one of the value types that don't have use lists, then
the Liskov substitution principle is de facto not broken. It basically
leads to a situation that is as-if Value didn't have use lists in the
first place, and only certain sub-types had use lists.
If we go down that route, maybe the inheritance hierarchy could be
reorganized in a way that makes that more obvious.
Cheers,
Nicolai
>
> > Instead, let’s break down the problems and fix them (the right way!)
> > one at a time. For example, it seems that the thread agrees that the
> > overall design of constants are a problem: let's talk about
> > (incrementally!) moving constants to the right architecture. There
> > are good suggestions about this upthread.
>
> I'm not sure why you think anyone so far suggested anything to "fix it
> all" or to do it "not-incrementally". For example, I think we are
> talking about the design of constants and how it can be incrementally
> improved (for different purposes). I also think there are interesting
> suggestions upthread, but I'm not so sure there was any kind of
> agreement on "the right architecture".
>
>
> It sounds like there is agreement that whole-module/LLVMContext use-def chains for constants are a bad idea. There is one specific proposal for how to fix that that is incremental and avoids the problem I mention above: use instructions to materialize them, so Constant stops deriving from Value. I would love to see this be explored as a step along the way, instead of spending time pondering how bad a break to the core APIs in the compiler would be in practice.
>
> -Chris
>
>
--
Learn how the world really is,
but never forget how it should be.
On Fri, Mar 27, 2020 at 5:22 PM Chris Lattner via llvm-dev
<llvm...@lists.llvm.org> wrote:
> On Mar 27, 2020, at 12:23 AM, Johannes Doerfert <johannes...@gmail.com> wrote:
>
>
> > Getting to a multithread clean optimizer is not one bug fix away -
> > this is a multiyear journey, and I would prefer not to see big changes
> > with unclear tradeoffs made in an effort to get a quick win.
>
> I think I missed the suggestion of "big changes with unclear tradeoffs
> made in an effort to get a quick win". So far I thought this was a
> discussion of ideas, methods to gather data, and potential pitfalls.
>
>
> Making use-def chains work differently based on the dynamic type of a Value* is very problematic to me. It breaks the liskov substitution principle and is likely to lead to widespread bugs.
That's why I'm also wary of the idea of just having use lists empty
for certain types without any other special handling. However, I would
argue that if Value::use_begin() etc. contain an assertion that fails
when called on one of the value types that don't have use lists, then
the Liskov substitution principle is de facto not broken. It basically
leads to a situation that is as-if Value didn't have use lists in the
first place, and only certain sub-types had use lists.
On Mar 27, 2020, at 1:55 PM, David Blaikie <dbla...@gmail.com> wrote:
> > That's why I'm also wary of the idea of just having use lists empty
> > for certain types without any other special handling. However, I would
> > argue that if Value::use_begin() etc. contain an assertion that fails
> > when called on one of the value types that don't have use lists, then
> > the Liskov substitution principle is de facto not broken. It basically
> > leads to a situation that is as-if Value didn't have use lists in the
> > first place, and only certain sub-types had use lists.
>
> But it doesn't - it means you write some generic code, test it with some cases & looks like it generalizes to other cases (you don't want to/can't test generic code with all possible generic arguments - that's why substitutability is important) then the code breaks when it hits a constant Value because it doesn't conform to the contract.
It's not a break of the Liskov substitution principle, though. For
example: In the proposal, you can iterate over uses of an Instruction,
and you can iterate over uses of all sub-types of an Instruction. You
can't iterate over Constant, but then Constant isn't a sub-type of
Instruction.
It's a bit of an academic point. What you're really trying to say is
that with this particular version of the proposal, it is not
**statically** checked at compile-time whether a Value has iterable
use lists, and that makes it easier to write incorrect code
accidentally. Yes, that's a weakness, and precisely the reason why I
suggested that we may want to re-organize the inheritance hierarchy of
Values.
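One possible shape of such a reorganization, as a toy (class names hypothetical, and Python only approximates the static checking C++ would give you): the use-list API lives only on the sub-types that actually track uses, so asking a scalar constant for its uses is impossible by construction rather than a runtime trap.

```python
# Toy of the reorganized hierarchy: no use-list API on the base Value;
# only the tracked sub-types expose one.
class Value:
    pass                        # no use-list API on the base at all

class TrackedValue(Value):      # instructions, arguments, globals, ...
    def __init__(self):
        self._uses = []
    def uses(self):
        return list(self._uses)

class ScalarConstant(Value):    # shared/uniqued; deliberately no uses()
    def __init__(self, literal):
        self.literal = literal
```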
Cheers,
Nicolai
Nicolai,
Please stop removing me from the CC. I asked in a previous email to be
CCed so I can follow the discussion.
Thanks,
Nick
On Fri, Mar 27, 2020 at 9:55 PM David Blaikie <dbla...@gmail.com> wrote:
> On Fri, Mar 27, 2020 at 1:53 PM Nicolai Hähnle via llvm-dev <llvm...@lists.llvm.org> wrote:
>>
>> On Fri, Mar 27, 2020 at 5:22 PM Chris Lattner via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>> > On Mar 27, 2020, at 12:23 AM, Johannes Doerfert <johannes...@gmail.com> wrote:
>> >
>> >
>> > > Getting to a multithread clean optimizer is not one bug fix away -
>> > > this is a multiyear journey, and I would prefer not to see big changes
>> > > with unclear tradeoffs made in an effort to get a quick win.
>> >
>> > I think I missed the suggestion of "big changes with unclear tradeoffs
>> > made in an effort to get a quick win". So far I thought this was a
>> > discussion of ideas, methods to gather data, and potential pitfalls.
>> >
>> >
>> > Making use-def chains work differently based on the dynamic type of a Value* is very problematic to me. It breaks the liskov substitution principle and is likely to lead to widespread bugs.
>>
>> That's why I'm also wary of the idea of just having use lists empty
>> for certain types without any other special handling. However, I would
>> argue that if Value::use_begin() etc. contain an assertion that fails
>> when called on one of the value types that don't have use lists, then
>> the Liskov substitution principle is de facto not broken. It basically
>> leads to a situation that is as-if Value didn't have use lists in the
>> first place, and only certain sub-types had use lists.
>
>
> But it doesn't - it means you write some generic code, test it with some cases & looks like it generalizes to other cases (you don't want to/can't test generic code with all possible generic arguments - that's why substitutability is important) then the code breaks when it hits a constant Value because it doesn't conform to the contract.
It's not a break of the Liskov substitution principle, though.
On Mar 28, 2020, at 1:41 PM, David Blaikie <dbla...@gmail.com> wrote:
>
> > But it doesn't - it means you write some generic code, test it with some cases & looks like it generalizes to other cases (you don't want to/can't test generic code with all possible generic arguments - that's why substitutability is important) then the code breaks when it hits a constant Value because it doesn't conform to the contract.
>
> It's not a break of the Liskov substitution principle, though.
It does in the sense that all Values have a list of uses in the current API - yes, if you're proposing /changing the API/ so that Values don't have a use list API, that's one thing - but adding an assertion/runtime failure is not that. Having a runtime failure where by accessing the uses of /some/ Values is a contract violation/assertion failure/etc - that makes it not substitutable, you can't just treat all Values as equal & have to special case to avoid asking for the uses of some Values.
Yes, if you move the use-list API down from Value, so Values don't have use lists, that's different/not what Chris is objecting to, I don't think - it was the " Making use-def chains work differently based on the dynamic type of a Value* is very problematic to me. " that was specifically the objection/liskov breaking issue.
What if we modeled constants as implicit arguments to functions rather than introducing a constant materialization instruction? That would solve the problem of making their use lists function-local, while avoiding some of the unnecessary complexity introduced by materialization instructions (de-duplication, code placement, etc.).
--Owen