I am going to work on fixing Bug 12404[1] "Connect the BBVectorizer
with Polly", and this is the first step to achieve my ultimate goal:
move the helper class that vectorizes the loop (grouped-unrolls the
loop, in fact) from Polly to the LLVM transform library. After moving
the helper class to LLVM, people working on LLVM may be able to create
a Loop Vectorizer based on it. My motivation for doing this is that I
need a utility function/class to grouped-unroll a loop so that I can
reduce the number of memory-access instructions (i.e. load/store) by
fusing them together.
Currently, the biggest blocker to connecting the BBVectorizer with
Polly directly is the IndependentBlocks pass[2], which introduces
redundant instructions. These redundant instructions need non-trivial
optimization passes to clean up and prevent us from calling the
BBVectorizer right after we grouped-unroll the loop. In fact, the
IndependentBlocks pass also blocks another bug[3]. So fixing it can
benefit more than Polly's vectorization codegen: it makes the related
helper classes more generic and increases the chance of moving them to
the LLVM transform library.
The IndependentBlocks pass translates inter-basic-block scalar
dependencies into memory dependencies, allowing us to generate new
basic-block-centric scop statements without worrying about breaking
the inter-basic-block scalar dependencies. To remove the need for
IndependentBlocks, we should preserve the scalar dependencies while
generating code for the new statements. Hence we cannot simply copy
all instructions of the original basic block to the new location when
generating code for a scop statement; instead, we should copy the
entire expression tree of each non-trivial instruction (i.e.
instructions with side effects, including load, store and call) in the
scop statement. All other (trivial) instructions are either included
in the expression tree of some non-trivial instruction, or they are
trivially dead and need not be copied.
So the new scop statement codegen flow looks like this:
1. Build the substitution map for the scop statement, mapping each
substituted instruction to its new instruction (note that the IVs
should be included in this map).
2. For each non-trivial instruction in the statement, copy its
expression tree, where the nodes of the tree are:
   o the non-trivial instruction itself (the root of the tree), or
   o an operand of a node in the tree, unless that node is another
   non-trivial instruction or we can find the corresponding
   instruction for it in the substitution map (the leaves of the
   tree); code has already been generated for these nodes, so neither
   they nor their operands need to be copied.
Note that the new scop statement codegen flow is not
basic-block-centric, and it is supposed to support code generation for
finer-grained statements[3].
Some other considerations, which are not going to be implemented in
the first step:
1. We can also perform CSE while generating code, by generating the
code in a dominating basic block of the current codegen location
whenever possible and remembering it in the substitution map, so that
other statements can reuse the code. In this case, we should delete
entries from the map once they no longer dominate the current codegen
location, to preserve SSA form.
2. We should decouple the isl-related code from the BlockGenerator.
This can be done by generating the code derived from isl and setting
up the corresponding entries in the substitution map before we call
the BlockGenerator.
Any suggestion or comment is appreciated.
best regards
ether
PS: Attached is a patch that moves the BlockGenerator to standalone
header/source files; please review.
[1]http://llvm.org/bugs/show_bug.cgi?id=12404
[2]http://llvm.org/bugs/show_bug.cgi?id=12398
[3]http://llvm.org/bugs/show_bug.cgi?id=12402
Hi ether,
good to see you targeting some important problems. Some comments:
- Moving the Polly vectorizer to LLVM transform library
The Polly vectorizer currently generates LLVM-IR directly from the
polyhedral representation. This obviously requires dedicated code, but
it is at the same time the most direct way to emit LLVM-IR. It does
not require any meta-data to be created, written and reanalyzed, and
we directly build the final code. Moving this stuff to LLVM has two
main drawbacks: 1) We need a lot of additional code to forward
polyhedral information to the generic LLVM-IR vectorizers. 2) Creating
scalar code in between may slow down code generation.
This does not mean I am not in favor of moving more stuff to generic
LLVM passes, but we should carefully evaluate the benefits here. I
personally would prefer if we can approach this in two ways:
1) Move all stuff into generic passes for which we are sure it is
beneficial and that does not increase complexity
2) Keep the Polly vectorizer code as it is and compare it to the
BBVectorizer approach.
I think the Polly vectorizer is currently sufficiently separate not to
cause a big maintenance overhead. We can keep it around while we fix
the problems that block us from scheduling the BBVectorizer. When
these problems are fixed, we can run both next to each other and
compare them in terms of generated code and compile-time overhead.
> Currently, the biggest blocker to connecting the BBVectorizer with
> Polly directly is the IndependentBlocks pass[2], which introduces
> redundant instructions. These redundant instructions need non-trivial
> optimization passes to clean up and prevent us from calling the
> BBVectorizer right after we grouped-unroll the loop. In fact, the
> IndependentBlocks pass also blocks another bug[3]. So fixing it can
> benefit more than Polly's vectorization codegen: it makes the related
> helper classes more generic and increases the chance of moving them
> to the LLVM transform library.
Yes, I fully agree. The independent block pass is the major blocker
right now.
> The IndependentBlocks pass translates inter-basic-block scalar
> dependencies into memory dependencies, allowing us to generate new
> basic-block-centric scop statements without worrying about breaking
> the inter-basic-block scalar dependencies. To remove the need for
> IndependentBlocks, we should preserve the scalar dependencies while
> generating code for the new statements. Hence we cannot simply copy
> all instructions of the original basic block to the new location when
> generating code for a scop statement; instead, we should copy the
> entire expression tree of each non-trivial instruction (i.e.
> instructions with side effects, including load, store and call) in
> the scop statement. All other (trivial) instructions are either
> included in the expression tree of some non-trivial instruction, or
> they are trivially dead and need not be copied.
I think there are two parts in the independent blocks pass.
1) Translation from scalar -> 1-element arrays
2) Duplicating scalar instructions in each basic block to remove trivial
scalar dependencies.
We can first fix 2) by moving this feature into code generation. As a
next step we can target 1). 1) is slightly more complicated, as it
needs not only additional changes in code generation, but also changes
in ScopInfo and ScopDetection.
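To illustrate part 1), here is a schematic sketch (the value names are
made up) of how a scalar that crosses basic blocks is demoted to a
one-element array:

```llvm
; Before: %sum crosses from bb1 into bb2 as a scalar dependency.
bb1:
  %sum = add i32 %a, %b
  br label %bb2
bb2:
  %use = mul i32 %sum, 2

; After part 1) of IndependentBlocks: demoted to a one-element array.
entry:
  %sum.s2a = alloca i32
  ...
bb1:
  %sum = add i32 %a, %b
  store i32 %sum, i32* %sum.s2a
  br label %bb2
bb2:
  %sum.reload = load i32* %sum.s2a
  %use = mul i32 %sum.reload, 2
```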
Another pass which we should get rid of is the -polly-indvars pass. We
currently require canonical loop induction variables. This is a feature
that is not available any more in LLVM and we should not keep it alive
in Polly.
> So the new scop statement codegen flow looks like this:
>
> 1. Build the substitution map for the scop statement, mapping each
> substituted instruction to its new instruction (note that the IVs
> should be included in this map).
Building the substitutions actually needs to change a little. At the
moment we create a map OldIV -> NewIV. In the future OldIV will not
exist. Instead, we need to shift to the virtual induction variables
used by scalar evolution. Scalar evolution expressions reference loop
induction variables with AddRecurrences of the form {start, +,
step}_<loop>. Such an add recurrence defines a virtual induction
variable that counts the number of loop iterations. The value of the
AddRecurrence is start + step * <virtual_iv_of_loop>. As the OldIV
will disappear, I propose to replace the map OldIV -> NewIV by a map
<old_virtual_iv_for_loop> -> NewIV, or in short Loop -> NewIV.
> 2. For each non-trivial instruction in the statement, copy its
> expression tree, where the nodes of the tree are:
>    o the non-trivial instruction itself (the root of the tree), or
>    o an operand of a node in the tree, unless that node is another
>    non-trivial instruction or we can find the corresponding
>    instruction for it in the substitution map (the leaves of the
>    tree); code has already been generated for these nodes, so neither
>    they nor their operands need to be copied.
>
> Note that the new scop statement codegen flow is not
> basic-block-centric, and it is supposed to support code generation
> for finer-grained statements[3].
Keeping [3] in mind is a good idea when fixing the independent blocks
pass. However, let's make sure to proceed step by step.
Talking about [3]: you said you don't want this to be
basic-block-centric? How are you planning to keep track of the
instructions that are part of a statement? At the moment we just
reference a basic block, and this block has an implicit list of
instructions. Are you planning to maintain an explicit list of
instructions in the ScopStmt?
Also, I did not entirely understand how your proposed algorithm works.
Let's discuss this when you actually plan to work on the patch.
> Some other considerations, which are not going to be implemented in
> the first step:
> 1. We can also perform CSE while generating code, by generating the
> code in a dominating basic block of the current codegen location
> whenever possible and remembering it in the substitution map, so that
> other statements can reuse the code. In this case, we should delete
> entries from the map once they no longer dominate the current codegen
> location, to preserve SSA form.
I hope we get this automatically by using the SCEVExpander.
> 2. We should decouple the isl-related code from the BlockGenerator.
> This can be done by generating the code derived from isl and setting
> up the corresponding entries in the substitution map before we call
> the BlockGenerator.
What should the BlockGenerator be in the end? At the moment it is a
specific polyhedral to LLVM-IR translator. What would a use outside of
Polly look like?
> Any suggestion or comment is appreciated.
>
> best regards
> ether
>
> PS: Attached is a patch that moves the BlockGenerator to standalone
> header/source files; please review.
Moving the BlockGenerators out is OK, but
> --- /dev/null
> +++ b/lib/CodeGen/BlockGenerators.cpp
[..]
> +// FIXME: Not possible to move the implementation here before we decouple isl
> +// related code from BlockGenerator.
just moving the headers around seems more than incomplete. I don't
understand how isl is blocking the move? You may want to remove
references to isl for some reason, but that seems to be independent of
moving the class into another file.
I also attached a work-in-progress patch where I would love to hear
your comments. It implements a SCEVExpander-based approach to get rid
of the independent blocks pass. It is just a very first step, but it
already passes 'make polly-test'. What do you think?
Cheers
Tobi
On 04/24/2012 05:44 AM, Hongbin Zheng wrote:
[..]
The "ScalarEvolution &SE, ValueMapT &GlobalMap, ValueMapT &BBMap"
triple appears repeatedly in CodeGeneration.cpp; maybe we can group
them in a struct named "CodeGenContext"?
From 9dc4d18bffb04f2babadacb08f14a767c1da8a55 Mon Sep 17 00:00:00 2001
From: Tobias Grosser <tob...@grosser.es>
Date: Mon, 2 Apr 2012 14:56:25 +0200
Subject: [PATCH] Add SCEV rewrite based code generation

This is the first draft of SCEV rewrite based code generation. It
already passes the Polly test suite, but will probably break for
several OpenMP code generation tests.
---
 lib/CodeGen/CodeGeneration.cpp        | 244 ++++++++++++++++++++++++++++++++-
 test/CodeGen/simple_vec_call_2.ll     |  34 ++++-
 test/CodeGen/simple_vec_impossible.ll |  12 ++-
 3 files changed, 279 insertions(+), 11 deletions(-)
[..]

> +    // FIXME: The parameter handling is incorrect.
[..]
> +    //   o Just check within the SCEVRewriter if Values lay outside of
> +    //     the SCoP and detect parameters on the fly.

It depends on when we build the signature for the subfunction: if we
build the signature before calling the SCEVRewriter (which is what
Polly does at the moment), detecting the parameters on the fly may be
too late. Alternatively, we could create the subfunction only after
all its blocks are generated; that is, we first place the blocks of
the subfunction in the current function and move them to the new
function at the end.

> +    Value *IV = L->getCanonicalInductionVariable();
> +    if (!IV)
> +      return NULL;

Use 0 instead of NULL? I think most of the LLVM code uses 0.

> +    SmallVector<const SCEV *, 2> Operands;
> +    for (int i = 0, e = Expr->getNumOperands(); i < e; ++i) {
> +      const SCEV *Operand = visit(Expr->getOperand(i));
> +      Operands.push_back(Operand);
> +    }

This code fragment appears repeatedly; we could move it to a
standalone function.
[..]
I don't think this is a problem. Feel free to go ahead and split it into
a separate file.
When generating so many header files, we should probably create folders
in the include directory that correspond to the folders in lib/
Tobi
>
> When generating so many header files, we should probably create folders in
> the include directory that correspond to the folders in lib/
OK, so I am going to create the include/CodeGen folder.
>
> Tobi
This is an up-to-date patch; the next patch will create the include/CodeGen folder.
best regards
ether
---
include/polly/BlockGenerators.h | 243 ++++++++++++
lib/CodeGen/BlockGenerators.cpp | 648 ++++++++++++++++++++++++++++++
lib/CodeGen/CMakeLists.txt | 1 +
lib/CodeGen/CodeGeneration.cpp | 827 +--------------------------------------
4 files changed, 895 insertions(+), 824 deletions(-)
create mode 100644 include/polly/BlockGenerators.h
create mode 100644 lib/CodeGen/BlockGenerators.cpp
diff --git a/include/polly/BlockGenerators.h b/include/polly/BlockGenerators.h
new file mode 100644
index 0000000..ba2e390
--- /dev/null
+++ b/include/polly/BlockGenerators.h
@@ -0,0 +1,243 @@
+//===- BlockGenerators.h - Helper to generate code for statements -*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the BlockGenerator and VectorBlockGenerator classes, which
+// generate sequential code and vectorized code for a polyhedral statement,
+// respectively.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef POLLY_BLOCK_GENERATORS_H
+#define POLLY_BLOCK_GENERATORS_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Support/IRBuilder.h"
+
+#include "isl/map.h"
+
+#include <vector>
+
+namespace llvm {
+class Pass;
+}
+
+namespace polly {
+using namespace llvm;
+class ScopStmt;
+
+typedef DenseMap<const Value*, Value*> ValueMapT;
+typedef std::vector<ValueMapT> VectorValueMapT;
+
+/// @brief Generate a new basic block for a polyhedral statement.
+///
+/// The only public function exposed is generate().
+class BlockGenerator {
+public:
+ /// @brief Generate a new BasicBlock for a ScopStmt.
+ ///
+ /// @param Builder The LLVM-IR Builder used to generate the statement. The
+ /// code is generated at the location the Builder points to.
+ /// @param Stmt The statement to code generate.
+ /// @param GlobalMap A map that defines, for certain Values referenced from
+ /// the original code, the new Values they should be replaced with.
+ /// @param P A reference to the pass this function is called from.
+ /// The pass is needed to update other analysis.
+ static void generate(IRBuilder<> &Builder, ScopStmt &Stmt,
+ ValueMapT &GlobalMap, Pass *P) {
+ BlockGenerator Generator(Builder, Stmt, P);
+ Generator.copyBB(GlobalMap);
+ }
+
+protected:
+ IRBuilder<> &Builder;
+ ScopStmt &Statement;
+ Pass *P;
+
+ BlockGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P);
+
+ /// @brief Get the new version of a Value.
+ ///
+ /// @param Old The old Value.
+ /// @param BBMap A mapping from old values to their new values
+ /// (for values recalculated within this basic block).
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).
+ ///
+ /// @returns o The old value, if it is still valid.
+ /// o The new value, if available.
+ /// o NULL, if no value is found.
+ Value *getNewValue(const Value *Old, ValueMapT &BBMap, ValueMapT &GlobalMap);
+
+ void copyInstScalar(const Instruction *Inst, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ /// @brief Get the memory access offset to be added to the base address
+ std::vector<Value*> getMemoryAccessIndex(__isl_keep isl_map *AccessRelation,
+ Value *BaseAddress, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ /// @brief Get the new operand address according to the changed access in
+ /// JSCOP file.
+ Value *getNewAccessOperand(__isl_keep isl_map *NewAccessRelation,
+ Value *BaseAddress, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ /// @brief Generate the operand address
+ Value *generateLocationAccessed(const Instruction *Inst,
+ const Value *Pointer, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ Value *generateScalarLoad(const LoadInst *Load, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ Value *generateScalarStore(const StoreInst *Store, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ /// @brief Copy a single Instruction.
+ ///
+ /// This copies a single Instruction and updates references to old values
+ /// with references to new values, as defined by GlobalMap and BBMap.
+ ///
+ /// @param BBMap A mapping from old values to their new values
+ /// (for values recalculated within this basic block).
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).
+ void copyInstruction(const Instruction *Inst, ValueMapT &BBMap,
+ ValueMapT &GlobalMap);
+
+ /// @brief Copy the basic block.
+ ///
+ /// This copies the entire basic block and updates references to old values
+ /// with references to new values, as defined by GlobalMap.
+ ///
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).
+ void copyBB(ValueMapT &GlobalMap);
+};
+
+/// @brief Generate a new vector basic block for a polyhedral statement.
+///
+/// The only public function exposed is generate().
+class VectorBlockGenerator : BlockGenerator {
+public:
+ /// @brief Generate a new vector basic block for a ScopStmt.
+ ///
+ /// This code generation is similar to the normal, scalar code generation,
+ /// except that each instruction is code generated for several vector lanes
+ /// at a time. If possible, instructions are issued as actual vector
+ /// instructions, but e.g. for address calculation instructions we currently
+ /// generate scalar instructions for each vector lane.
+ ///
+ /// @param Builder The LLVM-IR Builder used to generate the statement. The
+ /// code is generated at the location, the builder points
+ /// to.
+ /// @param Stmt The statement to code generate.
+ /// @param GlobalMaps A vector of maps that define, for certain Values
+ /// referenced from the original code, the new Values they
+ /// should be replaced with. Each map in the vector of maps is
+ /// used for one vector lane. The number of elements in the
+ /// vector defines the width of the generated vector
+ /// instructions.
+ /// @param P A reference to the pass this function is called from.
+ /// The pass is needed to update other analysis.
+ static void generate(IRBuilder<> &B, ScopStmt &Stmt,
+ VectorValueMapT &GlobalMaps, __isl_keep isl_set *Domain,
+ Pass *P) {
+ VectorBlockGenerator Generator(B, GlobalMaps, Stmt, Domain, P);
+ Generator.copyBB();
+ }
+
+private:
+ // This is a vector of global value maps. The first map is used for the
+ // first vector lane, ...
+ // Each map contains information about Instructions in the old SCoP that
+ // are recalculated in the new SCoP. When copying the basic block, we replace
+ // all references to the old instructions with their recalculated values.
+ VectorValueMapT &GlobalMaps;
+
+ isl_set *Domain;
+
+ VectorBlockGenerator(IRBuilder<> &B, VectorValueMapT &GlobalMaps,
+ ScopStmt &Stmt, __isl_keep isl_set *Domain, Pass *P);
+
+ int getVectorWidth();
+
+ Value *getVectorValue(const Value *Old, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ Type *getVectorPtrTy(const Value *V, int Width);
+
+ /// @brief Load a vector from a set of adjacent scalars
+ ///
+ /// In case a set of scalars is known to be next to each other in memory,
+ /// create a vector load that loads those scalars
+ ///
+ /// %vector_ptr = bitcast double* %p to <4 x double>*
+ /// %vec_full = load <4 x double>* %vector_ptr
+ ///
+ Value *generateStrideOneLoad(const LoadInst *Load, ValueMapT &BBMap);
+
+ /// @brief Load a vector initialized from a single scalar in memory
+ ///
+ /// In case all elements of a vector are initialized to the same
+ /// scalar value, this value is loaded and shuffled into all elements
+ /// of the vector.
+ ///
+ /// %splat_one = load <1 x double>* %p
+ /// %splat = shufflevector <1 x double> %splat_one, <1 x
+ /// double> %splat_one, <4 x i32> zeroinitializer
+ ///
+ Value *generateStrideZeroLoad(const LoadInst *Load, ValueMapT &BBMap);
+
+ /// @brief Load a vector from scalars distributed in memory
+ ///
+ /// In case the scalars are distributed randomly in memory, create a vector
+ /// by loading each scalar and by inserting one after the other into the
+ /// vector.
+ ///
+ /// %scalar_1 = load double* %p_1
+ /// %vec_1 = insertelement <2 x double> undef, double %scalar_1, i32 0
+ /// %scalar_2 = load double* %p_2
+ /// %vec_2 = insertelement <2 x double> %vec_1, double %scalar_2, i32 1
+ ///
+ Value *generateUnknownStrideLoad(const LoadInst *Load,
+ VectorValueMapT &ScalarMaps);
+
+ void generateLoad(const LoadInst *Load, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ void copyUnaryInst(const UnaryInstruction *Inst, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ void copyBinaryInst(const BinaryOperator *Inst, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ void copyStore(const StoreInst *Store, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ void copyInstScalarized(const Instruction *Inst, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ bool extractScalarValues(const Instruction *Inst, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ bool hasVectorOperands(const Instruction *Inst, ValueMapT &VectorMap);
+
+ void copyInstruction(const Instruction *Inst, ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps);
+
+ void copyBB();
+};
+
+}
+#endif
+
diff --git a/lib/CodeGen/BlockGenerators.cpp b/lib/CodeGen/BlockGenerators.cpp
new file mode 100644
index 0000000..e32759d
--- /dev/null
+++ b/lib/CodeGen/BlockGenerators.cpp
@@ -0,0 +1,648 @@
+//===--- BlockGenerators.cpp - Generate code for statements -----*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the BlockGenerator and VectorBlockGenerator classes, which
+// generate sequential code and vectorized code for a polyhedral statement,
+// respectively.
+//
+//===----------------------------------------------------------------------===//
+
+#include "polly/ScopInfo.h"
+#include "polly/BlockGenerators.h"
+#include "polly/CodeGeneration.h"
+#include "polly/Support/GICHelper.h"
+
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+#include "llvm/Support/CommandLine.h"
+
+#include "isl/aff.h"
+#include "isl/set.h"
+
+using namespace llvm;
+using namespace polly;
+
+static cl::opt<bool>
+Aligned("enable-polly-aligned",
+ cl::desc("Assume aligned memory accesses."), cl::Hidden,
+ cl::init(false), cl::ZeroOrMore);
+
+static cl::opt<bool>
+GroupedUnrolling("enable-polly-grouped-unroll",
+ cl::desc("Perform grouped unrolling, but don't generate SIMD "
+ "instructions"), cl::Hidden, cl::init(false),
+ cl::ZeroOrMore);
+
+// Helper class to generate memory locations.
+namespace {
+class IslGenerator {
+public:
+ IslGenerator(IRBuilder<> &Builder, std::vector<Value *> &IVS) :
+ Builder(Builder), IVS(IVS) {}
+ Value *generateIslInt(__isl_take isl_int Int);
+ Value *generateIslAff(__isl_take isl_aff *Aff);
+ Value *generateIslPwAff(__isl_take isl_pw_aff *PwAff);
+
+private:
+ typedef struct {
+ Value *Result;
+ class IslGenerator *Generator;
+ } IslGenInfo;
+
+ IRBuilder<> &Builder;
+ std::vector<Value *> &IVS;
+ static int mergeIslAffValues(__isl_take isl_set *Set,
+ __isl_take isl_aff *Aff, void *User);
+};
+}
+
+
+Value *IslGenerator::generateIslInt(isl_int Int) {
+ mpz_t IntMPZ;
+ mpz_init(IntMPZ);
+ isl_int_get_gmp(Int, IntMPZ);
+ Value *IntValue = Builder.getInt(APInt_from_MPZ(IntMPZ));
+ mpz_clear(IntMPZ);
+ return IntValue;
+}
+
+Value *IslGenerator::generateIslAff(__isl_take isl_aff *Aff) {
+ Value *Result;
+ Value *ConstValue;
+ isl_int ConstIsl;
+
+ isl_int_init(ConstIsl);
+ isl_aff_get_constant(Aff, &ConstIsl);
+ ConstValue = generateIslInt(ConstIsl);
+ Type *Ty = Builder.getInt64Ty();
+
+ // FIXME: We should give the constant and coefficients the right type. Here
+ // we force it into i64.
+ Result = Builder.CreateSExtOrBitCast(ConstValue, Ty);
+
+ unsigned int NbInputDims = isl_aff_dim(Aff, isl_dim_in);
+
+ assert((IVS.size() == NbInputDims) && "The dimension of induction variables "
+ "must match the dimension of the affine space.");
+
+ isl_int CoefficientIsl;
+ isl_int_init(CoefficientIsl);
+
+ for (unsigned int i = 0; i < NbInputDims; ++i) {
+ Value *CoefficientValue;
+ isl_aff_get_coefficient(Aff, isl_dim_in, i, &CoefficientIsl);
+
+ if (isl_int_is_zero(CoefficientIsl))
+ continue;
+
+ CoefficientValue = generateIslInt(CoefficientIsl);
+ CoefficientValue = Builder.CreateIntCast(CoefficientValue, Ty, true);
+ Value *IV = Builder.CreateIntCast(IVS[i], Ty, true);
+ Value *PAdd = Builder.CreateMul(CoefficientValue, IV, "p_mul_coeff");
+ Result = Builder.CreateAdd(Result, PAdd, "p_sum_coeff");
+ }
+
+ isl_int_clear(CoefficientIsl);
+ isl_int_clear(ConstIsl);
+ isl_aff_free(Aff);
+
+ return Result;
+}
+
+int IslGenerator::mergeIslAffValues(__isl_take isl_set *Set,
+ __isl_take isl_aff *Aff, void *User) {
+ IslGenInfo *GenInfo = (IslGenInfo *)User;
+
+ assert((GenInfo->Result == NULL) && "Result is already set. "
+ "Currently only a single isl_aff is supported");
+ assert(isl_set_plain_is_universe(Set)
+ && "Code generation failed because the set is not universe");
+
+ GenInfo->Result = GenInfo->Generator->generateIslAff(Aff);
+
+ isl_set_free(Set);
+ return 0;
+}
+
+Value *IslGenerator::generateIslPwAff(__isl_take isl_pw_aff *PwAff) {
+ IslGenInfo User;
+ User.Result = NULL;
+ User.Generator = this;
+ isl_pw_aff_foreach_piece(PwAff, mergeIslAffValues, &User);
+ assert(User.Result && "Code generation for isl_pw_aff failed");
+
+ isl_pw_aff_free(PwAff);
+ return User.Result;
+}
+
+
+BlockGenerator::BlockGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P):
+ Builder(B), Statement(Stmt), P(P) {}
+
+Value *BlockGenerator::getNewValue(const Value *Old, ValueMapT &BBMap,
+ ValueMapT &GlobalMap) {
+ // We assume constants never change.
+ // This avoids map lookups for many calls to this function.
+ if (isa<Constant>(Old))
+ return const_cast<Value*>(Old);
+
+ if (GlobalMap.count(Old)) {
+ Value *New = GlobalMap[Old];
+
+ if (Old->getType()->getScalarSizeInBits()
+ < New->getType()->getScalarSizeInBits())
+ New = Builder.CreateTruncOrBitCast(New, Old->getType());
+
+ return New;
+ }
+
+ if (BBMap.count(Old)) {
+ return BBMap[Old];
+ }
+
+ // 'Old' is within the original SCoP, but was not rewritten.
+ //
+ // Such values appear if they only calculate information already available
+ // in the polyhedral description (e.g. an induction variable increment). They
+ // can be safely ignored.
+ if (const Instruction *Inst = dyn_cast<Instruction>(Old))
+ if (Statement.getParent()->getRegion().contains(Inst->getParent()))
+ return NULL;
+
+ // Everything else is probably a scop-constant value defined as global,
+ // function parameter or an instruction not within the scop.
+ return const_cast<Value*>(Old);
+}
+
+void BlockGenerator::copyInstScalar(const Instruction *Inst, ValueMapT &BBMap,
+ ValueMapT &GlobalMap) {
+ Instruction *NewInst = Inst->clone();
+
+ // Replace old operands with the new ones.
+ for (Instruction::const_op_iterator OI = Inst->op_begin(),
+ OE = Inst->op_end(); OI != OE; ++OI) {
+ Value *OldOperand = *OI;
+ Value *NewOperand = getNewValue(OldOperand, BBMap, GlobalMap);
+
+ if (!NewOperand) {
+ assert(!isa<StoreInst>(NewInst)
+ && "Store instructions are always needed!");
+ delete NewInst;
+ return;
+ }
+
+ NewInst->replaceUsesOfWith(OldOperand, NewOperand);
+ }
+
+ Builder.Insert(NewInst);
+ BBMap[Inst] = NewInst;
+
+ if (!NewInst->getType()->isVoidTy())
+ NewInst->setName("p_" + Inst->getName());
+}
+
+std::vector<Value*> BlockGenerator::getMemoryAccessIndex(
+ __isl_keep isl_map *AccessRelation, Value *BaseAddress,
+ ValueMapT &BBMap, ValueMapT &GlobalMap) {
+
+ assert((isl_map_dim(AccessRelation, isl_dim_out) == 1)
+ && "Only single dimensional access functions supported");
+
+ std::vector<Value *> IVS;
+ for (unsigned i = 0; i < Statement.getNumIterators(); ++i) {
+ const Value *OriginalIV = Statement.getInductionVariableForDimension(i);
+ Value *NewIV = getNewValue(OriginalIV, BBMap, GlobalMap);
+ IVS.push_back(NewIV);
+ }
+
+ isl_pw_aff *PwAff = isl_map_dim_max(isl_map_copy(AccessRelation), 0);
+ IslGenerator IslGen(Builder, IVS);
+ Value *OffsetValue = IslGen.generateIslPwAff(PwAff);
+
+ Type *Ty = Builder.getInt64Ty();
+ OffsetValue = Builder.CreateIntCast(OffsetValue, Ty, true);
+
+ std::vector<Value*> IndexArray;
+ Value *NullValue = Constant::getNullValue(Ty);
+ IndexArray.push_back(NullValue);
+ IndexArray.push_back(OffsetValue);
+ return IndexArray;
+}
+
+Value *BlockGenerator::getNewAccessOperand(
+ __isl_keep isl_map *NewAccessRelation, Value *BaseAddress,
+ ValueMapT &BBMap, ValueMapT &GlobalMap) {
+ std::vector<Value*> IndexArray = getMemoryAccessIndex(NewAccessRelation,
+ BaseAddress,
+ BBMap, GlobalMap);
+ Value *NewOperand = Builder.CreateGEP(BaseAddress, IndexArray,
+ "p_newarrayidx_");
+ return NewOperand;
+}
+
+Value *BlockGenerator::generateLocationAccessed(const Instruction *Inst,
+ const Value *Pointer,
+ ValueMapT &BBMap,
+ ValueMapT &GlobalMap) {
+ MemoryAccess &Access = Statement.getAccessFor(Inst);
+ isl_map *CurrentAccessRelation = Access.getAccessRelation();
+ isl_map *NewAccessRelation = Access.getNewAccessRelation();
+
+ assert(isl_map_has_equal_space(CurrentAccessRelation, NewAccessRelation)
+ && "Current and new access function use different spaces");
+
+ Value *NewPointer;
+
+ if (!NewAccessRelation) {
+ NewPointer = getNewValue(Pointer, BBMap, GlobalMap);
+ } else {
+ Value *BaseAddress = const_cast<Value*>(Access.getBaseAddr());
+ NewPointer = getNewAccessOperand(NewAccessRelation, BaseAddress,
+ BBMap, GlobalMap);
+ }
+
+ isl_map_free(CurrentAccessRelation);
+ isl_map_free(NewAccessRelation);
+ return NewPointer;
+}
+
+Value *BlockGenerator::generateScalarLoad(const LoadInst *Load,
+ ValueMapT &BBMap,
+ ValueMapT &GlobalMap) {
+ const Value *Pointer = Load->getPointerOperand();
+ const Instruction *Inst = dyn_cast<Instruction>(Load);
+ Value *NewPointer = generateLocationAccessed(Inst, Pointer, BBMap,
+ GlobalMap);
+ Value *ScalarLoad = Builder.CreateLoad(NewPointer,
+ Load->getName() + "_p_scalar_");
+ return ScalarLoad;
+}
+
+Value *BlockGenerator::generateScalarStore(const StoreInst *Store,
+ ValueMapT &BBMap,
+ ValueMapT &GlobalMap) {
+ const Value *Pointer = Store->getPointerOperand();
+ Value *NewPointer = generateLocationAccessed(Store, Pointer, BBMap,
+ GlobalMap);
+ Value *ValueOperand = getNewValue(Store->getValueOperand(), BBMap,
+ GlobalMap);
+
+ return Builder.CreateStore(ValueOperand, NewPointer);
+}
+
+void BlockGenerator::copyInstruction(const Instruction *Inst,
+ ValueMapT &BBMap, ValueMapT &GlobalMap) {
+ // Terminator instructions control the control flow. They are explicitly
+ // expressed in the clast and do not need to be copied.
+ if (Inst->isTerminator())
+ return;
+
+ if (const LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
+ BBMap[Load] = generateScalarLoad(Load, BBMap, GlobalMap);
+ return;
+ }
+
+ if (const StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
+ BBMap[Store] = generateScalarStore(Store, BBMap, GlobalMap);
+ return;
+ }
+
+ copyInstScalar(Inst, BBMap, GlobalMap);
+}
+
+
+void BlockGenerator::copyBB(ValueMapT &GlobalMap) {
+ BasicBlock *BB = Statement.getBasicBlock();
+ BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
+ Builder.GetInsertPoint(), P);
+ CopyBB->setName("polly.stmt." + BB->getName());
+ Builder.SetInsertPoint(CopyBB->begin());
+
+ ValueMapT BBMap;
+
+ for (BasicBlock::const_iterator II = BB->begin(), IE = BB->end(); II != IE;
+ ++II)
+ copyInstruction(II, BBMap, GlobalMap);
+}
+
+VectorBlockGenerator::VectorBlockGenerator(IRBuilder<> &B,
+ VectorValueMapT &GlobalMaps, ScopStmt &Stmt, __isl_keep isl_set *Domain,
+ Pass *P) : BlockGenerator(B, Stmt, P), GlobalMaps(GlobalMaps),
+ Domain(Domain) {
+ assert(GlobalMaps.size() > 1 && "Only one vector lane found");
+ assert(Domain && "No statement domain provided");
+ }
+
+Value *VectorBlockGenerator::getVectorValue(const Value *Old,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ if (VectorMap.count(Old))
+ return VectorMap[Old];
+
+ int Width = getVectorWidth();
+
+ Value *Vector = UndefValue::get(VectorType::get(Old->getType(), Width));
+
+ for (int Lane = 0; Lane < Width; Lane++)
+ Vector = Builder.CreateInsertElement(Vector,
+ getNewValue(Old,
+ ScalarMaps[Lane],
+ GlobalMaps[Lane]),
+ Builder.getInt32(Lane));
+
+ VectorMap[Old] = Vector;
+
+ return Vector;
+}
+
+Type *VectorBlockGenerator::getVectorPtrTy(const Value *Val, int Width) {
+ PointerType *PointerTy = dyn_cast<PointerType>(Val->getType());
+ assert(PointerTy && "PointerType expected");
+
+ Type *ScalarType = PointerTy->getElementType();
+ VectorType *VectorType = VectorType::get(ScalarType, Width);
+
+ return PointerType::getUnqual(VectorType);
+}
+
+Value *VectorBlockGenerator::generateStrideOneLoad(const LoadInst *Load,
+ ValueMapT &BBMap) {
+ const Value *Pointer = Load->getPointerOperand();
+ Type *VectorPtrType = getVectorPtrTy(Pointer, getVectorWidth());
+ Value *NewPointer = getNewValue(Pointer, BBMap, GlobalMaps[0]);
+ Value *VectorPtr = Builder.CreateBitCast(NewPointer, VectorPtrType,
+ "vector_ptr");
+ LoadInst *VecLoad = Builder.CreateLoad(VectorPtr,
+ Load->getName() + "_p_vec_full");
+ if (!Aligned)
+ VecLoad->setAlignment(8);
+
+ return VecLoad;
+}
+
+Value *VectorBlockGenerator::generateStrideZeroLoad(const LoadInst *Load,
+ ValueMapT &BBMap) {
+ const Value *Pointer = Load->getPointerOperand();
+ Type *VectorPtrType = getVectorPtrTy(Pointer, 1);
+ Value *NewPointer = getNewValue(Pointer, BBMap, GlobalMaps[0]);
+ Value *VectorPtr = Builder.CreateBitCast(NewPointer, VectorPtrType,
+ Load->getName() + "_p_vec_p");
+ LoadInst *ScalarLoad = Builder.CreateLoad(VectorPtr,
+ Load->getName() + "_p_splat_one");
+
+ if (!Aligned)
+ ScalarLoad->setAlignment(8);
+
+ Constant *SplatVector =
+ Constant::getNullValue(VectorType::get(Builder.getInt32Ty(),
+ getVectorWidth()));
+
+ Value *VectorLoad = Builder.CreateShuffleVector(ScalarLoad, ScalarLoad,
+ SplatVector,
+ Load->getName()
+ + "_p_splat");
+ return VectorLoad;
+}
+
+Value *VectorBlockGenerator::generateUnknownStrideLoad(const LoadInst *Load,
+ VectorValueMapT &ScalarMaps) {
+ int VectorWidth = getVectorWidth();
+ const Value *Pointer = Load->getPointerOperand();
+ VectorType *VectorType = VectorType::get(
+ dyn_cast<PointerType>(Pointer->getType())->getElementType(), VectorWidth);
+
+ Value *Vector = UndefValue::get(VectorType);
+
+ for (int i = 0; i < VectorWidth; i++) {
+ Value *NewPointer = getNewValue(Pointer, ScalarMaps[i], GlobalMaps[i]);
+ Value *ScalarLoad = Builder.CreateLoad(NewPointer,
+ Load->getName() + "_p_scalar_");
+ Vector = Builder.CreateInsertElement(Vector, ScalarLoad,
+ Builder.getInt32(i),
+ Load->getName() + "_p_vec_");
+ }
+
+ return Vector;
+}
+
+void VectorBlockGenerator::generateLoad(const LoadInst *Load,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ if (GroupedUnrolling || !VectorType::isValidElementType(Load->getType())) {
+ for (int i = 0; i < getVectorWidth(); i++)
+ ScalarMaps[i][Load] = generateScalarLoad(Load, ScalarMaps[i],
+ GlobalMaps[i]);
+ return;
+ }
+
+ MemoryAccess &Access = Statement.getAccessFor(Load);
+
+ Value *NewLoad;
+ if (Access.isStrideZero(isl_set_copy(Domain)))
+ NewLoad = generateStrideZeroLoad(Load, ScalarMaps[0]);
+ else if (Access.isStrideOne(isl_set_copy(Domain)))
+ NewLoad = generateStrideOneLoad(Load, ScalarMaps[0]);
+ else
+ NewLoad = generateUnknownStrideLoad(Load, ScalarMaps);
+
+ VectorMap[Load] = NewLoad;
+}
+
+void VectorBlockGenerator::copyUnaryInst(const UnaryInstruction *Inst,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ int VectorWidth = getVectorWidth();
+ Value *NewOperand = getVectorValue(Inst->getOperand(0), VectorMap,
+ ScalarMaps);
+
+ assert(isa<CastInst>(Inst) && "Cannot generate vector code for instruction");
+
+ const CastInst *Cast = dyn_cast<CastInst>(Inst);
+ VectorType *DestType = VectorType::get(Inst->getType(), VectorWidth);
+ VectorMap[Inst] = Builder.CreateCast(Cast->getOpcode(), NewOperand,
+ DestType);
+}
+
+void VectorBlockGenerator::copyBinaryInst(const BinaryOperator *Inst,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ Value *OpZero = Inst->getOperand(0);
+ Value *OpOne = Inst->getOperand(1);
+
+ Value *NewOpZero, *NewOpOne;
+ NewOpZero = getVectorValue(OpZero, VectorMap, ScalarMaps);
+ NewOpOne = getVectorValue(OpOne, VectorMap, ScalarMaps);
+
+ Value *NewInst = Builder.CreateBinOp(Inst->getOpcode(), NewOpZero,
+ NewOpOne,
+ Inst->getName() + "p_vec");
+ VectorMap[Inst] = NewInst;
+}
+
+void VectorBlockGenerator::copyStore(const StoreInst *Store,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ int VectorWidth = getVectorWidth();
+
+ MemoryAccess &Access = Statement.getAccessFor(Store);
+
+ const Value *Pointer = Store->getPointerOperand();
+ Value *Vector = getVectorValue(Store->getValueOperand(), VectorMap,
+ ScalarMaps);
+
+ if (Access.isStrideOne(isl_set_copy(Domain))) {
+ Type *VectorPtrType = getVectorPtrTy(Pointer, VectorWidth);
+ Value *NewPointer = getNewValue(Pointer, ScalarMaps[0], GlobalMaps[0]);
+
+ Value *VectorPtr = Builder.CreateBitCast(NewPointer, VectorPtrType,
+ "vector_ptr");
+ StoreInst *Store = Builder.CreateStore(Vector, VectorPtr);
+
+ if (!Aligned)
+ Store->setAlignment(8);
+ } else {
+ for (unsigned i = 0; i < ScalarMaps.size(); i++) {
+ Value *Scalar = Builder.CreateExtractElement(Vector,
+ Builder.getInt32(i));
+ Value *NewPointer = getNewValue(Pointer, ScalarMaps[i], GlobalMaps[i]);
+ Builder.CreateStore(Scalar, NewPointer);
+ }
+ }
+}
+
+bool VectorBlockGenerator::hasVectorOperands(const Instruction *Inst,
+ ValueMapT &VectorMap) {
+ for (Instruction::const_op_iterator OI = Inst->op_begin(),
+ OE = Inst->op_end(); OI != OE; ++OI)
+ if (VectorMap.count(*OI))
+ return true;
+ return false;
+}
+
+bool VectorBlockGenerator::extractScalarValues(const Instruction *Inst,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ bool HasVectorOperand = false;
+ int VectorWidth = getVectorWidth();
+
+ for (Instruction::const_op_iterator OI = Inst->op_begin(),
+ OE = Inst->op_end(); OI != OE; ++OI) {
+ ValueMapT::iterator VecOp = VectorMap.find(*OI);
+
+ if (VecOp == VectorMap.end())
+ continue;
+
+ HasVectorOperand = true;
+ Value *NewVector = VecOp->second;
+
+ for (int i = 0; i < VectorWidth; ++i) {
+ ValueMapT &SM = ScalarMaps[i];
+
+ // If there is one scalar extracted, all scalar elements should have
+ // already been extracted by the code here. So no need to check for the
+ // existence of all of them.
+ if (SM.count(*OI))
+ break;
+
+ SM[*OI] = Builder.CreateExtractElement(NewVector, Builder.getInt32(i));
+ }
+ }
+
+ return HasVectorOperand;
+}
+
+void VectorBlockGenerator::copyInstScalarized(const Instruction *Inst,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ bool HasVectorOperand;
+ int VectorWidth = getVectorWidth();
+
+ HasVectorOperand = extractScalarValues(Inst, VectorMap, ScalarMaps);
+
+ for (int VectorLane = 0; VectorLane < VectorWidth; VectorLane++)
+ copyInstScalar(Inst, ScalarMaps[VectorLane], GlobalMaps[VectorLane]);
+
+ if (!VectorType::isValidElementType(Inst->getType()) || !HasVectorOperand)
+ return;
+
+ // Make the result available as vector value.
+ VectorType *VectorType = VectorType::get(Inst->getType(), VectorWidth);
+ Value *Vector = UndefValue::get(VectorType);
+
+ for (int i = 0; i < VectorWidth; i++)
+ Vector = Builder.CreateInsertElement(Vector, ScalarMaps[i][Inst],
+ Builder.getInt32(i));
+
+ VectorMap[Inst] = Vector;
+}
+
+int VectorBlockGenerator::getVectorWidth() {
+ return GlobalMaps.size();
+}
+
+void VectorBlockGenerator::copyInstruction(const Instruction *Inst,
+ ValueMapT &VectorMap,
+ VectorValueMapT &ScalarMaps) {
+ // Terminator instructions control the control flow. They are explicitly
+ // expressed in the clast and do not need to be copied.
+ if (Inst->isTerminator())
+ return;
+
+ if (const LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
+ generateLoad(Load, VectorMap, ScalarMaps);
+ return;
+ }
+
+ if (hasVectorOperands(Inst, VectorMap)) {
+ if (const StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
+ copyStore(Store, VectorMap, ScalarMaps);
+ return;
+ }
+
+ if (const UnaryInstruction *Unary = dyn_cast<UnaryInstruction>(Inst)) {
+ copyUnaryInst(Unary, VectorMap, ScalarMaps);
+ return;
+ }
+
+ if (const BinaryOperator *Binary = dyn_cast<BinaryOperator>(Inst)) {
+ copyBinaryInst(Binary, VectorMap, ScalarMaps);
+ return;
+ }
+
+ // Fallthrough: we generate scalar instructions if we don't know how to
+ // generate vector code.
+ }
+
+ copyInstScalarized(Inst, VectorMap, ScalarMaps);
+}
+
+void VectorBlockGenerator::copyBB() {
+ BasicBlock *BB = Statement.getBasicBlock();
+ BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
+ Builder.GetInsertPoint(), P);
+ CopyBB->setName("polly.stmt." + BB->getName());
+ Builder.SetInsertPoint(CopyBB->begin());
+
+ // Create two maps that store the mapping from the original instructions of
+ // the old basic block to their copies in the new basic block. Those maps
+ // are basic block local.
+ //
+ // As vector code generation is supported, there is one map for scalar values
+ // and one for vector values.
+ //
+ // In case we just do scalar code generation, the vectorMap is not used and
+ // the scalarMap has just one dimension, which contains the mapping.
+ //
+ // In case vector code generation is done, an instruction may either appear
+ // once in the vector map (as it calculates <vectorwidth> values at a time)
+ // or, if the values are calculated using scalar operations, once in every
+ // dimension of the scalarMap.
+ VectorValueMapT ScalarBlockMap(getVectorWidth());
+ ValueMapT VectorBlockMap;
+
+ for (BasicBlock::const_iterator II = BB->begin(), IE = BB->end();
+ II != IE; ++II)
+ copyInstruction(II, VectorBlockMap, ScalarBlockMap);
+}
diff --git a/lib/CodeGen/CMakeLists.txt b/lib/CodeGen/CMakeLists.txt
index 4d75652..3e10741 100755
--- a/lib/CodeGen/CMakeLists.txt
+++ b/lib/CodeGen/CMakeLists.txt
@@ -1,4 +1,5 @@
add_polly_library(PollyCodeGen
+ BlockGenerators.cpp
CodeGeneration.cpp
LoopGenerators.cpp
)
diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
index 7edd136..c68cf1d 100644
--- a/lib/CodeGen/CodeGeneration.cpp
+++ b/lib/CodeGen/CodeGeneration.cpp
@@ -28,8 +28,9 @@
#include "polly/LinkAllPasses.h"
#include "polly/ScopInfo.h"
#include "polly/TempScopInfo.h"
-#include "polly/Support/GICHelper.h"
+#include "polly/BlockGenerators.h"
#include "polly/LoopGenerators.h"
+#include "polly/Support/GICHelper.h"
#include "llvm/Module.h"
#include "llvm/ADT/SetVector.h"
@@ -38,7 +39,6 @@
#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
-#include "llvm/Support/IRBuilder.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
@@ -59,6 +59,7 @@ struct isl_set;
namespace polly {
bool EnablePollyVector;
+bool EnablePollyAligned;
static cl::opt<bool, true>
Vector("enable-polly-vector",
@@ -78,829 +79,7 @@ AtLeastOnce("enable-polly-atLeastOnce",
cl::value_desc("OpenMP code generation enabled if true"),
cl::init(false), cl::ZeroOrMore);
-static cl::opt<bool>
-Aligned("enable-polly-aligned",
- cl::desc("Assumed aligned memory accesses."), cl::Hidden,
- cl::value_desc("OpenMP code generation enabled if true"),
- cl::init(false), cl::ZeroOrMore);
-
-static cl::opt<bool>
-GroupedUnrolling("enable-polly-grouped-unroll",
- cl::desc("Perform grouped unrolling, but don't generate SIMD "
- "instuctions"), cl::Hidden, cl::init(false),
- cl::ZeroOrMore);
-
-typedef DenseMap<const Value*, Value*> ValueMapT;
typedef DenseMap<const char*, Value*> CharMapT;
-typedef std::vector<ValueMapT> VectorValueMapT;
-
-class IslGenerator {
-public:
- IslGenerator(IRBuilder<> &Builder, std::vector<Value *> &IVS) :
- Builder(Builder), IVS(IVS) {}
- Value *generateIslInt(__isl_take isl_int Int);
- Value *generateIslAff(__isl_take isl_aff *Aff);
- Value *generateIslPwAff(__isl_take isl_pw_aff *PwAff);
-
-private:
- typedef struct {
- Value *Result;
- class IslGenerator *Generator;
- } IslGenInfo;
-
- IRBuilder<> &Builder;
- std::vector<Value *> &IVS;
- static int mergeIslAffValues(__isl_take isl_set *Set,
- __isl_take isl_aff *Aff, void *User);
-};
-
-Value *IslGenerator::generateIslInt(isl_int Int) {
- mpz_t IntMPZ;
- mpz_init(IntMPZ);
- isl_int_get_gmp(Int, IntMPZ);
- Value *IntValue = Builder.getInt(APInt_from_MPZ(IntMPZ));
- mpz_clear(IntMPZ);
- return IntValue;
-}
-
-Value *IslGenerator::generateIslAff(__isl_take isl_aff *Aff) {
- Value *Result;
- Value *ConstValue;
- isl_int ConstIsl;
-
- isl_int_init(ConstIsl);
- isl_aff_get_constant(Aff, &ConstIsl);
- ConstValue = generateIslInt(ConstIsl);
- Type *Ty = Builder.getInt64Ty();
-
- // FIXME: We should give the constant and coefficients the right type. Here
- // we force it into i64.
- Result = Builder.CreateSExtOrBitCast(ConstValue, Ty);
-
- unsigned int NbInputDims = isl_aff_dim(Aff, isl_dim_in);
-
- assert((IVS.size() == NbInputDims) && "The Dimension of Induction Variables"
- "must match the dimension of the affine space.");
-
- isl_int CoefficientIsl;
- isl_int_init(CoefficientIsl);
-
- for (unsigned int i = 0; i < NbInputDims; ++i) {
- Value *CoefficientValue;
- isl_aff_get_coefficient(Aff, isl_dim_in, i, &CoefficientIsl);
-
- if (isl_int_is_zero(CoefficientIsl))
- continue;
-
- CoefficientValue = generateIslInt(CoefficientIsl);
- CoefficientValue = Builder.CreateIntCast(CoefficientValue, Ty, true);
- Value *IV = Builder.CreateIntCast(IVS[i], Ty, true);
- Value *PAdd = Builder.CreateMul(CoefficientValue, IV, "p_mul_coeff");
- Result = Builder.CreateAdd(Result, PAdd, "p_sum_coeff");
- }
-
- isl_int_clear(CoefficientIsl);
- isl_int_clear(ConstIsl);
- isl_aff_free(Aff);
-
- return Result;
-}
-
-int IslGenerator::mergeIslAffValues(__isl_take isl_set *Set,
- __isl_take isl_aff *Aff, void *User) {
- IslGenInfo *GenInfo = (IslGenInfo *)User;
-
- assert((GenInfo->Result == NULL) && "Result is already set."
- "Currently only single isl_aff is supported");
- assert(isl_set_plain_is_universe(Set)
- && "Code generation failed because the set is not universe");
-
- GenInfo->Result = GenInfo->Generator->generateIslAff(Aff);
-
- isl_set_free(Set);
- return 0;
-}
-
-Value *IslGenerator::generateIslPwAff(__isl_take isl_pw_aff *PwAff) {
- IslGenInfo User;
- User.Result = NULL;
- User.Generator = this;
- isl_pw_aff_foreach_piece(PwAff, mergeIslAffValues, &User);
- assert(User.Result && "Code generation for isl_pw_aff failed");
-
- isl_pw_aff_free(PwAff);
- return User.Result;
-}
-
-/// @brief Generate a new basic block for a polyhedral statement.
-///
-/// The only public function exposed is generate().
-class BlockGenerator {
-public:
- /// @brief Generate a new BasicBlock for a ScopStmt.
- ///
- /// @param Builder The LLVM-IR Builder used to generate the statement. The
- /// code is generated at the location, the Builder points to.
- /// @param Stmt The statement to code generate.
- /// @param GlobalMap A map that defines for certain Values referenced from the
- /// original code new Values they should be replaced with.
- /// @param P A reference to the pass this function is called from.
- /// The pass is needed to update other analysis.
- static void generate(IRBuilder<> &Builder, ScopStmt &Stmt,
- ValueMapT &GlobalMap, Pass *P) {
- BlockGenerator Generator(Builder, Stmt, P);
- Generator.copyBB(GlobalMap);
- }
-
-protected:
- IRBuilder<> &Builder;
- ScopStmt &Statement;
- Pass *P;
-
- BlockGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P);
-
- /// @brief Get the new version of a Value.
- ///
- /// @param Old The old Value.
- /// @param BBMap A mapping from old values to their new values
- /// (for values recalculated within this basic block).
- /// @param GlobalMap A mapping from old values to their new values
- /// (for values recalculated in the new ScoP, but not
- /// within this basic block).
- ///
- /// @returns o The old value, if it is still valid.
- /// o The new value, if available.
- /// o NULL, if no value is found.
- Value *getNewValue(const Value *Old, ValueMapT &BBMap, ValueMapT &GlobalMap);
-
- void copyInstScalar(const Instruction *Inst, ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- /// @brief Get the memory access offset to be added to the base address
- std::vector<Value*> getMemoryAccessIndex(__isl_keep isl_map *AccessRelation,
- Value *BaseAddress,
- ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- /// @brief Get the new operand address according to the changed access in
- /// JSCOP file.
- Value *getNewAccessOperand(__isl_keep isl_map *NewAccessRelation,
- Value *BaseAddress, ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- /// @brief Generate the operand address
- Value *generateLocationAccessed(const Instruction *Inst,
- const Value *Pointer, ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- Value *generateScalarLoad(const LoadInst *load, ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- Value *generateScalarStore(const StoreInst *store, ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- /// @brief Copy a single Instruction.
- ///
- /// This copies a single Instruction and updates references to old values
- /// with references to new values, as defined by GlobalMap and BBMap.
- ///
- /// @param BBMap A mapping from old values to their new values
- /// (for values recalculated within this basic block).
- /// @param GlobalMap A mapping from old values to their new values
- /// (for values recalculated in the new ScoP, but not
- /// within this basic block).
- void copyInstruction(const Instruction *Inst, ValueMapT &BBMap,
- ValueMapT &GlobalMap);
-
- /// @brief Copy the basic block.
- ///
- /// This copies the entire basic block and updates references to old values
- /// with references to new values, as defined by GlobalMap.
- ///
- /// @param GlobalMap A mapping from old values to their new values
- /// (for values recalculated in the new ScoP, but not
- /// within this basic block).
- void copyBB(ValueMapT &GlobalMap);
-};
-
-BlockGenerator::BlockGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P):
- Builder(B), Statement(Stmt), P(P) {}
-
-Value *BlockGenerator::getNewValue(const Value *Old, ValueMapT &BBMap,
- ValueMapT &GlobalMap) {
- // We assume constants never change.
- // This avoids map lookups for many calls to this function.
- if (isa<Constant>(Old))
- return const_cast<Value*>(Old);
-
- if (GlobalMap.count(Old)) {
- Value *New = GlobalMap[Old];
-
- if (Old->getType()->getScalarSizeInBits()
- < New->getType()->getScalarSizeInBits())
- New = Builder.CreateTruncOrBitCast(New, Old->getType());
-
- return New;
- }
-
- if (BBMap.count(Old)) {
- return BBMap[Old];
- }
-
- // 'Old' is within the original SCoP, but was not rewritten.
- //
- // Such values appear, if they only calculate information already available in
- // the polyhedral description (e.g. an induction variable increment). They
- // can be safely ignored.
- if (const Instruction *Inst = dyn_cast<Instruction>(Old))
- if (Statement.getParent()->getRegion().contains(Inst->getParent()))
- return NULL;
-
- // Everything else is probably a scop-constant value defined as global,
- // function parameter or an instruction not within the scop.
- return const_cast<Value*>(Old);
-}
-
-void BlockGenerator::copyInstScalar(const Instruction *Inst, ValueMapT &BBMap,
- ValueMapT &GlobalMap) {
- Instruction *NewInst = Inst->clone();
-
- // Replace old operands with the new ones.
- for (Instruction::const_op_iterator OI = Inst->op_begin(),
- OE = Inst->op_end(); OI != OE; ++OI) {
- Value *OldOperand = *OI;
- Value *NewOperand = getNewValue(OldOperand, BBMap, GlobalMap);
-
- if (!NewOperand) {
- assert(!isa<StoreInst>(NewInst)
- && "Store instructions are always needed!");
- delete NewInst;
- return;
- }
-
- NewInst->replaceUsesOfWith(OldOperand, NewOperand);
- }
-
- Builder.Insert(NewInst);
- BBMap[Inst] = NewInst;
-
- if (!NewInst->getType()->isVoidTy())
- NewInst->setName("p_" + Inst->getName());
-}
-
-std::vector<Value*> BlockGenerator::getMemoryAccessIndex(
- __isl_keep isl_map *AccessRelation, Value *BaseAddress,
- ValueMapT &BBMap, ValueMapT &GlobalMap) {
-
- assert((isl_map_dim(AccessRelation, isl_dim_out) == 1)
- && "Only single dimensional access functions supported");
-
- std::vector<Value *> IVS;
- for (unsigned i = 0; i < Statement.getNumIterators(); ++i) {
- const Value *OriginalIV = Statement.getInductionVariableForDimension(i);
- Value *NewIV = getNewValue(OriginalIV, BBMap, GlobalMap);
- IVS.push_back(NewIV);
- }
-
- isl_pw_aff *PwAff = isl_map_dim_max(isl_map_copy(AccessRelation), 0);
- IslGenerator IslGen(Builder, IVS);
- Value *OffsetValue = IslGen.generateIslPwAff(PwAff);
-
- Type *Ty = Builder.getInt64Ty();
- OffsetValue = Builder.CreateIntCast(OffsetValue, Ty, true);
-
- std::vector<Value*> IndexArray;
- Value *NullValue = Constant::getNullValue(Ty);
- IndexArray.push_back(NullValue);
- IndexArray.push_back(OffsetValue);
- return IndexArray;
-}
-
-Value *BlockGenerator::getNewAccessOperand(
- __isl_keep isl_map *NewAccessRelation, Value *BaseAddress,
- ValueMapT &BBMap, ValueMapT &GlobalMap) {
- std::vector<Value*> IndexArray = getMemoryAccessIndex(NewAccessRelation,
- BaseAddress,
- BBMap, GlobalMap);
- Value *NewOperand = Builder.CreateGEP(BaseAddress, IndexArray,
- "p_newarrayidx_");
- return NewOperand;
-}
-
-Value *BlockGenerator::generateLocationAccessed(const Instruction *Inst,
- const Value *Pointer,
- ValueMapT &BBMap,
- ValueMapT &GlobalMap) {
- MemoryAccess &Access = Statement.getAccessFor(Inst);
- isl_map *CurrentAccessRelation = Access.getAccessRelation();
- isl_map *NewAccessRelation = Access.getNewAccessRelation();
-
- assert(isl_map_has_equal_space(CurrentAccessRelation, NewAccessRelation)
- && "Current and new access function use different spaces");
-
- Value *NewPointer;
-
- if (!NewAccessRelation) {
- NewPointer = getNewValue(Pointer, BBMap, GlobalMap);
- } else {
- Value *BaseAddress = const_cast<Value*>(Access.getBaseAddr());
- NewPointer = getNewAccessOperand(NewAccessRelation, BaseAddress,
- BBMap, GlobalMap);
- }
-
- isl_map_free(CurrentAccessRelation);
- isl_map_free(NewAccessRelation);
- return NewPointer;
-}
-
-Value *BlockGenerator::generateScalarLoad(const LoadInst *Load,
- ValueMapT &BBMap,
- ValueMapT &GlobalMap) {
- const Value *Pointer = Load->getPointerOperand();
- const Instruction *Inst = dyn_cast<Instruction>(Load);
- Value *NewPointer = generateLocationAccessed(Inst, Pointer, BBMap, GlobalMap);
- Value *ScalarLoad = Builder.CreateLoad(NewPointer,
- Load->getName() + "_p_scalar_");
- return ScalarLoad;
-}
-
-Value *BlockGenerator::generateScalarStore(const StoreInst *Store,
- ValueMapT &BBMap,
- ValueMapT &GlobalMap) {
- const Value *Pointer = Store->getPointerOperand();
- Value *NewPointer = generateLocationAccessed(Store, Pointer, BBMap,
- GlobalMap);
- Value *ValueOperand = getNewValue(Store->getValueOperand(), BBMap, GlobalMap);
-
- return Builder.CreateStore(ValueOperand, NewPointer);
-}
-
-void BlockGenerator::copyInstruction(const Instruction *Inst,
- ValueMapT &BBMap, ValueMapT &GlobalMap) {
- // Terminator instructions control the control flow. They are explicitly
- // expressed in the clast and do not need to be copied.
- if (Inst->isTerminator())
- return;
-
- if (const LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
- BBMap[Load] = generateScalarLoad(Load, BBMap, GlobalMap);
- return;
- }
-
- if (const StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
- BBMap[Store] = generateScalarStore(Store, BBMap, GlobalMap);
- return;
- }
-
- copyInstScalar(Inst, BBMap, GlobalMap);
-}
-
-
-void BlockGenerator::copyBB(ValueMapT &GlobalMap) {
- BasicBlock *BB = Statement.getBasicBlock();
- BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
- Builder.GetInsertPoint(), P);
- CopyBB->setName("polly.stmt." + BB->getName());
- Builder.SetInsertPoint(CopyBB->begin());
-
- ValueMapT BBMap;
-
- for (BasicBlock::const_iterator II = BB->begin(), IE = BB->end(); II != IE;
- ++II)
- copyInstruction(II, BBMap, GlobalMap);
-}
-
-/// @brief Generate a new vector basic block for a polyhedral statement.
-///
-/// The only public function exposed is generate().
-class VectorBlockGenerator : BlockGenerator {
-public:
- /// @brief Generate a new vector basic block for a ScoPStmt.
- ///
- /// This code generation is similar to the normal, scalar code generation,
- /// except that each instruction is code generated for several vector lanes
- /// at a time. If possible instructions are issued as actual vector
- /// instructions, but e.g. for address calculation instructions we currently
- /// generate scalar instructions for each vector lane.
- ///
- /// @param Builder The LLVM-IR Builder used to generate the statement. The
- /// code is generated at the location, the builder points
- /// to.
- /// @param Stmt The statement to code generate.
- /// @param GlobalMaps A vector of maps that define for certain Values
- /// referenced from the original code new Values they should
- /// be replaced with. Each map in the vector of maps is
- /// used for one vector lane. The number of elements in the
- /// vector defines the width of the generated vector
- /// instructions.
- /// @param P A reference to the pass this function is called from.
- /// The pass is needed to update other analysis.
- static void generate(IRBuilder<> &B, ScopStmt &Stmt,
- VectorValueMapT &GlobalMaps, __isl_keep isl_set *Domain,
- Pass *P) {
- VectorBlockGenerator Generator(B, GlobalMaps, Stmt, Domain, P);
- Generator.copyBB();
- }
-
-private:
- // This is a vector of global value maps. The first map is used for the first
- // vector lane, ...
- // Each map, contains information about Instructions in the old ScoP, which
- // are recalculated in the new SCoP. When copying the basic block, we replace
- // all referenes to the old instructions with their recalculated values.
- VectorValueMapT &GlobalMaps;
-
- isl_set *Domain;
-
- VectorBlockGenerator(IRBuilder<> &B, VectorValueMapT &GlobalMaps,
- ScopStmt &Stmt, __isl_keep isl_set *Domain, Pass *P);
-
- int getVectorWidth();
-
- Value *getVectorValue(const Value *Old, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- Type *getVectorPtrTy(const Value *V, int Width);
-
- /// @brief Load a vector from a set of adjacent scalars
- ///
- /// In case a set of scalars is known to be next to each other in memory,
- /// create a vector load that loads those scalars
- ///
- /// %vector_ptr= bitcast double* %p to <4 x double>*
- /// %vec_full = load <4 x double>* %vector_ptr
- ///
- Value *generateStrideOneLoad(const LoadInst *Load, ValueMapT &BBMap);
-
- /// @brief Load a vector initialized from a single scalar in memory
- ///
- /// In case all elements of a vector are initialized to the same
- /// scalar value, this value is loaded and shuffeled into all elements
- /// of the vector.
- ///
- /// %splat_one = load <1 x double>* %p
- /// %splat = shufflevector <1 x double> %splat_one, <1 x
- /// double> %splat_one, <4 x i32> zeroinitializer
- ///
- Value *generateStrideZeroLoad(const LoadInst *Load, ValueMapT &BBMap);
-
- /// @Load a vector from scalars distributed in memory
- ///
- /// In case some scalars a distributed randomly in memory. Create a vector
- /// by loading each scalar and by inserting one after the other into the
- /// vector.
- ///
- /// %scalar_1= load double* %p_1
- /// %vec_1 = insertelement <2 x double> undef, double %scalar_1, i32 0
- /// %scalar 2 = load double* %p_2
- /// %vec_2 = insertelement <2 x double> %vec_1, double %scalar_1, i32 1
- ///
- Value *generateUnknownStrideLoad(const LoadInst *Load,
- VectorValueMapT &ScalarMaps);
-
- void generateLoad(const LoadInst *Load, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- void copyUnaryInst(const UnaryInstruction *Inst, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- void copyBinaryInst(const BinaryOperator *Inst, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- void copyStore(const StoreInst *Store, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- void copyInstScalarized(const Instruction *Inst, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- bool extractScalarValues(const Instruction *Inst, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- bool hasVectorOperands(const Instruction *Inst, ValueMapT &VectorMap);
-
- void copyInstruction(const Instruction *Inst, ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps);
-
- void copyBB();
-};
-
-VectorBlockGenerator::VectorBlockGenerator(IRBuilder<> &B,
- VectorValueMapT &GlobalMaps, ScopStmt &Stmt, __isl_keep isl_set *Domain,
- Pass *P) : BlockGenerator(B, Stmt, P), GlobalMaps(GlobalMaps),
- Domain(Domain) {
- assert(GlobalMaps.size() > 1 && "Only one vector lane found");
- assert(Domain && "No statement domain provided");
- }
-
-Value *VectorBlockGenerator::getVectorValue(const Value *Old,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- if (VectorMap.count(Old))
- return VectorMap[Old];
-
- int Width = getVectorWidth();
-
- Value *Vector = UndefValue::get(VectorType::get(Old->getType(), Width));
-
- for (int Lane = 0; Lane < Width; Lane++)
- Vector = Builder.CreateInsertElement(Vector,
- getNewValue(Old,
- ScalarMaps[Lane],
- GlobalMaps[Lane]),
- Builder.getInt32(Lane));
-
- VectorMap[Old] = Vector;
-
- return Vector;
-}
-
-Type *VectorBlockGenerator::getVectorPtrTy(const Value *Val, int Width) {
- PointerType *PointerTy = dyn_cast<PointerType>(Val->getType());
- assert(PointerTy && "PointerType expected");
-
- Type *ScalarType = PointerTy->getElementType();
- VectorType *VectorType = VectorType::get(ScalarType, Width);
-
- return PointerType::getUnqual(VectorType);
-}
-
-Value *VectorBlockGenerator::generateStrideOneLoad(const LoadInst *Load,
- ValueMapT &BBMap) {
- const Value *Pointer = Load->getPointerOperand();
- Type *VectorPtrType = getVectorPtrTy(Pointer, getVectorWidth());
- Value *NewPointer = getNewValue(Pointer, BBMap, GlobalMaps[0]);
- Value *VectorPtr = Builder.CreateBitCast(NewPointer, VectorPtrType,
- "vector_ptr");
- LoadInst *VecLoad = Builder.CreateLoad(VectorPtr,
- Load->getName() + "_p_vec_full");
- if (!Aligned)
- VecLoad->setAlignment(8);
-
- return VecLoad;
-}
-
-Value *VectorBlockGenerator::generateStrideZeroLoad(const LoadInst *Load,
- ValueMapT &BBMap) {
- const Value *Pointer = Load->getPointerOperand();
- Type *VectorPtrType = getVectorPtrTy(Pointer, 1);
- Value *NewPointer = getNewValue(Pointer, BBMap, GlobalMaps[0]);
- Value *VectorPtr = Builder.CreateBitCast(NewPointer, VectorPtrType,
- Load->getName() + "_p_vec_p");
- LoadInst *ScalarLoad= Builder.CreateLoad(VectorPtr,
- Load->getName() + "_p_splat_one");
-
- if (!Aligned)
- ScalarLoad->setAlignment(8);
-
- Constant *SplatVector =
- Constant::getNullValue(VectorType::get(Builder.getInt32Ty(),
- getVectorWidth()));
-
- Value *VectorLoad = Builder.CreateShuffleVector(ScalarLoad, ScalarLoad,
- SplatVector,
- Load->getName()
- + "_p_splat");
- return VectorLoad;
-}
-
-Value *VectorBlockGenerator::generateUnknownStrideLoad(const LoadInst *Load,
- VectorValueMapT &ScalarMaps) {
- int VectorWidth = getVectorWidth();
- const Value *Pointer = Load->getPointerOperand();
- VectorType *VectorType = VectorType::get(
- dyn_cast<PointerType>(Pointer->getType())->getElementType(), VectorWidth);
-
- Value *Vector = UndefValue::get(VectorType);
-
- for (int i = 0; i < VectorWidth; i++) {
- Value *NewPointer = getNewValue(Pointer, ScalarMaps[i], GlobalMaps[i]);
- Value *ScalarLoad = Builder.CreateLoad(NewPointer,
- Load->getName() + "_p_scalar_");
- Vector = Builder.CreateInsertElement(Vector, ScalarLoad,
- Builder.getInt32(i),
- Load->getName() + "_p_vec_");
- }
-
- return Vector;
-}
-
-void VectorBlockGenerator::generateLoad(const LoadInst *Load,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- if (GroupedUnrolling || !VectorType::isValidElementType(Load->getType())) {
- for (int i = 0; i < getVectorWidth(); i++)
- ScalarMaps[i][Load] = generateScalarLoad(Load, ScalarMaps[i],
- GlobalMaps[i]);
- return;
- }
-
- MemoryAccess &Access = Statement.getAccessFor(Load);
-
- Value *NewLoad;
- if (Access.isStrideZero(isl_set_copy(Domain)))
- NewLoad = generateStrideZeroLoad(Load, ScalarMaps[0]);
- else if (Access.isStrideOne(isl_set_copy(Domain)))
- NewLoad = generateStrideOneLoad(Load, ScalarMaps[0]);
- else
- NewLoad = generateUnknownStrideLoad(Load, ScalarMaps);
-
- VectorMap[Load] = NewLoad;
-}
-
-void VectorBlockGenerator::copyUnaryInst(const UnaryInstruction *Inst,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- int VectorWidth = getVectorWidth();
- Value *NewOperand = getVectorValue(Inst->getOperand(0), VectorMap,
- ScalarMaps);
-
- assert(isa<CastInst>(Inst) && "Can not generate vector code for instruction");
-
- const CastInst *Cast = dyn_cast<CastInst>(Inst);
- VectorType *DestType = VectorType::get(Inst->getType(), VectorWidth);
- VectorMap[Inst] = Builder.CreateCast(Cast->getOpcode(), NewOperand, DestType);
-}
-
-void VectorBlockGenerator::copyBinaryInst(const BinaryOperator *Inst,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- Value *OpZero = Inst->getOperand(0);
- Value *OpOne = Inst->getOperand(1);
-
- Value *NewOpZero, *NewOpOne;
- NewOpZero = getVectorValue(OpZero, VectorMap, ScalarMaps);
- NewOpOne = getVectorValue(OpOne, VectorMap, ScalarMaps);
-
- Value *NewInst = Builder.CreateBinOp(Inst->getOpcode(), NewOpZero,
- NewOpOne,
- Inst->getName() + "p_vec");
- VectorMap[Inst] = NewInst;
-}
-
-void VectorBlockGenerator::copyStore(const StoreInst *Store,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- int VectorWidth = getVectorWidth();
-
- MemoryAccess &Access = Statement.getAccessFor(Store);
-
- const Value *Pointer = Store->getPointerOperand();
- Value *Vector = getVectorValue(Store->getValueOperand(), VectorMap,
- ScalarMaps);
-
- if (Access.isStrideOne(isl_set_copy(Domain))) {
- Type *VectorPtrType = getVectorPtrTy(Pointer, VectorWidth);
- Value *NewPointer = getNewValue(Pointer, ScalarMaps[0], GlobalMaps[0]);
-
- Value *VectorPtr = Builder.CreateBitCast(NewPointer, VectorPtrType,
- "vector_ptr");
- StoreInst *Store = Builder.CreateStore(Vector, VectorPtr);
-
- if (!Aligned)
- Store->setAlignment(8);
- } else {
- for (unsigned i = 0; i < ScalarMaps.size(); i++) {
- Value *Scalar = Builder.CreateExtractElement(Vector,
- Builder.getInt32(i));
- Value *NewPointer = getNewValue(Pointer, ScalarMaps[i], GlobalMaps[i]);
- Builder.CreateStore(Scalar, NewPointer);
- }
- }
-}
-
-bool VectorBlockGenerator::hasVectorOperands(const Instruction *Inst,
- ValueMapT &VectorMap) {
- for (Instruction::const_op_iterator OI = Inst->op_begin(),
- OE = Inst->op_end(); OI != OE; ++OI)
- if (VectorMap.count(*OI))
- return true;
- return false;
-}
-
-bool VectorBlockGenerator::extractScalarValues(const Instruction *Inst,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- bool HasVectorOperand = false;
- int VectorWidth = getVectorWidth();
-
- for (Instruction::const_op_iterator OI = Inst->op_begin(),
- OE = Inst->op_end(); OI != OE; ++OI) {
- ValueMapT::iterator VecOp = VectorMap.find(*OI);
-
- if (VecOp == VectorMap.end())
- continue;
-
- HasVectorOperand = true;
- Value *NewVector = VecOp->second;
-
- for (int i = 0; i < VectorWidth; ++i) {
- ValueMapT &SM = ScalarMaps[i];
-
- // If there is one scalar extracted, all scalar elements should have
- // already been extracted by the code here. So no need to check for the
- // existance of all of them.
- if (SM.count(*OI))
- break;
-
- SM[*OI] = Builder.CreateExtractElement(NewVector, Builder.getInt32(i));
- }
- }
-
- return HasVectorOperand;
-}
-
-void VectorBlockGenerator::copyInstScalarized(const Instruction *Inst,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- bool HasVectorOperand;
- int VectorWidth = getVectorWidth();
-
- HasVectorOperand = extractScalarValues(Inst, VectorMap, ScalarMaps);
-
- for (int VectorLane = 0; VectorLane < getVectorWidth(); VectorLane++)
- copyInstScalar(Inst, ScalarMaps[VectorLane], GlobalMaps[VectorLane]);
-
- if (!VectorType::isValidElementType(Inst->getType()) || !HasVectorOperand)
- return;
-
- // Make the result available as vector value.
- VectorType *VectorType = VectorType::get(Inst->getType(), VectorWidth);
- Value *Vector = UndefValue::get(VectorType);
-
- for (int i = 0; i < VectorWidth; i++)
- Vector = Builder.CreateInsertElement(Vector, ScalarMaps[i][Inst],
- Builder.getInt32(i));
-
- VectorMap[Inst] = Vector;
-}
-
-int VectorBlockGenerator::getVectorWidth() {
- return GlobalMaps.size();
-}
-
-void VectorBlockGenerator::copyInstruction(const Instruction *Inst,
- ValueMapT &VectorMap,
- VectorValueMapT &ScalarMaps) {
- // Terminator instructions control the control flow. They are explicitly
- // expressed in the clast and do not need to be copied.
- if (Inst->isTerminator())
- return;
-
- if (const LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
- generateLoad(Load, VectorMap, ScalarMaps);
- return;
- }
-
- if (hasVectorOperands(Inst, VectorMap)) {
- if (const StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
- copyStore(Store, VectorMap, ScalarMaps);
- return;
- }
-
- if (const UnaryInstruction *Unary = dyn_cast<UnaryInstruction>(Inst)) {
- copyUnaryInst(Unary, VectorMap, ScalarMaps);
- return;
- }
-
- if (const BinaryOperator *Binary = dyn_cast<BinaryOperator>(Inst)) {
- copyBinaryInst(Binary, VectorMap, ScalarMaps);
- return;
- }
-
- // Falltrough: We generate scalar instructions, if we don't know how to
- // generate vector code.
- }
-
- copyInstScalarized(Inst, VectorMap, ScalarMaps);
-}
-
-void VectorBlockGenerator::copyBB() {
- BasicBlock *BB = Statement.getBasicBlock();
- BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
- Builder.GetInsertPoint(), P);
- CopyBB->setName("polly.stmt." + BB->getName());
- Builder.SetInsertPoint(CopyBB->begin());
-
- // Create two maps that store the mapping from the original instructions of
- // the old basic block to their copies in the new basic block. Those maps
- // are basic block local.
- //
- // As vector code generation is supported there is one map for scalar values
- // and one for vector values.
- //
- // In case we just do scalar code generation, the vectorMap is not used and
- // the scalarMap has just one dimension, which contains the mapping.
- //
- // In case vector code generation is done, an instruction may either appear
- // in the vector map once (as it is calculating >vectorwidth< values at a
- // time. Or (if the values are calculated using scalar operations), it
- // appears once in every dimension of the scalarMap.
- VectorValueMapT ScalarBlockMap(getVectorWidth());
- ValueMapT VectorBlockMap;
-
- for (BasicBlock::const_iterator II = BB->begin(), IE = BB->end();
- II != IE; ++II)
- copyInstruction(II, VectorBlockMap, ScalarBlockMap);
-}
/// Class to generate LLVM-IR that calculates the value of a clast_expr.
class ClastExpCodeGen {
--
1.7.5.4
Looks good, except for some smaller comments inline. Please commit if 'make polly-test' passes.
> +++ b/include/polly/BlockGenerators.h
> @@ -0,0 +1,243 @@
> +//===-BlockGenerators.h - Helper to generate code for statements-*- C++ -*-===//
> +//
> +// The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file define the BlockGenerator and VectorBlockGenerator class, which
"define" should be "defines", and "class" should be "classes".
> --- /dev/null
> +++ b/lib/CodeGen/BlockGenerators.cpp
> @@ -0,0 +1,648 @@
> +//===--- BlockGenerators.cpp - Generate code for statements -----*- C++ -*-===//
> +//
> +// The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file implement the BlockGenerator and VectorBlockGenerator class, which
"implement" should be "implements", and "class" should be "classes".
Cheers
Tobi
best regards
ether
---
include/polly/BlockGenerators.h | 243 -------------------------------
include/polly/CodeGen/BlockGenerators.h | 243 +++++++++++++++++++++++++++++++
include/polly/CodeGen/CodeGeneration.h | 20 +++
include/polly/CodeGen/LoopGenerators.h | 115 +++++++++++++++
include/polly/CodeGeneration.h | 20 ---
include/polly/LoopGenerators.h | 115 ---------------
lib/CodeGen/BlockGenerators.cpp | 3 +-
lib/CodeGen/CodeGeneration.cpp | 6 +-
lib/CodeGen/LoopGenerators.cpp | 2 +-
lib/ScheduleOptimizer.cpp | 2 +-
10 files changed, 384 insertions(+), 385 deletions(-)
delete mode 100644 include/polly/BlockGenerators.h
create mode 100644 include/polly/CodeGen/BlockGenerators.h
create mode 100644 include/polly/CodeGen/CodeGeneration.h
create mode 100644 include/polly/CodeGen/LoopGenerators.h
delete mode 100644 include/polly/CodeGeneration.h
delete mode 100644 include/polly/LoopGenerators.h
diff --git a/include/polly/BlockGenerators.h b/include/polly/BlockGenerators.h
deleted file mode 100644
index ba2e390..0000000
--- a/include/polly/BlockGenerators.h
+++ /dev/null
@@ -1,243 +0,0 @@
-//===-BlockGenerators.h - Helper to generate code for statements-*- C++ -*-===//
-//
-// The LLVM Compiler Infrastructure
-//
-// This file is distributed under the University of Illinois Open Source
-// License. See LICENSE.TXT for details.
-//
-//===----------------------------------------------------------------------===//
-//
-// This file define the BlockGenerator and VectorBlockGenerator class, which
-// generate sequential code and vectorized code for a polyhedral statement,
-// respectively.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef POLLY_BLOCK_GENERATORS_H
-#define POLLY_BLOCK_GENERATORS_H
-
-#include "llvm/ADT/DenseMap.h"
-#include "llvm/Support/IRBuilder.h"
-
-#include "isl/map.h"
-
-#include <vector>
-
-namespace llvm {
-class Pass;
-}
-
-namespace polly {
-using namespace llvm;
-class ScopStmt;
-
-typedef DenseMap<const Value*, Value*> ValueMapT;
-typedef std::vector<ValueMapT> VectorValueMapT;
-
- /// @brief Load a vector from scalars distributed in memory
-#endif
-
diff --git a/include/polly/CodeGen/BlockGenerators.h
b/include/polly/CodeGen/BlockGenerators.h
new file mode 100644
index 0000000..ba2e390
--- /dev/null
+++ b/include/polly/CodeGen/BlockGenerators.h
@@ -0,0 +1,243 @@
+//===-BlockGenerators.h - Helper to generate code for statements-*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file define the BlockGenerator and VectorBlockGenerator class, which
diff --git a/include/polly/CodeGen/CodeGeneration.h
b/include/polly/CodeGen/CodeGeneration.h
new file mode 100644
index 0000000..5669143
--- /dev/null
+++ b/include/polly/CodeGen/CodeGeneration.h
@@ -0,0 +1,20 @@
+//===------ polly/CodeGeneration.h - The Polly code generator *- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef POLLY_CODEGENERATION_H
+#define POLLY_CODEGENERATION_H
+
+namespace polly {
+ extern bool EnablePollyVector;
+}
+
+#endif
+
diff --git a/include/polly/CodeGen/LoopGenerators.h b/include/polly/CodeGen/LoopGenerators.h
new file mode 100644
index 0000000..5611dcd
--- /dev/null
+++ b/include/polly/CodeGen/LoopGenerators.h
@@ -0,0 +1,115 @@
+//===- LoopGenerators.h - IR helper to create loops -------------*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains functions to create scalar and OpenMP parallel loops
+// as LLVM-IR.
+//
+//===----------------------------------------------------------------------===//
+#ifndef POLLY_LOOP_GENERATORS_H
+#define POLLY_LOOP_GENERATORS_H
+#include "llvm/Support/IRBuilder.h"
+#include "llvm/ADT/SetVector.h"
+
+#include <map>
+
+namespace llvm {
+ class Value;
+ class Pass;
+ class BasicBlock;
+}
+
+namespace polly {
+using namespace llvm;
+
+/// @brief Create a scalar loop.
+///
+/// @param LowerBound The starting value of the induction variable.
+/// @param UpperBound The upper bound of the induction variable.
+/// @param Stride The value by which the induction variable is incremented.
+///
+/// @param Builder The builder used to create the loop.
+/// @param P A pointer to the pass that uses this function. It is used
+/// to update analysis information.
+///
+/// @return Value* The newly created induction variable for this loop.
+Value *createLoop(Value *LowerBound, Value *UpperBound, Value *Stride,
+ IRBuilder<> &Builder, Pass *P, BasicBlock *&AfterBlock);
+
+class OMPGenerator {
+public:
+ typedef std::map<Value*, Value*> ValueToValueMapTy;
+
+ OMPGenerator(IRBuilder<> &Builder, Pass *P): Builder(Builder), P(P) {}
+
+ /// @brief Create an OpenMP parallel loop.
+ ///
+ ///
+ /// @param LowerBound The starting value of the induction variable.
+ /// @param UpperBound The upper bound of the induction variable.
+ /// @param Stride The value by which the induction variable is
+ /// incremented.
+ ///
+ /// @param UsedValues A set of LLVM-IR Values that should be available to
+ /// the new loop body.
+ /// @param VMap This map is filled by createParallelLoop(). It
+ /// maps the values in UsedValues to Values through which
+ /// their content is available within the loop body.
+ /// @param LoopBody A pointer to an iterator that is set to point to the
+ /// body of the created loop. It should be used to insert
+ /// instructions that form the actual loop body.
+ ///
+ /// @return Value* The newly created induction variable for this loop.
+ Value *createParallelLoop(Value *LowerBound, Value *UpperBound, Value *Stride,
+ SetVector<Value*> &UsedValues,
+ ValueToValueMapTy &VMap,
+ BasicBlock::iterator *LoopBody);
+
+private:
+ IRBuilder<> &Builder;
+ Pass *P;
+
+ IntegerType *getIntPtrTy();
+ Module *getModule();
+
+ void createCallParallelLoopStart(Value *SubFunction, Value *SubfunctionParam,
+ Value *NumberOfThreads, Value *LowerBound,
+ Value *UpperBound, Value *Stride);
+ Value *createCallLoopNext(Value *LowerBoundPtr, Value *UpperBoundPtr);
+ void createCallParallelEnd();
+ void createCallLoopEndNowait();
+
+ Value *loadValuesIntoStruct(SetVector<Value*> &Values);
+ void extractValuesFromStruct(SetVector<Value*> OldValues,
+ Value *Struct, ValueToValueMapTy &Map);
+
+ /// @brief Create the OpenMP subfunction.
+ ///
+ /// @param Stride The value by which the induction variable is
+ /// incremented.
+ /// @param Struct The structure that is used to make Values available to
+ /// the loop body.
+ /// @param UsedValues A set of LLVM-IR Values that should be available to
+ /// the new loop body.
+ /// @param VMap This map that is filled by createSubfunction(). It
+ /// maps the values in UsedValues to Values through which
+ /// their content is available within the loop body.
+ /// @param SubFunction The newly created SubFunction is returned here.
+ ///
+ /// @return Value* The newly created induction variable.
+ Value *createSubfunction(Value *Stride, Value *Struct,
+ SetVector<Value*> UsedValues,
+ ValueToValueMapTy &VMap,
+ Function **SubFunction);
+
+ /// @brief Create the definition of the OpenMP subfunction.
+ Function *createSubfunctionDefinition();
+};
+} // end namespace polly
+#endif
+
diff --git a/include/polly/CodeGeneration.h b/include/polly/CodeGeneration.h
deleted file mode 100644
index 5669143..0000000
--- a/include/polly/CodeGeneration.h
+++ /dev/null
@@ -1,20 +0,0 @@
-//===------ polly/CodeGeneration.h - The Polly code generator *- C++ -*-===//
-//
-// The LLVM Compiler Infrastructure
-//
-// This file is distributed under the University of Illinois Open Source
-// License. See LICENSE.TXT for details.
-//
-//===----------------------------------------------------------------------===//
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef POLLY_CODEGENERATION_H
-#define POLLY_CODEGENERATION_H
-
-namespace polly {
- extern bool EnablePollyVector;
-}
-
-#endif
-
diff --git a/include/polly/LoopGenerators.h b/include/polly/LoopGenerators.h
deleted file mode 100644
index 5611dcd..0000000
--- a/include/polly/LoopGenerators.h
+++ /dev/null
@@ -1,115 +0,0 @@
-//===- LoopGenerators.h - IR helper to create loops -------------*- C++ -*-===//
-//
-// The LLVM Compiler Infrastructure
-//
-// This file is distributed under the University of Illinois Open Source
-// License. See LICENSE.TXT for details.
-//
-//===----------------------------------------------------------------------===//
-//
-// This file contains functions to create scalar and OpenMP parallel loops
-// as LLVM-IR.
-//
-//===----------------------------------------------------------------------===//
-#ifndef POLLY_LOOP_GENERATORS_H
-#define POLLY_LOOP_GENERATORS_H
-#include "llvm/Support/IRBuilder.h"
-#include "llvm/ADT/SetVector.h"
-
-#include <map>
-
-namespace llvm {
- class Value;
- class Pass;
- class BasicBlock;
-}
-
-namespace polly {
-using namespace llvm;
-
-/// @brief Create a scalar loop.
-///
-/// @param LowerBound The starting value of the induction variable.
-/// @param UpperBound The upper bound of the induction variable.
-/// @param Stride The value by which the induction variable is incremented.
-///
-/// @param Builder The builder used to create the loop.
-/// @param P A pointer to the pass that uses this function. It is used
-/// to update analysis information.
-///
-/// @return Value* The newly created induction variable for this loop.
-Value *createLoop(Value *LowerBound, Value *UpperBound, Value *Stride,
- IRBuilder<> &Builder, Pass *P, BasicBlock *&AfterBlock);
-
-class OMPGenerator {
-public:
- typedef std::map<Value*, Value*> ValueToValueMapTy;
-
- OMPGenerator(IRBuilder<> &Builder, Pass *P): Builder(Builder), P(P) {}
-
- /// @brief Create an OpenMP parallel loop.
- ///
- ///
- /// @param LowerBound The starting value of the induction variable.
- /// @param UpperBound The upper bound of the induction variable.
- /// @param Stride The value by which the induction variable is
- /// incremented.
- ///
- /// @param UsedValues A set of LLVM-IR Values that should be available to
- /// the new loop body.
- /// @param VMap This map is filled by createParallelLoop(). It
- /// maps the values in UsedValues to Values through which
- /// their content is available within the loop body.
- /// @param LoopBody A pointer to an iterator that is set to point to the
- /// body of the created loop. It should be used to insert
- /// instructions that form the actual loop body.
- ///
- /// @return Value* The newly created induction variable for this loop.
- Value *createParallelLoop(Value *LowerBound, Value *UpperBound, Value *Stride,
- SetVector<Value*> &UsedValues,
- ValueToValueMapTy &VMap,
- BasicBlock::iterator *LoopBody);
-
-private:
- IRBuilder<> &Builder;
- Pass *P;
-
- IntegerType *getIntPtrTy();
- Module *getModule();
-
- void createCallParallelLoopStart(Value *SubFunction, Value *SubfunctionParam,
- Value *NumberOfThreads, Value *LowerBound,
- Value *UpperBound, Value *Stride);
- Value *createCallLoopNext(Value *LowerBoundPtr, Value *UpperBoundPtr);
- void createCallParallelEnd();
- void createCallLoopEndNowait();
-
- Value *loadValuesIntoStruct(SetVector<Value*> &Values);
- void extractValuesFromStruct(SetVector<Value*> OldValues,
- Value *Struct, ValueToValueMapTy &Map);
-
- /// @brief Create the OpenMP subfunction.
- ///
- /// @param Stride The value by which the induction variable is
- /// incremented.
- /// @param Struct The structure that is used to make Values available to
- /// the loop body.
- /// @param UsedValues A set of LLVM-IR Values that should be available to
- /// the new loop body.
- /// @param VMap This map that is filled by createSubfunction(). It
- /// maps the values in UsedValues to Values through which
- /// their content is available within the loop body.
- /// @param SubFunction The newly created SubFunction is returned here.
- ///
- /// @return Value* The newly created induction variable.
- Value *createSubfunction(Value *Stride, Value *Struct,
- SetVector<Value*> UsedValues,
- ValueToValueMapTy &VMap,
- Function **SubFunction);
-
- /// @brief Create the definition of the OpenMP subfunction.
- Function *createSubfunctionDefinition();
-};
-} // end namespace polly
-#endif
-
diff --git a/lib/CodeGen/BlockGenerators.cpp b/lib/CodeGen/BlockGenerators.cpp
index e32759d..e4192fb 100644
--- a/lib/CodeGen/BlockGenerators.cpp
+++ b/lib/CodeGen/BlockGenerators.cpp
@@ -14,8 +14,7 @@
//===----------------------------------------------------------------------===//
#include "polly/ScopInfo.h"
-#include "polly/BlockGenerators.h"
-#include "polly/CodeGeneration.h"
+#include "polly/CodeGen/BlockGenerators.h"
#include "polly/Support/GICHelper.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
index c68cf1d..f9c7b08 100644
--- a/lib/CodeGen/CodeGeneration.cpp
+++ b/lib/CodeGen/CodeGeneration.cpp
@@ -23,13 +23,13 @@
#define DEBUG_TYPE "polly-codegen"
#include "polly/Cloog.h"
-#include "polly/CodeGeneration.h"
#include "polly/Dependences.h"
#include "polly/LinkAllPasses.h"
#include "polly/ScopInfo.h"
#include "polly/TempScopInfo.h"
-#include "polly/BlockGenerators.h"
-#include "polly/LoopGenerators.h"
+#include "polly/CodeGen/CodeGeneration.h"
+#include "polly/CodeGen/BlockGenerators.h"
+#include "polly/CodeGen/LoopGenerators.h"
#include "polly/Support/GICHelper.h"
#include "llvm/Module.h"
diff --git a/lib/CodeGen/LoopGenerators.cpp b/lib/CodeGen/LoopGenerators.cpp
index 583bbe4..f56ec41 100644
--- a/lib/CodeGen/LoopGenerators.cpp
+++ b/lib/CodeGen/LoopGenerators.cpp
@@ -12,8 +12,8 @@
//
//===----------------------------------------------------------------------===//
-#include "polly/LoopGenerators.h"
#include "polly/ScopDetection.h"
+#include "polly/CodeGen/LoopGenerators.h"
#include "llvm/Module.h"
#include "llvm/Analysis/Dominators.h"
diff --git a/lib/ScheduleOptimizer.cpp b/lib/ScheduleOptimizer.cpp
index 171ed20..899712e 100644
--- a/lib/ScheduleOptimizer.cpp
+++ b/lib/ScheduleOptimizer.cpp
@@ -19,7 +19,7 @@
#include "polly/ScheduleOptimizer.h"
-#include "polly/CodeGeneration.h"
+#include "polly/CodeGen/CodeGeneration.h"
#include "polly/Dependences.h"
#include "polly/LinkAllPasses.h"
#include "polly/ScopInfo.h"
--
1.7.5.4
OK. Looks good.
Tobi
OK. I get what you mean.
However, we should keep in mind that the value map may not be
sufficient. As you remarked further down, on-demand parameter detection
for OpenMP or other code generation passes may require a different approach.
So we should make sure to remain flexible.
>> This does not mean I am not in favor of moving more stuff to generic
>> LLVM passes, but we should carefully evaluate the benefits here. I
>> personally would prefer if we can approach this in two ways:
>>
>> 1) Move all stuff into generic passes for which we are sure it is
>> beneficial and that does not increase complexity
>>
>> 2) Keep the Polly vectorizer code as it is and compare it to the
>> BBVectorizer approach.
> This is possible, and I will keep it in mind. I am only trying to decompose
> them into more fundamental pieces to enable more generic usage of these
> blocks, and I will only consider moving them to the LLVM transform library
> after we fix the vectorization bug. We can stop there if we find that
> moving BlockGenerator to the LLVM transform library is impractical.
Yes, I think this work is valuable. Also, because it causes a lot of
interesting discussions and exposes shortcomings in our code generation.
>> I think the Polly vectorizer is currently sufficiently separate to not
> Agree, using the BBVectorizer to vectorize the loop grouped-unrolled by
> Polly looks like overkill.
I don't get this one. Where is the overkill?
>> Another pass which we should get rid of is the -polly-indvars pass. We
>> currently require canonical loop induction variables. This is a
>> feature that is not available any more in LLVM and we should not keep
>> it alive in Polly.
>>
>>> So the new scop statement codegen flow looks like this:
>>>
>>> 1. Build substitution for the scop statement, map the instruction that
>>> is substituted and the new instruction (note that the IVs should be
>>> included in this map).
>>
>> Building the substitutions actually needs to change a little. At the
>> moment we create a map OldIV -> NewIV. In the future OldIV will not
>> exist. Instead we need to shift to virtual induction variables as used
>> by scalar evolution. Scalar evolution expressions reference loop
>> induction variables with AddRecurrences of the form {start, +,
>> step}_<loop>. Those add recurrences define a virtual induction variable
>> that counts the number of loop iterations. The value of the
>> AddRecurrence is start + step * <virtual_iv_of_loop>. As the OldIV
>> will disappear, I propose to replace the map OldIV -> NewIV by a map,
>> <old_virtual_iv_for_loop> -> NewIV. Or in short, Loop -> NewIV.
> Looks like this makes the integer computation part (e.g. address, condition,
> ...) of the polyhedral model based on SCEV instead of on LLVM-IR.
We already use SCEV in SCoPInfo so taking advantage of it during code
generation seems like a good next step.
>> Talking about [3]. You said you don't want this to be basic block centric?
> I think making the BlockGenerator memory-access centric is a possible
> solution for finer-grained statements.
What is memory access centric? We still need to define a polyhedral
statement in some way. At the moment it is a basic block. This
definition is simple and straightforward. Having something more fine
grained makes sense. Something I also thought about was taking each
memory write + the dependent instructions. However, this needs to be
properly defined. We especially need to think about possible problems
due to reordering of instructions or connected sub trees.
In this example possible statements would be S1 (%T = load A[0], store
%T to B[%index_2]) and S2 (%X = load B[%index_1], store %X to A[2]):
BB1:
%T = load A[0]
%X = load B[%index_1]
store %T to B[%index_2]
store %X to A[2]
However, in case %index_1 is equal to %index_2 the two statements cannot
be split apart.
>> How are you planning to keep track of the instructions that are part
>> of a statement. At the moment we just reference a basic block
> In brief, we can copy the entire "expression tree" of an instruction
> with side effect (and this is what we do in the IndependentPass). I will
> give an example below.
Yes, that works very well. I have three concerns here:
1. Possible correctness issues due to reordering
See the example above. We need to take this into account. As the
independent block pass works pretty well, I have no concerns this can be
solved.
2. Do we need canonical induction variables
Does this statement copying rely on canonical induction variables? In
case it does, we need to find a solution to get rid of this.
3. Redundant calculations
At the moment, the independent blocks pass introduces a lot of redundant
calculations. This will confuse the BB vectorizer.
I am wondering if combining the statement tree copying and the
SCEVRewriter approach could be a solution? The SCEVRewriter could take
care of the integer calculations that often introduce disturbing inter
basic block dependences. The statement tree copying could focus on the
intra basic block stuff and will enable us to create sub basic block
statements.
I get where you are going. I was previously thinking it would already be
sufficient to generate the write access and the read accesses would be
generated automatically. However, this will possibly allow invalid
reordering of loads.
Does your approach also work for the nested load/stores I showed further
up? Do we need to remember an order on the load/store instructions?
>>> Some other considerations which are not going to be implemented in the
>>> first step:
>>> 1. We can also perform CSE while generating code by generate the code
>>> in the dominator basicblock of current codegen location whenever
>>> possible and remember them in the substitution map, so that the other
>>> statements can reuse the code. In this case, we should delete the
>>> entries in the map when they are not dominating the current codegen
>>> location to preserve the SSA form.
>>
>> I hope we get this automatically by using the SCEVExpander.
> Yes, but SCEVExpander may not work for floating-point computations?
No. The main reason to use it is to eliminate dependences introduced by
array index computations and to be able to analyze loops without a
canonical induction variable.
I think modeling inter basic block dependences on floating point values
with a data dependency is reasonable.
>>> 2. We should decouple the isl related code from the BlockGenerator,
>>> this can be done by generating the code derived from isl and setup the
>>> corresponding entries in the substitution map before we call
>>> BlockGenerator.
>>
>> What should the BlockGenerator be in the end? At the moment it is a
>> specific Polyhedral to LLVM-IR translator. How would a use look like,
>> that is not within Polly?
> In the end, the BlockGenerator accepts a value map and copies instructions
> recursively to a new location. LLVM may already have something like this,
> but LLVM has neither something that vectorizes the loop body nor
> something that group-unrolls the loop.
So you want generic grouped unrolling and a generic loop vectorizer. I
am not yet convinced they will reduce overall complexity. You are
invited to prove me wrong.
>> I also attached a work in progress patch where I would love to hear
>> your comments. It implements a SCEVExpander-based approach to get rid
>> of the independent blocks pass. It is just a very first step, but it
>> already passes 'make polly-test'. What do you think?
> Looks good to me, comments below.
Comments inline.
>> +static cl::opt<bool>
>> +SCEVCodegen("polly-codegen-scev", cl::desc("Use SCEV based code generation"),
>> + cl::init(true), cl::Hidden, cl::ZeroOrMore);
>> +
I will disable it by default.
>> +/// - Store instructions
>> +struct SCEVRewriter : public SCEVVisitor<SCEVRewriter, const SCEV*> {
>> +public:
>> + static const SCEV *rewrite(const SCEV *scev, Scop &S, ScalarEvolution &SE,
>> + ValueMapT &GlobalMap, ValueMapT &BBMap) {
>> + SCEVRewriter Rewriter(S, SE, GlobalMap, BBMap);
>> + return Rewriter.visit(scev);
>> + }
>> +
>> + SCEVRewriter(Scop &S, ScalarEvolution &SE, ValueMapT &GlobalMap,
>> + ValueMapT &BBMap) : S(S), SE(SE), GlobalMap(GlobalMap),
>> + BBMap(BBMap) {}
> the "ScalarEvolution &SE, ValueMapT &GlobalMap, ValueMapT &BBMap"
> appears repeatedly in CodeGeneration.cpp; maybe we can group them in a
> struct named "CodeGenContext"?
Yes. That could make sense. Feel free to commit such a patch.
>> +
>> + const SCEV *visit(const SCEV *Expr) {
>> + // FIXME: The parameter handling is incorrect.
>> + //
>> + // Polly does only detect parameters in Access function and loop iteration
>> + // counters, but it does not get parameters that are just used by
>> + // instructions within the basic block.
>> + //
>> + // There are two options to solve this:
>> + // o Iterate over all instructions of the SCoP and find the actual
>> + // parameters.
>> + // o Just check within the SCEVRewriter if Values lay outside of the SCoP
>> + // and detect parameters on the fly.
> Depending on when we build the signature for the subfunction: if we build
> the signature before calling SCEVRewriter (which is what Polly does at the
> moment), detecting on the fly may be too late.
Agreed.
> Or we may create the
> subfunction after all blocks of the subfunction are generated. This means
> we first place the blocks of the subfunction in the current function,
> and move them to the new function at the end.
Yes, either this or we load the values from memory:
%ptr = alloc
%parameter = load %ptr
and later we store the right values to that memory locations:
%ptr = alloc
store %NEW_PARAM to %ptr <- Added later on
%parameter = load %ptr
>> + // Return a new induction variable if the loop is within the original SCoP
>> + // or NULL otherwise.
>> + Value *getNewIV(const Loop *L) {
>> + Value *IV = L->getCanonicalInductionVariable();
>> + if (!IV)
>> + return NULL;
> Use 0 instead of NULL? I think most LLVM code uses 0. (Not true anymore?)
I did not evaluate this but there are several cases where 'return NULL'
is used.
grep "return NULL" ../../lib/ -R | wc
249 955 16193
I actually prefer NULL; replacing it with nullptr in C++11 is a lot easier.
>> + if (SCEVCodegen && SE->isSCEVable(Inst->getType()))
>> + if (const SCEV *Scev = SE->getSCEV(const_cast<Instruction *>(Inst)))
>> + if (!isa<SCEVCouldNotCompute>(Scev)) {
>> + if (const SCEVUnknown *Unknown = dyn_cast<SCEVUnknown>(Scev)) {
>> + if (Unknown->getValue() != Inst)
>> + return;
>> + } else {
>> + return;
>> + }
>> + }
> This code fragment appears repeatedly; we can move it to a standalone
> function.
Agreed. I will fix it.
I plan to commit this code. This will make it easier for us to play with
it. It will obviously be disabled by default, as it still is incomplete.
If we decide the code becomes obsolete because of your instruction tree
copying, I am happy to remove it again.
Cheers
Tobi