[PATCH] Generate code for ScopStmt by copying the entire instruction tree of the instruction with side-effect, that is, the underlying instruction of memory access.

18 views
Skip to first unread message

Hongbin Zheng

unread,
Apr 25, 2012, 6:01:01 AM4/25/12
to poll...@googlegroups.com, Tobias Grosser
Hi tobi and others,

Here is a patch that improve the BlockGenerator, so that it is able to
generate ScopStmt that is no "independent" and no basicblock centric
(well, these features are not implemented in this patch).
After this patch, all instruction copying should be handle by
"cloneInstrTree", which means, we do not need these functions anymore:

void copyInstScalar(const Instruction *Inst, ValueMapT &BBMap,
ValueMapT &GlobalMap);

Value *generateScalarLoad(const LoadInst *load, ValueMapT &BBMap,
ValueMapT &GlobalMap);

Value *generateScalarStore(const StoreInst *store, ValueMapT &BBMap,
ValueMapT &GlobalMap);

And the copying process perform dead code elimination implicitly
because it only copy instructions that have side effect or contribute
to the instructions that have side effect.
It pass the regression tests on 32 bit platform right now, to not
break some other people's benchmark, i will going to add a command
line option to disable the newly added block generate function by
default when i commit the patch (Or I should move the function to a
new BlockGenerator so it doesn't messing up the old BlockGenrator?).

best regards
ether

PS: Whats the status of Polly's LNT? Any extra steps to run it comparing to [1]?
[1]http://llvm.org/docs/lnt/quickstart.html

---
lib/CodeGen/CodeGeneration.cpp | 179 +++++++++++++++++++++++++++++++++++++++-
1 files changed, 178 insertions(+), 1 deletions(-)

diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
index 7edd136..c274b34 100644
--- a/lib/CodeGen/CodeGeneration.cpp
+++ b/lib/CodeGen/CodeGeneration.cpp
@@ -212,6 +212,72 @@ public:
Generator.copyBB(GlobalMap);
}

+ /// @brief Generate a new BasicBlock for a ScopStmt.
+ ///
+ /// @param Builder The LLVM-IR Builder used to generate the statement. The
+ /// code is generated at the location, the Builder
points to.
+ /// @param Stmt The statement to code generate.
+ /// @param GlobalMap A map that defines for certain Values
referenced from the
+ /// original code new Values they should be replaced with.
+ /// @param P A reference to the pass this function is called from.
+ /// The pass is needed to update other analysis.
+ static void generateBlock(IRBuilder<> &Builder, ScopStmt &Stmt,
+ ValueMapT &GlobalMap, Pass *P) {
+ BlockGenerator Generator(Builder, Stmt, P);
+
+ BasicBlock *BB = Generator.Statement.getBasicBlock();
+ BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
+ Builder.GetInsertPoint(), P);
+ CopyBB->setName("polly.stmt." + BB->getName());
+ Builder.SetInsertPoint(CopyBB->begin());
+
+ ValueMapT BBMap;
+
+ // To generate code for finder grain statement, we need to remeber the
+ // underlying instruction of the memory access.
+ // typedef ScopStmt::memacc_iterator acc_it;
+ // for (acc_it I = Stmt.memacc_begin(), E = Stmt.memacc_end(); I != E; ++I)
+ // Generate code for access I.
+
+ typedef BasicBlock::const_iterator it;
+ for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
+ const Instruction *Inst = II;
+ if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
+ Generator.generateMemoryAccess(MA, Inst, GlobalMap, BBMap);
+ }
+ }
+
+
+ void generateMemoryAccess(MemoryAccess *MA, const Instruction *Inst,
+ ValueMapT &GlobalMap, ValueMapT &BBMap) {
+ isl_map *CurrentAccessRelation = MA->getAccessRelation();
+ isl_map *NewAccessRelation = MA->getNewAccessRelation();
+
+ assert(isl_map_has_equal_space(CurrentAccessRelation, NewAccessRelation)
+ && "Current and new access function use different spaces");
+
+ const Value *PointerOperand = MA->isRead() ?
+ cast<LoadInst>(Inst)->getPointerOperand() :
+ cast<StoreInst>(Inst)->getPointerOperand();
+
+ // Do we need to generate the address from new access relation?
+ if (NewAccessRelation) {
+ Value *BaseAddress = const_cast<Value*>(MA->getBaseAddr());
+ Value *NewPointer = getNewAccessOperand(NewAccessRelation, BaseAddress,
+ BBMap, GlobalMap);
+ BBMap[PointerOperand] = NewPointer;
+ }
+
+ cloneInstrTree(Inst, BBMap, GlobalMap);
+
+ // Free the context.
+ isl_map_free(CurrentAccessRelation);
+ if (NewAccessRelation) {
+ BBMap.erase(PointerOperand);
+ isl_map_free(NewAccessRelation);
+ }
+ }
+
protected:
IRBuilder<> &Builder;
ScopStmt &Statement;
@@ -280,11 +346,122 @@ protected:
/// (for values recalculated in the new ScoP, but not
/// within this basic block).
void copyBB(ValueMapT &GlobalMap);
+
+ static Value *castOperandType(Value *NewOperand, Type *DstTy,
+ Instruction *InsertPos) {
+ unsigned OldSize = DstTy->getScalarSizeInBits();
+ unsigned NewSize = NewOperand->getType()->getScalarSizeInBits();
+ // Insert a cast if types are different
+ // FIXME: Use IRBuilder, so we have chance to fold the cast.
+ if (OldSize < NewSize)
+ return CastInst::CreateTruncOrBitCast(NewOperand, DstTy,
+ NewOperand->getName() + "_c",
+ InsertPos);
+
+ return NewOperand;
+ }
+
+ /// @brief Clone the Instruction and its dependents recursively until the
+ /// instuction appears in the value map.
+ ///
+ /// @param Root The instruction need to be cloned.
+ /// @param BB The BasicBlock to hold the instruction.
+ ///
+ /// @return The newly cloned instruction.
+ Instruction *cloneInstrTree(const Instruction *Root, ValueMapT &BBMap,
+ ValueMapT &GlobalMap, const Twine
&NameSuffix="");
};

BlockGenerator::BlockGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P):
Builder(B), Statement(Stmt), P(P) {}

+Instruction *BlockGenerator::cloneInstrTree(const Instruction *Root,
+ ValueMapT &BBMap,
+ ValueMapT &GlobalMap,
+ const Twine &NameSuffix) {
+ // Clone the incoming instruction.
+ Instruction *NewI = Root->clone();
+ const BasicBlock *OldBB = Root->getParent();
+ if (!NewI->getType()->isVoidTy()) NewI->setName(Root->getName() +
NameSuffix);
+ // Insert it to the current bb and remember we already copy this instruction.
+ Builder.Insert(NewI);
+ // Remember we cloned this instruction.
+ BBMap[Root] = NewI;
+
+ // Depth first traverse the operand tree (or operand dag, because we will
+ // stop at PHINodes, so there are no cycle).
+ typedef Instruction::op_iterator ChildIt;
+ std::vector<std::pair<Instruction*, ChildIt> > WorkStack;
+
+ WorkStack.push_back(std::make_pair(NewI, NewI->op_begin()));
+
+ while (!WorkStack.empty()) {
+ Instruction *CurInst = WorkStack.back().first;
+ ChildIt It = WorkStack.back().second;
+ DEBUG(dbgs() << "Checking Operand of Node:\n" << *CurInst <<
"\n------>\n");
+ if (It == CurInst->op_end()) {
+ DEBUG(dbgs() << "Going to insert new inst: " << *CurInst << '\n');
+ // Insert the new instructions in topological order by insert the operand
+ // tree before the root.
+ // FIXME: Use IRBuilder?
+ if (!CurInst->getParent())
+ CurInst->insertBefore(NewI);
+
+ WorkStack.pop_back();
+ } else {
+ // for each node N,
+ Value *Operand = *It;
+ ++WorkStack.back().second;
+
+ DEBUG(dbgs() << "For Operand:\n" << *Operand << ' ' << Operand
<< "\n--->");
+
+ if (Value *NewParam = GlobalMap.lookup(Operand)) {
+ assert(Operand->getType() == NewParam->getType()
+ && "The type of parameter not match!");
+ DEBUG(dbgs() << "Replace parameter" << *Operand << " by "
+ << *NewParam << ".\n");
+ It->set(NewParam);
+ // If a value appeared in parameter map, it should not
appeared in local
+ // map
+ assert(!BBMap.count(Operand) && "Parameter in local map!");
+ continue;
+ }
+
+ Instruction *OpInst = dyn_cast<Instruction>(Operand);
+ // Do not clone the constant or parameters not in the parameters map.
+ if (OpInst == 0) continue;
+
+ // Now we need to clone Operand to CurBB.
+ // Check if we already clone it.
+ if (Value *Cloned = BBMap.lookup(OpInst)) {
+ DEBUG(dbgs() << "already cloned: " << *Cloned << ".\n");
+ It->set(castOperandType(Cloned, OpInst->getType(), NewI));
+ // Skip all its children as we already processed them.
+ continue;
+ }
+
+ // Ignore the parameter not in the subsitution map.
+ // FIXME: This is incorrect after IndependentBlocks pass is removed.
+ if (OldBB != OpInst->getParent()) continue;
+
+ // Else just clone the instruction.
+ // Note that NewOp is not inserted in any BB now, we will insert it when
+ // it popped form the work stack, so it will be inserted in topological
+ // order.
+ Instruction *NewOp = OpInst->clone();
+ NewOp->setName("p_" + OpInst->getName());
+ DEBUG(dbgs() << "Move to " << *NewOp << "\n");
+ It->set(NewOp);
+ // Insert it to the subsitiuation table.
+ BBMap[OpInst] = NewOp;
+ // Process its operands.
+ WorkStack.push_back(std::make_pair(NewOp, NewOp->op_begin()));
+ }
+ }
+
+ return NewI;
+}
+
Value *BlockGenerator::getNewValue(const Value *Old, ValueMapT &BBMap,
ValueMapT &GlobalMap) {
// We assume constants never change.
@@ -1197,7 +1374,7 @@ void ClastStmtCodeGen::codegen(const clast_user_stmt *u,
int VectorDimensions = IVS ? IVS->size() : 1;

if (VectorDimensions == 1) {
- BlockGenerator::generate(Builder, *Statement, ValueMap, P);
+ BlockGenerator::generateBlock(Builder, *Statement, ValueMap, P);

I will guard this with a command line operation when i commit the
patch, so that the new block generate function is disable by default.

return;
}

--
1.7.5.4

0001-Generate-code-for-ScopStmt-by-copying-the-entire-ins.patch

Hongbin Zheng

unread,
Apr 25, 2012, 9:42:28 AM4/25/12
to poll...@googlegroups.com, Tobias Grosser
Hi,

this is a up to date patch.

best regards
ether

---
include/polly/CodeGen/BlockGenerators.h | 59 +++++++++++
lib/CodeGen/BlockGenerators.cpp | 165 +++++++++++++++++++++++++++++++
lib/CodeGen/CodeGeneration.cpp | 2 +-
3 files changed, 225 insertions(+), 1 deletions(-)

diff --git a/include/polly/CodeGen/BlockGenerators.h
b/include/polly/CodeGen/BlockGenerators.h
index d3521e1..daa37e1 100644
--- a/include/polly/CodeGen/BlockGenerators.h
+++ b/include/polly/CodeGen/BlockGenerators.h
@@ -238,6 +238,65 @@ private:
void copyBB();
};

+/// @brief Generate a new basic block for a polyhedral statement.
+///
+class TreeCopyStmtGenerator : public BlockGenerator {
+public:


+ /// @brief Generate a new BasicBlock for a ScopStmt.
+ ///
+ /// @param Builder The LLVM-IR Builder used to generate the statement. The
+ /// code is generated at the location, the Builder
points to.
+ /// @param Stmt The statement to code generate.
+ /// @param GlobalMap A map that defines for certain Values
referenced from the
+ /// original code new Values they should be replaced with.
+ /// @param P A reference to the pass this function is called from.
+ /// The pass is needed to update other analysis.

+ static void codegen(IRBuilder<> &Builder, ScopStmt &Stmt,


+ ValueMapT &GlobalMap, Pass *P);
+

+protected:
+ TreeCopyStmtGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P)
+ : BlockGenerator(B, Stmt, P) {}
+
+ /// @brief Cast the type of the operand.
+ ///
+ /// @param NewOperand The value need to cast.
+ /// @param DstTy The destinated type.
+ /// @param InsertPos The position to insert the cast instruction.
+ ///
+ /// @return The casted value if the type of NewOperand do not match DstTy, or
+ /// the simply NewOperand otherwise.


+ static Value *castOperandType(Value *NewOperand, Type *DstTy,
+ Instruction *InsertPos);
+

+ /// @brief Generate code for memory access in ScopStmt.
+ ///
+ /// @param MA The memory access to generate code for.
+ /// @param Inst The underlying instruction that access the memory.
+ /// @param BBMap A mapping from old values to their new values
+ /// (for values recalculated within this basic block).
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).


+ void generateMemoryAccess(MemoryAccess *MA, const Instruction *Inst,
+ ValueMapT &GlobalMap, ValueMapT &BBMap);
+

+ /// @brief Clone the Instruction and its dependents recursively until the
+ /// instuction appears in the value map.
+ ///
+ /// @param Root The instruction need to be cloned.

+ /// @param BBMap A mapping from old values to their new values
+ /// (for values recalculated within this basic block).
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).
+ /// @param NameSuffix The name suffix of the newly cloned instruction.


+ ///
+ /// @return The newly cloned instruction.
+ Instruction *cloneInstrTree(const Instruction *Root, ValueMapT &BBMap,
+ ValueMapT &GlobalMap, const Twine
&NameSuffix="");

+};
+
}
#endif

diff --git a/lib/CodeGen/BlockGenerators.cpp b/lib/CodeGen/BlockGenerators.cpp
index 95b8279..5c0af25 100644
--- a/lib/CodeGen/BlockGenerators.cpp
+++ b/lib/CodeGen/BlockGenerators.cpp
@@ -19,6 +19,8 @@

#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Support/CommandLine.h"
+#define DEBUG_TYPE "polly-block-generators"
+#include "llvm/Support/Debug.h"

#include "isl/aff.h"
#include "isl/set.h"
@@ -645,3 +647,166 @@ void VectorBlockGenerator::copyBB() {
II != IE; ++II)
copyInstruction(II, VectorBlockMap, ScalarBlockMap);
}
+
+
+Value *TreeCopyStmtGenerator::castOperandType(Value *NewOperand, Type *DstTy,


+ Instruction *InsertPos) {
+ unsigned OldSize = DstTy->getScalarSizeInBits();
+ unsigned NewSize = NewOperand->getType()->getScalarSizeInBits();
+ // Insert a cast if types are different
+ // FIXME: Use IRBuilder, so we have chance to fold the cast.
+ if (OldSize < NewSize)
+ return CastInst::CreateTruncOrBitCast(NewOperand, DstTy,
+ NewOperand->getName() + "_c",
+ InsertPos);

+ else
+ assert(OldSize == NewSize && "Unexpect OldSize bigger than NewSize!");


+
+ return NewOperand;
+}
+

+void TreeCopyStmtGenerator::codegen(IRBuilder<> &Builder, ScopStmt &Stmt,


+ ValueMapT &GlobalMap, Pass *P) {

+ TreeCopyStmtGenerator Generator(Builder, Stmt, P);


+
+ BasicBlock *BB = Generator.Statement.getBasicBlock();
+ BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
+ Builder.GetInsertPoint(), P);
+ CopyBB->setName("polly.stmt." + BB->getName());
+ Builder.SetInsertPoint(CopyBB->begin());
+
+ ValueMapT BBMap;
+
+ // To generate code for finder grain statement, we need to remeber the
+ // underlying instruction of the memory access.
+ // typedef ScopStmt::memacc_iterator acc_it;
+ // for (acc_it I = Stmt.memacc_begin(), E = Stmt.memacc_end(); I != E; ++I)
+ // Generate code for access I.
+
+ typedef BasicBlock::const_iterator it;
+ for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
+ const Instruction *Inst = II;
+ if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
+ Generator.generateMemoryAccess(MA, Inst, GlobalMap, BBMap);
+ }
+}
+

+void TreeCopyStmtGenerator::generateMemoryAccess(MemoryAccess *MA,
+ const Instruction *Inst,
+ ValueMapT &GlobalMap,
+ ValueMapT &BBMap) {


+ isl_map *CurrentAccessRelation = MA->getAccessRelation();
+ isl_map *NewAccessRelation = MA->getNewAccessRelation();
+
+ assert(isl_map_has_equal_space(CurrentAccessRelation, NewAccessRelation)
+ && "Current and new access function use different spaces");
+
+ const Value *PointerOperand = MA->isRead() ?
+ cast<LoadInst>(Inst)->getPointerOperand() :
+ cast<StoreInst>(Inst)->getPointerOperand();
+
+ // Do we need to generate the address from new access relation?
+ if (NewAccessRelation) {
+ Value *BaseAddress = const_cast<Value*>(MA->getBaseAddr());
+ Value *NewPointer = getNewAccessOperand(NewAccessRelation, BaseAddress,
+ BBMap, GlobalMap);
+ BBMap[PointerOperand] = NewPointer;
+ }
+

+ cloneInstrTree(Inst, GlobalMap, BBMap);


+
+ // Free the context.
+ isl_map_free(CurrentAccessRelation);
+ if (NewAccessRelation) {
+ BBMap.erase(PointerOperand);
+ isl_map_free(NewAccessRelation);
+ }
+}
+

+Instruction *TreeCopyStmtGenerator::cloneInstrTree(const Instruction *Root,
+ ValueMapT &GlobalMap,
+ ValueMapT &BBMap,


+ const Twine &NameSuffix) {
+ // Clone the incoming instruction.
+ Instruction *NewI = Root->clone();
+ const BasicBlock *OldBB = Root->getParent();
+ if (!NewI->getType()->isVoidTy())

+ NewI->setName(Root->getName() + NameSuffix);

diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
index 4f46c95..55cb2c0 100644
--- a/lib/CodeGen/CodeGeneration.cpp
+++ b/lib/CodeGen/CodeGeneration.cpp
@@ -375,7 +375,7 @@ void ClastStmtCodeGen::codegen(const clast_user_stmt *u,


int VectorDimensions = IVS ? IVS->size() : 1;

if (VectorDimensions == 1) {
- BlockGenerator::generate(Builder, *Statement, ValueMap, P);

+ TreeCopyStmtGenerator::codegen(Builder, *Statement, ValueMap, P);
return;
}

--
1.7.5.4

0001-Generate-code-for-ScopStmt-by-copying-the-entire-ins.patch

Tobias Grosser

unread,
Apr 25, 2012, 1:23:54 PM4/25/12
to Hongbin Zheng, poll...@googlegroups.com
On 04/25/2012 12:01 PM, Hongbin Zheng wrote:
> Hi tobi and others,

Hi ether,

I am currently very busy, but plan to review this tomorrow.

> best regards
> ether
>
> PS: Whats the status of Polly's LNT? Any extra steps to run it comparing to [1]?
> [1]http://llvm.org/docs/lnt/quickstart.html

It is currently blocked by an LNT issue, where I did not have the time
to solve it. The problem is if 'lnt runtest nt' is run with
--multisample, than it creates the files build-tools.log, test.log,
configure.log only for the individual runs in their run directories, but
it does not create any global logs. This yields to sporadic timeouts in
the buildbot that runs the lnt as it cannot see the progress of the
build. Setting up the timeout is an option, but Daniel Dunbar asked me
to rather fix the core issue. If you have time, I would highly
appreciate if you could have a look.

The buildbot LNT builder for Polly is already available in the zorg
repository. It is called getPollyLNTFactory(). If you want to try Polly
and LNT without running a buildbot, you can get the relevant commands
from that file.

Tobi

Hongbin Zheng

unread,
Apr 25, 2012, 1:30:17 PM4/25/12
to Tobias Grosser, poll...@googlegroups.com
hi tobi,

On Thu, Apr 26, 2012 at 1:23 AM, Tobias Grosser <tob...@grosser.es> wrote:
> On 04/25/2012 12:01 PM, Hongbin Zheng wrote:
>>
>> Hi tobi and others,
>
>
> Hi ether,
>
> I am currently very busy, but plan to review this tomorrow.
>
>
>> best regards
>> ether
>>
>> PS: Whats the status of Polly's LNT? Any extra steps to run it comparing
>> to [1]?
>> [1]http://llvm.org/docs/lnt/quickstart.html
>
>
> It is currently blocked by an LNT issue, where I did not have the time to
> solve it. The problem is if 'lnt runtest nt' is run with --multisample, than
> it creates the files build-tools.log, test.log, configure.log only for the
> individual runs in their run directories, but it does not create any global
> logs. This yields to sporadic timeouts in the buildbot that runs the lnt as
You means the buildbot cannot see anything from stdio, so it thinks
the build is not responding?
> it cannot see the progress of the build. Setting up the timeout is an
> option, but Daniel Dunbar asked me to rather fix the core issue. If you have
> time, I would highly appreciate if you could have a look.
Ok, i will have a look.
>
> The buildbot LNT builder for Polly is already available in the zorg
> repository. It is called getPollyLNTFactory(). If you want to try Polly and
> LNT without running a buildbot, you can get the relevant commands from that
> file.
>
> Tobi

Best regards
ether

Tobias Grosser

unread,
Apr 25, 2012, 1:32:48 PM4/25/12
to Hongbin Zheng, poll...@googlegroups.com
On 04/25/2012 07:30 PM, Hongbin Zheng wrote:
> hi tobi,
>
> On Thu, Apr 26, 2012 at 1:23 AM, Tobias Grosser<tob...@grosser.es> wrote:
>> On 04/25/2012 12:01 PM, Hongbin Zheng wrote:
>>>
>>> Hi tobi and others,
>>
>>
>> Hi ether,
>>
>> I am currently very busy, but plan to review this tomorrow.
>>
>>
>>> best regards
>>> ether
>>>
>>> PS: Whats the status of Polly's LNT? Any extra steps to run it comparing
>>> to [1]?
>>> [1]http://llvm.org/docs/lnt/quickstart.html
>>
>>
>> It is currently blocked by an LNT issue, where I did not have the time to
>> solve it. The problem is if 'lnt runtest nt' is run with --multisample, than
>> it creates the files build-tools.log, test.log, configure.log only for the
>> individual runs in their run directories, but it does not create any global
>> logs. This yields to sporadic timeouts in the buildbot that runs the lnt as
> You means the buildbot cannot see anything from stdio, so it thinks
> the build is not responding?

Yes. lnt is mostly quiet on stdout and puts all the input into test.log
and Co. You can see this at the clang -O3 tester.

This run timed out:

http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-nt-O3/builds/362

as test.log and Co are within the individual build directories, but lnt
does not generate any global logs.

Tobi

Tobias Grosser

unread,
Apr 25, 2012, 1:46:08 PM4/25/12
to Hongbin Zheng, poll...@googlegroups.com
On 04/25/2012 07:30 PM, Hongbin Zheng wrote:
> hi tobi,
>
> On Thu, Apr 26, 2012 at 1:23 AM, Tobias Grosser<tob...@grosser.es> wrote:
>> On 04/25/2012 12:01 PM, Hongbin Zheng wrote:
>>>
>>> Hi tobi and others,
>>
>>
>> Hi ether,
>>
>> I am currently very busy, but plan to review this tomorrow.

Another point that I was thinking to work on and did not get to it.

Inspired by the new config option for the basic block vectorizer, I was
thinking to add a new option to lib/RegisterPasses.cpp

It would be:

-polly-vectorize
=no No Vectorization
=polly Polly internal vectorizer
=bb The Basic Block vectorizer driven by Polly

The option defaults to =no but could be used to enable on of the two
vectorization approaches. For the time being, I would implement the =bb
option by schedule the basic block vectorizer directly after the
PollyCodegen Pass and enable Grouped Unrolling. We can later improve on
this.

When we are in vector mode, we need to enable the PollyVector flag of
the scheduling optimizers (either isl or PoCC). In case of =polly we
need to enable the Codegen vectorizer and in case of =bb we need to
schedule the basic block vectorizer.

It would be really cool if our two scheduling optimizers and the code
generation could get similar option passing as was recently introduced
for the basic block vectorizer. Like this we could still influence the
individual passes with the current options, but at the same time we
could configure them as necessary when running in
"-O3 -polly -polly-vectorize=?" mode.

As I have the feeling this week I am the bottle neck I offer this to
interested people. In case you are busy yourself, I will look into it
next week.

Cheers
Tobi


Hongbin Zheng

unread,
Apr 26, 2012, 11:39:50 AM4/26/12
to Tobias Grosser, poll...@googlegroups.com
hi tobi,
On Thu, Apr 26, 2012 at 1:32 AM, Tobias Grosser <tob...@grosser.es> wrote:
>
>
> Yes. lnt is mostly quiet on stdout and puts all the input into test.log and
> Co. You can see this at the clang -O3 tester.
>
> This run timed out:
>
> http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-nt-O3/builds/362
>
> as test.log and Co are within the individual build directories, but lnt does
> not generate any global logs.
I found that there is already a runtime limit parameter, which default
to 500 seconds, passed to RunSafely.sh/RunToolSafely.sh in test suite
(in Makefiles.programs). This means the test suite itself already try
to kill the no-responding builds/tests. Furthermore, we can also try
to catch the no-responding builds in nt.py (or similar script). With
these timeout guards, we can safely disable the timeout in the
buildstep of the buildbot.

Because it is very common that the nightly build take more than one
hour to build, one disadvantage of setting timeout value at the level
of nt.py is that the timeout value maybe very large, and it maybe too
late to find that the build not responding. But i think this will
never happen, because the no-responding builds should be caught by
RunSafely.sh/RunToolSafely.sh quickly.

best regards
ether

>
> Tobi

Tobias Grosser

unread,
Apr 26, 2012, 11:43:17 AM4/26/12
to Hongbin Zheng, poll...@googlegroups.com
On 04/26/2012 05:39 PM, Hongbin Zheng wrote:
> hi tobi,
> On Thu, Apr 26, 2012 at 1:32 AM, Tobias Grosser<tob...@grosser.es> wrote:
>>
>>
>> Yes. lnt is mostly quiet on stdout and puts all the input into test.log and
>> Co. You can see this at the clang -O3 tester.
>>
>> This run timed out:
>>
>> http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-nt-O3/builds/362
>>
>> as test.log and Co are within the individual build directories, but lnt does
>> not generate any global logs.
> I found that there is already a runtime limit parameter, which default
> to 500 seconds, passed to RunSafely.sh/RunToolSafely.sh in test suite
> (in Makefiles.programs). This means the test suite itself already try
> to kill the no-responding builds/tests. Furthermore, we can also try
> to catch the no-responding builds in nt.py (or similar script). With
> these timeout guards, we can safely disable the timeout in the
> buildstep of the buildbot.

Yes. However this does not catch timeouts due to problems in LNT itself
(non responing NFS mount, ...) and it also does not give us access the
log files

> Because it is very common that the nightly build take more than one
> hour to build, one disadvantage of setting timeout value at the level
> of nt.py is that the timeout value maybe very large, and it maybe too
> late to find that the build not responding. But i think this will
> never happen, because the no-responding builds should be caught by
> RunSafely.sh/RunToolSafely.sh quickly.

I agree that disabling the timeout is a solution that may work for now,
but Daniel explicitly asked me to do it the right way. So I don't really
want to take this shortcut.

Cheers
Tobi

Hongbin Zheng

unread,
Apr 26, 2012, 11:53:47 AM4/26/12
to Tobias Grosser, poll...@googlegroups.com
In fact i see lots of output in test.log even i only run MultiSource
by --only-test MultiSource. But i am not sure if the logs treat as the
"output" of the build step.

One possible solution is product some kind of heart beat output, for
example, we can add a "noisy" mode to RunSafely.sh/RunToolSafely.sh,
which means the RunSafely.sh/RunToolSafely.sh will output something
like "I am working on ***!" when they are running.

best regards
ether

>
> Cheers
> Tobi

Hongbin Zheng

unread,
Apr 26, 2012, 11:53:55 AM4/26/12
to Tobias Grosser, poll...@googlegroups.com
On Thu, Apr 26, 2012 at 11:43 PM, Tobias Grosser <tob...@grosser.es> wrote:

Hongbin Zheng

unread,
Apr 26, 2012, 12:02:04 PM4/26/12
to Tobias Grosser, poll...@googlegroups.com
This need to output to stderr, so that the message are redirect to
stdout of the buildstep.

Tobias Grosser

unread,
Apr 26, 2012, 12:05:55 PM4/26/12
to Hongbin Zheng, poll...@googlegroups.com
They are. The problem comes into play if you run lnt with
--multisample=2 (which is needed to reduce noise level to a acceptable
level). Now test.log is not at the place where it should be, but in a
subfolder. So the builder does not take it into account. Daniel
suggested that for a multisample build the (configure|test|...) logs are
written simultaneously into two different files. The run specific log
files and the overall log files. The buildbot could than always watch
the global log files (which are at the place where the log files are
without multi-sampleing).

If the buildbot can find the actual log files no heartbeat is needed and
the user can actually see where the build is at the moment.

Tobi

Hongbin Zheng

unread,
Apr 26, 2012, 11:46:22 PM4/26/12
to Tobias Grosser, poll...@googlegroups.com
hi tobi,
>
>
> They are. The problem comes into play if you run lnt with --multisample=2
> (which is needed to reduce noise level to a acceptable level). Now test.log
> is not at the place where it should be, but in a subfolder. So the builder
I am a bit confuse now, when i run lnt with --multisample=2, bug i can
still get something from the test.log:
bot@polly-MS-7680:~/pollywork/llvm-cbuild$
$HOME/pollywork/pollysb/bin/lnt runtest nt --sandbox
$HOME/pollywork/pollysb/ --cc
$HOME/pollywork/llvm-cbuild/Debug+Asserts/bin/clang --test-suite
$HOME/pollywork/test-suite/ --multisample=2 --only-test MultiSource&
[1] 23483
bot@polly-MS-7680:~/pollywork/llvm-cbuild$ nt.py:1140: note: inferred
C++ compiler under test as:
'/home/bot/pollywork/llvm-cbuild/Debug+Asserts/bin/clang++'
2012-04-27 02:46:08: (multisample) running iteration 0
...
2012-04-27 02:46:13: starting test in
'/home/bot/pollywork/pollysb/test-2012-04-27_02-46-13-0'
...
2012-04-27 02:46:21: executing "nightly tests" with -j1...

bot@polly-MS-7680:~/pollywork/llvm-cbuild$ cat
/home/bot/pollywork/pollysb/test-2012-04-27_02-46-13-0/test.log
2012-04-27 02:46:21: running: "make" "-k"
"TARGET_LLVMGCC=/home/bot/pollywork/llvm-cbuild/Debug+Asserts/bin/clang"
"CC_UNDER_TEST_TARGET_IS_I386=1" "ENABLE_HASHED_PROGRAM_OUTPUT=1"
"TARGET_CXX=None" "LLI_OPTFLAGS=-O3" "TARGET_CC=None"
"TARGET_LLVMGXX=/home/bot/pollywork/llvm-cbuild/Debug+Asserts/bin/clang++"
"CC_UNDER_TEST_IS_CLANG=1" "TEST=simple" "TARGET_FLAGS="
"USE_REFERENCE_OUTPUT=1" "OPTFLAGS=-O3" "LLC_OPTFLAGS=-O3"
"ENABLE_OPTIMIZED=1" "ARCH=x86" "DISABLE_CBE=1" "DISABLE_JIT=1" "-C"
"MultiSource" "-j" "1" "report" "report.simple.csv"
make: Entering directory
`/home/bot/pollywork/pollysb/test-2012-04-27_02-46-13-0/MultiSource'
make -C /home/bot/pollywork/pollysb/test-2012-04-27_02-46-13-0/tools all \
ORIGINAL_CC=cc \
ORIGINAL_CXX=cc
make[1]: Entering directory
`/home/bot/pollywork/pollysb/test-2012-04-27_02-46-13-0/tools'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory
`/home/bot/pollywork/pollysb/test-2012-04-27_02-46-13-0/tools'
make -j1 TEST=simple
....


Furthermore, option "--multisample=..." is coms from nt_flags of
getPollyLNTFactory, so where can i find the other options in nt_flags
those are passed to getPollyLNTFactory?

best regards
ether


> does not take it into account. Daniel suggested that for a multisample build
> the (configure|test|...) logs are written simultaneously into two different
> files. The run specific log files and the overall log files. The buildbot
> could than always watch the global log files (which are at the place where
> the log files are without multi-sampleing).
>
>
> Tobi

Hongbin Zheng

unread,
Apr 27, 2012, 12:38:17 AM4/27/12
to Tobias Grosser, poll...@googlegroups.com
hi tobi,

On Fri, Apr 27, 2012 at 11:46 AM, Hongbin Zheng <ethe...@gmail.com> wrote:
> hi tobi,
>>
>>
>> They are. The problem comes into play if you run lnt with --multisample=2
I found that the link
[http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-nt-O3/builds/362/steps/lnt.nightly-test/logs/stdio]
you post to me is run lnt with --multisample=3, so i rerun lnt with
--multisample=3. But i still can some output from test log:
$ run lnt with --multisample=3
...
2012-04-27 04:33:36: starting test in
'/home/bot/pollywork/pollysb/test-2012-04-27_04-33-36-0'
...
$ cat /home/bot/pollywork/pollysb/test-2012-04-27_04-33-36-0/test.log
not empty

best regards
ether

Tobias Grosser

unread,
Apr 27, 2012, 4:03:46 AM4/27/12
to Hongbin Zheng, poll...@googlegroups.com
In the buildbot lnt is also run with --no-timestamp.

This ensures that the normal test.log file is directly in the root of
the testing directory. With --multisample the files are in 0/test.log
1/test.log, ...

It would be good to have a combined log at test.log

Tobi

Hongbin Zheng

unread,
Apr 27, 2012, 5:11:03 AM4/27/12
to Tobias Grosser, poll...@googlegroups.com
On Fri, Apr 27, 2012 at 4:03 PM, Tobias Grosser <tob...@grosser.es> wrote:
> On 04/27/2012 06:38 AM, Hongbin Zheng wrote:
>>
>> hi tobi,
>>
>> On Fri, Apr 27, 2012 at 11:46 AM, Hongbin Zheng<ethe...@gmail.com>
>>  wrote:
>>>
>>> hi tobi,
>>>>
>>>>
>>>>
>>>> They are. The problem comes into play if you run lnt with
>>>> --multisample=2
>>
>> I found that the link
>>
>> [http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-nt-O3/builds/362/steps/lnt.nightly-test/logs/stdio]
>> you post to me is run lnt with --multisample=3, so i rerun lnt with
>> --multisample=3. But i still can some output from test log:
>> $ run lnt with --multisample=3
>> ...
>> 2012-04-27 04:33:36: starting test in
>> '/home/bot/pollywork/pollysb/test-2012-04-27_04-33-36-0'
>> ...
>> $ cat /home/bot/pollywork/pollysb/test-2012-04-27_04-33-36-0/test.log
>
>
> In the buildbot lnt is also run with --no-timestamp.
>
> This ensures that the normal test.log file is directly in the root of the
> testing directory. With --multisample the files are in 0/test.log
> 1/test.log, ...
Can use just parse the --multisample option, and insert the path of
the log file at each build isteration into the logfiles dictionary of
the build step?
The code in AddLNTTestsToFactory should looks like:

logfiles={'configure.log' : 'nt/build/configure.log',
'build-tools.log' : 'nt/build/build-tools.log',
'test.log' : 'nt/build/test.log',
'report.json' : 'nt/build/report.json'}

NumIteration = ...parse --multisample with regex ...

// Insert the pat of the logfile at each iteration, so that the
buildbot can watch them.
if (NumIteration != None)
for (i = 0; i < NumIteration; ++i) {
// Insert the path of the logfile at the ith iteration.
logfiles['configure.log.' + i ] = 'nt/build-' + i + '/configure.log'
... the same for build-tools.log, test.log, report.json ...
}

f.addStep( ... logfiles=logfiles ...)


Dose this make sense?
best regards
ether

Tobias Grosser

unread,
Apr 27, 2012, 5:14:12 AM4/27/12
to Hongbin Zheng, poll...@googlegroups.com
Yes. That would work.

As stated before Daniel would prefer to have global log files as a
feature for LNT as it seems useful in general. But I have personally not
a strong opinion. If you have a patch to LNT that makes it work, I am
fine with that.

Tobi

Tobias Grosser

unread,
Apr 27, 2012, 5:14:47 AM4/27/12
to Hongbin Zheng, poll...@googlegroups.com
>> ... the same for build-tools.log, test.log, report.json ...
>> }
>>
>> f.addStep( ... logfiles=logfiles ...)
>
> Yes. That would work.
>
> As stated before Daniel would prefer to have global log files as a
> feature for LNT as it seems useful in general. But I have personally not
> a strong opinion. If you have a patch to LNT that makes it work, I am
> fine with that.

I mean a patch for the buildbot scripts.

Tobi

Hongbin Zheng

unread,
Apr 27, 2012, 5:24:25 AM4/27/12
to Tobias Grosser, poll...@googlegroups.com
Oh, since the logs are redirected from the stdout of the
subprocess[1], i think redirect them to the same log file is possible.

env=os.environ)

But i am not sure if we can redirect the stdout of the subprocess to
more than one files at the same time, which means if we generate the
global log, the individual log for each iteration is gone.


[1] Code snippet:
...
build_tools_log_path = os.path.join(basedir, 'build-tools.log')
...
p = subprocess.Popen(args=args, stdin=None, stdout=build_tools_log,
stderr=subprocess.STDOUT, cwd=basedir,
>
> Tobi
>

Tobias Grosser

unread,
Apr 27, 2012, 5:51:15 AM4/27/12
to Hongbin Zheng, poll...@googlegroups.com
We can introduce a --global-logs option, that switches to global log
files. That would be OK for me. But I have no idea how complicated it
is. (Especially when merging JSON files). If you believe providing the
individual log files in the buildbot scripts is easier, we can commit
such a change as well. I have not had a deeper look, so I would rely on
your advise here.

Tobi

Hongbin Zheng

unread,
Apr 27, 2012, 6:08:33 AM4/27/12
to Tobias Grosser, poll...@googlegroups.com
I am trying to duplicate the stdout with "tee"

ether

Tobias Grosser

unread,
Apr 27, 2012, 6:17:30 AM4/27/12
to Hongbin Zheng, poll...@googlegroups.com
You can also do something like this:

proc = subprocess.Popen(path, 0, None, subprocess.PIPE, subprocess.PIPE,
None)

for l in proc.stdout.readlines():
output_file.write(l)


And output_file can be implemented as:

class DoubleFile
__init__(self, file1, file2):
self.file1 = open(file1, 'w')
self.file2 = open(file2, 'w')

write(self, data):
self.file1.write(data)
self.file2.write(data)

output_file = DoubleFile('file1.txt', 'file2.txt')

Cheers
Tobi

Hongbin Zheng

unread,
Apr 27, 2012, 7:05:21 AM4/27/12
to Tobias Grosser, poll...@googlegroups.com
Ok, thanks :)

Hongbin Zheng

unread,
Apr 29, 2012, 8:21:05 AM4/29/12
to Tobias Grosser, poll...@googlegroups.com
hi tobi,

On Fri, Apr 27, 2012 at 5:51 PM, Tobias Grosser <tob...@grosser.es> wrote:
> We can introduce a --global-logs option, that switches to global log files.
> That would be OK for me. But I have no idea how complicated it is.
> (Especially when merging JSON files). If you believe providing the
> individual log files in the buildbot scripts is easier, we can commit such a
> change as well. I have not had a deeper look, so I would rely on your advise
> here.
>

It is much more easier if we simply duplicate the logfile content to
stdout. This patch add a new option "--duplicate-logs", with this
option set, the content of logfiles will also be sent to stdout. But
duplicating the logs to stdout will produce lots of output, i think we
can further filter the output by some regex pattern provided by user.

best regards
ether
---
lnt/tests/nt.py | 42 +++++++++++++++++++++++++++---------------
1 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/lnt/tests/nt.py b/lnt/tests/nt.py
index 291ec1b..f9b8226 100644
--- a/lnt/tests/nt.py
+++ b/lnt/tests/nt.py
@@ -80,6 +80,25 @@ def scan_for_test_modules(opts):
assert dirpath.startswith(base_modules_path + '/')
yield dirpath[len(base_modules_path) + 1:]

+def execute_command(test_log, basedir, args, opts):
+ logfile = test_log
+
+ if opts.duplicate_logs:
+ logfile = subprocess.PIPE
+
+ p = subprocess.Popen(args=args, stdin=None, stdout=logfile,
+ stderr=subprocess.STDOUT, cwd=basedir,
+ env=os.environ)
+
+ if opts.duplicate_logs:
+ while p.poll() is None:
+ l = p.stdout.readline()
+ if len(l) > 0:
+ test_log.write(l)
+ sys.stdout.write(l)
+
+ return p.wait()
+
def execute_test_modules(test_log, test_modules, test_module_variables,
basedir, opts):
# For now, we don't execute these in parallel, but we do forward the
@@ -236,10 +255,7 @@ def execute_nt_tests(test_log, make_variables,
basedir, opts):

print >>sys.stderr, '%s: building "nightly tests" with -j%u...' % (
timestamp(), opts.build_threads)
- p = subprocess.Popen(args=args, stdin=None, stdout=test_log,
- stderr=subprocess.STDOUT, cwd=basedir,
- env=os.environ)
- res = p.wait()
+ res = execute_command(test_log, basedir, args, opts)

# Then 'make report'.
args = common_args + ['-j', str(opts.threads),
@@ -254,10 +270,8 @@ def execute_nt_tests(test_log, make_variables,
basedir, opts):
# X (which changes the driver behavior and causes generally weirdness).
print >>sys.stderr, '%s: executing "nightly tests" with -j%u...' % (
timestamp(), opts.threads)
- p = subprocess.Popen(args=args, stdin=None, stdout=test_log,
- stderr=subprocess.STDOUT, cwd=basedir,
- env=os.environ)
- res = p.wait()
+
+ res = execute_command(test_log, basedir, args, opts)

def load_nt_report_file(report_path, opts):
# Compute the test samples to report.
@@ -658,9 +672,7 @@ def run_test(nick_prefix, opts, iteration):
configure_log.flush()

print >>sys.stderr, '%s: configuring...' % timestamp()
- p = subprocess.Popen(args=args, stdin=None, stdout=configure_log,
- stderr=subprocess.STDOUT, cwd=basedir)
- res = p.wait()
+ res = execute_command(configure_log, basedir, args, opts)
configure_log.close()
if res != 0:
fatal('configure failed, log is here: %r' % configure_log_path)
@@ -690,10 +702,7 @@ def run_test(nick_prefix, opts, iteration):
' '.join('"%s"' % a
for a in args))
build_tools_log.flush()
- p = subprocess.Popen(args=args, stdin=None, stdout=build_tools_log,
- stderr=subprocess.STDOUT, cwd=basedir,
- env=os.environ)
- res = p.wait()
+ res = execute_command(build_tools_log, basedir, args, opts)
build_tools_log.close()
if res != 0:
fatal('unable to build tools, aborting!')
@@ -1064,6 +1073,9 @@ class NTTest(builtintest.BuiltinTest):
help=("Don't put machine (instance) dependent "
"variables with machine info"),
action="store_false", default=True)
+ group.add_option("", "--duplicate-logs", dest="duplicate_logs",
+ help=("Duplicate the logfile content to stdout"),
+ action="store_true", default=False)
group.add_option("", "--run-order", dest="run_order", metavar="STR",
help="String to use to identify and order this run",
action="store", type=str, default=None)
--
1.7.5.4
0001-Nightly-test-Duplicate-the-logfile-content-to-stdout.patch

Tobias Grosser

unread,
Apr 29, 2012, 9:39:15 AM4/29/12
to Hongbin Zheng, poll...@googlegroups.com
On 04/29/2012 02:21 PM, Hongbin Zheng wrote:
> hi tobi,
>
> On Fri, Apr 27, 2012 at 5:51 PM, Tobias Grosser<tob...@grosser.es> wrote:
>> We can introduce a --global-logs option, that switches to global log files.
>> That would be OK for me. But I have no idea how complicated it is.
>> (Especially when merging JSON files). If you believe providing the
>> individual log files in the buildbot scripts is easier, we can commit such a
>> change as well. I have not had a deeper look, so I would rely on your advise
>> here.
>>
>
> It is much more easier if we simply duplicate the logfile content to
> stdout. This patch add a new option "--duplicate-logs", with this
> option set, the content of logfiles will also be sent to stdout. But
> duplicating the logs to stdout will produce lots of output, i think we
> can further filter the output by some regex pattern provided by user.

I commented in the bug report you opened. Here again:


Hi ether,

thanks a lot for working on this. Your patch looks very interesting as
it shows that you already got the log file duplication working.

I think it would make sense to not pipe the second stream to stdout, but
create a global log directory for the multisample runs.

At the moment we create e.g. the directories build-0, build-1, build-2.
I propose to create a directory build/ in which we create a second
version of the logs.

We can also use this to properly deliver the merged json report. At the
moment we create a new file called 'report-merged.json' and put it in
first_basedir:

# Write out the merged report.
lnt_report_path = os.path.join(first_basedir,
'report-merged.json')
report = lnt.testing.Report(machine, run, test_samples)
lnt_report_file = open(lnt_report_path, 'w')
print >>lnt_report_file,report.render()

To me it would be a lot more intuitive to have (even for the multisample
build) a directory build/ in which at the end of the build the merged
json file is available at file report.json. (That happens to be the very
same location where our buildbot scripts expects this file to be)

Hongbin Zheng

unread,
May 3, 2012, 4:36:21 AM5/3/12
to Tobias Grosser, poll...@googlegroups.com
Hi tobi,

Sorry for the delay.
The patch write the duplicated logfiles to a file, after this we can
parse the merge global log directly.

In fact the patch is not complete,
o we need to clean the files in the global dir.
o we need to check if people passing the opts.duplicate_logs file
without passing multi-sample flags, this is not allow
o Instead of use flag 'opts.duplicate_logs', we can check if people
provide the basedir of the global logs (with new option
'opts.global_log_basedir'). Providing the base dir of global log means
people want to duplicate the output logs to a global location.

What do you think about this?

best regards
ether.

---
lnt/tests/nt.py | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/lnt/tests/nt.py b/lnt/tests/nt.py
index f9b8226..75cb671 100644
--- a/lnt/tests/nt.py
+++ b/lnt/tests/nt.py
@@ -85,6 +85,15 @@ def execute_command(test_log, basedir, args, opts):

if opts.duplicate_logs:
logfile = subprocess.PIPE
+ # Open a duplicated logfile at the global dir.
+ logdir, logname = os.path.split(test_log.name)
+ logdir, _ = os.path.split(logdir)
+ logdir = os.path.join(logdir, 'build')
+ if not os.path.exists(logdir):
+ os.mkdir(logdir)
+ global_log_path = os.path.join(logdir, logname)
+ print global_log_path
+ global_log = open(global_log_path, 'a+')

p = subprocess.Popen(args=args, stdin=None, stdout=logfile,
stderr=subprocess.STDOUT, cwd=basedir,
@@ -95,7 +104,9 @@ def execute_command(test_log, basedir, args, opts):
l = p.stdout.readline()
if len(l) > 0:
test_log.write(l)
- sys.stdout.write(l)
+ global_log.write(l)
+
+ global_log.close()

return p.wait()

--
1.7.5.4

Tobias Grosser

unread,
May 3, 2012, 7:22:21 AM5/3/12
to Hongbin Zheng, poll...@googlegroups.com
On 05/03/2012 10:36 AM, Hongbin Zheng wrote:
> Hi tobi,
[...]
>> file is available at file report.json. (That happens to be the very same
>> location where our buildbot scripts expects this file to be)
> The patch write the duplicated logfiles to a file, after this we can
> parse the merge global log directly.
>
> In fact the patch is not complete,
> o we need to clean the files in the global dir.
True.

> o we need to check if people passing the opts.duplicate_logs file
> without passing multi-sample flags, this is not allow
> o Instead of use flag 'opts.duplicate_logs', we can check if people
> provide the basedir of the global logs (with new option
> 'opts.global_log_basedir'). Providing the base dir of global log means
> people want to duplicate the output logs to a global location.

I would remove the option duplicate_logs. We should use duplicate logs
files as soon as multi-sample=x with x >= 2 is used.

Another item that is needed is to create the merged report in the global
dir instead of the last individual run.

I propose to change this piece of code:

# Multisample, if requested.
if opts.multisample is not None:
# Collect the sample reports.
reports = []
first_basedir = None

[...]

# Write out the merged report.
lnt_report_path = os.path.join(first_basedir,
'report-merged.json')
report = lnt.testing.Report(machine, run, test_samples)
lnt_report_file = open(lnt_report_path, 'w')
print >>lnt_report_file,report.render()
lnt_report_file.close()

return report


Instead of using first_basedir here, we can create (and possibly clean)
the global dir here. We can than easily put a file report.json into this
global dir. The very same directory can than be passed down to the
run_test() call. run_test() can, in case of global_dir != None, create a
second copy of it's logs in this global dir.

We also need to pay attention that the creation of the global dir work
both with and without --no-timestamp.

Tobi

Hongbin Zheng

unread,
May 3, 2012, 9:37:07 AM5/3/12
to Tobias Grosser, poll...@googlegroups.com
hi tobi,

Fulfill all your wishes:
$git diff git-svn

diff --git a/lnt/tests/nt.py b/lnt/tests/nt.py
index 291ec1b..31f376c 100644
--- a/lnt/tests/nt.py
+++ b/lnt/tests/nt.py
@@ -7,6 +7,7 @@ import subprocess
import sys
import time
import traceback
+import glob

from datetime import datetime

@@ -80,6 +81,32 @@ def scan_for_test_modules(opts):
assert dirpath.startswith(base_modules_path + '/')
yield dirpath[len(base_modules_path) + 1:]

+def execute_command(test_log, basedir, args, global_basedir):
+ logfile = test_log
+
+ if global_basedir is not None:
+ logfile = subprocess.PIPE
+ # Open a duplicated logfile at the global dir.
+ _, logname = os.path.split(test_log.name)
+ global_log_path = os.path.join(global_basedir, logname)
+ global_log = open(global_log_path, 'a+')
+
+ p = subprocess.Popen(args=args, stdin=None, stdout=logfile,
+ stderr=subprocess.STDOUT, cwd=basedir,
+ env=os.environ)
+
+ if global_basedir is not None:
+ while p.poll() is None:
+ l = p.stdout.readline()
+ if len(l) > 0:
+ test_log.write(l)
+ global_log.write(l)
+
+ global_log.close()
+
+ return p.wait()
+
+# FIXME: Support duplicate logfiles to global directory.
def execute_test_modules(test_log, test_modules, test_module_variables,
basedir, opts):
# For now, we don't execute these in parallel, but we do forward the
@@ -190,7 +217,7 @@ def compute_test_module_variables(make_variables, opts):

return test_module_variables

-def execute_nt_tests(test_log, make_variables, basedir, opts):
+def execute_nt_tests(test_log, make_variables, basedir, opts, global_basedir):
common_args = ['make', '-k']
common_args.extend('%s=%s' % (k,v) for k,v in make_variables.items())
if opts.only_test is not None:
@@ -236,10 +263,7 @@ def execute_nt_tests(test_log, make_variables,
basedir, opts):

print >>sys.stderr, '%s: building "nightly tests" with -j%u...' % (
timestamp(), opts.build_threads)
- p = subprocess.Popen(args=args, stdin=None, stdout=test_log,
- stderr=subprocess.STDOUT, cwd=basedir,
- env=os.environ)
- res = p.wait()
+ res = execute_command(test_log, basedir, args, global_basedir)

# Then 'make report'.
args = common_args + ['-j', str(opts.threads),
@@ -254,10 +278,8 @@ def execute_nt_tests(test_log, make_variables,
basedir, opts):
# X (which changes the driver behavior and causes generally weirdness).
print >>sys.stderr, '%s: executing "nightly tests" with -j%u...' % (
timestamp(), opts.threads)
- p = subprocess.Popen(args=args, stdin=None, stdout=test_log,
- stderr=subprocess.STDOUT, cwd=basedir,
- env=os.environ)
- res = p.wait()
+
+ res = execute_command(test_log, basedir, args, global_basedir)

def load_nt_report_file(report_path, opts):
# Compute the test samples to report.
@@ -537,7 +559,41 @@ def compute_run_make_variables(opts,
llvm_source_version, target_flags,

return make_variables

-def run_test(nick_prefix, opts, iteration):
+def prepare_basedir(opts, start_time, iteration):
+ # Set up the sandbox.
+ if not os.path.exists(opts.sandbox_path):
+ print >>sys.stderr, "%s: creating sandbox: %r" % (
+ timestamp(), opts.sandbox_path)
+ os.mkdir(opts.sandbox_path)
+
+ # Create the per-test directory.
+ if opts.timestamp_build:
+ ts = start_time.replace(' ','_').replace(':','-')
+ build_dir_name = "test-%s" % ts
+ else:
+ build_dir_name = "build"
+ if iteration is not None:
+ build_dir_name = "%s-%d" % (build_dir_name, iteration)
+ basedir = os.path.join(opts.sandbox_path, build_dir_name)
+
+ # Canonicalize paths, in case we are using e.g. an NFS remote mount.
+ #
+ # FIXME: This should be eliminated, along with the realpath call below.
+ basedir = os.path.realpath(basedir)
+
+ if os.path.exists(basedir):
+ needs_clean = True
+ else:
+ needs_clean = False
+ os.mkdir(basedir)
+
+ # Unless not using timestamps, we require the basedir not to exist.
+ if needs_clean and opts.timestamp_build:
+ fatal('refusing to reuse pre-existing build dir %r' % basedir)
+
+ return basedir
+
+def run_test(nick_prefix, opts, iteration, global_basedir):
print >>sys.stderr, "%s: checking source versions" % (
timestamp(),)
if opts.llvm_src_root:
@@ -600,37 +656,8 @@ def run_test(nick_prefix, opts, iteration):
nick += "__%s__%s" % (cc_nick, cc_info.get('cc_target').split('-')[0])
print >>sys.stderr, "%s: using nickname: %r" % (timestamp(), nick)

- # Set up the sandbox.
- if not os.path.exists(opts.sandbox_path):
- print >>sys.stderr, "%s: creating sandbox: %r" % (
- timestamp(), opts.sandbox_path)
- os.mkdir(opts.sandbox_path)
-
- # Create the per-test directory.
start_time = timestamp()
- if opts.timestamp_build:
- ts = start_time.replace(' ','_').replace(':','-')
- build_dir_name = "test-%s" % ts
- else:
- build_dir_name = "build"
- if iteration is not None:
- build_dir_name = "%s-%d" % (build_dir_name, iteration)
- basedir = os.path.join(opts.sandbox_path, build_dir_name)
-
- # Canonicalize paths, in case we are using e.g. an NFS remote mount.
- #
- # FIXME: This should be eliminated, along with the realpath call below.
- basedir = os.path.realpath(basedir)
-
- if os.path.exists(basedir):
- needs_clean = True
- else:
- needs_clean = False
- os.mkdir(basedir)
-
- # Unless not using timestamps, we require the basedir not to exist.
- if needs_clean and opts.timestamp_build:
- fatal('refusing to reuse pre-existing build dir %r' % basedir)
+ basedir = prepare_basedir(opts, start_time, iteration)

# FIXME: Auto-remove old test directories in the source directory (which
# cause make horrible fits).
@@ -658,9 +685,7 @@ def run_test(nick_prefix, opts, iteration):
configure_log.flush()

print >>sys.stderr, '%s: configuring...' % timestamp()
- p = subprocess.Popen(args=args, stdin=None, stdout=configure_log,
- stderr=subprocess.STDOUT, cwd=basedir)
- res = p.wait()
+ res = execute_command(configure_log, basedir, args, global_basedir)
configure_log.close()
if res != 0:
fatal('configure failed, log is here: %r' % configure_log_path)
@@ -690,10 +715,7 @@ def run_test(nick_prefix, opts, iteration):
' '.join('"%s"' % a
for a in args))
build_tools_log.flush()
- p = subprocess.Popen(args=args, stdin=None, stdout=build_tools_log,
- stderr=subprocess.STDOUT, cwd=basedir,
- env=os.environ)
- res = p.wait()
+ res = execute_command(build_tools_log, basedir, args, global_basedir)
build_tools_log.close()
if res != 0:
fatal('unable to build tools, aborting!')
@@ -714,7 +736,8 @@ def run_test(nick_prefix, opts, iteration):
run_nightly_test = (opts.only_test is None or
not opts.only_test.startswith("LNTBased"))
if run_nightly_test:
- execute_nt_tests(test_log, make_variables, basedir, opts)
+ execute_nt_tests(test_log, make_variables, basedir, opts,
+ global_basedir)

# Run the extension test modules, if needed.
test_module_results = execute_test_modules(test_log, test_modules,
@@ -836,7 +859,7 @@ def run_test(nick_prefix, opts, iteration):
print >>lnt_report_file,report.render()
lnt_report_file.close()

- return report, basedir
+ return report

###

@@ -1211,14 +1234,20 @@ class NTTest(builtintest.BuiltinTest):
if opts.multisample is not None:
# Collect the sample reports.
reports = []
- first_basedir = None
+ global_basedir = None
+ if opts.multisample > 1 :
+ global_basedir = prepare_basedir(opts, timestamp(), None)
+ # Remove all existed log or josn files.
+ for file in glob.glob(os.path.join(global_basedir, '*.log')):
+ os.remove(file)
+ for file in glob.glob(os.path.join(global_basedir, '*.json')):
+ os.remove(file)
+
for i in range(opts.multisample):
print >>sys.stderr, "%s: (multisample) running
iteration %d" % (
timestamp(), i)
- report, basedir = run_test(nick, opts, i)
+ report = run_test(nick, opts, i, global_basedir)
reports.append(report)
- if first_basedir is None:
- first_basedir = basedir

# Create the merged report.
#
@@ -1232,7 +1261,11 @@ class NTTest(builtintest.BuiltinTest):
for r in reports], [])

# Write out the merged report.
- lnt_report_path = os.path.join(first_basedir, 'report-merged.json')
+ if global_basedir is None:
+ # Simply place the json files to the directory of first
+ # iteration.
+ global_basedir = prepare_basedir(opts, timestamp(), 0)
+ lnt_report_path = os.path.join(global_basedir,
'report-merged.json')
report = lnt.testing.Report(machine, run, test_samples)
lnt_report_file = open(lnt_report_path, 'w')
print >>lnt_report_file,report.render()
@@ -1240,7 +1273,7 @@ class NTTest(builtintest.BuiltinTest):

return report

- report, _ = run_test(nick, opts, None)
+ report = run_test(nick, opts, None, None)
return report

def create_instance():
0001-Refactor-Move-the-basedir-preparing-code-to-a-standa.patch
0002-Generate-the-merged-json-file-in-a-basedir-without-i.patch
0003-Duplicate-logfile-to-a-iteration-independent-directo.patch

Tobias Grosser

unread,
May 3, 2012, 10:45:57 AM5/3/12
to Hongbin Zheng, poll...@googlegroups.com
On 05/03/2012 03:37 PM, Hongbin Zheng wrote:
> hi tobi,
>
> Fulfill all your wishes:

Nice. I tested it and it works very well! I hope this matches what
Daniel wants. I have just some comments on the code itself:

> From 62330808a396ddcb140e5e06518002b541ddfae1 Mon Sep 17 00:00:00 2001
> From: ether<ethe...@gmail.com>
> Date: Thu, 3 May 2012 19:53:37 +0800
> Subject: [PATCH 1/3] Refactor: Move the basedir preparing code to a
> standalone function.

No comments for this one.

> From b87c36e2f900e30626bf817d0f320d24604b72ec Mon Sep 17 00:00:00 2001
> From: ether<ethe...@gmail.com>
> Date: Thu, 3 May 2012 21:29:39 +0800
> Subject: [PATCH 2/3] Generate the merged json file in a basedir without
> iteration number in multisample mode.
> + if opts.multisample> 1 :

I know I requested this earlier, but looking at the code,
I don't think it makes sense to special case for multisample > 1. Having
a base dir for multisample == 1 does not hurt and if someone requested
multisample=1 he is prepared to get the multisample layout. I think we
should always create the base dir and remove this condition.

> # Write out the merged report.
> - lnt_report_path = os.path.join(first_basedir, 'report-merged.json')

I would call it 'report.json'. As the file is in another directory,
there the name will not clash with the report.json files of the
individual runs. By using 'report.json' the result of a single run is at
the same location as the (global) result of a mult-sample run.

> From f0d6995a64b3b51e7b269fab1dc541c5bc3e30f2 Mon Sep 17 00:00:00 2001
> From: ether<ethe...@gmail.com>
> Date: Thu, 3 May 2012 21:31:29 +0800
> Subject: [PATCH 3/3] Duplicate logfile to a iteration-independent directory
> in multisample mode.

Looks OK.

I think with this changes performed, we can submit it to daniel for
review.

Tobi

Hongbin Zheng

unread,
May 3, 2012, 8:54:43 PM5/3/12
to Tobias Grosser, poll...@googlegroups.com
Hi tobi,

On Thu, May 3, 2012 at 10:45 PM, Tobias Grosser <tob...@grosser.es> wrote:
> I know I requested this earlier, but looking at the code,
> I don't think it makes sense to special case for multisample > 1. Having
> a base dir for multisample == 1 does not hurt and if someone requested
> multisample=1 he is prepared to get the multisample layout. I think we
> should always create the base dir and remove this condition.
>
>
>>              # Write out the merged report.
>> -            lnt_report_path = os.path.join(first_basedir,
>> 'report-merged.json')
>
>
> I would call it 'report.json'. As the file is in another directory,
> there the name will not clash with the report.json files of the
> individual runs. By using 'report.json' the result of a single run is at
> the same location as the (global) result of a mult-sample run.
>
I do not have strong idea on this, but i feel like maybe we should
refuse to run the lnt with multisample=1, which is meaningless.

best regards
ether

Tobias Grosser

unread,
May 4, 2012, 3:32:20 AM5/4/12
to Hongbin Zheng, poll...@googlegroups.com
Yes, sounds reasonable. Though this is unrelated and we should submit it
in an independent patch.

Tobi

Hongbin Zheng

unread,
Jun 28, 2012, 10:39:58 AM6/28/12
to Tobias Grosser, poll...@googlegroups.com
Hi tobi,

So what do you think about this patch? :)

best regards
ether

On Thu, Apr 26, 2012 at 1:23 AM, Tobias Grosser <tob...@grosser.es> wrote:
> Hi ether,
>
> I am currently very busy, but plan to review this tomorrow.
>

Hongbin Zheng

unread,
Aug 26, 2012, 11:29:43 PM8/26/12
to Tobias Grosser, poll...@googlegroups.com
Hi tobi,

I attached the up to date patch. Though i need spend some more time to
use the SCEVRewriter to rewrite the address compuation.

best regards
ether

From 5084ada05e74ddb40907d9ffcb682895f21a439d Mon Sep 17 00:00:00 2001
From: ether <ethe...@gmail.com>
Date: Mon, 27 Aug 2012 11:25:06 +0800
Subject: [PATCH] Generate code for ScopStmt by copying the entire instruction
tree of the instruction.

---
include/polly/CodeGen/BlockGenerators.h | 59 +++++++++++
lib/CodeGen/BlockGenerators.cpp | 165 +++++++++++++++++++++++++++++++
lib/CodeGen/CodeGeneration.cpp | 2 +-
3 files changed, 225 insertions(+), 1 deletions(-)

diff --git a/include/polly/CodeGen/BlockGenerators.h
b/include/polly/CodeGen/BlockGenerators.h
index 842c616..0b9f4cf 100644
--- a/include/polly/CodeGen/BlockGenerators.h
+++ b/include/polly/CodeGen/BlockGenerators.h
@@ -246,6 +246,65 @@ private:
void copyBB();
};

+/// @brief Generate a new basic block for a polyhedral statement.
+///
+class TreeCopyStmtGenerator : public BlockGenerator {
+public:
+ /// @brief Generate a new BasicBlock for a ScopStmt.
+ ///
+ /// @param Builder The LLVM-IR Builder used to generate the statement. The
+ /// code is generated at the location, the Builder
points to.
+ /// @param Stmt The statement to code generate.
+ /// @param GlobalMap A map that defines for certain Values
referenced from the
+ /// original code new Values they should be replaced with.
+ /// @param P A reference to the pass this function is called from.
+ /// The pass is needed to update other analysis.
+ static void codegen(IRBuilder<> &Builder, ScopStmt &Stmt,
+ ValueMapT &GlobalMap, Pass *P);
+
+protected:
+ TreeCopyStmtGenerator(IRBuilder<> &B, ScopStmt &Stmt, Pass *P)
+ : BlockGenerator(B, Stmt, P) {}
+
+ /// @brief Cast the type of the operand.
+ ///
+ /// @param NewOperand The value need to cast.
+ /// @param DstTy The destinated type.
+ /// @param InsertPos The position to insert the cast instruction.
+ ///
+ /// @return The casted value if the type of NewOperand do not match DstTy, or
+ /// the simply NewOperand otherwise.
+ static Value *castOperandType(Value *NewOperand, Type *DstTy,
+ Instruction *InsertPos);
+
+ /// @brief Generate code for memory access in ScopStmt.
+ ///
+ /// @param MA The memory access to generate code for.
+ /// @param Inst The underlying instruction that access the memory.
+ /// @param BBMap A mapping from old values to their new values
+ /// (for values recalculated within this basic block).
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).
+ void generateMemoryAccess(MemoryAccess *MA, const Instruction *Inst,
+ ValueMapT &GlobalMap, ValueMapT &BBMap);
+
+ /// @brief Clone the Instruction and its dependents recursively until the
+ /// instuction appears in the value map.
+ ///
+ /// @param Root The instruction need to be cloned.
+ /// @param BBMap A mapping from old values to their new values
+ /// (for values recalculated within this basic block).
+ /// @param GlobalMap A mapping from old values to their new values
+ /// (for values recalculated in the new ScoP, but not
+ /// within this basic block).
+ /// @param NameSuffix The name suffix of the newly cloned instruction.
+ ///
+ /// @return The newly cloned instruction.
+ Instruction *cloneInstrTree(const Instruction *Root, ValueMapT &BBMap,
+ ValueMapT &GlobalMap, const Twine
&NameSuffix="");
+};
+
}
#endif

diff --git a/lib/CodeGen/BlockGenerators.cpp b/lib/CodeGen/BlockGenerators.cpp
index e8109bd..0d543d4 100644
--- a/lib/CodeGen/BlockGenerators.cpp
+++ b/lib/CodeGen/BlockGenerators.cpp
@@ -23,6 +23,8 @@
#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Support/CommandLine.h"
+#define DEBUG_TYPE "polly-block-generators"
+#include "llvm/Support/Debug.h"

#include "isl/aff.h"
#include "isl/set.h"
@@ -883,3 +885,166 @@ void VectorBlockGenerator::copyBB() {
II != IE; ++II)
copyInstruction(II, VectorBlockMap, ScalarBlockMap);
}
+
+
+Value *TreeCopyStmtGenerator::castOperandType(Value *NewOperand, Type *DstTy,
+ Instruction *InsertPos) {
+ unsigned OldSize = DstTy->getScalarSizeInBits();
+ unsigned NewSize = NewOperand->getType()->getScalarSizeInBits();
+ // Insert a cast if types are different
+ // FIXME: Use IRBuilder, so we have chance to fold the cast.
+ if (OldSize < NewSize)
+ return CastInst::CreateTruncOrBitCast(NewOperand, DstTy,
+ NewOperand->getName() + "_c",
+ InsertPos);
+
+ assert(OldSize == NewSize && "Unexpect OldSize bigger than NewSize!");
+
+ return NewOperand;
+}
+
+void TreeCopyStmtGenerator::codegen(IRBuilder<> &Builder, ScopStmt &Stmt,
+ ValueMapT &GlobalMap, Pass *P) {
+ TreeCopyStmtGenerator Generator(Builder, Stmt, P);
+
+ BasicBlock *BB = Generator.Statement.getBasicBlock();
+ BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
+ Builder.GetInsertPoint(), P);
+ CopyBB->setName("polly.stmt." + BB->getName());
+ Builder.SetInsertPoint(CopyBB->begin());
+
+ ValueMapT BBMap;
+
+ // To generate code for finder grain statement, we need to remeber the
+ // underlying instruction of the memory access.
+ // typedef ScopStmt::memacc_iterator acc_it;
+ // for (acc_it I = Stmt.memacc_begin(), E = Stmt.memacc_end(); I != E; ++I)
+ // Generate code for access I.
+
+ typedef BasicBlock::const_iterator it;
+ for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
+ const Instruction *Inst = II;
+ if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
+ Generator.generateMemoryAccess(MA, Inst, GlobalMap, BBMap);
+ }
+}
+
+void TreeCopyStmtGenerator::generateMemoryAccess(MemoryAccess *MA,
+ const Instruction *Inst,
+ ValueMapT &GlobalMap,
+ ValueMapT &BBMap) {
+ isl_map *CurrentAccessRelation = MA->getAccessRelation();
+ isl_map *NewAccessRelation = MA->getNewAccessRelation();
+
+ assert(isl_map_has_equal_space(CurrentAccessRelation, NewAccessRelation)
+ && "Current and new access function use different spaces");
+
+ const Value *PointerOperand = MA->isRead() ?
+ cast<LoadInst>(Inst)->getPointerOperand() :
+ cast<StoreInst>(Inst)->getPointerOperand();
+
+ // Do we need to generate the address from new access relation?
+ if (NewAccessRelation) {
+ Value *BaseAddress = const_cast<Value*>(MA->getBaseAddr());
+ Value *NewPointer = getNewAccessOperand(NewAccessRelation, BaseAddress,
+ BBMap, GlobalMap);
+ BBMap[PointerOperand] = NewPointer;
+ }
+
+ cloneInstrTree(Inst, GlobalMap, BBMap);
+
+ // Free the context.
+ isl_map_free(CurrentAccessRelation);
+ if (NewAccessRelation) {
+ BBMap.erase(PointerOperand);
+ isl_map_free(NewAccessRelation);
+ }
+}
+
+Instruction *TreeCopyStmtGenerator::cloneInstrTree(const Instruction *Root,
+ ValueMapT &GlobalMap,
+ ValueMapT &BBMap,
+ const Twine &NameSuffix) {
+ // Clone the incoming instruction.
+ Instruction *NewI = Root->clone();
+ const BasicBlock *OldBB = Root->getParent();
+ if (!NewI->getType()->isVoidTy())
+ NewI->setName(Root->getName() + NameSuffix);
+ // Insert it to the current bb and remember we already copy this instruction.
+ Builder.Insert(NewI);
+ // Remember we cloned this instruction.
+ BBMap[Root] = NewI;
+
+ // Depth first traverse the operand tree (or operand dag, because we will
+ // stop at PHINodes, so there are no cycle).
+ typedef Instruction::op_iterator ChildIt;
+ std::vector<std::pair<Instruction*, ChildIt> > WorkStack;
+
+ WorkStack.push_back(std::make_pair(NewI, NewI->op_begin()));
+
+ while (!WorkStack.empty()) {
+ Instruction *CurInst = WorkStack.back().first;
+ ChildIt It = WorkStack.back().second;
+ DEBUG(dbgs() << "Checking Operand of Node:\n" << *CurInst <<
"\n------>\n");
+ if (It == CurInst->op_end()) {
+ DEBUG(dbgs() << "Going to insert new inst: " << *CurInst << '\n');
+ // Insert the new instructions in topological order by insert the operand
+ // tree before the root.
+ // FIXME: Use IRBuilder?
+ if (!CurInst->getParent())
+ CurInst->insertBefore(NewI);
+
+ WorkStack.pop_back();
+ } else {
+ // for each node N,
+ Value *Operand = *It;
+ ++WorkStack.back().second;
+
+ DEBUG(dbgs() << "For Operand:\n" << *Operand << ' ' << Operand
<< "\n--->");
+
+ if (Value *NewParam = GlobalMap.lookup(Operand)) {
+ assert(Operand->getType() == NewParam->getType()
+ && "The type of parameter not match!");
+ DEBUG(dbgs() << "Replace parameter" << *Operand << " by "
+ << *NewParam << ".\n");
+ It->set(NewParam);
+ // If a value appeared in parameter map, it should not
appeared in local
+ // map
+ assert(!BBMap.count(Operand) && "Parameter in local map!");
+ continue;
+ }
+
+ Instruction *OpInst = dyn_cast<Instruction>(Operand);
+ // Do not clone the constant or parameters not in the parameters map.
+ if (OpInst == 0) continue;
+
+ // Now we need to clone Operand to CurBB.
+ // Check if we already clone it.
+ if (Value *Cloned = BBMap.lookup(OpInst)) {
+ DEBUG(dbgs() << "already cloned: " << *Cloned << ".\n");
+ It->set(castOperandType(Cloned, OpInst->getType(), NewI));
+ // Skip all its children as we already processed them.
+ continue;
+ }
+
+ // Ignore the parameter not in the subsitution map.
+ // FIXME: This is incorrect after IndependentBlocks pass is removed.
+ if (OldBB != OpInst->getParent()) continue;
+
+ // Else just clone the instruction.
+ // Note that NewOp is not inserted in any BB now, we will insert it when
+ // it popped form the work stack, so it will be inserted in topological
+ // order.
+ Instruction *NewOp = OpInst->clone();
+ NewOp->setName("p_" + OpInst->getName());
+ DEBUG(dbgs() << "Move to " << *NewOp << "\n");
+ It->set(NewOp);
+ // Insert it to the subsitiuation table.
+ BBMap[OpInst] = NewOp;
+ // Process its operands.
+ WorkStack.push_back(std::make_pair(NewOp, NewOp->op_begin()));
+ }
+ }
+
+ return NewI;
+}
diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
index 5395607..1e4c6b1 100644
--- a/lib/CodeGen/CodeGeneration.cpp
+++ b/lib/CodeGen/CodeGeneration.cpp
@@ -414,7 +414,7 @@ void ClastStmtCodeGen::codegen(const clast_user_stmt *u,
int VectorDimensions = IVS ? IVS->size() : 1;

if (VectorDimensions == 1) {
- BlockGenerator::generate(Builder, *Statement, ValueMap, P);
+ TreeCopyStmtGenerator::codegen(Builder, *Statement, ValueMap, P);
return;
}

--
1.7.5.4
0001-Use-block_iterator-in-function-Region-print.patch

Tobias Grosser

unread,
Aug 27, 2012, 3:28:37 AM8/27/12
to Hongbin Zheng, poll...@googlegroups.com
On 08/27/2012 05:29 AM, Hongbin Zheng wrote:
> Hi tobi,
>
> I attached the up to date patch. Though i need spend some more time to
> use the SCEVRewriter to rewrite the address compuation.

Hi ether,

thanks for restarting that discussion. I would love to have a look at
the most recent patch, but it seems you attached the block_iterator
patch. The patch that is inlined is unfortunately corrupt. Could you
send the right patch?

Cheers
Tobi

P.S.: I recently learned how to cite attached patches. So from my side
there is no need anymore to also inline the patches. ;-)

Hongbin Zheng

unread,
Aug 27, 2012, 8:26:29 AM8/27/12
to Tobias Grosser, poll...@googlegroups.com
Hi tobi,

On Mon, Aug 27, 2012 at 3:28 PM, Tobias Grosser <tob...@grosser.es> wrote:
> thanks for restarting that discussion. I would love to have a look at the
> most recent patch, but it seems you attached the block_iterator patch. The
> patch that is inlined is unfortunately corrupt. Could you send the right
> patch?
>
> Cheers
> Tobi
>
> P.S.: I recently learned how to cite attached patches. So from my side there
> is no need anymore to also inline the patches. ;-)
How?

best regards
ether
0001-Generate-code-for-ScopStmt-by-copying-the-entire-ins.patch

Tobias Grosser

unread,
Aug 27, 2012, 8:28:43 AM8/27/12
to Hongbin Zheng, poll...@googlegroups.com
In thunderbird, you just mark the attached text files with the mouse
before pressing the reply button. In case you do this, the attachments
will automatically be cited.

Also, thanks for the updated patch. I will have a look.

Tobi



Hongbin Zheng

unread,
Aug 27, 2012, 8:44:12 AM8/27/12
to Tobias Grosser, poll...@googlegroups.com
Thanks
>
> Tobi
>
>
>

Tobias Grosser

unread,
Aug 28, 2012, 7:42:44 AM8/28/12
to Hongbin Zheng, poll...@googlegroups.com
On 08/27/2012 02:26 PM, Hongbin Zheng wrote:
> From 5084ada05e74ddb40907d9ffcb682895f21a439d Mon Sep 17 00:00:00 2001
> From: ether<ethe...@gmail.com>
> Date: Mon, 27 Aug 2012 11:25:06 +0800
> Subject: [PATCH] Generate code for ScopStmt by copying the entire instruction
> tree of the instruction.

Hi ether,

thanks for the patch. I like the direction, but would also like to
understand your exact goal with this patch. Does this patch already
solve some existing problems, is preparation the ground for some upcoming
fixes or does it lower the requirements on our preprocessing? Maybe you
can describe in two sentences which problems this patch is solving?

Some comments on the high level algorithm:

1. I like the idea of walking the instruction tree

Reasoning about instruction trees is the right direction. It is also the
first step to support finer grained ScopStmts, not defined through basic
blocks, but through trees of connected instructions.

I even believe that we should reason about instruction trees way earlier
than code generation. At the very best we analyze the instruction trees
of the ScopStmts when we analyze the LLVM-IR the first time
(TempScopInfo or ScopInfo). At that point we can already use the
instruction trees to check if a ScopStmt is a reduction.

We can also use the trees to find the scev expressions used in a
ScopStmt. This will allow us to solve a big bug in the OpenMP code
generation: For now, we only reason about scev expressions in memory
accesses and loop bounds. Scev expressions that are only used as
operands to the calculations within the ScopStmt are currently not
analyzed. This means, parameters in these expressions are not detected
and are not passed to the OpenMP subfunctions.

Here an example where we don't understand that 'param' needs to be
passed to the subfunction.

; for (i = 0; i < 1024; i++)
; A[i] = A[i] * param

I attached a the llvm-ir of this example as param_referenced_in_stmt.ll

2. Building the instruction tree 'bottum-up'

When you build the instruction tree you start code generation at the
memory instructions. After a memory instruction is inserted, you insert
the instructions it depends on bottom-up. This is correct as it keeps
the original order of the memory instructions.
However, emitting the remaining instructions bottom-up has some
drawbacks:
1) The order of the non-memory-instructions may change. I think
keeping the original order as much as possible is important to
make the code generation easy to debug.
2) Using the IRBuilder (which would make the code more readable)
is difficult.
3) The new tree walk is a different algorithm, which makes reuse of
the existing code more complicated. Your changes are e.g. not
used by the vector code generation.

3. Copying instructions outside of the basic block

At the moment the independent blocks pass allows us to ignore
instructions outside of the same basic block. What are your goals
after removing the independent blocks pass? Are you planning to
also copy instructions that are outside of the statement basic block?
I personally am a little afraid of copying instructions from outside
of the basic block (except for the SCEVBuilder). Copying such
instructions may introduce duplicated calculations which can cause
surprising performance changes.
This patch does not introduce copy out-of-bb instructions, so there
is no problem for now. However, we may want to be careful later on.



I was thinking about your patch in the light of the above comments and
wander if it make sense to restructure the change as follows.

Step 1)
In ScopInfo or TempScopInfo, we detect all instructions that are part
of the
instruction tree of a ScopStmt. We save these instructions in some
kind of setlist.

Step 2)
We change the existing code generation algorithm to not blindly code
generate each instruction, but to only copy instructions that are part of
the instruction tree.

Step 3)
We enhance the instruction tree capturing, such that it knowns that
under -polly-scev-codegen the tree ends at any value that can be
expressed by a SCEV. We store those SCEVs and use them during code
generation to become independent of the indvars pass.

Step 4)
We detect the parameters in all SCEVS and pass them (if necessary)
to the OpenMP subfunction. This should fix the OpenMP bug above.

Step ...
Continue to improve the code generation and, at the same time,
remove more and more of the preprocessing requirements.


This is not just a patch review, but it is also infrastructure planning
for future Polly versions. I therefore appreciate comments from several
people. Also, please see my feedback as a 'Request for comments' that
should be carefully sanity checked.


And here some inline comments:

> +/// @brief Generate a new basic block for a polyhedral statement.
> +///
> +class TreeCopyStmtGenerator : public BlockGenerator {

Introducing a new class seems to cause code duplication. I would rather
change the behavior of the BlockGenerator, such that old and unneeded
code is removed. We can temporarily have code duplication guarded by
command-line options, but we should try to remove old functionality as
soon as possible.

> + /// @brief Cast the type of the operand.
> + ///
> + /// @param NewOperand The value need to cast.
> + /// @param DstTy The destinated type.
> + /// @param InsertPos The position to insert the cast instruction.
> + ///
> + /// @return The casted value if the type of NewOperand do not match DstTy, or
> + /// the simply NewOperand otherwise.
> + static Value *castOperandType(Value *NewOperand, Type *DstTy,
> + Instruction *InsertPos);

We should try to use the IRBuilder. This would make introducing our own
Reverse-IRBuilder method unnecessary.

> +void TreeCopyStmtGenerator::codegen(IRBuilder<> &Builder, ScopStmt &Stmt,
> + ValueMapT &GlobalMap, Pass *P) {
> + TreeCopyStmtGenerator Generator(Builder, Stmt, P);
> +
> + BasicBlock *BB = Generator.Statement.getBasicBlock();
> + BasicBlock *CopyBB = SplitBlock(Builder.GetInsertBlock(),
> + Builder.GetInsertPoint(), P);
> + CopyBB->setName("polly.stmt." + BB->getName());
> + Builder.SetInsertPoint(CopyBB->begin());

This looks like code duplication. Maybe we can introduce a new function
beginBlock() to the BlockGenerator, which does the basic block
splitting and which is than called by all BlockGenerators.
(This is a trivial patch that could be committed separtly)

> +
> + ValueMapT BBMap;
> +
> + // To generate code for finder grain statement, we need to remeber the
> + // underlying instruction of the memory access.
> + // typedef ScopStmt::memacc_iterator acc_it;
> + // for (acc_it I = Stmt.memacc_begin(), E = Stmt.memacc_end(); I != E; ++I)
> + // Generate code for access I.

Are you planning to use that code at some point? Otherwise, please
remove the old leftover code.

> +
> + typedef BasicBlock::const_iterator it;
> + for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
> + const Instruction *Inst = II;
> + if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
I like the idea of walking the instructions in the original order.

As described above, I wonder if we should avoid the actual tree-walk
here and copy all instruction-tree instructions, in the order they
appear in the basic block. We could reuse the old copyInst function, but
would just need to check, if the instruction is part of the instruction
tree.

> + Generator.generateMemoryAccess(MA, Inst, GlobalMap, BBMap);
> + }
> +}
> +
> +void TreeCopyStmtGenerator::generateMemoryAccess(MemoryAccess *MA,
> + const Instruction *Inst,
> + ValueMapT &GlobalMap,
> + ValueMapT &BBMap) {
> + isl_map *CurrentAccessRelation = MA->getAccessRelation();
> + isl_map *NewAccessRelation = MA->getNewAccessRelation();
> +
> + assert(isl_map_has_equal_space(CurrentAccessRelation, NewAccessRelation)
> + && "Current and new access function use different spaces");
> +
> + const Value *PointerOperand = MA->isRead() ?
> + cast<LoadInst>(Inst)->getPointerOperand() :
> + cast<StoreInst>(Inst)->getPointerOperand();
> +
> + // Do we need to generate the address from new access relation?
> + if (NewAccessRelation) {
> + Value*BaseAddress = const_cast<Value*>(MA->getBaseAddr());
> + Value *NewPointer = getNewAccessOperand(NewAccessRelation, BaseAddress,
> + BBMap, GlobalMap);
> + BBMap[PointerOperand] = NewPointer;
> + }
>
> + cloneInstrTree(Inst, GlobalMap, BBMap);

This looks very similar to generateLocationAccessed. We should not
duplicate the code, but rather try to reuse existing code. To make the
code more reusable, we could split the common parts of these functions
into subfunctions which could be reused. Such patches are often trivial
and can be committed beforehand to prepare this patch.
Is this cast always valid? If it is, we should put a comment describing
why it is always valid, otherwise it looks like a hack.

> + // Skip all its children as we already processed them.
> + continue;
> + }
> +
> + // Ignore the parameter not in the subsitution map.
> + // FIXME: This is incorrect after IndependentBlocks pass is removed.
> + if (OldBB != OpInst->getParent()) continue;
> +
> + // Else just clone the instruction.
> + // Note that NewOp is not inserted in any BB now, we will insert it when
> + // it popped form the work stack, so it will be inserted in topological
> + // order.
> + Instruction *NewOp = OpInst->clone();
> + NewOp->setName("p_" + OpInst->getName());
> + DEBUG(dbgs() << "Move to " << *NewOp << "\n");
> + It->set(NewOp);
> + // Insert it to the subsitiuation table.
> + BBMap[OpInst] = NewOp;
> + // Process its operands.
> + WorkStack.push_back(std::make_pair(NewOp, NewOp->op_begin()));

This function is too big. It duplicates functionality of
BlockGenerator::getNewValue() and merges it with the instruction tree
building. I would be more comfortable, if we could split this into
smaller parts.

> + }
> + }
> +
> + return NewI;
> +}
> diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
> index 5395607..1e4c6b1 100644
> --- a/lib/CodeGen/CodeGeneration.cpp
> +++ b/lib/CodeGen/CodeGeneration.cpp
> @@ -414,7 +414,7 @@ void ClastStmtCodeGen::codegen(const clast_user_stmt *u,
> int VectorDimensions = IVS ? IVS->size() : 1;
>
> if (VectorDimensions == 1) {
> - BlockGenerator::generate(Builder, *Statement, ValueMap, P);
> + TreeCopyStmtGenerator::codegen(Builder, *Statement, ValueMap, P);

I have the feeling with this patch a lot of the code in BlockGenerator
is dead, as the TreeCopyStmtGenerator introduces its own implementation
of a couple of things. However, we can also not remove the dead code, as
it is still used by the vector code generation, which does not yet take
advantage of the new TreeCopyStmtGenerator.

I have the feeling after this patch, we are somewhere in the middle of
a transition. That does not seem to be a good place to be. Changing
everything would make the patch bigger, which is also not a good thing.

Moving the tree detection to the (Temp)ScopInfo phase, may be an option
to split the tree detection procedure and possible later restructuring
of the code generation.

Cheers
Tobi
param_referenced_in_stmt.ll

ether

unread,
Aug 28, 2012, 9:27:56 AM8/28/12
to Tobias Grosser, poll...@googlegroups.com
Hi tobi,

于 2012/8/28 19:42, Tobias Grosser 写道:
> On 08/27/2012 02:26 PM, Hongbin Zheng wrote:
>> From 5084ada05e74ddb40907d9ffcb682895f21a439d Mon Sep 17 00:00:00 2001
>> From: ether<ethe...@gmail.com>
>> Date: Mon, 27 Aug 2012 11:25:06 +0800
>> Subject: [PATCH] Generate code for ScopStmt by copying the entire
>> instruction
>> tree of the instruction.
>
> Hi ether,
>
> thanks for the patch. I like the direction, but would also like to
> understand your exact goal with this patch. Does this patch already
> solve some existing problems, is preparation the ground for some upcoming
> fixes or does it lower the requirements on our preprocessing? Maybe you
> can describe in two sentences which problems this patch is solving?
What about this:

With this patch and subsequent patches, we want to remove the need of
the IndependenceBlock pass (bug12398[1]) and support the fine-grain
statement (bug12402[2]).

[1]http://llvm.org/bugs/show_bug.cgi?id=12398
[2]http://llvm.org/bugs/show_bug.cgi?id=12402
>
> Some comments on the high level algorithm:
>
> 1. I like the idea of walking the instruction tree
>
> Reasoning about instruction trees is the right direction. It is also the
> first step to support finer grained ScopStmts, not defined through basic
> blocks, but through trees of connected instructions.
Yes
>
> I even believe that we should reason about instruction trees way earlier
> than code generation. At the very best we analyze the instruction trees
> of the ScopStmts when we analyze the LLVM-IR the first time
> (TempScopInfo or ScopInfo). At that point we can already use the
> instruction trees to check if a ScopStmt is a reduction.
Sounds reasonable.
>
> We can also use the trees to find the scev expressions used in a
> ScopStmt. This will allow us to solve a big bug in the OpenMP code
> generation: For now, we only reason about scev expressions in memory
> accesses and loop bounds. Scev expressions that are only used as
> operands to the calculations within the ScopStmt are currently not
> analyzed. This means, parameters in these expressions are not detected
> and are not passed to the OpenMP subfunctions.
>
> Here an example where we don't understand that 'param' needs to be
> passed to the subfunction.
>
> ; for (i = 0; i < 1024; i++)
> ; A[i] = A[i] * param
>
> I attached a the llvm-ir of this example as param_referenced_in_stmt.ll
>
> 2. Building the instruction tree 'bottum-up'
>
> When you build the instruction tree you start code generation at the
> memory instructions. After a memory instruction is inserted, you insert
> the instructions it depends on bottom-up. This is correct as it keeps
> the original order of the memory instructions.
> However, emitting the remaining instructions bottom-up has some
> drawbacks:
> 1) The order of the non-memory-instructions may change. I think
> keeping the original order as much as possible is important to
> make the code generation easy to debug.
Ok, a quick fix for the debug issue is name the instructions before
passing them to Polly.
I agree that we should keep the original order as I can imagine that the
order of the copied instructions is different between different runs.
> 2) Using the IRBuilder (which would make the code more readable)
> is difficult.
Maybe we need ask the LLVM people for help.
> 3) The new tree walk is a different algorithm, which makes reuse of
> the existing code more complicated. Your changes are e.g. not
> used by the vector code generation.
In fact, i have a work in progress patch of the
"TreeCopyStmtVectorizer", which copy the tree and vectorize it at the
same time.
>
> 3. Copying instructions outside of the basic block
>
> At the moment the independent blocks pass allows us to ignore
> instructions outside of the same basic block. What are your goals
> after removing the independent blocks pass? Are you planning to
> also copy instructions that are outside of the statement basic block?
I think the answer is "yes"
> I personally am a little afraid of copying instructions from outside
> of the basic block (except for the SCEVBuilder). Copying such
> instructions may introduce duplicated calculations which can cause
> surprising performance changes.
Yes, maybe we need perform CSE on-the-fly (in fact, the SCEVRewriter has
its own CSE logic), but that may slow polly down.
An other option is fix this by schedule some LLVM pass, e.g. GVN, right
after codegen pass.
> This patch does not introduce copy out-of-bb instructions, so there
> is no problem for now. However, we may want to be careful later on.
Ok
>
>
>
> I was thinking about your patch in the light of the above comments and
> wander if it make sense to restructure the change as follows.
>
> Step 1)
> In ScopInfo or TempScopInfo, we detect all instructions that are
> part of the
> instruction tree of a ScopStmt. We save these instructions in some
> kind of setlist.
This may solve the ordering problem.
>
> Step 2)
> We change the existing code generation algorithm to not blindly code
> generate each instruction, but to only copy instructions that are
> part of
> the instruction tree.

>
> Step 3)
> We enhance the instruction tree capturing, such that it knowns that
> under -polly-scev-codegen the tree ends at any value that can be
> expressed by a SCEV.
I think we may need to introduce a InstTree class to capturing the
instructions trees, what do you think about this?
> We store those SCEVs and use them during code
> generation to become independent of the indvars pass.
>
> Step 4)
> We detect the parameters in all SCEVS and pass them (if necessary)
> to the OpenMP subfunction. This should fix the OpenMP bug above.
>
> Step ...
> Continue to improve the code generation and, at the same time,
> remove more and more of the preprocessing requirements.




>
> This is not just a patch review, but it is also infrastructure planning
> for future Polly versions. I therefore appreciate comments from several
> people. Also, please see my feedback as a 'Request for comments' that
> should be carefully sanity checked.
>
>
> And here some inline comments:
>
>> +/// @brief Generate a new basic block for a polyhedral statement.
>> +///
>> +class TreeCopyStmtGenerator : public BlockGenerator {
>
> Introducing a new class seems to cause code duplication. I would rather
> change the behavior of the BlockGenerator, such that old and unneeded
> code is removed. We can temporarily have code duplication guarded by
> command-line options, but we should try to remove old functionality as
> soon as possible.
Ok
Ok.
>
>> +
>> + ValueMapT BBMap;
>> +
>> + // To generate code for finder grain statement, we need to remeber
>> the
>> + // underlying instruction of the memory access.
>> + // typedef ScopStmt::memacc_iterator acc_it;
>> + // for (acc_it I = Stmt.memacc_begin(), E = Stmt.memacc_end(); I
>> != E; ++I)
>> + // Generate code for access I.
>
> Are you planning to use that code at some point? Otherwise, please
> remove the old leftover code.
Ok
>
>> +
>> + typedef BasicBlock::const_iterator it;
>> + for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
>> + const Instruction *Inst = II;
>> + if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
> I like the idea of walking the instructions in the original order.
>
> As described above, I wonder if we should avoid the actual tree-walk
> here and copy all instruction-tree instructions, in the order they
> appear in the basic block. We could reuse the old copyInst function, but
> would just need to check, if the instruction is part of the instruction
> tree.
Currently we are iterate on the BB, if we have the InstrTree class, we
can simply visit all instruction capture by the InstrTree and copy
them/vectorize them.
Ok
The cloned Instructions should alway have the identical type, but the
value in BBMap may be the induction variable, which may have a different
type.
Ok.
I think we should have some discussion on the tree capturing and the
corresponding class first, I feel that the tree capturing class may
simplify this patch a lot, or completely change this patch.

In my mind, the tree capturing class should be a container of the the
instructions in the instruction tree, and it also store the leaves of
the tree.
During code generation, we can simply copy the instructions in the tree,
and when we reach the leave of the tree, we
1) Reach a value outside the SCoP
2) Reach an instruction can be handled by SCEVRewriter, then we swith to
the SCEVRewriter to generate the rest of the code.

The instruction trees also allow use estimate the "cost" (e.g. register
pressure) of the tree to be vectorized.
>
> Cheers
> Tobi

best regards
ether

Tobias Grosser

unread,
Aug 28, 2012, 9:51:31 AM8/28/12
to ether, poll...@googlegroups.com
On 08/28/2012 03:27 PM, ether wrote:
> Hi tobi,
>
> 于 2012/8/28 19:42, Tobias Grosser 写道:
>> On 08/27/2012 02:26 PM, Hongbin Zheng wrote:
>>> From 5084ada05e74ddb40907d9ffcb682895f21a439d Mon Sep 17 00:00:00 2001
>>> From: ether<ethe...@gmail.com>
>>> Date: Mon, 27 Aug 2012 11:25:06 +0800
>>> Subject: [PATCH] Generate code for ScopStmt by copying the entire
>>> instruction
>>> tree of the instruction.
>>
>> Hi ether,
>>
>> thanks for the patch. I like the direction, but would also like to
>> understand your exact goal with this patch. Does this patch already
>> solve some existing problems, is preparation the ground for some upcoming
>> fixes or does it lower the requirements on our preprocessing? Maybe you
>> can describe in two sentences which problems this patch is solving?
> What about this:
>
> With this patch and subsequent patches, we want to remove the need of
> the IndependenceBlock pass (bug12398[1]) and support the fine-grain
> statement (bug12402[2]).
>
> [1]http://llvm.org/bugs/show_bug.cgi?id=12398
> [2]http://llvm.org/bugs/show_bug.cgi?id=12402

Sounds good.
The IRBuilder is probably not made for adding statements bottom-up. Even
if it could, the flow of adding stuff in the reverse order seems to be
more difficult to understand. (We could do it if needed, but I prefer if
we can avoid it)

>> 3) The new tree walk is a different algorithm, which makes reuse of
>> the existing code more complicated. Your changes are e.g. not
>> used by the vector code generation.
> In fact, i have a work in progress patch of the
> "TreeCopyStmtVectorizer", which copy the tree and vectorize it at the
> same time.

I was afraid of that. ;-) I have no doubts that you can do that (even
without introducing bugs), but it will require a lot of restructuring.
(You know how little I like larger patches. It just takes long until I
understand every little detail).

>> 3. Copying instructions outside of the basic block
>>
>> At the moment the independent blocks pass allows us to ignore
>> instructions outside of the same basic block. What are your goals
>> after removing the independent blocks pass? Are you planning to
>> also copy instructions that are outside of the statement basic block?
> I think the answer is "yes"

Why do you think this is needed? Would it hurt if we limit ourselves to
instruction trees within a basic block and see everything that goes
beyond that as a memory dependence?

>> I personally am a little afraid of copying instructions from outside
>> of the basic block (except for the SCEVBuilder). Copying such
>> instructions may introduce duplicated calculations which can cause
>> surprising performance changes.
>
> Yes, maybe we need perform CSE on-the-fly (in fact, the SCEVRewriter has
> its own CSE logic), but that may slow polly down.
> An other option is fix this by schedule some LLVM pass, e.g. GVN, right
> after codegen pass.

I am less worried about redundant instructions that can be removed by
GVN, but rather that each ScopStmt will reference more instructions and
that depending how the ScopStmts are scheduled, these instructions
cannot be eliminated any more.

>> Step 3)
>> We enhance the instruction tree capturing, such that it knowns that
>> under -polly-scev-codegen the tree ends at any value that can be
>> expressed by a SCEV.
> I think we may need to introduce a InstTree class to capturing the
> instructions trees, what do you think about this?
>> We store those SCEVs and use them during code

If we limit ourselves to InstTrees within a basic block, it seems
sufficient to put each instruction that is part of the InstTree into a
set<Instruction*>. The tree structure is already defined by the
instructions itself. We just need to know which instruction is actually
part of the tree.

We could do code generation as as follows:

for (Inst in BB)
if (Stmt.InstructionTreeElements.contains(Inst))
copyInst(Inst)

>>
>>> +
>>> + typedef BasicBlock::const_iterator it;
>>> + for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
>>> + const Instruction *Inst = II;
>>> + if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
>> I like the idea of walking the instructions in the original order.
>>
>> As described above, I wonder if we should avoid the actual tree-walk
>> here and copy all instruction-tree instructions, in the order they
>> appear in the basic block. We could reuse the old copyInst function, but
>> would just need to check, if the instruction is part of the instruction
>> tree.
> Currently we are iterate on the BB, if we have the InstrTree class, we
> can simply visit all instruction capture by the InstrTree and copy
> them/vectorize them.

What do you think about the suggestion above?


>>> + Instruction *OpInst = dyn_cast<Instruction>(Operand);
>>> + // Do not clone the constant or parameters not in the
>>> parameters map.
>>> + if (OpInst == 0) continue;
>>> +
>>> + // Now we need to clone Operand to CurBB.
>>> + // Check if we already clone it.
>>> + if (Value *Cloned = BBMap.lookup(OpInst)) {
>>> + DEBUG(dbgs() << "already cloned: " << *Cloned << ".\n");
>>> + It->set(castOperandType(Cloned, OpInst->getType(), NewI));
>>
>> Is this cast always valid? If it is, we should put a comment describing
>> why it is always valid, otherwise it looks like a hack.
> The cloned Instructions should alway have the identical type, but the
> value in BBMap may be the induction variable, which may have a different
> type.

So we either have an Sign/ZeroExtend or a Truncation here. I am not very
worried about type extensions, but we should understand ourselves (and
add a comment) if truncations can be introduced and if those are valid.
Oh yes. We may even commit see the tree capturing as a separate patch on
top of which we implement the other features.

> In my mind, the tree capturing class should be a container of the the
> instructions in the instruction tree, and it also store the leaves of
> the tree.

Yes, I have the same feeling. Would something like the following work?

class InstTree() {
public:
bool isInTree(Instruction *Inst);
bool isLeaf(Value *Inst);

// Return a SCEV if available, otherwise zero.
const SCEV *ge

private:
set<Instruction*> InTreeInstructions;
set<Value*> Leaves;
map<Value*, const *SCEV> ScevsOfTheLeaves;
}

As the tree structure itself is already defined by the instructions
itself, we don't need to capture it again. This could avoid a lot of
duplicated code.

> During code generation, we can simply copy the instructions in the tree,
> and when we reach the leave of the tree, we
> 1) Reach a value outside the SCoP
> 2) Reach an instruction can be handled by SCEVRewriter, then we swith to
> the SCEVRewriter to generate the rest of the code.

Yes.

> The instruction trees also allow use estimate the "cost" (e.g. register
> pressure) of the tree to be vectorized.

True!

Thanks
Tobi

ether

unread,
Aug 28, 2012, 10:46:56 AM8/28/12
to poll...@googlegroups.com, Tobias Grosser
于 2012/8/28 21:51, Tobias Grosser 写道:
>
> The IRBuilder is probably not made for adding statements bottom-up.
> Even if it could, the flow of adding stuff in the reverse order seems
> to be more difficult to understand. (We could do it if needed, but I
> prefer if we can avoid it)
This can be avoid with the InstTree based approach.
>
>>> 3) The new tree walk is a different algorithm, which makes reuse of
>>> the existing code more complicated. Your changes are e.g. not
>>> used by the vector code generation.
>> In fact, i have a work in progress patch of the
>> "TreeCopyStmtVectorizer", which copy the tree and vectorize it at the
>> same time.
>
> I was afraid of that. ;-) I have no doubts that you can do that (even
> without introducing bugs), but it will require a lot of restructuring.
> (You know how little I like larger patches. It just takes long until I
> understand every little detail).
>
>>> 3. Copying instructions outside of the basic block
>>>
>>> At the moment the independent blocks pass allows us to ignore
>>> instructions outside of the same basic block. What are your goals
>>> after removing the independent blocks pass? Are you planning to
>>> also copy instructions that are outside of the statement basic block?
>> I think the answer is "yes"
>
> Why do you think this is needed? Would it hurt if we limit ourselves
> to instruction trees within a basic block and see everything that goes
> beyond that as a memory dependence?
>
>
> I am less worried about redundant instructions that can be removed by
> GVN, but rather that each ScopStmt will reference more instructions
> and that depending how the ScopStmts are scheduled, these instructions
> cannot be eliminated any more.
We can aggressive hoist the newly copied instructions, so that they are
reachable to all its (potential) user, and we can avoid the unnecessary
duplicates.
>
>>> Step 3)
>>> We enhance the instruction tree capturing, such that it knowns that
>>> under -polly-scev-codegen the tree ends at any value that can be
>>> expressed by a SCEV.
>> I think we may need to introduce a InstTree class to capturing the
>> instructions trees, what do you think about this?
>>> We store those SCEVs and use them during code
>
> If we limit ourselves to InstTrees within a basic block, it seems
> sufficient to put each instruction that is part of the InstTree into a
> set<Instruction*>. The tree structure is already defined by the
> instructions itself. We just need to know which instruction is
> actually part of the tree.
Are you worrying about the insert position of the instructions in tree
when the limitation is removed?
I believe we can simply hoist them to the first position that dominated
by all its operands.
Nevertheless, we can start by limiting ourselves to InstTrees within a
basic block first.
>
> We could do code generation as as follows:
>
> for (Inst in BB)
> if (Stmt.InstructionTreeElements.contains(Inst))
> copyInst(Inst)
>
>>>
>>>> +
>>>> + typedef BasicBlock::const_iterator it;
>>>> + for (it II = BB->begin(), IE = BB->end(); II != IE; ++II) {
>>>> + const Instruction *Inst = II;
>>>> + if (MemoryAccess *MA = Stmt.lookupAccessFor(Inst))
>>> I like the idea of walking the instructions in the original order.
>>>
>>> As described above, I wonder if we should avoid the actual tree-walk
>>> here and copy all instruction-tree instructions, in the order they
>>> appear in the basic block. We could reuse the old copyInst function,
>>> but
>>> would just need to check, if the instruction is part of the instruction
>>> tree.
>> Currently we are iterate on the BB, if we have the InstrTree class, we
>> can simply visit all instruction capture by the InstrTree and copy
>> them/vectorize them.
>
> What do you think about the suggestion above?
I agree we copy instruction in this way :)
>
>
>>>> + Instruction *OpInst = dyn_cast<Instruction>(Operand);
>>>> + // Do not clone the constant or parameters not in the
>>>> parameters map.
>>>> + if (OpInst == 0) continue;
>>>> +
>>>> + // Now we need to clone Operand to CurBB.
>>>> + // Check if we already clone it.
>>>> + if (Value *Cloned = BBMap.lookup(OpInst)) {
>>>> + DEBUG(dbgs() << "already cloned: " << *Cloned << ".\n");
>>>> + It->set(castOperandType(Cloned, OpInst->getType(), NewI));
>>>
>>> Is this cast always valid? If it is, we should put a comment describing
>>> why it is always valid, otherwise it looks like a hack.
>> The cloned Instructions should alway have the identical type, but the
>> value in BBMap may be the induction variable, which may have a different
>> type.
>
> So we either have an Sign/ZeroExtend or a Truncation here. I am not
> very worried about type extensions, but we should understand ourselves
> (and add a comment) if truncations can be introduced and if those are
> valid.
Ok.
I think I can sumbit the patch for the InstTree first.
>
>> During code generation, we can simply copy the instructions in the tree,
>> and when we reach the leave of the tree, we
>> 1) Reach a value outside the SCoP
>> 2) Reach an instruction can be handled by SCEVRewriter, then we swith to
>> the SCEVRewriter to generate the rest of the code.
>
> Yes.
>
>> The instruction trees also allow use estimate the "cost" (e.g. register
>> pressure) of the tree to be vectorized.
>
> True!
>
> Thanks
> Tobi
best regards
ether

Tobias Grosser

unread,
Aug 28, 2012, 10:59:18 AM8/28/12
to ether, poll...@googlegroups.com
On 08/28/2012 04:46 PM, ether wrote:
> 于 2012/8/28 21:51, Tobias Grosser 写道:
>>
>> The IRBuilder is probably not made for adding statements bottom-up.
>> Even if it could, the flow of adding stuff in the reverse order seems
>> to be more difficult to understand. (We could do it if needed, but I
>> prefer if we can avoid it)
> This can be avoid with the InstTree based approach.

Great.

>>>> 3. Copying instructions outside of the basic block
>>>>
>>>> At the moment the independent blocks pass allows us to ignore
>>>> instructions outside of the same basic block. What are your goals
>>>> after removing the independent blocks pass? Are you planning to
>>>> also copy instructions that are outside of the statement basic block?
>>> I think the answer is "yes"
>>
>> Why do you think this is needed? Would it hurt if we limit ourselves
>> to instruction trees within a basic block and see everything that goes
>> beyond that as a memory dependence?
>>
>>
>> I am less worried about redundant instructions that can be removed by
>> GVN, but rather that each ScopStmt will reference more instructions
>> and that depending how the ScopStmts are scheduled, these instructions
>> cannot be eliminated any more.
> We can aggressive hoist the newly copied instructions, so that they are
> reachable to all its (potential) user, and we can avoid the unnecessary
> duplicates.

True, there are numerous options how to handle this. I am not saying we
cannot handle this. I just think we should first limit ourselves to the
simpler case, get all this restructuring committed and tested and than
evaluate if a more generic approach is needed.

>>>> Step 3)
>>>> We enhance the instruction tree capturing, such that it knowns that
>>>> under -polly-scev-codegen the tree ends at any value that can be
>>>> expressed by a SCEV.
>>> I think we may need to introduce a InstTree class to capturing the
>>> instructions trees, what do you think about this?
>>>> We store those SCEVs and use them during code
>>
>> If we limit ourselves to InstTrees within a basic block, it seems
>> sufficient to put each instruction that is part of the InstTree into a
>> set<Instruction*>. The tree structure is already defined by the
>> instructions itself. We just need to know which instruction is
>> actually part of the tree.
> Are you worrying about the insert position of the instructions in tree
> when the limitation is removed?
> I believe we can simply hoist them to the first position that dominated
> by all its operands.
> Nevertheless, we can start by limiting ourselves to InstTrees within a
> basic block first.

Good.
Great.

Thanks a lot
Tobi
Reply all
Reply to author
Forward
0 new messages