On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <llvm...@lists.llvm.org> wrote:
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
There is a sequence of instructions used to materialize the constant; the first one (the lis) is trivially rematerializable, and the others depend only on that one and have no side effects. If we otherwise needed to spill the constant, we might wish to move the entire set of instructions that compute the value into the loop body. (Many thanks to Hal Finkel for this example and head start.)
We are following a very old but effective paper, "Rematerialization" [1]: http://dl.acm.org/citation.cfm?id=143143
This extension will especially improve code quality for RISC backends such as PowerPC, MIPS, ARM, and AArch64.
Here is a tentative approach (after reading the above-mentioned paper and the current remat code) that I would like to follow.
Please share your views, because this may be a totally wrong direction. I will be happy if this gets into mainline LLVM, but if the community doesn't want to make remat heavier, then please guide me from the perspective of my class project.
1 ) LLVM MI is already in SSA form before register allocation, so I think LLVM does not need to build an SSA graph and convert it back after the optimization completes, as described in [1].
2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional Constant Propagation, based on Wegman and Zadeck's work, http://dl.acm.org/citation.cfm?id=103136), as described in [1]. This pass will be scheduled to run before register allocation.
3 ) The output of the pass added in step 2 will be a map from each def to instruction pointers (the instructions that can be used to remat the given live range). The map will cover live ranges defined by a single instruction as well as by multiple instructions.
4 ) The remat APIs defined in LiveRangeEdit.cpp will use the analysis from the map when a spill is required during RA.
5 ) The remat transformation APIs such as rematerializeAt() will be taught to remat live ranges with multiple instructions too.
6 ) A cost analysis will be required to decide between remat and spill. It should be based on at least two factors: register pressure and spill cost.
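The SCCP-like propagation in step 2 needs a lattice for the remat tags. One reading of the tagging scheme in [1] is a three-level lattice: Top (nothing known yet), a tag naming one rematerializable instruction, and Bottom (not rematerializable). The sketch below shows only the meet operation under that assumption; `Tag`, `meet`, and `meetAll` are hypothetical names and not LLVM types or APIs.

```cpp
#include <vector>

// Toy remat tag, assuming a three-level lattice: Top, Inst(id), Bottom.
// Illustrative only; nothing here exists in LLVM.
struct Tag {
  enum Kind { Top, Inst, Bottom } kind;
  int instId; // meaningful only when kind == Inst
};

inline Tag top() { return {Tag::Top, -1}; }
inline Tag inst(int id) { return {Tag::Inst, id}; }
inline Tag bottom() { return {Tag::Bottom, -1}; }

// Meet of two tags: Top is the identity, Bottom absorbs everything, and
// two Inst tags agree only when they name the same instruction.
inline Tag meet(Tag a, Tag b) {
  if (a.kind == Tag::Top)
    return b;
  if (b.kind == Tag::Top)
    return a;
  if (a.kind == Tag::Inst && b.kind == Tag::Inst && a.instId == b.instId)
    return a;
  return bottom();
}

// Tag of a value at a merge point (phi): the meet over all incoming tags.
inline Tag meetAll(const std::vector<Tag> &incoming) {
  Tag result = top();
  for (Tag t : incoming)
    result = meet(result, t);
  return result;
}
```

A value whose incoming definitions all carry the same instruction tag stays rematerializable; any disagreement drops it to Bottom.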
Few points:
--------------
* The analysis pass to be added as per (2) will use target-specific information from TargetInstrInfo.cpp, as the current remat infrastructure does.
* This approach will not be on demand as the current approach is (i.e., remat-specific code is executed only if there is a spill), so the pass in (2) can be an overhead; we may want to enable it only at higher optimization levels.
* Will it be possible to reuse the existing SCCP.cpp code, with a few modifications to the lattice and related mathematical operations, so that it can serve both purposes?
* No changes to the current register allocators or spill framework will be required, because the remat entry point will be LiveRangeEdit.
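The remat-versus-spill decision from step 6 could be sketched as below. This is only an illustration under made-up assumptions: the cost units, the field names, and the rule that a chain needing more temporaries than there are free registers is rejected are all hypothetical and do not reflect how LLVM's allocators weigh spills.

```cpp
// Hypothetical description of a multi-instruction remat candidate.
struct RematCandidate {
  int chainLatency; // summed cost of the instructions to re-execute
  int extraRegs;    // temporaries needed beyond the result register
};

// Hypothetical description of the spill alternative at the same point.
struct SpillContext {
  int storeCost;  // cost of the spill store at the definition
  int reloadCost; // cost of one reload
  int numUses;    // uses that each need either a reload or a remat
  int freeRegs;   // registers still free at those uses
};

// Prefer remat only if it fits the register pressure and is cheaper overall.
inline bool preferRemat(const RematCandidate &c, const SpillContext &s) {
  if (c.extraRegs > s.freeRegs)
    return false; // remat itself would force another spill
  int rematCost = c.chainLatency * s.numUses;
  int spillCost = s.storeCost + s.reloadCost * s.numUses;
  return rematCost < spillCost;
}
```

A real heuristic would presumably also weight each use by block frequency; that is left out here to keep the two factors named in step 6 visible.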
Any other way with less overhead is always welcome. Please help us develop a plan to implement this.
Hoping for comments!
Sincerely,
Vivek
This sounds overly complex. Can you implement this without needing the new side structure? Maintaining extra state and keeping it up to date is expensive. (From a maintenance and code complexity perspective.)
This would be unfortunate. Not fatal, just unfortunate.
On Sep 12, 2016, at 10:14 AM, Andrew Trick via llvm-dev <llvm...@lists.llvm.org> wrote:
On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <llvm...@lists.llvm.org> wrote:
1 ) LLVM MI is already in SSA form before register allocation, so I think LLVM does not need to build an SSA graph and convert it back after the optimization completes, as described in [1].
2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional Constant Propagation, based on Wegman and Zadeck's work, http://dl.acm.org/citation.cfm?id=103136), as described in [1]. This pass will be scheduled to run before register allocation.
3 ) The output of the pass added in step 2 will be a map from each def to instruction pointers (the instructions that can be used to remat the given live range). The map will cover live ranges defined by a single instruction as well as by multiple instructions.

LiveIntervals maintains a quasi-SSA form via VNInfo. It does not allow efficient def-use queries, but use-def is there, which is all that you should need.
It would be great to have better remat during regalloc, but please try to avoid building additional state that needs to be maintained.
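To illustrate the point about use-def being enough, here is a minimal sketch of an on-demand walk that collects the instructions needed to recompute a value, with nothing cached between queries. It is a toy model: `ToyInstr`, `DefMap`, `collectChain`, and `rematChain` are hypothetical names, and `DefMap` merely stands in for whatever use-def information already exists; none of this is the LiveIntervals or VNInfo API.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct ToyInstr {
  std::string opcode;
  int def;               // virtual register defined
  std::vector<int> uses; // virtual registers read
  bool sideEffectFree;   // true if re-executing it is safe
};

// Use-def information only: virtual register -> its defining instruction.
using DefMap = std::map<int, ToyInstr>;

// Depth-first walk over use-def edges. Appends the instructions needed to
// recompute `reg` to `chain`, operands before users. Returns false as soon
// as a needed value has no known def or its def is unsafe to re-execute.
bool collectChain(const DefMap &defs, int reg, std::set<int> &visited,
                  std::vector<ToyInstr> &chain) {
  if (visited.count(reg))
    return true; // already placed earlier in the chain
  auto it = defs.find(reg);
  if (it == defs.end() || !it->second.sideEffectFree)
    return false;
  visited.insert(reg);
  for (int operand : it->second.uses)
    if (!collectChain(defs, operand, visited, chain))
      return false;
  chain.push_back(it->second);
  return true;
}

// Query made only when a spill candidate is being considered.
// On failure `chain` is left empty.
bool rematChain(const DefMap &defs, int reg, std::vector<ToyInstr> &chain) {
  std::set<int> visited;
  chain.clear();
  if (collectChain(defs, reg, visited, chain))
    return true;
  chain.clear();
  return false;
}
```

Because the chain is rebuilt per query, there is no side table to keep up to date when instructions are moved or deleted; the trade-off is repeated walks for values that are queried more than once.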
Hi,
I've been looking at this myself for ARM, and came up with a much simpler solution: lower immediate materializations to a post-RA pseudo and expand the chain of materialization instructions after register allocation / remat. Remat only sees one instruction with no dependencies.
Did you look down this route and discount it?
On Sep 19, 2016, at 11:17 AM, vivek pandya via llvm-dev <llvm...@lists.llvm.org> wrote:
On Mon, Sep 19, 2016 at 6:21 PM, James Molloy <ja...@jamesmolloy.co.uk> wrote:
Hi,
I've been looking at this myself for ARM, and came up with a much simpler solution: lower immediate materializations to a post-RA pseudo and expand the chain of materialization instructions after register allocation / remat. Remat only sees one instruction with no dependencies.
Did you look down this route and discount it?

No, actually I am not very familiar with this topic, so I mostly rely on the available research papers. Your idea seems to be a simple and good solution, but I am not sure whether it can cover every possible case.
From: "Bruce Hoult" <br...@hoult.org>
To: "vivek pandya" <vivekv...@gmail.com>
Cc: "llvm-dev" <llvm...@lists.llvm.org>, "Hal Finkel" <hfi...@anl.gov>, "Matthias Braun" <ma...@braunis.de>
Sent: Monday, September 19, 2016 9:10:17 AM
Subject: Re: [llvm-dev] [RFC] Register Rematerialization (remat) Extension
The idea seems sound, but do you really have a CPU in which such a complex rematerialization is better than an L1 cache load from the stack frame?
    lis 3, 12414
    ori 3, 3, 27470
    sldi 3, 3, 32
    oris 3, 3, 35809
    ori 30, 3, 20615
I'm not familiar with modern PPC64 but seems like a lose on PPC G5 and from the docs I quickly found (2 cycle latency on dependent int ALU ops) Power8 too.
OK, maybe (if I didn't screw up the rldimi):
    lis 3, 12414
    lis 30, 35809
    ori 3, 3, 27470
    ori 30, 30, 20615
    rldimi 30, 3, 32, 0
Or is there something that optimizes such sequences building constants?
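The difference between the two sequences above can be put in numbers with a small critical-path calculation. This is a rough sketch that assumes the uniform 2-cycle latency per dependent ALU op quoted in the message, tracks dependencies only through register numbers, and ignores issue width and everything else about the pipeline; `Op` and `criticalPath` are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// One instruction reduced to the register it writes and the ones it reads.
struct Op {
  int dst;
  std::vector<int> srcs;
};

// Length in cycles of the longest dependency chain in a straight-line
// sequence, assuming every op has the same latency.
int criticalPath(const std::vector<Op> &ops, int latency) {
  std::map<int, int> ready; // register -> cycle its latest value is ready
  int longest = 0;
  for (const Op &op : ops) {
    int start = 0;
    for (int r : op.srcs) {
      auto it = ready.find(r);
      if (it != ready.end())
        start = std::max(start, it->second);
    }
    ready[op.dst] = start + latency;
    longest = std::max(longest, start + latency);
  }
  return longest;
}
```

Under these assumptions the first sequence is a single chain of five dependent ops (10 cycles), while the second builds the two halves independently and joins them with rldimi, which reads both r30 and r3 (6 cycles). How either compares with an L1 load depends on the target's load latency, which this sketch does not model.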
From: "Quentin Colombet via llvm-dev" <llvm...@lists.llvm.org>
To: "vivek pandya" <vivekv...@gmail.com>
Cc: "llvm-dev" <llvm...@lists.llvm.org>, "Nirav Rana" <h201...@pilani.bits-pilani.ac.in>, "Matthias Braun" <ma...@braunis.de>
Sent: Monday, September 19, 2016 1:27:10 PM
Subject: Re: [llvm-dev] [RFC] Register Rematerialization (remat) Extension
Hi Vivek,
On Sep 19, 2016, at 11:17 AM, vivek pandya via llvm-dev <llvm...@lists.llvm.org> wrote:
On Mon, Sep 19, 2016 at 6:21 PM, James Molloy <ja...@jamesmolloy.co.uk> wrote:
Hi,
I've been looking at this myself for ARM, and came up with a much simpler solution: lower immediate materializations to a post-RA pseudo and expand the chain of materialization instructions after register allocation / remat. Remat only sees one instruction with no dependencies.
Did you look down this route and discount it?
No, actually I am not very familiar with this topic, so I mostly rely on the available research papers. Your idea seems to be a simple and good solution, but I am not sure whether it can cover every possible case.

This is the way all targets deal with simple rematerialization involving several instructions in LLVM, AFAIK.
Basically, the target defines a pseudo instruction that encodes this sequence of instructions and expands it after register allocation. This is a case-by-case thing; there is no patch that can be generalized to other targets.
For instance, look at the expansion of AArch64::MOVi64imm.
The bottom line is, our rematerialization scheme is currently limited, but I am not sure your proposal gets us beyond what we already support.
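The pseudo-instruction approach can be illustrated with a toy expansion of a 64-bit immediate into 16-bit pieces. This is a simplified sketch, not the actual AArch64::MOVi64imm expansion, which can choose among more encodings; `ExpandedInstr`, `expandMovImm64`, and `evaluate` are hypothetical names. Before expansion the allocator would see a single instruction with no register operands, which is what makes it trivially rematerializable.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One instruction of the expanded sequence (toy model, not an LLVM type).
struct ExpandedInstr {
  std::string opcode; // "MOVZ" sets a chunk and zeroes the rest, "MOVK" keeps the rest
  uint16_t imm16;
  unsigned shift;
};

// Expand a "move 64-bit immediate" pseudo into one MOVZ followed by a MOVK
// for every further non-zero 16-bit chunk.
std::vector<ExpandedInstr> expandMovImm64(uint64_t value) {
  std::vector<ExpandedInstr> out;
  for (unsigned shift = 0; shift < 64; shift += 16) {
    uint16_t chunk = static_cast<uint16_t>(value >> shift);
    if (chunk == 0)
      continue;
    out.push_back({out.empty() ? "MOVZ" : "MOVK", chunk, shift});
  }
  if (out.empty())
    out.push_back({"MOVZ", 0, 0}); // the constant zero still needs one move
  return out;
}

// Interpret an expanded sequence, to check that it rebuilds the constant.
uint64_t evaluate(const std::vector<ExpandedInstr> &seq) {
  uint64_t v = 0;
  for (const ExpandedInstr &i : seq) {
    uint64_t piece = static_cast<uint64_t>(i.imm16) << i.shift;
    if (i.opcode == "MOVZ")
      v = piece;
    else
      v = (v & ~(static_cast<uint64_t>(0xFFFF) << i.shift)) | piece;
  }
  return v;
}
```

For a constant with four non-zero 16-bit chunks the sketch emits one MOVZ and three MOVK; since the expansion runs after register allocation, all of them write the same destination register and no intermediate live ranges are ever visible to remat.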