More exactly, I have the following constraints on my (SIMD) processor:
- certain stores or loads must be executed at least 1 cycle after the instruction
generating their input operands ends. For example, if I have:
R1 = R2 + R3
LS[R10] = R1 // this will not produce the correct result because it does not see
             // the updated value of R1 from the previous instruction
To make this code execute correctly we need to insert a NOP:
R1 = R2 + R3
NOP // or other instruction to fill the delay slot
LS[R10] = R1
- a compare instruction requires a NOP after it, before the predicated block
(something like a conditional JMP instruction) starts.
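The first constraint can be illustrated with a small stand-alone model. This is only a sketch: the Inst record, its delayedAccess flag and the insertNops() function are hypothetical names invented for the illustration, not LLVM types, and the model only looks at the instruction immediately before each affected load/store.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record (not an LLVM type).
struct Inst {
  std::string text;       // textual form, e.g. "R1 = R2 + R3"
  int def;                // register written, or -1 if none
  std::vector<int> uses;  // registers read
  bool delayedAccess;     // true for the loads/stores subject to the 1-cycle rule
};

// Returns the program text with a NOP inserted before every delayed access
// that reads the register written by the instruction immediately before it.
std::vector<std::string> insertNops(const std::vector<Inst> &prog) {
  std::vector<std::string> out;
  int prevDef = -1;  // register written in the previous issue slot
  for (const Inst &inst : prog) {
    bool readsPrevDef = false;
    for (int reg : inst.uses)
      if (prevDef >= 0 && reg == prevDef)
        readsPrevDef = true;
    if (inst.delayedAccess && readsPrevDef)
      out.push_back("NOP");  // the NOP separates producer and consumer
    out.push_back(inst.text);
    prevDef = inst.def;
  }
  return out;
}
```

Applied to the two-instruction example above (the add writing R1, followed by the store reading R1 and R10), the model yields the three-line sequence with the NOP in the middle. It never reorders anything, so it corresponds to the "just insert NOPs" baseline.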
Thank you,
Alex
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
You can program a post-RA scheduler which will return NoopHazard in the
appropriate circumstances. You can look at the PowerPC target (e.g.
lib/Target/PowerPC/PPCHazardRecognizers.cpp) as an example.
-Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
However, to my surprise, my very simple post-RA scheduler (using my class derived
from ScoreboardHazardRecognizer) loops FOREVER after I return NoopHazard: it calls
getHazardType() again and again for the SAME store instruction that I identified in the
first place as having the data hazard. As a result llc never finishes, and I have to kill
the process.
I was expecting that, after the first call to getHazardType() for that store
instruction (returning NoopHazard), the scheduler would move on to the other
instructions in the DAG/basic block.
Do you have any idea what I can do to fix this problem?
Thank you very much,
Alex
It should emit a nop if all available instructions return NoopHazard.
>
> Do you have an idea what can I do to fix this problem?
I'm not sure. I recall running into a situation like this years ago, but
I don't recall now how I resolved it. Are you correctly handling the
Stalls argument to getHazardType?
-Hal
Just to make sure: I am trying to use the post-RA (Register Allocation) scheduler to
avoid data hazards by inserting, where possible, other USEFUL instructions from the program
instead of (just) NOPs. Does this out-of-order scheduling (e.g., using the
ScoreboardHazardRecognizer), which employs useful program instructions instead of NOPs,
work well with the post-RA scheduler?
Otherwise, if the post-RA scheduler only inserts NOPs, then since I have issues using it,
I could just as well insert the NOPs in the [Target]AsmPrinter.cpp module.
Thank you,
Alex
All of this makes sense, but are you correctly handling the Stalls
argument to getHazardType? What are you doing with it?
-Hal
Let me state what I have added to my back end to enable scheduling with hazards:
- taking inspiration from lib/Target/PowerPC/PPCHazardRecognizers.h, I have created a class
[Target]DispatchGroupSBHazardRecognizer : public ScoreboardHazardRecognizer (I use
ScoreboardHazardRecognizer because I hope in the near future to make my class use
USEFUL program instructions, scheduled "out-of-order", instead of NOPs to handle my data
hazards), implementing for it only one method:
HazardType getHazardType(SUnit *SU, int Stalls);
In this method I check if the current SU is a vector store and the previous
instruction updates the register used by the store, which in my processor is a data
hazard, in which case I give:
return NoopHazard;
and otherwise, I give:
return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
- I implemented in [Target]InstrInfo.cpp 2 more methods:
- CreateTargetPostRAHazardRecognizer() to register the
[Target]DispatchGroupSBHazardRecognizer()
- insertNoop() which returns the target's NOP
- note that my vector (and scalar) instructions are modeled on the Mips back
end, which has MSAInst (and MipsInst) with the NoItinerary InstrItinClass. Currently I am
not using a [Target]Schedule.td specifying functional units, processor and instruction
itineraries. This might be a problem - I guess ScoreboardHazardRecognizer relies on this
information.
In principle, should I maybe use the post-RA MI-scheduler instead of the standard
post-RA scheduler (maybe also
http://llvm.org/docs/doxygen/html/classllvm_1_1MachineSchedStrategy.html ) to deal with my
hazards?
Following http://llvm.org/devmtg/2014-10/Slides/Estes-MISchedulerTutorial.pdf, the
MI-scheduler also handles hazards, but I guess it's less documented, although the AArch64
is using it.
Thank you,
Alex
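HazardType getHazardType(SUnit *SU, int Stalls) {
  // Report the data hazard only once: after a NOP has been requested for this
  // instruction, the next query falls through to the base class.
  if (Stalls == 0 &&  // no (pipeline) stalls
      !emittedNoop &&
      isDataHazard(SU)) {
    emittedNoop = true;
    return NoopHazard;
  }
  emittedNoop = false;
  return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
}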
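One way to see why the flag above stops the endless re-querying is a toy model of the interaction between scheduler and recognizer. This is a sketch under assumptions, not the code from PostRASchedulerList.cpp: ToyRecognizer, scheduleInOrder() and the clearOnNoop switch are hypothetical names, and the scheduler is reduced to in-order issue with one candidate at a time. The point it illustrates is that if the recognizer gives the same answer after a NOP has been emitted, the scheduler keeps asking; the real interface offers the EmitInstruction(), EmitNoop() and AdvanceCycle() hooks for updating such state.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum HazardType { NoHazard, Hazard, NoopHazard };

// Hypothetical, simplified instruction record (not an LLVM type).
struct Inst {
  std::string text;
  int def;                // register written, or -1 if none
  std::vector<int> uses;  // registers read
  bool isStore;           // store subject to the 1-cycle rule
};

// Toy recognizer: remembers the register written in the previous issue slot.
class ToyRecognizer {
  int lastDef = -1;
  bool clearOnNoop;  // false models a recognizer that ignores emitted NOPs
public:
  explicit ToyRecognizer(bool clearOnNoop) : clearOnNoop(clearOnNoop) {}

  HazardType getHazardType(const Inst &inst) const {
    if (!inst.isStore || lastDef < 0)
      return NoHazard;
    for (int reg : inst.uses)
      if (reg == lastDef)
        return NoopHazard;
    return NoHazard;
  }
  void emitInstruction(const Inst &inst) { lastDef = inst.def; }
  void emitNoop() {
    if (clearOnNoop)
      lastDef = -1;  // the NOP now occupies the previous slot
  }
};

// In-order toy scheduler. Returns false if it is still asking about the same
// instruction after maxSteps steps (the "cycling forever" case).
bool scheduleInOrder(const std::vector<Inst> &prog, bool clearOnNoop,
                     std::vector<std::string> &out, int maxSteps = 100) {
  ToyRecognizer rec(clearOnNoop);
  std::size_t next = 0;
  for (int step = 0; step < maxSteps; ++step) {
    if (next == prog.size())
      return true;
    if (rec.getHazardType(prog[next]) == NoopHazard) {
      out.push_back("NOP");
      rec.emitNoop();
      continue;  // ask again about the same instruction
    }
    out.push_back(prog[next].text);
    rec.emitInstruction(prog[next]);
    ++next;
  }
  return next == prog.size();
}
```

With clearOnNoop set, the add/store pair comes out as add, NOP, store. Without it, the model emits NOPs until the step limit is reached, which resembles the behavior described earlier in the thread. The emittedNoop flag plays the same role as the state reset here, although a flag that is not tied to a particular SU may be fragile once several candidates are queried in the same cycle.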
However, I would like to return Hazard instead of NoopHazard, so that useful
instructions are placed in the slot instead of NOPs. But when I return Hazard nothing
happens, and the data hazard is not actually removed.
Also, I see that when using the ScoreboardHazardRecognizer with the default
(non-specialized) getHazardType() method, the post-RA scheduler safely changes the order
of a few instructions in the program (but it does not fix my data hazards, because it
cannot recognize them by itself).
As already said, my problem is that returning Hazard from getHazardType() when we
find a data hazard does NOT change the code at all, which means that the data hazard is
not resolved. (Fortunately, when we return NoopHazard instead of Hazard, a NOP is
inserted at the right place, which fixes the data hazard.)
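The difference between the two return values can be sketched with a toy list scheduler. This is a model under assumptions, not LLVM code: Inst, isReady() and schedule() are hypothetical names, and the dependence check only covers registers. The assumption built into it, which matches my reading of the top-down loop in PostRASchedulerList.cpp but should be checked against the sources, is that a candidate with any hazard is skipped in favor of another ready instruction, and that when nothing else is ready a Hazard only advances the cycle (a stall that interlocked hardware would handle), while a NoopHazard makes the scheduler emit a NOP.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum HazardType { NoHazard, Hazard, NoopHazard };

// Hypothetical, simplified instruction record (not an LLVM type).
struct Inst {
  std::string text;
  int def;                // register written, or -1 if none
  std::vector<int> uses;  // registers read
  bool isStore;           // store subject to the 1-cycle rule
};

static bool reads(const Inst &inst, int reg) {
  return reg >= 0 &&
         std::find(inst.uses.begin(), inst.uses.end(), reg) != inst.uses.end();
}

// An instruction is ready when no earlier, still unscheduled instruction has a
// register dependence (RAW, WAR or WAW) with it.
static bool isReady(const std::vector<Inst> &p, const std::vector<bool> &done,
                    std::size_t i) {
  for (std::size_t j = 0; j < i; ++j) {
    if (done[j])
      continue;
    if (reads(p[i], p[j].def) || reads(p[j], p[i].def) ||
        (p[i].def >= 0 && p[i].def == p[j].def))
      return false;
  }
  return true;
}

// onHazard is what the toy recognizer reports for a store that reads the
// register written in the previous slot: Hazard or NoopHazard.
std::vector<std::string> schedule(const std::vector<Inst> &p,
                                  HazardType onHazard) {
  std::vector<std::string> out;
  std::vector<bool> done(p.size(), false);
  int lastDef = -1;
  std::size_t remaining = p.size();
  while (remaining > 0) {
    bool issued = false;
    for (std::size_t i = 0; i < p.size() && !issued; ++i) {
      if (done[i] || !isReady(p, done, i))
        continue;
      if (p[i].isStore && reads(p[i], lastDef))
        continue;  // hazard: try another ready instruction first
      out.push_back(p[i].text);
      done[i] = true;
      lastDef = p[i].def;
      --remaining;
      issued = true;
    }
    if (!issued) {
      if (onHazard == NoopHazard)
        out.push_back("NOP");  // only NoopHazard materializes a NOP
      lastDef = -1;            // in both cases one cycle passes
    }
  }
  return out;
}
```

In this model, the add/store pair scheduled with Hazard comes out unchanged (two instructions, no NOP), which is consistent with the observation that "nothing happens". With NoopHazard a NOP appears between them. If an independent instruction such as R4 = R5 + R6 follows the store, it is moved into the slot and no NOP is needed with either return value.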
Please let me know if you can help me solve my issues with the data hazards by
employing useful instructions in the delay slots instead of NOPs.
Best regards,
Alex
PS: I see there are a few back ends successfully using the HazardRecognizer: PPC, ARM,
AMDGPU, Hexagon, SystemZ (it seems SystemZ was added after Jul 2016), so I believe it is
a good idea for me to use it too. (On the other hand, at least the Mips back end has its
own MipsDelaySlotFiller.cpp, which handles in its own way exactly the problem I want to
solve, namely "// Simple pass to fill delay slots with useful instructions.")
On 2/11/2017 2:39 PM, Alex Susu wrote:
> Hello.
> Hal, the problem I have is that it doesn't advance at the next available instruction -
> it always gets the same store. This might be because I did not specify in a file like
> [Target]Schedule.td the functional units, processor and instruction itineraries.
> Regarding the Stalls argument to my method
> [Target]DispatchGroupSBHazardRecognizer::getHazardType() I always get the argument Stalls
> = 0. This is no surprise since in PostRASchedulerList.cpp we have only one call to it, in
> method SchedulePostRATDList::ListScheduleTopDown():
> ScheduleHazardRecognizer::HazardType HT =
> HazardRec->getHazardType(CurSUnit, 0/*no stalls*/);
Correction: I was actually wrong about this, but getHazardType() is called only once in
PostRASchedulerList.cpp:
ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::getHazardType(SU = SU(5): t57:
v128i1,ch = ST_INDIRECT_D<Mem:ST256[inttoptr (i16 52 to
i16*)](tbaa=<0x23143b8>)(alias.scope=<0x230a920>)(noalias=<0x2307b30>,<0x2307450>)> t47,
t104, t49, t56
, Stalls = 0)
isReadAfterWrite(SU = SU(5): t57: v128i1,ch = ST_INDIRECT_D<Mem:ST256[inttoptr (i16 52 to
i16*)](tbaa=<0x23143b8>)(alias.scope=<0x230a920>)(noalias=<0x2307b30>,<0x2307450>)> t47,
t104, t49, t56
)
isReadAfterWrite(): SU->Succs.size() = 1
isReadAfterWrite(): (SU->getNode())->isMachineOpcode() = 1
isReadAfterWrite(): (SU->getNode())->getOpcode() = 65430
isReadAfterWrite(): (SU->getNode())->getMachineOpcode() = 105
isReadAfterWrite(): SU->Succs[0] = SU(4): t73: ch = END_REPEAT_D t57:1
)
isReadAfterWrite(): (SUsucc->getNode())->getMachineOpcode() = 41
isReadAfterWrite(): numUses = 3
isReadAfterWrite(): MCID->getNumOperands() = 4
isReadAfterWrite(): MCID->getNumDefs() = 1
isReadAfterWrite(): SU->Preds.size() = 4
isReadAfterWrite(): SU->Succs.size() = 1
isReadAfterWrite(): SU can store
isReadAfterWrite(): SDN->getNumOperands() = 4
isReadAfterWrite(SU->Preds[0] = t47: v128i16 = <<Unknown Machine Node #65508>> t43, t32
)
isReadAfterWrite(): numDefs = 1
isReadAfterWrite(): PredSDN->getNumOperands() = 2
isReadAfterWrite(): SDN->getOperand(0) = t47: v128i16 = <<Unknown Machine Node #65508>>
t43, t32
isReadAfterWrite(): PredSDN->getOperand(0) = t43: v128i16,v128i1,ch = <<Unknown Machine
Node #65465>><Mem:LD256[inttoptr (i16 51 to
i16*)](tbaa=<0x23143b8>)(alias.scope=<0x2307450>)> t102, t104, t39, t97
isReadAfterWrite(): Found PredSDN == SDN->getOperand(idUse)
Pre-RA: getHazardType(): return NoopHazard
ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::getHazardType(SU = SU(5): t57:
v128i1,ch = ST_INDIRECT_D<Mem:ST256[inttoptr (i16 52 to
i16*)](tbaa=<0x23143b8>)(alias.scope=<0x230a920>)(noalias=<0x2307b30>,<0x2307450>)> t47,
t104, t49, t56
, Stalls = -1)
I actually found a (temporary) solution to my problem: I use the PreEmitNoops()
method instead of getHazardType(). I'm implementing the following simple behavior in
PreEmitNoops():
  {
    if (isDataHazard(SU))
      return 1;
    return ScoreboardHazardRecognizer::PreEmitNoops(SU);
  }
I guess this solution prevents me from changing the order of instructions in order
to avoid generating NOPs to fill the delay slots.
Please let me know your opinion.
Thank you,
Alex