[LLVMdev] Scheduling with RAW hazards

Fraser Cormack

May 9, 2013, 7:02:25 AM

to llv...@cs.uiuc.edu

I have an instruction that takes no operands, and produces two results,
in two consecutive cycles.

I tried each of the following in my Schedule.td file:

InstrItinData<IIMyInstr, [InstrStage<2, [FuncU]>], [1, 2]>,
InstrItinData<IIMyInstr, [InstrStage<1, [FuncU]>, InstrStage<1, [FuncU]>], [1, 2]>,

From what I can see in examples, these say that the first operand is
ready the cycle after issue, and the second is ready 2 cycles after issue.

But when I issue an instruction that uses both results, it does not obey
this hazard, and is issued the cycle immediately after. Are there any
target hooks I need to implement to get this scheduled correctly?

I noticed that my target was using the default HazardRecognizer, which
is effectively disabled, so I changed it to use the
ScoreboardHazardRecognizer instead. I'm also still using the
SelectionDAG scheduler, but will need to change to the MI scheduler at
some point, to keep up with trunk. Should either of these help?

Thanks,
Fraser

--
Fraser Cormack
Compiler Developer
Codeplay Software Ltd
45 York Place, Edinburgh, EH1 3HP
Tel: 0131 466 0503
Fax: 0131 557 6600
Website: http://www.codeplay.com
Twitter: https://twitter.com/codeplaysoft

Andrew Trick

May 9, 2013, 1:25:25 PM

to Fraser Cormack, Dev

On May 9, 2013, at 4:02 AM, Fraser Cormack <fra...@codeplay.com> wrote:

> I have an instruction that takes no operands, and produces two results, in two consecutive cycles.
>
> I tried each of the following in my Schedule.td file:
>
> InstrItinData<IIMyInstr, [InstrStage<2, [FuncU]>], [1, 2]>,
> InstrItinData<IIMyInstr, [InstrStage<1, [FuncU]>, InstrStage<1, [FuncU]>], [1, 2]>,
>
> From what I can see in examples, these say that the first operand is ready the cycle after issue, and the second is ready 2 cycles after issue.

Yes, they look equivalent.
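
To make the two itinerary entries concrete, here is a minimal sketch in plain C++ rather than the LLVM classes. `ToyItinerary`, `singleStage`, `splitStages`, `unitOccupancy` and `operandLatency` are hypothetical names. The latency rule (def cycle minus use cycle plus one) and the default of the consumer reading its operands in cycle 1 are assumptions made for illustration, not something stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy stand-in for one InstrItinData entry (hypothetical, not an LLVM type).
struct ToyItinerary {
  std::vector<unsigned> StageCycles;   // cycles each stage occupies FuncU
  std::vector<unsigned> OperandCycles; // cycle in which each result is produced
};

// Models: InstrItinData<IIMyInstr, [InstrStage<2, [FuncU]>], [1, 2]>
ToyItinerary singleStage() { return {{2}, {1, 2}}; }

// Models: InstrItinData<IIMyInstr,
//           [InstrStage<1, [FuncU]>, InstrStage<1, [FuncU]>], [1, 2]>
ToyItinerary splitStages() { return {{1, 1}, {1, 2}}; }

// Total cycles FuncU stays reserved, assuming the stages run back to back.
unsigned unitOccupancy(const ToyItinerary &I) {
  return std::accumulate(I.StageCycles.begin(), I.StageCycles.end(), 0u);
}

// Assumed rule: latency = def cycle - use cycle + 1.
int operandLatency(const ToyItinerary &Def, std::size_t DefIdx,
                   int UseCycle = 1) {
  assert(DefIdx < Def.OperandCycles.size() && "operand index out of range");
  return static_cast<int>(Def.OperandCycles[DefIdx]) - UseCycle + 1;
}
```

Under these assumptions both entries reserve FuncU for two cycles and give latencies of 1 and 2 for the first and second result, which is the sense in which they look equivalent.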

> But when I issue an instruction that uses both results, it does not obey this hazard, and is issued the cycle immediately after. Are there any target hooks I need to implement to get this scheduled correctly?

Look at -debug-only=pre-RA-sched and confirm that the DAG's edges have the correct latency.

It also prints the current cycle count each time it schedules an instruction.

DEBUG(dbgs() << "\n*** Scheduling [" << CurCycle << "]: ");

You should see a two cycle difference between MyInstr and its second dependent. The scheduler won't insert nops for you. You'd need to do that in a target-specific way.
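
Since nop insertion is left to the target, the idea can be sketched outside LLVM as a walk over the final instruction order. This is a toy model that assumes an in-order, single-issue machine. `ToyInstr`, `nopsNeeded` and `exampleSequence` are hypothetical names, and a real target would implement this in its own late pass against the actual machine instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy instruction (hypothetical): named, with defs and uses of register ids.
struct ToyInstr {
  std::string Name;
  std::vector<std::pair<int, unsigned>> Defs; // (register, latency after issue)
  std::vector<int> Uses;                      // registers read at issue
};

// For each instruction in program order, return how many nops must precede it
// so that every register it reads is ready when it issues.
std::vector<unsigned> nopsNeeded(const std::vector<ToyInstr> &Seq) {
  std::map<int, unsigned> ReadyAt; // register -> first cycle it may be read
  std::vector<unsigned> Nops;
  unsigned Cycle = 0; // cycle in which the next instruction would issue
  for (const ToyInstr &I : Seq) {
    unsigned Issue = Cycle;
    for (int R : I.Uses) {
      auto It = ReadyAt.find(R);
      if (It != ReadyAt.end())
        Issue = std::max(Issue, It->second);
    }
    Nops.push_back(Issue - Cycle);
    for (const auto &D : I.Defs)
      ReadyAt[D.first] = Issue + D.second;
    Cycle = Issue + 1;
  }
  return Nops;
}

// MyInstr produces r0 after 1 cycle and r1 after 2; User reads both.
std::vector<ToyInstr> exampleSequence() {
  ToyInstr MyInstr{"MyInstr", {{0, 1}, {1, 2}}, {}};
  ToyInstr User{"User", {}, {0, 1}};
  return {MyInstr, User};
}
```

For the example sequence, the user of both results needs one nop in front of it when nothing else can be scheduled into the gap.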

> I noticed that my target was using the default HazardRecognizer, which is effectively disabled, so I changed it to use the ScoreboardHazardRecognizer instead. I'm also still using the SelectionDAG scheduler, but will need to change to the MI scheduler at some point, to keep up with trunk. Should either of these help?

The hazard recognizer won't help you. It only enforces pipeline hazards (other instructions that need FuncU). It's the list scheduler itself that "enforces" operand latency.
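
The distinction between a pipeline hazard and operand latency can be illustrated with a toy scoreboard for a single functional unit. This is only a sketch: `ToyScoreboard`, `reserve`, `hasHazard` and `afterMyInstr` are hypothetical names, not the ScoreboardHazardRecognizer interface.

```cpp
#include <cassert>

// Toy scoreboard tracking one functional unit (hypothetical, not the LLVM class).
struct ToyScoreboard {
  unsigned FreeAt = 0; // first cycle in which FuncU is free again

  // Reserve FuncU for Cycles cycles, starting in cycle Cycle.
  void reserve(unsigned Cycle, unsigned Cycles) {
    assert(Cycle >= FreeAt && "FuncU is still busy");
    FreeAt = Cycle + Cycles;
  }

  // A structural hazard exists only if the candidate itself needs FuncU.
  bool hasHazard(unsigned Cycle, bool NeedsFuncU) const {
    return NeedsFuncU && Cycle < FreeAt;
  }
};

// Scoreboard state after MyInstr (two cycles on FuncU) issues in cycle 0.
ToyScoreboard afterMyInstr() {
  ToyScoreboard SB;
  SB.reserve(0, 2);
  return SB;
}
```

In this model a consumer that does not need FuncU sees no hazard in cycle 1, even though the second result is not ready until cycle 2, so the scoreboard alone cannot delay it.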

MI scheduler allows you to use a new machine model that's simpler for most people who don't need the precision of Itineraries. Maybe not important in your case.

More importantly, SDScheduler is take-it-as-is and will go away entirely after 3.3, whereas the MI scheduler can be fixed and improved. Now would be a good time to try switching over and start filing bugs. PPC is an example of using the MI scheduler out-of-box. Hexagon is an example of customizing it at a high level. You could start off like PPC with minimal customization, but eventually you may want something in between--provide a custom MachineSchedStrategy:

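#include "llvm/CodeGen/MachineScheduler.h"

// Custom strategy; the body is elided here, as in the original sketch.
class MyScheduler : public MachineSchedStrategy {...};

namespace llvm {
// Factory that builds a ScheduleDAGMI driven by the custom strategy.
ScheduleDAGInstrs *createMySched(MachineSchedContext *C) {
  ScheduleDAGMI *DAG = new ScheduleDAGMI(C, new MyScheduler());
  DAG->addMutation(new MyDAGMutation());
  return DAG;
}
} // namespace llvm

// Register the factory under the name "mysched".
static MachineSchedRegistry
MySchedRegistry("mysched", "Custom My scheduler.", createMySched);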

-Andy

Fraser Cormack

May 13, 2013, 9:51:00 AM

to Andrew Trick, llv...@cs.uiuc.edu

On 09/05/2013 18:25, Andrew Trick wrote:

> You should see a two cycle difference between MyInstr and its second dependent. The scheduler won't insert nops for you. You'd need to do that in a target-specific way.

Yes, I see the two-cycle difference between the two instructions. I enabled the post-RA scheduler, and noticed that it cared about the latencies, and started to rearrange the instructions accordingly. Is it necessary to use the post-RA scheduler to enforce such latencies?

> The hazard recognizer won't help you. It only enforces pipeline hazards (other instructions that need FuncU). It's the list scheduler itself that "enforces" operand latency.

Ah okay, thank you.

I've had a quick experiment with the MI Scheduler, and have a few further questions. From what I can see, if I pass -enable-misched to the compiler, it only runs at O1 and above, through addOptimizedRegAlloc(). Is O0 not supported without adding the pass myself in my PassConfig?

How does (or will) the MI Scheduler interact with the existing SD Scheduler? It seems as though they both run together at the moment.

Thanks,
Fraser

Andrew Trick

May 13, 2013, 1:21:13 PM

to Fraser Cormack, Dev

On May 13, 2013, at 6:51 AM, Fraser Cormack <fra...@codeplay.com> wrote:

> Yes, I see the two-cycle difference between the two instructions. I enabled the post-RA scheduler, and noticed that it cared about the latencies, and started to rearrange the instructions accordingly. Is it necessary to use the post-RA scheduler to enforce such latencies?

SD scheduler has several heuristics that can defeat each other. Without debugging, I can't say what the problem is. PostRA scheduler was originally meant for targets where precise latency matters, so that's probably a better fit. Hopefully we can make MachineScheduler work for you in the long run.

> I've had a quick experiment with the MI Scheduler, and have a few further questions. From what I can see, if I pass -enable-misched to the compiler, it only runs at O1 and above, through addOptimizedRegAlloc(). Is O0 not supported without adding the pass myself in my PassConfig?

MachineScheduler is integrated with the regalloc pipeline because it uses and updates LiveIntervals. -O0 does not compute LiveIntervals.

You could enable MachineScheduler using PassConfig. It should just compute LIS on demand in that case.

> How does (or will) the MI Scheduler interact with the existing SD Scheduler? It seems as though they both run together at the moment.
>
> Thanks,
> Fraser

Good question. There's no point in running SD Scheduler when MachineScheduler is enabled. But there's no way to disable it, other than -pre-RA-sched=source. You can automatically get all the options you need using this hook: