[llvm-dev] Pseudo-instruction that overwrites its input register


Dr. ERDI Gergo via llvm-dev

May 28, 2017, 2:27:35 AM
to llvm...@lists.llvm.org
Hi,

I'd like to define a pseudo-instruction whose expansion will, as a
side-effect, overwrite an input register's value: the pseudo-instruction

ldw r1:r2, P

to load 2 bytes from memory address P is to be expanded to

ld r1, P+
ld r2, P

where "ld _, P+" is an instruction that loads a single byte from P, and
post-increments P by one.
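
To make the side effect concrete, here is a minimal sketch of that expansion as a toy C++ model. The Machine struct, its 16-byte memory, and the function names ld_postinc, ld and ldw are all hypothetical and exist only for illustration; none of this is LLVM or AVR backend code. It only shows that P holds a different value after the expanded ldw than before it, which is why saving P before the instruction is not interchangeable with saving it afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy machine: a small byte-addressed memory and a 16-bit
// pointer register P. This is an illustration, not an AVR model.
struct Machine {
    std::array<std::uint8_t, 16> mem{};
    std::uint16_t P = 0;
};

// Models "ld r, P+": load one byte from [P], then post-increment P.
std::uint8_t ld_postinc(Machine &m) {
    assert(m.P < m.mem.size() && "P out of range for the toy memory");
    return m.mem[m.P++];
}

// Models "ld r, P": load one byte from [P]; P is left unchanged.
std::uint8_t ld(const Machine &m) {
    assert(m.P < m.mem.size() && "P out of range for the toy memory");
    return m.mem[m.P];
}

// The two destination bytes of "ldw r1:r2, P".
struct LdwResult {
    std::uint8_t r1;
    std::uint8_t r2;
};

// Models the expansion "ld r1, P+ ; ld r2, P".
// Side effect: on return, m.P is one greater than on entry.
LdwResult ldw(Machine &m) {
    LdwResult out{};
    out.r1 = ld_postinc(m);
    out.r2 = ld(m);
    return out;
}
```

Starting with P = 4, a call to ldw reads the bytes at addresses 4 and 5 and leaves P = 5, so any later use of P sees the incremented value unless the register is described as being overwritten.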

How can I represent this behaviour in LLVM? Currently, I have

let Constraints = "@earlyclobber $reg" in
def LDWRdPtr : Pseudo<(outs DREGS:$reg),
                      (ins PTRREGS:$ptrreg),
                      "ldw\t$reg, $ptrreg",
                      [(set i16:$reg, (load i16:$ptrreg))]>,
               Requires<[HasSRAM]>;

The problem, of course, is that with this definition I end up with code
which assumes that saving P before 'ldw r1:r2, P' is equivalent to saving
it after. I tried adding "@earlyclobber $ptrreg" as a Constraint, but that
just leads to an assertion failure during codegen (I assume because
@earlyclobber is only valid for output operands):

void llvm::MachineOperand::setIsEarlyClobber(bool): Assertion `isReg() && IsDef && "Wrong MachineOperand accessor"' failed.


Thanks,
Gergo
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Chisnall via llvm-dev

May 28, 2017, 3:07:50 AM
to Dr. ERDI Gergo, llvm...@lists.llvm.org
On 28 May 2017, at 07:27, Dr. ERDI Gergo via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi,
>
> I'd like to define a pseudo-instruction whose expansion will, as a side-effect, overwrite an input register's value: the pseudo-instruction
>
> ldw r1:r2, P
>
> to load 2 bytes from memory address P is to be expanded to
>
> ld r1, P+
> ld r2, P
>
> where "ld _, P+" is an instruction that loads a single byte from P, and post-increments P by one.
>
> How can I represent this behaviour in LLVM? Currently, I have
>
> let Constraints = "@earlyclobber $reg" in
> def LDWRdPtr : Pseudo<(outs DREGS:$reg),
> (ins PTRREGS:$ptrreg),
> "ldw\t$reg, $ptrreg",
> [(set i16:$reg, (load i16:$ptrreg))]>,
> Requires<[HasSRAM]>;
>
> The problem, of course, is that with this definition I end up with code which assumes it is equivalent to save P before 'ldw r1:r2,P' or after. I tried adding "@earlyclobber $ptrreg" as a Constraint, but that just leads to an assertion failure during codegen (I assume because @earlyclobber is for output ports)

You need to express the P as both an input and output operand and add a constraint that both must be the same register.

David

Dr. ERDI Gergo via llvm-dev

May 28, 2017, 6:04:37 AM
to David Chisnall, llvm...@lists.llvm.org
On Sun, 28 May 2017, David Chisnall wrote:

>> let Constraints = "@earlyclobber $reg" in
>> def LDWRdPtr : Pseudo<(outs DREGS:$reg),
>> (ins PTRREGS:$ptrreg),
>> "ldw\t$reg, $ptrreg",
>> [(set i16:$reg, (load i16:$ptrreg))]>,
>> Requires<[HasSRAM]>;
>>
>> The problem, of course, is that with this definition I end up with code which assumes it is equivalent to save P before 'ldw r1:r2,P' or after. I tried adding "@earlyclobber $ptrreg" as a Constraint, but that just leads to an assertion failure during codegen (I assume because @earlyclobber is for output ports)
>
> You need to express the P as both an input and output operand and add a constraint that both must be the same register.

OK, but then the pattern will have to include that extra output operand
somehow, right? What would the pattern need to be so that during ISel,
this LDWRdPtr instruction with the extra output still matches?

Nemanja Ivanovic via llvm-dev

May 29, 2017, 11:43:09 PM
to Dr. ERDI Gergo, llvm-dev
This is typically accomplished with something like PPC's `RegConstraint` and `NoEncode`. You can see examples of it that are very similar to what you're after in PPC's load/store with update forms (i.e. load a value and update the base register with the effective address - these are used for pre-increment loads/stores).
For example: the definition of LBZU and friends in lib/Target/PowerPC/PPCInstrInfo.td.
For a simpler example of just the `RegConstraint` usage (as it doesn't use a compound node like PPC's address nodes), you can look at all the fused multiply-add such as XSMADDADP in lib/Target/PowerPC/PPCInstrVSX.td.

Hope this helps.

Dr. ERDI Gergo via llvm-dev

May 30, 2017, 9:01:28 AM
to Nemanja Ivanovic, llvm-dev
On Tue, 30 May 2017, Nemanja Ivanovic wrote:

> This is typically accomplished with something like PPC's `RegConstraint` and
> `NoEncode`. You can see examples of it that are very similar to what you're after in
> PPC's load/store with update forms (i.e. load a value and update the base register
> with the effective address - these are used for pre-increment loads/stores).
> For example: the definition of LBZU and friends in lib/Target/PowerPC/PPCInstrInfo.td.
> For a simpler example of just the `RegConstraint` usage (as it doesn't use a compound
> node like PPC's address nodes), you can look at all the fused multiply-add such as
> XSMADDADP in lib/Target/PowerPC/PPCInstrVSX.td.
>
> Hope this helps.

Thanks!

However, none of the NoEncode examples in PPCInstrInfo.td seem to have an
isel pattern; and the VSX examples, like XSMADDADP, seem to match on
setting a single output:

let BaseName = "XSMADDADP" in {
let isCommutable = 1 in
def XSMADDADP : XX3Form<60, 33,
                        (outs vsfrc:$XT), (ins vsfrc:$XTi, vsfrc:$XA, vsfrc:$XB),
                        "xsmaddadp $XT, $XA, $XB", IIC_VecFP,
                        [(set f64:$XT, (fma f64:$XA, f64:$XB, f64:$XTi))]>,
                RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">,
                AltVSXFMARel;

If I'm reading this right, this matches an instruction that updates $XT
by taking the current $XT, and two extra args in $XA and $XB. However, my
situation would be something akin to

(set f64:$XC, (fma f64:$XA, f64:$XB, f64:$XTi))

with the extra constraint that $XTi is overwritten in the process.

Is there maybe a way to write a pattern like

(set (tuple f64:$XC, f64:$XT), (fma f64:$XA, f64:$XB, f64:$XTi))

that would match

(set f64:$XC, (fma f64:$XA, f64:$XB, f64:$XTi))

by automatically lifting it to store $XT as well? (of course, with a
RegConstraint that $XT = $XTi)

Nemanja Ivanovic via llvm-dev

May 30, 2017, 9:54:49 AM
to Dr. ERDI Gergo, llvm-dev
The reason the ones in PPCInstrInfo.td don't have the patterns to match is the reason they are more analogous to your problem. Namely, tblgen does not have a way to produce nodes with more than one result. The load-with-update instructions do exactly that - one of the inputs is also an output, but the other output is independent (and necessarily a separate register). The FMA variants have patterns in the .td file because they don't have multiple results - they just have one of their operands being both an input and an output.

So the idea is that you specify your `outs` in the instruction definition, one of those will have a `RegConstraint` on them and finally, you emit these nodes in your <TargetName>ISelDAGToDAG.cpp.

Dr. ERDI Gergo via llvm-dev

Jun 3, 2017, 8:13:27 AM
to Nemanja Ivanovic, llvm-dev
On Tue, 30 May 2017, Nemanja Ivanovic wrote:

> So the idea is that you specify your `outs` in the instruction definition, one of
> those will have a `RegConstraint` on them and finally, you emit these nodes in your
> <TargetName>ISelDAGToDAG.cpp.

OK, thanks, I now get the basic idea -- but I'm still struggling with the
implementation.

In my ISelDAGToDAG, if I match something like

Selecting: t17: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t16:1, t2, undef:i16

then whatever I return as the machine node, it will have to be of the same
type, i.e. (i16, ch), right? But if I have this extra output port for the
changed address register, that means my output is now (i16, i16, ch). It
is unclear to me how to reconcile that with the original abstract node
that I'm matching on.

In more concrete terms, I tried ignoring this and just copying the address
argument and the chain:

const LoadSDNode *LD = cast<LoadSDNode>(N);
int Offs = cast<ConstantSDNode>(LD->getOffset())->getSExtValue();
if (AM == ISD::UNINDEXED && Offs == 0) {
    SDNode* LDW = CurDAG->getMachineNode(
        AVR::LDWRdPtr, SDLoc(N), VT, PtrVT, MVT::Other,
        LD->getBasePtr(), LD->getChain());

    ReplaceNode(N, LDW);
    return true;
}

but this fails with

/home/cactus/prog/rust/rust-avr/llvm/include/llvm/Support/Casting.h:222:
typename std::enable_if<(! llvm::is_simple_type<Y>::value), typename
llvm::cast_retty<X, const Y>::ret_type>::type llvm::cast(const Y&)
[with X = llvm::ConstantSDNode;
Y = llvm::SDValue;
typename std::enable_if<(!llvm::is_simple_type<Y>::value),
typename llvm::cast_retty<X, const Y>::ret_type>::type = llvm::ConstantSDNode*]:
Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

Any more hints, please?

Thanks,
Gergo

Dr. ERDI Gergo via llvm-dev

Jun 3, 2017, 11:38:07 PM
to Nemanja Ivanovic, llvm-dev
Sorry, that was a complete red herring (it was the cast<ConstantSDNode>
that failed). Here's the real error message I get with the approach from
my previous message:

llc: CodeGen/SelectionDAG/SelectionDAG.cpp:6518:
void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*):
Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i))
&& "Cannot use this version of ReplaceAllUsesWith!"'
failed.

which I assume is because of the type difference I mentioned ((i16, i16,
ch) vs (i16, ch) on the output ports)
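
The mismatch can be illustrated with a small toy model. This is a sketch only: ToyNode, ToyUse, canReplaceWholeNode and replaceValue are hypothetical names and not LLVM classes. The check in canReplaceWholeNode mirrors the condition in the assertion above (every used result must have the same type at the same index), while replaceValue redirects one result at a time, so load:0 can be mapped to LDWRdPtr:0 and load:1 (the chain) to LDWRdPtr:2. LLVM has a per-value form of this kind in SelectionDAG::ReplaceAllUsesOfValueWith(SDValue From, SDValue To), which SelectionDAGISel also exposes as ReplaceUses(SDValue F, SDValue T). Whether using it avoids every later problem for this pseudo-instruction is not something this thread establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DAG node: only its result types matter here.
struct ToyNode {
    std::string name;
    std::vector<std::string> resultTypes;
};

// A use refers to one specific result (resNo) of one node.
struct ToyUse {
    const ToyNode *node;
    std::size_t resNo;
};

// Whole-node replacement is only valid when every *used* result of `from`
// has the same type at the same index in `to`.
bool canReplaceWholeNode(const ToyNode &from, const ToyNode &to,
                         const std::vector<ToyUse> &uses) {
    for (const ToyUse &u : uses) {
        if (u.node != &from)
            continue;
        if (u.resNo >= to.resultTypes.size())
            return false;
        if (from.resultTypes[u.resNo] != to.resultTypes[u.resNo])
            return false;
    }
    return true;
}

// Per-value replacement: redirect every use of (from, fromRes) to
// (to, toRes). Returns the number of uses that were rewritten.
std::size_t replaceValue(std::vector<ToyUse> &uses,
                         const ToyNode &from, std::size_t fromRes,
                         const ToyNode &to, std::size_t toRes) {
    assert(fromRes < from.resultTypes.size() && "bad source result index");
    assert(toRes < to.resultTypes.size() && "bad target result index");
    assert(from.resultTypes[fromRes] == to.resultTypes[toRes] &&
           "replacement value must have the same type");
    std::size_t rewritten = 0;
    for (ToyUse &u : uses) {
        if (u.node == &from && u.resNo == fromRes) {
            u = {&to, toRes};
            ++rewritten;
        }
    }
    return rewritten;
}
```

With a load of type (i16, ch) and a replacement of type (i16, i16, ch), canReplaceWholeNode reports a mismatch at index 1 (ch vs i16), whereas two replaceValue calls with the index pairs 0 to 0 and 1 to 2 both pass the type check.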

Dr. ERDI Gergo via llvm-dev

Jun 4, 2017, 3:04:53 AM
to Nemanja Ivanovic, llvm-dev
On Sun, 4 Jun 2017, Dr. ERDI Gergo wrote:

> the type difference I mentioned ((i16, i16, ch) vs (i16, ch) on the
> output ports)

OK I managed to make some progress on this, by using a MergeValues node:

SDNode* LDW = CurDAG->getMachineNode(
    AVR::LDWRdPtr, SDLoc(N), VT, PtrVT, MVT::Other,
    LD->getBasePtr(), LD->getChain());

SDValue Unpack[] = { SDValue(LDW, 0), SDValue(LDW, 2) };
SDNode* NN = CurDAG->getMergeValues(Unpack, SDLoc(N)).getNode();

ReplaceNode(N, NN);


which gets me from

t17: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t16:1, t2, undef:i16

to

t24: i16,i16,ch = LDWRdPtr t2, t16:1

looking good; but then it fails during scheduling with

llc: CodeGen/SelectionDAG/InstrEmitter.cpp:303:
unsigned int llvm::InstrEmitter::getVR(
llvm::SDValue,
llvm::DenseMap<llvm::SDValue, unsigned int>&):
Assertion `I != VRBaseMap.end() && "Node emitted out of order - late"' failed.

For reference, the full DAG before and after ISEL:


SelectionDAG has 22 nodes:
t0: ch = EntryToken
t2: i16,ch = CopyFromReg t0, Register:i16 %vreg0
t5: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t0, t2, undef:i16
t9: ch,glue = callseq_start t5:1, TargetConstant:i16<0>
t11: ch,glue = CopyToReg t9, Register:i16 %R25R24, t5
t13: ch,glue = CALL t11, TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t11:1
t14: ch,glue = callseq_end t13, TargetConstant:i16<0>, TargetConstant:i16<0>, t13:1
t16: i8,ch,glue = CopyFromReg t14, Register:i8 %R24, t14:1


t17: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t16:1, t2, undef:i16

t18: ch,glue = callseq_start t17:1, TargetConstant:i16<0>
t19: ch,glue = CopyToReg t18, Register:i16 %R25R24, t17
t20: ch,glue = CALL t19, TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t19:1
t21: ch,glue = callseq_end t20, TargetConstant:i16<0>, TargetConstant:i16<0>, t20:1
t22: i8,ch,glue = CopyFromReg t21, Register:i8 %R24, t21:1
t23: ch = RET_FLAG t22:1

SelectionDAG has 23 nodes:
t0: ch = EntryToken
t2: i16,ch = CopyFromReg t0, Register:i16 %vreg0
t9: i16,ch,glue = ADJCALLSTACKDOWN TargetConstant:i16<0>, t27:1
t11: ch,glue = CopyToReg t9:1, Register:i16 %R25R24, t27
t13: ch,glue = CALLk TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t11, t11:1
t14: i16,ch,glue = ADJCALLSTACKUP TargetConstant:i16<0>, TargetConstant:i16<0>, t13, t13:1
t18: i16,ch,glue = ADJCALLSTACKDOWN TargetConstant:i16<0>, t25:1
t19: ch,glue = CopyToReg t18:1, Register:i16 %R25R24, t25
t20: ch,glue = CALLk TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t19, t19:1
t21: i16,ch,glue = ADJCALLSTACKUP TargetConstant:i16<0>, TargetConstant:i16<0>, t20, t20:1
t16: i8,ch,glue = CopyFromReg t14:1, Register:i8 %R24, t14:2
t24: i16,i16,ch = LDWRdPtr t2, t16:1
t25: i16,ch = merge_values t24, t24:2
t26: i16,i16,ch = LDWRdPtr t2, t0
t27: i16,ch = merge_values t26, t26:2
t22: i8,ch,glue = CopyFromReg t21:1, Register:i8 %R24, t21:2
t23: ch = RET t22:1
