Maxas is designed for experienced CUDA programmers who aren't shy about examining compiled SASS. If you've done enough of that, you're already a good way towards understanding the ISA.
XMAD is one of the more complicated instructions. It's a 16-bit multiply with a 32-bit accumulate, and it has several modes that let you chain instructions together to build wider multiplications.
I typically use 3 sequences:
A single unsigned 16-bit mad (flags for signed mode are available):
XMAD d, a, b, c;
A 16-bit times a 32-bit (here a holds the 32-bit operand, since both of its halves are used):
XMAD d, a, b, c;
XMAD.PSL d, a.H1, b, d;
or, with the 32-bit operand in b:
XMAD d, a, b, c;
XMAD.PSL d, a, b.H1, d;
A 32-bit times a 32-bit, keeping just the lower 32 bits of the result:
XMAD.MRG x, a, b.H1, RZ;
XMAD d, a, b, c;
XMAD.PSL.CBCC d, a.H1, x.H1, d;
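To see why these sequences add up, here is a small Python model of the arithmetic. It is only a sketch, not a description of the hardware: the semantics assumed for PSL (shift the product left by 16), CBCC (add b << 16 to c) and MRG (put the low 16 bits of b into the high half of the result) are assumptions, and the helper names (xmad, lo, hi, mad16, mul16x32, mul32x32_lo) are made up for illustration.

```python
import random

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def lo(x):
    """Low 16 bits of a 32-bit register (assumed to be the default half)."""
    return x & MASK16

def hi(x):
    """High 16 bits of a 32-bit register (the .H1 selector)."""
    return (x >> 16) & MASK16

def xmad(a16, b16, c, b_reg=0, psl=False, cbcc=False, mrg=False):
    """Assumed model of one XMAD; a16 and b16 are the halves already selected.

    b_reg is the full 32-bit b register, only needed for CBCC and MRG.
    """
    prod = a16 * b16
    if psl:                      # assumed: product shifted left by 16
        prod <<= 16
    if cbcc:                     # assumed: c becomes c + (b << 16)
        c += b_reg << 16
    d = (prod + c) & MASK32      # the 32-bit accumulate wraps around
    if mrg:                      # assumed: high half replaced by b's low 16 bits
        d = (d & MASK16) | (lo(b_reg) << 16)
    return d

def mad16(a, b, c):
    # XMAD d, a, b, c;
    return xmad(lo(a), lo(b), c)

def mul16x32(a, b, c):
    # XMAD d, a, b, c;
    d = xmad(lo(a), lo(b), c)
    # XMAD.PSL d, a, b.H1, d;
    return xmad(lo(a), hi(b), d, psl=True)

def mul32x32_lo(a, b, c):
    # XMAD.MRG x, a, b.H1, RZ;   x = low16(a.lo * b.hi) | (b.lo << 16)
    x = xmad(lo(a), hi(b), 0, b_reg=b, mrg=True)
    # XMAD d, a, b, c;           d = a.lo * b.lo + c
    d = xmad(lo(a), lo(b), c)
    # XMAD.PSL.CBCC d, a.H1, x.H1, d;
    # x.H1 is b.lo, so this adds (a.hi * b.lo) << 16 and (a.lo * b.hi) << 16
    return xmad(hi(a), hi(x), d, b_reg=x, psl=True, cbcc=True)

# Compare the model against plain integer arithmetic on random inputs.
rng = random.Random(0)
for _ in range(1000):
    a, b, c = (rng.getrandbits(32) for _ in range(3))
    assert mad16(a, b, c) == (lo(a) * lo(b) + c) & MASK32
    assert mul16x32(a, b, c) == (lo(a) * b + c) & MASK32
    assert mul32x32_lo(a, b, c) == (a * b + c) & MASK32
```

Under these assumptions the 32-bit case works because the a.H1 * b.H1 partial product would land entirely above bit 31, so only three 16-bit multiplies are needed for the low 32 bits.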
There are many other modes to explore, though. VMAD is a similar instruction that also lets you add sign operations.
Basically you'll need to study the MaxasGrammar file. I look at that file pretty frequently to find instructions and their flags.