Hi Anthony,
Yes, you are right. If you are comfortable coding directly in the
native ISA, you can of course write native ISA straightaway. If you
are not, you can take the route
PTX --ptxas--> cubin --cuobjdump--> disassembled ISA, modify the
disassembly as you want, and then reassemble it using asfermi.
If you want to code directly, it helps to read the output of cuobjdump
on your own kernels so you can familiarise yourself with the ISA.
Note, though, that asfermi doesn't support every Fermi instruction.
Also, so far we haven't gathered the information we need to enable
software scheduling for Kepler, and without it code on Kepler runs
significantly slower.
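
In case it helps, here is a rough sketch of the PTX to cubin to disassembly half of that route, driven from Python. It assumes ptxas and cuobjdump from a CUDA toolkit old enough to target Fermi are on your PATH; the helper names (build_commands, ptx_to_sass), the default sm_20 target and the output file naming are placeholders of mine, not anything asfermi requires. The final reassembly step with asfermi is left out; check asfermi's own usage notes for its options.

```python
import os
import shutil
import subprocess
import sys


def build_commands(ptx_path, arch="sm_20"):
    """Return the ptxas and cuobjdump command lines for one PTX file.

    The cubin is written next to the PTX file with a .cubin extension
    (a naming choice made for this sketch only).
    """
    stem, _ = os.path.splitext(ptx_path)
    cubin = stem + ".cubin"
    # ptxas assembles PTX into a cubin for the given architecture.
    assemble = ["ptxas", "-arch=" + arch, ptx_path, "-o", cubin]
    # cuobjdump -sass prints the disassembled native ISA of the cubin.
    disassemble = ["cuobjdump", "-sass", cubin]
    return assemble, disassemble


def ptx_to_sass(ptx_path, arch="sm_20"):
    """Run both tools and return the disassembly as text."""
    assemble, disassemble = build_commands(ptx_path, arch)
    subprocess.run(assemble, check=True)
    result = subprocess.run(
        disassemble, check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Only run the tools when a PTX file is given and both are installed.
    have_tools = shutil.which("ptxas") and shutil.which("cuobjdump")
    if len(sys.argv) == 2 and have_tools:
        print(ptx_to_sass(sys.argv[1]))
```

You would then edit the printed disassembly by hand and feed the result to asfermi.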
Cheers,
Yunqing