ISA decoding questions


Dmitry N. Mikushin

Nov 23, 2012, 1:03:46 AM
to asf...@googlegroups.com
Dear colleagues,

I wrote a simple prober that examines cuobjdump output for different binary permutations. Interestingly, the results for sm_20/sm_30 do not have much in common with the layouts presented in the AsFermi wiki. For example:

1111111100000000000000000000000000000000000000000000000000000000 BRA_N
0101001111011000000000000000000000000000000000000000000000000000 LDS_N
0100000000000000000000000000000000000000000000000000000000000000 IMAD32I
0010000000000000000000000000000000000000000000000000000000000000 CSET
0001101011110100000000000000000000000000000000000000000000000000 VMAD_N
1110000000000000000000000000000000000000000000000000000000000000 JMP
1101101011000010000000000000000000000000000000000000000000000000 ST_N
1101000110101010000000000000000000000000000000000000000000000000 LD_N
1100000000000000000000000000000000000000000000000000000000000000 IMADSP
1010000000000000000000000000000000000000000000000000000000000000 RED
0110000000000000000000000000000000000000000000000000000000000000 VILD
0101100100011110000000000000000000000000000000000000000000000000 STL_N
1011011010000101000000000000000000000000000000000000000000000000 IMAD_N
1001110000011011000000000000000000000000000000000000000000000000 AST_N
1001000100110111000000000000000000000000000000000000000000000000 IPA_N
1001001011011000100000000000000000000000000000000000000000000000 FADD_N
1001101000010101100000000000000000000000000000000000000000000000 FSETP_N
0001010111000110010000000000000000000000000000000000000000000000 FMUL_N
0001000100111110010000000000000000000000000000000000000000000000 I2F_N
0101101100000011010000000000000000000000000000000000000000000000 STS_N
0001010010101011010000000000000000000000000000000000000000000000 MOV_N
0001100101100100110000000000000000000000000000000000000000000000 F2F_N
0001100010011010110000000000000000000000000000000000000000000000 MOVI_N
0011100011000001110000000000000000000000000000000000000000000000 LDC_N
0001000101011001110000000000000000000000000000000000000000000000 F2I_N
0111001001010100001000000000000000000000000000000000000000000000 FFMA_N
0001001111000110001000000000000000000000000000000000000000000000 ISETP_N
0001000010111110001000000000000000000000000000000000000000000000 VABSDIFF4_N
1001011010001001001000000000000000000000000000000000000000000000 ALD_N
0001000011011001101000000000000000000000000000000000000000000000 VADD2_N
1011110000001001000100000000000000000000000000000000000000000000 LOP_N
0101000111000101000100000000000000000000000000000000000000000000 LDL_N
0001000011101010100100000000000000000000000000000000000000000000 VADD4_N
0011010100101000010100000000000000000000000000000000000000000000 IADD_N
0001110100000101010100000000000000000000000000000000000000000000 TEX_N
0001100000110001110100000000000000000000000000000000000000000000 I2I_N
0101010000010010101100000000000000000000000000000000000000000000 IMUL_N
0001000000000111111100000000000000000000000000000000000000000000 NOP_N
0001000010010101011010000000000000000000000000000000000000000000 VABSDIFF2_N
0001000001100100111001000000000000000000000000000000000000000000 SHL_N
0000000000000000000111111110000000000000000000000000000000000000 FMNMX
0000000000000000000011111111000000000000000000000000000000000000 FSET
0001000000110111010000000000100000000000000000000000000000000000 CSETP_N
0001000000000111100100000001100000000000000000000000000000000000 PSETP_N
0000000000000000000001111111100000000000000000000000000000000000 FSETP
0001000000111010110000000000010000000000000000000000000000000000 TEXDEPBAR_N
0000000000000000000001111111010000000000000000000000000000000000 FFMA
0000000000000000000001111110110000000000000000000000000000000000 FCMP
0000000000000000000000111111110000000000000000000000000000000000 FCCO
1111100000001010000000000000001000000000000000000000000000000000 SSY_N
0001000000011110001100000000001000000000000000000000000000000000 S2R_N
0000000000000000000011111110001000000000000000000000000000000000 FSWZ
0000000000000000000001111111001000000000000000000000000000000000 FADD
0001000000110101001000000000101000000000000000000000000000000000 BAR_N
0000000000000000000001111110101000000000000000000000000000000000 FMUL
0000000000000000000000111111101000000000000000000000000000000000 RRO
0001000000100100110000000000111000000000000000000000000000000000 RRO_N
1111110000000000100000000000000100000000000000000000000000000000 BPT_N
0001000000010011111000000000000100000000000000000000000000000000 MUFU_N
0001000001000110011010000000000100000000000000000000000000000000 SHR_N
0000000000000000000000011111110100000000000000000000000000000000 IPA
0000000000000000000001111110001100000000000000000000000000000000 MUFU
0000000000000000000011111110000010000000000000000000000000000000 DMNMX
1000100000000000000000000000000010000000000000000000000000000000 DSET
0000000000000000000001111110100010000000000000000000000000000000 DSETP
0000000000000000000000111111100010000000000000000000000000000000 DFMA
0000000000000000000001111110001010000000000000000000000000000000 DADD
0000000000000000000000111111001010000000000000000000000000000000 DMUL


I'm wondering what this could mean. Could there be alternative encodings? Theoretically, cuobjdump could lie, and we would not be able to spot that without unit tests :)

- D.

Hou Yunqing

Nov 23, 2012, 4:47:17 AM
to asfermi Google Group
Hey D.,

I don't quite understand what you are presenting... are you sure your prober has no bugs? It seems to give too many consecutive zeros. Are you doing something similar to this? (This program reads a file containing lines similar to "      /*0018*/     /*0xfc0f5de21bffffff*/     MOV32I R61, -0x1;" and outputs the binary equivalent of the hex number in there. See explanation here.)
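
For reference, a hex-to-binary dump of this kind could look like the minimal Python sketch below. It is not the program linked above: the names `parse_line` and `to_bits` are made up for illustration, and the bit order (most significant bit first by default, with an optional reversal) is an assumption rather than a statement about what that program or the wiki uses.

```python
import re

# Matches lines such as:
#   /*0018*/     /*0xfc0f5de21bffffff*/     MOV32I R61, -0x1;
LINE_RE = re.compile(r"/\*([0-9a-fA-F]+)\*/\s*/\*0x([0-9a-fA-F]+)\*/\s*(.*?);")

def parse_line(line):
    """Return (offset, word, width_in_bits, text), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    offset = int(m.group(1), 16)
    word = int(m.group(2), 16)
    width = 4 * len(m.group(2))  # four bits per hex digit
    return offset, word, width, m.group(3).strip()

def to_bits(word, width, lsb_first=False):
    """Render word as a bit string; lsb_first=True puts bit 0 on the left."""
    bits = format(word, "0{}b".format(width))
    return bits[::-1] if lsb_first else bits

if __name__ == "__main__":
    sample = "      /*0018*/     /*0xfc0f5de21bffffff*/     MOV32I R61, -0x1;"
    offset, word, width, text = parse_line(sample)
    print(to_bits(word, width), text)
```

Whether the bits should be printed in the order of the hex digits or reversed depends on the convention of the table you compare against, so it is worth checking one known instruction by hand first.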

I never verified the validity of instructions that end with _N, but cuobjdump tells me they are 32-bit in length. Their encoding is slightly different. See this.

Yunqing

Sun HuanHuan

Nov 23, 2012, 1:34:13 AM
to asf...@googlegroups.com
Hi, asfermi and Dmitry,

Do you mean that even sm_30, which was known to share the same encoding base
with sm_2x, has now been shown to use different encodings from sm_2x?

I noticed that most of the instructions with different encodings are
_N instructions (4 bytes in length).

Can you use other disassembling tools, like Nsight's disassembly
window, to show some encodings for these instructions? Nsight is known
to name disassembled instructions differently from cuobjdump.

HuanHuan


Dmitry N. Mikushin

Nov 24, 2012, 12:29:14 AM
to asf...@googlegroups.com
Hi Yunqing and Huan,

You're right, there were a number of bugs in my cuobjdump feedback parser, the result of several wrong assumptions about its behavior. With those issues fixed, I'm now able to produce encodings for sm_20 commands that are very similar to those in the AsFermi knowledge base. I also tried to explore sm_35 a little. Please see both listings below.

At the moment, my prober (isadecode) is only capable of command name identification, i.e. determining the minimum number of bits required to represent each particular command. My goal is to work out the Kepler K20 encoding and extend AsFermi to emit it. Before going further with identifying the bits for suffixes and arguments, I'd like to clarify whether the current differences between AsFermi and isadecode make sense. For example, according to AsFermi, sm_20's JMP is represented as
1110011110111000000000000000000000000000000000000000000000000000
and according to isadecode, it's enough to have:

1110000000000000000000000000000000000000000000000000000000000000


The situation is the same for many other commands: it seems that some bits set in AsFermi are not really required. Could that be so? Is it reliable to develop a prober that intentionally tries to find the minimal combination of bits identifying a particular command?
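
To make the comparison concrete, here is a small Python sketch that lists which positions are set in the AsFermi pattern but not in the isadecode one. The helper names are invented for this example, and positions are simply string indices counted from the left, starting at 0.

```python
def set_positions(bits):
    """Indices (counted from the left, starting at 0) that hold a '1'."""
    return {i for i, ch in enumerate(bits) if ch == "1"}

def is_subset(minimal, full):
    """True if every bit set in `minimal` is also set in `full`."""
    return set_positions(minimal) <= set_positions(full)

def extra_bits(full, minimal):
    """Positions set in `full` that the minimal pattern does not need."""
    return sorted(set_positions(full) - set_positions(minimal))

# The two JMP patterns quoted above, written compactly.
asfermi_jmp = "1110011110111" + "0" * 51
isadecode_jmp = "111" + "0" * 61

if __name__ == "__main__":
    print(is_subset(isadecode_jmp, asfermi_jmp))
    print(extra_bits(asfermi_jmp, isadecode_jmp))
```

For JMP the minimal pattern is a strict subset of the AsFermi one; the open question is what the remaining positions encode.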

Thanks for the suggestion about Nsight, Huan! I'm not using Windows at all, so that would be a last resort.

Best,
- Dima.

$ ./isadecode sm_20
0000000000000000000000000000000000000000000000000000000000000000    BPT
1110000000000000000000000000000000000000000000000000000000000001    EXIT
0000000000000000000000000000000000000000000000000000000000000010    FCCO
0000000000000000000000000000000000000000000000000000000000000100    FSETP
0000000000000000000000000000000000000000000000000000000000001000    FSET
0000000000000000000000000000000000000000000000000000000000010000    FMNMX
0001000000000000000000000000000000000000000000000000000000000000    NOP_N
0010000000000000000000000000000000000000000000000000000000000000    CSET
0100000000000000000000000000000000000000000000000000000000000000    IMAD32I
0001000000000000000000000000000100000000000000000000000000000000    MUFU_N
0001000000000000000000000000001000000000000000000000000000000000    S2R_N
0001000000000000000000000000100000000000000000000000000000000000    CSETP_N
0001000001000000000000000000000000000000000000000000000000000000    SHL_N
0001000010000000000000000000000000000000000000000000000000000000    VABSDIFF2_N
0001000100000000000000000000000000000000000000000000000000000000    I2F_N
0001001000000000000000000000000000000000000000000000000000000000    ISETP_N
0001010000000000000000000000000000000000000000000000000000000000    MOV_N
0001100000000000000000000000000000000000000000000000000000000000    I2I_N
0101000000000000000000000000000000000000000000000000000000000000    LDL_N
1001000000000000000000000000000000000000000000000000000000000000    IPA_N
0110000000000000000000000000000000000000000000000000000000000000    VILD
1010000000000000000000000000000000000000000000000000000000000000    RED
0000000000000000000000000000000000000000000000000000000000000011    IPA
0010000000000000000000000000000000000000000000000000000000000001    VADD4
0000000000000000000000000000000000000000000000000000000000000110    RRO
0000000000000000000000000000000000000000000000000000000000001010    FADD
0000000000000000000000000000000000000000000000000000000000010010    FSWZ
0010000000000000000000000000000000000000000000000000000000000010    NOP
0100000000000000000000000000000000000000000000000000000000000010    ISCADD32I
0000000000000000000000000000000000000000000000000000000000001100    FFMA
0010000000000000000000000000000000000000000000000000000000000100    SEL
0100000000000000000000000000000000000000000000000000000000000100    FFMA32I
1000000000000000000000000000000000000000000000000000000000000100    DFMA
0100000000000000000000000000000000000000000000000000000000001000    IMUL32I
1000000000000000000000000000000000000000000000000000000000001000    DSET
0010000000000000000000000000000000000000000000000000000000010000    PSET
0100000000000000000000000000000000000000000000000000000000010000    IADD32I
1000000000000000000000000000000000000000000000000000000000010000    DMNMX
0010000000000000000000000000000000000000000000000000000000100000    CSETP
0001000010100000000000000000000000000000000000000000000000000000    VABSDIFF4_N
0001000011000000000000000000000000000000000000000000000000000000    VADD2_N
0001000101000000000000000000000000000000000000000000000000000000    F2I_N
0001100010000000000000000000000000000000000000000000000000000000    MOVI_N
0001010100000000000000000000000000000000000000000000000000000000    FMUL_N
0001100100000000000000000000000000000000000000000000000000000000    F2F_N
0001101000000000000000000000000000000000000000000000000000000000    VMAD_N
0101001000000000000000000000000000000000000000000000000000000000    LDS_N
1001001000000000000000000000000000000000000000000000000000000000    FADD_N
0011010000000000000000000000000000000000000000000000000000000000    IADD_N
0101010000000000000000000000000000000000000000000000000000000000    IMUL_N
1001010000000000000000000000000000000000000000000000000000000000    ALD_N
0011100000000000000000000000000000000000000000000000000000000000    LDC_N
0101100000000000000000000000000000000000000000000000000000000000    STL_N
0111000000000000000000000000000000000000000000000000000000000000    FFMA_N
1011000000000000000000000000000000000000000000000000000000000000    IMAD_N
1101000000000000000000000000000000000000000000000000000000000000    LD_N
1110000000000000000000000000000000000000000000000000000000000000    JMP
0001000000000000000000000001100000000000000000000000000000000000    PSETP_N
0001000001000000000000000000000100000000000000000000000000000000    SHR_N
0001000000000000000000000000101000000000000000000000000000000000    BAR_N
0110000000000000000000000000000000000000000000000000000000100000    ALD
0110000000000000000000000000000000000000000000000000000000010000    AST
1100000000000000000000000000000000000000000000000000000000010000    IMNMX
0110000000000000000000000000000000000000000000000000000000001000    PIXLD
1100000000000000000000000000000000000000000000000000000000001000    ISET
0010000000000000000000000000000000000000000000000000000000110000    PSETP
1100000000000000000000000000000000000000000000000000000000000100    IMAD
0010000000000000000000000000000000000000000000000000000000011000    I2F
0100000000000000000000000000000000000000000000000000000000011000    MOV32I
1000000000000000000000000000000000000000000000000000000000011000    DSETP
0010000000000000000000000000000000000000000000000000000000101000    F2I
1010000000000000000000000000000000000000000000000000000000000010    ATOM
1100000000000000000000000000000000000000000000000000000000000010    ISCADD
0000000000000000000000000000000000000000000000000000000000011100    FCMP
0010000000000000000000000000000000000000000000000000000000001100    P2R
0100000000000000000000000000000000000000000000000000000000001100    FMUL32I
0010000000000000000000000000000000000000000000000000000000010100    MOV
0100000000000000000000000000000000000000000000000000000000010100    FADD32I
0010000000000000000000000000000000000000000000000000000000100100    PRMT
0110000000000000000000000000000000000000000000000000000000000001    TEX
1010000000000000000000000000000000000000000000000000000000000001    LD
0000000000000000000000000000000000000000000000000000000000011010    FMUL
0010000000000000000000000000000000000000000000000000000000001010    BAR
1000000000000000000000000000000000000000000000000000000000001010    DMUL
0010000000000000000000000000000000000000000000000000000000010010    VOTE
1000000000000000000000000000000000000000000000000000000000010010    DADD
0010000000000000000000000000000000000000000000000000000000100010    LEPC
0000000000000000000000000000000000000000000000000000000000010011    MUFU
0010000000000000000000000000000000000000000000000000000000000011    VADD
0010000000000000000000000000000000000000000000000000000000000101    VADD2
0010000000000000000000000000000000000000000000000000000000001001    VSHR4
0010000000000000000000000000000000000000000000000000000000010001    VABSDIFF4
0010000000000000000000000000000000000000000000000000000000100001    VMNMX4
0001000011100000000000000000000000000000000000000000000000000000    VADD4_N
0001110100000000000000000000000000000000000000000000000000000000    TEX_N
0101101000000000000000000000000000000000000000000000000000000000    STS_N
1001101000000000000000000000000000000000000000000000000000000000    FSETP_N
1001110000000000000000000000000000000000000000000000000000000000    AST_N
1011100000000000000000000000000000000000000000000000000000000000    LOP_N
1101100000000000000000000000000000000000000000000000000000000000    ST_N
0001000000000000000000000000111000000000000000000000000000000000    RRO_N
1110000000000000000000000000000000000000000000000000000000010000    JMX
1110000000000000000000000000000000000000000000000000000000001000    JCAL
0010000000000000000001001000000000000000000000000000000000001000    F2F
0110000000000000000000000000000000000000000000000000000000101000    LDC
1100000000000000000000000000000000000000000000000000000000011000    ISETP
0010000000000000000000000000000000000000000000000000000000111000    I2I
1100000000000000000000000000000000000000000000000000000000010100    BFI
0010000000000000000000000000000000000000000000000000000000110100    S2R
1100000000000000000000000000000000000000000000000000000000001100    ICMP
1110000000000000000000000000000000000000000000000000000000000010    BRA
0010000000000000000000000000000000000000000000000000000000011100    B2R
0100000000000000000000000000000000000000000000000000000000011100    LOP32I
0010000000000000000000000000000000000000000000000000000000101100    R2P
1100000000000000000000000000000000000000000000000000000000010010    IADD
1100000000000000000000000000000000000000000000000000000000001010    IMUL
0010000000000000000000000000000000000000000000000000000000110010    STP
1100000000000000000000000000000000000000000000000000000000000110    SHL
0010000000000000000000000000000000000000000000000000000000101010    POPC
1010000000000000000000000000000000000000000000000000000000010001    LDU
0110000000000000000000000000000000000000000000000000000000001001    TLD
1010000000000000000000000000000000000000000000000000000000001001    ST
0010000000000000000000000000000000000000000000000000000000110001    VSET4
0110000000000000000000000000000000000000000000000000000000000101    TLD4
1010000000000000000000000000000000000000000000000000000000000101    LDLK
0010000000000000000000000000000000000000000000000000000000011001    VSEL4
0010000000000000000000000000000000000000000000000000000000101001    VSHL4
0110000000000000000000000000000000000000000000000000000000000011    TXQ
1010000000000000000000000000000000000000000000000000000000000011    LDL
0010000000000000000000000000000000000000000000000000000000001101    VSHR2
0010000000000000000000000000000000000000000000000000000000010101    VABSDIFF2
0010000000000000000000000000000000000000000000000000000000100101    VMNMX2
0010000000000000000000000000000000000000000000000000000000000111    VSHR
0010000000000000000000000000000000000000000000000000000000001011    VABSDIFF
0010000000000000000000000000000000000000000000000000000000010011    VMNMX
1111100000000000000000000000000000000000000000000000000000000000    BRA_N

$ ./isadecode sm_35
0000000000000000000000000000000000000000000000000000000000000000    BPT
0000000000000000000000000000000000000000000000000000000000000000    MOV
0000000000000000000000000000000000000000000000000000000000011000    EXIT
0000000000000000000000000000000000000000000000000000000000001000    BRA
0000000000000000000000000000000000000000000000000000000000000000    NOP
0000000000000000000000000000000000000000000000000000000000000001    IMAD32I
0000000000000000000000000000000000000000000000000000000000000010    FADD32I
0000000000000000000000000000000000000000000000000000000000000100    LOP32I
0000000000000000000000000000000000000000000000000000000000001000    JMX
0100000000000000000000000000000000000000000000000000000000000000    VSET
1000000000000000000000000000000000000000000000000000000000000000    VADD
0000000000000000000000000000000000000000000000000000000000000011    LD
0000000000000000000000000000000000000000000000000000000000000101    ISCADD32I
0100000000000000000000000000000000000000000000000000000000000001    VABSDIFF4
1000000000000000000000000000000000000000000000000000000000000001    FSET
0000000000000000000000000000000000000000000000000000000000000110    FFMA32I
1000000000000000000000000000000000000000000000000000000000000010    IADD32I
0100000000000000000000000000000000000000000000000000000000000100    FMUL32I
1000000000000000000000000000000000000000000000000000000000000100    VADD2
0000000000000000000000000000000000000000000000000000000000101000    PLONGJMP
0000000000000000000000000000000000000000000000000000000010001000    JCAL
0000000000000000000000000000000000000000000000000000000100001000    JMP
0100000000000000000000000000000000000000000000000000000000001000    VSET2
0000000000000000000000000000000000000000000000000000000000111000    RAM
0000000000000000000000000000000000000000000000000000000001011000    BRK
0000000000000000000000000000000000000000000000000000000010011000    RET
0000000000000000000000000000000000000000000000000000000100011000    LONGJMP
0000000000000000000000000000000000000000000000000000000001101000    GETCRSPTR
0000000000000000000000000000000000000000000000000000000010101000    PBK
0000000000000000000000000000000000000000000000000000000100101000    SSY
0000000000000000000000000000000000000000000000000000000011001000    CAL
0000000000000000000000000000000000000000000000000000000101001000    BRX
0100000000000000000000000000000000000000000000000000000000001100    SULDGA
0100000000000000000000000000000000000000000000000000000000010100    IMUL32I
0100000000000000000000000000000000000000000000000000000000000110    BFE
1000000000000000000000000000000000000000000000000000000000000110    LDG
0100000000000000000000000000000000000000000000000000000000001010    IMAD
0100000000000000000000000000000000000000000000000000000000010010    DSET
0000000000000000000000000000000000000000000000000000000000000111    ST
0100000000000000000000000000000000000000000000000000000000000101    VABSDIFF
0100000000000000000000000000000000000000000000000000000000100001    MUFU
0000000000000000000000000000000000000000000000000000000111001000    PRET
0000000000000000000000000000000000000000000000000000000011101000    SETCRSPTR
0000000000000000000000000000000000000000000000000000000101101000    GETLMEMBASE
0000000000000000000000000000000000000000000000000000000110101000    PCNT
0000000000000000000000000000000000000000000000000000000100111000    IDE
0000000000000000000000000000000000000000000000000000000011011000    RTT
0000000000000000000000000000000000000000000000000000000101011000    CONT
0000000000000000000000000000000000000000000000000000000110011000    KIL
0100000000000000000000000000000000000000000000000000000000011100    SUSTGA
0100000000000000000000000000000000000000000000000000000000110010    FFMA
0100000000000000000000000000000000000000000000000000000000011010    SUCLAMP
0100000000000000000000000000000000000000000000000000000000101010    IMADSP
1000000000000000000000000000000000000000000000000000000000001110    TLD4
0100000000000000000000000000000000000000000000000000000000010110    ATOM
1000000000000000000000000000000000000000000000000000000000010110    TEX
0100000000000000000000000000000000000000000000000000000000100110    DMUL
0100000000000000000000000000000000000000000000000000000001000110    LOP
0100000000000000000000000000000000000000000000000000000010000110    IMNMX
0100000000000000000000000000000000000000000000000000000100000110    IADD
0100000000000000000000000000000000000000000000000000001000000110    POPC
0100000000000000000000000000000000000000000000000000000001100001    R2B
0100000000000000000000000000000000000000000000000000000010100001    CSETP
0100000000000000000000000000000000000000000000000000000100100001    PSETP
0100000000000000000000000000000000000000000000000000001000100001    PSET
0100000000000000000000000000000000000000000000000000000000001101    VSHR
0100000000000000000000000000000000000000000000000000000000010101    VABSDIFF2
1000000000000000000000000000000000000000000000000000000000000111    VMNMX
1000000000000000000000000000000000000000000000000000000000001011    VADD4




Dmitry N. Mikushin

Nov 24, 2012, 12:43:30 AM
to asf...@googlegroups.com
Oh, you may wonder why 3 instructions in sm_35 are represented with all-zero fields. That's my question too :) It turns out the decoder found two completely orthogonal encodings for the same command, which give zero when ANDed together. Fortunately, there are very few such collisions.
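
The effect is easy to reproduce in a few lines of Python. This is only an illustration with made-up 8-bit values, not the actual isadecode code; `common_mask` is a hypothetical helper name.

```python
from functools import reduce

def common_mask(encodings):
    """AND together all encodings that decode to the same mnemonic."""
    return reduce(lambda a, b: a & b, encodings)

if __name__ == "__main__":
    # Two encodings sharing one bit: the shared bit survives the AND.
    print(format(common_mask([0b1100, 0b0110]), "04b"))
    # Two orthogonal encodings (no set bit in common): the AND collapses to zero.
    print(format(common_mask([0b00001111, 0b11110000]), "08b"))
```

One way around such collisions would be to keep alternative encodings of a mnemonic as separate entries instead of intersecting them, though that is a design choice rather than something the prober currently does.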

- D. 


Sun HuanHuan

Nov 24, 2012, 10:18:13 AM
to asf...@googlegroups.com
And, Dmitry, you should keep in mind that there are predicate bits in
the encodings.

An instruction like fadd r2,r1,r0 is actually @p7 fadd r2,r1,r0.

Predicate register 7 is always true (a predicate register, or pr, as
opposed to a regular register, or rr, as in the -maxrregcount option for
ptxas). So it's @true fadd r2,r1,r0, and hence just fadd r2,r1,r0.




Dmitry N. Mikushin

Nov 24, 2012, 2:21:48 AM
to asf...@googlegroups.com
So, you mean the extra bits in AsFermi's JMP encoding are the predicate register?

- D.


Sun HuanHuan

Nov 24, 2012, 10:24:57 AM
to asf...@googlegroups.com
It may be!

I remember that some pages in the asfermi wiki show that they (the predicate bits) are 3 bits wide.


Hou Yunqing

Nov 24, 2012, 3:29:58 AM
to asfermi Google Group
Hi guys,

Now I understand what you are doing, D.
1110011110111000000000000000000000000000000000000000000000000000
1110 011110 1110 00000000000000000000000000000000000000000000000000
Bit field 10:13 is predicate. Bit 13 is the negation bit, so 1110 means P7=true, without negation. Some predicate fields are 3-bit because they do not support negation.
As for the field 4:9, I left it as 011110 because ptxas always creates this combination. I thought it's easier to follow what ptxas did.
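
The split above can be reproduced with a short Python sketch. It assumes the convention used in this message, i.e. that the leftmost character of the string is bit 0, so a field's numeric value is obtained by reversing its characters. The function names are made up for illustration.

```python
def field(bits, first, last):
    """Numeric value of bit field first:last (inclusive); bits[i] is bit i."""
    chunk = bits[first:last + 1]
    return int(chunk[::-1], 2)  # reverse so the highest bit comes first

def predicate(bits):
    """Decode field 10:13: three bits of register number plus a negation bit."""
    value = field(bits, 10, 13)
    return {"register": value & 0b111, "negated": bool(value >> 3)}

# AsFermi's JMP pattern from this thread.
jmp = "1110011110111" + "0" * 51

if __name__ == "__main__":
    print(jmp[0:4], jmp[4:10], jmp[10:14])
    print(predicate(jmp))
```

For JMP this yields predicate register 7 without negation, matching the explanation above.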

If you want to automate things, well... I'm not sure, because it's very likely that not all code that cuobjdump recognizes is actually valid. 
But a few things may be automated:
  1. Finding out the opcode for each instruction
  2. Finding out all modifier groups and possible members under each group
  3. Finding out the number of operands needed for each instruction and their types
  4. Finding out the distribution of bit fields (like which field is for the first operand, and which for the second etc.)
Complete automation may be difficult, because there are really a lot of possibilities. I've encountered, for example, a weird and stubborn opcode bit that lies outside of the opcode fields (it's probably not really an opcode bit, but it surely acted like one!). Also, the fact that some modifiers are treated as "default" and are not displayed by cuobjdump in many situations may give you some slight trouble, though not much. There is also a lot of other stuff like operators (plus, minus, negation, etc.)... it's just a lot to code for if you want to automate everything.

Also, there's still the problem that the combination of certain operands, or modifiers, may not be valid to the hardware decoder but may be recognised by cuobjdump. So some manual work may still be needed to check the code that ptxas generates.

Anyway, I think the four things I listed above are still doable, but most instructions would still require some manual effort.
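
As a rough illustration of point 1, here is a Python sketch of a greedy search for the bits that a mnemonic really depends on. Everything in it is hypothetical: `toy_disassemble` is a toy stand-in for running cuobjdump on a patched binary, and the two opcode masks are invented.

```python
# Toy stand-in for cuobjdump: a mnemonic is recognized when all bits of its
# (made-up) required mask are set. More specific masks are tried first.
TOY_OPCODES = {"JMP": 0b0111, "RED": 0b0101}

def toy_disassemble(word):
    by_specificity = sorted(
        TOY_OPCODES.items(),
        key=lambda item: bin(item[1]).count("1"),
        reverse=True,
    )
    for name, mask in by_specificity:
        if word & mask == mask:
            return name
    return None

def minimize(word, disassemble, width=64):
    """Clear every set bit whose removal leaves the decoded mnemonic unchanged."""
    name = disassemble(word)
    for bit in range(width):
        if not (word >> bit) & 1:
            continue  # bit already clear, nothing to try
        candidate = word & ~(1 << bit)
        if disassemble(candidate) == name:
            word = candidate  # this bit was not needed for the mnemonic
    return word

if __name__ == "__main__":
    print(bin(minimize(0b11100111, toy_disassemble, width=8)))
```

A greedy pass like this only says which bits the disassembler needs in order to print the same name. It cannot tell whether the resulting word is valid for the hardware, and its result may depend on the order in which bits are tried.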

Yunqing

Sun HuanHuan

Nov 24, 2012, 3:43:19 AM
to asf...@googlegroups.com
Oh! Thank you Yunqing for your great reply!!

really great!

Dmitry N. Mikushin

Nov 24, 2012, 3:44:48 AM
to asf...@googlegroups.com
Yunqing, Huan,

Thank you for a lot of useful info!

Still, I don't completely understand why you always add a positive predicate. Is it necessary?

Right, I want to do points 1-4 you mentioned, at least for the minimal set of execution/data instructions used in the dynamic loader. What worries me most at the moment is that the automatic prober is not able to find the LEPC instruction on sm_35. If it was removed, that would be a catastrophe! I hope it is somewhere further on, in a region with a dense mask of significant bits, where the automatic prober is very slow due to the very large number of combinations.


Sun HuanHuan

Nov 24, 2012, 3:50:16 AM
to asf...@googlegroups.com
It's because only p7 is true.

So if you want to do things, you have to do:
if (true) do_things();

and, of course, you can also do:
p3 = true;
if (p3) do_things();

But that's a waste of instructions. Using a constant predicate register
avoids having to set an additional predicate register.

Dmitry N. Mikushin

Nov 24, 2012, 3:53:48 AM
to asf...@googlegroups.com
OK, now I see that bits 10-13 specifically are filled in every instruction.

Hou Yunqing

Nov 24, 2012, 5:09:04 AM
to asfermi Google Group
LEPC is listed under the section for Fermi ISA in cuobjdump.pdf
Is there a new section for the real Kepler ISA that lists something similar?

Dmitry N. Mikushin

Nov 24, 2012, 2:56:50 PM
to asf...@googlegroups.com
Right, it's there, and I finally found its code:

0100000000000000000000000000000000000000000000000000000101100001    LEPC


Dmitry N. Mikushin

Nov 24, 2012, 3:04:57 PM
to asf...@googlegroups.com
But cuobjdump.pdf lists only 93 instructions, while the prober has already found 120.

- D.


Hou Yunqing

Nov 24, 2012, 9:49:33 PM
to asfermi Google Group
I remember the 32-bit instructions aren't listed, neither are some straightforward variants such as IADD32I

Dmitry N. Mikushin

Nov 24, 2012, 10:53:05 PM
to asf...@googlegroups.com

Dmitry N. Mikushin

Nov 24, 2012, 10:59:30 PM
to asf...@googlegroups.com
Dear all,

In order to perform factorization (identifying the bits responsible for different parts of an instruction) with reasonable efficiency, I need to make some assumptions that reduce the search complexity.

Please tell me what you think: if we take the codes for all base instructions (i.e. without suffixes, and with zero arguments and predicates) and OR them all together, would it be correct to assume that the bits responsible for all other parts can only be located outside of this mask, for all instructions?
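
To illustrate what I mean, here is a minimal Python sketch of the proposed assumption. It uses only three sm_35 codes from the listing above (BRA, EXIT, JMP), written compactly from their trailing bits; the helper names are invented for this example, and positions are string indices counted from the left.

```python
def occupied_positions(codes):
    """Positions set in at least one base code, i.e. the OR of all codes."""
    width = len(codes[0])
    return [i for i in range(width) if any(code[i] == "1" for code in codes)]

def free_positions(codes):
    """Positions that are zero in every base code: candidates for other fields."""
    width = len(codes[0])
    return [i for i in range(width) if all(code[i] == "0" for code in codes)]

SM35_BASE = {
    "BRA": "0" * 60 + "1000",
    "EXIT": "0" * 59 + "11000",
    "JMP": "0" * 55 + "100001000",
}

if __name__ == "__main__":
    codes = list(SM35_BASE.values())
    print(occupied_positions(codes))
    print(len(free_positions(codes)))
```

Under the assumption, suffix, argument and predicate bits would be searched for only among the free positions, which shrinks the search space as more base codes are added.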

Thanks,
- D.


Hou Yunqing

Nov 24, 2012, 11:37:20 PM
to asfermi Google Group
I don't get what you mean by "base instructions"... do you mean things like RET that don't take any operands or modifiers? If you are sure those instructions don't accept any modifiers, then I think it's worth a try.

But I think it's much easier to start with some manual work. I've noticed, for example, the control flow instructions have 60:63 = 1000
There's something similar in Fermi ISA:

For example, in bit field 0:3,
0: 0000: 32-bit floating op
1: 1000: 64-bit floating op
2: 0100: 32-bit immediate operand op
3: 1100: integer op
4: 0010: miscellaneous (MOV, conversion, NOP, LEPC, pop count etc)
5: 1010: memory op
6: 0110: texture/constant mem ops
7: 1110: control flow
See here.
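
The Fermi table above translates directly into a lookup on the first four characters of the bit string. The Python sketch below is just that lookup; the dictionary and function names are made up, and the category labels are shortened from the table.

```python
# Bit field 0:3 of a Fermi (sm_20) encoding, as listed in the table above.
FERMI_CATEGORIES = {
    "0000": "32-bit floating op",
    "1000": "64-bit floating op",
    "0100": "32-bit immediate operand op",
    "1100": "integer op",
    "0010": "miscellaneous",
    "1010": "memory op",
    "0110": "texture/constant mem op",
    "1110": "control flow",
}

def fermi_category(bits):
    """Classify an encoding by bit field 0:3 (the four leftmost characters)."""
    return FERMI_CATEGORIES.get(bits[:4], "unknown")

if __name__ == "__main__":
    jmp = "111" + "0" * 61                # JMP from the sm_20 listing
    mov = "0010" + "0" * 55 + "10100"     # MOV from the sm_20 listing
    print(fermi_category(jmp), "/", fermi_category(mov))
```

Patterns not covered by the table, such as those of the 32-bit _N instructions, fall through to "unknown".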

In sm_35 it doesn't seem so obvious, though.

I was saying it's faster to start with some manual work, like this:
say you have an instruction MOV R23, R45;
You can start right away by looking for 11101[000] and 101101[00]
If they are found, you can be sure of the location of the bit fields for the first two operands of MOV
And it's very likely that most other instructions will follow a similar arrangement
You can also use cubinEditor to modify some bits by hand and see how the disassembly of cuobjdump changes.
This surely will be enough to give you some good ideas about how the bit fields are distributed.
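
The register search can be sketched in Python as follows. The bit patterns are written with the lowest bit first, as in the MOV example above; the helper names are invented, and the sample encoding is a made-up 16-bit word used only to show the mechanics.

```python
def reg_pattern(number, width):
    """Register number as a bit string, lowest bit first, padded to `width` bits."""
    return format(number, "0{}b".format(width))[::-1]

def find_all(bits, pattern):
    """All start positions of `pattern` in `bits`, including overlapping ones."""
    hits = []
    start = bits.find(pattern)
    while start != -1:
        hits.append(start)
        start = bits.find(pattern, start + 1)
    return hits

if __name__ == "__main__":
    print(reg_pattern(23, 5), reg_pattern(23, 8))  # R23
    print(reg_pattern(45, 6), reg_pattern(45, 8))  # R45
    # Made-up 16-bit word with R23 placed at position 4, for illustration only.
    toy = "0000" + reg_pattern(23, 8) + "0000"
    print(find_all(toy, reg_pattern(23, 8)))
```

If the same pattern shows up at the same position across several instructions, that position is a good candidate for an operand field.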

Hou Yunqing

Nov 24, 2012, 11:42:17 PM
to asfermi Google Group
You can look into nv's compiled libraries (hundreds of MB in size?) for a lot of real examples

Dmitry N. Mikushin

Nov 27, 2012, 12:55:19 AM
to asf...@googlegroups.com
Hi Yunqing,

I tried to follow your idea for identifying register positions, but it does not work quite well for me, neither on sm_20 nor on sm_35. For example, when probing for R27 (0b11011), there are matches for only a very few instructions, and even among those few, R27 never occurs in any position other than the last argument:

0000000000000000000000000000000000000000000000000110110000100010    @P0 FCCO.DP1Z P0, P0, R0, R0, R27


On sm_35, R208 (0b11010000) can be seen (also in very few instructions, and also only as the last argument), but only when preceded by an extra 1 bit. Example:

0100000000000000000000000000000000000000000000101111010000000000    @P0 VSET.F.S8.U8.MRG_16H R0.CC, R0, R0, R208


So, it seems the proposed method misses something, probably some extra signaling bits. Are you sure that operating on registers does not require setting extra "mode" flag bits?

Hou Yunqing

Nov 27, 2012, 3:48:02 AM
to asfermi Google Group
Huh? Looks to me you've already found the fields (of the last register operand, of FCCO and VSET)?

0000000000000000000000000000000000000000000000000 11011000 0100010 FCCO
                                                  last reg

010000000000000000000000000000000000000000 00001011 11010000000000 VSET
                                           last reg
What is the problem??

Anyway, if things really seem too confusing, I'll be able to look into it this Friday.

Yunqing

Dmitry N. Mikushin

Nov 27, 2012, 12:08:35 PM
to asf...@googlegroups.com
> What is the problem??

There is no problem finding the last register argument of these particular instructions. I'm saying they are among the *few* for which such probing worked. So the question is why it does not work for other instructions and for other arguments.

- D.


Hou Yunqing

Nov 28, 2012, 3:48:51 AM
to asfermi Google Group
I see...

Hou Yunqing

Nov 30, 2012, 4:19:15 AM
to asfermi Google Group
Hi D.,

Maybe I'm still a bit lost regarding the problem... can you give me an example where the approach I described fails?

I did a bit of work on MOV; here's what I got:
    MOV[.S] reg0, reg1 [,cst];
01 00000000 00000000 1110 0 00000000 00000000000 1111 00000000 000000011 1
  na reg0              pr   m reg1                 cst           nb        m2

    MOV[.S] reg0, reg1 [,cst];
01 00000000 00000000 1110 0 00000000000000 00000 1111 00000000 000000011 0
na reg0              pr   m const addr     bank  cst           nb        m2

const addr = real addr/4, bytes not addressable
        m: 0 default
           1 .S
m2:0 use constant memory
  1 use reg 1
cst: a hex number <= 0xe
    1111 = 0xf = hidden default

I didn't see any weird mode stuff here... though the composite operand for MOV no longer accepts an immediate value, so there is only a single mode bit (m2) for the composite operand here.

Yunqing

Hou Yunqing

Nov 30, 2012, 4:21:05 AM
to asfermi Google Group
Oops one error in the previous email. The second 
    MOV[.S] reg0, reg1 [,cst];
should be
    MOV[.S] reg0, c[bank][const addr] [,cst];