Kepler shared much a lot of instructions with fermi. only a few new
instruction were added.
HuanHuan
I don't have much information. you can try the new 301 driver (4.2
driver) and 4.2 toolkit from
http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_win_64.msi
(RC) in which some information for sm_30 may be revealed.
Compiler assisted scheduling is always there: the cubin for our good old
sm_21 fermi have always two independent instructions (address end with
0x*0 and the next with 0x*8.
Some one from NV told me kepler shared much a same instruciton base
while I actually didn't test for it yet...
Huan
Would you please do some ptxas jobs and then use cuobjdump to see if the
ISA or the binary encodings changed or not?
Thank you!
Huan
They say programms run well under 4.2 RC, and,
please try the followings,
only some new SIMD instructions added. see the new PTX 3.0 manual (which
is different from the same version 3.0 manual from cuda 4.1)
>
> Here's the code generated for this function:
> __global__ void k(int* output){*output=0x123;}
>
> code for sm_30
> Function : _Z1kPi
> //notice that 0x0000 is not disassembled. I haven't checked what
> instruction it is.
> /*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44];
> /*0010*/ /*0x00001de428004005*/ MOV R0, c [0x0] [0x140]; //launch
> argument starts at 0x140, it seems
> /*0018*/ /*0x8c009de218000004*/ MOV32I R2, 0x123;
> /*0020*/ /*0x00009c8590000000*/ ST [R0], R2;
> /*0028*/ /*0x00001de780000000*/ EXIT;
> /*0030*/ /*0xe0001de74003ffff*/ BRA 0x30;
> /*0038*/ /*0x00001de440000000*/ NOP CC.T;
> .......................
>
> So... sm_30 really is Kepler? This is weird because nvcc doesn't work with
> sm_31... The released Kepler is GK104, which should be sm_31... but nvcc
> doesn't support sm_31 and supports sm_30 instead... well, that's as far as
> my speculation should go. asfermi needs to be updated before it can support
> Kepler, but I don't imagine that I can do it in any time soon.
It's beause sm_31 (the former sm_23) is powerful than sm_30 (the former
sm_22). sm_30 remove much support for 32-bit int and double (reduced to
4% performance) and logical opertions. the 32-bit mad may be implemented
via float in micro-code (it took 24 cycles!) (like the good old days we
have no 8087 so we patch them by ALU?).
Only the float performance remained unchanged making sm_30 the great
rubbish for general non-float processing and yet a good card for gaming.
So it should be sm_30, with his big brother sm_31 on the way.
As many instructions have a throught put of 32*n + 8 form.
So i guess integer mul/mad is NOT implment in micro-code. we have only 8
of 192SP's which are fully implemented. while the reset 184 are lame ones...
On 3/27/2012 7:19 PM, Hou Yunqing wrote: