I want to analyze PTX code to gather its instruction mix. Can Ocelot help me with this?
Or should I just write a program myself that counts the instructions?
For example, if the loop below is executed nk times, how many add, mul, and ld/st instructions are executed, respectively?
BB0_3:
add.s32 %r21, %r23, %r3;
mul.wide.s32 %rd6, %r21, 4;
add.s64 %rd7, %rd2, %rd6;
ld.global.f32 %f7, [%rd7];
mul.f32 %f8, %f7, %f4;
mad.lo.s32 %r22, %r23, %r6, %r1;
mul.wide.s32 %rd8, %r22, 4;
add.s64 %rd9, %rd3, %rd8;
ld.global.f32 %f9, [%rd9];
fma.rn.f32 %f10, %f8, %f9, %f10;
st.global.f32 [%rd1], %f10;
add.s32 %r23, %r23, 1;
setp.lt.s32 %p5, %r23, %r7;
@%p5 bra BB0_3;
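
To make the "just count the instructions" idea concrete, here is a minimal sketch in Python. It assumes the loop body is straight-line code in which every instruction runs on every iteration, so the dynamic count is simply the static count multiplied by the trip count. The names `count_opcodes`, `LOOP_BODY`, and the value `nk = 100` are hypothetical and only serve the illustration; this is not how Ocelot does it.

```python
import re
from collections import Counter


def count_opcodes(ptx_text):
    """Count base opcodes (the part before the first '.') in PTX text.

    Assumption: one instruction per line, labels on their own line.
    """
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()  # drop trailing comments
        # Skip blank lines, labels such as "BB0_3:", and directives (".reg", ...).
        if not line or line.endswith(":") or line.startswith("."):
            continue
        # Drop an optional guard predicate such as "@%p5" or "@!%p5".
        line = re.sub(r"^@!?%\w+\s+", "", line)
        opcode = line.split()[0].rstrip(";")
        counts[opcode.split(".")[0]] += 1
    return counts


LOOP_BODY = """
BB0_3:
add.s32 %r21, %r23, %r3;
mul.wide.s32 %rd6, %r21, 4;
add.s64 %rd7, %rd2, %rd6;
ld.global.f32 %f7, [%rd7];
mul.f32 %f8, %f7, %f4;
mad.lo.s32 %r22, %r23, %r6, %r1;
mul.wide.s32 %rd8, %r22, 4;
add.s64 %rd9, %rd3, %rd8;
ld.global.f32 %f9, [%rd9];
fma.rn.f32 %f10, %f8, %f9, %f10;
st.global.f32 [%rd1], %f10;
add.s32 %r23, %r23, 1;
setp.lt.s32 %p5, %r23, %r7;
@%p5 bra BB0_3;
"""

static = count_opcodes(LOOP_BODY)  # instructions per iteration
nk = 100  # hypothetical trip count
dynamic = {op: n * nk for op, n in static.items()}

print(static["add"], static["mul"], static["ld"], static["st"])  # prints: 4 3 2 1
```

Under these assumptions the loop executes 4*nk add, 3*nk mul, 2*nk ld, and nk st instructions. The mad and fma instructions (one each per iteration) are counted under their own opcodes here; whether they should also count toward mul and add is a choice the sketch leaves open.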