any news about the control opcode in sm_30?


Peng Di

May 17, 2013, 1:11:25 AM
to asf...@googlegroups.com
Hi,
If we run cuobjdump on an sm_30 cubin, the ELF dump shows more instructions than the SASS dump. Every 7 instructions, an extra 0x...7 and 0x2.... is added, and we can insert these by hand. The good news is that this works and the result is correct, but performance may be lower than with code generated by ptxas.
Does anyone know how to insert the right instructions without losing performance? Thanks.
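
For illustration, here is a minimal Python sketch of the hand-insertion step. It assumes the extra word sits in front of each group of 7 instructions and that the 0x...7 and 0x2.... values are the low and high halves of one 64-bit word. The names `interleave_control` and `PLACEHOLDER_CTRL` are hypothetical, and the placeholder value only mirrors the quoted pattern; it is not a verified or tuned encoding.

```python
GROUP_SIZE = 7  # from the thread: one extra word per 7 instructions

# Hypothetical placeholder: only the low nibble (7) and the high nibble (2)
# follow the pattern quoted in the thread; every other bit is left at zero.
PLACEHOLDER_CTRL = 0x2000000000000007


def interleave_control(instructions, ctrl=PLACEHOLDER_CTRL, group_size=GROUP_SIZE):
    """Return a new list with `ctrl` placed before each group of instructions."""
    out = []
    for start in range(0, len(instructions), group_size):
        out.append(ctrl)  # control word first, then the group it covers
        out.extend(instructions[start:start + group_size])
    return out


if __name__ == "__main__":
    sass = [0x1000 + i for i in range(15)]  # stand-in values, not real SASS
    for word in interleave_control(sass):
        print(f"0x{word:016x}")
```

A trailing group shorter than 7 still gets its own control word in this sketch; whether the real tool chain pads such a group is not covered here.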

HuanHuan

May 17, 2013, 4:41:36 AM
to asf...@googlegroups.com
What do you mean when you say cuobjdump shows more instructions than the SASS?

What are you comparing the cuobjdump output to?

The hidden instruction seems to carry latency information. You can
specify the maximum possible latency for the next 7 instructions, but
obviously using the maximum latency will decrease performance.

The right latency for a particular instruction is a secret. NVIDIA has
never released these values to the public.
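
As a rough Python sketch of the "one word describes the next 7 instructions" idea: the layout below is an assumption, not something confirmed in this thread. It takes the 0x...7 and 0x2.... pattern to be the low and high 32-bit halves of a 64-bit word, with the 56 bits in between split into seven 8-bit fields, one per following instruction. `pack_control` and `split_words` are hypothetical helpers, and the field values used are placeholders, since the real per-instruction values are not public.

```python
SLOTS = 7      # one field per following instruction (from the thread)
SLOT_BITS = 8  # assumption: (64 - 4 - 4) bits / 7 slots = 8 bits per field


def pack_control(slot_values):
    """Pack 7 per-instruction fields into one 64-bit control word (assumed layout)."""
    if len(slot_values) != SLOTS:
        raise ValueError(f"expected {SLOTS} slot values, got {len(slot_values)}")
    word = (0x2 << 60) | 0x7  # fixed high nibble 2 and low nibble 7
    for i, value in enumerate(slot_values):
        if not 0 <= value < (1 << SLOT_BITS):
            raise ValueError(f"slot {i} value {value:#x} does not fit in {SLOT_BITS} bits")
        word |= value << (4 + SLOT_BITS * i)  # slot 0 sits just above the low nibble
    return word


def split_words(word):
    """Split a 64-bit word into (low, high) 32-bit halves, as an ELF dump might list them."""
    return word & 0xFFFFFFFF, word >> 32


if __name__ == "__main__":
    same_for_all = [0x00] * SLOTS  # placeholder value, not a real latency code
    low, high = split_words(pack_control(same_for_all))
    print(f"0x{low:08x} 0x{high:08x}")
```

Using the same value in every slot corresponds to the "maximum latency everywhere" approach: simple to generate, but it cannot match per-instruction tuning.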