the "same" cubins behaving total differently.

86 views
Skip to first unread message

Sun HuanHuan

unread,
Mar 31, 2012, 4:08:57 AM3/31/12
to asf...@googlegroups.com
Hi, all,

There are two cubins. one is "good.cubin", generated by ptxas. the other
is "bad.cubin" generated by asfermi.

cuobjdump --dump-sass good.cubin

code for sm_20
Function : f
/*0000*/ /*0x00005de428004404*/ MOV R1, c [0x1] [0x100];
/*0008*/ /*0x8000dde428004000*/ MOV R3, c [0x0] [0x20];
/*0010*/ /*0x00301c8580000000*/ LD R0, [R3];
/*0018*/ /*0x10309c8580000000*/ LD R2, [R3+0x4];
/*0020*/ /*0x00201c0348000000*/ IADD R0, R2, R0;
/*0028*/ /*0x30301c8590000000*/ ST [R3+0xc], R0;
/*0030*/ /*0x00001de780000000*/ EXIT;

and

cuobjdump --dump-sass bad.cubin

code for sm_20
Function : f
/*0000*/ /*0x00005de428004404*/ MOV R1, c [0x1] [0x100];
/*0008*/ /*0x8000dde428004000*/ MOV R3, c [0x0] [0x20];
/*0010*/ /*0x00301c8580000000*/ LD R0, [R3];
/*0018*/ /*0x10309c8580000000*/ LD R2, [R3+0x4];
/*0020*/ /*0x00201c0348000000*/ IADD R0, R2, R0;
/*0028*/ /*0x30301c8590000000*/ ST [R3+0xc], R0;
/*0030*/ /*0x00001de780000000*/ EXIT;

Yes! they are exactly the same.

The common protype is:
__global__ void f(int *p)
{
p[3] = p[0] + p[2];
}

However, good_by_ptxas.cubin will run on almost all cards. and
bad_by_asfermi.cubin will not run all sm_20 cards, and could only run on
GT430/GT520-like low end card.

bad_by_asfermi.cubin will give access violation on GTX480/GTX460 cards.

Any ideas?

Thank you!
Sun HuanHuan

good_by_ptxas.cubin
bad_by_asfermi.cubin

Sun HuanHuan

unread,
Mar 31, 2012, 4:13:43 AM3/31/12
to asf...@googlegroups.com
On 3/31/2012 4:08 PM, Sun HuanHuan wrote:
> Hi, all,
>
> There are two cubins. one is "good.cubin", generated by ptxas. the other
> is "bad.cubin" generated by asfermi.
>

>

> The common protype is:
> __global__ void f(int *p)
> {
> p[3] = p[0] + p[2];

+P[1];
> }

typo.
:(

Sun HuanHuan

unread,
Mar 31, 2012, 6:11:01 AM3/31/12
to asf...@googlegroups.com
Hi all,

WOW! It's resolved!

Thank HYQ the great who helped in figuring out where the problem is.
Actually the two cubins are not the same.

I missed the !param which seemed to be some info for parameter
size/space. I added !param 4 and now it worked.

And,

LD/ST/RED/ATOM has nothing to do with SP! So please ignore the post
yesterday in which I think LD relies on SP which is not corret.

HuanHuan
:)

jayvant....@gmail.com

unread,
Jul 7, 2013, 12:13:59 AM7/7/13
to asf...@googlegroups.com
Hi,

I ran into a similar problem and got it fixed by inserting Param directive.
But I am not sure if I have fully understood the directive.
For example, for a kernel with prototype as
__global__ void k(int n0, int n1, int* g)

the directives I used are
!Param 4
!Param 4
!Param 8

and it worked
But for
__global__ void(int n0, int* g)
!Param 4
!Param 8
did not work,
but
!Param 8 2
worked.

How exactly do we specify Param attribute.
I thought listing them in the order in which they appear in the prototype, with the size of the parameter should work.
Do I need to take into account the alignment i.e. if the first one is 4 bytes and the second 8 bytes then
!Param 8
!Param 8

is how I need to specify?
BTW, I am using 64 bit machine.

Also do we have to insert other pragmas also? Are they mandatory?
Basically I am looking at some automated way of taking output of cudaobjdump -sass and then pass it on to asfermi, without having to
manually add various Pragmas.

Thanks. and regards,
Jayvant

Dmitry N. Mikushin

unread,
Jul 7, 2013, 9:59:23 AM7/7/13
to asf...@googlegroups.com
Hi Jayvant,

I think you are referring to the different possible ways of how kernel arguments could be aligned. You can find the detailed info with cuLaunchKernel call of CUDA Driver API (cite):

1) Kernel parameters can be specified via kernelParams. If f has N parameters, then
kernelParams needs to be an array of N pointers. Each of kernelParams[0] through
kernelParams[N-1] must point to a region of memory from which the actual kernel
parameter will be copied. The number of kernel parameters and their offsets and sizes
do not need to be specified as that information is retrieved directly from the kernel's
image.
2) Kernel parameters can also be packaged by the application into a single buffer that
is passed in via the extra parameter. This places the burden on the application of
knowing each kernel parameter's size and alignment/padding within the buffer. Here is
an example of using the extra parameter in this manner:

For pragmas - does you kernel argument list stays the same for all kernels you're supplying to asfermi? Or why could they be omitted? There are other ways to generate CUBIN ELF. For instance, you can generate placeholder-cubin (e.g. with nvcc) and then rewrite its contents with your asfermi-generated binary opcodes.

- D.

--
You received this message because you are subscribed to the Google Groups "asfermi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to asfermi+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

jayvant....@gmail.com

unread,
Jul 23, 2013, 6:27:37 AM7/23/13
to asf...@googlegroups.com
Hi Dmitry,

Actually I am looking for the directives to be included in a sass file which is input to Asfermi.

To understand how Asfermi works I did the following:
1. compiled a cuda program with -cubin to generate .cubin file
2. using cuobjdump generated sass output from the .cubin file
3. Removed all the non-instruction lines from the sass file (i.e. code for ..., Function...)
4. Added !Kernel <kernelName> in the beginning and !EndKernel at the end
5. Passed this modified sass file to Asfermi with arguments -sm_20 -64
6. Used the Asfermi generated .cubin in the original code to load the kernel and execute it.
7. This failed. So used the original .cubin file and the program worked fine.
8. After going thru some of the posts on this forum, I added !Param directive to the sass file (as explained in my prev post) and then passed to asfermi
9. This worked for couple of examples (very small) but did not work on some others.

I am still struggling to understand
1. which directives are necessary
2. how exactly to add !Param directive

For example, here is one prototype I am having problem with
__global__ void k(int n0, const int* a, const int* b, const int* c, const int* d, const int* e, const int* f, int* g)
I used the param directive as follows:
!Param 4
!Param 8 7

regards,
Jayvant

Dmitry N. Mikushin

unread,
Jul 23, 2013, 7:48:42 AM7/23/13
to asf...@googlegroups.com
Hi Jayvant,

I was trying to explain that param specification depends not only on
the function prototype, but also on the way how you are executing the
kernel. For instance:

> __global__ void k(int n0, const int* a, const int* b, const int* c, const
> int* d, const int* e, const int* f, int* g)

May need

> !Param 4
> !Param 8 7

If parameters are packed (cuLaunchKernel doc, case 2), or

> !Param 8 8

If parameters are naturally-aligned, i.e. first 4-byte arg is padded
by 4-byte spacing (cuLaunchKernel doc, case 1).

So how are you launching the kernel?

FWIW, here's an example of SASS source + kernel loader:
https://hpcforge.org/scm/viewvc.php/branches/0.2/src/libasfermi/tests/sum_kernel/?root=kernelgen

- D.

2013/7/23 <jayvant....@gmail.com>:

Hou Yunqing

unread,
Jul 23, 2013, 7:52:04 AM7/23/13
to asfermi Google Group
Hi Jayvant,

I can't see exactly what went wrong. You can try the following:

1. Make sure your cuParamSetSize and cuParamSetv are correctly invoked. When I get a lot of parameters I always mess up these calls
2. Check to see if your kernels use shared, local or constant memory. If they do, the relevant directives need to be used. NV's compiler does tend to put some values in constant memory from time to time, so this is something to look for.
3. Check the return value of each API call. It helps to debug. Here's some helper code you can copy.
4. If nothing works, send your code to us. It could be asfermi's bug.

Regards,
Yunqing
Reply all
Reply to author
Forward
0 new messages