I assembled the following code into a cubin using asfermi:
!Machine 32
!Kernel f
MOV R0, c [0x0] [0x20];
LD R2, [R0];
Exit;
!EndKernel
And I got:
code for sm_21
Function : f
/*0000*/ /*0x80001de428004000*/ MOV R0, c [0x0] [0x20];
/*0008*/ /*0x00009c8580000000*/ LD R2, [R0];
However, Nsight reports an access violation on shared memory!
Am I doing anything wrong?
Host code:
CUcontext context;
CUmodule module;
CUfunction kernel;
CUdeviceptr dev_p;
size_t size = sizeof(dev_p); /* CU_LAUNCH_PARAM_BUFFER_SIZE expects a pointer to size_t */
cuInit(0);
cuCtxCreate(&context, 0, 0);
cuModuleLoad(&module, cubin);
cuModuleGetFunction(&kernel, module, "f");
cuMemAlloc(&dev_p, 1024);
void *config[] =
{
CU_LAUNCH_PARAM_BUFFER_POINTER, &dev_p,
CU_LAUNCH_PARAM_BUFFER_SIZE, &size,
CU_LAUNCH_PARAM_END
};
cuLaunchKernel(kernel, 256, 1, 1, 1024, 1, 1, 0, 0, NULL, config);
cuCtxSynchronize();
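A minimal sketch, in plain C with no CUDA dependency, of how the buffer passed through CU_LAUNCH_PARAM_BUFFER_POINTER lines up with the c [0x0] [0x20] operand in the kernel. It assumes the sm_2x convention that kernel parameters start at offset 0x20 of constant bank 0, and that each argument is aligned to its own size. pack_param and PARAM_BASE are hypothetical names for illustration only, not part of the driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: on sm_2x, kernel parameters begin at c[0x0][0x20]. */
#define PARAM_BASE 0x20u

/* Hypothetical helper: copy one argument into a packed parameter buffer,
 * rounding the write position up to `align`, and return the constant-bank
 * offset at which the kernel would read it. `used` tracks the bytes
 * consumed so far; its final value is what the launch would report as
 * the buffer size. */
static unsigned pack_param(unsigned char *buf, size_t cap, size_t *used,
                           const void *arg, size_t size, size_t align)
{
    size_t offset = (*used + align - 1) / align * align;
    assert(offset + size <= cap);   /* the buffer must be large enough */
    memcpy(buf + offset, arg, size);
    *used = offset + size;
    return PARAM_BASE + (unsigned)offset;
}
```

With !Machine 32 the device pointer occupies one 32-bit word, so packing it first gives offset 0x20, which matches the MOV in the kernel above. A second 32-bit argument would land at 0x24 under the same assumptions.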
HuanHuan
I did the following tests:
a) run kernel directly (failed)
b) run kernel under Nsight (ok)
c) run kernel under Nsight, then run kernel directly (ok)
d) run kernel under Nsight, then directly (ok), then directly again (failed)
So LD may rely on something else.
I set up the stack pointer (MOV R1, c [0x1] [0x100]; pre-3.0 ABI). Now
it can run with Nsight.
So:
1) If the kernel is run under Nsight, Nsight may automatically set up
the stack or do other ABI setup for the kernel.
2) The register contents may not be cleared after a kernel terminates,
so on the next run R1 would still hold a valid SP.
I did some further testing and found that the LD and ATOM instructions
rely on R1 (SP).
Sun HuanHuan
Please check this! And thanks to hyq. :)
https://groups.google.com/forum/?fromgroups#!topic/asfermi/F_xm6vhq0po