Bizarre CUBIN kernels loading success/failure

Dmitry N. Mikushin

Oct 28, 2012, 10:21:49 PM
to asf...@googlegroups.com
Hi Yunqing & Gentlemen,

May I ask for your help analyzing the following strange case? Attached
is a cubin image I linked from two cubins (CUDA 5.0 nvlink), one
produced by asfermi and the other by ptxas. Most of the kernels are
similar and differ only in the number of registers. There are also
several unique kernels and data fields. In general the image looks
consistent, but here is the strange thing: cuLaunchKernel is able to
launch only every third kernel; the others fail with
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES. The pattern is regular; here is a
table:

142 46 0000fc18 46 0xa2ef90 Success
143 14 0000fc98 14 0xa311f0 Fail
144 30 0000fd18 30 0xa333a0 Fail
145 62 0000fd98 62 0xa35550 Success
146 22 0000fe18 22 0xa37700 Fail
147 54 0000fe98 54 0xa398b0 Fail
148 6 0000ff18 6 0xa3ba60 Success
149 38 0000ff98 38 0xa3dc10 Fail
150 26 00010018 26 0xa3fdc0 Fail
151 58 00010098 58 0xa41f70 Success
152 10 00010118 10 0xa44120 Fail
153 42 00010198 42 0xa462d0 Fail
154 2 00010218 2 0xa48480 Success
155 34 00010298 34 0xa4a630 Fail
157 50 00010398 50 0xa4d3c0 Success
158 18 00010418 18 0xa4f590 Fail
159 4 00010498 4 0xa51740 Fail
160 36 00010518 36 0xa538f0 Success
161 52 00010598 52 0xa55aa0 Fail
162 20 00010618 20 0xa57c50 Fail
163 44 00010698 44 0xa59e00 Success
164 12 00010718 12 0xa5bfb0 Fail
165 28 00010798 28 0xa5e160 Fail
166 60 00010818 60 0xa60310 Success
167 24 00010898 24 0xa624c0 Fail
168 56 00010918 56 0xa64670 Fail
169
170 8 00010ad8 8 0xa68b70 Fail
171 40 00010b58 40 0xa6ad20 Fail
172 48 00010bd8 48 0xa6ced0 Success
173 16 00010c58 16 0xa6f080 Fail
174 0 00010cd8 0 0xa71230 Fail
175 32 00010d58 32 0xa733e0 Success
176 47 00010dd8 47 0xa75590 Fail
177 15 00010e58 15 0xa77740 Fail
178 31 00010ed8 31 0xa798f0 Success
179 63 00010f58 63 0xa7baa0 Fail
181 23 000235e0 23 0xaa4900 Success
182 55 00023660 55 0xaa6b40 Fail
183
184 7 000237a0 7 0xaa9950 Success
185 39 00023820 39 0xaabb20 Fail
186 3 000238a0 3 0xaadcd0 Fail
187 35 00023920 35 0xaafe80 Success
188 51 000239a0 51 0xab2030 Fail
189 19 00023a20 19 0xab41e0 Fail
190 27 00023aa0 27 0xab6390 Success
191 59 00023b20 59 0xab8540 Fail
192 43 00023ba0 43 0xaba6f0 Fail
193 11 00023c20 11 0xabc8a0 Success
194 25 00023ca0 25 0xabea50 Fail
195 57 00023d20 57 0xac0c00 Fail
196 9 00023da0 9 0xac2db0 Success
197 41 00023e20 41 0xac4f60 Fail
198 1 00023ea0 1 0xac7110 Fail
199 33 00023f20 33 0xac92c0 Success
200 49 00023fa0 49 0xacb470 Fail
201 17 00024020 17 0xacd620 Fail
202 45 000240a0 45 0xacf7d0 Success
203 13 00024120 13 0xad1980 Fail
204
205 29 00024260 29 0xad4790 Success
206 61 000242e0 61 0xad6960 Fail
207 5 00024360 5 0xad8b10 Fail
208 37 000243e0 37 0xadacc0 Success
209 21 00024460 21 0xadce70 Fail
210 53 000244e0 53 0xadf020 Fail

The number on the left is the ordinal of the kernel in the cubin byte
stream. The second column is the number of registers (repeated in the
fourth column), the third is the kernel's address in the cubin, and the
fifth is its address at runtime.
Do you have any idea why kernels would succeed and fail to launch in
such a strangely regular way?
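
The regularity in the table can be checked mechanically. Below is a minimal sketch, assuming the rows are available as plain text in the column order of the table; `parse_rows` and `residues` are hypothetical helper names, and only a handful of rows copied from the table are embedded:

```python
# Rows copied from the table above: ordinal, registers, cubin address,
# registers (repeated), runtime address, launch status.
ROWS = """\
142 46 0000fc18 46 0xa2ef90 Success
143 14 0000fc98 14 0xa311f0 Fail
144 30 0000fd18 30 0xa333a0 Fail
145 62 0000fd98 62 0xa35550 Success
146 22 0000fe18 22 0xa37700 Fail
147 54 0000fe98 54 0xa398b0 Fail
181 23 000235e0 23 0xaa4900 Success
182 55 00023660 55 0xaa6b40 Fail
184 7 000237a0 7 0xaa9950 Success
"""


def parse_rows(text):
    """Yield (ordinal, registers, status) for every complete row."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:  # skip incomplete rows such as a bare ordinal
            continue
        yield int(fields[0]), int(fields[1]), fields[5]


def residues(text, status, modulus=3):
    """Return the set of ordinal % modulus over all rows with this status."""
    return {o % modulus for o, _, s in parse_rows(text) if s == status}


print("Success:", residues(ROWS, "Success"))
print("Fail:   ", residues(ROWS, "Fail"))
```

For the rows embedded here, every successful ordinal is congruent to 1 modulo 3 and no failing one is, while the register counts of the two groups show no obvious split. That hints that the failures depend on a kernel's position in the image rather than on its resource usage.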

Attachments: the problematic cubin and a loader program that
demonstrates the failures.

Thanks,
- Dima.
c8cfd690.cubin.tar.gz
loader.cu

Dmitry N. Mikushin

Oct 28, 2012, 10:36:52 PM
to asf...@googlegroups.com
For comparison, let me also attach the original cubin file produced by
asfermi, before linking. It clearly shows that Yunqing's original ELF
packer naturally avoids the issue mentioned above by interleaving each
kernel's text with its corresponding info section data. So one
workaround would be to mimic the same layout in the linked cubin and
see if it works. But this method introduces a lot of additional work
adjusting relocation addresses, so it would be better to figure out and
fix the actual problem.

- D.

b919982f.cubin.tar.gz

Dmitry N. Mikushin

Nov 8, 2012, 6:43:15 PM
to asf...@googlegroups.com
Does it make sense that asfermi emits each kernel's sections in the
order text-constant0-info-shared-local, while CUDA 5.0 currently uses
the order info-constant0-text-shared-local?

Does it make sense that asfermi includes the size of the
.nv.info.kernel section in the size recorded in the program header,
while CUDA 5.0 does not?

Dmitry N. Mikushin

Nov 8, 2012, 7:02:17 PM
to asf...@googlegroups.com
So far, the nvlink problem is most likely related to a malformed program header:

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOOS+8732e00 0x747274732e006261 0x746d79732e006261 0x747865742e006261
0x676c656e72656b2e 0x6f6c6c616d5f6e65 WE 2e747865742e0063
LOOS+e72656b 0x5f7869736f705f6e 0x6e67696c616d656d 0x6b2e747865742e00
0x6e65676c656e7265 0x742e00656572665f R E 41695f5f2e747865
LOOS+96d6f74 0x6b2e747865742e00 0x6e65676c656e7265 0x6c616374736f685f
0x2e747865742e006c 0x65676c656e72656b WE 68636e75616c5f6e

A good one should look like:

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000002440 0x0000000000000000 0x0000000000000000
0x00000000000000a8 0x00000000000000a8 R E 8
LOAD 0x00000000000009b8 0x0000000000000000 0x0000000000000000
0x0000000000000488 0x0000000000000488 R E 8
LOAD 0x0000000000001000 0x0000000000000000 0x0000000000000000
0x0000000000001000 0x0000000000001017 RW 8

Something goes wrong with the header if an asfermi-generated kernel is linked in.
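
One way to see what the bad header contains is to read its fields as raw bytes. Below is a minimal sketch, assuming the cubin is a little-endian ELF64 file and taking the values of the first bad entry from the readelf output above; `le_ascii` is a hypothetical helper, and `0x60000000` is the standard ELF value of `PT_LOOS`:

```python
PT_LOOS = 0x60000000  # start of the OS-specific program header type range


def le_ascii(value, width=8):
    """Render an integer's little-endian bytes as text, escaping non-printables."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in raw)


# First bad entry, in Elf64_Phdr field order. readelf prints p_flags only as
# letters ("WE"), so its raw value is not available and is left out here.
first_bad_entry = [
    ("p_type", PT_LOOS + 0x8732e00, 4),
    ("p_offset", 0x747274732e006261, 8),
    ("p_vaddr", 0x746d79732e006261, 8),
    ("p_paddr", 0x747865742e006261, 8),
    ("p_filesz", 0x676c656e72656b2e, 8),
    ("p_memsz", 0x6f6c6c616d5f6e65, 8),
    ("p_align", 0x2e747865742e0063, 8),
]

for name, value, width in first_bad_entry:
    print(f"{name:9s} {le_ascii(value, width)}")
```

The fields decode to fragments such as `.sh`, `.strt`, `.symt`, `.text` and `.kernelg`, which look like pieces of section names. That would be consistent with the program header table offset pointing into a string table instead of at real program headers, although this is only an inference from the dump.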

Hou Yunqing

Nov 9, 2012, 12:31:13 AM
to asfermi Google Group
Hey Dima,

I'm not familiar with CUDA 5.0 and the new linker. If you need, I can take a look at this after my current semester ends...

The order of the kernel sections can be changed by reordering the lines at around line 100 in helperCubin.cpp.
To exclude the size of the info section, you can remove the last part of line 527.

Also, the 'good' program headers you listed use an alignment of 8, but asfermi consistently uses an alignment of 4 in the program headers. I remember I left all the alignments as 4 because I started by working on a 32-bit system, and even when I looked at the cubins nvcc generated on 64-bit systems, the program headers still had an alignment of 4... This could have changed in some version of nvcc. You can take a look at the program headers of cubins generated by ptxas 5.0.
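
The alignment in question is the `p_align` field of each program header entry. Below is a minimal sketch for listing it, assuming a little-endian ELF64 image whose `e_phoff` and `e_phnum` are already known (for instance from `readelf -h`); `iter_phdr_align` is a hypothetical helper, not part of asfermi:

```python
import struct

# Elf64_Phdr layout: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (56 bytes in total).
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def iter_phdr_align(image, phoff, phnum):
    """Yield (p_type, p_offset, p_align) for each ELF64 program header."""
    for i in range(phnum):
        p_type, _flags, p_offset, _va, _pa, _fsz, _msz, p_align = (
            ELF64_PHDR.unpack_from(image, phoff + i * ELF64_PHDR.size)
        )
        yield p_type, p_offset, p_align


# Usage (the file name and header values are placeholders):
# with open("kernel.cubin", "rb") as f:
#     image = f.read()
# for p_type, p_offset, p_align in iter_phdr_align(image, 0x2440, 3):
#     print(hex(p_type), hex(p_offset), p_align)
```

Each ELF64 entry is 56 bytes, so three entries occupy 3 × 56 = 168 = 0xa8 bytes, which matches the FileSiz of the PHDR entry in the 'good' listing.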

Yunqing

Dmitry N. Mikushin

Nov 9, 2012, 7:44:04 PM
to asf...@googlegroups.com
Hi Yunqing,

Strangely, when I try to change the region you suggested (and also
another one), the kernels can no longer be launched (error 701,
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES).

Regarding the program headers: when the linker is involved, it operates
on REL (relocatable) objects rather than EXEC. Among other differences,
REL objects do not have program headers. Thus, I commented out the
program header generation part in cubinHelper. I don't know how, but it
seems that when linking, nvlink is in charge of creating proper program
headers, which it fails to do if one of the objects is from asfermi.
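
The REL/EXEC distinction is recorded in the ELF header and can be checked directly. Below is a minimal sketch, assuming a little-endian ELF64 cubin; `read_elf64_header` is a hypothetical helper built on the standard ELF64 header layout, not part of asfermi or the CUDA toolkit:

```python
import struct

# Elf64_Ehdr layout: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ET_NAMES = {1: "REL", 2: "EXEC"}


def read_elf64_header(data):
    """Return the object type and program header location of an ELF64 file."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS: 64-bit, EI_DATA: little-endian
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    (_ident, e_type, _machine, _version, _entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx
     ) = ELF64_HEADER.unpack_from(data)
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "phoff": e_phoff,
        "phentsize": e_phentsize,
        "phnum": e_phnum,
    }


# Usage (the file name is a placeholder):
# with open("linked.cubin", "rb") as f:
#     print(read_elf64_header(f.read(64)))
```

For a REL object one would expect `phnum` to be 0. For the linked EXEC image, `phoff` should point at `phnum` entries of `phentsize` bytes each, so a `phoff` that lands anywhere else would produce garbage entries.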

- D.
