Theano Python2.4 buildbot Fail=0 Err=1 Ran=2349 Skip=32 KnownFail=16

li...@iro.umontreal.ca

Aug 22, 2012, 6:58:25 AM

to theano-...@googlegroups.com

Summary of the output:

File "/Tmp/nightly_build/Theano/theano/tensor/tests/test_opt.py", line 997, in test_gpu_fusion
Exception: Exception: nvcc return status

Full output:

/u/lisa/.virtualenvs/py2.4/lib/python2.4/site-packages/nose/plugins/manager.py:383: RuntimeWarning: Unable to load plugin noseprogressive = noseprogressive:ProgressivePlugin: invalid syntax (plugin.py, line 53)
RuntimeWarning)
Using gpu device 0: GeForce GTX 285
.......................................K............../Tmp/nightly_build/Theano/theano/compile/tests/test_inplace_opt_for_value.py:170: UserWarning: theano modules are deprecated and will be removed in release 0.7
super(ExampleRNN, self).__init__()
.............................................................................................WARNING (theano.gof.cmodule): Cache leak due to unpickle-able key data set([(((1,), (10, '1.6.1'), (10, '1.6.1')), ('CLinker.cmodule_key', ('-DREPLACE_WITH_AMDLIBM', '-O3', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fno-math-errno'), ('amdlibm',), (), 'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=g++ 4.5.1', 'md5:7156d426fe9db2f8a875c686e777841d', (<theano.gof.tests.test_compute_test_value.IncOneC object at 0xd20de50>, ((Scalar(int32), ((-1, 0), False)),), (1, (False,)))))])
...............................................................................................................SS..SSS...................................................................................................................................................................................................................................../u/lisa/.virtualenvs/py2.4/lib/python2.4/site-packages/scipy/signal/signaltools.py:421: ComplexWarning: Casting complex values to real discards the imaginary part
return sigtools._convolve2d(in1, in2, 1, val, bval, fillvalue)
........./Tmp/nightly_build/Theano/theano/tensor/nnet/conv.py:680: ComplexWarning: Casting complex values to real discards the imaginary part
zz[b,n,...] += _convolve2d(\
.....................................S..........SS...SSSSS...........................................................................KK................................................WARNING (theano.gof.cmodule): Cache leak due to unpickle-able key data set([(((3, (4,), (4,), (4,), (4,), (4,)), (10, '1.6.1'), (10, '1.6.1'), (10, '1.6.1'), (10, '1.6.1'), (10, '1.6.1'), (10, '1.6.1')), ('CLinker.cmodule_key', ('-DREPLACE_WITH_AMDLIBM', '-O3', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fno-math-errno'), ('amdlibm',), (), 'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=g++ 4.5.1', 'md5:7156d426fe9db2f8a875c686e777841d', (<theano.scalar.basic.Composite object at 0x127c8ed0>, ((Scalar(float64), ((-1, 0), False)), (Scalar(float64), ((-1, 1), False)), (Scalar(float64), ((-1, 2), False))), (1, (False, False, False)))))])
.WARNING (theano.gof.cmodule): Cache leak due to unpickle-able key data set([(((3, (4,), (4,), (4,)), (10, '1.6.1'), (10, '1.6.1'), (10, '1.6.1')), ('CLinker.cmodule_key', ('-DREPLACE_WITH_AMDLIBM', '-O3', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fno-math-errno'), ('amdlibm',), (), 'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=g++ 4.5.1', 'md5:7156d426fe9db2f8a875c686e777841d', (<theano.scalar.basic.Composite object at 0x10c79f90>, ((Scalar(float64), ((-1, 0), False)), (Scalar(float64), ((-1, 1), False))), (1, (False,)))))])
.WARNING (theano.gof.cmodule): Cache leak due to unpickle-able key data set([(((3, (4,), (4,), (4,)), (10, '1.6.1'), (10, '1.6.1'), (10, '1.6.1')), ('CLinker.cmodule_key', ('-DREPLACE_WITH_AMDLIBM', '-O3', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fno-math-errno'), ('amdlibm',), (), 'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=g++ 4.5.1', 'md5:7156d426fe9db2f8a875c686e777841d', (<theano.scalar.basic.Composite object at 0xfa01e10>, ((Scalar(float64), ((-1, 0), False)), (Scalar(float64), ((-1, 1), False))), (1, (False,)))))])
.............K........../Tmp/nightly_build/Theano/theano/sandbox/rng_mrg.py:758: UserWarning: MRG_RandomStreams Can't determine #streams from size (Shape.0), guessing 60*256
nstreams = self.n_streams(size)
..................K...............K............................................../u/lisa/.virtualenvs/py2.4/lib/python2.4/site-packages/scipy/sparse/data.py:54: ComplexWarning: Casting complex values to real discards the imaginary part
return self._with_data(self.data.astype(t))
/Tmp/nightly_build/Theano/theano/sparse/tests/test_basic.py:1960: ComplexWarning: Casting complex values to real discards the imaginary part
expected = data.toarray().astype(o_dtype)
..../u/lisa/.virtualenvs/py2.4/lib/python2.4/site-packages/scipy/sparse/compressed.py:486: SparseEfficiencyWarning: changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
/u/lisa/.virtualenvs/py2.4/lib/python2.4/site-packages/scipy/sparse/compressed.py:486: SparseEfficiencyWarning: changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
.....................................S.......S......................................................................................................................................................................................S.SS.................................................................................................................................................................................................................................................../Tmp/nightly_build/Theano/theano/scalar/basic.py:1338: DeprecationWarning: complex divmod(), // and % are deprecated
return x // y
............................................................................................................................................................................................................................................................................................................................................./Tmp/nightly_build/Theano/theano/tensor/basic.py:5395: DeprecationWarning: Division of two integer types with x / y is deprecated, please use x // y for an integer division.
missing = numpy.prod(ishapes[0]) / numpy.prod(requ_part)
.........................................K..............................................................................................................................S.................................................................................................................................................................................K..KKKK.K................................................K.............../Tmp/nightly_build/Theano/theano/tensor/tests/test_naacl09.py:64: UserWarning: RandomStreams is deprecated and will be removed in release 0.7. Use shared_randomstreams.RandomStreams or MRG_RandomStreams instead.
self.random = T.RandomStreams()
.....................................................S.S..S.......1 #include <Python.h>
2 #include <iostream>
3 #include <numpy/arrayobject.h>
4 #include "cuda_ndarray.cuh"
5 //////////////////////
6 //// Support Code
7 //////////////////////
8
9
10 #define INTDIV_POW2(a, b) (a >> b)
11 #define INTMOD_POW2(a, b) (a & ((1<<b)-1))
12 // Input 0 CudaNdarrayType(float32, matrix)
13 // Input 1 CudaNdarrayType(float32, matrix)
14 // Input 2 CudaNdarrayType(float32, matrix)
15 // Output 0 CudaNdarrayType(float32, matrix)
16 static __global__ void kernel_Composite_node_0_1(unsigned int numEls
17 , const int dim0
18 , const float * i0_data, int i0_str_0
19 , const float * i1_data, int i1_str_0
20 , const float * i2_data, int i2_str_0
21 , float * o0_data, int o0_str_0
22 )
23 {
24 const int idx = blockIdx.x * blockDim.x + threadIdx.x;
25 const int numThreads = blockDim.x * gridDim.x;
26 for (int i = idx; i < numEls; i += numThreads) {
27 int ii = i;
28 const float * ii_i0_data = i0_data;
29 const float * ii_i1_data = i1_data;
30 const float * ii_i2_data = i2_data;
31 float * ii_o0_data = o0_data;
32 int pos0 = ii;
33 ii_i0_data += pos0 * i0_str_0;
34 ii_i1_data += pos0 * i1_str_0;
35 ii_i2_data += pos0 * i2_str_0;
36 ii_o0_data += pos0 * o0_str_0;
37 {
38 npy_float32 V_DUMMY_ID__tmp1;
39 V_DUMMY_ID__tmp1 = sqrt(ii_i2_data[0]);
40 npy_float32 V_DUMMY_ID__tmp2;
41 V_DUMMY_ID__tmp2 = ii_i0_data[0] - ii_i1_data[0];
42 ii_o0_data[0] = V_DUMMY_ID__tmp2 + V_DUMMY_ID__tmp1;
43 }
44
45 }
46 }
47 // Input 0 CudaNdarrayType(float32, matrix)
48 // Input 1 CudaNdarrayType(float32, matrix)
49 // Input 2 CudaNdarrayType(float32, matrix)
50 // Output 0 CudaNdarrayType(float32, matrix)
51 static __global__ void kernel_Composite_node_0_2(unsigned int numEls
52 , const int dim0, const int dim1
53 , const float * i0_data, int i0_str_0, int i0_str_1
54 , const float * i1_data, int i1_str_0, int i1_str_1
55 , const float * i2_data, int i2_str_0, int i2_str_1
56 , float * o0_data, int o0_str_0, int o0_str_1
57 )
58 {
59 const int idx = blockIdx.x * blockDim.x + threadIdx.x;
60 const int numThreads = blockDim.x * gridDim.x;
61 for (int i = idx; i < numEls; i += numThreads) {
62 int ii = i;
63 const float * ii_i0_data = i0_data;
64 const float * ii_i1_data = i1_data;
65 const float * ii_i2_data = i2_data;
66 float * ii_o0_data = o0_data;
67 int pos1 = ii % dim1;
68 ii = ii / dim1;
69 ii_i0_data += pos1 * i0_str_1;
70 ii_i1_data += pos1 * i1_str_1;
71 ii_i2_data += pos1 * i2_str_1;
72 ii_o0_data += pos1 * o0_str_1;
73 int pos0 = ii;
74 ii_i0_data += pos0 * i0_str_0;
75 ii_i1_data += pos0 * i1_str_0;
76 ii_i2_data += pos0 * i2_str_0;
77 ii_o0_data += pos0 * o0_str_0;
78 {
79 npy_float32 V_DUMMY_ID__tmp1;
80 V_DUMMY_ID__tmp1 = sqrt(ii_i2_data[0]);
81 npy_float32 V_DUMMY_ID__tmp2;
82 V_DUMMY_ID__tmp2 = ii_i0_data[0] - ii_i1_data[0];
83 ii_o0_data[0] = V_DUMMY_ID__tmp2 + V_DUMMY_ID__tmp1;
84 }
85
86 }
87 }
88 // Input 0 CudaNdarrayType(float32, matrix)
89 // Input 1 CudaNdarrayType(float32, matrix)
90 // Input 2 CudaNdarrayType(float32, matrix)
91 // Output 0 CudaNdarrayType(float32, matrix)
92 static __global__ void kernel_Composite_node_0_Ccontiguous (unsigned int numEls
93 , const float * i0_data
94 , const float * i1_data
95 , const float * i2_data
96 , float * o0_data
97 )
98 {
99 const int idx = blockIdx.x * blockDim.x + threadIdx.x;
100 const int numThreads = blockDim.x * gridDim.x;
101 for (int i = idx; i < numEls; i += numThreads) {
102 {
103 npy_float32 V_DUMMY_ID__tmp1;
104 V_DUMMY_ID__tmp1 = sqrt(i2_data[i]);
105 npy_float32 V_DUMMY_ID__tmp2;
106 V_DUMMY_ID__tmp2 = i0_data[i] - i1_data[i];
107 o0_data[i] = V_DUMMY_ID__tmp2 + V_DUMMY_ID__tmp1;
108 }
109
110 }
111 }
112
113 static void can_collapse_node_0(int nd, const int * dims, const int * strides, int collapse[])
114 {
115 //can we collapse dims[i] and dims[i-1]
116 for(int i=nd-1;i>0;i--){
117 if(strides[i]*dims[i]==strides[i-1]){//the dims nd-1 are not strided again dimension nd
118 collapse[i]=1;
119 }else collapse[i]=0;
120 }
121 }
122
123
124 static int callkernel_node_0(unsigned int numEls, const int d,
125 const int * dims,
126 const float * i0_data, const int * i0_str, const float * i1_data, const int * i1_str, const float * i2_data, const int * i2_str,
127 float * o0_data, const int * o0_str)
128 {
129 numEls = dims[0]*dims[1]*1;
130
131 int local_dims[2];
132
133 int local_str[3][2];
134 int local_ostr[1][2];
135
136
137 int nd_collapse = 2;
138 for(int i=0;i<2;i++){//init new dim
139 local_dims[i]=dims[i];
140 }
141
142
143 for(int i=0;i<2;i++){//init new strides
144 local_str[0][i]=i0_str[i];
145 }
146
147
148 for(int i=0;i<2;i++){//init new strides
149 local_str[1][i]=i1_str[i];
150 }
151
152
153 for(int i=0;i<2;i++){//init new strides
154 local_str[2][i]=i2_str[i];
155 }
156
157
158 for(int i=0;i<2;i++){//init new strides
159 local_ostr[0][i]=o0_str[i];
160 }
161
162
163 for(int id=0;id<nd_collapse;id++){
164
165 bool all_broadcast=true;
166 for(int input_id=0;input_id<3;input_id++){
167 if(local_str[input_id][id]!=0 || local_dims[id]!=1) all_broadcast= false;
168 }
169 for(int input_id=0;input_id<1;input_id++){
170 if(local_ostr[input_id][id]!=0 || local_dims[id]!=1) all_broadcast= false;
171 }
172 if(all_broadcast){
173 for(int j=id+1;j<nd_collapse;j++)//remove dims i from the array
174 local_dims[j-1]=local_dims[j];
175 for(int input_id=0;input_id<3;input_id++){
176 for(int j=id+1;j<nd_collapse;j++){//remove dims i from the array
177 local_str[input_id][j-1]=local_str[input_id][j];
178 }
179 }
180 for(int output_id=0;output_id<1;output_id++){
181 for(int j=id+1;j<nd_collapse;j++){//remove dims i from the array
182 local_ostr[output_id][j-1]=local_ostr[output_id][j];
183 }
184 }
185 nd_collapse--; id--;
186 }
187 }
188
189 int nd_collapse_[2] = {1,1};
190
191 int nd_collapse_0[2] = {1,1};
192
193 can_collapse_node_0(nd_collapse, local_dims, local_str[0], nd_collapse_0);
194 for(int i=0;i<nd_collapse;i++){
195 if(nd_collapse_0[i]==0)
196 nd_collapse_[i]=0;
197 }
198
199
200 int nd_collapse_1[2] = {1,1};
201
202 can_collapse_node_0(nd_collapse, local_dims, local_str[1], nd_collapse_1);
203 for(int i=0;i<nd_collapse;i++){
204 if(nd_collapse_1[i]==0)
205 nd_collapse_[i]=0;
206 }
207
208
209 int nd_collapse_2[2] = {1,1};
210
211 can_collapse_node_0(nd_collapse, local_dims, local_str[2], nd_collapse_2);
212 for(int i=0;i<nd_collapse;i++){
213 if(nd_collapse_2[i]==0)
214 nd_collapse_[i]=0;
215 }
216
217
218 for(int i=nd_collapse-1;i>0;i--){
219 if(nd_collapse_[i]==1){
220 local_str[0][i-1]=local_str[0][i];//set new strides
221 for(int j=i+1;j<nd_collapse;j++)//remove stride i from the array
222 local_str[0][j-1]=local_str[0][j];
223 }
224 }
225
226
227 for(int i=nd_collapse-1;i>0;i--){
228 if(nd_collapse_[i]==1){
229 local_str[1][i-1]=local_str[1][i];//set new strides
230 for(int j=i+1;j<nd_collapse;j++)//remove stride i from the array
231 local_str[1][j-1]=local_str[1][j];
232 }
233 }
234
235
236 for(int i=nd_collapse-1;i>0;i--){
237 if(nd_collapse_[i]==1){
238 local_str[2][i-1]=local_str[2][i];//set new strides
239 for(int j=i+1;j<nd_collapse;j++)//remove stride i from the array
240 local_str[2][j-1]=local_str[2][j];
241 }
242 }
243
244
245 for(int i=nd_collapse-1;i>0;i--){
246 if(nd_collapse_[i]==1){
247 local_ostr[0][i-1]=local_ostr[0][i];//set new strides
248 for(int j=i+1;j<nd_collapse;j++)//remove stride i from the array
249 local_ostr[0][j-1]=local_ostr[0][j];
250 }
251 }
252
253
254 for(int i=nd_collapse-1;i>0;i--){
255 if(nd_collapse_[i]==1){
256 local_dims[i-1]*=local_dims[i];//set new dims
257 for(int j=i+1;j<nd_collapse;j++)//remove dims i from the array
258 local_dims[j-1]=local_dims[j];
259 }
260 }
261
262
263 for(int i=1, end=nd_collapse;i<end;i++){
264 if(nd_collapse_[i]==1)nd_collapse--;
265 }
266 if(nd_collapse == 1
267 && local_str[0][nd_collapse-1]==1 && local_str[1][nd_collapse-1]==1 && local_str[2][nd_collapse-1]==1 && local_ostr[0][nd_collapse-1]==1
268 ){nd_collapse=0;}
269 if(numEls==0) return 0;
270 switch (nd_collapse==0?0:min(2,nd_collapse)) {
271 case 0: {
272
273 //first use at least a full warp
274 int threads_per_block = std::min(numEls, (unsigned int)32); //WARP SIZE
275
276 //next start adding multiprocessors
277 int n_blocks = std::min(numEls/threads_per_block + (numEls % threads_per_block?1:0), (unsigned int)30); // UP TO NUMBER OF MULTIPROCESSORS
278
279 // next start adding more warps per multiprocessor
280 if (threads_per_block * n_blocks < numEls)
281 threads_per_block = std::min(numEls/n_blocks, (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK);
282 kernel_Composite_node_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, i0_data, i1_data, i2_data, o0_data);
283
284 //std::cerr << "calling callkernel returned\n";
285
286
287 CNDA_THREAD_SYNC;
288 cudaError_t err = cudaGetLastError();
289 if( cudaSuccess != err)
290 {
291 PyErr_Format(PyExc_RuntimeError, "Cuda error: %s: %s.\n n_blocks=%i threads_per_block=%i\n Call: %s\n",
292 "GpuElemwise node_0 Composite", cudaGetErrorString(err),
293 n_blocks, threads_per_block,
294 "kernel_Composite_node_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, i0_data, i1_data, i2_data, o0_data)");
295 return -1;
296
297 }
298
299 return 0;
300
301 } break;
302 case 1: {
303
304 //first use at least a full warp
305 int threads_per_block = std::min(numEls, (unsigned int)32); //WARP SIZE
306
307 //next start adding multiprocessors
308 int n_blocks = std::min(numEls/threads_per_block + (numEls % threads_per_block?1:0), (unsigned int)30); // UP TO NUMBER OF MULTIPROCESSORS
309
310 // next start adding more warps per multiprocessor
311 if (threads_per_block * n_blocks < numEls)
312 threads_per_block = std::min(numEls/n_blocks, (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK);
313
314 kernel_Composite_node_0_1<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], i0_data, local_str[0][0], i1_data, local_str[1][0], i2_data, local_str[2][0], o0_data, local_ostr[0][0]);
315
316
317 CNDA_THREAD_SYNC;
318 cudaError_t err = cudaGetLastError();
319 if( cudaSuccess != err)
320 {
321 PyErr_Format(PyExc_RuntimeError, "Cuda error: %s: %s.\n n_blocks=%i threads_per_block=%i\n Call: %s\n",
322 "GpuElemwise node_0 Composite", cudaGetErrorString(err),
323 n_blocks, threads_per_block,
324 "kernel_Composite_node_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], i0_data, local_str[0][0], i1_data, local_str[1][0], i2_data, local_str[2][0], o0_data, local_ostr[0][0])");
325 return -1;
326
327 }
328 return 0;
329
330 } break;
331 case 2: {
332
333 //first use at least a full warp
334 int threads_per_block = std::min(numEls, (unsigned int)32); //WARP SIZE
335
336 //next start adding multiprocessors
337 int n_blocks = std::min(numEls/threads_per_block + (numEls % threads_per_block?1:0), (unsigned int)30); // UP TO NUMBER OF MULTIPROCESSORS
338
339 // next start adding more warps per multiprocessor
340 if (threads_per_block * n_blocks < numEls)
341 threads_per_block = std::min(numEls/n_blocks, (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK);
342
343 kernel_Composite_node_0_2<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], local_dims[1], i0_data, local_str[0][0], local_str[0][1], i1_data, local_str[1][0], local_str[1][1], i2_data, local_str[2][0], local_str[2][1], o0_data, local_ostr[0][0], local_ostr[0][1]);
344
345
346 CNDA_THREAD_SYNC;
347 cudaError_t err = cudaGetLastError();
348 if( cudaSuccess != err)
349 {
350 PyErr_Format(PyExc_RuntimeError, "Cuda error: %s: %s.\n n_blocks=%i threads_per_block=%i\n Call: %s\n",
351 "GpuElemwise node_0 Composite", cudaGetErrorString(err),
352 n_blocks, threads_per_block,
353 "kernel_Composite_node_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], local_dims[1], i0_data, local_str[0][0], local_str[0][1], i1_data, local_str[1][0], local_str[1][1], i2_data, local_str[2][0], local_str[2][1], o0_data, local_ostr[0][0], local_ostr[0][1])");
354 return -1;
355
356 }
357 return 0;
358
359 } break;
360 }
361 return -2;
362 }
363
364
365 struct __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8 {
366 PyObject* __ERROR;
367
368 PyObject* storage_V3;
369 PyObject* storage_V5;
370 PyObject* storage_V7;
371 PyObject* storage_V1;
372
373
374 __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8() {}
375 ~__struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8(void) {
376 cleanup();
377 }
378
379 int init(PyObject* __ERROR, PyObject* storage_V3, PyObject* storage_V5, PyObject* storage_V7, PyObject* storage_V1) {
380 Py_XINCREF(storage_V3);
381 Py_XINCREF(storage_V5);
382 Py_XINCREF(storage_V7);
383 Py_XINCREF(storage_V1);
384 this->storage_V3 = storage_V3;
385 this->storage_V5 = storage_V5;
386 this->storage_V7 = storage_V7;
387 this->storage_V1 = storage_V1;
388 int __failure = 0;
389
390 {
391
392 {
393
394 {
395
396 {
397
398 this->__ERROR = __ERROR;
399 return 0;
400 __label_7:
401
402 double __DUMMY_7;
403
404 }
405 __label_5:
406
407 double __DUMMY_5;
408
409 }
410 __label_3:
411
412 double __DUMMY_3;
413
414 }
415 __label_1:
416
417 double __DUMMY_1;
418
419 }
420
421 Py_XDECREF(this->storage_V3);
422 Py_XDECREF(this->storage_V5);
423 Py_XDECREF(this->storage_V7);
424 Py_XDECREF(this->storage_V1);
425
426 if (__failure) {
427 // When there is a failure, this code puts the exception
428 // in __ERROR.
429 PyObject* err_type = NULL;
430 PyObject* err_msg = NULL;
431 PyObject* err_traceback = NULL;
432 PyErr_Fetch(&err_type, &err_msg, &err_traceback);
433 if (!err_type) {err_type = Py_None;Py_INCREF(Py_None);}
434 if (!err_msg) {err_msg = Py_None; Py_INCREF(Py_None);}
435 if (!err_traceback) {err_traceback = Py_None; Py_INCREF(Py_None);}
436 PyObject* old_err_type = PyList_GET_ITEM(__ERROR, 0);
437 PyObject* old_err_msg = PyList_GET_ITEM(__ERROR, 1);
438 PyObject* old_err_traceback = PyList_GET_ITEM(__ERROR, 2);
439 PyList_SET_ITEM(__ERROR, 0, err_type);
440 PyList_SET_ITEM(__ERROR, 1, err_msg);
441 PyList_SET_ITEM(__ERROR, 2, err_traceback);
442 {Py_XDECREF(old_err_type);}
443 {Py_XDECREF(old_err_msg);}
444 {Py_XDECREF(old_err_traceback);}
445 }
446 // The failure code is returned to index what code block failed.
447 return __failure;
448
449 }
450 void cleanup(void) {
451 __label_1:
452
453 double __DUMMY_1;
454 __label_3:
455
456 double __DUMMY_3;
457 __label_5:
458
459 double __DUMMY_5;
460 __label_7:
461
462 double __DUMMY_7;
463
464 Py_XDECREF(this->storage_V3);
465 Py_XDECREF(this->storage_V5);
466 Py_XDECREF(this->storage_V7);
467 Py_XDECREF(this->storage_V1);
468 }
469 int run(void) {
470 int __failure = 0;
471
472 PyObject* py_V1;
473 CudaNdarray * V1;
474 PyObject* py_V3;
475 CudaNdarray * V3;
476 PyObject* py_V5;
477 CudaNdarray * V5;
478 PyObject* py_V7;
479 CudaNdarray * V7;
480 {
481
482 py_V1 = Py_None;
483 {Py_XINCREF(py_V1);}
484 V1 = NULL;
485 {
486
487 py_V3 = PyList_GET_ITEM(storage_V3, 0);
488 {Py_XINCREF(py_V3);}
489
490 assert(py_V3->ob_refcnt >= 2); // There should be at least one ref from the container object,
491 // and one ref from the local scope.
492
493 if (CudaNdarray_Check(py_V3))
494 {
495 //fprintf(stderr, "c_extract CNDA object w refcnt %p %i\n", py_V3, (py_V3->ob_refcnt));
496 V3 = (CudaNdarray*)py_V3;
497 //std::cerr << "c_extract " << V3 << '\n';
498 if (V3->nd != 2)
499 {
500 PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has rank %i, it was supposed to have rank 2", V3->nd);
501 V3 = NULL;
502 {__failure = 4; goto __label_4;};
503 }
504 //std::cerr << "c_extract " << V3 << " nd check passed\n";
505
506
507 assert(V3);
508 Py_INCREF(py_V3);
509 }
510 else if (py_V3 == Py_None)
511 {
512 PyErr_SetString(PyExc_TypeError,
513 "expected a CudaNdarray, not None");
514 V3 = NULL;
515 {__failure = 4; goto __label_4;};
516 }
517 else
518 {
519 //fprintf(stderr, "FAILING c_extract CNDA object w refcnt %p %i\n", py_V3, (py_V3->ob_refcnt));
520 PyErr_SetString(PyExc_TypeError, "Argument not a CudaNdarray");
521 V3 = NULL;
522 {__failure = 4; goto __label_4;};
523 }
524 //std::cerr << "c_extract done " << V3 << '\n';
525
526
527 {
528
529 py_V5 = PyList_GET_ITEM(storage_V5, 0);
530 {Py_XINCREF(py_V5);}
531
532 assert(py_V5->ob_refcnt >= 2); // There should be at least one ref from the container object,
533 // and one ref from the local scope.
534
535 if (CudaNdarray_Check(py_V5))
536 {
537 //fprintf(stderr, "c_extract CNDA object w refcnt %p %i\n", py_V5, (py_V5->ob_refcnt));
538 V5 = (CudaNdarray*)py_V5;
539 //std::cerr << "c_extract " << V5 << '\n';
540 if (V5->nd != 2)
541 {
542 PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has rank %i, it was supposed to have rank 2", V5->nd);
543 V5 = NULL;
544 {__failure = 6; goto __label_6;};
545 }
546 //std::cerr << "c_extract " << V5 << " nd check passed\n";
547
548
549 assert(V5);
550 Py_INCREF(py_V5);
551 }
552 else if (py_V5 == Py_None)
553 {
554 PyErr_SetString(PyExc_TypeError,
555 "expected a CudaNdarray, not None");
556 V5 = NULL;
557 {__failure = 6; goto __label_6;};
558 }
559 else
560 {
561 //fprintf(stderr, "FAILING c_extract CNDA object w refcnt %p %i\n", py_V5, (py_V5->ob_refcnt));
562 PyErr_SetString(PyExc_TypeError, "Argument not a CudaNdarray");
563 V5 = NULL;
564 {__failure = 6; goto __label_6;};
565 }
566 //std::cerr << "c_extract done " << V5 << '\n';
567
568
569 {
570
571 py_V7 = PyList_GET_ITEM(storage_V7, 0);
572 {Py_XINCREF(py_V7);}
573
574 assert(py_V7->ob_refcnt >= 2); // There should be at least one ref from the container object,
575 // and one ref from the local scope.
576
577 if (CudaNdarray_Check(py_V7))
578 {
579 //fprintf(stderr, "c_extract CNDA object w refcnt %p %i\n", py_V7, (py_V7->ob_refcnt));
580 V7 = (CudaNdarray*)py_V7;
581 //std::cerr << "c_extract " << V7 << '\n';
582 if (V7->nd != 2)
583 {
584 PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has rank %i, it was supposed to have rank 2", V7->nd);
585 V7 = NULL;
586 {__failure = 8; goto __label_8;};
587 }
588 //std::cerr << "c_extract " << V7 << " nd check passed\n";
589
590
591 assert(V7);
592 Py_INCREF(py_V7);
593 }
594 else if (py_V7 == Py_None)
595 {
596 PyErr_SetString(PyExc_TypeError,
597 "expected a CudaNdarray, not None");
598 V7 = NULL;
599 {__failure = 8; goto __label_8;};
600 }
601 else
602 {
603 //fprintf(stderr, "FAILING c_extract CNDA object w refcnt %p %i\n", py_V7, (py_V7->ob_refcnt));
604 PyErr_SetString(PyExc_TypeError, "Argument not a CudaNdarray");
605 V7 = NULL;
606 {__failure = 8; goto __label_8;};
607 }
608 //std::cerr << "c_extract done " << V7 << '\n';
609
610
611 {
612
613 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} START\n";
614 //standard elemwise size checks
615
616
617 int dims[2] = {1,1};
618
619
620 int broadcasts_V3[2] = {0, 0};
621
622
623 int broadcasts_V5[2] = {0, 0};
624
625
626 int broadcasts_V7[2] = {0, 0};
627
628
629 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} checking input V3\n";
630 if (2 != V3->nd)
631 {
632 PyErr_Format(PyExc_TypeError, "need 2 dims, not %i", V3->nd);
633 {__failure = 9; goto __label_9;};
634 }
635 for (int i = 0; i< 2; ++i)
636 {
637 dims[i] = (dims[i] == 1) ? CudaNdarray_HOST_DIMS(V3)[i] : dims[i];
638 if ((!(broadcasts_V3[i] && CudaNdarray_HOST_DIMS(V3)[i] == 1))&& (dims[i] != CudaNdarray_HOST_DIMS(V3)[i]))
639 {
640 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} checking input V3 failed\n";
641 PyErr_Format(PyExc_ValueError, "GpuElemwise. Input dimension mis-match. One of your inputs has shape[%i] == %i, but the output's size on that axis is %i.",
642 i,
643 CudaNdarray_HOST_DIMS(V3)[i],
644 dims[i]
645 );
646 {__failure = 9; goto __label_9;};
647 }
648 }
649
650
651 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} checking input V5\n";
652 if (2 != V5->nd)
653 {
654 PyErr_Format(PyExc_TypeError, "need 2 dims, not %i", V5->nd);
655 {__failure = 9; goto __label_9;};
656 }
657 for (int i = 0; i< 2; ++i)
658 {
659 dims[i] = (dims[i] == 1) ? CudaNdarray_HOST_DIMS(V5)[i] : dims[i];
660 if ((!(broadcasts_V5[i] && CudaNdarray_HOST_DIMS(V5)[i] == 1))&& (dims[i] != CudaNdarray_HOST_DIMS(V5)[i]))
661 {
662 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} checking input V5 failed\n";
663 PyErr_Format(PyExc_ValueError, "GpuElemwise. Input dimension mis-match. One of your inputs has shape[%i] == %i, but the output's size on that axis is %i.",
664 i,
665 CudaNdarray_HOST_DIMS(V5)[i],
666 dims[i]
667 );
668 {__failure = 9; goto __label_9;};
669 }
670 }
671
672
673 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} checking input V7\n";
674 if (2 != V7->nd)
675 {
676 PyErr_Format(PyExc_TypeError, "need 2 dims, not %i", V7->nd);
677 {__failure = 9; goto __label_9;};
678 }
679 for (int i = 0; i< 2; ++i)
680 {
681 dims[i] = (dims[i] == 1) ? CudaNdarray_HOST_DIMS(V7)[i] : dims[i];
682 if ((!(broadcasts_V7[i] && CudaNdarray_HOST_DIMS(V7)[i] == 1))&& (dims[i] != CudaNdarray_HOST_DIMS(V7)[i]))
683 {
684 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} checking input V7 failed\n";
685 PyErr_Format(PyExc_ValueError, "GpuElemwise. Input dimension mis-match. One of your inputs has shape[%i] == %i, but the output's size on that axis is %i.",
686 i,
687 CudaNdarray_HOST_DIMS(V7)[i],
688 dims[i]
689 );
690 {__failure = 9; goto __label_9;};
691 }
692 }
693
694
695 Py_XDECREF(V1);
696 V1 = V3;
697 Py_INCREF(V1);
698 for (int i = 0; (i< 2) && (V1); ++i) {
699 if (dims[i] != CudaNdarray_HOST_DIMS(V1)[i])
700 {
701 Py_DECREF(V1);
702 V1 = NULL;
703 {__failure = 9; goto __label_9;};
704 }
705 }
706 //std::cerr << "ELEMWISE NEW V1 nd" << V1->nd << "\n";
707 //std::cerr << "ELEMWISE NEW V1 data" << V1->devdata << "\n";
708
709
710 {
711 //new block so that failure gotos don't skip over variable initialization
712 //std::cerr << "calling callkernel\n";
713 if (callkernel_node_0(1, 0, dims
714
715
716 , CudaNdarray_DEV_DATA(V3), CudaNdarray_HOST_STRIDES(V3)
717
718
719 , CudaNdarray_DEV_DATA(V5), CudaNdarray_HOST_STRIDES(V5)
720
721
722 , CudaNdarray_DEV_DATA(V7), CudaNdarray_HOST_STRIDES(V7)
723
724
725 , CudaNdarray_DEV_DATA(V1), CudaNdarray_HOST_STRIDES(V1)
726
727
728 ))
729 {
730 // error
731
732
733 Py_DECREF(V1);
734 V1 = NULL;
735
736
737 {__failure = 9; goto __label_9;};
738 }
739 else // no error
740 {
741 }
742 }
743 //std::cerr << "C_CODE Composite{[add(sub(i0, i1), sqrt(i2))]} END\n";
744
745 __label_9:
746
747 double __DUMMY_9;
748
749 }
750 __label_8:
751
752 //std::cerr << "cleanup " << py_V7 << " " << V7 << "\n";
753 //fprintf(stderr, "c_cleanup CNDA py_object w refcnt %p %i\n", py_V7, (py_V7->ob_refcnt));
754 if (V7)
755 {
756 //fprintf(stderr, "c_cleanup CNDA cn_object w refcnt %p %i\n", V7, (V7->ob_refcnt));
757 Py_XDECREF(V7);
758 }
759 //std::cerr << "cleanup done" << py_V7 << "\n";
760
761 {Py_XDECREF(py_V7);}
762
763 double __DUMMY_8;
764
765 }
766 __label_6:
767
768 //std::cerr << "cleanup " << py_V5 << " " << V5 << "\n";
769 //fprintf(stderr, "c_cleanup CNDA py_object w refcnt %p %i\n", py_V5, (py_V5->ob_refcnt));
770 if (V5)
771 {
772 //fprintf(stderr, "c_cleanup CNDA cn_object w refcnt %p %i\n", V5, (V5->ob_refcnt));
773 Py_XDECREF(V5);
774 }
775 //std::cerr << "cleanup done" << py_V5 << "\n";
776
777 {Py_XDECREF(py_V5);}
778
779 double __DUMMY_6;
780
781 }
782 __label_4:
783
784 //std::cerr << "cleanup " << py_V3 << " " << V3 << "\n";
785 //fprintf(stderr, "c_cleanup CNDA py_object w refcnt %p %i\n", py_V3, (py_V3->ob_refcnt));
786 if (V3)
787 {
788 //fprintf(stderr, "c_cleanup CNDA cn_object w refcnt %p %i\n", V3, (V3->ob_refcnt));
789 Py_XDECREF(V3);
790 }
791 //std::cerr << "cleanup done" << py_V3 << "\n";
792
793 {Py_XDECREF(py_V3);}
794
795 double __DUMMY_4;
796
797 }
798 __label_2:
799
800 if (!__failure) {
801
802 //std::cerr << "sync\n";
803 if (NULL == V1) {
804 // failure: sync None to storage
805 Py_XDECREF(py_V1);
806 py_V1 = Py_None;
807 Py_INCREF(py_V1);
808 }
809 else
810 {
811 if (py_V1 != (PyObject*)V1)
812 {
813 Py_XDECREF(py_V1);
814 py_V1 = (PyObject*)V1;
815 Py_INCREF(py_V1);
816 }
817 assert(py_V1->ob_refcnt);
818 }
819
820 PyObject* old = PyList_GET_ITEM(storage_V1, 0);
821 {Py_XINCREF(py_V1);}
822 PyList_SET_ITEM(storage_V1, 0, py_V1);
823 {Py_XDECREF(old);}
824 }
825
826 //std::cerr << "cleanup " << py_V1 << " " << V1 << "\n";
827 //fprintf(stderr, "c_cleanup CNDA py_object w refcnt %p %i\n", py_V1, (py_V1->ob_refcnt));
828 if (V1)
829 {
830 //fprintf(stderr, "c_cleanup CNDA cn_object w refcnt %p %i\n", V1, (V1->ob_refcnt));
831 Py_XDECREF(V1);
832 }
833 //std::cerr << "cleanup done" << py_V1 << "\n";
834
835 {Py_XDECREF(py_V1);}
836
837 double __DUMMY_2;
838
839 }
840
841
842 if (__failure) {
843 // When there is a failure, this code puts the exception
844 // in __ERROR.
845 PyObject* err_type = NULL;
846 PyObject* err_msg = NULL;
847 PyObject* err_traceback = NULL;
848 PyErr_Fetch(&err_type, &err_msg, &err_traceback);
849 if (!err_type) {err_type = Py_None;Py_INCREF(Py_None);}
850 if (!err_msg) {err_msg = Py_None; Py_INCREF(Py_None);}
851 if (!err_traceback) {err_traceback = Py_None; Py_INCREF(Py_None);}
852 PyObject* old_err_type = PyList_GET_ITEM(__ERROR, 0);
853 PyObject* old_err_msg = PyList_GET_ITEM(__ERROR, 1);
854 PyObject* old_err_traceback = PyList_GET_ITEM(__ERROR, 2);
855 PyList_SET_ITEM(__ERROR, 0, err_type);
856 PyList_SET_ITEM(__ERROR, 1, err_msg);
857 PyList_SET_ITEM(__ERROR, 2, err_traceback);
858 {Py_XDECREF(old_err_type);}
859 {Py_XDECREF(old_err_msg);}
860 {Py_XDECREF(old_err_traceback);}
861 }
862 // The failure code is returned to index what code block failed.
863 return __failure;
864
865 }
866 };
867
868
869 int __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8_executor(__struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8* self) {
870 return self->run();
871 }
872
873 void __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8_destructor(void* executor, void* self) {
874 //printf("doing cleanup\n");
875 //fflush(stdout);
876 // ((__struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8*)self)->cleanup();
877 // free(self);
878 delete ((__struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8*)self);
879 //printf("done cleanup\n");
880 //fflush(stdout);
881 }
882
883 //////////////////////
884 //// Functions
885 //////////////////////
886 static PyObject * instantiate(PyObject * self, PyObject *argtuple) {
887 assert(PyTuple_Check(argtuple));
888 if (5 != PyTuple_Size(argtuple)){
889 PyErr_Format(PyExc_TypeError, "Wrong number of arguments, expected 5, got %i", (int)PyTuple_Size(argtuple));
890 return NULL;
891 }
892 __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8* struct_ptr = new __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8();
893 struct_ptr->init( PyTuple_GET_ITEM(argtuple, 0),PyTuple_GET_ITEM(argtuple, 1),PyTuple_GET_ITEM(argtuple, 2),PyTuple_GET_ITEM(argtuple, 3),PyTuple_GET_ITEM(argtuple, 4) );
894 PyObject* thunk = PyCObject_FromVoidPtrAndDesc((void*)(&__struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8_executor), struct_ptr, __struct_compiled_op_a8d0aecd8770fde3bb9ca34405878ea8_destructor);
895 return thunk; }
896
897 //////////////////////
898 //// Module init
899 //////////////////////
900 static PyMethodDef MyMethods[] = {
901 {"instantiate", instantiate, METH_VARARGS, "undocumented"} ,
902 {NULL, NULL, 0, NULL}
903 };
904 PyMODINIT_FUNC inita8d0aecd8770fde3bb9ca34405878ea8(void){
905 import_array();
906 (void) Py_InitModule("a8d0aecd8770fde3bb9ca34405878ea8", MyMethods);
907 }
908
===============================
Error freeing device pointer 0x210b00 (initialization error).
!!!! error freeing device memory 0x210b00 (self=0x1f3fae30)
Error freeing device pointer 0x210a00 (initialization error).
!!!! error freeing dev_structure memory 0x210a00 (self=0x1f3fae30)
Error freeing device pointer 0x210700 (initialization error).
!!!! error freeing device memory 0x210700 (self=0x1fa1f5f0)
Error freeing device pointer 0x210600 (initialization error).
!!!! error freeing dev_structure memory 0x210600 (self=0x1fa1f5f0)
Error freeing device pointer 0x210f00 (initialization error).
!!!! error freeing device memory 0x210f00 (self=0x22783770)
Error freeing device pointer 0x210e00 (initialization error).
!!!! error freeing dev_structure memory 0x210e00 (self=0x22783770)
Error freeing device pointer 0x210900 (initialization error).
!!!! error freeing device memory 0x210900 (self=0x11f51b30)
Error freeing device pointer 0x210800 (initialization error).
!!!! error freeing dev_structure memory 0x210800 (self=0x11f51b30)
Error freeing device pointer 0x211100 (initialization error).
!!!! error freeing device memory 0x211100 (self=0x12481270)
Error freeing device pointer 0x211000 (initialization error).
!!!! error freeing dev_structure memory 0x211000 (self=0x12481270)
Error freeing device pointer 0x210d00 (initialization error).
!!!! error freeing device memory 0x210d00 (self=0x129765b0)
Error freeing device pointer 0x210c00 (initialization error).
!!!! error freeing dev_structure memory 0x210c00 (self=0x129765b0)
Error freeing device pointer 0x211600 (initialization error).
!!!! error freeing device memory 0x211600 (self=0x21992830)
Error freeing device pointer 0x211500 (initialization error).
!!!! error freeing dev_structure memory 0x211500 (self=0x21992830)
Error freeing device pointer 0x211300 (initialization error).
!!!! error freeing device memory 0x211300 (self=0x121dc4b0)
Error freeing device pointer 0x211200 (initialization error).
!!!! error freeing dev_structure memory 0x211200 (self=0x121dc4b0)
Error freeing device pointer 0x211800 (initialization error).
!!!! error freeing device memory 0x211800 (self=0x247ab130)
Error freeing device pointer 0x211700 (initialization error).
!!!! error freeing dev_structure memory 0x211700 (self=0x247ab130)
Error freeing device pointer 0x211a00 (initialization error).
!!!! error freeing device memory 0x211a00 (self=0x1f3ccc30)
Error freeing device pointer 0x211900 (initialization error).
!!!! error freeing dev_structure memory 0x211900 (self=0x1f3ccc30)
Error freeing device pointer 0x212600 (initialization error).
!!!! error freeing device memory 0x212600 (self=0x208af970)
Error freeing device pointer 0x212500 (initialization error).
!!!! error freeing dev_structure memory 0x212500 (self=0x208af970)
Error freeing device pointer 0x211e00 (initialization error).
!!!! error freeing device memory 0x211e00 (self=0xfff4730)
Error freeing device pointer 0x211d00 (initialization error).
!!!! error freeing dev_structure memory 0x211d00 (self=0xfff4730)
Error freeing device pointer 0x212400 (initialization error).
!!!! error freeing device memory 0x212400 (self=0x20ef6ef0)
Error freeing device pointer 0x212300 (initialization error).
!!!! error freeing dev_structure memory 0x212300 (self=0x20ef6ef0)
Error freeing device pointer 0x211c00 (initialization error).
!!!! error freeing device memory 0x211c00 (self=0x24018f70)
Error freeing device pointer 0x211b00 (initialization error).
!!!! error freeing dev_structure memory 0x211b00 (self=0x24018f70)
Error freeing device pointer 0x212800 (initialization error).
!!!! error freeing device memory 0x212800 (self=0x1f92ed70)
Error freeing device pointer 0x212700 (initialization error).
!!!! error freeing dev_structure memory 0x212700 (self=0x1f92ed70)
Error freeing device pointer 0x212000 (initialization error).
!!!! error freeing device memory 0x212000 (self=0x11d08e70)
Error freeing device pointer 0x211f00 (initialization error).
!!!! error freeing dev_structure memory 0x211f00 (self=0x11d08e70)
Error freeing device pointer 0x212c00 (initialization error).
!!!! error freeing device memory 0x212c00 (self=0x209bc9b0)
Error freeing device pointer 0x212b00 (initialization error).
!!!! error freeing dev_structure memory 0x212b00 (self=0x209bc9b0)
Error freeing device pointer 0x212200 (initialization error).
!!!! error freeing device memory 0x212200 (self=0x20d59a30)
Error freeing device pointer 0x212100 (initialization error).
!!!! error freeing dev_structure memory 0x212100 (self=0x20d59a30)
Error freeing device pointer 0x213000 (initialization error).
!!!! error freeing device memory 0x213000 (self=0x1f458330)
Error freeing device pointer 0x212f00 (initialization error).
!!!! error freeing dev_structure memory 0x212f00 (self=0x1f458330)
Error freeing device pointer 0x212a00 (initialization error).
!!!! error freeing device memory 0x212a00 (self=0x247b8230)
Error freeing device pointer 0x212900 (initialization error).
!!!! error freeing dev_structure memory 0x212900 (self=0x247b8230)
Error freeing device pointer 0x213600 (initialization error).
!!!! error freeing device memory 0x213600 (self=0x1f6b1870)
Error freeing device pointer 0x213500 (initialization error).
!!!! error freeing dev_structure memory 0x213500 (self=0x1f6b1870)
Error freeing device pointer 0x213200 (initialization error).
!!!! error freeing device memory 0x213200 (self=0x247af930)
Error freeing device pointer 0x213100 (initialization error).
!!!! error freeing dev_structure memory 0x213100 (self=0x247af930)
Error freeing device pointer 0x212e00 (initialization error).
!!!! error freeing device memory 0x212e00 (self=0x1fe905f0)
Error freeing device pointer 0x212d00 (initialization error).
!!!! error freeing dev_structure memory 0x212d00 (self=0x1fe905f0)
Error freeing device pointer 0x213800 (initialization error).
!!!! error freeing device memory 0x213800 (self=0x20292630)
Error freeing device pointer 0x213700 (initialization error).
!!!! error freeing dev_structure memory 0x213700 (self=0x20292630)
Error freeing device pointer 0x213400 (initialization error).
!!!! error freeing device memory 0x213400 (self=0x1f267330)
Error freeing device pointer 0x213300 (initialization error).
!!!! error freeing dev_structure memory 0x213300 (self=0x1f267330)
Exception exceptions.MemoryError: 'error freeing device pointer 0x213300 (initialization error)' in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection
E.......................................K......K.................................................................................................................................................SS...SSSSS.SSS....................................................
======================================================================
ERROR: test_gpu_fusion (theano.tensor.tests.test_opt.test_fusion)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Tmp/nightly_build/Theano/theano/tensor/tests/test_opt.py", line 997, in test_gpu_fusion
self.do(mode, cuda.float32_shared_constructor, shp, gpu=True)
File "/Tmp/nightly_build/Theano/theano/tensor/tests/test_opt.py", line 921, in do
f = function(sym_inputs,[],updates=[(out, g)],mode=mode)
File "/Tmp/nightly_build/Theano/theano/compile/function.py", line 205, in function
profile=profile)
File "/Tmp/nightly_build/Theano/theano/compile/pfunc.py", line 482, in pfunc
on_unused_input=on_unused_input)
File "/Tmp/nightly_build/Theano/theano/compile/function_module.py", line 1398, in orig_function
on_unused_input=on_unused_input).create(
File "/Tmp/nightly_build/Theano/theano/compile/function_module.py", line 1227, in create
_fn, _i, _o = self.linker.make_thunk(input_storage=input_storage_lists)
File "/Tmp/nightly_build/Theano/theano/gof/link.py", line 380, in make_thunk
output_storage = output_storage)[:3]
File "/Tmp/nightly_build/Theano/theano/gof/vm.py", line 795, in make_all
no_recycling)
File "/Tmp/nightly_build/Theano/theano/sandbox/cuda/__init__.py", line 243, in make_thunk
compute_map, no_recycling)
File "/Tmp/nightly_build/Theano/theano/gof/op.py", line 576, in make_thunk
output_storage=node_output_storage)
File "/Tmp/nightly_build/Theano/theano/gof/cc.py", line 911, in make_thunk
keep_lock=keep_lock)
File "/Tmp/nightly_build/Theano/theano/gof/cc.py", line 854, in __compile__
keep_lock=keep_lock)
File "/Tmp/nightly_build/Theano/theano/gof/cc.py", line 1277, in cthunk_factory
module = get_module_cache().module_from_key(
File "/Tmp/nightly_build/Theano/theano/gof/cmodule.py", line 955, in module_from_key
module = compile_steps.next()
File "/Tmp/nightly_build/Theano/theano/gof/cc.py", line 1200, in compile_cmodule_by_step
preargs=preargs)
File "/Tmp/nightly_build/Theano/theano/sandbox/cuda/nvcc_compiler.py", line 355, in compile_str
'for cmd', ' '.join(cmd))
Exception: Exception: nvcc return status
-------------------- >> begin captured stdout << ---------------------
new cases 0
new cases 1
new cases 2
new cases 3
new cases 4
new cases 5
new cases 6
new cases 7
new cases 8
new cases 9
new cases 10
new cases 11
new cases 12
new cases 13
new cases 14
new cases 15
Skip test 16 as the gpu code currently supports only float32
new cases 17
new cases 18
new cases 19
new cases 20
new cases 21
Skip test 22 as the gpu code currently supports only float32
Skip test 23 as the gpu code currently supports only float32
Skip test 24 as the gpu code currently supports only float32
Skip test 25 as the gpu code currently supports only float32
new cases 26
new cases 27
new cases 28
new cases 29
Skip test 30 as the gpu code currently supports only float32
new cases 31
new cases 32
new cases 33
new cases 34
new cases 35
new cases 36
new cases 37
new cases 38
new cases 39
Skip test 40 as the gpu code currently supports only float32
new cases 41
new cases 42
Skip test 43 as the gpu code currently supports only float32
Skip test 44 as the gpu code currently supports only float32
Skip test 45 as the gpu code currently supports only float32
Skip test 46 as the gpu code currently supports only float32
Skip test 47 as the gpu code currently supports only float32
Skip test 48 as the gpu code currently supports only float32
new cases 49
new cases 50
new cases 51
new cases 52
new cases 53
new cases 54

['nvcc', '-shared', '-g', '--maxrregcount=32', '-O3', '-arch=sm_13', '-m64', '-Xcompiler', '-Wno-write-strings,-fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-DCUDA_NDARRAY_CUH=bfd6d963a7dee49831dab72583d1fe02,-fPIC', '-Xlinker', '-rpath,/tmp/lisa/theano.NOBACKUP/compiledir_Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-redhat-14-Laughlin-x86_64-2.4.4/cuda_ndarray', '-Xlinker', '-rpath,/opt/cuda-4.2.9/lib', '-Xlinker', '-rpath,/opt/cuda-4.2.9/lib64', '-I/tmp/lisa/theano.NOBACKUP/compiledir_Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-redhat-14-Laughlin-x86_64-2.4.4/cuda_ndarray', '-I/opt/cuda-4.2.9/include', '-I/u/lisa/.virtualenvs/py2.4/lib/python2.4/site-packages/numpy/core/include', '-I/u/lisa/.virtualenvs/py2.4/include/python2.4', '-I/Tmp/nightly_build/Theano/theano/sandbox/cuda', '-o', '/tmp/lisa/theano.NOBACKUP/compiledir_Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-redhat-14-Laughlin-x86_64-2.4.4/tmp0QLmy-/a8d0aecd8770fde3bb9ca34405878ea8.so', 'mod.cu', '-L/tmp/lisa/theano.NOBACKUP/compiledir_Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-redhat-14-Laughlin-x86_64-2.4.4/cuda_ndarray', '-L/opt/cuda-4.2.9/lib', '-L/opt/cuda-4.2.9/lib', '-L/opt/cuda-4.2.9/lib64', '-L/u/lisa/.virtualenvs/py2.4/lib', '-lpython2.4', '-lcudart', '-lcublas', '-lcuda_ndarray']

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 2349 tests in 3071.075s

FAILED (KNOWNFAIL=16, SKIP=32, errors=1)
