Hi Dorival -
I haven't been using GCC 4.7, so I haven't seen those warnings before,
but I'm not surprised to see them. I'll work on eliminating them.
As to the benchmark question - there is one thing you can do to
improve things: pass cuarrays to your function instead of
numpy arrays. Change the allocation part to look like this:
from time import time
from numpy import linspace
from copperhead import *  # assuming this exposes cuarray, as in the samples

t0 = time()
x = linspace( 0., 1., n)
y = linspace(10., 20., n)
# wrap the arrays in cuarrays once, up front
cuarray_x = cuarray(x)
cuarray_y = cuarray(y)
t1 = time()
dta = t1-t0
print 'Allocation: dt =', dta
And then call your function with cuarray_x and cuarray_y. If you pass
numpy arrays instead, the cost of copying the data into the Copperhead
data structure is paid every time you call the function.
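The timing of the call itself would then look something like this (a
minimal sketch - I'm assuming here that vadd is the @cu function
defined in the vadd.py sample):

t0 = time()
z = vadd(cuarray_x, cuarray_y)  # no numpy -> cuarray copy on each call
t1 = time()
print 'Copperhead: dt =', t1-t0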
Using numpy arrays:
(copperhead-new)Elendil:samples catanzar$ python vadd.py 10000000
Allocation: dt = 0.483824968338
Python: dt = 0.0831031799316
TBB: dt = 0.451075077057
OpenMP: dt = 0.193276882172
Total: dt = 1.2112801075
Using cuarrays (on my Core 2 Duo laptop):
(copperhead-new)Elendil:samples catanzar$ python vadd.py 10000000
Allocation: dt = 0.63053393364
Python: dt = 0.111096858978
TBB: dt = 0.20470905304
OpenMP: dt = 0.133543968201
Total: dt = 1.07988381386
Using cuarrays helps - now the OpenMP code is almost as fast as the
native numpy code. However, I wouldn't expect this code to run much
faster than numpy, even when parallelized: the parallelization incurs
some overhead, and the code is basically bandwidth-bound anyway -
vadd does just one add per element against two loads and a store, so
memory bandwidth, not compute, is the limit. To see more of a
difference, you could try something more compute-intensive (like
sort), or write a more complicated program, which Copperhead would
fuse together to reduce memory traffic; a sketch of what I mean is
below.
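Here's a minimal sketch of a fusable program, written in the @cu
style of the samples (the function and names are hypothetical, not
from vadd.py):

from copperhead import *

@cu
def fused(x, y):
    # Three logical elementwise passes, but Copperhead can fuse them
    # into a single loop, so no intermediate arrays hit memory.
    s = map(lambda a, b: a + b, x, y)
    d = map(lambda a, b: a - b, x, y)
    return map(lambda a, b: a * b, s, d)

Run over cuarrays, this does three elementwise operations for roughly
the same memory traffic as a single vadd.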
On the compute-intensive side, calling sort, I see the following:
Python: dt = 2.00371193886
TBB: dt = 2.5909011364
OpenMP: dt = 1.27429485321
- bryan