# How to do a dot product?

### Sajith T S

Dec 4, 2011, 10:38:59 PM
Greetings!

None of the attempts below worked, and apologies if this is too silly or
documented somewhere, but: how do I compute the dot product of two vectors?

(Been doing some Copperhead tests for a class project.)

There's also a secondary question about continued development of
Copperhead. A friend of mine who attended Bryan's talk at SC'11 says
the project is very much alive, but the Google Code commits are from a
year ago. Is there another place one should go looking?

Thanks!
Sajith.

```python
from itertools import imap
from operator import mul
import numpy
import timeit
import sys

@cu
def dot_product(x, y):
    return sum(imap(mul, x, y))

@cu
def dot_product(x, y):
    return sum([x[i]*y[i] for i in range(len(x))])

@cu
def dot_product(x, y):
    return reduce(lambda sum, p: sum + p[0] * p[1], zip(x, y), 0)

@cu
def dot_product(x, y):
    return sum(lambda a, b: return a * b, x, y)

@cu
def dot_product(x, y):
    def m(xi, yi):
        prod = xi * yi
        return prod
    return sum(map(m, x, y))

@cu
def dot_product(x, y):
    return numpy.dot(x, y)
```

--
"the lyf so short, the craft so long to lerne."
-- Chaucer.

### Bryan Catanzaro

Dec 4, 2011, 11:27:39 PM
Hi Sajith -
Thanks for the question.  Here are a couple ways to do a dot product that should work:

```python
@cu
def dot_product(x, y):
    return sum(map(op_mul, x, y))

@cu
def dot_product(x, y):
    def elem_wise(xi, yi):
        return xi * yi
    return sum(map(elem_wise, x, y))
```

To see what functions you can call from within a Copperhead program, take a look at prelude.py, which has some rudimentary documentation.  However, not all functions mentioned there work yet, especially with the code in the main public repository.
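For what it's worth, the same structure runs as ordinary Python once the `@cu` decorator is dropped, which is a handy way to sanity-check results against the GPU version; a minimal sketch (plain Python, no Copperhead):

```python
from operator import mul

def dot_product_ref(x, y):
    # reference implementation, same shape as the Copperhead version
    return sum(map(mul, x, y))

print(dot_product_ref([1, 2, 3], [4, 5, 6]))  # 32
```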

The project is very much alive, although I'm currently the only person working on it. During the past year I made a lot of progress (some of which you can see in the public clones on Google Code, especially this one: http://code.google.com/r/bryancatanzaro-copperhead/source/browse), and wrote and defended my dissertation on Copperhead (which you can see here if you're interested: http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-45.html).  I then joined Nvidia Research, where I am continuing work on the project, with a focus on making it practically useful, instead of being just an interesting research vehicle.  I have completely rewritten the compiler in the past few months, and hope to push a new, more stable version publicly in the next few weeks. Although my new compiler doesn't support all the features the old compiler attempted to support, I am documenting it and testing it, so that I can invite more people like yourself to use it.

Thanks for the interest, and I'll let you know when I push my new compiler publicly.

Take care,
bryan

### Sajith T S

Dec 5, 2011, 12:17:21 AM
Ah, that makes sense. Thanks Bryan!

However, neither quite worked for me. This is what I get:

```
Traceback (most recent call last):
  File "./dot.py", line 39, in <module>
    t1 = do_run(dogpu, "GPU")
  File "./dot.py", line 27, in do_run
    n = t.timeit(count)
  File "/usr/lib64/python2.7/timeit.py", line 194, in timeit
    timing = self.inner(it, self.timer)
  File "/usr/lib64/python2.7/timeit.py", line 100, in inner
    _func()
  File "./dot.py", line 19, in dogpu
    gpu = dot_product(x, y)
    return P.execute(self, args, kwargs)
    return execute(cufn, *args, **kwargs)
    return_value = compiled_fn(*cu_inputs)
  File "<string>", line 10, in dot_product
    return self.fn(*args_cache)
    result = module.entryPoint(array)
TypeError: No registered converter was able to produce a C++ rvalue of type unsigned long long from this Python object of type PooledDeviceAllocation
```

(The code I'm trying is attached, if you can risk looking at some
poor greenhorn lines of Python.)

Should I switch to another clone?

Lately I've had a chance to look at several GPU programming DSLs
(though I haven't spent a lot of time with them -- Accelerate, Nikola,
etc.), and Copperhead is certainly among the most promising ones. Good
luck with the new direction!

I've found Thrust to be very interesting and useful, and I'm looking
forward to Copperhead shipping with the official CUDA distribution one
day too. Particularly so since (no offense!) I've found the whole
thing a pain to set up, but that's only to be expected of new code. :)

Out of curiosity, do you think there will ever be an OpenCL backend?

Regards,
Sajith.

dot.py

### Sajith T S

Dec 5, 2011, 12:52:15 AM
Sajith T S <saj...@gmail.com> wrote:

> TypeError: No registered converter was able to produce a C++ rvalue of type unsigned long long from this Python object of type PooledDeviceAllocation

Oh, in fact this is the same error I've been getting from all the
sample programs except simple_tests.py. Does it suggest that something
is broken in my setup?
Thanks,
Sajith.

### Bryan Catanzaro

Dec 5, 2011, 2:12:02 AM
Yes, I think so.  What version of CodePy, PyCUDA and CUDA are you running?  I'm guessing you're on 64-bit Linux?

- bryan

### Sajith T S

Dec 5, 2011, 2:43:46 AM
Yes, it's 64-bit Linux. This is what "uname -a" says:

```
Linux localhost 3.0.4-gentoo-r1 #1 SMP Fri Sep 30 12:05:35 EDT 2011 x86_64 Intel(R) Xeon(R) CPU X5365 @ 3.00GHz GenuineIntel GNU/Linux
```

CUDA 4.0, Codepy 2011.1, PyCUDA 2011.1.3, cgen 2011.1.

### Bryan Catanzaro

Dec 5, 2011, 11:54:15 AM
I've seen this bug before - it arises from changes in the way PyCUDA and Boost export the functions PyCUDA provides, which Copperhead programs expect to use.  In the past, I've solved it by:
1.  Not using PyCUDA's shipped Boost library, and instead using the system Boost library when building PyCUDA.
2.  Sometimes I have had to use an older version of Boost.  1.41 has worked for me.  I'm not sure if this is absolutely necessary, or if just building PyCUDA with the system Boost library is good enough.
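The first fix above might look something like the following when building PyCUDA from source. The flag names and paths here are my recollection of PyCUDA's configure script from this era, so treat them as assumptions and verify against `python configure.py --help`:

```shell
# Build PyCUDA against the system Boost instead of the shipped copy.
# Flag names and paths are assumptions; check `python configure.py --help`.
cd pycuda-2011.1.3
python configure.py --no-use-shipped-boost \
    --boost-inc-dir=/usr/include \
    --boost-lib-dir=/usr/lib64 \
    --boost-python-libname=boost_python
python setup.py build
sudo python setup.py install
```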

For what it's worth, the new version of the Copperhead runtime and compiler do not use PyCUDA (although they still use Codepy, another of Andreas Klöckner's projects).  In other words, this particular issue is solved in the new version of Copperhead I expect to release shortly. I realize Copperhead is too difficult to install, and I'm working to make this process easier.

Also, I notice from your trace that you're interested in timing Copperhead program execution.  A couple things:

0.  The development clone of Copperhead significantly reduces Copperhead runtime overhead compared to the version you're using.  You can grab it from here:

1.  The first time you run the function, Copperhead has to invoke nvcc, which takes O(10) seconds.  Subsequent runs will use a cached binary.

2.  If you care about the overhead of moving data back and forth between the CPU and GPU, you should use CuArray objects.
The following code will work, but more slowly:

```python
a = dot_product(np.array(...), np.array(...))
```

Copperhead can't control GPU memory placement for numpy arrays, so this code will result in extraneous memory transfers.

```python
x = CuArray(np.array(...))
y = CuArray(np.array(...))
a = dot_product(x, y)
```

This will ensure that data is only moved when necessary.
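Point 1 above also suggests a timing pattern: call the function once before handing it to `timeit`, so the one-time nvcc compilation isn't counted. A minimal sketch, using a plain-Python stand-in for the compiled function so it runs without Copperhead:

```python
import timeit

def dot_product(x, y):
    # stand-in for the @cu-compiled function
    return sum(a * b for a, b in zip(x, y))

xs = list(range(1000))
ys = list(range(1000))

# Warm-up call: for a real Copperhead function this triggers (and caches)
# the nvcc compilation, which would otherwise dominate the first timing.
dot_product(xs, ys)

t = timeit.Timer(lambda: dot_product(xs, ys))
elapsed = t.timeit(number=100)
print(elapsed > 0.0)  # True
```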

- bryan

### Sajith T S

Dec 5, 2011, 2:02:01 PM
Ah, yes -- disabling the shipped Boost library, using the system Boost
(I used 1.42), and then rebuilding and re-installing PyCUDA did the
trick. Thanks!

Thank you for the additional pointers also -- they are very helpful.

Sajith.


### Bryan Catanzaro

Dec 5, 2011, 4:51:00 PM

Glad to hear that worked!

- bryan

### Sajith T S

Dec 6, 2011, 3:58:17 PM
Hi Bryan,

Thank you for your patience. I guess I should try testing it to the
extreme -- you know, the way people are supposed to conduct themselves
on mailing lists. So I've got the next set of questions!

First, what would it take to make something like this work?

```python
@cu
def vector_sum(x):
    sum(map((lambda xi: xi if xi > 0 else xi * -1), x))
```

It dumps a bunch of traceback on me, ending with:

```
ValueError: visiting unknown node: <_ast.Expr object at 0x2999950>
```

I can send the whole thing if you're interested.

Second, have you tried to make the Black-Scholes kernel (the one
shipped with the Nvidia SDK) work with Copperhead? It doesn't look
like a line-by-line translation to Copperhead would work, in the
absence of abs(), exp(), sqrt(), etc. Do you have suggestions on how
to approach this?

Regards,
Sajith.


### Bryan Catanzaro

Dec 6, 2011, 5:17:02 PM
Hi Sajith -
Thanks for the questions, and don't worry, you're not testing my patience.  Keep the questions coming!

I would write your function like this:

```python
@cu
def vector_sum(x):
    def elwise(xi):
        if (xi > 0):
            return xi
        else:
            return -xi
    return sum(map(elwise, x))
```
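For reference, the same structure runs unmodified as ordinary Python once the `@cu` decorator is dropped, which makes it easy to sanity-check:

```python
def vector_sum(x):
    def elwise(xi):
        # absolute value, written with a plain if/else as in the
        # Copperhead version above
        if xi > 0:
            return xi
        else:
            return -xi
    return sum(map(elwise, x))

print(vector_sum([-1, 2, -3]))  # 6
```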

I haven't tried making the Black-Scholes kernel work with Copperhead, so I'm not sure.

- bryan


### Bryan Catanzaro

Dec 6, 2011, 5:18:55 PM
Sorry, I hit send before I meant to.
If all that's required to get Black-Scholes working with Copperhead is adding abs, sqrt (I think exp is already there), then that's easy.  Just add a Python implementation in prelude.py, and make sure the C++ include files that Copperhead includes have definitions.

- bryan


### Sajith T S

Dec 6, 2011, 6:31:03 PM
Hi Bryan,

That was the first thing I tried, but it didn't work; doing map(abs,
x) outside Copperhead did. I've attached the code I've been trying to
run and the traceback, in case you want to see them.

(I realize that numpy.arange() doesn't generate negative numbers, but
I wasn't exactly interested in that...)

I haven't switched to the new bryancatanzaro-copperhead clone repo
yet; maybe I should try doing that?

(For whatever it's worth, a friend of mine and I have been doing a
timing comparison between Accelerate and Copperhead for a class
project. Copperhead seems to be doing really well in our tests;
however, it's surely too soon to draw conclusions, since neither of us
is experienced in writing well-performing Haskell, Python, or GPU
programs. Still, I thought you might be interested.)

Thanks,
Sajith.

vector-sum.py
vector-sum-error.txt

### Bryan Catanzaro

Dec 6, 2011, 8:58:05 PM
I've pushed some changes to the bryancatanzaro-copperhead clone repo that add exp, abs, and sqrt functionality.  The following code runs correctly with your tester:

```python
@cu
def vector_sum(x):
    def el_wise(xi):
        return abs(xi)
    return sum(map(el_wise, x))
```

- bryan


### Sajith T S

Dec 7, 2011, 12:08:14 PM