Is there any method to get mean values of 2D array in gpu?


zlff...@gmail.com

Feb 12, 2017, 8:48:32 PM
to reikna
I need to get the mean value of a 2D slice of a 3D array.

For example:

a[:,:,30].mean()

where `a` is a numpy-like 3D array stored on the GPU.

Currently I calculate this value on the CPU, like this:

a_mean_cpu = (a_gpu.get())[:,:,30].mean()

But I want to calculate it on the GPU and only transfer the result.

Is there any way to calculate a mean value in reikna?

Bogdan Opanchuk

Feb 12, 2017, 11:23:03 PM
to reikna
The support for strides in PyOpenCL/PyCUDA was patchy when I last tried it, and I haven't followed it for some time. Apparently the current situation is the following.

Both PyOpenCL and PyCUDA support non-contiguous views (that's what you get from [:,:,30]), but if you try to get the `data` attribute of such a view from a PyOpenCL array, an exception is raised. And since this attribute is what you have to pass to a kernel, you can't do much with this view. If you use CUDA as a backend you can use the view as a regular array for preparation and execution of Reikna computations, since all the striding support is already in place:

import numpy
from reikna import cluda
from reikna.algorithms import Reduce, predicate_sum

api = cluda.cuda_api()
thr = api.Thread.create()

a = numpy.random.normal(size=(10, 20, 40))
a_part = a[:,:,30]
a_dev = thr.to_device(a)
a_dev_part = a_dev[:,:,30]

r = Reduce(a_part, predicate_sum(a.dtype)).compile(thr)
res = thr.empty_like(r.parameter.output)
r(res, a_dev_part)

assert numpy.allclose(res.get(), a_part.sum())

(the mean value can be obtained by attaching a `div_const()`/`div_param()` transformation).

Now there is one issue with that. Unlike on a CPU (well, ignoring the cache mechanics), reading from non-consecutive areas of memory (every 40th element in the example above) can be pretty slow on a GPU. So if your array is large, and its size along the innermost axis is small, you may be better off doing a transposition to bring the innermost axis outwards (transpose with axes=(2, 0, 1)) and then doing the reduction on a contiguous view `a[30]`. PyOpenCL does not support such views either, so you will need to copy that part of the array first (something like `a_dev[30].copy()` should work, or you could do it in a computation in a separate kernel) and then reduce over the copy.

Yes, it's quite a bit of boilerplate, and I know that is one of the biggest problems with reikna. Unfortunately, fixing it amounts to a lot of work, which I do not have time for. Plus, Theano's GPU support is under development now, and as soon as it is operational, Reikna will become largely obsolete.

Note also that I haven't tested the strided behavior much, because at the time when I wrote the array access functions for Reikna the support for strides in PyOpenCL/PyCUDA was in its infancy. It seems that the situation has changed a lot since then.

