Using Reikna with PyOpenCL

238 views
Skip to first unread message

smu...@uchicago.edu

unread,
Apr 28, 2017, 10:25:36 AM4/28/17
to reikna
Hi!

I am new to reikna (and pyopencl) and was wondering whether or not it is possible to use a reikna function within a context created by pyopencl.  I am trying to parallelize the following loop from python.  It solves the Gross-Pitaevskii equation with a split-step Fourier method.  This involves applying a nonlinear operator in real space to the wavefunction (the complex 2D array U), transforming to frequency space, then applying the Laplacian operator there, and transforming back.  The loop can be seen below, where: potential_mat, U, kxx, kyy, and dt are defined earlier in the code.

$for i in range(5000):
$ U=np.exp( -dt*( potential_mat + U*np.conjugate(U) ))*U # advance w/ nonlinear operator
$ C=sp.fftshift(sp.fft2(U)) # transform to fourier space
$ C=np.exp( -dt*0.5*( kxx**2+kyy**2 ) )*C # advance w/ Laplacian operator in frequency space
$ U=sp.ifft2(sp.fftshift(C)) # transform back


So far I have written the following pyopencl code to perform the first exponential step:


$ ctx = cl.create_some_context()
$ queue = cl.CommandQueue(ctx)
$
$ U = U.astype(np.complex128)
$ potential_mat = potential_mat.astype(np.float64)
$ dtmat = np.array(dt).astype(np.float64)
$ U_dev = cl_array.to_device(queue, U)
$ potential_mat_dev = cl_array.to_device(queue, initmat)
$ dt_dev = cl_array.to_device(queue, dtmat)
$ dest_dev = cl_array.empty_like(U_dev)

$ prg = cl.Program(ctx, """
$   #pragma OPENCL EXTENSION cl_khr_fp64: enable
$  #define PYOPENCL_DEFINE_CDOUBLE
$  #include <pyopencl-complex.h>
$  __kernel void part1(__global const cdouble_t *U, __global const double *potential_mat, __global const double *dt, __global cdouble_t *mult)
$  {
$
$    // get thread id numbers
$    int gid0 = get_global_id(0);
$    int gid1 = get_global_id(1);
$
$    // get the global sizes of the various blocks
$    int num0 = get_global_size(0);
$
$    // declare a place to store the magnitude value
$    __local double mag;
$    mag = cdouble_abs(U[gid0 + num0 * gid1]) * cdouble_abs(U[gid0 + num0 * gid1]);
$
$    //declare a place to store exponentiated value
$    __local double expo;
$    expo = exp(- dt[0] * mag - dt[0] * potential_mat[gid0 + num0 * gid1]);
$
$
$    // Multiply real exponent by complex wavefct
$   mult[gid0 + num0 * gid1] = cdouble_rmul( expo, U[gid0 + num0 * gid1]);
$
$    }
$    """).build()
$
$ prg.part1(queue, U_dev.shape, None, U_dev.data, potential_mat_dev.data, dt_dev.data, dest_dev.data)


Now I would like to use reikna's FFT function to transform the array dest_dev (currently sitting on the device) into C_dev = fftshift(fft2(dest_dev)).  Is there a way that I can convert the current pyopencl queue into a cluda thread, so that I can follow prg.part1 with this FFT transform?

Thanks so much for your help!
Seth

Bogdan Opanchuk

unread,
Apr 28, 2017, 10:45:58 AM4/28/17
to reikna
Yes, it's possible, just create the Thread object with your queue as the first argument (see http://reikna.publicfields.net/en/latest/api/cluda.html#reikna.cluda.api.Thread for details). Then all the kernels from Reikna computations will be serialized into that queue. Arrays created from the methods of this Thread (array(), empty_like()) are subclasses of PyOpenCL Array, so you can pass them to PyOpenCL kernels. The only thing that this class adds is a .thread attribute, which is used by reikna-integrator (see below), but regular Reikna computations should just accept PyOpenCL Arrays.

Incidentally, I've actually created Reikna to do similar calculations at my place of work, and there's an ODE/SDE integrator based on Reikna (https://github.com/fjarri/reikna-integrator) that implements the split-step method (along with several others). There are no docs, unfortunately, but there's an example (examples/soliton.py) which, hopefully, is sufficiently informative. There's even a BEC simulation toolkit on top of that (https://github.com/fjarri/beclab) with some docs (http://beclab.publicfields.net/) in case you are interested.

Bogdan Opanchuk

unread,
Apr 28, 2017, 10:55:53 AM4/28/17
to reikna
P.S. If you are going to test reikna-integrator, you will need to uninstall progressbar2 3.* that will be installed by pip, and do `pip install "progressbar2<3"`. The interface was changed in the 3.x, and I keep forgetting to add the <3 condition in setup.py, or fix the progressbar usage (I will try to finally do that during the weekend).

smu...@uchicago.edu

unread,
Apr 28, 2017, 5:32:38 PM4/28/17
to reikna
Hi Bogdan,

Thanks for the very helpful reply!  I tried out your advice in a simple case below.  I am attempting to square a small random array, first with pyopencl, then pass the result to the reikna code and square the output of the pyopencl code again.  However, I keep getting this error "pyopencl.cffi_cl.LogicError: when processing argument #2 (1-based): clSetKernelArg failed: INVALID_MEM_OBJECT".  I have copied my code below for reference.

$ from __future__ import absolute_import
$ from __future__ import print_function
$ import pyopencl as cl
$ import pyopencl.array as cl_array
$ import numpy as np
$ import time as time
$
$ import reikna.cluda as cluda
$ from reikna.core import Computation, Parameter, Annotation
$ from reikna.fft import FFT
$
$
$ num = 256
$ a = np.random.rand(num).astype(np.float32)
$
$ ###############################################################################
$ # PyOpenCL piece
$
$ ctx = cl.create_some_context()
$ queue = cl.CommandQueue(ctx)
$
$ a_dev0 = cl_array.to_device(queue, a)
$ dest_dev0 = cl_array.empty_like(a_dev0)
$
$ prg = cl.Program(ctx, """
$    __kernel void square(__global const float *a, __global float *c)
$    {
$      int gid0 = get_global_id(0);
$
$      c[gid0] = a[gid0] * a[gid0];
$    }
$    """).build()
$
$ prg.square(queue, a_dev0.shape, None, a_dev0.data, dest_dev0.data)
$
$ final_array0 = dest_dev0.get()
$
$
$ ################################################################################
$ # Reikna piece
$
$ api = cluda.ocl_api()
$ thr = api.Thread.create(queue)
$
$ dest_dev1 = thr.empty_like(a)
$
$ program = thr.compile("""
$ KERNEL void multiply_them(
$ GLOBAL_MEM float *dest,
$ GLOBAL_MEM float *a)
$ {
$ const SIZE_T i = get_local_id(0);
$ dest[i] = a[i] * a[i];
$ }
$ """)
$ multiply_them = program.multiply_them
$ multiply_them(dest_dev1, dest_dev0, local_size=num, global_size=num)
$ final_array1 = dest_dev1.get()

Do I have to do something extra to dest_dev0 to make it an acceptable array for multiply_them to operate on?
Thanks again!
Seth


Bogdan Opanchuk

unread,
Apr 28, 2017, 9:26:07 PM4/28/17
to reikna
Hi Seth,

You confused the Thread constructor (http://reikna.publicfields.net/en/latest/api/cluda.html#reikna.cluda.api.Thread) and the convenience classmethod create() (http://reikna.publicfields.net/en/latest/api/cluda.html#reikna.cluda.api.Thread.create). You need the former:

thr = api.Thread(queue)


smu...@uchicago.edu

unread,
Apr 28, 2017, 10:05:13 PM4/28/17
to reikna
Hi Bogdan,
It works!  Thanks for bearing with me,
Seth
Reply all
Reply to author
Forward
0 new messages