Transformation without load_same or store_same


Apr 25, 2017, 12:40:27 PM
to reikna
I have written a Transformation which consists of a single complex input and two outputs.

While it works as expected (when used as a PureParallel object), it does not call load_same or store_same, so I do not know how I could use it within a Computation...

Here is the code:

def get_two_fft(arr_in):

    # conjugate module
    conj = reikna.cluda.functions.conj(arr_in.dtype)

    # div module (the arguments were truncated in the original post;
    # reconstructed here as complex divided by a real scalar)
    div_by_double = reikna.cluda.functions.div(
        in_dtype1=arr_in.dtype, in_dtype2=np.float64, out_dtype=arr_in.dtype)

    # mul module (arguments likewise reconstructed: complex times complex)
    mul_by_j = reikna.cluda.functions.mul(
        arr_in.dtype, arr_in.dtype, out_dtype=arr_in.dtype)

    arr_type = Type(dtype=arr_in.dtype, shape=arr_in.shape)

    nrows, ncols = arr_in.shape

    j = np.complex64(1j)

    return Transformation(
        [Parameter('in_cplx', Annotation(arr_type, 'i')),
         Parameter('out_cplx_1', Annotation(arr_type, 'o')),
         Parameter('out_cplx_2', Annotation(arr_type, 'o'))],
        """
        ## get the 2D indices
        VSIZE_T idx_col = threadIdx.x + blockDim.x * blockIdx.x;
        VSIZE_T idx_row = threadIdx.y + blockDim.y * blockIdx.y;

        ## get the reversed indices
        int idx_new_row;
        int idx_new_col;

        if (idx_row == 0)
            idx_new_row = 0;
        else
            idx_new_row = ${nrows} - idx_row;

        if (idx_col == 0)
            idx_new_col = 0;
        else
            idx_new_col = ${ncols} - idx_col;

        ## conjugate of the index-reversed FFT
        ${in_cplx.ctype} rev_fft = ${conj}( ${in_cplx.load_idx}(idx_new_row, idx_new_col) );

        ## initial FFT
        ${in_cplx.ctype} init_fft = ${in_cplx.load_idx}(idx_row, idx_col);

        ## sum and divide
        ${in_cplx.ctype} sum_div = ${div}(init_fft + rev_fft, 2.0);

        ## subtract and divide
        ${in_cplx.ctype} sub_div = ${div}(init_fft - rev_fft, 2.0);

        ## FFT of the first real-valued input
        ${out_cplx_1.store_idx}(idx_row, idx_col, sum_div);

        ## FFT of the second real-valued input
        ${out_cplx_2.store_idx}(idx_row, idx_col, -${mul}(${j}, sub_div));
        """,
        render_kwds=dict(conj=conj, div=div_by_double, mul=mul_by_j, j=j,
                         nrows=nrows, ncols=ncols))
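For context, the arithmetic this snippet implements is the standard "two real FFTs from one complex FFT" trick, which can be checked with a host-side numpy restatement (this is not the GPU code, just the same math):

```python
import numpy as np

# If Z = FFT(x + 1j*y) with real x, y, then
#   FFT(x) = (Z + conj(Z reversed)) / 2
#   FFT(y) = -1j * (Z - conj(Z reversed)) / 2
# where "reversed" maps index i to 0 if i == 0 else N - i on each axis.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
y = rng.standard_normal((4, 8))

Z = np.fft.fft2(x + 1j * y)

# index reversal on both axes: flipping then rolling by 1 keeps index 0 fixed
Zrev = np.conj(np.roll(np.flip(Z, axis=(0, 1)), 1, axis=(0, 1)))

fft_x = (Z + Zrev) / 2        # what out_cplx_1 holds
fft_y = -1j * (Z - Zrev) / 2  # what out_cplx_2 holds

assert np.allclose(fft_x, np.fft.fft2(x))
assert np.allclose(fft_y, np.fft.fft2(y))
```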

Bogdan Opanchuk

Apr 26, 2017, 8:28:30 AM
to reikna
The main problem with this transformation is that you should use `idxs` in the transformation code and not the native CUDA functions, especially if you are going to attach it to the FFT (I explained why in the other post).
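A sketch of what that change looks like for the snippet above (assuming a 2D array, so that `idxs[0]` is the row index and `idxs[1]` the column index):

```
VSIZE_T idx_row = ${idxs[0]};
VSIZE_T idx_col = ${idxs[1]};
```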

There is another subtle thing. Your transformation does not map the same indices of the input array to the same indices of the output arrays, which makes it unsafe to use with in-place array modifications. Currently this is not checked automatically, so you have to make sure your output array is different from the input array (see e.g. the FFTShift transformation, which has the same problem).
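The hazard can be illustrated with a minimal host-side sketch (plain Python, hypothetical `reversed_read` helper, not reikna code): a kernel that writes index i while reading index N - i gives different results when the output aliases the input.

```python
# Element i of the output is computed from element (N - i) of the
# input, so writing into the input buffer overwrites values that
# later iterations still need.
def reversed_read(src, dst):
    n = len(src)
    for i in range(n):
        dst[i] = src[0 if i == 0 else n - i]

a = list(range(4))          # [0, 1, 2, 3]
out = [None] * 4
reversed_read(a, out)       # separate output buffer
print(out)                  # [0, 3, 2, 1] -- correct reversal

b = list(range(4))
reversed_read(b, b)         # output aliases input
print(b)                    # [0, 3, 2, 3] -- b[1] was clobbered before b[3] read it
```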

Apr 26, 2017, 2:01:40 PM
to reikna
As my Transformation is not suitable (lacking load_same/store_same) for attaching to the FFT, I instead call it after the FFT in _build_plan(), and it works as expected.

However, contrary to what I wrote in the first post, I do not manage to call mul() with j (a complex number) as one of its inputs.

It raises an error saying that CUDA cannot work with complex integral numbers.

The only way I found to solve this was to pass a (1, 1) complex array containing j, and then call ${arr_j.load_idx}(0, 0).

Bogdan Opanchuk

Apr 27, 2017, 3:54:08 AM
to reikna
Oh, I think I see what you mean. You can make it work by using `dtypes.c_constant()` function like so:

${out_cplx_2.store_idx}(idx_row, idx_col, -${mul}(${dtypes.c_constant(j)}, sub_div));

(which will evaluate to `COMPLEX_CTR(double2)(0, 1)`) or, alternatively, you can pass it to the transformation as a scalar parameter.
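For intuition, roughly what a `c_constant`-style helper does for a complex scalar can be sketched in plain Python (a simplified sketch assuming single precision, not reikna's actual implementation, which handles all dtypes):

```python
import numpy as np

# Render a numpy complex scalar as a C constructor expression,
# instead of Python's repr ("1j"), which is not valid C.
def c_constant_sketch(val):
    val = np.complex64(val)
    return "COMPLEX_CTR(float2)(%r, %r)" % (float(val.real), float(val.imag))

print(c_constant_sketch(1j))   # COMPLEX_CTR(float2)(0.0, 1.0)
```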
