(gpu c) optimisation!

Prroffessorr Fir Kenobi

Aug 18, 2015, 4:16:15 AM

Last night I ran my first GPU C code that actually does something, and I made some tests.

This is simple Mandelbrot drawing code. First I ran a scalar version:

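int mandelbrot_n( float cRe, float cIm, int max_iter )
{
    /* start from z = c and cache the squares of both components */
    float re = cRe;
    float im = cIm;

    float rere = re*re;
    float imim = im*im;

    for(int n=1; n<=max_iter; n++)
    {
        /* z = z*z + c; im is updated first, while re still holds the old value */
        im = (re+re)*im + cIm;
        re = rere - imim + cRe;

        rere = re*re;
        imim = im*im;

        /* escaped once |z|^2 exceeds 4 */
        if ( (rere + imim) > 4.0 )
            return n;
    }

    /* never escaped within max_iter iterations */
    return 0;
}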

For 256 x 256 pixels with up to 1000 iterations each, it takes 90 ms.
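For anyone who wants to reproduce a number like this, here is a minimal sketch of how such a measurement could be set up. It is only an assumption about the method, since the post does not show the timing code. The helper name `time_scalar_frame` is hypothetical, the pixel-to-plane mapping is borrowed from the OpenCL kernel in this post, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <time.h>

/* Copy of the scalar routine from the post, so the sketch is self-contained. */
static int mandelbrot_n(float cRe, float cIm, int max_iter)
{
    float re = cRe;
    float im = cIm;
    float rere = re * re;
    float imim = im * im;

    for (int n = 1; n <= max_iter; n++) {
        im = (re + re) * im + cIm;
        re = rere - imim + cRe;
        rere = re * re;
        imim = im * im;
        if ((rere + imim) > 4.0)
            return n;
    }
    return 0;
}

/* Hypothetical helper: evaluates a size x size frame and returns the elapsed
   CPU time in milliseconds. The sum of all iteration counts is written to
   *checksum so the compiler cannot discard the work. */
static double time_scalar_frame(int size, int max_iter, long *checksum)
{
    long sum = 0;
    clock_t start = clock();

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            /* same mapping as the kernel: re in [-2, 1), im in [-1.5, 1.5) */
            float cRe = -2.0f + 3.0f * (float)x / (float)size;
            float cIm = -1.5f + 3.0f * (float)y / (float)size;
            sum += mandelbrot_n(cRe, cIm, max_iter);
        }
    }

    clock_t end = clock();
    *checksum = sum;
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

Calling `time_scalar_frame(256, 1000, &sum)` would correspond to the 256 x 256 x 1000 case. The result depends on the compiler flags and the machine, so it will not necessarily match the 90 ms quoted here.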

Then I made an SSE intrinsics version:

#include <emmintrin.h>  /* SSE2: __m128, __m128i, _mm_cvtps_epi32 */

__attribute__((force_align_arg_pointer))
__m128i mandelbrot_n_sse( __m128 cre, __m128 cim, int max_iter )
{
    /* four points at once, all starting from z = 0 */
    __m128 re = _mm_setzero_ps();
    __m128 im = _mm_setzero_ps();

    __m128 _1 = _mm_set_ps1(1.);
    __m128 _4 = _mm_set_ps1(4.);

    __m128 iteration_counter = _mm_set_ps1(0.);

    for(int n=0; n<=max_iter; n++)
    {
        __m128 re2 = _mm_mul_ps(re, re);
        __m128 im2 = _mm_mul_ps(im, im);
        __m128 radius2 = _mm_add_ps(re2,im2);

        /* add 1 to the counter only in lanes that are still inside |z|^2 < 4 */
        __m128 compare_mask = _mm_cmplt_ps( radius2, _4);
        iteration_counter = _mm_add_ps( iteration_counter, _mm_and_ps(compare_mask, _1) );
        /* stop as soon as all four lanes have escaped */
        if (_mm_movemask_ps(compare_mask)==0) break;

        __m128 ren = _mm_add_ps( _mm_sub_ps(re2, im2), cre);
        __m128 reim = _mm_mul_ps(re, im);

        __m128 imn = _mm_add_ps( _mm_add_ps(reim, reim), cim);

        re = ren;
        im = imn;
    }

    __m128i n = _mm_cvtps_epi32(iteration_counter);

    return n;
}

This runs in 20 ms (more than 4 times faster, I don't know why).

(The processor I ran it on is an old Core 2 E6550 at 2.33 GHz anyway. I have a better machine with AVX support but haven't used it here yet.)

Then I wrote the OpenCL code:

"__kernel void square( \n" \
"    __global int* input, \n" \
"    __global int* output, \n" \
"    const unsigned int count) \n" \
"{ \n" \
"    int i = get_global_id(0); \n" \
"    if(i < count) \n" \
"    { \n" \
"        int x = i%256; \n" \
"        // if(x>=256) return; \n" \
"        int y = i/256; \n" \
"        // if(y>=256) return; \n" \
"        float cRe = -0.5 + -1.5 + x/256.*3.; \n" \
"        float cIm = 0.0 + -1.5 + y/256.*3.; \n" \
"        float re = 0; \n" \
"        float im = 0; \n" \
"        int n = 0; \n" \
"        for( n=0; n<=1000; n++) { \n" \
"            if( re * re + im * im > 4.0 ) { output[256*y+x] = n + 256*n + 256*256*n; return;} \n" \
"            float re_n = re * re - im * im + cRe; \n" \
"            float im_n = 2 * re * im + cIm; \n" \
"            re = re_n; \n" \
"            im = im_n; \n" \
"        } \n" \
"        output[256*y+x] = 250<<8; \n" \
"    } \n" \
"} \n" \
"\n";

This works without a problem and runs in 7 ms
(I have a weak GPU, a GT 610).

How can I optimise this GPU version? Is it common to write such scalar code on a GPU, or is there some way of writing something like SSE intrinsics here? Or is there some other kind of optimisation?

(Anyway, I must say that I don't fully agree with the critics of GPU/OpenCL coding. This works
easily and fine, at least for some cases. It is especially good that there is not too much
slowdown when launching the GPU from the CPU and getting the results back: it seems I can run it
within a 1 millisecond window, so it's very fine. I believe that with harder code it may get
slower, but I also believe that with a better card I may do better than 7 ms.)

Juha Nieminen

Aug 18, 2015, 9:02:31 AM

Prroffessorr Fir Kenobi <profes...@gmail.com> wrote:
> " if( re * re + im * im > 4.0 ) { output[256*y+x] = n + 256*n + 256*256*n; return;} \n" \
>

> How to optimise this gpu version?

GPU shaders don't like conditionals. They are usually very slow. (A loop with
a fixed number of iterations is usually ok, though.) You could try to see if you
can get rid of that conditional, and whether it makes it faster or slower.
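One way the early-exit conditional could be removed is sketched below in plain C, as a stand-in for the kernel body so that it can be compiled and checked on the CPU. The function names are hypothetical and this is not a tested OpenCL kernel. The loop always runs the full iteration count, and once a point has escaped, an arithmetic blend freezes `re` and `im` so the counter stops advancing. In an actual kernel the blend could probably be written with a ternary or the built-in `select` instead. Whether this is faster than the early return on a given card is something to measure, not assume.

```c
/* Early-exit version: mirrors the loop in the posted kernel and returns the
   iteration at which the point escaped, or -1 if it never did. */
static int escape_iter_branchy(float cRe, float cIm, int max_iter)
{
    float re = 0.0f, im = 0.0f;
    for (int n = 0; n <= max_iter; n++) {
        if (re * re + im * im > 4.0f)
            return n;
        float re_n = re * re - im * im + cRe;
        float im_n = 2.0f * re * im + cIm;
        re = re_n;
        im = im_n;
    }
    return -1;
}

/* Fixed-trip-count version: no early return inside the loop. */
static int escape_iter_branchless(float cRe, float cIm, int max_iter)
{
    float re = 0.0f, im = 0.0f;
    int count = 0;

    for (int n = 0; n <= max_iter; n++) {
        /* 1 while the point is still inside the radius-2 circle, else 0 */
        int active = (re * re + im * im <= 4.0f);
        float keep = (float)active;

        count += active;

        float re_n = re * re - im * im + cRe;
        float im_n = 2.0f * re * im + cIm;

        /* blend: take the new value while active, keep the old one after escape */
        re = keep * re_n + (1.0f - keep) * re;
        im = keep * im_n + (1.0f - keep) * im;
    }

    /* count equals the escape iteration; max_iter + 1 means "never escaped" */
    return count > max_iter ? -1 : count;
}
```

For example, both functions return 3 for c = (1, 0) and -1 for c = (0, 0) with `max_iter` = 1000. The cost of this form is that points which escape after a few iterations still pay for all 1001 passes, which is why it can turn out slower for a picture that is mostly outside the set.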


Prroffessorr Fir Kenobi

Aug 18, 2015, 10:12:02 AM

On Tuesday, 18 August 2015 at 15:02:31 UTC+2, Juha Nieminen wrote:
> Prroffessorr Fir Kenobi <profes...@gmail.com> wrote:
> > " if( re * re + im * im > 4.0 ) { output[256*y+x] = n + 256*n + 256*256*n; return;} \n" \
> >
> > How to optimise this gpu version?
>
> GPU shaders don't like conditionals. They are usually very slow. (A loop with
> a fixed number of iterations is usually ok, though.) You could try to see if you
> can get rid of that conditional, and whether it makes it faster or slower.
>

Maybe that would be a good way, but how?
Well, I will try to google something.