Hi,
I have quite a complex 2D kernel (can't post the code itself, unfortunately) containing a bunch of nested loops — basically (pseudo kernel code) like this:
float arr[1400];
// a bunch of other code here
for (i = 0; i < y; ++i) {
    // and here
    for (h = 0; h < x; ++h) {
        // and here
        for (t = 0; t < y1; ++t) {
            // and here
            for (s = 0; s < x1; ++s, other += 3, another += 4) {
                // and here
            }
        }
    }
}
for (i; i < y; ++i) {
    // and here
    for (h; h < x; ++h, yetanother += 3, andanother += 4) {
        // yes, also here :)
    }
}
output(x, y) = calculated_whatever; // output is, for example, a float array kernel parameter in global memory space
This works fine, but when the loop bounds (y, x, y1, x1) are slightly higher (though none exceeds 20), I get a "host out of memory" error when the kernel is run via enqueue_nd_range_kernel(kernel, 2, start, end, 0).
I thought host memory was, for example, the RAM in a desktop PC? The machine I'm testing on has 64 GB of RAM, so the message rather confuses me.
What could be the reason for this? The omitted code is nothing but plain old data types and a few float3 variables, all in private memory space.
I'm not sure yet whether Boost.Compute is the reason for the crash or whether I have a code bug, but I looked over my code and couldn't find a culprit (the exact same code also runs multithreaded on the CPU without any issues, which at least makes me think it COULD be Boost.Compute).
Or maybe it's my stone-age GPU with an older driver that causes the crash. I'll surely have to figure that one out myself, but I thought I'd do some research here first. :)
I also want to understand any traps I might step into with this kind of code design.
Thanks in advance