self.sup_prog.multiply_them(self.FNxy_buf, self.mxf_buf,
                            self.FNyy_buf, self.myf_buf,
                            self.FNyz_buf, self.mzf_buf,
                            self.Hdy_buf, global_size=(self.Lz, self.Ly, self.Lx))
But my GPU obviously has more memory than that, so how can I compute larger arrays? Am I running out of indices? Or am I using global memory the wrong way, and should I be using local memory instead?
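One way to narrow down which of these limits is being hit is to compare the grid against the device limits before launching. Below is a minimal sketch; the helper name `check_grid` is hypothetical, and the buffer element type is not shown in the question, so it is passed in as a parameter (`complex64` in the example is only an assumption). It checks three things: whether a flattened index `z*Ly*Lx + y*Lx + x` would overflow a signed 32-bit `int` in the kernel, whether any single buffer exceeds the per-allocation limit, and whether all seven buffers together exceed total global memory.

```python
import numpy as np

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit `int` index can hold


def check_grid(shape, n_buffers, dtype, max_mem_alloc_size, global_mem_size):
    """Return a list of reasons why a 3-D grid of this shape may fail.

    shape: (Lz, Ly, Lx), the same tuple passed as global_size.
    n_buffers: how many device buffers of this shape are allocated
        (assumed here to all share the same dtype).
    max_mem_alloc_size, global_mem_size: device limits in bytes, e.g.
        device.max_mem_alloc_size and device.global_mem_size in PyOpenCL.
    """
    # Total number of work-items / elements per buffer, computed in 64-bit.
    n_items = int(np.prod(shape, dtype=np.int64))
    bytes_per_buffer = n_items * np.dtype(dtype).itemsize

    problems = []
    if n_items > INT32_MAX:
        # A kernel computing its linear index in `int` would wrap around.
        problems.append("flattened index overflows a 32-bit int")
    if bytes_per_buffer > max_mem_alloc_size:
        # Each clCreateBuffer call is capped by CL_DEVICE_MAX_MEM_ALLOC_SIZE.
        problems.append("single buffer exceeds max_mem_alloc_size")
    if n_buffers * bytes_per_buffer > global_mem_size:
        # All buffers must fit in device global memory at the same time.
        problems.append("total buffers exceed global_mem_size")
    return problems


if __name__ == "__main__":
    # Hypothetical limits: 1 GiB per allocation, 4 GiB of global memory.
    print(check_grid((512, 512, 512), 7, np.complex64, 2**30, 4 * 2**30))
```

With seven 512³ `complex64` buffers, each buffer is exactly 1 GiB, so no single allocation is too large, but together they need 7 GiB and would not fit on a 4 GiB card. Note that `max_mem_alloc_size` is often only a fraction of `global_mem_size`, so a single large buffer can fail even when the card has plenty of free memory. If the index check fires instead, declaring the linear index as `size_t` rather than `int` in the kernel avoids the wraparound. Local memory is typically only tens of kilobytes per compute unit, so it is not a way to hold larger arrays.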