int in struct not copied?

50 views
Skip to first unread message

Samir Kharchi

unread,
Jul 27, 2016, 9:08:12 AM7/27/16
to boost-compute
Hi,

my structs look like this:

struct myStruct
{
int a;
int b;
boost::compute::float4_ c;
boost::compute::float4_ d;
//Here some more float4_ members and one int4_
};

struct mySettings
{
boost::compute::float4_ off;
boost::compute::float4_ size;
};

I adapted them for opencl with all the necessary boost compute calls etc.

Then I put a bunch of myStruct on a host vector and copy them to a device vector using copy_async.
I also set a single mySettigns into a device vector which I pass to the kernel as well. Then I call a kernel on the data like:

__kernel void mytest(  __global const myStruct* input
                                , __global myStruct* output
                                , __global const mySettings* sett)
{
        const uint i = get_global_id(0);
        const __global myStruct* test = &input[i];

         output[i].d.xyz = test->c.xyz;
              
         float3 newval = test->c.xyz - sett->off.xyz; //corrupts the two int a/b in output[i]
         //float3 newval = test->c.xyz; //using this works fine, nothing is changed. the two int a/b in output[i] keep their value
               
         output[i].c.xyz = newval;
}

However, after I do the copy_async call from device back to host the two int values (which are initially > 0) of all myStruct entries are zeroed out or get an arbitrarily high value.
Strange thing is that all the float4_ types are not corrupted in any way. only the two int members.

Any idea what could cause this?

Thanks
Sam

Samir Kharchi

unread,
Jul 27, 2016, 5:12:20 PM7/27/16
to boost-compute
I thought it might have been an alignment issue. My next thought was that if I'd use only those vector types defined in the boost compute namespace it should work flawlessly,
so now I am only using int2_ (instead of the two 'int' I used before), float4_ and int4_ from the boost::compute namespace.

And that changed something! Well not really, only that both integers now have an arbitrarily high value. Before this was the case only for the first int member a, the other int b was zeroed out.
So, still something doesn't add up here. What else can I do than using boost::compute types only?

Btw. it doesn't matter if I subtract sett->off.xyz in the initial example code. I can also just subtract (float3)(4.0f) and swoop the int2_ values are corrupted.

Samir Kharchi

unread,
Jul 27, 2016, 5:16:19 PM7/27/16
to boost-compute
Oh just to clarify, this is what the struct now looks like:

struct myStruct
{
boost::compute::float2_ ab;

boost
::compute::float4_ c;
boost
::compute::float4_ d;
//Here some more float4_ members
boost::compute::int4_ e;
};

Samir Kharchi

unread,
Jul 27, 2016, 5:17:18 PM7/27/16
to boost-compute
Argh. This should look like this instead:

struct myStruct
{
boost::compute::int2_ ab;

boost
::compute::float4_ c;
boost
::compute::float4_ d;
//Here some more float4_ members
boost::compute::int4_ e;

};

Samir Kharchi

unread,
Jul 27, 2016, 5:25:23 PM7/27/16
to boost-compute
Last info (I promise), Windows7 64-Bit VS2013 is what I am using to compile.

Jakub Szuppe

unread,
Jul 27, 2016, 6:00:08 PM7/27/16
to boost-compute
I may have time tomorrow to look at this. I would be really grateful if you could
provide a test case like in any other Boost.Compute test case (context, queue and
device variables are provided in them).

I'm curious: do the values in output become zeroed out or set to an arbitrarily high value
after you perform `output[i].c.xyz = newval;` (you can print the values from the kernel) or
after copying back to host? That's important. If it happens in the kernel, well, then it may be
OpenCL case and if it happens only after copying then it may be about Boost.Compute.
Which platform OpenCL do you use? AMD? Intel? NVIDIA? POCL? That's more important
than C++ compiler.

Samir Kharchi

unread,
Jul 28, 2016, 7:05:04 AM7/28/16
to boost-compute
I am using the NVIDIA SDK for opencl 1.1.

Here is a working (well, non-working if you want so) test code.
In this example not only the int2_ member values are corrupted but also the third and fourth component values of the float4_ c member!

struct myStruct
{
    myStruct
(int x)
       
: id_attributes(10 + x,x)
       
, c(0.0f,0.0f,0.0f,10000000.0f)
       
, d(10.0f,10.0f,10.0f,10000.0f)
   
{
   
}
   
    boost
::compute::int2_    id_attributes;

    boost
::compute::float4_ c;
    boost
::compute::float4_ d;
};

typedef    std::vector<myStruct> HostData;
BOOST_COMPUTE_ADAPT_STRUCT
(myStruct, myStruct, (id_attributes, c, d))
typedef    boost::compute::vector<myStruct>    GPUData;

#define get_typedef boost::compute::type_definition<myStruct>() + "\n"

struct CContext {
    boost
::compute::device        device;
    boost
::compute::context       context;
    boost
::compute::command_queue queue;

   
CContext() :
        device
( boost::compute::system::default_device() ),
        context
( boost::compute::system::default_context() ),
        queue  
( boost::compute::system::default_queue() )
   
{}
};

void myStructKernel()
{
   
CContext context;

   
HostData data;
    data
.push_back(myStruct(1));
    data
.push_back(myStruct(2));
    data
.push_back(myStruct(3));
   
//Print A

   
GPUData gpudata(data.size(), context.context);
    boost
::compute::copy(data.begin(), data.end(), gpudata.begin(), context.queue);
    //Print B

    std
::string source = BOOST_COMPUTE_STRINGIZE_SOURCE(
        __kernel
void custom_kernel(__global const myStruct *input, __global myStruct *output)

       
{
           
const uint i = get_global_id(0);

           
const __global myStruct *testy = &input[i];

            output
[i].d.xyz = testy->c.xyz;

            float3 newval
= testy->c.xyz - (float3)(5.f);


            output
[i].c.xyz = newval;
       
}

   
);

    source
= get_typedef + source;

    boost
::compute::program program = boost::compute::program::build_with_source(source, context.context);

    boost
::compute::kernel custom_kernel = program.create_kernel("custom_kernel");
    custom_kernel
.set_arg(0, gpudata);
    custom_kernel
.set_arg(1, gpudata);

    context
.queue.enqueue_1d_range_kernel(custom_kernel, 0, gpudata.size(), 1);
    context
.queue.finish();
    //Print C

    boost
::compute::copy(gpudata.begin(), gpudata.end(), data.begin(), context.queue);
    //Print D
}

Here is the print out result:
//Print A (Host)
id
/attributes: 11/1
c
: 0/0/0/1e+007
d
: 0/0/0/1e+007
--------------------------------
id
/attributes: 12/2
c
: 0/0/0/1e+007
d
: 0/0/0/1e+007
--------------------------------
id
/attributes: 13/3
c
: 0/0/0/1e+007
d
: 0/0/0/1e+007
--------------------------------

//Print B (Device after Copy)
id
/attributes: 11/1
c
: 0/0/0/1e+007
d
: 0/0/0/1e+007
--------------------------------
id
/attributes: 12/2
c
: 0/0/0/1e+007
d
: 0/0/0/1e+007
--------------------------------
id
/attributes: 13/3
c
: 0/0/0/1e+007
d
: 0/0/0/1e+007
--------------------------------

//Print C (Device after Kernel)
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.54143e-044/1.4013e-045
d
: -5/0/1.54143e-044/1.4013e-045
--------------------------------
id
/attributes: 12/2
c
: -5/-5/-5/1e+007
d
: -5/-5/-5/1e+007
--------------------------------
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.82169e-044/4.2039e-045
d
: -5/0/1.82169e-044/4.2039e-045
--------------------------------

//Print D (Host after copy)
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.54143e-044/1.4013e-045
d
: -5/0/1.54143e-044/1.4013e-045
--------------------------------
id
/attributes: 12/2
c
: -5/-5/-5/1e+007
d
: -5/-5/-5/1e+007
--------------------------------
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.82169e-044/4.2039e-045
d
: -5/0/1.82169e-044/4.2039e-045
--------------------------------

Anyway, if I change the int2_ member to an int4_ (so now I have only type4_ members) then it works! The values are not corrupted anymore and do not change arbitrarily.
Here the print out when using an int4_:
//Print A (Host)
id/attributes: 11/1
c: 0/0/0/1e+007
d: 0/0/0/1e+007
--------------------------------
id/attributes: 12/2
c: 0/0/0/1e+007
d: 0/0/0/1e+007
--------------------------------
id/attributes: 13/3
c: 0/0/0/1e+007
d: 0/0/0/1e+007
--------------------------------


//Print B (Device after Copy)
id/attributes: 11/1
c: 0/0/0/1e+007
d: 0/0/0/1e+007
--------------------------------
id/attributes: 12/2
c: 0/0/0/1e+007
d: 0/0/0/1e+007
--------------------------------
id/attributes: 13/3
c: 0/0/0/1e+007
d: 0/0/0/1e+007
--------------------------------


//Print C (Device after Kernel)
id/attributes: 11/1
c: -5/-5/-5/1e+007
d: -5/-5/-5/1e+007
--------------------------------
id/attributes: 12/2
c: -5/-5/-5/1e+007
d: -5/-5/-5/1e+007
--------------------------------
id/attributes: 13/3
c: -5/-5/-5/1e+007
d: -5/-5/-5/1e+007
--------------------------------


//Print D (Host after copy)
id/attributes: 11/1
c: -5/-5/-5/1e+007
d: -5/-5/-5/1e+007
--------------------------------
id/attributes: 12/2
c: -5/-5/-5/1e+007
d: -5/-5/-5/1e+007
--------------------------------
id/attributes: 13/3
c: -5/-5/-5/1e+007
d: -5/-5/-5/1e+007
--------------------------------


Doesn't this indicate a problem with the struct definition itself (which should be taken care of by BOOST_COMPUTE_ADAPT_STRUCT right?) and maybe the memory alignment size?
It's definetly not the copy call and I guess it doesn't seem to be opencl itself either as it works with an int4_ member? Not sure.

Samir Kharchi

unread,
Jul 28, 2016, 7:17:40 AM7/28/16
to boost-compute
Sorry, in the previous post the c variable was erroneously print out twice instead of c & d.
So for completeness, these are the correct print outs now for using int4_ (working) and int2_ (not working) respectively.

int4_
//Print A (Host)
id
/attributes: 11/1
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 12/2
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 13/3
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------

//Print B (Device after Copy)
id
/attributes: 11/1
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 12/2
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 13/3
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------

//Print C (Device after Kernel)
id
/attributes: 11/1
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------
id
/attributes: 12/2
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------
id
/attributes: 13/3
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------

//Print D (Host after copy)
id/attributes: 11/1
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------
id
/attributes: 12/2
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------
id
/attributes: 13/3
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000
--------------------------------


Int2_
//Print A (Host)
id
/attributes: 11/1
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 12/2
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 13/3
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------

//Print B (Device after Copy)
id
/attributes: 11/1
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 12/2
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------
id
/attributes: 13/3
c
: 0/0/0/1e+007

d
: 10/10/10/10000

--------------------------------

//Print C (Device after Kernel)
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.54143e-044/1.4013e-045

d
: 0/10/10/10000

--------------------------------
id
/attributes: 12/2
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.82169e-044/4.2039e-045

d
: 0/10/10/10000

--------------------------------

//Print D (Host after copy)
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.54143e-044/1.4013e-045

d
: 0/10/10/10000

--------------------------------
id
/attributes: 12/2
c
: -5/-5/-5/1e+007

d
: 0/0/0/10000

--------------------------------
id
/attributes: -1063256064/-1063256064
c
: -5/0/1.82169e-044/4.2039e-045

d
: 0/10/10/10000
--------------------------------






Jakub Szuppe

unread,
Jul 28, 2016, 8:21:38 AM7/28/16
to boost-compute
It works perfectly fine for me, both with int2_ and int4_. That's on Linux, AMD APP Platform and gcc. I'll
check on Windows in a sec.

Jakub Szuppe

unread,
Jul 28, 2016, 8:31:34 AM7/28/16
to boost-compute
It works on Windows with MVSC too (I tested it on AMD APP platform and Intel Platform).

Samir Kharchi

unread,
Jul 28, 2016, 8:37:45 AM7/28/16
to boost-compute
Hmm ok. So what could this be? Could it be my old GTX260 card? Or a driver issue?
I will try this on another machine tonight with a newer card and see if that makes any difference.

Jakub Szuppe

unread,
Jul 28, 2016, 8:53:47 AM7/28/16
to boost-compute
I doubt it's the card, it may be NVIDIA OpenCL platform (SDK) or driver. I don't know.

Jakub Szuppe

unread,
Jul 28, 2016, 1:10:35 PM7/28/16
to boost-compute
btw. The code you published (including struct) may be just an example, idk, but I just want to
say that maybe you try struct of arrays (buffer) instead array (buffer) of structs approach, and
also in that kernel, it would be better to load value from input to some private variable and use that
instead of using a pointer.

Samir Kharchi

unread,
Jul 29, 2016, 5:26:33 AM7/29/16
to boost-compute
Thanks.

I installed the latest driver and the problem still persists. :-/ I will install the Intel SDK and see if that makes a difference!

And yes, currently I am simply trying out what is possible and what not. I haven't worked with GPU targetting code before (i.e. C99 and OpenCL) so this is all basic research to later make some design choices.
I guess you mean struct of arrays probably fits the vectorization pattern of GPUs better (and helps for easier parallelization). So you say it's valid to have a struct with boost::compute::vectors (or rather global pointers to the buffers)?

And I assume you mean that the private memory variables are faster to execute than picking them from the global array?

Cheers so far

Jakub Szuppe

unread,
Jul 29, 2016, 5:41:58 AM7/29/16
to boost-compute

On Friday, July 29, 2016 at 11:26:33 AM UTC+2, Samir Kharchi wrote:
Thanks.

I installed the latest driver and the problem still persists. :-/ I will install the Intel SDK and see if that makes a difference!

OK. Let me know then.
 
And yes, currently I am simply trying out what is possible and what not. I haven't worked with GPU targetting code before (i.e. C99 and OpenCL) so this is all basic research to later make some design choices.
I guess you mean struct of arrays probably fits the vectorization pattern of GPUs better (and helps for easier parallelization). So you say it's valid to have a struct with boost::compute::vectors (or rather global pointers to the buffers)?

I mean when you use a normal typically struct, then each time you want to use any value from that struct, you have to load
whole struct. You want to read 2 ints, that is 8 bytes, and you end up reading 16 bytes or whatever the size of your struct is,
more than 8 bytes... or you can try reading only those 2 ints but then you have strided memory access. Both ways your 
performance is damaged. 

In this case struct of arrays technique would really be a struct of boost::compute::vectors (or boost::compute::buffers, whatever).

 

And I assume you mean that the private memory variables are faster to execute than picking them from the global array?

Yes. Read value to private mem, use it and later save it. Don't read twice the same value from a buffer if you really
really don't need to.
 

Cheers so far

Samir Kharchi

unread,
Jul 29, 2016, 1:20:10 PM7/29/16
to boost-compute
Am Freitag, 29. Juli 2016 11:41:58 UTC+2 schrieb Jakub Szuppe:

OK. Let me know then.

With the Intel SDK it magically works! :)
 
I mean when you use a normal typically struct, then each time you want to use any value from that struct, you have to load
whole struct. You want to read 2 ints, that is 8 bytes, and you end up reading 16 bytes or whatever the size of your struct is,
more than 8 bytes... or you can try reading only those 2 ints but then you have strided memory access. Both ways your 
performance is damaged. 

Ah I see. Sure thing. Totally following Einsteins "You gotta do things as simple as possible. But not simpler".

In this case struct of arrays technique would really be a struct of boost::compute::vectors (or boost::compute::buffers, whatever).

Excellent. Cheers for that info.
 
Yes. Read value to private mem, use it and later save it. Don't read twice the same value from a buffer if you really
really don't need to.


Will do!

Thanks for all your input Jakub. This is all really helpful. I will probably have more questions in the next weeks but currently it seems I will be able to use compute for my commercial projects in the future.

Jakub Szuppe

unread,
Jul 30, 2016, 8:13:12 AM7/30/16
to boost-compute

On Friday, July 29, 2016 at 7:20:10 PM UTC+2, Samir Kharchi wrote:
Am Freitag, 29. Juli 2016 11:41:58 UTC+2 schrieb Jakub Szuppe:

OK. Let me know then.

With the Intel SDK it magically works! :)

I see. Your problems on NVIDIA may be related to this https://devtalk.nvidia.com/default/topic/903165/nvidia-opencl-structs-glitches/,
so you're not the only one having those problems.
 
 
I mean when you use a normal typically struct, then each time you want to use any value from that struct, you have to load
whole struct. You want to read 2 ints, that is 8 bytes, and you end up reading 16 bytes or whatever the size of your struct is,
more than 8 bytes... or you can try reading only those 2 ints but then you have strided memory access. Both ways your 
performance is damaged. 

Ah I see. Sure thing. Totally following Einsteins "You gotta do things as simple as possible. But not simpler".

In this case struct of arrays technique would really be a struct of boost::compute::vectors (or boost::compute::buffers, whatever).

Excellent. Cheers for that info.
 
Yes. Read value to private mem, use it and later save it. Don't read twice the same value from a buffer if you really
really don't need to.


Will do!

Thanks for all your input Jakub. This is all really helpful. I will probably have more questions in the next weeks but currently it seems I will be able to use compute for my commercial projects in the future.

I'm always happy to help (whenever I have some free time). I hope you will find Boost.Compute
and OpenCL useful in your projects! CU!
Reply all
Reply to author
Forward
0 new messages