'Cannot bind packed field'

Christian Gollwitzer

Oct 19, 2018, 4:42:43 PM

Hi,

I've got a strange error message while developing a program for OpenCL
when compiling on gcc 4.8.5

clfdk.cpp:322:23: error: cannot bind packed field ‘geom.fdk_geom::Pmat’
to ‘cl_float4 (&)[4]’
mat4copy(P, geom.Pmat);

The error has not occurred with previous versions, or with clang++ or
VC++, so I'd like to know whether this is a compiler bug or a missing feature.

mat4copy is defined as a template:

// 4x4 matrix assignment, for incompatible types
template <typename mat4like1, typename mat4like2>
void mat4copy(mat4like1 src, mat4like2 &dest) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            dest[i].s[j] = src[i].s[j];
        }
    }
}

It should perform elementwise assignment of collections of 4 cl_float4
datatypes, which are defined as structs of 4 floats in the OpenCL system
header.
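
A minimal sketch of how such a template can be exercised with different containers. Float4 and Double4 are hypothetical stand-ins for cl_float4 and cl_double4 (assumed to expose their components through a member array named s), identity_rows is a made-up helper, and src is taken by const reference here rather than by value as in the post.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the OpenCL vector types (assumption: the real
// types expose their components through a member array named s).
struct Float4  { float  s[4]; };
struct Double4 { double s[4]; };

// 4x4 element-wise assignment between "matrix-like" containers.
// Adapted from the post; src is a const reference here to avoid a copy.
template <typename Mat4Like1, typename Mat4Like2>
void mat4copy(const Mat4Like1 &src, Mat4Like2 &dest) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            dest[i].s[j] = src[i].s[j];  // implicit double -> float is fine here
        }
    }
}

// Hypothetical helper: a 4x4 identity matrix stored as four Double4 rows.
std::vector<Double4> identity_rows() {
    std::vector<Double4> m(4, Double4{{0.0, 0.0, 0.0, 0.0}});
    for (int i = 0; i < 4; ++i) {
        m[i].s[i] = 1.0;
    }
    return m;
}
```

The same template then accepts a std::vector<Double4> as source and a std::array<Float4, 4> or a plain Float4[4] as destination, as long as the destination can be bound to a reference.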

I use a structure to pass data from the CPU to the GPU, which is defined as:

==========================
#ifdef __OPENCL_VERSION__
// On the device
#define ffloat float
#define fint int

#else
// On the host
#include "cl.hpp"
#define HOST_COMPILER

#define ffloat cl_float
#define fint cl_int
#define float4 cl_float4

#endif

// make sure the structure is packed
#ifdef _MSC_VER
# define PACKED_STRUCT(def) \
__pragma("pack(push, 1)") struct def; __pragma("pack(pop)")
#else
# define PACKED_STRUCT(def) struct __attribute__((packed)) def;
#endif

PACKED_STRUCT(fdk_geom {
    /***************** output geometry **************/
    // Resulting homogeneous projection matrix
    // Organized as 4 float4 rows
    // such that the matrix product can be written as dot products
    // First element to be always aligned
    float4 Pmat[4];

    /************* Input geometry ******************/
    // Distances
    ffloat SOD;
    ....
})
============================

This header is included by both compilers (host and device). The reason for
defining the struct as packed in this way is that the memory layout
on the CPU and the GPU must be exactly the same.

gcc 4.8 seems to have a problem with the reference to the element of the
packed struct, which seems strange to me, since the other compilers all
accept it (and it works). I'm especially astonished because the template
mat4copy has exactly the purpose of copying data between different
containers of 4 cl_float4s, like std::vector, std::array, plain array...
but the compiler refuses to instantiate the template.

Christian

Öö Tiib

Oct 19, 2018, 5:20:07 PM

Sounds like that bug/problem:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36566

Alf P. Steinbach

Oct 19, 2018, 6:35:16 PM

As I read the comments on that it's a problem, not a bug.

Because in general a member of a packed struct need not be properly
aligned for the member type, and then using a reference to that member
might crash on some architectures, or, where the OS intervenes to handle
the trap, cause Real Slow Execution on others.
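
A small sketch of the alignment issue, using GCC/Clang attribute syntax. PackedExample, PlainExample and offset_is_aligned are hypothetical names made up for illustration; they are not from the code in question.

```cpp
#include <cassert>
#include <cstddef>

// Packed: no padding is inserted, so value starts right after tag.
struct __attribute__((packed)) PackedExample {
    char  tag;
    float value;
};

// Not packed: the compiler pads after tag so that value is aligned.
struct PlainExample {
    char  tag;
    float value;
};

constexpr std::size_t packed_offset = offsetof(PackedExample, value);
constexpr std::size_t plain_offset  = offsetof(PlainExample, value);

// True if a member at this offset is aligned to `alignment` bytes,
// provided the enclosing object itself is aligned at least that strictly.
constexpr bool offset_is_aligned(std::size_t offset, std::size_t alignment) {
    return offset % alignment == 0;
}
```

On common platforms, where alignof(float) is greater than 1, the packed member sits at offset 1. A float& bound to it would promise an alignment that the object does not have, which is what GCC refuses to do.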

Since the member of interest is at the start of the structure, and since
the standards require no padding before that member, a practical solution
might be to reinterpret_cast and `std::launder` a pointer to the
complete packed struct, assuming that it in turn isn't part of a packed
struct.

An alternative practical solution, which should work in general, is to
just overload the function that unpacks with a variant that accepts the
whole struct as argument.
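
One way the overload idea could look, as a sketch under assumptions: Float4 stands in for cl_float4, Geom is a shortened, hypothetical version of the packed struct (GCC/Clang syntax), and the generic template is adapted from the original post.

```cpp
#include <cassert>

struct Float4 { float s[4]; };  // hypothetical stand-in for cl_float4

// Hypothetical, shortened version of the packed struct.
struct __attribute__((packed)) Geom {
    Float4 Pmat[4];
    float  SOD;
};

// Generic version: dest is bound by reference, so a packed field
// cannot be passed to it directly on GCC.
template <typename Mat4Like1, typename Mat4Like2>
void mat4copy(const Mat4Like1 &src, Mat4Like2 &dest) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            dest[i].s[j] = src[i].s[j];
}

// Overload taking the whole struct: the packed member is only accessed
// through geom itself, so no reference to the field is ever formed and
// the compiler can emit accesses that are safe for the packed layout.
template <typename Mat4Like>
void mat4copy(const Mat4Like &src, Geom &geom) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            geom.Pmat[i].s[j] = src[i].s[j];
}
```

A call such as mat4copy(P, geom) selects the second overload because it is the more specialized template, while calls with ordinary containers still go to the generic version.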

I would not go as far as to copy bytes.

Cheers!,

- Alf

Christian Gollwitzer

Oct 20, 2018, 4:21:28 AM

On 20.10.18 at 00:35, Alf P. Steinbach wrote:

> On 19.10.2018 23:19, Öö Tiib wrote:
>> On Friday, 19 October 2018 23:42:43 UTC+3, Christian Gollwitzer wrote:
>>> Hi,
>>>
>>> I've got a strange error message while developing a program for OpenCL
>>> when compiling on gcc 4.8.5
>>>
>>> clfdk.cpp:322:23: error: cannot bind packed field ‘geom.fdk_geom::Pmat’
>>> to ‘cl_float4 (&)[4]’
>>> mat4copy(P, geom.Pmat);

>>> gcc 4.8 seems to have a problem with the reference to the element of the
>>> packed struct, which seems strange to me, since the other compilers all
>>> accept it (and it works). I'm especially astonished because the template
>>> mat4copy has exactly the purpose of copying data between different
>>> containers of 4 cl_float4s, like std::vector, std::array, plain array...
>>> but the compiler refuses to instantiate the template.
>>
>> Sounds like that bug/problem:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36566
>
> As I read the comments on that it's a problem, not a bug.
>
> Because in general a member of a packed struct need not be properly
> aligned for the member type, and then using a reference to that member
> might crash on some architectures, or, where the OS intervenes to handle
> the trap, cause Real Slow Execution on others.
>
> Since the member of interest is at the start of the structure, and since
> the standards require no padding before that member, a practical solution
> might be to reinterpret_cast and `std::launder` a pointer to the
> complete packed struct, assuming that it in turn isn't part of a packed
> struct.
>
> An alternative practical solution, which should work in general, is to
> just overload the function that unpacks with a variant that accepts the
> whole struct as argument.
>
> I would not go as far as to copy bytes.

My "practical solution" now is a C-style cast to cast away the
"packed"-thing. What bugs me most is that basically this downgrades the
applicability of a template. The reason to have the template at all was
to copy, e.g. the content of a std::vector<cl_double4> with a length of
4 to a, e.g. std::array<cl_float4, 4>. There are at least three
different, but functionally equivalent data types in this program.

If I wrote it out by hand, it would work. If I used a macro, it would
work. If I could overload operator= from the "outside" for a
std::array, it would work, but it must be a member (why?). If I drop the
packed attribute, it compiles but breaks with invalid memory accesses due to
different alignments between the two compilers. Only because of a
pedantic compiler am I forced to jump through these hoops.

Christian

Öö Tiib

Oct 20, 2018, 5:37:55 AM

On Saturday, 20 October 2018 11:21:28 UTC+3, Christian Gollwitzer wrote:
>
> My "practical solution" now is a C-style cast to cast away the
> "packed"-thing. What bugs me most is that basically this downgrades the
> applicability of a template. The reason to have the template at all was
> to copy, e.g. the content of a std::vector<cl_double4> with a length of
> 4 to a, e.g. std::array<cl_float4, 4>. There are at least three
> different, but functionally equivalent data types in this program.

Templates are very useful for bit-crunching and hacking with packed
structs and bitfields in structs. Doing it by hand or with macros is
a way bigger headache. The templates just have to be made for that.

The issue as you posted it wasn't related to templates at all.
It was an attempt to take the address of an element of a packed struct.
GCC can't ignore that, since it is often used to compile for platforms
that immediately raise signals on wrongly aligned references.

It is not always safe with Visual Studio either. For example, the
XMVECTOR and XMMATRIX types of DirectX 11 must be 16-byte aligned and
will cause confusing run-time SEH exceptions (crashes) when
not aligned. There it would be a lot better if Visual Studio was
also pedantic about alignment. ;)
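
Where the compiler does not complain, a run-time check is one way to catch such misalignment early. The helper below is only a sketch; is_aligned is a made-up name, not part of DirectX or any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if ptr is a multiple of `alignment` bytes.
// `alignment` must be non-zero.
inline bool is_aligned(const void *ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Something like assert(is_aligned(p, 16)) in a debug build, placed before handing p to code that requires 16-byte alignment, turns a confusing crash into a readable failure.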