In the drive to improve the performance of OSS, one untapped resource is the extended floating-point instruction sets (AVX, AVX2, etc.). For certain operations, e.g. on data with a fixed size and a fixed number of entries, the speedup can approach 2x depending on the data and the operation. Eigen, which underlies all our mathematical calculations, supports AVX for certain operations. Sadly, just turning on AVX yields a large number of runtime errors. To be usable by AVX instructions, data has to be aligned on 32-byte boundaries. Eigen checks for that alignment and will generate an error if it is not met.
There are two kinds of Eigen data structure (fixed size and dynamic size) and two memory allocation models (stack and heap) that matter here. Dynamic Eigen structures, i.e. variable-sized vectors and matrices, and sparse matrices do their own memory allocation, so their data is correctly aligned. Stack-allocated data is also correctly aligned. The only issue is with fixed-size data structures that are allocated on the heap. This affects both data structures allocated directly on the heap and members of objects that are allocated on the heap.
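To make the stack/heap distinction concrete, here is a minimal sketch that uses only the standard library. `Packet4d` and `isAligned` are hypothetical names, and `Packet4d` is just a stand-in for a fixed-size Eigen type that requests 32-byte alignment. A stack instance honours `alignas`, whereas before C++17 a plain `new` is only required to honour the fundamental alignment (typically 16 bytes), which is where the heap problem comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a fixed-size Eigen type that asks for 32-byte alignment.
struct alignas(32) Packet4d {
    double v[4];
};

// Returns true if the address `p` is a multiple of `bytes`.
inline bool isAligned(const void* p, std::size_t bytes) {
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// The compiler places stack objects according to alignas, so this holds.
inline bool stackInstanceIsAligned() {
    Packet4d onStack{};
    return isAligned(&onStack, 32);
}
```

The same `isAligned` check can be pointed at a heap-allocated object to see whether a given allocation path delivers the alignment Eigen expects.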
There are a lot of heap allocations in OSS (whether that is a good thing is a separate issue). Most std containers allocate their objects on the heap, and all our components are allocated on the heap (a lot of them contain fixed-size Eigen structures). There are also various other structs containing fixed-size Eigen members that are built up from heap data.
Eigen provides a couple of tools to mitigate this problem.
a) A macro that can be used inside a class, EIGEN_MAKE_ALIGNED_OPERATOR_NEW. Adding this to a class overrides 'new' for that class, so a newly created instance will be correctly aligned (some or all standard containers do not use the class's new for allocating memory; see below for a separate solution).
b) A specific aligned allocator, Eigen::aligned_allocator<T>. It can be passed as a template argument to std::vector, for example, which will then create elements that are correctly aligned. std::make_shared also has to be dealt with in a separate manner.
c) Fixed-size structures can be told to ignore alignment (thereby disabling the use of intrinsic functions) via Eigen::DontAlign as a template option.
In trying to implement this in OSS, I have gotten to a point where there seem to be three alternative routes we can take.
a) Move all allocations to using aligned allocators
b) Turn all fixed size structures to non-aligned
c) Turn all fixed size structures in heap allocations to non-aligned
Each of these options poses its own challenges:
(a) We will have to make sure that all classes containing Eigen members contain the macro, that all containers holding such classes use the allocator, and that std::make_shared() is wrapped so it uses an aligned allocator when necessary.
(b) We could change our typedefs to include the Eigen::DontAlign marker so that all OSS fixed-size structures, e.g. Matrix4d, RigidTransform3d, etc., are not aligned. The effort here is that some Eigen functions don't preserve this type. This will create a mix of unaligned and aligned types in some places, requiring us to rewrite some of our functions that take multiple Eigen structures so they can deal with mixed types (e.g. in Geometry, where all the functions assume the same Eigen options). It might also require us to create extra copies to convert from auto-aligned Eigen results to non-aligned member data (by removing `auto` in some places, creating explicit copies, or casting the results of .eval() calls). Doing this means we probably don't need any 'per class' actions, but it will also disable any intrinsic speedup on all fixed-size structures, potentially slowing us down rather than speeding us up.
(c) We introduce a second set of OSS type declarations, xxxNonAligned, e.g. Vector3DNonAligned, that can be used as member types. Here we would have to go into all classes that have Eigen members and change those members to the NonAligned type. We might still have to edit functions in geometry.h to handle multiple types (this is hard to gauge), but we could use the auto-aligned versions when they are allocated on the stack, preserving any speed gains at that point.
I am starting to lean towards solution (c). It is more targeted than (b) and preserves our ability to move certain classes in OSS to be fully aligned without having to do this for every class. It is probably still more effort than (b) but leaves more avenues open. It might also be less of a performance hit than (b) overall, as some structs will be stack allocated and therefore aligned.
Overall, the cost and the benefit of doing this are hard to gauge. In the best case AVX can be up to 2x faster than SSE (which is what we are using now). But sparse operations do not use AVX code, and a large part of our math happens in sparse matrices. The compound effect on us is really hard to gauge ...
Any input is appreciated, thanks, Harry
Principal Software Engineer
Simquest Solutions Inc.