Hi Will,
Yeppp! requires that all types are naturally aligned (i.e. short/ushort numbers are aligned on
2 bytes, int/uint/float on 4 bytes, and long/ulong/double on 8 bytes). This requirement holds
for all C runtimes (malloc'ed memory is always aligned on 2*sizeof(void*), i.e. at least 8 bytes),
and I assumed that it also holds for the .NET Framework, which was validated by my tests. Natural alignment
is important both for performance and portability:
- Misaligned loads on x86 (say, a load of a double that is not aligned on 8 bytes) are slower than aligned loads.
- If the array is naturally aligned, it is possible to process the first few elements one at a time until the address of the current element becomes aligned on 16 or 32 bytes. This matters for SIMD processing: SIMD registers are 16 or 32 bytes wide, and aligned loads/stores for SIMD registers are much faster than misaligned ones. Besides that, some types of store operations are supported only for aligned pointers.
- Some architectures do not allow misaligned loads and generate a hardware exception when such an operation is attempted. Examples are MIPS, Xeon Phi (for SIMD instructions), and ARM before ARMv6.
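To illustrate the second point, here is a minimal C sketch of the "process a few elements until the pointer is aligned" idea. The function names (peel_count, sum_aligned) are made up for this example and are not part of Yeppp!; the main loop is plain scalar code standing in for where aligned SIMD loads could go.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Yeppp! function): number of leading elements
   to process one by one before the current element address reaches a
   16-byte boundary. Assumes x is naturally aligned, i.e. its address is
   a multiple of sizeof(double). */
size_t peel_count(const double *x, size_t n) {
    size_t misalign = (size_t)((uintptr_t)x % 16);
    size_t peel = (misalign == 0) ? 0 : (16 - misalign) / sizeof(double);
    return (peel < n) ? peel : n;
}

/* Sums n doubles: scalar prologue, aligned main loop, scalar epilogue. */
double sum_aligned(const double *x, size_t n) {
    size_t i = 0;
    double total = 0.0;
    size_t peel = peel_count(x, n);

    /* Prologue: at most one element for a naturally aligned double array. */
    for (; i < peel; i++) {
        total += x[i];
    }
    /* Main loop: &x[i] is now a multiple of 16 bytes, so a SIMD version
       could use aligned 16-byte loads here (two doubles per load). */
    for (; i + 2 <= n; i += 2) {
        total += x[i] + x[i + 1];
    }
    /* Epilogue: leftover element, if any. */
    for (; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

Note that this only works when the array is naturally aligned: if the address were, say, 4 bytes past a 16-byte boundary, stepping in 8-byte elements would never land on a 16-byte boundary.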
However, it turns out that small .NET arrays (with fewer than 1000 elements) are not guaranteed to be aligned on 8 bytes on 32-bit systems.
So, here is what happens in your use case: in Yeppp! 1.0 all compute functions require naturally
aligned pointers and return a YepStatusMisalignedPointer error if this requirement is violated.
The Yeppp! CLR bindings, in turn, convert the YepStatusMisalignedPointer status to DataMisalignedException.
Since you use the CLR bindings with small arrays, the CLR sometimes allocates them in such a way
that they are not aligned on 8 bytes, which causes the Yeppp! kernels to fail. Through the CLR bindings
this failure surfaces in your C# program as a DataMisalignedException.
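For reference, the kind of check involved can be sketched in C as follows. This is only an illustration: the enum, its values, and the function names are hypothetical and are not the actual Yeppp! declarations. The pointers are taken as void* so that the alignment test happens before any cast to double*.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; the real Yeppp! enum and
   its values may differ. */
enum sketch_status {
    SKETCH_OK = 0,
    SKETCH_MISALIGNED_POINTER = 1
};

/* Returns nonzero if p is a multiple of elem_size (natural alignment). */
int is_naturally_aligned(const void *p, size_t elem_size) {
    return ((uintptr_t)p % elem_size) == 0;
}

/* Element-wise sum of two double arrays, rejecting pointers that are not
   aligned on sizeof(double) bytes before touching any data. */
enum sketch_status add_f64(const void *x, const void *y, void *sum, size_t n) {
    if (!is_naturally_aligned(x, sizeof(double)) ||
        !is_naturally_aligned(y, sizeof(double)) ||
        !is_naturally_aligned(sum, sizeof(double))) {
        return SKETCH_MISALIGNED_POINTER;
    }
    const double *xs = (const double *)x;
    const double *ys = (const double *)y;
    double *out = (double *)sum;
    for (size_t i = 0; i < n; i++) {
        out[i] = xs[i] + ys[i];
    }
    return SKETCH_OK;
}
```

A double array that starts 4 bytes past an 8-byte boundary, which is the situation described above for small arrays on 32-bit systems, would take the early-return path.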
I will change the requirements for Yeppp! functions on 32-bit systems to accept arrays which are not
aligned on 8 bytes. As a temporary workaround for your project I would suggest using the 64-bit version
of Yeppp! (which is also much faster, as Yeppp! 1.0.0 does not include optimized implementations for
32-bit x86). BTW, I don't expect Yeppp! to be efficient on small arrays; it is optimized for arrays of 100
elements or more.
Regards,
Marat