sorting using simd

Gerald Lodron

Oct 30, 2014, 7:52:36 AM

to nt2...@googlegroups.com

Hello

I am searching for a fast sorting algorithm and found floki:
https://github.com/khegeman/floki
which uses the Boost.SIMD library from NT2.

I successfully compiled NT2 3.1.2 with Boost 1.56.0, TBB 4.2.3 and CUDA 6.5.14 using VS2013 x64, and tested floki::sort from the link above in my own test program. (I did not compile floki directly; I only copied its AA-sort algorithm, which uses your Boost.SIMD.)

I compared it with std::sort (according to the author there should be a significant improvement), and it was much slower:

My processor supports AVX with 256-bit registers (Intel Xeon E5-1620 v2), and I also have a GTX 760 if required (though I don't think it is).
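
A comparison of this kind can be reproduced with a small timing harness. The sketch below is illustrative only and is not the test used in this thread: the helper names `make_input`, `time_sort_ms` and `time_std_sort_ms` are hypothetical, the input is assumed to be a vector of floats, and the sorter under test is assumed to be callable on an iterator pair like std::sort.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n pseudo-random floats from a fixed seed, so every
// sorter under test sees identical input.
std::vector<float> make_input(std::size_t n)
{
    std::mt19937 rng(12345u);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> v(n);
    for (float& x : v) x = dist(rng);
    return v;
}

// Hypothetical helper: times one call of sorter(first, last) on a private
// copy of the input. Returns elapsed milliseconds, or -1.0 if the output
// is not actually sorted.
template <class Sorter>
double time_sort_ms(Sorter sorter, std::vector<float> data)
{
    auto t0 = std::chrono::steady_clock::now();
    sorter(data.begin(), data.end());
    auto t1 = std::chrono::steady_clock::now();
    if (!std::is_sorted(data.begin(), data.end())) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Baseline measurement with std::sort.
double time_std_sort_ms(std::size_t n)
{
    typedef std::vector<float>::iterator It;
    return time_sort_ms([](It first, It last) { std::sort(first, last); },
                        make_input(n));
}
```

To time another implementation, pass a lambda that forwards the iterator pair to it, and build both measurements with the same optimization and architecture flags. A single run is noisy, so repeating the measurement and taking the minimum or median gives a fairer picture.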

Do I have to tell the Boost library to use AVX? How can I find out which instruction sets (AVX, SSE, etc.) the library actually uses?

Thanks for your help.

Gerald Lodron

Mathias Gaunard

Oct 31, 2014, 4:32:56 AM

to nt2...@googlegroups.com

On 30/10/14 12:52, Gerald Lodron wrote:
> Hello
>
> I am searching for a fast sorting algorithm and found floki:
> https://github.com/khegeman/floki
> which uses the boost simd library of nt2.

I wasn't aware of this.
We, the NT2 developers, actually have our own version of a bitonic sort
algorithm using Boost.SIMD, but it's not available online.

> I successfully compiled nt2 3.1.2 with boost 1.56.0, tbb 4.2.3 and CUDA
> 6.5.14 with VS2013x64 and tested the floki::sort of related link in my
> own software test (so I did not compile floki directly, only copied the
> aa-sort algorithm of it which uses your boost simd).
>
> I compared it with std::sort (there should be a significant improvement
> according to the author) and it was much slower:

Bear in mind that VS2013 is known to perform much worse than clang with
Boost.SIMD, particularly for the constructs that floki appears to use.
VS2013's AVX support is also a bit buggy.

>
> My processor has AVX support with 256 bit registers (Intel Xeon E5-1620
> v2) and I also have a GTX 760 if required (but don’t think so).
>
>
> Do I have to tell the boost library that it should use AVX? How can I
> find out which internals (AVX, SSE etc.) the boost library uses?

Since you're building for 64-bit, you have SSE enabled by default.
To enable AVX, you need to build with the /arch:AVX compilation flag.
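
One way to check which instruction set a build actually targets is to look at the compiler's predefined macros. The sketch below is not part of Boost.SIMD or NT2, and the function name `compiled_simd_level` is made up for illustration. It relies on `__AVX__` and `__AVX2__`, which MSVC defines under /arch:AVX and /arch:AVX2 (GCC and clang define them under -mavx and -mavx2), and on `_M_X64`, which MSVC defines for x64 targets, where SSE2 is always available.

```cpp
#include <string>

// Hypothetical helper: names the widest x86 SIMD extension the compiler was
// allowed to target for this translation unit. It reports what the compiler
// flags permit, not what any particular library chose to use.
std::string compiled_simd_level()
{
#if defined(__AVX2__)
    return "AVX2";          // /arch:AVX2 or -mavx2
#elif defined(__AVX__)
    return "AVX";           // /arch:AVX or -mavx
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";          // default on x86-64; /arch:SSE2 on 32-bit MSVC
#else
    return "none detected";
#endif
}
```

Printing the result of `compiled_simd_level()` from the test program, once without and once with /arch:AVX, should show whether the flag reached the translation unit that contains the sort.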

kyle.h...@gmail.com

Feb 18, 2015, 9:45:50 PM

to nt2...@googlegroups.com

I just became aware of this mailing list and found this post. I worked on this implementation last summer, but I haven't looked at it much since. At the time, the only compiler it performed well with was clang. I took another look today with the most recent nt2 master branch, and g++ performance is now comparable to clang's. However, with g++ I still need to compile with "-fno-strict-aliasing -DBOOST_SIMD_NO_STRICT_ALIASING", as was pointed out to me in this issue: https://github.com/MetaScale/nt2/issues/741. Otherwise the compiler crashes.

Gerald sent me the assembly output from VS2013 last November. A lot of Boost.SIMD template functions failed to inline. Do you see better results in more recent versions of VS with the constructs I'm using?
