On Windows, Boost.SIMD C++ code compiles to scalar assembly instead of SIMD instructions


Léo Sauget

Mar 26, 2015, 10:24:31 AM
to nt2...@googlegroups.com

Hi all,

I am trying to use the Boost.SIMD library and I am facing a major issue.

I have installed Boost (and tested it) along with release-3.2 from the GitHub repository. I have implemented the dot product example and some other basic functions (fused multiply-add, etc.) in order to evaluate the speed gain with Boost.SIMD.

I tried with both Visual Studio 2012 and 2013.


My issue is that Visual Studio (with the /arch:SSE2 build option) does not compile Boost.SIMD code to SIMD instructions. In the same program, plain SSE2 intrinsics work fine (_mm_store_ps() gives movaps), while the Boost.SIMD code stays scalar when I look at the generated assembly.

Strangely, this code works perfectly on Mac.


Have you ever encountered this issue? Can you suggest any steps or ideas to resolve it?
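
For readers who want a baseline to compare against, here is a minimal sketch of the kind of hand-written intrinsics code the post refers to: a dot product written once as a plain loop and once with SSE intrinsics. The function names dot_scalar and dot_sse are made up for this illustration and are not part of Boost.SIMD or of the poster's program. The intrinsics version is the one expected to show packed instructions (mulps, addps) in the disassembly.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ...

// Scalar reference: one float per iteration (movss/mulss/addss style code
// unless the compiler auto-vectorizes the loop).
float dot_scalar(const float* a, const float* b, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Hand-written SSE version: four floats per iteration.
float dot_sse(const float* a, const float* b, std::size_t n)
{
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);            // unaligned packed load
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));  // packed multiply, packed add
    }

    // Horizontal sum of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Both functions should return the same value for inputs whose products are exactly representable; for general floating-point data the results can differ slightly because the additions happen in a different order.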

Joel FALCOU

Mar 26, 2015, 10:28:01 AM
to nt2...@googlegroups.com
Hi

Visual Studio does not set any preprocessor variable to indicate it
selected SSE2 or anything else. If you define
BOOST_SIMD_HAS_SSE2_SUPPORT in the preprocessor yourself, it should work.

I just noticed this is not part of the doc; I'll add that.

Side point: release-3.2 is not an official release, as no tag is associated
with it. Better to use release-3.1.
> --
> You received this message because you are subscribed to the Google
> Groups "nt2-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to nt2-dev+u...@googlegroups.com
> <mailto:nt2-dev+u...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Léo Sauget

Mar 26, 2015, 11:48:01 AM
to nt2...@googlegroups.com

First, I would like to thank you for your very quick answer.
I added the macro you gave me, but it didn't change anything.
Actually, I do get some SSE instructions, but only scalar ones that operate on a single 32-bit value: movss, addss and mulss. I don't understand why I don't get movaps, addps or mulps in the disassembly.
Here is the disassembly generated while debugging the SIMD Hello World program from the NT² doc (unchanged; the program works):
00CBBDFA  lea         ecx,[res]  
00CBBDFD  call        boost::simd::pack<float,4>::pack<float,4> (0CB11E0h)  
    p_t u(10);
00CBBE02  mov         dword ptr [ebp-178h],0Ah  
00CBBE0C  push        0  
00CBBE0E  lea         eax,[ebp-178h]  
00CBBE14  push        eax  
00CBBE15  lea         ecx,[u]  
00CBBE18  call        boost::simd::pack<float,4>::pack<float,4><int> (0CB15DCh)  
    p_t r = boost::simd::splat<p_t>(11);
00CBBE1D  mov         dword ptr [ebp-16Ch],0Bh  
00CBBE27  push        0  
00CBBE29  lea         eax,[ebp-16Ch]  
00CBBE2F  push        eax  
00CBBE30  lea         ecx,[ebp-160h]  
00CBBE36  push        ecx  
00CBBE37  call        boost::simd::splat<boost::simd::pack<float,4>,int> (0CB1159h)  
00CBBE3C  add         esp,8  
00CBBE3F  push        eax  
00CBBE40  lea         ecx,[r]  
00CBBE43  call        boost::simd::pack<float,4>::pack<float,4><boost::simd::expression<boost::proto::exprns_::expr<boost::simd::tag::splat_,boost::proto::argsns_::list2<boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<int>,0>,int>,boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<boost::dispatch::meta::as_<boost::simd::native<float,boost::simd::tag::sse_,void> > >,0>,boost::dispatch::meta::as_<boost::simd::native<float,boost::simd::tag::sse_,void> > > >,2>,boost::simd::native<float,boost::simd::tag::sse_,void> > > (0CB10A5h)  

    res = (u + r) * 2.f;
00CBBE48  movss       xmm0,dword ptr ds:[0CC3B84h]  
00CBBE50  movss       dword ptr [ebp-150h],xmm0  
00CBBE58  lea         eax,[ebp-150h]  
00CBBE5E  push        eax  
00CBBE5F  lea         ecx,[r]  
00CBBE62  push        ecx  
00CBBE63  lea         edx,[u]  
00CBBE66  push        edx  
00CBBE67  lea         eax,[ebp-144h]  
00CBBE6D  push        eax  
00CBBE6E  call        boost::simd::operator+<boost::simd::pack<float,4>,boost::simd::pack<float,4> > (0CB17EEh)  
00CBBE73  add         esp,0Ch  
00CBBE76  push        eax  
00CBBE77  lea         ecx,[ebp-134h]  
00CBBE7D  push        ecx  
00CBBE7E  call        boost::simd::operator*<boost::simd::expression<boost::proto::exprns_::expr<boost::proto::tagns_::tag::plus,boost::proto::argsns_::list2<boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<boost::simd::native<float,boost::simd::tag::sse_,void> const &>,0>,boost::simd::native<float,boost::simd::tag::sse_,void> const &>,boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<boost::simd::native<float,boost::simd::tag::sse_,void> const &>,0>,boost::simd::native<float,boost::simd::tag::sse_,void> const &> >,2>,boost::simd::native<float,boost::simd::tag::sse_,void> >,float> (0CB174Eh)  
00CBBE83  add         esp,0Ch  
00CBBE86  push        eax  
00CBBE87  lea         ecx,[res]  
00CBBE8A  call        boost::simd::pack<float,4>::operator=<boost::simd::expression<boost::proto::exprns_::expr<boost::proto::tagns_::tag::multiplies,boost::proto::argsns_::list2<boost::simd::expression<boost::proto::exprns_::expr<boost::proto::tagns_::tag::plus,boost::proto::argsns_::list2<boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<boost::simd::native<float,boost::simd::tag::sse_,void> const &>,0>,boost::simd::native<float,boost::simd::tag::sse_,void> const &>,boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<boost::simd::native<float,boost::simd::tag::sse_,void> const &>,0>,boost::simd::native<float,boost::simd::tag::sse_,void> const &> >,2>,boost::simd::native<float,boost::simd::tag::sse_,void> >,boost::simd::expression<boost::proto::exprns_::basic_expr<boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<float>,0>,float> >,2>,boost::simd::native<float,b... (0CB16EFh)  


Thanks in advance for your support
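
For reference, the arithmetic in this listing is small: every lane holds 10, every lane of the second operand holds 11, and the result is (10 + 11) * 2. Below is a hand-written intrinsics sketch of the same computation, assuming p_t is four floats wide (as the pack<float,4> symbols in the listing suggest). The function name hello_simd is hypothetical. This is not what Boost.SIMD emits; it only shows the packed operations (one addps, one mulps) one would hope to find in place of the calls above.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Computes (u + r) * 2.f on four floats at once, with u = 10 and r = 11
// in every lane, and writes the four results to out[0..3].
void hello_simd(float out[4])
{
    __m128 u   = _mm_set1_ps(10.0f);                 // plays the role of p_t u(10)
    __m128 r   = _mm_set1_ps(11.0f);                 // plays the role of splat<p_t>(11)
    __m128 two = _mm_set1_ps(2.0f);                  // the scalar 2.f broadcast to all lanes
    __m128 res = _mm_mul_ps(_mm_add_ps(u, r), two);  // packed add, then packed multiply
    _mm_storeu_ps(out, res);                         // unaligned packed store
}
```

After the call, each of the four output elements holds 42.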

Joel Falcou

Mar 26, 2015, 12:40:16 PM
to nt2...@googlegroups.com

Did you turn optimisation on?


Léo Sauget

Mar 26, 2015, 12:46:12 PM
to nt2...@googlegroups.com

Yes, SSE2 is on.

Joel Falcou

Mar 26, 2015, 12:52:54 PM
to nt2...@googlegroups.com

I was speaking of O3

Léo Sauget

Mar 27, 2015, 4:47:13 AM
to nt2...@googlegroups.com
I have turned on /Ox (full optimization) and it works well. However, I only have the O1, O2 and Ox options; what is the O3 optimization?


Joel FALCOU

Mar 27, 2015, 4:48:05 AM
to nt2...@googlegroups.com
O3 is the GCC colloquialism for Ox on MSVC, sorry.

So you confirm it generates proper assembly with /Ox?


Léo Sauget

Mar 27, 2015, 6:20:49 AM
to nt2...@googlegroups.com

Yes, I do: it generates SIMD assembly instructions as expected. In my first tests, the Boost.SIMD code seems to be 6x faster than the scalar code. I will run many more tests on different architectures to evaluate the gain accurately.
Thank you for your work and your impressive support.
Have a nice day
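
For anyone repeating this kind of measurement, here is a minimal timing sketch using std::chrono. It is not the poster's benchmark; best_time_ms is a hypothetical helper name. It runs a callable several times and keeps the fastest run, which is one common way to reduce noise from other processes. It assumes a compiler with C++11 lambdas and <chrono>.

```cpp
#include <chrono>
#include <cstddef>

// Runs f() `repeats` times and returns the fastest wall-clock time in
// milliseconds. Returns -1.0 if repeats is 0.
template <class F>
double best_time_ms(F f, std::size_t repeats)
{
    double best = -1.0;
    for (std::size_t i = 0; i < repeats; ++i)
    {
        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        f();
        std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best)
            best = ms;  // keep the fastest run seen so far
    }
    return best;
}
```

A comparison would time the scalar and the SIMD version of the same kernel on the same data, in an optimized build, and make sure the results are actually used so the compiler cannot remove the work.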

Mathias Gaunard

Mar 27, 2015, 1:20:49 PM
to nt2...@googlegroups.com
On 26/03/2015 15:13, Léo Sauget wrote:

> My issue is that Visual Studio does not manage (with /arch:SSE2 build
> option) to interpret Boost.SIMD instructions as SIMD ones. In a same
> program, I have SSE2 instructions working good (_mm_store_ps() gives
> movaps) and Boost.SIMD instructions staying scalar when I look to the
> assembly generate code.
>
> Strangely, this code works perfectly on Mac.

Boost.SIMD requires inlining to work correctly.
It probably works on Mac because with GCC or Clang we can force inlining
to always occur. With MSVC, you need /Ob1 or preferably /Ob2 (/Ob2 is
enabled by /Ox, among others).
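
The inlining point can be illustrated without Boost.SIMD. The sketch below wraps __m128 in a tiny struct with overloaded operators. The names vec4, SKETCH_FORCEINLINE and add_then_scale are hypothetical and exist only for this illustration; they are not Boost.SIMD's. The compiler-specific keywords are real: __forceinline on MSVC and __attribute__((always_inline)) on GCC and Clang.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical macro name, used only for this illustration.
#if defined(_MSC_VER)
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCEINLINE inline
#endif

// Minimal 4-float wrapper, standing in for a SIMD pack type.
struct vec4
{
    __m128 v;
};

SKETCH_FORCEINLINE vec4 operator+(const vec4& a, const vec4& b)
{
    vec4 r = { _mm_add_ps(a.v, b.v) };
    return r;
}

SKETCH_FORCEINLINE vec4 operator*(const vec4& a, float s)
{
    vec4 r = { _mm_mul_ps(a.v, _mm_set1_ps(s)) };
    return r;
}

// If the operators are inlined, this reduces to one packed add and one
// packed multiply. If they are not, it becomes two function calls with
// operands passed through memory, much like the debug listing earlier
// in this thread.
SKETCH_FORCEINLINE vec4 add_then_scale(const vec4& u, const vec4& r, float s)
{
    return (u + r) * s;
}
```

As far as I know, MSVC treats __forceinline as a strong hint rather than a guarantee, and does not honor it when inline expansion is disabled with /Ob0, which is the default in debug builds. That would be consistent with the advice above to use /Ob1 or /Ob2.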

Mathias Gaunard

Mar 27, 2015, 1:22:59 PM
to nt2...@googlegroups.com
On 26/03/2015 15:28, Joel FALCOU wrote:
> Hi
>
> Visual Studio does not set any preprocessor variable to indicate it
> selected SSE2 or anything else. If you define
> BOOST_SIMD_HAS_SSE2_SUPPORT in the preprocessor yourself, it should work.
>
> I know noticed this is note part of the doc, I'll add that.

That is incorrect.
/arch:SSE2 (or even nothing at all if you build for x64) is enough to
enable SSE2.

The BOOST_SIMD_HAS_***_SUPPORT macros are only necessary for SSE3,
SSSE3, SSE4.1, SSE4.2, SSE4a (all of which become useless if you use
/arch:AVX), XOP, FMA4 and FMA3.
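
To see what the compiler itself advertises for a given build, one can test its predefined macros directly. The sketch below is an assumption about how such a check could look, not Boost.SIMD's detection code; the two function names are hypothetical. The macros are real: _M_X64 and _M_IX86_FP are predefined by MSVC (_M_IX86_FP is 2 or higher with /arch:SSE2 and above on 32-bit x86), __SSE2__ is predefined by GCC and Clang, and __AVX__ is set by /arch:AVX on MSVC and by -mavx on GCC and Clang.

```cpp
// Reports whether the compiler advertises SSE2 for this translation unit.
inline bool compiler_reports_sse2()
{
#if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2) || defined(__SSE2__)
    return true;
#else
    return false;
#endif
}

// Reports whether the compiler advertises AVX for this translation unit.
inline bool compiler_reports_avx()
{
#if defined(__AVX__)
    return true;
#else
    return false;
#endif
}
```

For the extensions listed above that MSVC has no /arch switch for, the corresponding macro would have to be defined by hand, for example with /D on the command line. This thread only gives the BOOST_SIMD_HAS_***_SUPPORT pattern, so check the library headers for the exact spelling of each one.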

Joel Falcou

Mar 27, 2015, 1:46:03 PM
to nt2...@googlegroups.com

Indeed
