newbie question, how to do: r = a > b


mau...@blueskystudios.com

May 1, 2015, 2:35:18 PM
to nt2...@googlegroups.com
Hi,

I'm totally new to boost::simd so forgive me if this is an obvious question or if I didn't install it properly.
(I just downloaded boost-simd-3.1.0.tgz and using it as is. I didn't run any make or install commands)

I've successfully compiled the 'hello world' program from the doc and I'm now trying the example below.

pack<int> target;
pack<int> mask;
pack<int> value;

mask = target > value;

but I'm getting compile errors with clang 3.5.0 (on linux), see below.

Thanks.

In file included from /netDISKS/master/netmt/LINUX_C65/cgi/boost-simd/3.1.0/boost-simd/include/boost/simd/sdk/simd/pack/generator.hpp:13:
/netDISKS/master/netmt/LINUX_C65/cgi/boost-simd/3.1.0/boost-simd/include/boost/simd/sdk/dsl/typed_expression.hpp:59:58: error: no type named 'type' in 'boost::dispatch::meta::call<boost::proto::tagns_::tag::assign (boost::simd::native<int, boost::simd::tag::sse_, void> &, const boost::simd::native<boost::simd::logical<int>, boost::simd::tag::sse_, void> &), void>'

    BOOST_PP_REPEAT_FROM_TO(1, BOOST_DISPATCH_MAX_ARITY, M0, ~)
<more errors follow>


Joel FALCOU

May 1, 2015, 3:09:40 PM
to nt2...@googlegroups.com



Boolean masks in SIMD are not portably storable in a vector of a regular
arithmetic type. You have to use the logical<T> type, which encapsulates
portable SIMD booleans.

pack<int> target, value;
pack<logical<int>> mask;

mask = target > value;
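
A minimal scalar sketch of why the comparison result needs its own type. This is not Boost.SIMD code; lanes4, mask4, greater and bit_select are made-up names. It models the SSE convention, where a lane-wise compare yields all-one bits for true and all-zero bits for false, so a "true" lane read back as an int is -1 rather than 1. Other instruction sets encode masks differently, which is the portability problem logical<T> is there to hide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 4-lane register of ints and its SSE-style mask.
using lanes4 = std::array<std::int32_t, 4>;
using mask4  = std::array<std::int32_t, 4>;

// Lane-wise a > b: a true lane is -1 (all bits set), a false lane is 0.
mask4 greater(const lanes4& a, const lanes4& b)
{
    mask4 m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = (a[i] > b[i]) ? -1 : 0;
    return m;
}

// Bitwise select: keeps t where the mask is set and f elsewhere.
// This only works because a true lane has every bit set.
lanes4 bit_select(const mask4& m, const lanes4& t, const lanes4& f)
{
    lanes4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        assert(m[i] == 0 || m[i] == -1);  // reject values that are not valid masks
        r[i] = (t[i] & m[i]) | (f[i] & ~m[i]);
    }
    return r;
}
```

With a = {1, 5, 6, 7} and b = {2, 8, 0, 9}, bit_select(greater(a, b), a, b) picks the larger value in each lane.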

mau...@blueskystudios.com

May 5, 2015, 1:49:31 PM
to nt2...@googlegroups.com
Ok, that worked, thanks. Sorry for the double post by the way.

I'll probably have some more questions on how to use the result of a relational operator directly rather than assigning it to a mask.

Thanks again.

Joel FALCOU

May 5, 2015, 1:54:16 PM
to nt2...@googlegroups.com



Depending on what you want, you can use the if_else(cond, a, b) function,
which emulates an if/else block that assigns to a value.

pack<float> r,a,b;

r = if_else( a > b, b, -a);

which means each element of r gets the corresponding element of b or -a
depending on the a > b condition, e.g.:

a     = [  1  5  6  7 ]
b     = [  2  8  0  9 ]
a > b = [  F  F  T  F ]

r     = [ -1 -5  0 -7 ]

This doesn't require you to handle the mask directly, and it's optimized to
take advantage of the bitwise representation of the conditional.

There are also variants such as seldec and selinc, which do i = i-1 or
i = i+1 if a condition is true.
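
The per-lane semantics described above can be sketched in plain scalar C++. This is only a model, not the Boost.SIMD implementation; pack4, mask4, if_else_model, selinc_model and example are hypothetical names, and a real pack does the whole thing in a few instructions rather than a loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using pack4 = std::array<float, 4>;  // hypothetical 4-lane pack of floats
using mask4 = std::array<bool, 4>;   // hypothetical per-lane condition

// Lane-wise select: r[i] = cond[i] ? t[i] : f[i].
pack4 if_else_model(const mask4& cond, const pack4& t, const pack4& f)
{
    pack4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = cond[i] ? t[i] : f[i];
    return r;
}

// Lane-wise conditional increment: r[i] = cond[i] ? x[i] + 1 : x[i].
pack4 selinc_model(const mask4& cond, const pack4& x)
{
    pack4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = cond[i] ? x[i] + 1.0f : x[i];
    return r;
}

// Models r = if_else(a > b, b, -a) with the values from the message.
pack4 example()
{
    const pack4 a{1.0f, 5.0f, 6.0f, 7.0f};
    const pack4 b{2.0f, 8.0f, 0.0f, 9.0f};
    mask4 gt{};
    pack4 neg_a{};
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        gt[i]    = a[i] > b[i];
        neg_a[i] = -a[i];
    }
    const pack4 r = if_else_model(gt, b, neg_a);
    assert(r[2] == b[2]);  // the only lane where a > b holds takes b
    return r;              // lanes: -1, -5, 0, -7
}
```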


mau...@blueskystudios.com

May 6, 2015, 3:39:45 PM
to nt2...@googlegroups.com
Great, I got that working.

How would you go about it if the action required in the two branches is more than a simple assignment?


       
if (a > b)
{
    a = do_1 (a, b);
}
else
{
    a = do_2 (a, b);
}

// my attempt is slow

pack<float> pa = load<pack<float>> (a+i);
pack<float> pb = load<pack<float>> (b+i);
store<pack<float>> (
    if_else (pa > pb, do_1 (pa, pb), do_2 (pa, pb)),
    a+i);





Joel Falcou

May 6, 2015, 3:45:59 PM
to nt2...@googlegroups.com
Your do_1/do_2 functions have to be SIMDified of course, but you can just do this.
Take care that if_else in SIMD evaluates both branches and then selects, so if the cost of both do functions is high, your total cost is the sum of them minus pipeline effects.
Also check that your do functions are inline or even BOOST_FORCEINLINE.
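
To make the "both branches are evaluated" point concrete, here is a small scalar sketch that counts how often each branch body runs. Everything in it is hypothetical (call_counts, the bodies of do_1 and do_2, branchy, select_style); it only mirrors the evaluation order, not the actual SIMD code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct call_counts
{
    std::size_t do_1 = 0;
    std::size_t do_2 = 0;
};

// Hypothetical branch bodies; each one records that it ran.
float do_1(float a, float b, call_counts& c) { ++c.do_1; return a + b; }
float do_2(float a, float b, call_counts& c) { ++c.do_2; return a - b; }

// Scalar branch: only the taken side runs for each element.
call_counts branchy(std::vector<float> a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    call_counts c;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] > b[i]) a[i] = do_1(a[i], b[i], c);
        else             a[i] = do_2(a[i], b[i], c);
    }
    return c;
}

// Select style, mirroring if_else: both sides run, then one result is kept.
call_counts select_style(std::vector<float> a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    call_counts c;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const float t = do_1(a[i], b[i], c);
        const float f = do_2(a[i], b[i], c);
        a[i] = (a[i] > b[i]) ? t : f;
    }
    return c;
}
```

For n elements the branchy loop makes n calls in total, while the select-style loop makes 2n, which is the cost to weigh against the gain from processing several lanes at once.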

--
You received this message because you are subscribed to the Google Groups "nt2-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nt2-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

mau...@blueskystudios.com

May 7, 2015, 3:18:25 PM
to nt2...@googlegroups.com
That all seems to work now, but it runs slower than straight C++, as you pointed out.


   
for (int32 i = 0; i < (1024*1024/8); i += simd::pack<float>::size ())
{
    simd::pack<float> pa = simd::load<simd::pack<float>> (a+i);
    simd::pack<float> pb = simd::load<simd::pack<float>> (b+i);

    // The two variants were timed one at a time.
    // Variant 1, 50% slower than the plain C++ loop:
    simd::store<simd::pack<float>> (
        simd::if_else (pa > pb, (pa+pb)/(pa-pb), (pa+pb)/(pb-pa)), a+i);

    // Variant 2, 12% slower than the plain C++ loop:
    // simd::store<simd::pack<float>> (simd::if_else (pa > pb, -pb, -pa), a+i);
}


That is with -mavx on an Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz with clang 3.5.0.

Joel Falcou

May 8, 2015, 2:32:47 AM
to nt2...@googlegroups.com

You are compiling with -O3, right?

mau...@blueskystudios.com

May 8, 2015, 11:33:38 AM
to nt2...@googlegroups.com
We usually go no higher than -O2 (we have seen problems with some of our numerical code with -O2), but using -O3 gives the same result.

Joel FALCOU

May 8, 2015, 11:59:31 AM
to nt2...@googlegroups.com
If you know that your data is stored aligned in memory, you can just do

simd::pack<float> pa(a+i);

to generate aligned memory load (movaps) intrinsics.

If this doesn't change anything, I'm looking forward to an isolated repro
with your exact compiler command line.
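
Before switching to aligned loads it is worth checking that a+i really is aligned. The sketch below uses only standard C++; is_aligned and aligned_floats are hypothetical names, and the 32-byte figure assumes 256-bit AVX registers (16 bytes is what SSE movaps needs).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p is a multiple of `alignment` bytes.
// `alignment` must be a power of two.
bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// A buffer whose first float sits on a 32-byte boundary. Every 8th float
// (8 * sizeof(float) = 32 bytes) then starts on a 32-byte boundary too.
struct alignas(32) aligned_floats
{
    float data[64];
};
```

With an aligned_floats buf, is_aligned(buf.data + i, 32) holds whenever i is a multiple of 8, which matches a loop that advances by the pack size.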


mau...@blueskystudios.com

May 13, 2015, 4:17:32 PM
to nt2...@googlegroups.com
To update everyone interested in this: it turns out that in my test case clang did a very good job auto-vectorizing the C++ comparison function. So I was comparing vectorization by clang against vectorization by boost::simd, which makes the performance difference I saw a lot less dramatic.
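
The scalar function itself is not shown in the thread, so the following is only a guess at its shape, modelled on the second benchmark variant (scalar_select is a hypothetical name). A simple loop like this is a typical candidate for auto-vectorization; clang can report which loops it vectorized when passed -Rpass=loop-vectorize.

```cpp
#include <cassert>
#include <cstddef>

// Scalar counterpart of if_else(pa > pb, -pb, -pa) stored back into a.
// The body is a pure per-element select, which the compiler can turn into
// a compare-and-blend sequence on its own.
void scalar_select(float* a, const float* b, std::size_t n)
{
    assert(n == 0 || (a != nullptr && b != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        a[i] = (a[i] > b[i]) ? -b[i] : -a[i];
}
```

When benchmarking hand-written SIMD against a loop like this, checking the vectorization remarks or the generated assembly first tells you whether the baseline is really scalar.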