newbie question, how to do: r = a > b

mau...@blueskystudios.com

May 1, 2015, 2:35:18 PM

to nt2...@googlegroups.com

Hi,

I'm totally new to boost::simd so forgive me if this is an obvious question or if I didn't install it properly.
(I just downloaded boost-simd-3.1.0.tgz and using it as is. I didn't run any make or install commands)

I've successfully compiled the 'hello world' program from the doc and I'm now trying the example below.

pack<int> target;
pack<int> mask;
pack<int> value;

mask = target > value;

but I'm getting compile errors with clang 3.5.0 (on linux), see below.

Thanks.

In file included from /netDISKS/master/netmt/LINUX_C65/cgi/boost-simd/3.1.0/boost-simd/include/boost/simd/sdk/simd/pack/generator.hpp:13:
/netDISKS/master/netmt/LINUX_C65/cgi/boost-simd/3.1.0/boost-simd/include/boost/simd/sdk/dsl/typed_expression.hpp:59:58: error: 
      no type named 'type' in 'boost::dispatch::meta::call<boost::proto::tagns_::tag::assign (boost::simd::native<int,
      boost::simd::tag::sse_, void> &, const boost::simd::native<boost::simd::logical<int>, boost::simd::tag::sse_, void>
      &), void>'
    BOOST_PP_REPEAT_FROM_TO(1, BOOST_DISPATCH_MAX_ARITY, M0, ~)
<more errors follow>

Joel FALCOU

May 1, 2015, 3:09:40 PM

to nt2...@googlegroups.com

Boolean masks in SIMD are not portably storable in a vector of a regular
arithmetic type. You have to use the logical<T> type, which encapsulates
portable SIMD booleans.

pack<int> target, value;
pack<logical<int>> mask;

mask = target > value;
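
As a rough illustration of why the mask gets its own type, here is a sketch in plain scalar C++, not the Boost.SIMD API. It assumes an SSE-style representation where a true lane has all bits set (which reads as -1 for an int lane) rather than the value 1; other architectures may represent masks differently, which is what logical<T> hides. The names Lanes, sse_style_greater and bitwise_select are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane stand-in for pack<int>; this is NOT a Boost.SIMD type.
using Lanes = std::array<std::int32_t, 4>;

// Emulates what an SSE-style integer compare leaves in each lane:
// all bits set (reads as -1) where a > b, all bits clear (0) otherwise.
inline Lanes sse_style_greater(const Lanes& a, const Lanes& b)
{
    Lanes m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = (a[i] > b[i]) ? -1 : 0;
    return m;
}

// Consumes such a mask bitwise: lanes with all bits set take t, the others take f.
inline Lanes bitwise_select(const Lanes& mask, const Lanes& t, const Lanes& f)
{
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = (mask[i] & t[i]) | (~mask[i] & f[i]);
    return r;
}
```

Under this assumption the "true" lanes hold -1, not 1, so treating the mask as ordinary integers would be misleading and non-portable.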

mau...@blueskystudios.com

May 5, 2015, 1:49:31 PM

to nt2...@googlegroups.com

Ok, that worked, thanks. Sorry for the double post by the way.

I'll probably have some more questions on how to use the result of a relational operator directly rather than assigning it to a mask.

Thanks again.

Joel FALCOU

May 5, 2015, 1:54:16 PM

to nt2...@googlegroups.com

Depending on what you want, you can use the if_else(cond, a, b) function,
which emulates an if/else block that assigns to a value.

pack<float> r,a,b;

r = if_else( a > b, b, -a);

which means each element of r gets the corresponding element of b or -a
depending on the a > b condition, e.g.:

a     = [ 1 5 6 7 ]
b     = [ 2 8 0 9 ]
a > b = [ F F T F ]

r     = [ -1 -5 0 -7 ]

This doesn't require you to handle the mask directly, and it's optimized to
take advantage of the bitwise representation of the conditional.

There are also variants like seldec, selinc, etc., which do things like
i = i+1 if a condition is true.
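
To make the per-element semantics concrete, here is a scalar emulation in plain C++. It is only a sketch of the behaviour described above, not the library's implementation; Pack4f, Mask4, lanes_greater, negate, if_else_emulated and selinc_emulated are hypothetical names used for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane stand-ins for pack<float> and its mask; not Boost.SIMD types.
using Pack4f = std::array<float, 4>;
using Mask4  = std::array<bool, 4>;

// Per-lane a > b.
inline Mask4 lanes_greater(const Pack4f& a, const Pack4f& b)
{
    Mask4 m{};
    for (std::size_t i = 0; i < m.size(); ++i) m[i] = a[i] > b[i];
    return m;
}

// Per-lane unary minus.
inline Pack4f negate(const Pack4f& a)
{
    Pack4f r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = -a[i];
    return r;
}

// Each lane of the result takes t where the condition holds, f otherwise.
inline Pack4f if_else_emulated(const Mask4& c, const Pack4f& t, const Pack4f& f)
{
    Pack4f r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = c[i] ? t[i] : f[i];
    return r;
}

// Each lane is incremented only where the condition holds.
inline Pack4f selinc_emulated(const Mask4& c, const Pack4f& x)
{
    Pack4f r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = c[i] ? x[i] + 1.0f : x[i];
    return r;
}
```

With a = [ 1 5 6 7 ] and b = [ 2 8 0 9 ], calling if_else_emulated(lanes_greater(a, b), b, negate(a)) yields [ -1 -5 0 -7 ].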

mau...@blueskystudios.com

May 6, 2015, 3:39:45 PM

to nt2...@googlegroups.com

Great, I got that working.

How would you go about it if the action required in the two branches is more than a simple assignment?

        
if (a > b) 
{
    a = do_1 (a, b);
}
else
{
    a = do_2 (a, b);
}

// my attempt is slow

pack<float> pa = load<pack<float>> (a+i);
pack<float> pb = load<pack<float>> (b+i);
store<pack<float>> (
    if_else (pa > pb, do_1 (pa, pb), do_2 (pa, pb)), 
    a+i);

Joel Falcou

May 6, 2015, 3:45:59 PM

to nt2...@googlegroups.com

Your do_1/do_2 functions have to be SIMDified of course, but you can just do this.

Take care that if_else in SIMD evaluates both branches and then selects, so if the cost of both do functions is high, your total cost is the sum of them minus pipeline effects.

Also check that your do functions are inline or even BOOST_FORCEINLINE.
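
The "both branches are evaluated" point can be seen in a scalar emulation. The sketch below is plain C++, not Boost.SIMD; the bodies of do_1 and do_2 are assumed (borrowed from the expressions benchmarked later in this thread), and Pack4f, CallCounts and branchy_simd_step are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane stand-in for pack<float>; not a Boost.SIMD type.
using Pack4f = std::array<float, 4>;

// Counts how many times each branch function runs.
struct CallCounts { int do_1 = 0; int do_2 = 0; };

// Assumed body for the "then" branch: (a+b)/(a-b) on every lane.
inline Pack4f do_1(const Pack4f& a, const Pack4f& b, CallCounts& n)
{
    ++n.do_1;
    Pack4f r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = (a[i] + b[i]) / (a[i] - b[i]);
    return r;
}

// Assumed body for the "else" branch: (a+b)/(b-a) on every lane.
inline Pack4f do_2(const Pack4f& a, const Pack4f& b, CallCounts& n)
{
    ++n.do_2;
    Pack4f r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = (a[i] + b[i]) / (b[i] - a[i]);
    return r;
}

// SIMD-style "if": compute BOTH results for all lanes, then pick per lane.
inline Pack4f branchy_simd_step(const Pack4f& a, const Pack4f& b, CallCounts& n)
{
    const Pack4f t = do_1(a, b, n);  // always runs, even if no lane needs it
    const Pack4f f = do_2(a, b, n);  // always runs, even if no lane needs it
    Pack4f r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = (a[i] > b[i]) ? t[i] : f[i];
    return r;
}
```

Whatever the data, each call bumps both counters once, which is why the cost is roughly the sum of the two branches rather than the cost of the branch actually taken.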


mau...@blueskystudios.com

May 7, 2015, 3:18:25 PM

to nt2...@googlegroups.com

That all seems to work now, but it runs slower than straight-up C++, as you pointed out.


    for (int32 i = 0; i < (1024*1024/8); i += simd::pack<float>::size ())
    {
        simd::pack<float> pa = simd::load<simd::pack<float>> (a+i);
        simd::pack<float> pb = simd::load<simd::pack<float>> (b+i);

        // Variant 1 (50% slower than the straight C++ loop):
        simd::store<simd::pack<float>> (
            simd::if_else (pa > pb, (pa+pb)/(pa-pb), (pa+pb)/(pb-pa)), a+i);
        // Variant 2 (12% slower than the straight C++ loop), used in place of variant 1:
        // simd::store<simd::pack<float>> (
        //     simd::if_else (pa > pb, -pb, -pa), a+i);
    }

That is with -mavx on a Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz with clang 3.5.0

Joel Falcou

May 8, 2015, 2:32:47 AM

to nt2...@googlegroups.com

You are compiling with -O3, right?

mau...@blueskystudios.com

May 8, 2015, 11:33:38 AM

to nt2...@googlegroups.com

We usually go no higher than -O2 (we have seen problems with some of our numerical code with -O2), but using -O3 gives the same result.

Joel FALCOU

May 8, 2015, 11:59:31 AM

to nt2...@googlegroups.com

If you know that your data are stored aligned in memory, you can just do

simd::pack<float> pa(a+i);

to generate aligned memory load (movaps) intrinsics.

If this doesn't change anything, I'm looking forward to an isolated repro
with your exact compiler command line.
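
Before relying on aligned loads it can help to check the alignment of the buffers. The sketch below is plain C++ and does not use Boost.SIMD; is_aligned and AlignedBlock are hypothetical helpers, and the 16/32-byte figures assume 128-bit SSE and 256-bit AVX registers respectively.

```cpp
#include <cstddef>
#include <cstdint>

// True if p is aligned to `alignment` bytes (alignment must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// A block of 8 floats whose start is aligned to 32 bytes,
// i.e. suitable for one aligned 256-bit load under the assumption above.
struct alignas(32) AlignedBlock
{
    float data[8];
};
```

For example, with AlignedBlock blk, is_aligned(blk.data, 32) holds, while blk.data + 1 is only 4-byte aligned, so an aligned load from that address would be invalid.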


mau...@blueskystudios.com

May 13, 2015, 4:17:32 PM

to nt2...@googlegroups.com

To update everyone interested in this: it turns out that in my test case clang did a very good job of auto-vectorizing the C++ comparison function. So I was comparing vectorization by clang against vectorization by boost::simd, which makes the performance difference I saw a lot less dramatic.
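
For anyone wanting a truly scalar baseline, one option is to switch off clang's auto-vectorizer for the reference loop. The sketch below assumes the second benchmark variant (a[i] = a[i] > b[i] ? -b[i] : -a[i]); scalar_baseline is a hypothetical name, and the pragma only has an effect when compiling with clang.

```cpp
#include <cstddef>

// Scalar reference loop for the "-pb / -pa" variant of the benchmark.
// The pragma asks clang not to vectorize or interleave this loop, so that
// timings compare explicit SIMD code against genuinely scalar code.
inline void scalar_baseline(float* a, const float* b, std::size_t n)
{
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#endif
    for (std::size_t i = 0; i < n; ++i)
        a[i] = (a[i] > b[i]) ? -b[i] : -a[i];
}
```

Alternatively, building the scalar version with -fno-vectorize -fno-slp-vectorize should disable clang's vectorizers for the whole translation unit.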
