do_vectorized() in struct Dot and alike for shared memory parallel reduction


Denis Davydov

Dec 20, 2016, 11:08:27 AM
to deal.II developers
Hi All,

I was digging around in lac/vectorized_operations_internal.h and noticed that the objects used to do SIMD reductions are of the form:

    template <typename Number, typename Number2>
    struct Dot
    {
      VectorizedArray<Number>
      do_vectorized(const size_type i) const
      {
        VectorizedArray<Number> x, y;
        x.load(X+i);
        y.load(Y+i);
        return x * y;
      }

      const Number  *X;
      const Number2 *Y;
    };


which are used like

    VectorizedArray<Number> r0 = op.do_vectorized(index);
    VectorizedArray<Number> r1 = op.do_vectorized(index+nvecs);
    VectorizedArray<Number> r2 = op.do_vectorized(index+2*nvecs);
    VectorizedArray<Number> r3 = op.do_vectorized(index+3*nvecs);

where 

const unsigned int nvecs = VectorizedArray<Number>::n_array_elements;


I wonder whether the declarations of x and y should be moved into the class as mutable member variables.
I don't know the internals of VectorizedArray, but I would assume that it currently allocates memory for the two variables on each call to do_vectorized(),
which would not be the case otherwise.

Regards,
Denis.

Martin Kronbichler

Dec 20, 2016, 11:22:50 AM
to dealii-d...@googlegroups.com

Hi Denis,


Do you mean r0 through r3? There is no memory allocation involved, because VectorizedArray<double> is a small array of at most 512 bits (64 bytes). It is kept on the stack or simply as a temporary in registers in the generated machine code. From the outside, it behaves like a double variable, and creating or copying it is just as cheap or expensive as copying a double, at least as long as loads and stores go to L1 cache.

So it is actually cheaper not to have the variables in the class. The compiler would need to write the value of a member variable back to memory once it leaves the function, whereas it can simply discard temporary variables.

Best,
Martin
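
To make the comparison above concrete, here is a minimal sketch of the two variants. It does not use deal.II: Pack4 is a hypothetical stand-in for VectorizedArray<double> (a fixed pack of four doubles), and DotLocal, DotMember and lane_sum are names made up for this illustration. It only shows that such a pack is a plain value type with no heap allocation, and that moving the temporaries into the class makes the object larger while giving the same result.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for VectorizedArray<double>: a fixed-size pack of
// four doubles. This is NOT the deal.II class, only an illustration.
struct Pack4
{
  static constexpr std::size_t n_array_elements = 4;
  double data[n_array_elements];

  // Copy n_array_elements consecutive values starting at ptr into the pack.
  void load(const double *ptr)
  {
    for (std::size_t k = 0; k < n_array_elements; ++k)
      data[k] = ptr[k];
  }
};

// Element-wise product of two packs.
inline Pack4 operator*(const Pack4 &a, const Pack4 &b)
{
  Pack4 r;
  for (std::size_t k = 0; k < Pack4::n_array_elements; ++k)
    r.data[k] = a.data[k] * b.data[k];
  return r;
}

// The pack is a plain value type: fixed size, trivially copyable, no heap.
static_assert(sizeof(Pack4) == 4 * sizeof(double), "pack is four doubles");
static_assert(std::is_trivially_copyable<Pack4>::value, "cheap to copy");

// Variant 1: temporaries are locals, as in the code quoted in the thread.
struct DotLocal
{
  Pack4 do_vectorized(const std::size_t i) const
  {
    Pack4 x, y; // automatic storage; the compiler may keep these in registers
    x.load(X + i);
    y.load(Y + i);
    return x * y;
  }

  const double *X;
  const double *Y;
};

// Variant 2: temporaries moved into the class as mutable members.
struct DotMember
{
  Pack4 do_vectorized(const std::size_t i) const
  {
    x.load(X + i); // modifies object state that outlives the call
    y.load(Y + i);
    return x * y;
  }

  const double *X;
  const double *Y;
  mutable Pack4 x, y; // enlarges every DotMember object by two packs
};

// Helper for checking results: sum of the four lanes of a pack.
inline double lane_sum(const Pack4 &p)
{
  double s = 0.0;
  for (std::size_t k = 0; k < Pack4::n_array_elements; ++k)
    s += p.data[k];
  return s;
}
```

Both variants compute the same products. One further difference, under the assumption that a single operation object is shared between threads: in the second variant every call to do_vectorized() writes to the members x and y, so concurrent calls on the same object would race, whereas the first variant only touches locals.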

Denis Davydov

Dec 20, 2016, 11:26:34 AM
to deal.II developers


On Tuesday, December 20, 2016 at 5:22:50 PM UTC+1, Martin Kronbichler wrote:

> Do you mean r0 through r3? There is no memory allocation involved, because VectorizedArray<double> is a small array of at most 512 bits (64 bytes). It is kept on the stack or simply as a temporary in registers in the generated machine code. From the outside, it behaves like a double variable, and creating or copying it is just as cheap or expensive as copying a double, at least as long as loads and stores go to L1 cache.

I actually meant those inside

    do_vectorized(const size_type i) const

but I presume the same explanation applies.

Thanks for clarifying!

> So it is actually cheaper not to have the variables in the class. The compiler would need to write the value of a member variable back to memory once it leaves the function, whereas it can simply discard temporary variables.

Interesting! Many small but important details ;-)

Cheers,
Denis. 