OpenMP 4.0

Jack Poulson

Jul 24, 2013, 1:15:18 PM
to elemen...@googlegroups.com
Dear all,

The specification for OpenMP 4.0 was recently released,
http://openmp.org/wp/2013/07/openmp-40/
and one of the features caught my eye: SIMD constructs. Using these would
likely have a noticeable impact on Elemental's performance.

Does anyone have any idea when vendors will be releasing implementations
of the new standard?

Jack

Jeff Hammond

Jul 25, 2013, 12:27:05 AM
to elemen...@googlegroups.com
Intel already supports this pragma. I think other vendors have
something similar enough.

If you tell me which parts of the code you think it will matter for
the most, I'll implement the best available option, using a #define to
map to whatever each compiler supports.
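
One way the #define approach could look is sketched below. This is only an illustration under stated assumptions: the macro name PORTABLE_SIMD and the function scale are hypothetical and not part of Elemental, and the per-compiler pragmas chosen here are one plausible mapping rather than a vetted list.

```cpp
#include <cassert>
#include <cstddef>

// PORTABLE_SIMD is a hypothetical macro name, not something Elemental defines.
// It expands to the strongest vectorization hint the compiler is known to accept.
#if defined(_OPENMP) && _OPENMP >= 201307
  // OpenMP 4.0 or later: the standard SIMD construct.
  #define PORTABLE_SIMD _Pragma("omp simd")
#elif defined(__INTEL_COMPILER)
  // Classic Intel compiler: vendor-specific SIMD pragma.
  #define PORTABLE_SIMD _Pragma("simd")
#elif defined(__clang__)
  // Clang: loop hint requesting vectorization.
  #define PORTABLE_SIMD _Pragma("clang loop vectorize(enable)")
#elif defined(__GNUC__)
  // GCC: only asserts that there are no loop-carried dependences.
  #define PORTABLE_SIMD _Pragma("GCC ivdep")
#else
  // Unknown compiler: fall back to plain autovectorization.
  #define PORTABLE_SIMD
#endif

// Scale a contiguous array in place: x[i] := alpha * x[i] for i in [0, n).
inline void scale(std::size_t n, double alpha, double* x)
{
    assert(x != nullptr || n == 0);
    PORTABLE_SIMD
    for (std::size_t i = 0; i < n; ++i)
        x[i] *= alpha;
}
```

The checks are ordered so that compilers which also define __GNUC__ (Intel, Clang) are matched before the GCC branch.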

Jeff

--
Jeff Hammond
jeff.s...@gmail.com

Jack Poulson

Jul 25, 2013, 11:45:54 AM
to elemen...@googlegroups.com
Hi Jeff,

A good test case might be the unblocked unpivoted LDL factorization here:
https://github.com/poulson/Elemental/blob/master/include/elemental/lapack-like/LDL/Var3.hpp#L20

I'm not sure whether it would be better to replace blas::Axpy calls with
manually SIMD-instrumented for loops in routines such as:
https://github.com/poulson/Elemental/blob/master/include/elemental/blas-like/level1/AxpyTriangle.hpp
which is used quite frequently within symmetric/Hermitian rank-(2)k
updates in order to handle diagonal blocks.
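
To make the comparison concrete, a manually SIMD-instrumented triangular update might look roughly like the sketch below. It is not Elemental's AxpyTriangle: the function name axpy_lower_triangle, the raw-pointer interface, and the column-major layout with leading dimension ld are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: B := alpha*A + B restricted to the lower triangle
// of column-major n-by-n matrices with leading dimension ld.
// Each column update is a short axpy whose length shrinks with j, which is
// the case where a library call per column may carry noticeable overhead.
inline void axpy_lower_triangle(std::size_t n, double alpha,
                                const double* A, double* B, std::size_t ld)
{
    assert(ld >= n);
    for (std::size_t j = 0; j < n; ++j) {
        const double* a = A + j * ld;  // start of column j of A
        double* b = B + j * ld;        // start of column j of B
        // Request SIMD code generation only when OpenMP 4.0 is available;
        // otherwise the loop is left to the compiler's autovectorizer.
#if defined(_OPENMP) && _OPENMP >= 201307
        #pragma omp simd
#endif
        for (std::size_t i = j; i < n; ++i)
            b[i] += alpha * a[i];
    }
}
```

Entries strictly above the diagonal are never read or written, so B keeps whatever values it held there.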

The most productive thing to do would probably be to set up some
microbenchmarks to test the performance of various approaches to some of
these kernels. Perhaps a microbench/ folder should be created, and the
performance of some of the level 1 BLAS routines versus
SIMD-instrumented code could be compared. This would also be a good
place to stress test MPI collectives over subcommunicators. Do you want
to lead this?
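
A microbenchmark of this kind could start from something as small as the sketch below. Everything in it is hypothetical: axpy_plain stands in for a level 1 BLAS call, axpy_simd for the SIMD-instrumented variant, and best_seconds is an assumed timing helper. A real comparison would call the actual BLAS routine and control for cache effects and problem size.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Reference loop, y := alpha*x + y, left entirely to the autovectorizer.
inline void axpy_plain(double alpha, const std::vector<double>& x,
                       std::vector<double>& y)
{
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] += alpha * x[i];
}

// Same update with an explicit OpenMP 4.0 SIMD construct when available.
inline void axpy_simd(double alpha, const std::vector<double>& x,
                      std::vector<double>& y)
{
    assert(x.size() == y.size());
    const double* xp = x.data();
    double* yp = y.data();
    const std::size_t n = x.size();
#if defined(_OPENMP) && _OPENMP >= 201307
    #pragma omp simd
#endif
    for (std::size_t i = 0; i < n; ++i)
        yp[i] += alpha * xp[i];
}

// Run the kernel `repeats` times and return the fastest wall-clock time
// in seconds; taking the minimum reduces the influence of timing noise.
template <typename Kernel>
double best_seconds(Kernel kernel, int repeats)
{
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        kernel();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        if (elapsed.count() < best)
            best = elapsed.count();
    }
    return best;
}
```

A driver would wrap each variant in a lambda, for example best_seconds([&]{ axpy_simd(alpha, x, y); }, 10), and report the two times side by side for a range of vector lengths.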

Jack

Jeff Hammond

Jul 25, 2013, 2:11:01 PM
to elemen...@googlegroups.com
I already have an MPI collective stress test
(https://code.google.com/p/mpi-qoit/) that I use for Blue Gene. It is
great for finding O(N) metadata :-)

I think the BLAS1 tests would be cool. I'd like to write them so that
they are suitable for evaluating compiler autovectorization as well
since Hal Finkel and I want to be able to do that for LLVM, etc.

I'll not have time to write vectorization tests until September or
October but I'll take the lead if no one else steps up and does it
first.

And I would make the SIMD tests a separate Github (e.g.) project
rather than part of Elemental. Agreed?

Jeff