BLIS 0.6.0 now available

Field Van Zee

Jun 3, 2019, 7:40:48 PM
to blis-devel
Friends of BLIS,

BLIS 0.6.0 is now available.

The marquee feature introduced in BLIS 0.6.0 is support for accelerated dgemm on small/skinny matrix problems for a wide range of modern x86_64 microarchitectures. (This feature was made possible thanks to funding from AMD.) We've also provided a new document, docs/PerformanceSmall.md, to showcase the improved performance on select hardware [1].

A complete list of improvements present in 0.6.0 follows.

Framework:
- Implemented small/skinny/unpacked (sup) framework for accelerated level-3 performance when at least one matrix dimension is small (or very small). For now, only dgemm is optimized, and this new implementation currently only targets Intel Haswell through Coffee Lake, and AMD Zen-based Ryzen/Epyc. (The existing kernels should extend without significant modification to Zen2-based Ryzen/Epyc processors once those are available.) Also, multithreaded parallelism is not yet implemented for this code path, though application-level threading should be fine. (AMD)
- Changed function pointer usages of void* to new, typedef'ed type void_fp.
- Allow compile-time disabling of BLAS prototypes in BLIS, in case the application already has access to prototypes.
- In bli_system.h, define _POSIX_C_SOURCE to 200809L if the macro is not already defined. This ensures that things such as pthreads are properly defined when an application has #include "blis.h" but omits the definition of _POSIX_C_SOURCE from the command-line compiler options. (Christos Psarras)
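
For anyone curious what the _POSIX_C_SOURCE item above amounts to, here is a minimal sketch of the guard pattern. It is not copied from bli_system.h. The helper names posix_level() and lock_roundtrip() are hypothetical and exist only for illustration, and the mutex round trip is just one simple way to show that pthreads declarations are usable after the guard.

```c
/* Sketch only: the guard pattern described in the release note, not the
 * actual contents of bli_system.h. The macro is defined only if the
 * application (or its compiler flags) has not already defined it, and it
 * must appear before the first system header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <pthread.h>

/* Hypothetical helper: report the value of _POSIX_C_SOURCE in effect. */
static long posix_level(void)
{
    return (long)_POSIX_C_SOURCE;
}

/* Hypothetical helper: lock and unlock a statically initialized mutex.
 * Returns 0 if both calls succeed, -1 otherwise. */
static int lock_roundtrip(void)
{
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    if (pthread_mutex_lock(&m) != 0)
        return -1;
    if (pthread_mutex_unlock(&m) != 0)
        return -1;
    return 0;
}
```

Because of the #ifndef, an application that already passes its own -D_POSIX_C_SOURCE=... on the command line keeps its value, and the default only applies when nothing was specified.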

Kernels:
- None.

Build system:
- Updated the way configure and the top-level Makefile handle installation prefixes (prefix, exec_prefix, libdir, includedir, sharedir) to better conform with GNU conventions.
- Improved clang version detection. (Isuru Fernando)
- Use pthreads on MinGW and Cygwin. (Isuru Fernando)

Testing:
- Added Eigen support to test drivers in test/3.
- Fixed inadvertently hidden xerbla_() in blastest drivers when building only shared libraries. (Isuru Fernando, M. Zhou)

Documentation:
- Added docs/PerformanceSmall.md to showcase new BLIS small/skinny dgemm performance on Kaby Lake and Epyc.
- Added Eigen results (3.3.90) to performance graphs showcased in docs/Performance.md.
- Added BLIS thread factorization info to docs/Performance.md.

Special thanks to Isuru Fernando, Sameer Agarwal, M. Zhou, Christos Psarras, and the CPU libraries group at AMD for their numerous helpful bug reports, suggestions, contributions, and help in tracking down issues. (And please forgive me if I inadvertently left out your name.) Please continue to give us feedback--it is always appreciated!

As always, a full list of changes is available in the CHANGELOG:

https://github.com/flame/blis/blob/master/CHANGELOG

Please let us know if you have any questions.

Field

[1] https://github.com/flame/blis/blob/master/docs/PerformanceSmall.md
