BLIS 0.7.0 now available

Field Van Zee

Apr 7, 2020, 3:49:35 PM
to blis-devel
Friends of BLIS,

BLIS 0.7.0 is now available.

The marquee feature introduced in BLIS 0.7.0 is support for multithreading within the sup framework. (This feature was made possible thanks to our partnership with AMD.) We've also added new graphs to docs/PerformanceSmall.md to showcase multithreaded performance of the sup implementation on select hardware [1].

A complete list of improvements present in 0.7.0 follows.

Framework:
- Implemented support for multithreading within the sup (skinny/small/unpacked) framework, which previously was single-threaded only. Note that this feature works harmoniously with the selective packing introduced into the sup framework in 0.6.1. (AMD)
- Renamed bli_thread_obarrier() and bli_thread_obroadcast() functions to drop the 'o', which was left over from when thrcomm_t objects tracked both "inner" and "outer" communicators.
- Fixed an obscure int-to-packbuf_t type conversion error that only affects certain C++ compilers (including g++) when compiling application code that includes the BLIS header file blis.h. (Ajay Panyala)
- Added a missing early return statement in bli_thread_partition_2x2(), which provides a slight optimization. (Kiran Varaganti)

Kernels:
- Fixed the semantics of the bli_amaxv() kernels ('s' and 'd') within the 'zen' kernel set. Previously, when multiple elements shared the largest absolute value, the kernels (incorrectly) returned the index of the last such element; now they (correctly) return the index of the first. The kernels also now return the index of the first NaN, if one is encountered. (Mat Cross, Devin Matthews)
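
To make the corrected amaxv semantics concrete, here is a minimal scalar sketch in C. It is not the optimized 'zen' kernel and does not use the BLIS API; the function name ref_damaxv and its signature are hypothetical, and it assumes a positive stride. It only illustrates the two rules described above: ties go to the first element, and the first NaN encountered is returned immediately.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine (NOT a BLIS function): returns the
 * index of the first element of x with the largest absolute value,
 * or the index of the first NaN if any is present.
 * Assumes incx > 0; returns 0 when n == 0. */
static size_t ref_damaxv(size_t n, const double *x, size_t incx)
{
    size_t best_i = 0;
    double best = -1.0;              /* below any possible |x[i]| */

    for (size_t i = 0; i < n; ++i) {
        double v = x[i * incx];

        if (isnan(v))
            return i;                /* first NaN wins outright */

        double a = fabs(v);
        if (a > best) {              /* strict '>' keeps the FIRST maximum */
            best = a;
            best_i = i;
        }
    }
    return best_i;
}
```

With the input {1.0, -3.0, 3.0, 2.0}, this sketch returns index 1 (the first element of magnitude 3), whereas the old behavior described above would have corresponded to index 2.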

Build system:
- Warn the user at configure-time when hardware auto-detection returns the 'generic' subconfiguration since this is probably not what they were expecting. (Devin Matthews)
- Removed unnecessary sorting (and duplicate removal) on LDFLAGS in common.mk. (Isuru Fernando)
- Specify the full path to the location of the dynamic library on OSX so that other dynamic libraries that depend on BLIS know where to find the library. (Satish Balay, Jed Brown)

Testing:
- Updated and reorganized test drivers in test/sup so that they work for either single-threaded or multithreaded purposes. (AMD)
- Updated/optimized octave scripts in test/sup for use with octave 5.2.0.
- Minor updates/tweaks to test/1m4m.

Documentation:
- Updated existing single-threaded sup performance graphs with new data and added multithreaded sup graphs to docs/PerformanceSmall.md.
- Added mention of Gentoo support under the external packages section of the README.md.
- Tweaked docs/Multithreading.md to clarify that setting any BLIS_*_NT variable to 1 counts as manual specification when determining whether to auto-factorize via BLIS_NUM_THREADS. (AMD)
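
The rule in the last item can be sketched as a small C check. This is an illustration only, not the BLIS implementation: the helper names are hypothetical, and the list of per-loop variables (BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT, BLIS_IR_NT) is assumed from docs/Multithreading.md. The point it shows is that the mere presence of a BLIS_*_NT variable matters, not its value, so a value of 1 still counts as manual specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed set of per-loop variables; see docs/Multithreading.md. */
static const char *const nt_vars[] = {
    "BLIS_JC_NT", "BLIS_IC_NT", "BLIS_JR_NT", "BLIS_IR_NT"
};
enum { NUM_NT_VARS = sizeof nt_vars / sizeof nt_vars[0] };

/* Hypothetical helper: true if any value is set (non-NULL),
 * regardless of what it is set to, including "1". */
static bool any_way_specified(const char *const vals[], size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (vals[i] != NULL)
            return true;
    return false;
}

/* Hypothetical wrapper: reads the environment and reports whether the
 * user specified ways of parallelism manually. Under the rule above,
 * a true result means BLIS_NUM_THREADS would not be auto-factorized. */
static bool ways_set_manually(void)
{
    const char *vals[NUM_NT_VARS];
    for (size_t i = 0; i < NUM_NT_VARS; ++i)
        vals[i] = getenv(nt_vars[i]);
    return any_way_specified(vals, NUM_NT_VARS);
}
```

For example, under this sketch an environment containing only BLIS_IC_NT=1 is treated the same as one containing BLIS_IC_NT=4: both are manual specification.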

Special thanks to Isuru Fernando, Ajay Panyala, M. Zhou, Mat Cross, Devin Matthews, Satish Balay, Jed Brown, and the CPU libraries group at AMD for their numerous helpful bug reports, suggestions, contributions, and help in tracking down issues. (And please forgive me if I inadvertently left out your name.) Please continue to give us feedback--it is always appreciated!

As always, a full list of changes is available in the CHANGELOG:

  https://github.com/flame/blis/blob/master/CHANGELOG

Please let us know if you have any questions.

Field

[1] https://github.com/flame/blis/blob/master/docs/PerformanceSmall.md