Hi all,
Version 1.8 is officially out!
This release contains a *lot* of performance optimization work that has been ongoing since the last release. The memory pooling has been completely re-engineered, task teams are now much more reliable, native TLS is used when practical, we have a hazard-pointer implementation, and much, much more.
Tilera support has been improved (as their development environment has improved), as has PPC support (we now work on POWER7), and we now have ARM support as well. One of the big accomplishments is the addition of spr_init() and spr_unify(), which are part of an effort to create a cross-node environment (which relies on Portals4). We now have a concurrent lock-free hash table (dictionary) with multiple implementations. We've improved our Chapel support significantly.
And, of course, we've found and fixed a boatload of bugs, from the subtle to the obvious. I'm including the relevant part of the NEWS file below for a complete, though more terse, summary of the major changes.
Anyway, enjoy!
http://code.google.com/p/qthreads/downloads/detail?name=qthread-1.8.tar.bz2
--- 1.8 ---
New Features:
- Concurrent lock-free hash table (dictionary); three alternate designs
- New "simple" tasks: reduced context swap overhead, no stack size limit, but cannot block or yield
- qthread_spawn() function exposed to simplify complex task creation
- Arbitrary blocking functions now public (BE CAREFUL)
- Optional (experimental) lock-free hash table implementation of FEBs (very fast)
- ARM support
- Tasks can use a sinc as a return value location
- spr_init() can replace MPI_Init()
- spr_unify() converts SPMD to single-flow-of-control
- C++ version of qt_loopaccum_balance and SPR interface
- QT_HWPAR environment variable simplifies concurrency management
Improvements:
- LOTS of performance improvements
- Massive improvement to memory pooling
- Handle buggy versions of hwloc without crashing
- Terse multinode documentation
- Public memory pool now front-end for internal memory pool
- Scribbling in memory pool (debugging assist)
- Reduced memory footprint
- Additional (micro-)benchmarks, including Cilk, TBB, and OpenMP implementations
- Better checking of compile requirements
- Faster precondition handling
- Better Tilera support
- Default stack size can be defined at configure time
- Task teams are now reliable
- Better PPC support (detect all 3 ABIs)
- Use compiler ("native") TLS when practical
- Better behavior when topology information is unavailable
- Use PMI runtime API for multinode
- Hazardptr implementation works under high load
- Matt Baker's Chapel syncvar idea
- Add configure-time "oversubscription" mode; avoids spinlocks, uses sched_yield() when necessary
- More man pages
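The "oversubscription" item above trades busy-waiting for sched_yield(). The NEWS file does not say how the library does this internally, so the following is only a generic C11 sketch of the idea: a lock that gives up the CPU while it waits instead of spinning. The names sketch_lock_t, sketch_lock_acquire, sketch_lock_try, and sketch_lock_release are hypothetical and are not part of the qthreads API.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock type for illustration; not a qthreads data structure. */
typedef struct {
    atomic_flag held;
} sketch_lock_t;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns 1 if the lock was taken by this call, 0 if it was already held. */
static int sketch_lock_try(sketch_lock_t *l)
{
    assert(l != NULL);
    /* test_and_set returns the previous state: true means already held. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Waits for the lock, yielding the CPU between attempts rather than
 * spinning, so an oversubscribed core can run the current lock holder. */
static void sketch_lock_acquire(sketch_lock_t *l)
{
    while (!sketch_lock_try(l)) {
        sched_yield();
    }
}

static void sketch_lock_release(sketch_lock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

When there are more runnable threads than cores, a waiter that spins can burn the very time slice the lock holder needs to finish; yielding avoids that, at the cost of a system call per failed attempt.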
Bugfixes:
- Fix distance support with hwloc
- Fix alignment error in C++ futurelib (old bug)
- Fix rare hang in Sherwood scheduler
- Fix pwrite() define (Issue #11)
- Improve floating-point tests
- Fix fincr/dincr synchronization bug (some compilers created two reads)
- Fix BOTS benchmark string handling
- Fix reinitialization race condition
- Many minor fixes
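The fincr/dincr entry above mentions compilers generating two reads of the shared value. The NEWS file does not show the code involved, so this is only a generic C11 sketch of a floating-point increment built on compare-and-swap, written so that the comparand and the computed sum both come from a single snapshot. sketch_fincr is a hypothetical name, not the library's function, and storing the float as raw 32-bit data is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Hypothetical helper, not the qthreads implementation: atomically adds
 * incr to the float whose raw bits live in *target; returns the old value. */
static float sketch_fincr(_Atomic uint32_t *target, float incr)
{
    uint32_t old_bits = atomic_load(target); /* the only explicit read */
    uint32_t new_bits;
    float old_val, new_val;

    do {
        /* Both the comparand (old_bits) and the sum are derived from the
         * same snapshot; a second read of *target here could see a
         * different value and silently lose an update. */
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + incr;
        memcpy(&new_bits, &new_val, sizeof new_bits);
        /* On failure the current contents are written back into old_bits,
         * so the loop retries with a fresh, single snapshot. */
    } while (!atomic_compare_exchange_weak(target, &old_bits, new_bits));

    return old_val;
}
```

If the sum were instead computed from a second load of the target, another thread could change the value between the two reads, and the compare-and-swap could succeed while storing a result based on stale data.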
--
Kyle B. Wheeler
Dept. 1423: Scalable System Software
Sandia National Laboratories
505-844-0394