Intel is currently offering a beta version of a C++ library for shared-memory programming. The library contains templates for common parallel programming patterns and containers. These pre-tested templates simplify the writing of correct scalable parallel programs.
Arch D. Robison wrote:
> Intel is currently offering a beta version of a C++ library for
> shared-memory programming. The library contains templates for common
> parallel programming patterns and containers. These pre-tested templates
> simplify the writing of correct scalable parallel programs.
>
> Arch Robison (lead developer of the library)
> Intel Corporation
Is there any way to find out what the API looks like without having to register and download the entire thing? Perhaps somebody who thinks they're going to need it anyway can summarize here.
-- Joe Seigh
When you get lemons, you make lemonade. When you get hardware, you make software.
Joe Seigh wrote:
> Is there any way to find out what the API looks like without
> having to register and download the entire thing? Perhaps
> somebody who thinks they're going to need it anyway can
> summarize here.
It includes some lock-free containers, capabilities, and some frameworky stuff for parallel applications. Looks useful for those who don't want to roll their own--how many people can really write solid lock-free algorithms?
I'm curious what the eventual asking price and competitors would be. Mutex + standard container + condition is pretty wasteful on multi-processor/multi-core hardware.
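For readers unfamiliar with the "mutex + standard container + condition" baseline mentioned above, here is a minimal sketch of what it usually looks like. It assumes a C++11 or later standard library; `BlockingQueue` is a made-up name for illustration and is not taken from any library discussed in this thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical baseline queue: one mutex guards a standard container,
// and a condition variable lets consumers sleep until data arrives.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        // Notify after unlocking so the woken thread can take the mutex at once.
        not_empty_.notify_one();
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());  // guaranteed by the wait predicate
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Non-blocking variant: returns false if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};
```

Every producer and consumer serializes on the single mutex, which is where the cost on multi-core hardware comes from.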
Arch D. Robison wrote:
> Intel is currently offering a beta version of a C++ library for
> shared-memory programming. The library contains templates for common
> parallel programming patterns and containers. These pre-tested templates
> simplify the writing of correct scalable parallel programs.
Has anyone already compared it to the recently accepted Boost library Shmem?
>>Is there any way to find out what the API looks like without
>>having to register and download the entire thing? Perhaps
>>somebody who thinks they're going to need it anyway can
>>summarize here.
>
> It includes some lock-free containers, capabilities, and some
> frameworky stuff for parallel applications. Looks useful for those who
> don't want to roll their own--how many people can really write solid
> lock-free algorithms?
>
> I'm curious what the eventual asking price and competitors would be.
> Mutex + standard container + condition is pretty wasteful on
> multi-processor/multi-core hardware.
It's probably moot how many people can write lock-free algorithms. Except maybe for lock-free LIFO stacks and Michael and Scott's lock-free queue, most lock-free algorithms have been or will be patented. So the use of lock-free will be limited to commercial libraries.
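As a point of reference for the lock-free LIFO stack mentioned above, here is a minimal sketch in C++11 atomics. It is an illustration under stated assumptions, not code from any library in this thread: `LockFreeStack` is a hypothetical name, and popped nodes are parked on a retired list and only freed in the destructor. That sidesteps the ABA and memory reclamation problems at the cost of unbounded memory growth, so it is not a production design.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical sketch of a CAS-based LIFO stack. Nodes are never freed or
// reused while the stack is alive, so a stale pointer is always safe to read.
template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;          // immutable once the node is published
        Node* retired_next;  // link used only by the retired list
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    // Assumes no other thread is using the stack during destruction.
    ~LockFreeStack() {
        for (Node* n = head_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
        for (Node* n = retired_.load(); n != nullptr;) {
            Node* next = n->retired_next;
            delete n;
            n = next;
        }
    }

    void push(T value) {
        Node* n = new Node{std::move(value),
                           head_.load(std::memory_order_relaxed), nullptr};
        // On failure, compare_exchange_weak reloads the current head into n->next.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    bool pop(T& out) {
        Node* old = head_.load(std::memory_order_acquire);
        while (old != nullptr &&
               !head_.compare_exchange_weak(old, old->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
        }
        if (old == nullptr) return false;
        out = std::move(old->value);
        // Park the node instead of deleting it: other threads may still
        // hold a pointer to it and read old->next.
        old->retired_next = retired_.load(std::memory_order_relaxed);
        while (!retired_.compare_exchange_weak(old->retired_next, old,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        }
        return true;
    }
};
```

Replacing the retired list with real reclamation (hazard pointers, RCU, and so on) is the hard part that the rest of this thread is about.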
There's another commercial library, by Parallel Scalable Solutions: http://www.pss-ab.com/ You don't have to register to see the documentation. I haven't used it myself.
-- Joe Seigh
When you get lemons, you make lemonade. When you get hardware, you make software.
Obligatory disclaimer: The following are my own opinions, not Intel's. I am the lead developer of the library.
There are programming languages with direct support for parallel programming, but they can be difficult to integrate into existing environments. So we looked at various parallel languages, and asked ourselves "how much of this can we turn into a C++ library?" Though a library cannot provide the beautiful syntax of a new language, C++ is nonetheless a powerful language that let us adapt much useful functionality from the other languages. For example, the library's task scheduler is adapted from Cilk (http://supertech.csail.mit.edu/cilk/), and the parallel loops operate on a recursive range concept inspired by STAPL (http://parasol.tamu.edu/compilers/research/STAPL/). C++ suits this purpose because it combines the efficiency of C with support for generic programming. Two examples: you can use the parallel for-all template with your own type of iteration space, and you can use the parallel reduction template on your own types, not only built-in types.
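To make the "recursive range" and "reduction on your own types" ideas concrete, here is a rough sketch of how such a template could be built. This is not the library's API. `IndexRange`, `reduce_sketch`, `MinMax`, and `min_max_of` are hypothetical names, and `std::async` stands in for a real task scheduler purely to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical recursive range: a half-open interval that can split itself.
struct IndexRange {
    std::size_t begin, end, grain;
    bool is_divisible() const { return end - begin > (grain ? grain : 1); }
    // Keeps the lower half in *this and returns the upper half.
    IndexRange split() {
        assert(is_divisible());
        std::size_t mid = begin + (end - begin) / 2;
        IndexRange upper{mid, end, grain};
        end = mid;
        return upper;
    }
};

// Generic reduction over any range type providing is_divisible() and split().
// leaf(range, identity) reduces one indivisible chunk; combine merges results.
template <typename Range, typename T, typename Leaf, typename Combine>
T reduce_sketch(Range range, const T& identity, Leaf leaf, Combine combine,
                int depth = 3) {
    if (!range.is_divisible()) return leaf(range, identity);
    Range upper = range.split();
    if (depth <= 0) {  // enough parallelism already; continue serially
        T lo = reduce_sketch(range, identity, leaf, combine, 0);
        T hi = reduce_sketch(upper, identity, leaf, combine, 0);
        return combine(lo, hi);
    }
    // Run the upper half on another thread while this one does the lower half.
    auto hi = std::async(std::launch::async, [=] {
        return reduce_sketch(upper, identity, leaf, combine, depth - 1);
    });
    T lo = reduce_sketch(range, identity, leaf, combine, depth - 1);
    return combine(lo, hi.get());
}

// A user-defined result type, to show the reduction is not tied to built-ins.
struct MinMax { int lo, hi; };

MinMax min_max_of(const std::vector<int>& data, std::size_t grain = 2) {
    MinMax identity{INT_MAX, INT_MIN};
    return reduce_sketch(
        IndexRange{0, data.size(), grain}, identity,
        [&data](const IndexRange& r, MinMax acc) {
            for (std::size_t i = r.begin; i < r.end; ++i) {
                acc.lo = std::min(acc.lo, data[i]);
                acc.hi = std::max(acc.hi, data[i]);
            }
            return acc;
        },
        [](const MinMax& a, const MinMax& b) {
            return MinMax{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
        });
}
```

The reduction template only talks to the range through `is_divisible()` and `split()`, so a 2D blocked range or a tree traversal could be dropped in without touching `reduce_sketch`.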
The library is not a general purpose threading library. It targets threading for speed-up on systems with multiple CPUs or multiple cores. Threading for speed-up, as anyone who has done it will attest, requires not only avoiding race conditions, but also using resources efficiently (e.g. cache, memory bandwidth, memory space, load balancing). Simply unleashing a thread for every possible piece of work that can be done in parallel will bog a machine down. The library uses a work-stealing approach from Cilk, which tends to make efficient use of memory space and cache, and avoids oversubscribing the hardware with an excessive number of threads. Also, the work-stealing approach deals well with load balancing across processors.
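The core of work stealing can be sketched in a few lines. Each worker owns a deque, takes its own newest task (good for cache locality), and, when idle, steals the oldest task from another worker (typically the largest remaining chunk of work). The sketch below is an assumption-laden illustration, not the library's or Cilk's scheduler: `WorkDeque` and `run_one` are hypothetical names, a plain mutex replaces the specialized deque a real scheduler would use, and victims are scanned in order rather than chosen at random.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-worker task deque.
class WorkDeque {
public:
    using Task = std::function<void()>;

    void push(Task task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Owner side: take the most recently pushed task (LIFO).
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }

    // Thief side: take the oldest task (FIFO).
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};

// One scheduling step for worker `self`: run a local task if there is one,
// otherwise try to steal from the other workers. Returns false if no task
// was found anywhere.
bool run_one(std::vector<WorkDeque>& deques, std::size_t self) {
    assert(self < deques.size());
    WorkDeque::Task task;
    if (!deques[self].pop(task)) {
        bool stolen = false;
        for (std::size_t k = 1; k < deques.size() && !stolen; ++k)
            stolen = deques[(self + k) % deques.size()].steal(task);
        if (!stolen) return false;
    }
    task();
    return true;
}
```

With a fixed pool of one worker per hardware thread, each looping on `run_one`, the number of threads stays bounded no matter how many tasks are created.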
The containers in the library are independent of the scheduler. They use a combination of lock-free techniques and fine-grain locking. We tried both approaches for some containers, and found that in general, fine-grain locking performs better than lockless, because the former usually uses fewer atomic operations. Atomic operations are fairly expensive on modern processors, because of their interaction with deep pipelines and caches. Furthermore, there are subtle memory reclamation issues in lock-free algorithms that are a problem for languages without garbage collection. See http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf for a discussion. In some contexts, the advantages of lockless algorithms outweigh their costs. Depending on feedback, we will consider whether to add purely lockless algorithms in future versions of the library. We deliberately started small, and want to grow the library based on experience.
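As a rough picture of what fine-grain locking means for a container, here is a sketch of a lock-striped hash map: keys are partitioned across several independently locked sub-maps, so threads touching different stripes do not contend. This is an illustrative assumption about the general technique, not the design of the library's containers, and `StripedMap` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical lock-striped map: one mutex per stripe instead of one
// mutex for the whole container.
template <typename K, typename V>
class StripedMap {
public:
    explicit StripedMap(std::size_t stripes = 16) : stripes_(stripes) {
        assert(stripes > 0);
    }

    void put(const K& key, V value) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        s.items[key] = std::move(value);
    }

    // Copies the value into `out` and returns true if the key is present.
    bool get(const K& key, V& out) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        auto it = s.items.find(key);
        if (it == s.items.end()) return false;
        out = it->second;
        return true;
    }

private:
    struct Stripe {
        std::mutex mutex;
        std::unordered_map<K, V> items;
    };

    Stripe& stripe_for(const K& key) {
        return stripes_[std::hash<K>{}(key) % stripes_.size()];
    }

    std::vector<Stripe> stripes_;
};
```

Each operation pays for one lock acquire and release however much work happens inside the critical section, whereas a lock-free version of the same operation may need several compare-and-swap attempts plus extra work for safe memory reclamation.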
Arch D. Robison wrote:
> The containers in the library are independent of the scheduler. They use a
> combination of lock-free techniques and fine-grain locking. We tried both
> approaches for some containers, and found that in general, fine-grain
> locking performs better than lockless, because the former usually uses fewer
> atomic operations. Atomic operations are fairly expensive on modern
> processors, because of their interaction with deep pipelines and caches.
> Furthermore, there are subtle memory reclamation issues in lock-free
> algorithms that are a problem for languages without garbage collection. See
> http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf for a
> discussion. In some contexts, the advantages of lockless algorithms
> outweigh their costs. Depending on feedback, we will consider whether to
> add purely lockless algorithms in future versions of the library. We
> deliberately started small, and want to grow the library based on
> experience.
You can use things like a version of RCU combined with SMR hazard pointers to eliminate the requirement for the store/load (MFENCE?) memory barrier in the hazard pointer. E.g. on my system (no MFENCE instruction) the hazard pointer load code w/o memory barriers to stall the pipeline