For more information, download the beta for free via
http://www.intel.com/software/products/tbb/beta
Arch Robison (lead developer of the library)
Intel Corporation
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Is there any way to find out what the API looks like without
having to register and download the entire thing? Perhaps
somebody who thinks they're going to need it anyway can
summarize here.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
It includes some lock-free containers, capabilities, and some
frameworky stuff for parallel applications. Looks useful for those who
don't want to roll their own--how many people can really write solid
lock-free algorithms?
I'm curious what the eventual asking price and competitors would be.
Mutex + standard container + condition is pretty wasteful on
multi-processor/multi-core hardware.
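For readers who have not seen the baseline being criticized here, this is roughly what "mutex + standard container + condition" looks like. It is only a minimal sketch written against the threading facilities of later standard C++ (C++11 and up), not anything taken from the Intel library; the name BlockingQueue is made up for illustration. Every push and pop serializes on the one mutex, which is where the waste on multi-core hardware comes from.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical baseline: one mutex guards the whole container,
// and a condition variable wakes consumers when data arrives.
template <typename T>
class BlockingQueue {
    std::mutex m_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();   // notify after releasing the lock
    }
    T pop() {                      // blocks until an item is available
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
};
```

A producer thread would call push() and a consumer thread pop(); all of them contend for the same lock no matter how many cores are available.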
Has anyone already compared it to the recently accepted boost library
shmem?
http://lists.boost.org/boost-announce/2006/02/0082.php
http://lists.boost.org/boost-announce/2006/02/0083.php
Malte
It's probably moot how many people can write lock-free algorithms. Except maybe
for lock-free LIFO stacks and Scott and Michael's lock-free queue, most
lock-free algorithms have been or will be patented. So the use of lock-free
will be limited to commercial libraries.
There is another commercial library, by Parallel Scalable Solutions:
http://www.pss-ab.com/
You don't have to register to see the documentation. I haven't
used it myself.
--
Joe Seigh
There are programming languages with direct support for parallel
programming, but they can be difficult to integrate into existing
environments. So we looked at various parallel languages, and asked
ourselves "how much of this can we turn into a C++ library?" Though a
library cannot provide the beautiful syntax of a new language, C++ is
nonetheless a powerful language that let us adapt much useful functionality
from the other languages. For example, the library's task scheduler is
adapted from Cilk (http://supertech.csail.mit.edu/cilk/), and the parallel
loops operate on a recursive range concept inspired by STAPL
(http://parasol.tamu.edu/compilers/research/STAPL/). C++ is a powerful
language for this purpose because it combines the efficiency of C with
support for generic programming. Two examples: You use the parallel for-all
template with your own type of iteration space. You can use the parallel
reduction template on your own types, not only built-in types.
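To make the "recursive range" and "reduction on your own types" ideas concrete, here is a minimal sketch in later standard C++ (C++11 and up). It is not the library's API: IndexRange, reduce_sketch, is_divisible, and split are names assumed for illustration only. It also launches a thread per split via std::async, which is a simplification that a real task scheduler would avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical half-open index range that knows how to split itself.
struct IndexRange {
    std::size_t begin, end, grain;   // grain: largest size handled serially
    bool is_divisible() const { return end - begin > grain; }
    IndexRange split() {             // keeps the left half, returns the right half
        std::size_t mid = begin + (end - begin) / 2;
        IndexRange right{mid, end, grain};
        end = mid;
        return right;
    }
};

// Recursive reduction over any range type offering is_divisible() and split().
// T may be a user-defined type; join must be associative.
template <typename Range, typename T, typename Body, typename Join>
T reduce_sketch(Range r, T identity, Body body, Join join) {
    if (!r.is_divisible())
        return body(r, identity);    // small enough: reduce serially
    Range right = r.split();
    auto right_result = std::async(std::launch::async, [=] {
        return reduce_sketch(right, identity, body, join);
    });
    T left_result = reduce_sketch(r, identity, body, join);
    return join(left_result, right_result.get());   // left before right
}
```

Because the template only asks the range for is_divisible() and split(), a user-defined iteration space (a 2D tile, a tree node) could be dropped in without changing reduce_sketch, and T can be any copyable type, e.g. std::string with concatenation as the join.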
The library is not a general purpose threading library. It targets
threading for speed-up on systems with multiple CPUs or multiple cores.
Threading for speed-up, as anyone who has done it will attest, not only
requires avoiding race conditions, but also using resources efficiently
(e.g. cache, memory bandwidth, memory space, load balancing). Simply
unleashing a thread for every possible piece of work that can be done in
parallel will bog a machine down. The library uses a work-stealing approach
from Cilk, which tends to make efficient use of memory space and cache and
avoids oversubscribing the hardware with an excessive number of threads.
Also, the work-stealing approach deals well with load balancing across
processors.
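As a rough illustration of the work-stealing idea, here is a deliberately simplified sketch in later standard C++ (C++11 and up). It is not the library's scheduler and not Cilk's algorithm: the deques are protected by plain mutexes, all tasks must be queued up front, and tasks may not spawn further tasks. WorkerQueue and run_with_stealing are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One mutex-protected deque per worker (hypothetical type).
struct WorkerQueue {
    std::mutex m;
    std::deque<std::function<void()>> tasks;

    bool pop_back(std::function<void()>& out) {    // owner takes newest work
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        out = std::move(tasks.back());
        tasks.pop_back();
        return true;
    }
    bool steal_front(std::function<void()>& out) { // thief takes oldest work
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        out = std::move(tasks.front());
        tasks.pop_front();
        return true;
    }
};

// Runs pre-queued tasks on exactly queues.size() threads.
// Assumption: no task is added once this function has been called.
void run_with_stealing(std::vector<WorkerQueue>& queues) {
    const std::size_t n = queues.size();
    std::vector<std::thread> workers;
    for (std::size_t self = 0; self < n; ++self) {
        workers.emplace_back([&queues, n, self] {
            std::function<void()> task;
            for (;;) {
                bool found = queues[self].pop_back(task);
                for (std::size_t k = 1; !found && k < n; ++k)
                    found = queues[(self + k) % n].steal_front(task);
                if (!found) return;   // every queue was empty: nothing left
                task();
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

The thread count is fixed by the number of queues rather than by the amount of work, so the hardware is not oversubscribed, and an idle worker balances the load by taking work from a busy one. Taking the owner's work from the back and stolen work from the front is the usual arrangement: the owner stays on recently created (likely cache-warm) work while thieves take the oldest items.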
The containers in the library are independent of the scheduler. They use a
combination of lock-free techniques and fine-grain locking. We tried both
approaches for some containers, and found that in general, fine-grain
locking performs better than lockless, because the former usually uses fewer
atomic operations. Atomic operations are fairly expensive on modern
processors, because of their interaction with deep pipelines and caches.
Furthermore, lock-free algorithms have subtle memory reclamation problems
that are an issue for languages without garbage collection. See
http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf for a
discussion. In some contexts, the advantages of lockless algorithms
outweigh their costs. Depending on feedback, we will consider whether to
add purely lockless algorithms in future versions of the library. We
deliberately started small, and want to grow the library based on
experience.
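For readers unfamiliar with the term, fine-grain locking can be sketched as a map split into independently locked stripes. This is only an illustration in later standard C++ (C++11 and up), not the library's container; StripedMap, fill_from_threads, and the stripe count of 16 are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical map with one mutex per stripe instead of one global mutex.
template <typename Key, typename Value, std::size_t Stripes = 16>
class StripedMap {
    struct Stripe {
        std::mutex m;
        std::unordered_map<Key, Value> items;
    };
    std::array<Stripe, Stripes> stripes_;

    Stripe& stripe_for(const Key& k) {
        return stripes_[std::hash<Key>{}(k) % Stripes];
    }
public:
    void insert_or_assign(const Key& k, const Value& v) {
        Stripe& s = stripe_for(k);
        std::lock_guard<std::mutex> lock(s.m);   // locks one stripe only
        s.items[k] = v;
    }
    bool find(const Key& k, Value& out) {
        Stripe& s = stripe_for(k);
        std::lock_guard<std::mutex> lock(s.m);
        auto it = s.items.find(k);
        if (it == s.items.end()) return false;
        out = it->second;
        return true;
    }
};

// Usage sketch: each thread inserts its own disjoint range of keys.
inline void fill_from_threads(StripedMap<int, int>& map, int threads, int per_thread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&map, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                map.insert_or_assign(t * per_thread + i, t);
        });
    for (auto& th : pool) th.join();
}
```

Threads touching keys in different stripes never contend, and each operation takes a single uncontended lock rather than a retry loop of several atomic operations. Memory reclamation is also trivial, since nothing is read outside a lock.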
Arch D. Robison
Intel Corporation
You can use things like a version of RCU combined with SMR hazard pointers
to eliminate the requirement for the store/load (MFENCE?) memory barrier
in the hazard pointer. E.g. on my system (no MFENCE instruction)
the hazard pointer load code w/o memory barriers to stall the
pipeline
static __inline__ void * smrload32(void **hptr, void **src) {
    void * ret;
    __asm__ __volatile__ (
        "mov 0(%2), %%ecx ;\n"    // load source pointer
        "1:\t"
        "mov %%ecx, %0 ;\n"       // remember the candidate value
        "mov %%ecx, 0(%1) ;\n"    // store into hazard pointer
        "mov 0(%2), %%ecx ;\n"    // reload source pointer
        "cmp %%ecx, %0 ;\n"       // did the source change meanwhile?
        "jne 1b ;\n"              // yes: retry with the new value
        "mov %%ecx, 4(%1) ;\n"    // store into hazard pointer[1]
        : "=&r" (ret)
        : "r" (hptr), "r" (src)
        : "cc", "memory", "ecx"
    );
    return ret;
}
runs about 10 times faster than the hazard pointer load w/ memory barriers
(8 psecs vs. 81 psec on a 866 Mhz P3)
static __inline__ void * smrload_sync32(void **hptr, void **src) {
    void * ret;
    __asm__ __volatile__ (
        "mov 0(%2), %%ecx ;\n"            // load source pointer
        "1:\t"
        "mov %%ecx, %0 ;\n"               // remember the candidate value
        "lock; addl $0, 0(%%esp);\n"      // release membar
        "mov %%ecx, 0(%1) ;\n"            // store into hazard pointer
        "lock; addl $0, 0(%%esp);\n"      // store/load membar
        "mov 0(%2), %%ecx ;\n"            // reload source pointer
        "cmp %%ecx, %0 ;\n"               // did the source change meanwhile?
        "jne 1b ;\n"                      // yes: retry with the new value
        "mov %%ecx, 4(%1) ;\n"            // store into hazard pointer[1]
        : "=&r" (ret)
        : "r" (hptr), "r" (src)
        : "cc", "memory", "ecx"
    );
    return ret;
}
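For anyone not on 32-bit x86 with GCC, the barrier version of the load loop can be approximated with the atomics of later standard C++ (C++11 and up). This is a sketch only: hazard_load is a made-up name, it uses a single hazard slot and omits the second store into hazard pointer[1], and it covers only the reader side, not the scan that the reclaiming thread has to perform.

```cpp
#include <atomic>
#include <cassert>

// Publishes the pointer read from src in the hazard slot and returns it
// once a re-read of src confirms it was still current after publication.
template <typename T>
T* hazard_load(std::atomic<T*>& hazard, const std::atomic<T*>& src) {
    T* p = src.load(std::memory_order_relaxed);
    for (;;) {
        hazard.store(p, std::memory_order_relaxed);
        // Store/load barrier: the hazard store must be visible before the
        // re-read below. This is the expensive step in the barrier version.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        T* again = src.load(std::memory_order_acquire);
        if (again == p) return p;   // still current: p is now protected
        p = again;                  // source changed: retry with new value
    }
}
```

The barrier-free variant has no direct portable equivalent here, because it relies on the reclaiming side (the RCU part of the scheme) to supply the ordering that the fence provides in this sketch.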
--
Joe Seigh
> runs about 10 times faster than the hazard pointer load w/ memory barriers
> (8 psecs vs. 81 psec on a 866 Mhz P3)
That should be nsec, not psec of course.
--
Joe Seigh