A superlinear decrease in execution time is possible in any parallel
environment when running an iterative algorithm for which the communication
required between processes is very small.
But it is not a true "speedup" in pure, Amdahlian terms. The reason is that
in order to achieve a superlinear decrease in execution time, the algorithm
must converge faster in a parallel implementation than it does in a serial
implementation. So, you're not really running the same code anymore; the
"faster" test runs only a subset of the code of the "slower" test. I know
this doesn't make any difference to the end user, who does see an "apparent"
speedup, but to those of us who rely on these benchmarks to architect the
underlying hardware and software, it's an important distinction.
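To make the "pure Amdahlian" point concrete: under Amdahl's law, speedup on N processors with parallel fraction p is S(N) = 1 / ((1 - p) + p/N), which can never exceed N. The following toy calculation (my own illustration, not code from this thread) shows that bound:

```python
# Amdahl's law: a program with parallel fraction p on n processors
# has ideal speedup S(n) = 1 / ((1 - p) + p / n).  Since the serial
# fraction (1 - p) never shrinks, S(n) <= n always -- superlinear
# speedup is impossible in these pure terms.

def amdahl_speedup(p, n):
    """Ideal speedup on n processors when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        s = amdahl_speedup(0.95, n)
        print(f"N={n:2d}  speedup={s:5.2f}  (linear bound: {n})")
```

Even with 95% of the work parallelized, the speedup stays strictly below N, and as N grows it saturates at 1/(1 - p) = 20.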
However, that being said, here are some links that may lead you to a
code/application that will produce these results. I've never worked with
either of these personally; these are just headlines that have caught my
eye within the past few months:
http://www.ncsa.uiuc.edu/SCD/SciHi/Pottenger0797.html
http://www.marc.com/Headlines/k73bnchmrk/benchmarks.htm
D. R. Commander
if all you want to do is demonstrate the cause of "superlinear speedup"
then the method is fairly simple, actually.
find a distributed program/application that uses less memory per process
for the parallel case. run it with a problem size that will cause it to
swap to disk on a uniprocessor. add processors (and memory) until it
stops swapping.
voila! superlinear speedup! :-) (most likely)
now replace the words "memory/disk" with "cache/memory" in the above
paragraph and reread it.
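bill's recipe can be sketched with a toy cost model (my own illustration with made-up numbers, not a benchmark): each of the p processes handles problem_size / p items, and an item is cheap if the per-process working set fits in local fast memory (RAM vs. disk, or equally cache vs. RAM), expensive otherwise.

```python
# Toy cost model of the swap-to-disk recipe (illustrative numbers only):
# each of nprocs processes handles problem_size / nprocs items; an item
# costs FAST if the per-process working set fits in local fast memory,
# else SLOW (swapping).  Processes run concurrently.

FAST, SLOW = 1.0, 100.0          # arbitrary per-item cost units (assumed)
LOCAL_CAPACITY = 1_000_000       # fast-memory items per processor (assumed)

def run_time(problem_size, nprocs):
    per_proc = problem_size / nprocs
    cost = FAST if per_proc <= LOCAL_CAPACITY else SLOW
    return per_proc * cost

def speedup(problem_size, nprocs):
    return run_time(problem_size, 1) / run_time(problem_size, nprocs)

if __name__ == "__main__":
    n = 4_000_000                # swaps on 1 or 2 processors, fits on 4+
    for p in (1, 2, 4, 8):
        print(f"{p} procs: speedup = {speedup(n, p):.1f}")
```

With these numbers, 2 processors still swap and give exactly linear speedup, but at 4 processors the working set suddenly fits in fast memory and the measured "speedup" jumps far above 4 -- exactly the effect described above.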
-bill
--
bill rankin ...................................... philosopher/coffee-drinker
wra...@ee.duke.edu ........................................ doctoral wannabe
duke university dept. of electrical engr ......... scientific computing group
As an example you can have a look at:
http://www.genias.de/projects/parasol/newsletter/issue-2/newsletter_2.html
Or directly at:
http://www.genias.de/projects/parasol/newsletter/issue-2/newsletter_2.html#FETI_Solver_shows
Whether you call this effect super-linear speedup or not is up to you ;-)
Hubert Ertl
--
- - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - -
Dr. Hubert Ertl, GENIAS Software GmbH, Phone: +49 9401 9200-50
*** PaTENT MPI 4.0 http://www.genias.de/products/patent/ ***
@PUB:EMAIL.COM EMDIR wrote:
> Does anyone know whether we can have speedup using MPI? If so, please direct me to the place
> to get the code. I need the C code that will work on dec machine.
>
> Regards
--
Jinsheng
please comment on my reasoning, since superlinear speedup is
something strange to me...
it seems more than logical to me that superlinear speedup is impossible
if the sequential program is well written...
I understand the argument proposed here that you can improve cache locality
in some algorithms by dividing the work into small chunks, and thus run faster in parallel.
but you can do the same thing either on a parallel machine, or on a sequential machine
that reproduces the scheduling...
the case of blocked matrix algorithms is a good example.
real parallelism just adds communication costs.
but I am also supposing that your sequential machine has as much memory
as the whole parallel machine has... and as much cache...
so superlinear speedup may just be the effect of the parallel machine's gain
in aggregate cache size and aggregate memory size over the sequential machine...
anyway this is an interesting effect since, like
CPU performance, cache size is limited by technology,
so parallelism is a way to push aggregate CPU power and cache size beyond the limit of a single processor...
am I wrong in my reasoning?
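The aggregate-cache argument above can be made concrete with a toy model (my own illustration; all costs and capacities are made up): if the single processor is granted the same total fast memory as the whole parallel machine, the superlinear effect disappears and only the communication overhead separates the two.

```python
# Toy model of the aggregate-cache argument (illustrative numbers only):
# a "fair" sequential baseline gets the combined fast memory of the
# parallel machine; the parallel run pays a small per-item communication
# overhead.  Superlinear speedup only appears against the "unfair"
# baseline with a single processor's worth of fast memory.

FAST, SLOW = 1.0, 100.0          # arbitrary per-item cost units (assumed)
CACHE_PER_CPU = 1_000_000        # fast-memory items per processor (assumed)
COMM = 0.05                      # per-item communication overhead (assumed)

def time_seq(n, total_cache):
    """Sequential time given the machine's total fast-memory capacity."""
    cost = FAST if n <= total_cache else SLOW
    return n * cost

def time_par(n, p):
    """Parallel time on p processors, each with CACHE_PER_CPU fast memory."""
    per_proc = n / p
    cost = FAST if per_proc <= CACHE_PER_CPU else SLOW
    return per_proc * (cost + COMM)

if __name__ == "__main__":
    n, p = 4_000_000, 4
    unfair = time_seq(n, CACHE_PER_CPU) / time_par(n, p)
    fair = time_seq(n, p * CACHE_PER_CPU) / time_par(n, p)
    print(f"vs. 1-CPU cache baseline:   speedup = {unfair:.1f}")
    print(f"vs. aggregate-cache baseline: speedup = {fair:.2f}")
```

Against the one-processor-cache baseline the speedup is wildly superlinear, but against a sequential machine with the same aggregate cache it drops below p, since real parallelism just adds communication cost -- which is the point of the reasoning above.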