To put a 2x speedup in perspective for this type of calculation, consider
that 2x-5x speedups are considered a big deal by the CAE community;
see e.g. "Accelerating the ANSYS Direct Sparse Solver with
GPUs" (Krawezik and Poole),
http://saahpc.ncsa.illinois.edu/09/papers/Krawezik_paper.pdf
This simple iterative solver is extremely STREAM-like (it is limited
by memory bandwidth rather than arithmetic), so the ability of the
ISPC system to offer any speedup over code compiled natively with gcc
is, I think, an accomplishment. Note that the direct sparse solver
discussed in the paper mentioned above relies heavily on GEMM
(matrix-multiplication) operations, for which both AMD and Intel offer
optimized implementations as part of their ACML and MKL math
libraries. Neither of these libraries offers sparse iterative solver
support, even though iterative solvers are compact in storage (memory
usage) and in general offer a faster time to solution (especially
with preconditioning) than direct solver approaches.
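
To make the STREAM-like vs. GEMM contrast concrete, here is a rough
back-of-the-envelope sketch in Python. It is not taken from the solver
itself: the function names are made up for illustration, the vector
update y[i] = a*x[i] + y[i] is only assumed to be representative of
the solver's inner loops, and the traffic model (no cache reuse for
the vector update, ideal reuse for GEMM) is a simplification.

```python
def triad_intensity(bytes_per_value):
    """Flops per byte for the vector update y[i] = a * x[i] + y[i].

    Assumes every operand goes to main memory (no cache reuse), which is
    the STREAM-like worst case.
    """
    flops = 2                      # one multiply and one add per element
    traffic = 3 * bytes_per_value  # read x[i], read y[i], write y[i]
    return flops / traffic


def gemm_intensity(n, bytes_per_value):
    """Flops per byte for C = A * B with dense n x n matrices.

    Assumes ideal caching: A and B are each read once and C is written
    once, which is an optimistic upper bound.
    """
    flops = 2 * n ** 3
    traffic = 3 * n * n * bytes_per_value
    return flops / traffic


if __name__ == "__main__":
    for name, size in (("single", 4), ("double", 8)):
        print(name, "triad flops/byte:", triad_intensity(size))
        print(name, "GEMM flops/byte (n=1000):", gemm_intensity(1000, size))
```

Under these assumptions the vector update does a small fraction of a
flop per byte moved, while GEMM does work proportional to n per byte,
which is why tuned GEMM helps direct solvers so much more. The same
model has single precision moving half the bytes of double precision
per element, which may be part of why the single precision numbers
look better.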
Doug
> One observation that can be made is that in general ISPC can offer
> better than 2x speedups for single precision code and almost 2x
> speedups for double precision code.
>
>