Long Chen; Ziang Hu; Junmin Lin; Gao, G.R.
Electr. & Comput. Eng. Dept., Delaware Univ., Newark, DE;
This paper appears in: Parallel and Distributed Processing Symposium,
2007. IPDPS 2007. IEEE International
Publication Date: 26-30 March 2007
On page(s): 1-8
ISBN: 1-4244-0910-1
INSPEC Accession Number: 9516876
Digital Object Identifier: 10.1109/IPDPS.2007.370639
Posted online: 2007-06-11 10:26:41.0
-Aarul Jain
Graduate Student, EE Dept.
Arizona State University, Tempe, US
Ph: 480-278-9230
Hi, Sugan,
Thanks for your critique!
Regarding weakness 3 in your critique:
“The authors have not clearly mentioned about the speed-up that could be obtained if all the techniques that they mentioned are implemented together and the effect of one performance optimization technique over the other”
I think the authors did address this.
Table 1 in the paper gives the performance of the incremental 1D FFT optimizations with 128 threads.
The GFLOPS column gives the performance achieved as each technique is added on top of the previous ones,
i.e.
13.17 Gflops is achieved by using Base implementation + optimal work unit
16.92 Gflops is achieved by using Base implementation + optimal work unit + Special Handling of the first Stages
And so on.
So the best performance with all the techniques they mentioned combined is 20.72 Gflops.
The effect of one performance optimization technique over the other can also be seen from
the “Speedup Over Base Version” column in the table.
For the 2D FFT, they did the same thing, but no table was given. Please note
the performance progression reported in the paper: 5.11 Gflops -> 19.37 -> 20.000 Gflops.
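To make the cumulative reading concrete, here is a minimal Python sketch that turns a list of cumulative Gflops figures into two ratios: the speedup over the first entry and the gain over the previous step. It uses only the 2D FFT numbers quoted above. Treating 5.11 Gflops as the starting point is my assumption, and the function name `cumulative_speedups` is made up for illustration.

```python
def cumulative_speedups(gflops):
    """Return (speedup over first entry, gain over previous entry) for each figure."""
    first = gflops[0]
    prev = first
    rows = []
    for g in gflops:
        rows.append((g / first, g / prev))
        prev = g
    return rows


# 2D FFT figures quoted in this post; assuming 5.11 is the starting point.
fft2d = [5.11, 19.37, 20.0]

for g, (over_first, over_prev) in zip(fft2d, cumulative_speedups(fft2d)):
    print(f"{g:6.2f} Gflops  x{over_first:.2f} over first  x{over_prev:.2f} over previous")
```

The first ratio corresponds to what a "speedup over base" column reports, while the second isolates how much each added step contributes on top of the steps already applied.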
Thanks,
-Tao