Here's my critique of this paper:
The paper describes a technique that combines different processor
architectures (a superscalar and a VLIW core) to get the best
performance. It relies on the compiler to extract threads, split them
further if necessary, and schedule them on specific processors. It
also illustrates the optimizations that the compiler can perform.
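
To make the extract/split/schedule idea concrete, here is a minimal
sketch of how I picture the balancing step. It is not the paper's
algorithm: the Thread class, the integer cost estimate, the 20%
imbalance threshold and the greedy assignment are all my own
assumptions, and it treats both cores as equally fast for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    cost: int  # hypothetical compiler estimate of the thread's work


def split(thread):
    """Split a thread into two pieces of roughly equal estimated cost."""
    half = thread.cost // 2
    return [Thread(thread.name + ".a", half),
            Thread(thread.name + ".b", thread.cost - half)]


def assign(threads):
    """Greedily place each thread (largest first) on the less loaded core."""
    loads = {"superscalar": 0, "vliw": 0}
    plan = {"superscalar": [], "vliw": []}
    for t in sorted(threads, key=lambda t: t.cost, reverse=True):
        core = min(loads, key=loads.get)
        plan[core].append(t.name)
        loads[core] += t.cost
    return plan, loads


def imbalance(loads):
    """Load difference between the two cores as a fraction of total work."""
    total = sum(loads.values())
    return abs(loads["superscalar"] - loads["vliw"]) / total if total else 0.0


def schedule(threads, max_imbalance=0.2, max_splits=8):
    """Assign threads, splitting the largest one while the loads stay uneven.

    The threshold and split limit are illustrative values, not taken
    from the paper.
    """
    threads = list(threads)
    plan, loads = assign(threads)
    for _ in range(max_splits):
        if imbalance(loads) <= max_imbalance:
            break
        largest = max(threads, key=lambda t: t.cost)
        if largest.cost < 2:
            break  # nothing left that can be split further
        threads.remove(largest)
        threads.extend(split(largest))
        plan, loads = assign(threads)  # re-analyze after the split
    return plan, loads


if __name__ == "__main__":
    plan, loads = schedule([Thread("t0", 100), Thread("t1", 20)])
    print(plan, loads)
```

A real compiler would weight the cost by each core's frequency and
issue width and would split at legal program points, which this sketch
ignores.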
Strengths of the paper:
1) With the technique presented, different architectures can
compensate for each other's weaknesses in performance and power.
2) The technique can be extended to additional cores to get the
optimal execution time.
3) The paper accounts for the imbalance in thread sizes between the
superscalar and VLIW processors and presents a technique to overcome
it by splitting the thread and re-analyzing it.
4) The compiler also performs a number of optimizations (profitability
analysis) to determine the best combination.
5) The paper performs its analysis for varying L2 miss latencies and
shows that, with pre-execution, the L2 miss latency has little
effect on the execution time.
6) The technique of pre-execution yields good results for applications
that suffer from high L2 miss frequency.
Weaknesses:
1) There is no significant improvement in total execution time with
the use of pre-execution and pre-fetching.
2) The average utilization of the superscalar functional units is very
low: 4.2%.
3) Currently, the focus is on one superscalar processor and one VLIW
processor of pre-defined frequency. The compiler would have to be
finely tuned to extract maximum performance across the different
configurations that could be used.
Ashay