Atomic mode is not supported; you need to run in timing mode. Use the
examples in configs/fractal/. Those in configs/example are from the
original M5 releases and are not MV5-specific. Also, the SPLASH
benchmarks are not supported because they use pthreads. The supported
benchmarks are in api/tests/.
Jiayuan
I would say the latter, switching among threads over a scalar core. I guess it is implemented by MTSimpleCPU?
------------------
Liang Wang
On Thu, Nov 1, 2012 at 7:30 PM, Jiayuan Meng <meng.j...@gmail.com> wrote:
What do you mean by multi-threading simulations? Do you mean switching among warps over SIMD cores, or do you mean switching among threads over a scalar core?
Thanks,
Jiayuan
You can look at configs/fractal/frCommons.py to see if there are smaller inputs. (I'll take a look later when I get back to Chicago if you need even smaller inputs.)
For LU, if you set the last parameter in SIMD mode to 0, like lu 300 0, it will be equivalent to lu 300 (though it will take much longer).
If you can build the cross compiler, type "make MODE=3" in api/ to rebuild the binaries in bin_smp.
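The LU parameter pairing described above can be written down as a small helper, so that both runs use the same problem size. This is only a sketch: the function name lu_args and the mode labels "simd" and "smp" are made up for illustration and are not part of MV5 or the fractal API. It just encodes that the SIMD binary takes a trailing 0 while the SMP binary takes the size alone.

```python
def lu_args(mode, size):
    """Build the LU command line for a given mode (hypothetical helper).

    mode: "simd" or "smp" (labels chosen for this sketch only).
    size: problem size, e.g. 300.
    """
    if mode == "simd":
        # SIMD binary asserts argc == 3; a trailing 0 makes the run
        # comparable to the single-parameter SMP run.
        return ["lu", str(size), "0"]
    if mode == "smp":
        # SMP binary asserts argc == 2: only the problem size.
        return ["lu", str(size)]
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    print(" ".join(lu_args("simd", 300)))  # lu 300 0
    print(" ".join(lu_args("smp", 300)))   # lu 300
```

The resulting strings would still have to be passed to whichever config script you use in configs/fractal/.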
On Sat, Nov 17, 2012 at 10:11 AM, Liang Wang <lwan...@gmail.com> wrote:
Yeah! These new binaries work! One minor issue:
* The LU binary for SMP does not accept the same command-line parameters as the one for SIMD. The SMP binary accepts exactly 1 parameter (assert(argc==2)), while the SIMD binary accepts exactly 2 parameters (assert(argc==3)). So if I want to make the two simulation results comparable, how should I set the single parameter for the SMP binary?
Besides, the current benchmark input sets lead to quite long simulation times for multi-core configurations (e.g. --numcpus=8), so is there a set of benchmark commands with smaller input data sets to reduce the simulation time?
Thanks!
I see... If you are trying to compare SIMD with SMP, it's probably a
bit tricky. Because the fractal API implements SIMD and SMP
differently, and the SMP binaries were generated earlier, there
could be some inefficiency in the API itself associated with every
innermost loop iteration. That overhead is noticeable, especially
if the innermost loop body is extremely small. Which benchmark is it?
A better way to compare SIMD with a multi-threaded core is to use
SIMDTimingCPU for the multi-threaded core as well, setting
numHWTCs=threads_to_hide_latency and warpSize=1.
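As a rough sketch of that setup, the two settings can be collected in a plain dictionary before being applied to the CPU object in a config script. The helper names below are hypothetical, the warp-size parameter is spelled warpSize here as an assumption, and nothing in this snippet calls the actual MV5 configuration API. It only records the intended values: one thread per warp, and as many hardware thread contexts as are needed to hide latency.

```python
def multithreaded_core_params(threads_to_hide_latency):
    """Settings that make a SIMD core behave like a multi-threaded scalar core.

    Hypothetical helper: returns a plain dict, not an MV5 object.
    """
    if threads_to_hide_latency < 1:
        raise ValueError("need at least one hardware thread context")
    return {
        "numHWTCs": threads_to_hide_latency,  # hardware thread contexts per core
        "warpSize": 1,                        # one thread per warp: no SIMD lanes
    }


def as_overrides(params):
    """Format the settings as 'name=value' pairs, sorted by name."""
    return ", ".join("%s=%s" % (name, params[name]) for name in sorted(params))


if __name__ == "__main__":
    print(as_overrides(multithreaded_core_params(4)))  # numHWTCs=4, warpSize=1
```

How these values are actually attached to SIMDTimingCPU depends on the config script, so check the examples in configs/fractal/ for the real parameter plumbing.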