How to run example SMP


Eliseu Miguel

Mar 21, 2012, 2:19:50 PM
to MV5sim
Dear guys,

I am trying to run your example, smp.py, with the following
command line:

~/simulator/hgfractal$ build/ALPHA_SE/m5.fast configs/example/smp.py --rootdir=splash2/codes/kernels --benchmark=FFT --numcpus=16 --l1latency=2ns --frequency=1GHz --l1size=32kB --l2size=256kB --l2latency=10ns --physmem=128MB

But, I am getting the following error:

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "build/ALPHA_SE/python/m5/main.py", line 350, in main
execfile(sys.argv[0], scope)
File "configs/example/smp.py", line 158, in <module>
for i in xrange(options.numcpus)]
File "build/ALPHA_SE/python/m5/SimObject.py", line 531, in __init__
setattr(self, key, val)
File "build/ALPHA_SE/python/m5/SimObject.py", line 612, in __setattr__
% (self.__class__.__name__, attr)
AttributeError: Class MTAtomicCPU has no parameter numberOfThreads
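For context, the traceback comes from M5's SimObject parameter checking: the config script passes numberOfThreads to a CPU class that never declared such a parameter. A simplified, hypothetical sketch of that mechanism (not the actual m5 source):

```python
class SimObject:
    # Each model declares the set of parameters it accepts.
    _params = set()

    def __init__(self, **kwargs):
        for key, val in kwargs.items():
            setattr(self, key, val)   # routed through __setattr__ below

    def __setattr__(self, attr, val):
        # Reject any parameter the class did not declare.
        if attr not in self._params:
            raise AttributeError("Class %s has no parameter %s"
                                 % (self.__class__.__name__, attr))
        object.__setattr__(self, attr, val)

class MTAtomicCPU(SimObject):
    # Hypothetical parameter set; numberOfThreads is not among them.
    _params = {"clock", "cpu_id"}

try:
    cpu = MTAtomicCPU(numberOfThreads=4)
except AttributeError as e:
    print(e)   # Class MTAtomicCPU has no parameter numberOfThreads
```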

Could someone explain what I should do to get this example
running correctly?

Warm regards,

Eliseu

Jiayuan Meng

Mar 21, 2012, 10:28:08 PM
to mv5...@googlegroups.com
Hi Eliseu,

Atomic mode is not supported; you need to run in timing mode. Use the
examples in configs/fractal/. Those in configs/example are from the
original M5 releases and are not MV5-specific. Also, the SPLASH benchmarks
are not supported, since they use pthreads. The supported benchmarks are
in api/tests/.

Jiayuan

Jiayuan Meng

Mar 21, 2012, 10:28:51 PM
to mv5...@googlegroups.com
FYI, if you use configs/fractal/fractal_smp.py, it uses timing mode
by default.

Liang Wang

Oct 31, 2012, 4:45:02 PM
to mv5...@googlegroups.com, meng.j...@gmail.com
Hi Jiayuan,

I ran into a segmentation fault using fractal_smp.py in non-SIMD mode. Although I adopted this command line from one of the SIMD experiments in hgfractal/drivers, the fractal_smp.py config script appears to instantiate MT cores when given the relevant options, so I only changed the --simd option to False to do an MT run. Are there any options incompatible with MT that lead to this segmentation fault? My complete command line is as follows:

/if10/lw2aw/archsim/hgfractal/build/ALPHA_SE/m5.fast --stats-file=/home/lw2aw/tmp/m5.txt /if10/lw2aw/archsim/hgfractal/configs/fractal/fractal_smp.py --rootdir=/if10/lw2aw/archsim/hgfractal/ --bindir=/if10/lw2aw/archsim/hgfractal/api/binsimd_blocktask/ --simd=False --CpuFrequency=2.00GHz --DTBsize=1024 --DcacheAssoc=8 --DcacheBanks=4 --DcacheBlkSize=128 --DcacheHWPFdegree=1 --DcacheHWPFpolicy=none --DcacheLookupLatency=2ns --DcachePropagateLatency=0ns --DcacheRepl=LRU --DcacheSize=32kB --ITBsize=256 --IcacheAssoc=4 --IcacheBlkSize=128 --IcacheHWPFdegree=1 --IcacheHWPFpolicy=none --IcacheSize=16kB --L2NetBandWidthMbps=456000 --L2NetFrequency=300MHz --L2NetPortOutQueueLength=4 --L2NetRouterBufferSize=256 --L2NetRoutingLatency=1ns --L2NetTimeOfFlight=13t --L2NetType=FullyNoC --L2NetWormHole=True --MemNetBandWidthMbps=128000 --MemNetFrequency=266MHz --MemNetPortOutQueueLength=4 --MemNetRouterBufferSize=2048 --MemNetTimeOfFlight=130t --benchmark=KMEANS --l2Assoc=16 --l2Banks=16 --l2BlkSize=128 --l2HWPFDataOnly=False --l2HWPFdegree=1 --l2HWPFpolicy=none --l2MSHRs=256 --l2Repl=LRU --l2Size=4096kB --l2TgtsPerMSHR=64 --l2lookupLatency=2ns --l2propagateLatency=30ns --l2tol1ratio=2 --localAddrPolicy=0 --maxThreadBlockSize=0 --numHWTCs=32 --numSWTCs=2 --numcpus=1 --physmemLatency=100ns --physmemSize=1024MB --portLookup=0 --protocol=mesi --randStackOffset=True --restoreContextDelay=0 --retryDcacheDelay=100 --stackAlloc=3 --switchOnDataAcc=True --warpSize=8 --footprint=/home/lw2aw/tmp/m5.footprint

===================================

The stdout is as follows:

M5 Simulator System

Copyright (c) 2001-2006
The Regents of The University of Michigan
All Rights Reserved


M5 compiled Oct 30 2012 16:05:10
M5 started Wed Oct 31 14:21:18 2012
M5 executing on tesla
command line: (omitted as duplicate)

L2 cache size: 4096kB
Global frequency set at 1000000000000 ticks per second
object: system.cpus_smp
clock: 500
numTCs: 34
object: system.cpus_smp.icache
hit_latency: 652
pass_latency: 388
object: system.cpus_smp.dcache
hit_latency: 710
pass_latency: 345
object: system.l2
hit_latency: 34857
pass_latency: 404
object: system.toL2net
clock: 3333
width(bytes/cycles): 189
routingLatency(cycles): 1
maxPortInBufferSize(bytes): 256
maxPortOutQueueLen(# packets): 4
linkQueueSize(bytes): 256
warn: Entering event queue @ 0.  Starting simulation...
warn: Increasing stack size by one page.
warn: Program declares local address space but the hardware is not configured fo
warn: Increasing stack size by one page.
warn: Increasing stack size by one page.
test.pbs: line 12: 24452 Segmentation fault      (follows the complete line of commands, omitted as duplicate)

Jiayuan Meng

Oct 31, 2012, 11:55:17 PM
to mv5...@googlegroups.com
Liang, I was trying to reproduce the error, but got disconnected halfway. I'll do it tomorrow.

Thanks,

Jiayuan

Jiayuan Meng

Nov 1, 2012, 4:41:07 PM
to mv5...@googlegroups.com
Liang, set "simd=True" instead of False, and then try again.

Liang Wang

Nov 1, 2012, 6:49:31 PM
to mv5...@googlegroups.com, meng.j...@gmail.com

simd=True works fine. But I need to do multithreaded simulation with MV5; that is why I set simd to False.

Jiayuan Meng

Nov 1, 2012, 10:24:19 PM
to Liang Wang, mv5...@googlegroups.com
Hey Liang,

Use the following for multi-threading on scalar cores. Basically, you need to use the precompiled benchmarks in api/binsmp/. And you don't need to specify the warp size.

Thanks,

Jiayuan

build/ALPHA_SE/m5.debug configs/fractal/fractal_smp.py --bindir=api/binsmp/ --simd=False --CpuFrequency=2.00GHz --DTBsize=1024 --DcacheAssoc=8 --DcacheBanks=4 --DcacheBlkSize=128 --DcacheHWPFdegree=1 --DcacheHWPFpolicy=none --DcacheLookupLatency=2ns --DcachePropagateLatency=0ns --DcacheRepl=LRU --DcacheSize=32kB --ITBsize=256 --IcacheAssoc=4 --IcacheBlkSize=128 --IcacheHWPFdegree=1 --IcacheHWPFpolicy=none --IcacheSize=16kB --L2NetBandWidthMbps=456000 --L2NetFrequency=300MHz --L2NetPortOutQueueLength=4 --L2NetRouterBufferSize=256 --L2NetRoutingLatency=1ns --L2NetTimeOfFlight=13t --L2NetType=FullyNoC --L2NetWormHole=True --MemNetBandWidthMbps=128000 --MemNetFrequency=266MHz --MemNetPortOutQueueLength=4 --MemNetRouterBufferSize=2048 --MemNetTimeOfFlight=130t --benchmark=FILTER --l2Assoc=16 --l2Banks=16 --l2BlkSize=128 --l2HWPFDataOnly=False --l2HWPFdegree=1 --l2HWPFpolicy=none --l2MSHRs=256 --l2Repl=LRU --l2Size=4096kB --l2TgtsPerMSHR=64 --l2lookupLatency=2ns --l2propagateLatency=30ns --l2tol1ratio=2 --localAddrPolicy=0 --maxThreadBlockSize=0 --numHWTCs=32 --numSWTCs=2 --numcpus=1 --physmemLatency=100ns --physmemSize=1024MB --portLookup=0 --protocol=mesi --randStackOffset=True --restoreContextDelay=0 --retryDcacheDelay=100 --stackAlloc=3 --switchOnDataAcc=True

On Thu, Nov 1, 2012 at 6:37 PM, Liang Wang <lwan...@gmail.com> wrote:
I would say the latter, switching among threads over a scalar core. I guess it is implemented by MTSimpleCPU?

------------------
Liang Wang



On Thu, Nov 1, 2012 at 7:30 PM, Jiayuan Meng <meng.j...@gmail.com> wrote:
What do you mean by multi-threading simulations? Do you mean switching among warps over SIMD cores, or you mean switching among threads over a scalar core? 

Thanks,

Jiayuan

Liang Wang

Nov 17, 2012, 2:46:33 PM
to Jiayuan Meng, mv5...@googlegroups.com
This is a repost to the mv5sim group.

Yeah! These new binaries work! One minor issue:

* The LU binary for smp does not accept the same command-line parameters as the one for simd. The smp binary accepts exactly 1 parameter (assert(argc==2)), while the simd binary accepts exactly 2 parameters (assert(argc==3)). So if I want to make the two simulation results comparable, how should I set parameters for the smp binary with its single parameter?
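To illustrate the mismatch, here is a hypothetical sketch of the two binaries' argument handling, inferred only from the asserts quoted above (the parameter meanings are assumptions for illustration):

```python
# Hypothetical sketch of the two LU binaries' argument checks, inferred
# from assert(argc==2) vs. assert(argc==3). Parameter names/meanings
# ("size", "block") are illustrative assumptions, not from the sources.

def lu_smp(argv):
    # smp binary: program name + exactly 1 parameter
    assert len(argv) == 2, "usage: lu <size>"
    return int(argv[1])

def lu_simd(argv):
    # simd binary: program name + exactly 2 parameters
    assert len(argv) == 3, "usage: lu <size> <block>"
    return int(argv[1]), int(argv[2])

print(lu_smp(["lu", "300"]))        # 300
print(lu_simd(["lu", "300", "0"]))  # (300, 0)
```

So a command line that satisfies one binary trips the assert in the other, which is why the two runs cannot be launched identically.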

Besides, the current benchmarks' input sets lead to quite a long simulation time for multi-core configurations (e.g. --numcpus=8), so is there a set of benchmark command lines with a smaller input data set to reduce the simulation time?


Thanks!
------------------
Liang Wang

Liang Wang

Nov 17, 2012, 11:00:16 PM
to Jiayuan Meng, mv5...@googlegroups.com
Hi Jiayuan,

The provided command line works well except for KMEANS. For example, the following command line fails with the panic "Tried to execute unmapped address 0". The panic occurs only after the simulation has run for a while, e.g. half an hour on my machine. The complete command line is:
build/ALPHA_SE/m5.fast /if10/lw2aw/archsim/hgfractal/configs/fractal/fractal_smp.py --rootdir=/if10/lw2aw/archsim/hgfractal/ --bindir=/if10/lw2aw/archsim/hgfractal/api/binsmp/ --simd=False --CpuFrequency=2.00GHz --DTBsize=1024 --DcacheAssoc=8 --DcacheBanks=4 --DcacheBlkSize=128 --DcacheHWPFdegree=1 --DcacheHWPFpolicy=none --DcacheLookupLatency=2ns --DcachePropagateLatency=0ns --DcacheRepl=LRU --DcacheSize=32kB --ITBsize=256 --IcacheAssoc=4 --IcacheBlkSize=128 --IcacheHWPFdegree=1 --IcacheHWPFpolicy=none --IcacheSize=16kB --L2NetBandWidthMbps=456000 --L2NetFrequency=300MHz --L2NetPortOutQueueLength=4 --L2NetRouterBufferSize=256 --L2NetRoutingLatency=1ns --L2NetTimeOfFlight=13t --L2NetType=FullyNoC --L2NetWormHole=True --MemNetBandWidthMbps=128000 --MemNetFrequency=266MHz --MemNetPortOutQueueLength=4 --MemNetRouterBufferSize=2048 --MemNetTimeOfFlight=130t --benchmark=KMEANS --l2Assoc=16 --l2Banks=16 --l2BlkSize=128 --l2HWPFDataOnly=False --l2HWPFdegree=1 --l2HWPFpolicy=none --l2MSHRs=256 --l2Repl=LRU --l2Size=4096kB --l2TgtsPerMSHR=64 --l2lookupLatency=2ns --l2propagateLatency=30ns --l2tol1ratio=2 --localAddrPolicy=0 --maxThreadBlockSize=0 --numHWTCs=2 --numSWTCs=2 --numcpus=1 --physmemLatency=100ns --physmemSize=1024MB --portLookup=0 --protocol=mesi --randStackOffset=True --restoreContextDelay=0 --retryDcacheDelay=100 --stackAlloc=3 --switchOnDataAcc=True --warpSize=16

The complete stdout is:
************* Begin of stdout ********************
M5 Simulator System

Copyright (c) 2001-2006
The Regents of The University of Michigan
All Rights Reserved


M5 compiled Oct 30 2012 16:05:10
M5 started Sat Nov 17 15:23:04 2012
M5 executing on sulla
command line: /if10/lw2aw/archsim/hgfractal/build/ALPHA_SE/m5.fast --stats-file=/home/lw2aw/projects/fpca/work/mv5/experiments/mimd-smp/rawstat/1.txt /if10/lw2aw/archsim/hgfractal/configs/fractal/fractal_smp.py --rootdir=/if10/lw2aw/archsim/hgfractal/ --bindir=/if10/lw2aw/archsim/hgfractal/api/binsmp/ --simd=False --CpuFrequency=2.00GHz --DTBsize=1024 --DcacheAssoc=8 --DcacheBanks=4 --DcacheBlkSize=128 --DcacheHWPFdegree=1 --DcacheHWPFpolicy=none --DcacheLookupLatency=2ns --DcachePropagateLatency=0ns --DcacheRepl=LRU --DcacheSize=32kB --ITBsize=256 --IcacheAssoc=4 --IcacheBlkSize=128 --IcacheHWPFdegree=1 --IcacheHWPFpolicy=none --IcacheSize=16kB --L2NetBandWidthMbps=456000 --L2NetFrequency=300MHz --L2NetPortOutQueueLength=4 --L2NetRouterBufferSize=256 --L2NetRoutingLatency=1ns --L2NetTimeOfFlight=13t --L2NetType=FullyNoC --L2NetWormHole=True --MemNetBandWidthMbps=128000 --MemNetFrequency=266MHz --MemNetPortOutQueueLength=4 --MemNetRouterBufferSize=2048 --MemNetTimeOfFlight=130t --benchmark=KMEANS --l2Assoc=16 --l2Banks=16 --l2BlkSize=128 --l2HWPFDataOnly=False --l2HWPFdegree=1 --l2HWPFpolicy=none --l2MSHRs=256 --l2Repl=LRU --l2Size=4096kB --l2TgtsPerMSHR=64 --l2lookupLatency=2ns --l2propagateLatency=30ns --l2tol1ratio=2 --localAddrPolicy=0 --maxThreadBlockSize=0 --numHWTCs=2 --numSWTCs=2 --numcpus=1 --physmemLatency=100ns --physmemSize=1024MB --portLookup=0 --protocol=mesi --randStackOffset=True --restoreContextDelay=0 --retryDcacheDelay=100 --stackAlloc=3 --switchOnDataAcc=True --warpSize=16 --footprint=/home/lw2aw/projects/fpca/work/mv5/experiments/mimd-smp/checkout/1.footprint
L2 cache size: 4096kB
Global frequency set at 1000000000000 ticks per second
object: system.cpus_smp
clock: 500
numTCs: 4
object: system.cpus_smp.icache
hit_latency: 652
pass_latency: 388
object: system.cpus_smp.dcache
hit_latency: 710
pass_latency: 345
object: system.l2
hit_latency: 34857
pass_latency: 404
object: system.toL2net
clock: 3333
width(bytes/cycles): 189
routingLatency(cycles): 1
maxPortInBufferSize(bytes): 256
maxPortOutQueueLen(# packets): 4
linkQueueSize(bytes): 256
warn: Entering event queue @ 0.  Starting simulation...
warn: Increasing stack size by one page.
warn: Program declares local address space but the hardware is not configured for it. Local space is wasted
warn: Increasing stack size by one page.
warn: Increasing stack size by one page.
panic: Tried to execute unmapped address 0.
 @ cycle 1159999860000
[invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 183]
Program aborted at cycle 1159999860000
************** End of stdout *****************************************

I have tried the same command line with other combinations of numHWTCs and numcpus:
numHWTCs=2,4,8
numcpus=1,2,4,8
All of them failed with the same panic message.

I do get successful simulations using the above command with numHWTCs fixed to 1 and numcpus varied over 1, 2, 4, 8. I hope this information helps to diagnose the problem.

Thanks!

------------------
Liang Wang

Liang Wang

Nov 29, 2012, 11:12:56 PM
to Jiayuan Meng, mv5sim
Hi Jiayuan,

I am comparing the executed instructions on SIMD and MIMD for the same workload with the same input dataset. For SIMD, I am using binaries from binsimd_blocktask, and for MIMD I am using binaries from binsmp. I noticed that the "num_insts" stat is significantly different for the two binaries with the same input dataset. Both SIMD and MIMD are configured as a single core, so I only have one copy of num_insts in each stats file. The numbers I got are:
SIMD: 75609349
SMP: 217276152

So the total number of instructions for the SMP binary is about 3x that of the SIMD binary. This is unexpected to me. Since I am feeding the two binaries the same inputs, how can the executed instruction counts differ so much? I guess I may have missed something or misinterpreted the stats. Could you give any suggestions on this?

Thanks!
  

------------------
Liang Wang



On Sun, Nov 18, 2012 at 9:59 AM, Jiayuan Meng <meng.j...@gmail.com> wrote:
You can look at configs/fractal/frCommons.py to see if there are smaller inputs. (I'll take a look later when I get back to Chicago if you need even smaller inputs.)
For LU, if you set the last parameter in SIMD mode to 0, like lu 300 0, it will be equivalent to lu 300 (though it takes much longer).

If you can build the cross compiler, type "make MODE=3" in api/ to rebuild the binaries in bin_smp.


Liang Wang

Nov 30, 2012, 12:20:55 AM
to Jiayuan Meng, mv5sim
The workload I was investigating is HOTSPOT.

Your suggestion of using SIMD with warpSize=1 to mimic a multi-threaded core sounds great. A follow-up question: can the system be configured with multiple SIMD cores, each with a warpSize of 1? Would this be a reasonable approximation of MIMD?

Thanks!

------------------
Liang Wang



On Fri, Nov 30, 2012 at 12:02 AM, Jiayuan Meng <meng.j...@gmail.com> wrote:
I see... If you are trying to compare SIMD with SMP, it's probably a
bit tricky, because the fractal API implements SIMD and SMP
differently, and the SMP binaries were generated in earlier days. There
could be some inefficiency in the API itself associated with every
innermost loop iteration, which has a noticeable overhead, especially
if the innermost loop body is extremely small. Which benchmark is it?

A better way to compare a SIMD core and a multi-threaded core is to just
use SIMDTimingCPU for the multi-threaded core, where you set
numHWTCs=threads_to_hide_latency and warpSize=1.
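In M5-style Python config terms, that suggestion might be sketched as follows. Only the parameter names (numHWTCs, warpSize, SIMDTimingCPU) come from this thread; the surrounding structure is a hypothetical illustration, not actual MV5 config code:

```python
# Sketch: approximate an N-core MIMD machine with N SIMDTimingCPU cores,
# each having warpSize=1 (effectively scalar) and several hardware thread
# contexts to hide memory latency. Structure is illustrative only.

threads_to_hide_latency = 4   # hypothetical choice

def mt_core_configs(num_cores):
    # One parameter dict per core, as a real config script might build
    # before instantiating SIMDTimingCPU objects.
    return [dict(cpu_type="SIMDTimingCPU",
                 numHWTCs=threads_to_hide_latency,
                 warpSize=1)               # one lane -> scalar behavior
            for _ in range(num_cores)]

# e.g. an 8-core MIMD-like machine built from 1-lane SIMD cores:
cores = mt_core_configs(8)
print(len(cores), cores[0]["warpSize"])   # 8 1
```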