Benchmarks available?

TG

Jan 2, 2012, 8:17:50 AM
to Sniper simulator
Could the benchmarks used for validation against hardware (in Figure 5
of the Sniper paper) be made available?
Only fft is present in the $SNIPER_ROOT/test folder.

thanks

Wim Heirman

Jan 4, 2012, 4:48:03 AM
to snip...@googlegroups.com
Hi,

These are just the normal SPLASH-2 benchmarks (the original page for
them went offline a while ago, but you can find them mirrored at
[1]). We did have to apply some patches to make them work on x86-64 and
to add ROI begin and end markers. I'll see if I can clean up this patch
a bit and release it.

Regards,
Wim

[1] http://users.elis.ugent.be/~wheirman/simics/splash2/

Wim Heirman

Jan 16, 2012, 9:36:59 AM
to snip...@googlegroups.com
Hi,

I've released our patched version of the SPLASH-2 benchmarks, along with some runner scripts that define the input set sizes and should generally make it easier to use Sniper. You can download the package from [1]. I've included some basic usage instructions; if you have more questions, feel free to ask on this list.

Regards,
Wim


TG

Jan 18, 2012, 3:09:02 PM
to Sniper simulator
Hi,

While running some workloads from this suite I get the following
error:

pinbin: /home/tgangwan/Desktop/sniper-1.04/common/misc/circular_queue.h:81: void CircularQueue<T>::push(const T&) [with T = BasicBlock*]: Assertion `!full()' failed

I got this in ocean, radiosity, and cholesky with the tk25 input file.
Could you provide any insight into this issue?

Thanks!

Wim Heirman

Jan 19, 2012, 8:00:01 AM
to snip...@googlegroups.com
Hi TG,

I haven't been able to reproduce this yet. Can you give me some more
information on:
- the compiler and Linux version with which you compiled the benchmarks
- if you made any changes to the build config for the benchmarks
- what command-line options you use for the benchmarks and sniper (the
run-sniper command line should do)
- a copy of sim.cfg for one of the failing runs

Can you also try running Sniper with the --gdb command-line option and
get a backtrace? There are a few CircularQueues in use, and the
backtrace should show which one is overflowing.

Thanks,
Wim

TG

Jan 19, 2012, 8:38:45 AM
to Sniper simulator
- Linux version: Linux 3.0.0-12-generic x86_64 (Ubuntu 11.10 running on VMware Player)
- Compiler: gcc version 4.6.1
- No changes to the build config. I ran make in sniper_root/benchmarks/ and that built all the workloads.
- Command line: ./run-sniper --gdb -p splash2-ocean.ncont -i test -n 1 -c gainestown

GDB backtrace ::
---------------------------------------------------------------------------------------------------
(gdb) bt
#0 0x00007f35843a03a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f35843a3b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f3584398d4d in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f3583fc9b1b in push (this=<optimized out>, t=<optimized out>) at /home/tgangwan/Desktop/sniper-1.04/common/misc/circular_queue.h:81
#4 CircularQueue<BasicBlock*>::push (this=<optimized out>, t=<optimized out>) at /home/tgangwan/Desktop/sniper-1.04/common/misc/circular_queue.h:79
#5 0x00007f3583fc9a07 in PerformanceModel::queueDynamicInstruction (this=0x7f3570f02008, i=0x7f354a3fa700) at /home/tgangwan/Desktop/sniper-1.04/common/performance_model/performance_model.cc:107
#6 0x00007f3583fe11a6 in ParametricDramDirectoryMSI::MemoryManager::accessTLB (this=0x30be6350, tlb=<optimized out>, address=<optimized out>, isIfetch=false) at /home/tgangwan/Desktop/sniper-1.04/common/core/memory_subsystem/parametric_dram_directory_msi/memory_manager.cc:422
#7 0x00007f3583fe1ca0 in ParametricDramDirectoryMSI::MemoryManager::coreInitiateMemoryAccess (this=0x30be6350, mem_component=MemComponent::L1_DCACHE, lock_signal=Core::MIN_LOCK_SIGNAL, mem_op_type=Core::WRITE, address=139867212681472, offset=<optimized out>, data_buf=0x0, data_length=8, modeled=true) at /home/tgangwan/Desktop/sniper-1.04/common/core/memory_subsystem/parametric_dram_directory_msi/memory_manager.cc:286
#8 0x00007f3583f94678 in Core::initiateMemoryAccess (this=0x30bfe3a0, mem_component=MemComponent::L1_DCACHE, lock_signal=Core::MIN_LOCK_SIGNAL, mem_op_type=Core::WRITE, address=139867212681472, data_buf=0x0, data_size=8, modeled=Core::MEM_MODELED_RETURN, eip=4245605) at /home/tgangwan/Desktop/sniper-1.04/common/core/core.cc:366
#9 0x00007f3583f95052 in Core::accessMemory (this=0x30bfe3a0, lock_signal=Core::MIN_LOCK_SIGNAL, mem_op_type=Core::WRITE, d_addr=139867212681472, data_buffer=<optimized out>, data_size=8, modeled=Core::MEM_MODELED_RETURN, eip=4245605) at /home/tgangwan/Desktop/sniper-1.04/common/core/core.cc:476
#10 0x00007f3583fc91e5 in PerformanceModel::getDynamicInstructionInfo (this=0x7f3570f02008, instruction=...) at /home/tgangwan/Desktop/sniper-1.04/common/performance_model/performance_model.cc:216
#11 0x00007f3583fce03b in IntervalPerformanceModel::handleInstruction (this=0x7f3570f02008, instruction=0x7f354a3f8820) at /home/tgangwan/Desktop/sniper-1.04/common/performance_model/performance_models/interval_performance_model.cc:229
#12 0x00007f3583fc8d35 in PerformanceModel::iterate (this=0x7f3570f02008) at /home/tgangwan/Desktop/sniper-1.04/common/performance_model/performance_model.cc:145
#13 0x00007f35711eb5b7 in ?? ()
#14 0x0000000030c3bca8 in LEVEL_VM::ICONTEXT_SPILL_AREA::m_instance ()
#15 0x0000000000000042 in ?? ()
#16 0x00007f3563c73000 in ?? ()
#17 0x00007f358170fdd0 in ?? ()
#18 0x0000000000000132 in ?? ()
#19 0x0000000000000013 in ?? ()
#20 0x0000000030b670a0 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x0000907d009b62f2 in ?? ()
#23 0x0000000000000008 in ?? ()
#24 0x0000000000000008 in ?? ()
#25 0x0000000000000007 in ?? ()
#26 0x00007f35645b48e0 in ?? ()
#27 0x0000000000000132 in ?? ()
#28 0x00007f356412b720 in ?? ()
#29 0x00007f3580f0e080 in ?? ()
#30 0x0000000000000033 in ?? ()
#31 0x000000000000002b in ?? ()
#32 0x0000000000000000 in ?? ()
(gdb) quit
---------------------------------------------------------------------------------------------------



sim.cfg ::
---------------------------------------------------------------------------------------------------
[bbv]
sampling = 0

[caching_protocol]
type = "parametric_dram_directory_msi"

[clock_skew_minimization]
report = "false"
scheme = "barrier"

[clock_skew_minimization/barrier]
quantum = 100

[clock_skew_minimization/random_pairs]
quantum = 100000
slack = 100000
sleep_fraction = 0.4

[clock_skew_minimization/ring]
slack = 1000

[dvfs]
transition_latency = 2000
type = "simple"

[dvfs/simple]
cores_per_socket = 4

[general]
enable_dcache_modeling = "true"
enable_icache_modeling = "true"
enable_performance_modeling = "true"
enable_shared_mem = "true"
enable_syscall_modeling = "true"
inst_mode_end = "fast_forward"
inst_mode_init = "cache_only"
inst_mode_roi = "detailed"
magic = "true"
mode = "lite"
num_processes = 1
output_dir = "/home/tgangwan/Desktop/sniper-1.04/benchmarks"
output_file = "sim.out"
total_cores = 1

[hooks]
numscripts = 0

[log]
disabled_modules = ""
enabled = "false"
enabled_modules = ""
mutex_trace = "false"
stack_trace = "false"

[network]
memory_model_1 = "emesh_hop_counter"
memory_model_2 = "emesh_hop_counter"
system_model = "magic"
user_model_1 = "emesh_hop_counter"
user_model_2 = "emesh_hop_counter"

[network/analytical]
n = 1
processing_cost = 100
s = 1
Tw2 = 1
update_interval = 1000
W = 32

[network/emesh_hop_by_hop_basic]
hop_latency = 2
link_bandwidth = 64

[network/emesh_hop_by_hop_basic/queue_model]
enabled = "true"
type = "history_list"

[network/emesh_hop_by_hop_broadcast_tree]
hop_latency = 4
link_bandwidth = 64

[network/emesh_hop_by_hop_broadcast_tree/queue_model]
enabled = "true"
type = "history_list"

[network/emesh_hop_counter]
hop_latency = 2
link_bandwidth = 64

[osemu]
nprocs = 0
pthread_replace = "true"

[perf_model]
perfect_llc = "false"

[perf_model/branch_predictor]
mispredict_penalty = 17
size = 1024
type = "pentium_m"

[perf_model/cache]
levels = 3

[perf_model/core]
frequency = 3.33
logical_cpus = 1
type = "interval"

[perf_model/core/interval_timer]
dispatch_width = 4
fu_contention = "false"
lll_cutoff = 30
memory_dependency_granularity = 4
num_outstanding_loadstores = 10
window_size = 128

[perf_model/core/iocoom]
num_outstanding_loads = 32
num_store_buffer_entries = 20

[perf_model/core/static_instruction_costs]
add = 1
branch = 0
div = 18
dynamic_misc = 0
fadd = 3
fdiv = 6
fmul = 5
fsub = 3
generic = 1
jmp = 1
mem_access = 0
mul = 3
recv = 0
spawn = 0
string = 0
sub = 1
sync = 0
tlb_miss = 0

[perf_model/core0]
frequency = -1

[perf_model/dram]
chips_per_dimm = 8
controller_positions = ""
controllers_interleaving = 4
dimms_per_controller = 3
latency = 45
num_controllers = -1
per_controller_bandwidth = 16

[perf_model/dram/queue_model]
enabled = "true"
type = "history_list"

[perf_model/dram_directory]
associativity = 16
directory_cache_access_time = 10
directory_type = "full_map"
home_lookup_param = 6
max_hw_sharers = 64
total_entries = 1048576

[perf_model/dram_directory/limitless]
software_trap_penalty = 200

[perf_model/dtlb]
associativity = 4
size = 1024

[perf_model/itlb]
associativity = 4
size = 256

[perf_model/l1_dcache]
associativity = 8
cache_block_size = 64
cache_size = 32
data_access_time = 4
dvfs_domain = "core"
enable = "true"
perf_model_type = "parallel"
prefetcher = "false"
replacement_policy = "lru"
shared_cores = 1
tags_access_time = 1
writeback_time = 0
writethrough = 0

[perf_model/l1_icache]
associativity = 4
cache_block_size = 64
cache_size = 32
data_access_time = 4
dvfs_domain = "core"
enable = "true"
perf_model_type = "parallel"
prefetcher = "false"
replacement_policy = "lru"
shared_cores = 1
tags_access_time = 1
writeback_time = 0
writethrough = 0

[perf_model/l2_cache]
associativity = 8
cache_block_size = 64
cache_size = 256
data_access_time = 8
dvfs_domain = "core"
enable = "true"
perf_model_type = "parallel"
prefetcher = "false"
replacement_policy = "lru"
shared_cores = 1
tags_access_time = 3
writeback_time = 50
writethrough = 0

[perf_model/l3_cache]
associativity = 16
cache_size = 8192
data_access_time = 30
dvfs_domain = "global"
perf_model_type = "parallel"
prefetcher = "false"
replacement_policy = "lru"
shared_cores = 4
tags_access_time = 10
writeback_time = 0
writethrough = 0

[perf_model/sync]
reschedule_cost = 1000

[perf_model/tlb]
penalty = 0

[power]
technology_node = 45
vdd = 1.2

[process_map]
process0 = "127.0.0.1"
process1 = "127.0.0.1"
process10 = "127.0.0.1"
process11 = "127.0.0.1"
process12 = "127.0.0.1"
process13 = "127.0.0.1"
process14 = "127.0.0.1"
process15 = "127.0.0.1"
process16 = "127.0.0.1"
process2 = "127.0.0.1"
process3 = "127.0.0.1"
process4 = "127.0.0.1"
process5 = "127.0.0.1"
process6 = "127.0.0.1"
process7 = "127.0.0.1"
process8 = "127.0.0.1"
process9 = "127.0.0.1"

[progress_trace]
enabled = "false"
filename = ""
interval = 5000

[queue_model]

[queue_model/basic]
moving_avg_enabled = "true"
moving_avg_type = "arithmetic_mean"
moving_avg_window_size = 1024

[queue_model/history_list]
analytical_model_enabled = "true"
max_list_size = 100

[stack]
stack_base = 2415919104
stack_size_per_core = 2097152

[transport]
base_port = 49700
---------------------------------------------------------------------------------------------------

Wim Heirman

Jan 19, 2012, 4:47:40 PM
to snip...@googlegroups.com
Can you edit sniper/common/performance_model/performance_model.cc,
line 52, and change m_basic_block_queue(16) to
m_basic_block_queue(128)?
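
For readers wondering why a larger constructor argument helps: the assertion in the backtrace fires when push() is called on a queue that is already full. Below is a minimal sketch of a fixed-capacity ring buffer that behaves this way. It is not the code in Sniper's circular_queue.h; the names BoundedQueue and max_burst_without_pop are hypothetical and exist only for illustration. It assumes the producer can run ahead of the consumer by some number of entries, so the capacity bounds how large that burst may get.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer (illustration only, not Sniper's class).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity)
        : m_slots(capacity), m_head(0), m_size(0) {}

    bool empty() const { return m_size == 0; }
    bool full() const { return m_size == m_slots.size(); }
    std::size_t size() const { return m_size; }

    // Pushing into a full queue is treated as a programming error,
    // which mirrors the `!full()' assertion reported above.
    void push(const T& value) {
        assert(!full());
        m_slots[(m_head + m_size) % m_slots.size()] = value;
        ++m_size;
    }

    // Removes and returns the oldest element.
    T pop() {
        assert(!empty());
        T value = m_slots[m_head];
        m_head = (m_head + 1) % m_slots.size();
        --m_size;
        return value;
    }

private:
    std::vector<T> m_slots;  // storage, sized once at construction
    std::size_t m_head;      // index of the oldest element
    std::size_t m_size;      // number of elements currently stored
};

// Counts how many items can be pushed back-to-back, with no pop in between,
// before the queue reports full.
inline std::size_t max_burst_without_pop(std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    std::size_t pushed = 0;
    while (!queue.full()) {
        queue.push(static_cast<int>(pushed));
        ++pushed;
    }
    return pushed;
}
```

In this sketch a capacity of 16 tolerates a burst of 16 pushes and a capacity of 128 tolerates 128, which is the kind of headroom the suggested change buys.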

-Wim

TG

Jan 21, 2012, 1:50:04 AM
to Sniper simulator
Works wonders!
Thanks Wim.

Diana

Feb 13, 2012, 7:27:22 PM
to Sniper simulator
Hi Wim and TG,

I followed the instructions at http://snipersim.org/w/Download_Benchmarks
but when I run make under BENCHMARKS_ROOT, I get the following error. Do
you know how to resolve this?

make[3]: Entering directory `path/to/sniper/benchmarks/splash2/splash2/codes/apps/raytrace'
gcc bbox.o-opt cr.o-opt env.o-opt fbuf.o-opt geo.o-opt huprn.o-opt husetup.o-opt hutv.o-opt isect.o-opt main.o-opt matrix.o-opt memory.o-opt poly.o-opt raystack.o-opt shade.o-opt sph.o-opt trace.o-opt tri.o-opt workpool.o-opt -g -O3 -I/path/to/sniper/benchmarks/splash2/splash2/codes -I/path/to/sniper/benchmarks/splash2/splash2/codes/pthreads -I/path/to/sniper/benchmarks/tools/hooks -I/path/to/sniper//include -DUSE_LOCK_INC -o RAYTRACE.opt -lm -pthread -uparmacs_roi_end -uparmacs_roi_start -L/path/to/sniper/benchmarks/tools/hooks -lhooks_base -lrt -pthread
shade.o-opt: In function `Shade':
path/to/sniper/benchmarks/splash2/splash2/codes/apps/raytrace/shade.c:202: undefined reference to `__sync_fetch_and_add'
path/to/sniper/benchmarks/splash2/splash2/codes/apps/raytrace/shade.c:279: undefined reference to `__sync_fetch_and_add'
path/to/sniper/benchmarks/splash2/splash2/codes/apps/raytrace/shade.c:297: undefined reference to `__sync_fetch_and_add'
trace.o-opt: In function `ConvertPrimRayJobToRayMsg':
path/to/sniper/benchmarks/splash2/splash2/codes/apps/raytrace/trace.c:184: undefined reference to `__sync_fetch_and_add'
collect2: ld returned 1 exit status
make[3]: *** [RAYTRACE.opt] Error 1
make[3]: Leaving directory `path/to/sniper/benchmarks/splash2/splash2/codes/apps/raytrace'
make[2]: *** [all] Error 255
make[2]: Leaving directory `path/to/sniper/benchmarks/splash2/splash2/codes'
make[1]: *** [splash2-build] Error 2
make[1]: Leaving directory `path/to/sniper/benchmarks/splash2'
make: *** [all] Error 2

Thanks a lot!

Diana


Wim Heirman

Feb 14, 2012, 3:30:26 AM
to snip...@googlegroups.com
Hi Diana,

Which version of GCC are you using, and is this on a 32-bit or a
64-bit platform?

Regards,
Wim

Wim Heirman

Feb 14, 2012, 4:06:51 AM
to snip...@googlegroups.com
Diana,

It looks like the __sync_fetch_and_add problem occurs when you compile
for a 32-bit architecture, using the default i386 target. Adding
-march=i486 to CFLAGS in splash2/codes/Makefile should solve this.
Alternatively, you can just disable the raytrace.opt benchmark by
removing $(TARGET).opt from the list next to "all:" near the bottom of
splash2/codes/apps/raytrace/makefile
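
For context, __sync_fetch_and_add is a GCC builtin for an atomic read-modify-write rather than a library function, so an "undefined reference" at link time suggests the compiler could not expand it inline for the selected target and emitted a call to an external symbol instead. Below is a minimal sketch of what the builtin does. The helper name take_ticket is hypothetical, and this is not the code from shade.c or trace.c.

```cpp
#include <cassert>

// Hypothetical helper: atomically adds 1 to *counter and returns the value
// the counter held before the addition.
// __sync_fetch_and_add is a builtin provided by GCC (and Clang).
inline long take_ticket(long* counter) {
    return __sync_fetch_and_add(counter, 1L);
}
```

When built for x86-64, or for 32-bit x86 with -march=i486 or newer, a call like this typically compiles to a single locked instruction and needs no extra library.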

However, we usually don't run Sniper in 32-bit mode, so I'm not even
sure it will work. Is it an option for you to use a 64-bit machine
instead?

Regards,
Wim

Xiao Guo

Feb 14, 2012, 12:01:17 PM
to snip...@googlegroups.com
Hi Wim,

I am using gcc44 and the machine I am running on is 64-bit. The --march=i486 option did not work, so I'll just leave out $(TARGET).opt for the moment.
Would it be possible for you to release the other benchmark suites on the website? For example, PARSEC with the simulator hooks and run scripts.

Thanks a lot!


Diana

Wim Heirman

Feb 19, 2012, 12:08:03 PM
to snip...@googlegroups.com
Hi Diana,

I've added PARSEC to the benchmarks distribution. Let me know if you
have any problems building or running it.
The other suite in our IISWC paper is Rodinia [1], but they released
version 2.0 recently, so it doesn't make much sense for me to release
our integration of Rodinia 1.0. We'll probably upgrade internally, and
I'll try to release that as well. (Or, if you want to write your own
integration scripts and contribute them back, I'll gladly add them to
our distribution.)

Regards,
Wim


[1] https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Main_Page

Xiao Guo

Feb 20, 2012, 5:07:12 PM
to snip...@googlegroups.com
Hi Wim,

Thanks for your quick response. I've tried to compile the PARSEC benchmarks, but parsec/checkdependencies.py seems to be for Debian systems only. Does Sniper require the same operating system as Graphite? The machine I am currently running on is Red Hat; I could build a virtual machine if Debian is required. Thanks!


best
xiao

Xiao Guo
School of Engineering and Applied Sciences
GSAS, Harvard
Email: xia...@fas.harvard.edu
Tel: 617-548-1456

Wim Heirman

Feb 21, 2012, 8:29:15 AM
to snip...@googlegroups.com
Hi Xiao,

We aim to support RedHat as well, although we do most of our
development on Ubuntu so I guess that's slightly better supported. I
made some updates for the dependency script, I'm now able to
successfully compile all Parsec applications on a CentOS 6 host. I
didn't try running them on Sniper yet so let me know if that fails.
Also, keep in mind that some Parsec applications do very weird things
for which we haven't been able to find a working solution yet, so only
the applications listed in our IISWC paper are supposed to work.

Regards,
Wim

Xiao Guo

Feb 22, 2012, 4:44:35 PM
to snip...@googlegroups.com
Hi Wim,

The PARSEC benchmarks work well on our machine now. Thanks a lot!
I have two other questions.

(1) Is it possible to run PARSEC/SPLASH-2 on multiple machines? It seems to me that PARSEC/SPLASH-2 can only run in graphite-lite mode because of unimplemented system calls, and graphite-lite mode does not support multiple machines.

(2) Is it possible to run multiprogrammed SPEC CPU2006 workloads on Sniper?


best regards

Wim Heirman

Feb 22, 2012, 6:07:33 PM
to snip...@googlegroups.com
Hi Xiao,

(1) For Splash2, and a few of the Parsec benchmarks, it's possible to
implement sufficient system calls to make them work in full mode.
(Make sure to compile them in a relatively old environment, such as
Debian/Lenny, as the newer glibc libraries end up using a much larger
set of system calls). We spent a significant amount of time trying to
get more Parsec benchmarks working, but every additional one required
so much extra effort that we didn't feel it was worth it in the end.
Lite mode runs faster, and you can still simulate several hundred
cores on a single machine (assuming it has sufficient memory - but say
48 GB should get you up to 256 cores and isn't very expensive these
days).

(2) Graphite/Sniper is Pin-based, and Pin works at the single-process
level, so it's not possible out of the box. Since single-threaded
workloads usually aren't affected by timing, it should be possible
though to just dump an instruction stream from each application, and
then feed several of them (from a file, or even on-line through
inter-process communication), each into the BasicBlockQueue and
DynamicInstructionInfoQueue of its own core. No timing or
synchronization information needs to be fed back to the process, so I
don't think there would be much more to it (as long as your application
doesn't directly request the current time through an rdtsc instruction
or a SYS_clock system call, but you probably don't care about that
being correct since it would only be used for reporting, not for
something that could affect the application's control flow).
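
A rough sketch of that idea, under heavy assumptions: each application has already been reduced to a pre-recorded instruction stream, and a feeder hands the records of stream k to the input queue of core k. TraceRecord, CoreQueue and feed_round_robin are hypothetical names made up for this illustration. They do not correspond to the actual interfaces of Sniper's BasicBlockQueue or DynamicInstructionInfoQueue, and the timing model that would drain the queues is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical record describing one traced instruction.
struct TraceRecord {
    std::uint64_t pc;           // instruction address
    std::uint64_t mem_address;  // accessed data address, 0 if none
};

// Hypothetical stand-in for the per-core input queue of the timing model.
using CoreQueue = std::deque<TraceRecord>;

// Moves up to `batch` records from each application's trace into the queue
// of the core with the same index, visiting the cores in order.
// Returns the total number of records moved in this pass.
inline std::size_t feed_round_robin(std::vector<std::deque<TraceRecord>>& traces,
                                    std::vector<CoreQueue>& cores,
                                    std::size_t batch) {
    assert(traces.size() == cores.size());
    std::size_t moved = 0;
    for (std::size_t core = 0; core < traces.size(); ++core) {
        for (std::size_t n = 0; n < batch && !traces[core].empty(); ++n) {
            cores[core].push_back(traces[core].front());
            traces[core].pop_front();
            ++moved;
        }
    }
    return moved;
}
```

Nothing flows back from the cores to the traces in this sketch, which matches the point that no timing or synchronization information has to be returned to the applications.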

Regards,
Wim
