Techniques for increasing simulation speed

Hamid Reza

Nov 30, 2013, 11:31:09 AM
to snip...@googlegroups.com
Hi all,

I have installed Sniper and simulated a quad-core machine with two-level caches, but I found that Sniper's simulation speed isn't very high. Could you tell me how the simulation speed of Sniper can be increased without affecting simulation accuracy too much?

Thanks for your answers in advance.

Wim Heirman

Dec 1, 2013, 5:12:27 AM
to snip...@googlegroups.com
Hamid,

In addition to sampling [1], there are a few options you can try. One is workload reduction, for instance using a smaller input set (although this may affect working set sizes!) or reducing the number of iterations if you have an iteration-based application. These are accepted practices, so you should be able to find a significant amount of academic literature on the methods and trade-offs involved.

In Sniper itself, there are a few time-consuming models that you can turn off, but that of course requires that you already know how your benchmark behaves, so you can gauge how this will affect accuracy. Instruction cache modeling is fairly slow, so if your application has a small code footprint and isn't generally affected by I-cache misses (check the CPI stack of a full simulation first), you can save some time by not simulating I-cache accesses (-ggeneral/enable_icache_modeling=false). The same holds for branch prediction (use -gperf_model/branch_predictor/type=none to turn it off).
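
For example, a complete invocation with both models disabled could look like the sketch below (the core count and the benchmark command are placeholders for your own setup):

    # Hypothetical run: I-cache modeling and branch prediction disabled.
    # Core count (-n 4) and ./my_benchmark are placeholders.
    ./run-sniper -n 4 \
        -ggeneral/enable_icache_modeling=false \
        -gperf_model/branch_predictor/type=none \
        -- ./my_benchmark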

Sniper 5.3 added a new simulation mode (cache-only, enabled with the -ccacheonly configuration option) in which only the caches and branch predictors are simulated, along with a first-order timing model (one-IPC plus cache and branch latencies). If you don't care about absolute runtime but only about miss rates, this could be an option for you as well.
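
Concretely, cache-only is just an extra configuration stacked on top of your usual one, along these lines (gainestown stands in for whatever base configuration you normally use):

    # Hypothetical run: cache-only mode on top of a base configuration.
    ./run-sniper -n 4 -c gainestown -c cacheonly -- ./my_benchmark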

As for how much faster you'll be able to make simulation using these techniques, that depends a lot on how well you know your application, and on what accuracy you need or are willing to give up. Reducing the application itself can give you a good speedup (10x or more), but requires good insight into application behavior so you know which parts are relevant. Using cache-only mode should give you roughly 10x, but you'll lose most of the timing information; this method is probably better suited for validation (e.g., making sure that your cache hit rates don't change too much after changing the input set). Disabling individual simulation models (I-cache, branch prediction) won't gain you more than a few tens of percent.

Regards,
Wim




Hamid Reza Khaleghzadeh

Dec 2, 2013, 11:49:14 AM
to snip...@googlegroups.com
Hi Wim,

Thanks for your useful answer. However, with time-based sampling I don't know how the sampling interval should be set so that accuracy is not affected too much. Could you tell me how to determine the sampling interval in practice?

Best regards



Wim Heirman

Dec 8, 2013, 5:51:28 AM
to snip...@googlegroups.com
Hamid,

In our sampling paper we used a Pin tool that computes the autocorrelation of basic block vectors (BBVs), which produced Figure 6 and lets you read off the application's periodicities. We have not yet released this Pin tool; we plan to do so eventually, but can't give an expected time frame yet.

For now, you can try out a few parameter settings and compare the results with a detailed simulation to make sure the differences are not too large. Once you have found valid parameters for an application, you can use them to simulate that application on other architectures as well (i.e., the parameters are workload-dependent but architecture-independent).

As a starting point, you'll probably want to set sampling/periodic/fastforward_interval=0 to make sure the caches are always enabled (this is what we used in our ISPASS paper as well; otherwise you'll experience warmup error on top of the sampling error). Then pick a fast-forward-to-detailed ratio; we used either 5x or 10x, which should be reasonable. This will be the ratio between warmup_interval and detailed_interval. Finally, you can sweep over the detailed interval length: start at 10k ns (lower does not make much sense), and go up to maybe 10M or 100M ns.
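
Putting those numbers together, a sweep could look something like the sketch below. Note that sampling/enabled=true is my assumption for how the sampling code is switched on; please check the configuration files of your Sniper version for the exact option name. The benchmark command is again a placeholder.

    # Hypothetical sweep over the detailed interval length (values in ns),
    # with fast-forwarding disabled and a 10x warmup-to-detailed ratio.
    # sampling/enabled=true is an assumption; verify it for your version.
    # -d selects a per-run output directory.
    for DETAILED in 10000 100000 1000000 10000000; do
        ./run-sniper -n 4 -d results-$DETAILED \
            -gsampling/enabled=true \
            -gsampling/periodic/fastforward_interval=0 \
            -gsampling/periodic/warmup_interval=$((10 * DETAILED)) \
            -gsampling/periodic/detailed_interval=$DETAILED \
            -- ./my_benchmark
    done

You can then compare each run (e.g., its CPI stack) against the full detailed simulation to see at which interval length the error becomes acceptable.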

Regards,
Wim