SST/macro + SST/micro

Tommy

Apr 18, 2014, 10:30:34 AM

to SST-si...@googlegroups.com

Hi,

Is it possible to run SST/macro and SST/micro together, e.g., using SST/micro for a single node simulation?
I am thinking about running the NAS Parallel Benchmarks (NPB).
https://www.nas.nasa.gov/publications/npb.html
It would be great if there were a small example I could follow.
Thanks.

Tommy

Hammond, Simon David (-EXP)

Apr 18, 2014, 10:50:49 AM

to SST-si...@googlegroups.com

Not in the sense of a single piece of software, but definitely as part of a workflow, yes. You can get kernel times from a Micro simulation and then use them to parameterize a Macro skeleton.

We are working on a method to unify the simulators (at the software level), but this won't be ready for some time.
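To make the hand-off concrete, here is a minimal sketch of one way kernel times measured elsewhere (for example, in an SST/micro run) could be stored and read back for a skeleton. The `kernel_name = seconds` file format, the function `parse_kernel_times`, and the kernel names are all hypothetical; this is not SST/macro's documented parameter syntax, and the numbers are placeholders, not measurements.

```python
# Hypothetical parameter format: one "kernel_name = seconds" entry per line,
# with "#" starting a comment. NOT SST/macro's documented syntax; it only
# illustrates handing externally measured kernel times to a skeleton.
def parse_kernel_times(text):
    """Return a dict mapping kernel name -> time in seconds."""
    times = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        times[name] = float(value)
    return times


example = """
# per-call kernel times from one SST/micro run (placeholder values)
stencil_update = 2.5e-3
local_reduce   = 4.0e-4
"""
print(parse_kernel_times(example))
```

Keeping the times in a separate file means the skeleton itself never changes when a new Micro configuration produces new numbers.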

S

--
Si Hammond
Sandia National Laboratories
Remote Connection

--
You received this message because you are subscribed to the Google Groups "Structural Simulation Toolkit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to SST-simulato...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Saurabh Gupta

May 1, 2014, 2:20:43 PM

to SST-si...@googlegroups.com, sdh...@sandia.gov

Hi,

I am interested in the workflow you described. I want to study how changes in network topology and bandwidth affect an application's total runtime. One way is to run SST/micro simulations with different network topologies, but the application is computation-heavy and I would prefer to fast-forward through the computation instead of simulating it in detail. SST/macro seems to be the solution.

So how do I use a parameterized Macro skeleton? That is, the Macro simulation should account for the time spent in the computation omitted from the skeleton, and, as you suggest, I should obtain that time from a Micro simulation somehow and put it in a parameter file?

-Saurabh

Hammond, Simon David (-EXP)

May 1, 2014, 2:27:02 PM

to Saurabh Gupta, SST-si...@googlegroups.com

Does the application use a conventional MPI communication template, e.g.
a 2D halo exchange? If so, there are lightweight methods to achieve this
in SST/Micro.
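As an illustration of what a "conventional communication template" means, the sketch below computes, for a 2D halo exchange, which ranks each process swaps boundary data with. The pattern depends only on the process-grid shape, which is why such templates are cheap to model. The function name and row-major rank layout are assumptions for illustration; an MPI code would typically obtain these neighbours from `MPI_Cart_create` and `MPI_Cart_shift`. This does not describe the specific SST/Micro methods mentioned above.

```python
def halo_neighbors(rank, px, py, periodic=False):
    """Neighbours of `rank` on a px-by-py process grid (row-major ranks).

    Returns a dict with keys 'north', 'south', 'west', 'east'; a value of
    None means there is no neighbour on that side (non-periodic boundary).
    """
    if not 0 <= rank < px * py:
        raise ValueError("rank outside the process grid")
    x, y = rank % px, rank // px

    def to_rank(nx, ny):
        if periodic:
            nx, ny = nx % px, ny % py  # wrap around the grid edges
        elif not (0 <= nx < px and 0 <= ny < py):
            return None                # off the edge: no neighbour
        return ny * px + nx

    return {
        "north": to_rank(x, y - 1),
        "south": to_rank(x, y + 1),
        "west": to_rank(x - 1, y),
        "east": to_rank(x + 1, y),
    }


# The centre rank of a 3x3 grid exchanges halos with all four neighbours.
print(halo_neighbors(4, px=3, py=3))
# → {'north': 1, 'south': 7, 'west': 3, 'east': 5}
```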

SST/Macro will allow you to run skeletons, but you will have to manually
skeletonize the code if you want a truly high-performance solution. If you
do develop a Macro skeleton, you can instrument sections of your real
application with timers and use the measured kernel times in the
skeleton. If that works acceptably and gives you reasonably
reliable/validated results, you would then be able to obtain these kernel
timings by running the kernels on various SST/Micro configurations and
use them in the skeleton as parameters.
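A minimal sketch of instrumenting kernels with timers, assuming a hypothetical kernel `stencil_update`. Python is used only for brevity; in a C/C++ or Fortran MPI code, `MPI_Wtime()` around each kernel serves the same purpose. The per-call average printed at the end is the kind of number that would later be plugged into the skeleton.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts, keyed by kernel name.
kernel_seconds = defaultdict(float)
kernel_calls = defaultdict(int)


@contextmanager
def timed(kernel_name):
    """Time the enclosed block and add it to the totals for kernel_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        kernel_seconds[kernel_name] += time.perf_counter() - start
        kernel_calls[kernel_name] += 1


def stencil_update(grid):
    # Hypothetical stand-in for a real computational kernel.
    return [0.5 * (a + b) for a, b in zip(grid, grid[1:] + grid[:1])]


grid = [float(i) for i in range(10_000)]
for step in range(5):
    with timed("stencil_update"):
        grid = stencil_update(grid)
    # ... MPI communication would sit here, outside the timed region ...

for name, total in kernel_seconds.items():
    print(f"{name}: {total / kernel_calls[name]:.6f} s per call")
```

Keeping communication outside the timed regions matters: the skeleton's network model should account for communication, so only pure computation belongs in the recorded kernel times.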

Is that the kind of thing you are looking to do?

--
Simon Hammond
Scalable Computer Architectures (CSRI/111, 01422)
Sandia National Laboratories, NM, USA

Saurabh Gupta

May 1, 2014, 2:35:32 PM

to Hammond, Simon David (-EXP), SST-si...@googlegroups.com

Yes, I think so. I would greatly appreciate it if you could point me to a resource that describes skipping the detailed simulation of the computation part of my application (i.e., avoiding execution-driven simulation of the compute nodes and focusing mainly on simulating the network topology and the file system).

When using SST/macro, how does inserting timers in place of the computation in the application help speed up the simulation?

Thank you.

Hammond, Simon David (-EXP)

May 1, 2014, 3:32:06 PM

to Saurabh Gupta, SST-si...@googlegroups.com

Hi Saurabh,

The way to run a skeletonized simulation is basically the following:

(1) Take your real application code, identify the computational kernels, and add timers around them so that you get a per-kernel time.

(2) Make a skeleton of your application code, i.e. keep the MPI calls as they appear in the real code and, in place of each computational kernel, add a call to SST/Macro's function for advancing simulated time. This replaces the actual kernel code with a statement that skips the simulation time forward.

(3) Benchmark the application and record the kernel times.

(4) Run the skeleton in the simulation, with the recorded kernel times merged into the appropriate statements in the skeleton.
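Steps (2) and (4) can be pictured with a toy model of a simulator clock. Everything here is hypothetical: `SimClock.advance` only plays the role of SST/macro's time-advance call (its real name and signature are not given above), the kernel names are invented, and the latency/bandwidth halo cost is a crude placeholder for what a real network model would compute. It also shows why the approach is fast: each skipped kernel costs one constant-time clock update, no matter how long the kernel ran on real hardware.

```python
class SimClock:
    """Toy stand-in for a simulator's notion of simulated time (hypothetical)."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        # Plays the role of SST/macro's "advance the time" call: no kernel
        # code runs, the clock just jumps forward by the recorded kernel time.
        if seconds < 0:
            raise ValueError("cannot advance by a negative time")
        self.now += seconds


def halo_exchange(clock, nbytes, latency=1e-6, bandwidth=1e10):
    # Crude latency + size/bandwidth placeholder for the network; a real
    # skeleton would issue MPI calls and let the network model price them.
    clock.advance(latency + nbytes / bandwidth)


def skeleton(kernel_times, steps, halo_bytes):
    """Timestep loop of a hypothetical code with each kernel replaced by advance()."""
    clock = SimClock()
    for _ in range(steps):
        halo_exchange(clock, halo_bytes)               # kept from the real code
        clock.advance(kernel_times["stencil_update"])  # was: the stencil kernel
        clock.advance(kernel_times["local_reduce"])    # was: the reduction kernel
    return clock.now


# Step (4): merge recorded per-call kernel times (placeholder values here).
recorded = {"stencil_update": 2.5e-3, "local_reduce": 4.0e-4}
print(f"predicted runtime: {skeleton(recorded, steps=100, halo_bytes=8 * 1024):.4f} s")
```

With these placeholder numbers, the predicted runtime is about 0.2902 s.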

Once you have that working, you can use SST/Micro to generate the kernel times and repeat step 4 as needed. Right now this is a fairly manual process, and it can get a bit messy. However, it is usually a much more accurate and scalable approach to simulating large codes.
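Repeating step 4 amounts to a sweep: one set of kernel times per SST/Micro configuration, with the same skeleton rerun for each. The sketch below uses a deliberately simplified skeleton model (every step costs a fixed communication time plus the sum of the kernel times). The configuration names, kernel names and times are hypothetical placeholders.

```python
def predicted_runtime(kernel_times, steps, comm_time_per_step):
    """Simplified skeleton: each step costs communication plus every kernel's time."""
    return steps * (comm_time_per_step + sum(kernel_times.values()))


# Per-call kernel times, one dict per SST/micro configuration (placeholders).
micro_runs = {
    "baseline_core": {"stencil_update": 2.5e-3, "local_reduce": 4.0e-4},
    "wider_simd": {"stencil_update": 1.5e-3, "local_reduce": 3.0e-4},
}

for config, times in micro_runs.items():
    t = predicted_runtime(times, steps=100, comm_time_per_step=2.0e-6)
    print(f"{config}: {t:.4f} s")
```

Because only the kernel-time table changes between runs, the skeleton and its communication structure stay fixed, which keeps comparisons across node configurations consistent.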

S.

--

Simon Hammond

Scalable Computer Architectures (CSRI/111, 01422)

Sandia National Laboratories, NM, USA