Power-/Thermal-Aware Scheduler in Sniper

Santiago Pagani

Mar 16, 2016, 7:25:48 AM

to Sniper simulator

Hi,

I'm working on implementing a power-/thermal-aware scheduler inside Sniper.
I have already modified the energystats.py script to output the power information from McPAT that I'm interested in.
In order to also have periodic power data rather than just the average, I also load the stattrace.py script when running Sniper. I don't actually need any of its statistics, but without that script McPAT is executed only once and I only get average power.
I then execute: ./run-sniper -p splash2-fft -i test -n 2 -c gainestown -senergystats -sstattrace:core.energy-dynamic

What I want to do now is to read this power information inside Sniper, so I can also integrate it with HotSpot and make runtime scheduling decisions based on power and temperature.
Is there a simple way in which I can read the power data I have in the Python script from inside Sniper's C++ code?
I've seen some interfaces for such interaction in the topic of DVFS, but not on power.

Thanks and best,
Santiago

Wim Heirman

Mar 18, 2016, 6:29:26 AM

to snip...@googlegroups.com

Santiago,

The energystats.py script makes calls to sim.stats.register(), which register the energy counters as regular Sniper statistics. You can read these from C++ code using something like Sim()->getStatsManager()->getMetricObject("core", coreid, "energy-dynamic")->recordMetric().
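
As a rough sketch of what can be done with such a counter once it has been read: sample it at two points in time and divide the difference by the elapsed time to get average power. The helper name and the units (femtojoules and femtoseconds) are assumptions made for illustration and are not taken from the thread; inside Sniper the two samples would come from the recordMetric() call above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average power over an interval, computed from two
// samples of a cumulative energy counter.
// Assumed units: energy in femtojoules (fJ) and time in femtoseconds (fs),
// so that fJ / fs yields watts directly.
double averagePowerWatts(std::uint64_t energy_prev_fj,
                         std::uint64_t energy_now_fj,
                         std::uint64_t interval_fs)
{
    assert(energy_now_fj >= energy_prev_fj); // the counter only increases
    if (interval_fs == 0)
        return 0.0; // no time elapsed, nothing to report
    return static_cast<double>(energy_now_fj - energy_prev_fj) /
           static_cast<double>(interval_fs);
}
```

The caller keeps the previous sample per core and passes it in on the next invocation; if the counter uses different units, only the scaling changes.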

Regards,

Wim

--
--
--
You received this message because you are subscribed to the Google
Groups "Sniper simulator" group.
To post to this group, send email to snip...@googlegroups.com
To unsubscribe from this group, send email to
snipersim+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/snipersim?hl=en

---
You received this message because you are subscribed to the Google Groups "Sniper simulator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snipersim+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Santiago Pagani

Mar 24, 2016, 1:12:33 PM

to Sniper simulator

Hi Wim,

Thanks for your answer. I managed to do what you suggested and also to integrate HotSpot inside Sniper after reading the power information.
On a related topic, I would like to be able to know the current CPI of each core inside my scheduler, and also make decisions based on such metric (particularly, if CPI and power consumption are tightly related).

What would you suggest as the best way of doing so (not inside a Python script, but also inside Sniper)?
Right now I'm looking into counting the number of instructions executed in each core at periodic intervals, and using the elapsed time and core frequency to estimate the CPI.
However, there might be already some metric that I can use or that has more information which I can also read using the recordMetric() method.

Thanks and best,
Santiago

Wim Heirman

Mar 24, 2016, 1:17:33 PM

to snip...@googlegroups.com

Hi Santiago,

No, what you describe is the way to do it. None of the counters or statistics in Sniper measure computed values such as IPC; the idea is that we track only monotonically increasing things like instruction and cycle counts, and leave it up to the user to compute changes over time and other derived metrics.
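
A minimal sketch of that approach, assuming the instruction count is sampled once per interval and the core frequency stays constant within the interval. The class name and the choice of nanoseconds and MHz as units are illustrative assumptions, not part of Sniper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-core tracker that derives CPI over an interval from a
// cumulative (monotonically increasing) instruction count.
class CpiTracker
{
public:
    // instructions: cumulative instruction count of the core
    // elapsed_ns:   length of the interval in nanoseconds
    // freq_mhz:     core frequency during the interval, in MHz
    // Returns cycles per instruction, or 0.0 if no instruction was executed.
    double update(std::uint64_t instructions, double elapsed_ns, double freq_mhz)
    {
        assert(instructions >= m_last_instructions);
        const std::uint64_t delta = instructions - m_last_instructions;
        m_last_instructions = instructions;
        if (delta == 0)
            return 0.0;
        // 1 ns at 1000 MHz is one cycle, hence the division by 1000.
        const double cycles = elapsed_ns * freq_mhz / 1000.0;
        return cycles / static_cast<double>(delta);
    }

private:
    std::uint64_t m_last_instructions = 0;
};
```

Note that this counts every cycle of the interval, so time during which the core was idle inflates the result; subtracting idle time first gives the CPI of the busy portion only.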

Regards,

Wim

Santiago Pagani

Mar 25, 2016, 7:31:27 PM

to Sniper simulator

Hi Wim,

Thanks again for your reply. I'm using function "Sim()->getCoreManager()->getCoreFromID(core_id)->getInstructionCount()" to get the number of accumulated instructions executed on "core_id". Is this the correct method?

Finally, I have one complicated question regarding my scheduler.
What I want to do is to evaluate the resulting performance of executing a set of applications when having both a centralized scheduler and a distributed scheduler.
A centralized scheduler has global information about the system, so it can potentially reach better solutions, but the computation time of the scheduling algorithm can grow exponentially and gathering such global information incurs significant communication overhead.
In contrast, a distributed scheduler makes local decisions, so the solution might not be as good, but the computational effort is considerably lower and the approach scales, as only local information needs to be shared.

For the reported performance and timing to be accurate, I would ideally need to account for the computation and communication overheads of each scheduler. More specifically, when the scheduler is running, it should actually be executed on a simulated core, other threads assigned to that core should be stalled, and some information needs to go through the NoC.
However, although I've been implementing the scheduler inside the "SchedulerPinnedBase::periodic(SubsecondTime time)" method, I still haven't figured out a way to simulate such overheads inside Sniper.
Another alternative would be to implement the scheduler as an application (either a single threaded application for the centralized case, or a multi-threaded application for the distributed case), but that would mean some interaction between Sniper and the application, as threads need to be migrated and DVFS levels changed from within Sniper, but such decisions would be made at an application level.

Could you suggest to me what could be the best way to do this?

Best,
Santiago

Santiago Pagani

Mar 27, 2016, 12:33:18 PM

to Sniper simulator

Hi Wim,

I've been reading posts in the group, and it would seem that implementing my scheduler as an external application is the easiest way to account for the scheduling overheads (please stop me here if you don't agree).
There are already a few functions in the sim_api.h file which are very useful (e.g., SimSetFreqMHz() and SimGetFreqMHz()).

In order to get the additional functionality I need, I've extended the magic_client.cc, magic_server.cc, and sim_api.h files. Specifically:
1*- Inside method "UInt64 handleMagicInstruction(thread_id_t thread_id, UInt64 cmd, UInt64 arg0, UInt64 arg1)" of the magic_client.cc file, I added a few more options to the first case:
...
   case SIM_CMD_GET_TEMPERATURE:
   case SIM_CMD_GET_POWER:
   case SIM_CMD_SET_THREAD_AFFINITY:
      return handleMagic(thread_id, cmd, arg0, arg1);

2*- These commands I also defined in the sim_api.h file, along with some new functions:
#define SIM_CMD_GET_TEMPERATURE 15
#define SIM_CMD_GET_POWER 16
#define SIM_CMD_SET_THREAD_AFFINITY 17
#define SimGetTemperature(proc)        SimMagic1(SIM_CMD_GET_TEMPERATURE, proc)
#define SimGetOwnTemperature()        SimGetTemperature(SimGetProcId())
#define SimGetPower(proc)            SimMagic1(SIM_CMD_GET_POWER, proc)
#define SimGetOwnPower()            SimGetPower(SimGetProcId())
#define SimSetThreadAffinity(thread, affinity)    SimMagic2(SIM_CMD_SET_THREAD_AFFINITY, thread, affinity)
#define SimSetOwnAffinity(affinity)    SimSetThreadAffinity(SimGetThreadId(), affinity)

3*- I added the corresponding cases to file magic_server.cc, inside method "UInt64 MagicServer::Magic_unlocked(thread_id_t thread_id, core_id_t core_id, UInt64 cmd, UInt64 arg0, UInt64 arg1)"
      case SIM_CMD_GET_TEMPERATURE:
        if(arg0 < Sim()->getConfig()->getApplicationCores())
            return (UInt64)(Temperature[arg0]*1000000.0);
        else
            return 0;
      case SIM_CMD_GET_POWER:
        if(arg0 < Sim()->getConfig()->getApplicationCores())
            return (UInt64)(Power[arg0]*1000000.0);
        else
            return 0;
      case SIM_CMD_SET_THREAD_AFFINITY:
        if(arg0 < Sim()->getThreadManager()->getNumThreads())
            Sim()->getThreadManager()->getScheduler()->setAffinitySingle(arg0, arg1);
        return 0;
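
Since a magic instruction can only return an integer, the cases above pass temperature and power as fixed-point values scaled by 1000000, and the application has to scale them back. A minimal sketch of both directions, with hypothetical helper names; unlike the plain cast above, this version rounds to the nearest integer instead of truncating.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scale factor matching the 1000000.0 multiplier used in the cases above.
constexpr double kFixedPointScale = 1000000.0;

// Simulator side: pack a non-negative double into a 64-bit return value.
std::uint64_t encodeFixedPoint(double value)
{
    assert(value >= 0.0); // negative values cannot be represented this way
    return static_cast<std::uint64_t>(std::llround(value * kFixedPointScale));
}

// Application side: recover the value from the raw integer, for example
// from the result of SimGetTemperature(core).
double decodeFixedPoint(std::uint64_t raw)
{
    return static_cast<double>(raw) / kFixedPointScale;
}
```

With this scaling the resolution is one millionth of the original unit, and temperatures below zero would need an offset or a signed encoding.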

Everything seems to be working fine. I have a simple scheduler implemented as a benchmark, and I can migrate threads, read the temperature and power of the cores, and read/write the frequency of the cores.
Did I make the correct implementation choices?
I don't really understand the difference between implementing my new magic functions inside the "MagicServer::Magic_unlocked" method or directly inside the "handleMagicInstruction" method, e.g., like SIM_CMD_NUM_PROCS or SIM_CMD_NUM_THREADS. What is the purpose of calling "Sim()->getThreadManager()->getLock()" before calling "MagicServer::Magic_unlocked"?

Finally, now that I can implement my scheduler as an application, I need the different threads of my scheduler application to exchange some information (e.g., for the centralized scheduler, there could be one thread pinned to each core that periodically sends power and temperature information to the central manager). Of the different inter-thread communication mechanisms in Linux, which one is best suited to ensure that Sniper models such communication over the NoC? Can I simply use shared memory?

Thanks and best,
Santiago

Wim Heirman

Mar 29, 2016, 11:31:43 AM

to snip...@googlegroups.com

Hi Santiago,

Just using shared memory between the threads of your scheduler sounds good. Data written by one thread and read by another one will trigger the normal coherency traffic on the mesh, which sounds like what you need.
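
One way this could look in the scheduler application is a per-core mailbox in shared memory that the pinned thread writes and the central manager reads. The sketch below is only an illustration: the struct, the function names, the micro-unit fields and the 64-byte cache-line alignment are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-core mailbox shared by all scheduler threads.
// alignas(64) assumes 64-byte cache lines, so each mailbox occupies its own
// line and an update only invalidates that line in the reader's cache.
struct alignas(64) CoreReport
{
    std::atomic<std::uint64_t> sequence{0};       // incremented after each update
    std::atomic<std::uint64_t> power_uw{0};       // power in microwatts
    std::atomic<std::uint64_t> temperature_uc{0}; // temperature in micro-degrees
};

// Worker side: publish the latest readings of the local core.
void publishReport(CoreReport& box, std::uint64_t power_uw,
                   std::uint64_t temperature_uc)
{
    box.power_uw.store(power_uw, std::memory_order_relaxed);
    box.temperature_uc.store(temperature_uc, std::memory_order_relaxed);
    box.sequence.fetch_add(1, std::memory_order_release);
}

// Manager side: returns true and fills the outputs if the mailbox has been
// updated since last_seen. A reader racing with a writer may see values from
// two consecutive updates, which is assumed acceptable for sensor data.
bool readReport(const CoreReport& box, std::uint64_t& last_seen,
                std::uint64_t& power_uw, std::uint64_t& temperature_uc)
{
    const std::uint64_t seq = box.sequence.load(std::memory_order_acquire);
    if (seq == last_seen)
        return false;
    power_uw = box.power_uw.load(std::memory_order_relaxed);
    temperature_uc = box.temperature_uc.load(std::memory_order_relaxed);
    last_seen = seq;
    return true;
}
```

The manager would keep one last_seen value per core and poll the array of mailboxes in its periodic loop; each poll that finds new data pulls the cache line from the writer's core.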

The implementation with magic instructions is exactly what I would have recommended so you're definitely on the right track. The lock is needed in case multiple threads execute magic instructions simultaneously. Some things like reading from a global Temperature array will work fine without the lock, but to do things in the scheduler such as affinity calls you need to acquire the lock to prevent data races.

Regards,

Wim

Santiago Pagani

Mar 30, 2016, 7:26:41 AM

to Sniper simulator

Hi Wim,

Thank you very much for your reply. Sniper's support is awesome.
I just have a couple more general questions, which are hopefully going to be the last ones:

1- I'm using function "Sim()->getCoreManager()->getCoreFromID(core_id)->getInstructionCount()" to get the number of accumulated instructions executed on "core_id". Is this the correct function?

2- How can I know the active time of individual cores during a time interval? I want to compute core utilization at runtime, so I basically need to know how long each core was active and the duration of the interval. I'm computing this inside the "SchedulerPinnedBase::periodic" function, so the duration of the interval I can get from "SubsecondTime delta = time - m_last_periodic;". However, I'm not sure of the correct way to determine how long each core was active in this interval.

3- Given that I implemented my task-to-core mapping and DVFS algorithms as an additional application, I now rely on the round-robin scheduler to schedule the tasks assigned to each core. Each core has one mapping/DVFS task running, plus the actual application that needs to be executed. I naturally don't want the mapping/DVFS task to be constantly executed, but I would like to ensure that it is executed at some given rate (for example, every 1ms). In order for the mapping/DVFS tasks not to be constantly executed, I introduced a sleep function "usleep( 1000 );", but I'm not sure if there is a better way, because of course this is only a lower bound. To ensure that the mapping/DVFS tasks get periodically scheduled, I most likely need to make some changes inside Sniper. Given that function "periodic" calls "reschedule(time, core_id, true);" for every core, my idea was to ensure that my mapping/DVFS tasks get the highest score in the "for(thread_id_t thread_id = 0; thread_id < (thread_id_t)m_threads_runnable.size(); ++thread_id)" loop, such that when testing "if (current_thread_id != new_thread_id)" the mapping/DVFS thread id is in "new_thread_id" every 1ms. Is this the correct line of thought?

4- What method would you recommend to measure the execution time of individual applications? For example, I pass many different applications through the command line, and rather than being interested in the total execution time I can read from sim.out, I'm interested in the execution time of each individual application.

5- In line with question "4", I'm observing that the value returned by "SimGetNumThreads()" changes when an application spawns a new thread, but not when a thread finishes. I would assume that this is done because the thread IDs are somehow related to their position in the vector and you don't want threads to disappear from it. Or is it something entirely different? How can I tell that a thread is finished so my mapping/DVFS algorithm does not consider it anymore? I've seen a message saying "[TRACE:1] -- DONE --", so it seems that there is some way of knowing that a thread is finished although "SimGetNumThreads()" does not change.

6- Also related to question "4", I'm observing some odd messages when running multiple applications with the "--benchmarks" option at the points in time at which applications finish. It does not seem to be affecting the simulation whatsoever, but I'm not sure what the problem is. For example, for command line: "./run-sniper --benchmarks=splash2-fft-small-2,splash2-fft-small-4,splash2-fft-small-4 -n 64 -c gainestown -c noc -senergystats -sstattrace:core.energy-dynamic --sim-end=last", when the first two applications finish, I get the following message:
[TRACE:75] -- DONE --
[TRACE:1] -- DONE --
[app1] ------------------------------------------------------------
[app1] Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7efdc92eaba7. Access Type: UNKNOWN. Access Address = 0x000000000
[app1]
[app1] Backtrace:
[app1] sift_writer.cc:Sift::Writer::Instruction:229
[app1] ??:??:0
[app1] ------------------------------------------------------------
[app1] C:Tool (or Pin) caused signal 11 at PC 0x7efdc92eaba7
[app1] Pin app terminated abnormally due to signal 11.
[app1] [SPLASH] [---------- End of output ----------]
[app1] [SPLASH] Done.

Thanks and best,
Santiago

Wim Heirman

Mar 30, 2016, 8:19:43 AM

to snip...@googlegroups.com

Santiago,

1: Yes

2: Sim()->getCoreManager()->getCoreFromID(core_id)->getPerformanceModel()->getNonIdleElapsedTime() returns the total non-idle time since the start of the run. As before, you can keep the previous value and subtract it to get the non-idle time within an interval.

3: sleep/usleep are emulated by the simulator to use simulated time, and should be pretty accurate (more than what the Linux kernel will give you). An alternative is to let the userspace thread sleep on a futex, and make a FUTEX_WAKE system call from periodic()

4: You can listen for HOOK_APPLICATION_BEGIN/END hooks. These are also written to the statistics database in the events list, use "dumpstats.py -e" to read it after a run.

5: Correct, thread ids are never reused, SimGetNumThreads is the highest number assigned (+1). HOOK_THREAD_EXIT is triggered when a thread exits, or you can use Sim()->getThreadManager()->getThreadState(thread_id) which will return Core::IDLE if the thread has finished.

6: The SIFT recorder does not handle cleanup very gracefully. We know that but never had much incentive to fix it, because as you say it should not affect simulation results.
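
For answer 2, the subtraction can be sketched as follows. The class is hypothetical and works on plain integers; inside Sniper the two inputs would be the value returned by getNonIdleElapsedTime() and the time passed to periodic(), both converted to the same unit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker: per-interval core utilization derived from a
// cumulative non-idle time counter.
class UtilizationTracker
{
public:
    // non_idle_total: cumulative non-idle time of the core since the start
    // now:            current time, in the same unit as non_idle_total
    // Returns the fraction of the interval the core was busy, in [0, 1].
    double update(std::uint64_t non_idle_total, std::uint64_t now)
    {
        assert(non_idle_total >= m_last_non_idle && now >= m_last_time);
        const std::uint64_t busy = non_idle_total - m_last_non_idle;
        const std::uint64_t interval = now - m_last_time;
        m_last_non_idle = non_idle_total;
        m_last_time = now;
        if (interval == 0)
            return 0.0; // called twice at the same time, no interval to rate
        const double utilization =
            static_cast<double>(busy) / static_cast<double>(interval);
        // Clamp in case the two inputs were sampled at slightly different times.
        return utilization > 1.0 ? 1.0 : utilization;
    }

private:
    std::uint64_t m_last_non_idle = 0;
    std::uint64_t m_last_time = 0;
};
```

One tracker instance per core is enough; the first call reports utilization since time zero.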

Regards,

Wim

Santiago Pagani

Mar 31, 2016, 6:00:42 PM

to Sniper simulator

Hi Wim,

Thanks for the very useful answers. Everything is working just fine.
I just have one more question about NoC.

I would like to be able to evaluate a mapping decision at runtime with respect to the NoC.
For example, a multi-threaded application with heavy data exchange among its threads should perform very differently when the entire application is mapped to a single cluster of cores that share a common L3 cache than when its threads are mapped to different clusters and data has to go through the NoC (more so depending on the number of hops and the NoC bandwidth).
Therefore, I would like to be able to measure some statistic that allows me to realize if I made a bad mapping decision at runtime, so I can migrate the threads accordingly.

The most ideal thing would be to know how many messages are being sent from one core to another, but I doubt that that can be done, right?
More realistically, I would assume that we can measure the used bandwidth of different NoC routers, the number of bytes being sent/received in such routers, or something like that. A less appealing statistic would be the number of L3 cache misses, as these don't necessarily relate entirely to the NoC and would also occur in single-threaded applications.
What would you suggest?

Thanks and best,
Santiago

Santiago Pagani

Mar 31, 2016, 6:18:57 PM

to Sniper simulator

Btw, I forgot to mention that I would like to read the statistics from within Sniper and not from a Python script.
Namely, I need this information in my application-level scheduler, and I make this happen by interacting with Sniper through magic instructions that I expose as needed.

Wim Heirman

Apr 9, 2016, 5:47:43 AM

to snip...@googlegroups.com

Santiago,

There is some code that counts messages being sent between core pairs, in network/network_model.cc. It's disabled by default, so you need to set network/collect_traffic_matrix=true. You'll then have a set of statistics that you can read and expose to the application in the regular way.
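
As an illustration of how such per-pair counts might feed a mapping decision, the sketch below weights each message count by the hop distance between the two cores. It assumes a 2D mesh with core IDs laid out in row-major order and XY routing; the function names and the matrix type are hypothetical and do not reflect how Sniper stores these statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// messages[src][dst]: number of messages sent from core src to core dst
// during the last interval (square matrix, one row per core).
using TrafficMatrix = std::vector<std::vector<std::uint64_t>>;

// Manhattan distance between two cores on a mesh that is mesh_width cores
// wide, assuming row-major core numbering.
int meshHops(int src, int dst, int mesh_width)
{
    return std::abs(src % mesh_width - dst % mesh_width) +
           std::abs(src / mesh_width - dst / mesh_width);
}

// Hypothetical cost of the current mapping: messages weighted by hop count.
std::uint64_t hopWeightedTraffic(const TrafficMatrix& messages, int mesh_width)
{
    assert(mesh_width > 0);
    std::uint64_t cost = 0;
    for (std::size_t src = 0; src < messages.size(); ++src)
    {
        assert(messages[src].size() == messages.size()); // must be square
        for (std::size_t dst = 0; dst < messages.size(); ++dst)
        {
            const int hops = meshHops(static_cast<int>(src),
                                      static_cast<int>(dst), mesh_width);
            cost += messages[src][dst] * static_cast<std::uint64_t>(hops);
        }
    }
    return cost;
}
```

A scheduler could track this value per interval and consider migrating the threads of an application closer together when it grows; what threshold to use is a policy choice outside the scope of this sketch.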

Regards,

Wim

Santiago Pagani

Apr 9, 2016, 10:42:24 AM

to Sniper simulator

Hi Wim,

Thanks again for your reply. What you suggest is definitely what I need.
Are there also some statistics more related to the NoC routers? For example, the used bandwidth of the routers or the number of bytes being sent/received?
I'm just thinking about the statistics that I can expect to see in the real hardware, as I would hope that my scheduler can be ported to a real platform just with additional implementation effort, and not by a change of concept.

On another topic, I've been experiencing some problems with the simulations. In particular, I very often get this error message: "[barrier_sync_server.: 76] *ERROR* Core(52) or its sibling is already in the barrier (this is thread 129, we have thread 85)".
This error is triggered in line 76 of the barrier_sync_server.cc file when "m_barrier_acquire_list[master_core_id]" is true: LOG_ASSERT_ERROR(m_barrier_acquire_list[master_core_id] == false, "Core(%i) or its sibling is already in the barrier (this is thread %d, we have thread %d)", master_core_id, thread_me, m_core_thread[master_core_id]);
However, I haven't been able to identify why this is happening, especially since the error occurs seemingly at random.
The command I'm using is for example: ./run-sniper --benchmarks=local-myScheduler-small-4,parsec-x264-small-7,parsec-x264-small-7,parsec-x264-small-7,parsec-x264-small-7,parsec-x264-small-7,parsec-x264-small-7,parsec-x264-small-7,parsec-x264-small-7 -n 64 -c gainestown -c noc -senergystats -sstattrace:core.energy-dynamic::1000000 --sim-end=last

Thanks and best,
Santiago

Derrick Chou

Jan 8, 2017, 7:27:14 PM

to Sniper simulator

Hi Santiago,

I'm currently experimenting with thermal-adaptive techniques in Sniper, and I also want to integrate HotSpot into Sniper. Could you please share some details on how to do so? Thanks so much!

Derrick

On Wednesday, March 16, 2016 at 4:25:48 AM UTC-7, Santiago Pagani wrote:

MANJARI GUPTA

Jun 11, 2018, 1:38:09 PM

to Sniper simulator

Hi Santiago,

I am a research scholar working in the field of multicore processors. I came across your questions and work in the snipersim group. I am currently working along similar lines, trying to develop a temperature-aware task allocation scheme for multicore systems. I am facing some issues and would greatly appreciate your help:

1. Integration of HotSpot with Sniper: I tried to apply the patches specified by Wim, but they are not working. Could you please point out my mistake or any steps I missed?

2. I generated a few temperature files independently (without Sniper-HotSpot integration). I am having trouble extracting the temperature of each core. As per your thesis, you have used only T_DTM. So how can I use HotSpot for multicore temperature monitoring?

Any response would be helpful.

Thank you.

Manjari

SANCHIT S

Jan 30, 2020, 3:49:36 PM

to Sniper simulator

How can I run my own C code in Sniper?
