Discussion Summary by David Phillips
An Analysis of Efficient Multi-Core Global Power Management Policies
This paper extends the concept of Dynamic Voltage and Frequency
Scaling (DVFS) to multi-core systems by accounting for each individual
core's power consumption. Rather than blindly raising or lowering the
voltage and frequency of all cores at the same time (chip-wide), the
paper proposes a global monitor that watches and controls the
frequency of each core at runtime. The monitor can decrease the
frequency of one core while leaving the others alone or even speeding
them up, which loses less overall performance than lowering the
voltage of every core. The monitor implements a policy that assigns a
voltage level to each core based upon the core's priority level and
power budget. When a single core exceeds its power budget, the monitor
can reduce the voltage of that core alone while the other cores keep
running at their current frequency.
The paper discusses four different scaling policies:
1.) Priority Policy
Each core is assigned a priority level from 1 to N, where N is the
number of cores. Core N has the highest priority and core 1 has the
lowest. The global controller therefore always tries to run core N at
its fastest possible speed. If the power budget is exceeded, the
global monitor decreases the frequency of the first core, then the
second, then the third, and so on, before it ever decreases the
frequency of core N.
2.) PullHiPushLo
This policy tries to spread the power consumption evenly across all
of the cores. When the power budget is exceeded, the controller slows
down the core that is consuming the most power. When power becomes
available, the controller speeds up the slowest running core.
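3.) MaxBIPS
This policy predicts the power consumption and throughput (BIPS,
billions of instructions per second) of the possible power mode
combinations and selects, for each core, the power mode that gives
the highest total throughput within the power budget.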
4.) Chip-Wide DVFS
This implementation slows down all of the cores when the power budget
is exceeded. When power becomes available again, all of the cores are
sped back up. There is no per-core scaling. It is simpler than the
other policies but much more rigid.
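The four policies above can be compared with a small sketch. This is
an illustration written for this summary, not the paper's
implementation: the function names, the one-shot "start from all
Turbo" formulation, and the per-mode power and throughput factors are
all assumptions (the factors read the summary's mode figures as power
savings of roughly 15% and 45% for 5% and 15% performance loss). Core
index 0 stands for core 1, the lowest priority.

```python
from itertools import product

# Hypothetical mode table: (relative power, relative throughput).
# Index 0 is the fastest mode; higher indices are slower and cheaper.
MODES = [(1.00, 1.00), (0.85, 0.95), (0.55, 0.85)]
SLOWEST = len(MODES) - 1


def chip_power(modes, turbo_power):
    """Estimated chip power when core i runs in mode modes[i]."""
    return sum(p * MODES[m][0] for p, m in zip(turbo_power, modes))


def chip_bips(modes, turbo_bips):
    """Estimated chip throughput when core i runs in mode modes[i]."""
    return sum(b * MODES[m][1] for b, m in zip(turbo_bips, modes))


def priority(turbo_power, budget):
    """Slow the lowest-priority core (index 0) fully before the next one."""
    modes = [0] * len(turbo_power)
    for core in range(len(modes)):
        while chip_power(modes, turbo_power) > budget and modes[core] < SLOWEST:
            modes[core] += 1
    return modes


def pull_hi_push_lo(turbo_power, budget):
    """Slow the highest-power core while over budget; then speed the
    slowest core back up for as long as the result still fits."""
    modes = [0] * len(turbo_power)
    while True:
        power = [p * MODES[m][0] for p, m in zip(turbo_power, modes)]
        if sum(power) > budget:
            slowable = [c for c, m in enumerate(modes) if m < SLOWEST]
            if not slowable:
                return modes  # budget cannot be met even at the slowest modes
            modes[max(slowable, key=lambda c: power[c])] += 1
        else:
            fastable = [c for c, m in enumerate(modes) if m > 0]
            if not fastable:
                return modes
            trial = list(modes)
            trial[max(fastable, key=lambda c: modes[c])] -= 1
            if chip_power(trial, turbo_power) > budget:
                return modes
            modes = trial


def max_bips(turbo_power, turbo_bips, budget):
    """Exhaustively pick the mode combination with the highest predicted
    throughput among those predicted to fit the budget."""
    feasible = [
        m for m in product(range(len(MODES)), repeat=len(turbo_power))
        if chip_power(m, turbo_power) <= budget
    ]
    if not feasible:
        return [SLOWEST] * len(turbo_power)
    return list(max(feasible, key=lambda m: chip_bips(m, turbo_bips)))


def chip_wide(turbo_power, budget):
    """Put every core in the same mode: the fastest one that fits."""
    for mode in range(len(MODES)):
        modes = [mode] * len(turbo_power)
        if chip_power(modes, turbo_power) <= budget:
            return modes
    return [SLOWEST] * len(turbo_power)
```

For example, with assumed per-core Turbo power of 10, 20, 30 and 40 W,
throughputs of 2, 2, 1 and 1 BIPS, and a 90 W budget, the sketch
returns mode assignments [2, 2, 0, 0] for Priority, [0, 0, 0, 2] for
PullHiPushLo, [0, 0, 1, 1] for MaxBIPS, and [1, 1, 1, 1] for chip-wide
DVFS. The exhaustive search in max_bips grows as 3^N, which is
manageable only for a handful of cores.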
Strengths:
The paper extends previous work by presenting new power scaling
policies at the individual core level. These improve on existing
policies that scale all cores together even when some of the cores
have not yet exceeded their power budget.
The paper shows that applying frequency scaling to all of the cores
without accounting for individual core consumption leads to
unnecessary performance loss.
The paper presents a new static power analysis tool for evaluating
power mode policies.
Weaknesses:
The paper assumes that a core is stalled during a power mode
transition. The authors admit that this is not the most efficient
approach but argue that it keeps the cores synchronized. They do not
provide supporting information for this, so it is difficult for the
reader to determine whether the argument is in fact valid.
The paper does not discuss how power budgets are created. Its
argument implicitly assumes that the operating system can predict,
ahead of time, the power and performance behavior of a particular
application. The appropriate "power budget" would seem to depend on
the workload: a different job size could make the power consumption
behavior differ from one run to another.
The paper identifies the MaxBIPS policy as the ideal implementation.
However, the explanation of how the power consumption is predicted
and actually used is difficult to follow. The ideas should be made
clearer so that readers who are not intimately familiar with the
subject are able to understand them.
The paper does not discuss the system overhead and hardware complexity
that is introduced by implementing a global power monitor.
The paper only considers three different power modes: "Turbo" (full
speed, no power savings), "Efficient1" (power reduced by about 15%,
which causes a 5% performance loss), and "Efficient2" (power reduced
by about 45%, which causes a 15% performance loss). The paper states
that the goal is to keep the ratio of power reduction to performance
loss at 3:1. However, it gives no real consideration to why including
more power levels would not be prudent. The paper claims that more
than three power levels would make the power monitor much more
complex, but no facts are given to support the claim.
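The 3:1 ratio can be checked with a few lines of arithmetic. The
sketch below reads the 15% and 45% figures as power reductions, which
is what the stated ratio implies. The cubic model in the second
function is a textbook first-order assumption (dynamic power
proportional to V^2 * f, with voltage and frequency scaled together)
and is not taken from this summary; all names are hypothetical.

```python
# Figures as quoted in this summary, read as
# (power saving in percent, performance loss in percent).
MODES = {
    "Turbo": (0.0, 0.0),
    "Efficient1": (15.0, 5.0),
    "Efficient2": (45.0, 15.0),
}


def saving_to_loss_ratio(power_saving_pct, perf_loss_pct):
    """Power saved per unit of performance lost; None if nothing is lost."""
    if perf_loss_pct == 0:
        return None
    return power_saving_pct / perf_loss_pct


def cubic_power_saving_pct(scale):
    """Assumed model: power ~ V^2 * f, with V and f both scaled by `scale`."""
    return (1.0 - scale ** 3) * 100.0


if __name__ == "__main__":
    for name, (saving, loss) in MODES.items():
        print(name, saving_to_loss_ratio(saving, loss))
    for scale in (0.95, 0.85):
        print(scale, cubic_power_saving_pct(scale))
```

Both Efficient modes come out at exactly 3:1 with the quoted figures.
Under the assumed cubic model, a 5% voltage and frequency reduction
saves a little over 14% power, close to 3:1, while a 15% reduction
saves a little under 39%, so the ratio falls off as the scaling gets
deeper.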
Considering that there are existing DVFS implementations (the
Sossaman and Woodcrest platforms) that support more than three power
modes, it appears that the paper's implementation would not cover
such "real-world" scenarios.
The paper does not consider a wide range of benchmarks in its
simulations. Only SPEC CPU2000 is used, with benchmark combinations
that pair high CPU utilization with low memory utilization and vice
versa. Simulating other benchmarks would show how the policies behave
for different types of workloads.
For the "Priority" policy, the paper does not explicitly state how
the priorities are assigned. It is implied that core 1 has the lowest
priority and core N has the highest. Is it not possible for two cores
to have the same priority?