Reviews: SearchWithMobile

Rodrigo Fonseca

Oct 20, 2010, 6:11:37 PM

To: CSCI2950-u Fall 10 - Brown

Please post your reviews as a response to this message.
Thanks,
Rodrigo

Hammurabi Mendes

Oct 20, 2010, 8:45:28 PM

To: brown-csci...@googlegroups.com

Paper Title

Web Search Using Mobile Cores: Quantifying and Mitigating the Price of
Efficiency

Authors

Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, Kushagra Vaid

Date

ISCA'10 - International Symposium on Computer Architecture, 2010

Novel Idea

The work describes and quantifies what the authors call "the price of
efficiency": how increasingly stringent demands (in a search engine
context) on low-power cores affect overall QoS.

Main Results

The paper compares the performance of Xeon and Atom processors under a
particular search engine workload and measures the impact that
low-power cores have on QoS. The authors also propose some strategies
to mitigate the "price of efficiency".

Impact

The paper promotes the discussion about data center efficiency in a
particular workload scenario to the processor micro-architecture
level. Therefore, I believe that their results give important
characterizations both to the hardware design industry and to managers
that need to quantify investments in data centers for this particular
scenario.

Evidence

The authors present multiple characterizations of "the price of
efficiency": how processors adhere to QoS restrictions, how they adapt
to bursts of demand (including some metrics particularly important
for web search, such as the latency cutoff), complexity, throughput,
and others.

I think that they have an excellent evaluation section.

Prior Work

They mention some articles that identify changing trends in data
centers, the increasing awareness of power, and systems that also
propose using low-power cores in a similar scenario (including FAWN).

Competitive Work

They mention Piranha and Niagara, as well as FAWN (in which case
they note that their discussion is at the processor micro-architecture
level rather than at the consumption measurement level).

Reproducibility

They use industry processors and they mention the tools they employ
in their evaluation. I believe that their results are reproducible.

Questions + Criticism

[Criticism] When the authors discuss the issue of "over-provisioning
and under-utilization", I think that they should have explored the
possibility of simply employing processors whose voltage/frequency
can be scaled dynamically. [Question] Would such an approach provide
appropriate performance in low-power, low-demand scenarios, while
remaining flexible in the sense that the frequency can be raised when
more load arrives?

Ideas for Further Work

Trying out mid-range processors and dynamic voltage and frequency scaling.

Tom Wall

Oct 20, 2010, 6:50:40 PM

To: CSCI2950-u Fall 10 - Brown

Web Search Using Mobile Cores: Quantifying and Mitigating the Price of
Efficiency
Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, Kushagra Vaid
ISCA 2010

Novel Idea:
Microsoft evaluates the usage of small lightweight Atom processors
instead of server class Xeon processors for Bing search. This
application differs from most distributed applications in that at its
core are machine learning algorithms which increase accuracy with more
iterations. Thus to retain a high quality of service, the slower Atom
processors are expected to have higher latencies. While Atom
underperforms with this Bing implementation, it is far more energy
efficient. By tweaking the application to suit Atom processors and
predicting how Atom hardware will evolve, they find that they can
diminish the performance gap between Xeon and Atom while keeping the
benefits of better power efficiency.

Main Result:
They provide a thorough analysis of how the various processors perform
with respect to search and study how both the hardware and application
should change to meet their goals.

Evidence:
They measure how the search application stresses various operations on
the CPU for both Xeon and Atom processors. They measure how Atom
performs with respect to Xeon and come up with a reasonable projection
of a future Atom-like processor. They then compute the TCO of each of
the 3 categories of processor.

Impact:
While Atom uses less power, the architecture needs to evolve a bit
before it can compete with Xeon. Once a processor like their
Hypothetical Atom model comes around, then Atom may have a better
chance. Their process of evaluating how each processor performs with
respect to their search application was well done and it may be useful
for profiling other distributed applications.

Similar Work:
The Atom processor is another example of energy proportional
computation. They note that in [3] Google is also experimenting with
energy proportional data centers. They recognize that the Piranha and
Niagara architectures have demonstrated that small efficient cores are
a viable alternative to the high performance high energy processors
common to most data centers. Finally they mention FAWN, which also
uses low power CPUs.

Question:
How might AMD's comparable line of efficient processors or the Nvidia
Tegra compare? Or similarly, why only evaluate the two extreme ends of
the processor spectrum? Perhaps there is a middle ground that
satisfies all the requirements.

Visawee

Oct 20, 2010, 10:56:28 PM

To: CSCI2950-u Fall 10 - Brown

Paper Title :
Web Search Using Mobile Cores: Quantifying and Mitigating the Price of
Efficiency

Author(s) :
Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, Kushagra Vaid

Date :
ISCA’10, June 19–23, 2010, Saint-Malo, France

Novel Idea :
Evaluating power efficiency, quality-of-service, robustness, and
flexibility of server- and mobile-class architectures for a search
workload.

Main Result(s) :
Although Atom is more energy efficient (more sustainable QPS per watt)
than Xeon, this efficiency comes at a price.
Robustness - Atom is unable to absorb a large amount of additional
load and quickly violates its QoS target. Atom's latency, in this
case, is also nearly 3x greater than that of the Xeon.
Flexibility - Atom latency distributions are highly sensitive to
activity spikes, while Xeon is able to absorb activity spikes
smoothly.
Reliability - Atom is more reliable than Xeon because its load is more
widely distributed than Xeon's.
However, with better Atom system integration, an Atom-based data
center can achieve approximately 1.5x the throughput-per-TCO dollar of
Xeons.
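
One way to read the reliability point above is as simple arithmetic: if the same total load is spread over more (smaller) nodes, losing one node adds proportionally less load to each survivor. The sketch below is only an illustration under that assumption. The cluster sizes and the function name `extra_load_per_survivor` are hypothetical and do not come from the paper.

```python
def extra_load_per_survivor(num_nodes: int, failed: int = 1) -> float:
    """Fractional load increase on each surviving node after `failed` nodes
    drop out, assuming constant total load that is spread evenly."""
    if not 0 <= failed < num_nodes:
        raise ValueError("need 0 <= failed < num_nodes")
    survivors = num_nodes - failed
    return num_nodes / survivors - 1.0


# Hypothetical cluster sizes chosen only for illustration.
xeon_nodes = 4    # fewer, larger nodes
atom_nodes = 16   # more, smaller nodes serving the same total load

xeon_increase = extra_load_per_survivor(xeon_nodes)
atom_increase = extra_load_per_survivor(atom_nodes)

print(f"Xeon-style cluster: +{xeon_increase:.1%} load per surviving node")
print(f"Atom-style cluster: +{atom_increase:.1%} load per surviving node")
```

Under these assumed sizes, a single failure raises per-node load by about a third in the small cluster but by well under a tenth in the large one. This says nothing about whether each Atom node has the headroom to absorb its share.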

Impact :
A guideline for implementing more efficient architectures.

Evidence :
The authors set up experiments using a search workload from Bing to
compare these two architectures.

Prior Work :
There were several experiments on using low-power embedded
processors for performance and power efficiency.
However, this work shows the price that comes with that efficiency.

Reproducibility :
The results are irreproducible because the experiments are based on
the real workload from Bing search.
The authors also do not give the details of the system.

Criticism :
When comparing these two architectures, the authors should also take
into consideration the maintenance cost (human labor cost).
The cost of maintaining a large number of mobile-class CPUs might be
much more than the cost of maintaining a small number of server-class
CPUs.

On Oct 20, 6:11 pm, Rodrigo Fonseca <rodrigo.fons...@gmail.com> wrote:

Sandy Ryza

Oct 20, 2010, 11:55:19 PM

To: CSCI2950-u Fall 10 - Brown

Title:
Web Search Using Mobile Cores: Quantifying and Mitigating the Price of
Efficiency

Authors:

Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, Kushagra Vaid

Date:

ISCA '10

Novel Idea:
The authors compare the performance and power efficiency of high-powered
Xeon chips, the type typically used for computation in datacenters, with
lower-powered but more energy-efficient Atom chips for the task of
running a commercial search engine. They drive their system using
real queries from Microsoft Bing user activity, and analyze
performance and efficiency tradeoffs.

Main Result(s):
They find that search is far more efficient on Atom chips in terms of
queries per second per watt spent on processors, but that this
increased efficiency comes at the expense of robustness of quality-of-
service guarantees. They find that total cost of ownership as well as
total energy is worse when placing Atom chips in the same machines
that Xeons were used in. Only with hypothetical machines built to
reduce power used on other non-CPU parts does using Atoms seem more
cost-effective.

Evidence:
Results are collected on how much power is used, how many queries can
be handled per second, and how quality of service degrades using the
separate processors. The authors measure how complex queries affect
latency with the different configurations, and also how increased
latency reflects on the quality of search results. A detailed total-
cost-of-ownership analysis is also presented.

Impact:
The results of the paper suggest that if the hypothetical Atom
computers described were built, it would be cost efficient to switch
the processing behind Bing to them. Perhaps Microsoft is working on
this right now.

Prior Work:
The authors mention Piranha and Niagara, systems which use small cores
to improve efficiency for memory and IO bound workloads. They also
mention FAWN, which also targets non-computationally-intense
workloads, and seeks to improve efficiency both by using smaller cores
and by using flash instead of disk for storage.

Competitive Work:
Nothing that I can think of.

Reproducibility:
The results do not seem very reproducible, both because the details of
the datacenter's configuration and computational framework are not
presented and because the results rely on real queries available only
because of the authors' access to Bing.

Criticism & Question:
A lot of the cost analysis is based on hypothetical machines. How
valid is this?

Ideas for further work:
Perhaps run similar tests using processors of intermediate power?

On Oct 20, 6:11 pm, Rodrigo Fonseca <rodrigo.fons...@gmail.com> wrote:

Dimitar

Oct 20, 2010, 11:52:25 PM

To: CSCI2950-u Fall 10 - Brown

Web Search Using Mobile Cores: Quantifying and Mitigating the Price of
Efficiency

Authors: Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, Kushagra
Vaid

Date: 2010

Novel Idea: Internet services are experiencing a shift as computation
and data migrate from clients to distributed data centers. The
commoditization of data center economies of scale and Internet-scale
workloads demand greater power efficiency in order to sustain
scalability. Based on these principles, the paper evaluates the
performance per watt of Xeon and Atom CPUs. The two platforms
represent opposite ends of the commodity processor spectrum.

Main Results: The authors demonstrate that the power-efficient Atom
can achieve approximately 1.5 times the throughput-per-TCO-dollar of
Xeon. This is due to the fact that the multiprocessor-based Atom
consumes three times less power and requires a lot less power for
cooling.

Impact: The recent advances in mobile computing technology can
drastically change how data centers are built.

Evidence: The authors evaluate the two architectures based on the
robustness of the system, which is quantified by a combination of
queries per second per watt, throughput, and latency. They show that
Atom is five times more efficient than Xeon in terms of QPS per watt.
This is largely due to the fact that Xeon uses about 25 times more
power than the Atom when idle, and the CPUs in these systems are idle
most of the time. Price/performance also favors Atom: Xeon is 8.5
times more expensive yet only 3.8 times faster than the Atom.
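
The price/performance figures quoted above can be turned into a single ratio. The following is a minimal sketch of that arithmetic only. The two input ratios are the ones stated in this review, and the function name `relative_cost_per_performance` is made up for illustration.

```python
def relative_cost_per_performance(price_ratio: float, speed_ratio: float) -> float:
    """Cost per unit of performance of processor A relative to processor B,
    given A's price and speed expressed as multiples of B's."""
    if price_ratio <= 0 or speed_ratio <= 0:
        raise ValueError("ratios must be positive")
    return price_ratio / speed_ratio


# Ratios quoted in the review: Xeon relative to Atom.
xeon_price_ratio = 8.5
xeon_speed_ratio = 3.8

xeon_cost_per_perf = relative_cost_per_performance(xeon_price_ratio, xeon_speed_ratio)
print(f"Xeon costs about {xeon_cost_per_perf:.2f}x as much per unit of performance")
```

With these numbers the ratio comes out to roughly 2.2, counting processor purchase price alone and ignoring power, cooling, and the rest of the system.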

Prior Work: Ranganathan and Jouppi discuss the changing workload mix
and usage patterns, motivating the need for integrated analysis of
microarchitectural efficiency. This paper complements their work by
examining the role of power efficiency from small cores.

Reproducibility: I think the test cases in this paper are easily
reproducible if we have the necessary equipment to take the
measurements for power consumption.

Question: Are any data centers being built using mobile computing
technology?

Criticism: The paper wasn't really convincing in showing that the Atom
architecture is better than Xeon for building data centers, especially
since the authors demonstrated many of its shortcomings, such as
handling complex queries.

On Oct 20, 6:11 pm, Rodrigo Fonseca <rodrigo.fons...@gmail.com> wrote:

Shah

Oct 20, 2010, 6:18:02 PM

To: CSCI2950-u Fall 10 - Brown

Title:

Web Search Using Mobile Cores: Quantifying and Mitigating the Price of
Efficiency

Authors:

[1] Vijay Janapa Reddi
[2] Benjamin C. Lee
[3] Trishul Chilimbi
[4] Kushagra Vaid

Source and Date:

Proceedings of the 37th Annual International Symposium on Computer
Architecture, Saint-Malo, France.

Novel Idea:

The novel idea presented in this paper is the use of Atom (mobile-class)
and Xeon (server-class) processors to evaluate an industry-strength
online web search engine (Microsoft's Bing).

Main Result:

The scientists use a series of small cores to perform the following:

[1] Evaluate online search

[2] Compare the efficiency of the mobile-class Atom and server-class Xeon architectures

[3] Analyze the price of efficiency

[4] Mitigate the price of efficiency

Impact:

Though the authors state that reducing the power associated with
peripherals will be important in the future, perhaps the paper is too
new to support this point.

Evidence:

The researchers provide several experimental results throughout the
paper that give a wide array of metrics. However, the setup of these
experiments is not explained in detail.

Prior Work:

The scientists mention that both Piranha and Niagara propose
integrating simple cores, indicating to us that this idea of using
mobile computing has been used in the past.

Competitive Work:

The authors mention that they build on the work of Ranganathan and
Jouppi, complementing it. They also mention the work of Barroso et
al., Lim et al. and Vasudevan et al. (in FAWN).

Reproducibility

As is the case with papers written by corporations, although there are
numerous experiments to support the authors' claims, there is no
detail of the procedures.

Questions:

Has this caught on? What are some of the other technologies smaller
cores could be used for?

Criticism:

In the beginning and end of the paper, the authors make claims that
speak of the popularity of this paradigm in the future without
necessarily informing the readers why.

Ideas for Further Work:

Could there be other uses for this model that are not purely related
to search?

Abhiram Natarajan

Oct 20, 2010, 8:21:15 PM

To: CSCI2950-u Fall 10 - Brown

Paper Title: Web Search Using Mobile Cores: Quantifying and Mitigating
the Price of Efficiency

Author(s): Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi,
Kushagra Vaid

Date: 2010, ISCA

Novel Idea: Finding out the challenges that constitute the price of
efficiency from small cores

Main Result(s): The authors quantify the price of efficiency for an
industry-strength, online web search engine.

Impact: The results in the paper contravene conventional wisdom with
respect to small cores and data centers for online web search.

Evidence: They perform thorough experimentation and provide sufficient
empirical results. Specifically, they quantify the price of small-core
power efficiency for industry-strength web search, Microsoft Bing.

Prior Work:
(1) Survey of enterprise information technology trends
(2) Energy proportional computing by Barroso et al.
(3) Power management schemes across an ensemble of systems by Ranganathan et al.
(4) Piranha, Niagara
(5) System view of warehouse-computing environments by Lim et al.
(6) FAWN

Competitive Work: Specifically, they make the following contributions:
(a) Search: Evaluation of an online web search engine operating in an
enterprise environment
(b) Efficiency: Comparison of the power efficiency of the server-class
Xeon and mobile-class Atom microarchitectures for Search
(c) Price of Efficiency
(d) Mitigating the Price of Efficiency: Effective system design
strategies, greater integration, enhanced microarchitectures, etc. to
affect the price of efficiency

Reproducibility: They give quite a bit of detail; however, complete
reproducibility is not possible.

Criticism (+ve): There is a good bit of description about Microsoft
Bing, which was nice. They used real production workloads rather than
workload models or simulations.

On Oct 20, 6:11 pm, Rodrigo Fonseca <rodrigo.fons...@gmail.com> wrote:

Basil Crow

Oct 20, 2010, 10:13:08 PM

To: brown-csci...@googlegroups.com

Title: Web Search Using Mobile Cores: Quantifying and Mitigating the
Price of Efficiency

Authors: Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, and Kushagra Vaid

Date: ISCA 2010

Novel idea: As the authors clearly state, "advances in processors
comprised of small cores provide opportunities for power efficiency."

Main results: Search is 5x more efficient on Atom than on Xeon
per-core on the basis of queries per second per watt. For search,
query latency is higher on Atom than on Xeon; Xeon is able to absorb
activity spikes more smoothly than Atom; and Atom is more reliable and
energy-efficient.

Impact: Switching to processors such as Atom may provide TCO benefits.

Evidence: The authors analyzed the performance of both Atom and Xeon
processors running their search workload using VTune, a toolbox that
provides access to processor hardware counters.

Prior work: The authors cite inspiration from Lim et al.'s unified
systems as well as FAWN.

Competitive work: In contrast to previous approaches, the authors take
into account microarchitectural details of processor performance in
addition to higher-level metrics such as energy consumption.

Reproducibility: Assuming you had a large number of computers with
some kind of similar search workload (with both Atom and Xeon
processors), it should be possible to collect similar metrics.

Siddhartha Jain

Oct 20, 2010, 10:31:56 PM

To: brown-csci...@googlegroups.com

Title: Web search using Mobile cores: Quantifying and mitigating price of efficiency

Novel Idea:

Nothing novel per se. The performance of the Bing search engine with respect to energy
efficiency is analyzed using Atom and Xeon processors.

Main Results:

CPU utilization at the microarchitecture level is given, power consumption is measured
under various loads and spikes, and its effect on QoS and search engine relevance is given.
Suggestions for improving energy efficiency are given.

Impact:

Although it contributes nothing conceptually new, it has a lot of analysis of how a particular
application behaves in terms of energy efficiency and CPU utilization under various loads, and of how
factors like hardware failure affect search engine performance, which could be very useful
for coming up with new techniques.

Evidence:

Numbers for various configurations are provided and analyzed

Prior Work:

Prior work in optimizing the energy efficiency of systems, for instance works like proportional ensemble
and FAWN.

Reproducibility:

Can't be reproduced

Question/Future work:

Interesting analysis. It would be interesting to see if some optimization techniques could be applied to increase the energy efficiency, rather than just ad-hoc optimization based on guessing from the data.

On Wed, Oct 20, 2010 at 6:11 PM, Rodrigo Fonseca <rodrigo...@gmail.com> wrote:

Zikai

Oct 21, 2010, 11:00:26 AM

To: CSCI2950-u Fall 10 - Brown

Paper Title: Web Search Using Mobile Cores: Quantifying and Mitigating
the Price of Efficiency

Author(s):
Vijay Janapa Reddi (Harvard University)
Benjamin Lee (Microsoft Research)
Trishul Chilimbi (Microsoft Research)
Kushagra Vaid (Microsoft)

Date/Conference: ISCA (International Symposium on Computer
Architecture) 2010

Novel Idea: Quantify price of small-core power efficiency in terms of
quality-of-service, robustness and flexibility for industry-strength
web search, Microsoft Bing. The version used relies on machine
learning kernels at its core and significantly increases computation
at nodes.

Main Results:
(1) Low power, mobile-class processors (like Intel Atom) have better
power efficiency, area efficiency and price efficiency than high-
performance, server-class microprocessors (like Intel Xeon).
(2) High performance, server-class microprocessors provide better
robustness in terms of higher throughput, smaller latency, better
quality of service and more relevant query results.
(3) High performance, server-class microprocessors provide better
flexibility in handling search activity spikes measured in absolute
terms.
(4) Low power, mobile-class processors provide better reliability in
handling node failures and search activity spikes measured in relative
terms because load of failed node is more distributed.
(5) It is possible to mitigate the price of efficiency of small cores by
<1> using microprocessor integration, <2> using over-provisioning and
under-utilization in a way that best fits production-level robustness,
flexibility and reliability requirements, <3> making micro-architectural
enhancements such as improving the divider, branch predictor and cache
hierarchy design, and <4> using heterogeneous cores and
application-specific accelerators.

Evidence: Results (1) to (4) are based on real experimental outcomes.
Result (5) is based on hypothetical analysis of figures collected in
prior experiments.

Prior Work: Section 6 discusses related work in detail, including data
center workloads, power efficient data center design and chip
multiprocessors.

Reproducibility: Because the application used in the evaluation is some
version of Bing, it is hard for someone outside Microsoft to get it.
Moreover, the authors provide little explanation of the selection of
dataset, queries and experimental methods. Therefore, it is hard to
reproduce the experiments.

Question: Why did the authors choose Xeon Harpertown, a three-year-old
design, to represent high-end processors? Intel has released faster
and much more power-efficient server-side CPUs based on the Nehalem
microarchitecture. They are 20% faster and use 30% less power than
Harpertown. Is the conclusion still valid now? What if Intel is able
to make even faster yet more power-efficient server-side CPUs?
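
A rough back-of-the-envelope version of this question is sketched below. It assumes that the 20% speedup and 30% power saving quoted above would carry over to the search workload, and it takes the roughly 5x QPS-per-watt advantage that other reviews in this thread attribute to Atom over the Harpertown Xeon. Neither assumption is verified here, and the function name `perf_per_watt_gain` is made up for illustration.

```python
def perf_per_watt_gain(speedup: float, power_saving: float) -> float:
    """Performance-per-watt improvement factor for a chip that is
    `speedup` faster (0.20 = 20%) and uses `power_saving` less power."""
    if not 0 <= power_saving < 1:
        raise ValueError("power_saving must be in [0, 1)")
    return (1.0 + speedup) / (1.0 - power_saving)


# Figures quoted in the question above (assumed to apply to search).
nehalem_gain = perf_per_watt_gain(speedup=0.20, power_saving=0.30)

# Atom's QPS-per-watt advantage over Harpertown as reported in this thread.
atom_advantage_over_harpertown = 5.0
atom_advantage_over_nehalem = atom_advantage_over_harpertown / nehalem_gain

print(f"Nehalem perf/watt gain over Harpertown: {nehalem_gain:.2f}x")
print(f"Remaining Atom advantage: {atom_advantage_over_nehalem:.2f}x")
```

Under these assumptions the newer Xeon narrows the gap but does not close it: Atom would still hold an efficiency lead of a little under 3x.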

On Oct 20, 6:11 pm, Rodrigo Fonseca <rodrigo.fons...@gmail.com> wrote:

Joost

Oct 21, 2010, 12:24:17 PM

To: CSCI2950-u Fall 10 - Brown

Paper: Web Search Using Mobile Cores: Quantifying and Mitigating the
Price of Efficiency
Authors: Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, Kushagra
Vaid
Date: ISCA 2010
Novel Idea: The authors seek to examine the effective cost of different
cores in terms of the total expected price of operation per unit,
taking into account power usage, initial cost, and performance per unit.
Main Results: The authors specifically examined the Atom and Xeon
processors across a wide variety of metrics. In particular they
focused on the processors under a search-engine-like load.
Impact: This paper brings to the forefront the idea of cost reduction
through the presentation of a variety of metrics which measure
efficiency in performance and cost.
Evidence: The authors presented a variety of new approaches both in
terms of Quality of Search metrics and in terms of cost measurements.
Prior Work: The paper builds on ideas presented in earlier papers,
such as FAWN, that sought to bring to light the increasing share of
power in the operating budget of data centers.
Reproducibility: The tests which they ran could be reconstructed
given access to the types of server clusters that they used.
Questions/Criticism: Since the cost measures being presented are
relatively new, the lack of comparative measures makes the current
measures, while promising, seem like just a first step. How do we know
whether the metrics being used actually capture the efficacy of the
machines, as the authors claim they do?

On Oct 20, 6:11 pm, Rodrigo Fonseca <rodrigo.fons...@gmail.com> wrote:
