12/1 Reviews - FAWN


Rodrigo

Nov 30, 2009, 8:10:58 PM
to CSCI2950-u Fall 09 - Brown
Please post your reviews here.

Steve Gomez

Nov 30, 2009, 11:16:57 PM
to CSCI2950-u Fall 09 - Brown
Author: D. G. Andersen, J. Franklin, M. Kaminsky, et al.
Paper Title: "FAWN: A Fast Array of Wimpy Nodes"
Date: In SOSP 2009

The authors describe FAWN, "a new cluster architecture for low-power,
data intensive computing", and present a key-value storage
implementation (FAWN-KV) built on FAWN. The novel idea is that many
low-power CPUs with local flash storage can form a viable cluster
topology that is also energy-efficient. This works because the
authors assume the cluster's workload is data (I/O) intensive rather
than computationally expensive.

Given major services like Facebook and Amazon (whose infrastructures
are massively I/O intensive and parallelized), this assumed load
reflects a major portion of real load in internet services, so FAWN
might be broadly applicable. There are also few cheap and appropriate
cluster architectures for this type of load. The authors point out
that conventional disk-based clusters suffer from poor seek
performance with small-object random-access workloads. Memory-based
clusters use expensive hardware and are energy-expensive.

Related work includes cluster architectures that try to balance cost
and performance (an issue for basically all cluster architectures).
In this case, FAWN tries to consider energy cost alongside other
costs in addressing heavy-I/O, small-write workloads. Other datacenter
architectures that use energy-efficient commodity hardware include
CEMS, AmdahlBlades, and Microblades. Theoretical work in proportional
energy use has been done, and the Intelligent RAM and Active Disk
projects focus on creating more energy-efficient uses of memory and
disk. Storage specifically based on flash includes the FlashDB and
FD-Tree projects, as well as JFFS2. The work in this paper also builds
off general filesystem principles using DHTs, replication and
consistent hashing, as in GFS and other distributed storage we've
looked at.

The authors evaluate FAWN (through FAWN-DS) against conventional
datastores, using both Flash and disk. Evidence for the authors'
hypotheses about performance and queries-per-joule efficiency of FAWN
is provided in the form of multiple tables and plots. I think Figure
16 (solution space for lowest cost of operation, given dataset size
and query rate) is the best take-away result of the paper's evaluation
(but may also be the most misleading, because it represents some
important assumptions about total ownership cost and device
selection).

One criticism I have in this section is about Figure 9 and section
4.1.1. The authors try to make the point that "Flash devices are
capable of handling the FAWN-DS write workload extremely well - but a
system designer must exercise care in selecting devices that actually
do so." I realize that the figure and experiment they describe
regarding different Flash drives are somewhat of a novelty - to prove
a point - but there is little analysis regarding what constitutes a
"capable" device. What would the total performance variation be
between FAWN architectures using the best and worst devices tested
here? At what point does device selection start to narrow the "FAWN +
Flash" solution space in Figure 16?

Overall, this was a thoughtful paper. I am interested to see how easy
it is to move a large-scale, currently operating service (like
Facebook) onto a FAWN-based architecture. Where would performance be
sub-optimal, and does this suggest that large-scale service developers
should be thinking about host architecture in a new way as these
platforms become more common?

Xiyang Liu

Dec 1, 2009, 12:17:18 AM
to CSCI2950-u Fall 09 - Brown
Paper Title
FAWN: A Fast Array of Wimpy Nodes

Author(s)
David G. Andersen, Jason Franklin, Michael Kaminsky,
Amar Phanishayee, Lawrence Tan, Vijay Vasudevan

Date
SOSP’09

Novel Idea
The paper presents the design of FAWN, a cluster built with flash
memory and 'wimpy' processors. A FAWN system contains several front-end
nodes and a network of back-end data store nodes. The data store nodes
form a consistent hashing ring, and data is replicated on three
successive nodes. The flash store is a log-structured system that
copes with flash memory's slow small random writes.
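The ring layout described above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' code: the node names and the use of SHA-1 as the ring hash are assumptions. Each key hashes onto the ring and is stored on its successor node plus the next two distinct successors, giving three replicas.

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # Hash a string onto the ring's integer keyspace (SHA-1 assumed here).
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) points define the ring order.
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def owners(self, key: str):
        """Return the replica group: the key's successor node plus
        the next distinct successors on the ring."""
        i = bisect.bisect(self.points, (ring_hash(key),))
        group = []
        for k in range(len(self.points)):
            node = self.points[(i + k) % len(self.points)][1]
            if node not in group:
                group.append(node)
            if len(group) == self.replicas:
                break
        return group

ring = Ring(["nodeA", "nodeB", "nodeC", "nodeD", "nodeE"])
print(ring.owners("some-key"))  # three distinct successive nodes
```

With virtual nodes (which the paper also uses), each physical node would simply register several points on the ring instead of one.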

Main Result(s)
The FAWN system is best tailored to read-intensive key-value storage
of small objects. The evaluation shows that FAWN has better access
speed than disk+DRAM servers and consumes much less power.

Impact
Flash memory beats hard disks in speed, stability, and power
efficiency. Industry has explored using flash memory as an alternative
to hard disks for years. The design of FAWN applies the idea to
clusters and matches the need to reduce power consumption in data
centers. Although flash memory is unlikely to completely replace hard
disks, it is highly possible that future data centers will use a
hybrid structure of hard disks, flash memory, and DRAM.

Evidence
A 21-node FAWN system was evaluated. It was also compared to a
traditional server in query throughput and power efficiency.

Prior Work
The log-structured data store is similar to previous flash file
systems. The network structure of the nodes also resembles the Chord DHT.

Reproducibility
The system is reproducible by connecting commodity devices and
implementing consistent hashing and a log-structured store.

Question & Criticism
It is not clear how the FAWN architecture works for modern
data-intensive applications. Hard disks have the advantage of capacity
and lower cost per terabyte. It might not be practical for FAWN to
serve applications involving terabytes or even petabytes of data, such
as web search. Flash memory is more likely to be incorporated into
future data centers as an additional layer alongside DRAM and hard disks.



Dan Rosenberg

Nov 30, 2009, 11:16:35 PM
to brown-cs...@googlegroups.com
Paper Title
FAWN: A Fast Array of Wimpy Nodes

Authors
David G. Andersen et al.

Date
October, 2009

Novel Idea
By using low-power embedded CPUs and flash storage, which is faster than disk, cheaper than DRAM, and more energy efficient than either, FAWN attempts to provide a cost-effective cluster solution that provides the same performance guarantees as existing solutions.

Main Result
The authors implemented a prototype of FAWN and evaluate its performance.

Impact
FAWN may provide insights and groundwork for future work in providing both cost- and energy-efficient distributed data stores.

Evidence
The authors test their work by evaluating the performance of FAWN-DS on a single node, and then evaluating the performance of a 21-node cluster.

Prior Work
This work expands on previous work in consistent hashing, studies on energy efficiency, and log-based data storage.

Reproducibility
Some of the components of the system aren't described in enough detail to implement with ease.  In addition, creating such a system would require substantial software and hardware investment.  However, given a working system, the evaluation performed would be easily reproduced.

Criticism
Most of my criticism is due to lack of detail and limited evaluation, rather than the actual architecture of the system.  More detail could have been provided on the splitting and garbage collection algorithms used, but I appreciated the evaluation of their performance.  What hash function is used (the fact that it's 160 bits suggests SHA-1...why not tell us)?  The evaluation is a good start, but the authors could have included a section assessing the impact of node failures.

Questions/Ideas for Further Work
I think FAWN is a cool idea that's a good start to solving some problems in energy efficiency.  I would be interested in a demonstration that this system would scale well to the order of magnitude required by its claimed competitive work (Dynamo, etc.).  Also, I would be interested in analyzing the potential cost, energy, and performance benefits in a wider variety of workloads, beyond the intuition-based assessment provided here.

Marcelo Martins

Nov 30, 2009, 10:07:17 PM
to brown-cs...@googlegroups.com
Paper Title "FAWN: A Fast Array of Wimpy Nodes"

Author(s) David G. Andersen et al.

Date ACM SOSP'09 October 2009

Novel Idea

Motivated by the poor seek performance of disks on random-access
workloads and the large power draw required by cluster and datacenters,
Andersen et al. propose a new approach to furnish distributed
computation and fast disk access by substituting traditional machines
for low-power, embedded CPUs with flash storage for rapid random-data
access.

Main Result(s)

The evaluation shows that FAWN-KV's performance improves dramatically
once the data store fits in the DRAM cache. Moreover, the authors
conclude that, depending on the brand of flash device, read and write
operations show very different performance for the same workload.
Finally, the FAWN-DS design suits flash writes better than Berkeley DB
does.

Regarding the cluster evaluation, FAWN-KV copes well with ring
membership changes as long as the system is not running at its limits.

Finally, compared to traditional systems, FAWN+SSD presents the best
bang for the buck regarding data access, TCO and power draw.

Impact

The FAWN approach has the potential to achieve high performance and be
more energy-efficient than conventional architecture, while harnessing a
well-matched cluster system.

Evidence

FAWN has been implemented using a cluster of embedded components and
different types of SSD and disk-based storage to provide
fault-tolerant, consistent key-value access. FAWN-KV is the software
component responsible for managing data access and distribution.

Prior Work

Several works have used commodity embedded low-power CPUs and flash
storage for cluster key-value applications, such as Gordon, CEMS,
AmdahlBlades, Microblades and Microsoft's Marlowe.

FAWN-KV organizes the back-end virtual IDs into a storage ring using
consistent hashing, a concept previously used by DHT systems like
Chord and other applications like CoralCDN.

Competitive work

Traditional systems composed of desktop-like machines with SSD or
disk-based storage are the direct competitors of FAWN.

Reproducibility

Due to the many intricacies of FAWN and the lack of a better description
of its hardware and software components, the results cannot be reproduced.

Question

1) Andersen et al. hypothesize that SSD performance may reach into
DRAM territory. Considering that I/O bandwidth has not changed much in
the last few years, is this really feasible?

Criticism

1) The authors only worry about the performance of data access and
power draw, but forget that distributed systems are also motivated by
the need for parallel computation and scalable data crunching. At no
moment do Andersen et al. consider applications that require large
amounts of processing power, which is common in many distributed
applications and cannot be supplied by the wimpy nodes.

2) In Section 5, the authors only consider scenarios where FAWN
performs better than traditional systems. What about "large datasets,
high query rates"? I believe that traditional systems can respond more
promptly, and, as shown in Table 4, the power savings compared to FAWN
are almost the same.

3) The authors mention that in the future FAWN+SSD could become the
dominant architecture for a wide range of random-access workloads. In
the case of large datasets, a FAWN+SSD architecture would require many
nodes to achieve a large storage space. One parameter not mentioned in
Table 4 is GB/$. Although they have existed for a while, SSDs are
still expensive, and a large number of nodes would require
considerable investment in infrastructure. In addition, as the authors
mention, a large number of nodes incurs an increase in the number of
switches, which are well known for not being energy-efficient. As
such, the FAWN+SSD architecture would not be as cheap and as
energy-efficient as the authors expect it to be.

4) Figures 9, 10, 11, and 12 lack variance measurements. Tables 1 and
2 should also have provided such values.

yash

Nov 30, 2009, 11:39:16 PM
to CSCI2950-u Fall 09 - Brown
Paper Title: FAWN: A Fast Array of Wimpy Nodes

Authors: David Andersen, Jason Franklin, Amar Phanishayee, Vijay Vasudevan, and colleagues.

Novel Idea: The paper introduces a new cluster architecture of
low-powered embedded CPUs with flash storage. The main purpose of this
architecture is to save power while providing the same efficiency for
data-centric, I/O-intensive jobs as conventional clusters with hard drives.

Main Result: The new architecture consists of low-powered embedded
CPUs and flash drives, saving energy while providing efficient, fast,
and cost-effective access to large, random-access data. The key design
choice for FAWN-KV, the cluster-based key-value store, is a
log-structured per-node datastore called FAWN-DS, which provides
high-performance reads and writes using flash memory and supports
caching, replication (via consistent hashing), and consistency.

Impact: Among all the expenses incurred by a datacenter, electricity
is a major one. The proposed architecture, if implemented, would lower
electricity use by a good percentage, saving a lot of energy and money.

Prior work: There are many prior projects, such as JouleSort, CEMS,
AmdahlBlades, Marlowe, etc., that aim to conserve the energy consumed
by datacenters.

Reproducibility: It is difficult to reproduce, as it requires special
types of hardware, but it can be reproduced.

Criticism: Limited usability, as it cannot be used for computationally
expensive jobs.

Sunil Mallya

Nov 30, 2009, 11:27:55 PM
to CSCI2950-u Fall 09 - Brown
Paper Title
FAWN: A Fast Array of Wimpy Nodes
Author(s)
David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee,
Lawrence Tan, Vijay Vasudevan
Date
SOSP 2009

Novel Idea
Creating a new cluster architecture using flash drives and low-powered
embedded CPUs to build a data-intensive computing system that can
handle more than 1300 queries per joule of energy spent.

Main Result(s)
The authors strengthen their case for a new cluster architecture for
energy-efficient, data-intensive computing by citing use cases from
Amazon and LinkedIn, where CPUs consume cycles and energy even though
most of the workloads are I/O bound. They also estimate that the power
drawn by these clusters can account for up to 50% of the three-year
cost of the infrastructure.
FAWN pairs low-power, efficient CPUs with flash storage to provide
efficient, fast, and cost-effective access to large amounts of
random-access data. FAWN uses a log-structured per-node datastore
called FAWN-DS, which provides the aforementioned high-performance
reads and writes on flash memory. FAWN-DS is designed to operate
within the DRAM available on the individual nodes: all writes to the
data store are sequential, and each read requires a single random
access, made fast and efficient by a DRAM hash table that maps keys to
offsets in the data log, thereby avoiding random seeks. The FAWN-KV
system is responsible for request processing; it organizes the
back-end DS nodes (or their virtual IDs) into a ring using consistent
hashing. The front end maintains the entire node membership list,
forwards each query directly to the back end, and caches results to
avoid hot spots on common queries.
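The FAWN-DS behavior described above (sequential appends plus an in-DRAM hash table mapping keys to log offsets, so each lookup costs one random read) can be sketched as a toy in Python. The record framing and names below are assumptions, not the paper's implementation; notably, the real FAWN-DS keeps only a fragment of each key in its index to save DRAM.

```python
# Toy log-structured store: writes append sequentially to a log, and a
# DRAM hash table maps each key to the offset of its latest record.

class LogStore:
    def __init__(self):
        self.log = bytearray()   # stands in for the on-flash log
        self.index = {}          # DRAM hash table: key -> log offset

    def put(self, key: str, value: bytes):
        offset = len(self.log)   # sequential append only, never in-place
        record = key.encode() + b"\x00" + value
        self.log += len(record).to_bytes(4, "big") + record
        self.index[key] = offset  # newest version wins

    def get(self, key: str):
        offset = self.index.get(key)
        if offset is None:
            return None
        # One "random read" at the indexed offset recovers the record.
        size = int.from_bytes(self.log[offset:offset + 4], "big")
        record = self.log[offset + 4:offset + 4 + size]
        stored_key, _, value = record.partition(b"\x00")
        return value

store = LogStore()
store.put("k1", b"v1")
store.put("k1", b"v2")   # an update appends; the index points to the newest
print(store.get("k1"))   # b'v2'
```

Updates leave stale records behind in the log, which is why FAWN-DS also needs the compaction and split/merge maintenance operations the paper describes.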

Impact
This is really cool technology! Now datacenters can have more
hardware and handle more queries at the same energy cost.

Evidence
The evaluation was done on a prototype built from commodity hardware.
There are 21 back-end nodes, each with a low-powered single-core AMD
Geode LX processor; the front end consists of Intel Atom-based
machines.
FAWN is tested on its ability to deliver small, data-intensive
workloads, first by benchmarking individual node performance and then
the entire system. The evaluation against Berkeley DB helps in
understanding the benefit of the log structure in FAWN-DS. I also
liked the fact that they evaluated how changes in ring membership
affect query throughput.
I think there is enough evidence to suggest that this is a good
technology that can really be built upon. Given that it has been
installed for over four years now, the results are promising.

Question & Ideas
How long will it take before Facebook shifts to FAWN? It looks like
they could save a lot of money on power and also provide faster access
than what memcached delivers!

Criticism & future work
Right now it looks like it can only work wonders for key-value data
storage systems that focus on retrieving really small amounts of data.

Dongbo Wang

Dec 1, 2009, 1:02:47 AM
to brown-cs...@googlegroups.com

Paper Title: FAWN: A Fast Array of Wimpy Nodes

Authors: David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan and Vijay Vasudevan

Date: 2009

Novel Idea: This paper presents FAWN, a low-power, data-intensive computing cluster architecture built on embedded CPUs and flash storage. The storage nodes of FAWN are organized into a ring using consistent hashing, as in the Chord system, and each physical node joins the ring as some number of virtual nodes. When a front end gets a request, the request is forwarded to the node that contains the desired key-value pair. I think the most interesting idea is the way the paper maps a key to a value: it proposes a smart way to locate the storage location of a key while using memory space efficiently.

Main Result & Impact: This paper presents the architecture of a data-intensive cluster based on low-power CPUs and flash storage. The paper gives a detailed description of how the architecture works (the key-value mapping strategy; operations like store, lookup, split, and merge; and how storage nodes join and leave the ring). The cluster architecture is proposed for its energy efficiency and its good performance on data-intensive workloads. As the paper points out, the FAWN cluster achieves 364 queries per joule, two orders of magnitude better than traditional disk-based clusters.

Evidence: The evaluation section measures the I/O performance of FAWN and its energy consumption. The I/O performance of FAWN-DS is compared with the raw I/O performance of the wimpy nodes, and the results show they are nearly the same. This means FAWN itself is very efficient, not adding too much additional workload to the storage system. The energy evaluation focuses on the query rate. It turns out that the network switches account for 20% of the energy used by the entire system, which means scaling the cluster could be a problem.

Prior Work: JouleSort is an energy-efficiency benchmark whose winning system was a disk-based system with a low-power CPU, giving better performance in terms of records sorted per joule. There are many other projects using low-power processors in datacenters to reduce energy consumption.

Reproducibility: I think it's possible to reproduce the work of this paper. The paper is very detailed in its content.
Question and Criticism: none




joeyp

Nov 30, 2009, 11:38:03 PM
to CSCI2950-u Fall 09 - Brown
FAWN: A Fast Array of Wimpy Nodes

Andersen et al.

SOSP 2009

This paper presents a distributed key-value store based on using Flash
devices and low-power, low-performance CPUs. The goal is to
significantly reduce the energy consumption needed for this type of
distributed system. The design emphasizes random-read,
sequential-write workloads, using a logging store to handle deletes
and updates while avoiding random writes.
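That logging discipline can be sketched roughly as follows (hypothetical code, not from the paper): deletes and updates are both appended to the log as new records, with `None` standing in for a tombstone, and a later compaction pass rewrites only the live entries, again sequentially.

```python
# Sketch of servicing updates and deletes with appends only.

def compact(log):
    """log: list of (key, value-or-None) records, oldest first.
    None marks a tombstone appended by a delete."""
    latest = {}
    for key, value in log:   # the last record for each key wins
        latest[key] = value
    # Rewrite sequentially, dropping shadowed and deleted entries.
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("a", 1), ("b", 2), ("a", 3), ("b", None)]  # update a, delete b
print(compact(log))                                # [('a', 3)]
```

The trade-off is the usual log-structured one: fast, flash-friendly writes in exchange for space amplification until compaction runs.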

Results from the paper strongly suggest that this is a viable
alternative to other implementations of such services in more
power-hungry datacenters. One key abstraction that is noted in the
paper is the relationship between I/O, CPU, and power use. The goal
of this work is to bring the CPU workload and I/O workload closer
together: since the lower of the two bounds the speed of processing,
there is little reason to have the other operate at higher levels of
power and performance.

If FAWN-based solutions become feasible to implement, it could have a
tremendous impact on energy consumption in datacenters. There are
likely significant software and architecture considerations to contend
with, along with the inertia of the existing datacenters being built
differently (focusing on commodity PC's, for instance). Even so, the
potential rewards are great enough that these challenges may not stop
adopters for long. In a way, FAWN has a pretty close relationship
with the general idea of using low-cost, commodity parts anyway. It
just is a bit different from the canonical view of commodity machines'
construction.

FAWN is based heavily on existing work in distributed systems,
especially consistent hashing and strategies used in other similar
key-value stores. It also has strong influences from other
append-only and log-store strategies in the database world, all of
which it notes. The paper explains how these solutions were adapted
and optimized for FAWN nodes, emphasizing the importance of avoiding
random writes on these wimpy nodes throughout the architecture. They
leave a number of optimizations of the distributed hashing, fault
tolerance, and load balancing for future work. This is a reasonable
thing to do here, since the bulk of their results concern the
feasibility of this type of node, and they can make that point without
extensive implementation by simply appealing to existing work.

In the evaluation, the paper gives a strong argument for why FAWN
architectures are preferable in a number of settings. It is simply a
rare case that an application is best-suited for the type of
architecture that many systems use today, with respect to both their
CPU and their ability to write to memory and disk. Lowering the
less-needed of these two (often the CPU) saves power without
sacrificing performance of the system, at the cost of reimplementing
low-level parts of the system.

I was pretty confused by figures 13 and 14, and I'm not sure how
useful they really are here. At first I thought they were trying to
show something about overall performance in terms of queries
processed, then I thought it was just to show how long joining and
merging take. I realized finally that it is a mix of the two, and
I'm not sure how important it is to the point that the paper is trying
to make. In terms of performance, it might have been interesting to
see data about increasing the number of nodes. Figure 15 is the most
helpful in this regard.

There's plenty of opportunity for future work here. They mention
issues of large network topologies for FAWN architectures, which would
have to be solved before this solution would be adopted for large
datacenters. There are also plenty of implementation challenges in
getting the protocols at a low level to work for FAWN nodes - they
mention changing OS parameters to get some things to work the way they
want - and there are undoubtedly other optimizations that could be
performed here, and ways to make adoption easier.






小柯

Nov 30, 2009, 11:53:59 PM
to brown-cs...@googlegroups.com
Paper Title:    FAWN: A Fast Array of Wimpy Nodes

Authors:        David G. Andersen
                    Jason Franklin
                    Michael Kaminsky
                    Amar Phanishayee
                    Lawrence Tan
                    Vijay Vasudevan

Date:           2009

Novel Idea:
    Facing the problem that many applications use conventional disks to store their key/value data inefficiently, in both performance and energy consumption, the authors propose a new approach that takes advantage of flash disks, low-energy embedded CPUs, and the FAWN architecture. FAWN applies consistent hashing to handle data distribution and provides strategies for replication and consistency.

Main Result:
    FAWN-KV, an implementation of the FAWN architecture, is created, evaluated, and compared with conventional disk-based systems in terms of both performance and energy consumption. The results suggest that FAWN offers outstanding improvements in both aspects.

Impact:
    Flash disks and embedded CPUs seem very suitable for key/value pair storage. Other I/O devices might also merit more appropriate hardware and software architectures based on their requirements. Future research may involve this sort of improvement.

Evidence:
    The basic idea and architecture are presented. FAWN-KV works as an example providing more detailed mechanisms and operations of FAWN. There are also many explanations of design decisions.
  
Prior Work:


Competitive work:


Reproducibility:
    Yes.

Question:
    Besides slow writes, flash disks also have a finite number of erase-write cycles.
    If flash disks are to be used for a long period of time, this might become a limitation.
    One way to deal with this might be the replication mechanism: once a flash disk breaks, it is replaced by another one, and its data can be recovered from the replicas. Is this the solution applied in the FAWN architecture, or are there other ways FAWN tackles this situation?

Criticism:
  
Ideas for further work:
   


Juexin Wang

Nov 30, 2009, 11:59:49 PM
to brown-cs...@googlegroups.com

Paper Title:

FAWN: A Fast Array of Wimpy Nodes


Date:
Oct. 2009

Author:

David G. Andersen, Jason Franklin, Michael Kaminsky

Novel Idea:
- Using a cluster of datastores employing log-structured, flash-based key-value storage instead of disk storage


Main Results:
-Principle, design and implementation of FAWN-KV—a consistent, replicated, highly available, and high-performance key-value storage system built on a FAWN prototype.
 

Impact:
- Designed a system that can handle more I/O requests, better support random access, and consume less power.
 

Evidence
- Uses a ring structure to organize the distributed nodes, log-style key-value storage in the nodes (DS), and flash memory as the physical storage. A DS can split, merge, and compact its key range.
- For lookups, it maps keys to values, keeping a fragment of each key in memory.
- Buffering, chain replication, global allocation management, etc.
- The evaluation demonstrates that FAWN clusters can handle more queries per joule of energy than a disk-based system.

Reproducibility
N/A

Comparative Work
- All other disk-based systems and systems not using the FAWN mechanism.

Question
I believe flash memory is more reliable (lower failure rate), but since disks are widely used, we have many tools to monitor their status and performance. Do we also have ways to efficiently detect a failed or abnormally behaving flash memory?
Criticism
- They should describe 5.2 (architecture comparison) in more detail to support Figure 16; this figure looks very useful and demonstrates that FAWN is really good.
- Flash memory is more expensive per byte; for a datacenter that stores PBs of data, this is a real concern.

Ideas for further work
- Decentralize the allocation manager

 

James Tavares

Nov 30, 2009, 9:32:12 PM
to brown-cs...@googlegroups.com
*FAWN*

Paper Title: FAWN: A Fast Array of Wimpy Nodes

Author(s): David G. Andersen, Jason Franklin, Michael Kaminsky, Amar
Phanishayee, Lawrence Tan, Vijay Vasudevan

Date: October 11-14, 2009. SOSP '09.

Novel Idea: This paper challenges the 'commodity server' paradigm with
the introduction of FAWN, an architecture based on a multitude of
'wimpy' (low processing speed) nodes using Flash as their persistent
storage mechanism. Citing energy costs as a major factor in the 3-year
TCO of data centers, the authors build an architecture meant to exploit
the benefits and mitigate the disadvantages of flash, achieving higher
overall throughput per Joule vs. conventional disk-based systems.

Main Result(s)/Evidence: The paper starts out by making the case for the
FAWN architecture (citing high energy consumption of DRAM and disks,
energy costs as 50% TCO of a computer, and the typical need to store
lots of small objects in modern data centers). Next, it describes the
FAWN architecture itself (front-end and back-end nodes, consistent
hashing, and the FAWN-KV application). Finally, the primary result of an
evaluation is that FAWN achieves about 364 queries/joule on Flash-based
nodes vs. 51.7 queries/joule for a desktop system using an SSD drive and
1.96 queries/joule for a desktop using a traditional hard drive, a major
improvement in energy savings.
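As a quick sanity check on those figures: queries per joule is just sustained queries per second divided by watts (a watt is a joule per second), and the numbers quoted above imply roughly a 7x advantage over the SSD desktop and nearly 186x over the disk desktop.

```python
# Back-of-the-envelope check on the queries/joule comparison.

def queries_per_joule(queries_per_sec: float, watts: float) -> float:
    # (queries/s) / (J/s) = queries/J
    return queries_per_sec / watts

# Figures quoted in the review, already in queries/joule.
fawn, desktop_ssd, desktop_hdd = 364.0, 51.7, 1.96

print(round(fawn / desktop_ssd, 1))  # about 7.0x vs. a desktop with an SSD
print(round(fawn / desktop_hdd))     # about 186x vs. a desktop with a disk
```

The roughly two-orders-of-magnitude gap over the disk-based desktop matches the paper's headline claim.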

Impact: Work is perhaps too new to tell. However, the authors (and
others) have made a convincing argument for the need for energy
savings in the datacenter. Solid-state data storage sure seems to be the
end-game in both server and consumer devices, so perhaps this work is a
useful stepping stone to proving the usefulness of flash.

Prior/Competitive Work: The authors mention a great deal of previous
work, from similar architectures (low-power CPUs + Flash) to
flash-optimized file systems and distributed hash tables. One notable
mention: the authors list MapReduce clusters running atop GFS as the
'next target' for FAWN, but I'm not sure why this is such an obvious
target, as these systems rely on large sequential reads of very large
datasets rather than a multitude of seeks.

Reproducibility: Neither the paper nor the FAWN website make any mention
of obtaining binaries or source, so reproducing the exact FAWN
architecture may be difficult. However, my feeling is that the energy
savings have more to do with the hardware than the design of FAWN
itself; it should be possible to verify the energy savings statistics
with a trivially designed test application on a handful of nodes.

Question: How does the ext2 filesystem's block size interact with the
flash block size? How does FAWN achieve true persistence for small
writes without fsync()ing the open datastore and causing an entire
flash block to be erased and re-written after every FAWN-DS
append/delete operation? Perhaps a smarter filesystem could exploit
flash block sizes to avoid an erase operation on appends (at the
expense of wasted space).

Criticism:

1.) The evaluation is flawed in its comparison to disk-based systems.
The FAWN-KV system is clearly optimized for media which excels at
random-access, yet they run FAWN-KV during their disk evaluations.
Perhaps a better evaluation would have been against a range of
disk-optimized KV systems, such as Dynamo (Dynamite).

2.) The 1-to-80 front-end-to-back-end ratio derived in the paper (page
10) relies on 1KB-sized queries. I realize this is the target value
size for FAWN-KV; however, there are two observations here: first,
back-end nodes could generate larger volumes of traffic with larger
query sizes (thereby decreasing the number of back-ends a single
front-end can serve), and second, the front-end node appears to be a
possible bottleneck.

3.) Much of this paper is spent describing consistent hashing and a
key-value store system that is nearly all regurgitated from previous
work, although lacking some notable features such as data versioning
(that is, being able to retrieve older versions).

Future Work: None.

Rodrigo

Dec 1, 2009, 8:41:06 AM
to CSCI2950-u Fall 09 - Brown
---------- Forwarded message ----------
From: qiao <qi...@cs.brown.edu>
Date: Tue, Dec 1, 2009 at 12:44 AM
Subject: [2950-U review]cannot deliver to the google group
To: rfon...@cs.brown.edu


Hi Rodrigo,
I cannot send it to the google group.

Paper title: FAWN

[novel idea]
FAWN takes advantage of flash storage to build a fast and
energy-efficient architecture for random, read-intensive workloads.

[main results]
Flash features fast random reads but slow random writes. The paper
presents the principles of the FAWN architecture and the design and
implementation of the FAWN-KV storage system: the underlying flash
storage system, the per-node datastore, and the key-value lookup
system. FAWN nodes delivered over an order of magnitude more queries
per joule than conventional disk-based systems.

[impact]
The idea and practical design of using flash storage to break the I/O
bottleneck are worth learning from, reproducing, and exploring further.

[comparative work]
JouleSort: a SATA-disk-based "balanced" system coupled with a
low-power CPU.
Gordon: a hardware architecture that pairs an array of flash chips and
DRAM with low-power CPUs.
CEMS, AmdahlBlades, and Microblades also use commodity components as
building blocks for datacenters.
Ongoing research on using flash as a database storage medium.
Filesystems using flash storage, such as JFFS2.

[prior work]
Consistent hashing, distributed hash tables, key-value storage systems.

[reproducibility]
Yes.

[question]
In my experience, flash disks break easily and have a shorter
lifetime. Do we need more replicas to ensure availability?

[idea for future work]
Design a hybrid system that uses flash storage, in-memory storage, and
disk storage to meet read+write-intensive workloads.



Spiros E.

Dec 1, 2009, 10:05:04 AM
to CSCI2950-u Fall 09 - Brown
The paper presents FAWN, a distributed key/value store designed to
minimize power consumption. FAWN uses flash memory and slow, low-power
CPUs to achieve its goal.

Though it is not mentioned in the paper, the creators of FAWN chose to
use NOR flash technology rather than NAND flash. Despite these being
two different technologies, each with its own pros and cons, the paper
repeatedly makes the unqualified claim that FAWN performs well on
flash storage. What would happen if we ran FAWN on a system that used
NAND flash instead of NOR flash?

Despite the fact that the paper states that FAWN can be used as a
drop-in replacement for memcached, the paper does not offer a
comparison. Instead, it offers a comparison of FAWN to BerkeleyDB, a
comparison the paper itself admits is unfair, though it is
illustrative of the fact that existing systems can't use flash memory
as a drop-in replacement for disks for the workload the paper is
interested in.

It would have been nice for the performance and power consumption
benchmarks of FAWN to be compared against another system, instead of
different node configurations of the same system. There isn't a single
graph comparing these quantities across different systems, and little
discussion of it in the text of the paper. In fact, it's unclear to me
what Section 5 is trying to say. It talks about "systems" and "node
configurations," but it's unclear how the node configurations are used
in "traditional systems."