Reviews: GFS


Rodrigo Fonseca

Sep 20, 2010, 3:31:43 PM
to CSCI2950-u Fall 10 - Brown
Please post your review as a group reply to this message.
Thanks,
Rodrigo

Abhiram Natarajan

Sep 20, 2010, 6:33:40 PM
to brown-csci...@googlegroups.com
Paper Title: The Google File System

Author(s): Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

Date: 2003, SOSP

Novel Idea(s): (a) Clusters with single masters
                     (b) Replication of data chunks
                     (c) Storing metadata in memory
                     (d) Relaxed consistency model
                     (e) Minimum master involvement

Main Results: GFS is a distributed file system that handles large 
datasets and manages to run efficiently on commodity hardware, 
which is more prone to failure than expensive server hardware.

Impact: In GFS, we have a file system that:
            (a) Is scalable
            (b) Is highly fault tolerant
            (c) Has high throughput
            (d) Handles Google's workloads

Evidence: The authors measured performance on a GFS cluster 
consisting of one master, two master replicas, 16 chunkservers, and 
16 clients. They examine the read-rate, write-rate and append-rate. 
They then verify various performance parameters on real-world clusters. 
Finally, they present a detailed breakdown of the workloads on real 
GFS clusters.

Prior Work: (a) AFS - Provides a location-independent namespace
                  (b) xFS, Frangipani, Intermezzo - Provide caching below 
                       the file system interface
                  (c) Frangipani, xFS, Minnesota's GFS, GPFS - No centralized 
                       server
                  (d) NASD architecture

Competitive Work: Though they do not provide a direct comparison with 
previous works, the fact that GFS handles Google's workloads (!) is 
implicit evidence of the superiority of some of its features.

Reproducibility: Given the description, parts of the system are reproducible.

Criticism: (a) A single master may not be the wisest decision
               (b) The design is not optimal in the event of a large number 
                    of small files
               (c) Lazy reclamation of file space means space is 
                    sometimes not available even though it should be

Joost

Sep 20, 2010, 11:37:05 PM
to CSCI2950-u Fall 10 - Brown
Paper: The Google File System
Authors: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Date: SOSP’03, October 19–22, 2003
Novel Idea: This paper presents a distributed file system that deals with
comparatively massive files and whose write traffic is dominated by appends
rather than overwrites. Furthermore, the system
was designed to deal quickly and efficiently with component failure.
Main Results: The authors demonstrated the effectiveness of this
system for the purposes that they outlined as the objectives of the
file system. In particular they detailed and experimentally validated
communication and restart protocols.
Impact: As pointed out in the paper, the system proposed is finely
tuned to Google's internal necessities, though the design decisions in
terms of master/client/chunk protocols and processing order can be
very useful.
Evidence: The paper demonstrated a set of benchmarks for the system
both by using actual usage data from two separate GFS file systems
actually in use at Google, and through a small GFS that was designed
explicitly for testing specific loads and conditions. It is
interesting to note that the authors sought to test rather
difficult situations in the designed experiments (such as starting
with empty buffer caches on all the machines for the read test).
Prior Work: The paper explicitly mentions such systems as Minnesota's
GFS, NASD, and Lustre as previous platforms that had dealt with
similar implementation issues. And while two or three of the mentioned
systems were less than two years old, most of the theoretical grounding the
paper cites seems to come from the late 80s and early 90s.
Competitive Work: It is hard to compare this system, in which so much
of the write traffic is appends and the system is therefore tailored
to them, to other storage systems in which no such assumption is
made, other than to say that it is understandably much faster at
writing.
Reproducibility: It would be possible to design a GFS-like system using the
information provided in the article. However, lacking the proprietary
implementations of some of the protocols might make it hard to
reproduce the numbers and results they were getting.
Question: How would the given system act if the dominance of append
writes were reduced to a slight majority threshold?

Duy Nguyen

Sep 20, 2010, 10:41:15 PM
to brown-csci...@googlegroups.com
Paper Title

"The Google File System"

Authors
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Date
2003/SOSP

Novel Idea
A distributed file system designed for extremely large files, read-heavy workloads (in contrast with Dynamo), and high availability. Like Dynamo, it is built on clusters of commodity hardware.

Main Results/Impact
The paper describes the GFS architecture in quite some detail and gives readers enough information on how the whole thing works: a single master that is responsible for managing the metadata used for chunk mapping/leasing (for read/write requests from clients), logging all changes made to the metadata, checking chunkserver status by pinging them, replicating its logs, etc.; and a set of chunkservers that actually store 64 MB chunks of files, with each chunk replicated to three chunkservers.
The design of GFS has proved that commodity hardware can be used to support large-scale processing workloads. It has also inspired the open-source community to build its own version, HDFS, which is currently used by major web content companies like Yahoo and Facebook.
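
To make the client/master/chunkserver interaction described above concrete, here is a minimal sketch of the client-side read path. The Master.lookup() and chunkserver read() interfaces are hypothetical stand-ins for RPCs the paper only describes at a high level.

# Sketch of the GFS client read path, assuming hypothetical Master and chunkserver
# objects; the chunk size and master-only-for-metadata pattern follow the paper.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

class Client:
    def __init__(self, master):
        self.master = master
        self.cache = {}  # (filename, chunk_index) -> (chunk_handle, replica_locations)

    def read(self, filename, offset, length):
        """Read `length` bytes starting at `offset`, one chunk at a time."""
        data = b""
        while length > 0:
            chunk_index = offset // CHUNK_SIZE   # which chunk holds this offset
            chunk_offset = offset % CHUNK_SIZE   # position inside that chunk
            key = (filename, chunk_index)
            if key not in self.cache:
                # One metadata round trip to the master; the reply is cached so the
                # master stays off the data path for later reads of this chunk.
                self.cache[key] = self.master.lookup(filename, chunk_index)
            handle, replicas = self.cache[key]
            n = min(length, CHUNK_SIZE - chunk_offset)
            # Data flows directly from a chunkserver replica, never through the master.
            data += replicas[0].read(handle, chunk_offset, n)
            offset += n
            length -= n
        return data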

Evidence
Their testbed is described in detail: 1 master, 16 chunkservers, and 16 clients. However, this is a modest number compared to the many more machines in their real datacenters, so the performance data in this part is mostly for reference; it gives little statistical evidence of how GFS performs in the real world. But we can see Google keeps growing, so I think GFS performance must be very good. The numbers are pretty good, especially in the 'Recovery Time' part, where a chunkserver holding 600 GB of data was killed and all of its chunks were successfully restored in 23.2 minutes at an effective replication rate of 440 MB/s.

Prior Work
They mention AFS, xFS, Minnesota's GFS, etc., and point out how their approach differs. I haven't read those papers, so I can't judge whether what they say is reasonable.

Reproducibility
Although HDFS is available out there, I'm not sure it is a 100% clone of GFS, especially since the paper only gives information at a very high level.

Question/Idea for future work
Although the master only deals with metadata and they have the notion of a "shadow" master to provide read-only access, it is still a single point of failure and a bottleneck. In some sense, GFS is similar to NFS: they share the common idea of having a centralized management server to serve clients (NFS does not have replication, though). Can we apply "consistent hashing" techniques to get rid of the master?
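
For reference, here is a minimal consistent-hashing sketch of the idea in the question (my own illustration, not anything from the paper): chunk handles map onto a hash ring of chunkservers, so placement needs no central master, and adding or removing a server only remaps a small fraction of chunks. Whether the master's other duties (namespace, leases, garbage collection) could be decentralized the same way is a separate question.

# Toy consistent-hashing ring for chunk placement; all names here are invented.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to smooth out load.
        self.ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def locate(self, chunk_handle: str, replicas=3):
        """Return `replicas` distinct servers clockwise from the chunk's hash."""
        found, i = [], bisect.bisect(self.keys, _hash(chunk_handle))
        while len(found) < replicas:
            _, server = self.ring[i % len(self.ring)]
            if server not in found:
                found.append(server)
            i += 1
        return found

ring = HashRing([f"chunkserver-{n}" for n in range(16)])
print(ring.locate("chunk-0x1234"))  # e.g. three replica holders for this chunk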

Criticism
Like other papers from industry, some of the information is still missing details. E.g., in Section 5.1.1 (Fast Recovery) they just say the master and chunkservers are designed to be able to restart in seconds. How can they guarantee that?


Jake Eakle

Sep 20, 2010, 11:53:28 PM
to brown-csci...@googlegroups.com
Paper Title

The Google File System 

Author(s)

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Date 2003
Novel Idea A distributed filesystem optimized for very large files, high hardware and network failure rates, concurrent appends, sequential reads, and very few random reads and writes.
Main Result(s) They describe GFS, which provides: a concurrency model suitable for doing many simultaneous appends atomically; a master/chunkserver architecture that enables availability in the face of frequent failures; and optimizations for large files and sequential reads, including a very large chunk size, a very thin master (from the client's perspective) whose job is to make good placements behind the scenes using global knowledge but then grant leases quickly and get out of the way, and incremental checksums. 
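A note on the lease point: in the paper, the master actually grants a chunk lease to one of the replicas (the primary), which then orders all mutations to that chunk; clients merely learn from the master who the primary is. A minimal sketch with invented names (the 60-second initial timeout is the paper's):

# Lease bookkeeping on the master, as a sketch; class and method names are assumptions.
import time

LEASE_SECONDS = 60.0  # the paper uses an initial 60-second lease timeout

class LeaseManager:
    def __init__(self):
        self.leases = {}  # chunk_handle -> (primary_server, expiry_time)

    def grant(self, chunk_handle, replicas, now=None):
        """Return the current primary for the chunk, granting a new lease if needed."""
        now = now or time.time()
        primary, expiry = self.leases.get(chunk_handle, (None, 0.0))
        if now < expiry:
            return primary                     # an unexpired lease is still valid
        primary = replicas[0]                  # placement policy elided in this sketch
        self.leases[chunk_handle] = (primary, now + LEASE_SECONDS)
        return primary

    def extend(self, chunk_handle, server, now=None):
        # Primaries piggyback extension requests on heartbeats while mutations continue.
        now = now or time.time()
        primary, expiry = self.leases.get(chunk_handle, (None, 0.0))
        if primary == server and now < expiry:
            self.leases[chunk_handle] = (primary, now + LEASE_SECONDS)
            return True
        return False
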
Impact Well, Google exists. That's pretty big, I guess. I have no idea if Microsoft and the other big search providers use similar technology; it seems likely that they would have similar needs.
Evidence If the performance of Google's services themselves is not evidence enough, they provide a lot of sandboxed benchmarks for things like read and write throughput, as well as data from actual instances running real Google services. However, the data are so old at this point as to be all but meaningless; by now their hardware is certainly an order of magnitude or more faster than it was, so it's perhaps silly to worry too much about the numbers they state. Even at the time, without a comparison to another system that can function on the same scale, their numbers are hard to use for evaluation purposes. It would be nice to see some indication of the changes in performance they see due to making various design decisions, but they do not provide this.
Prior Work The Linux kernel; AFS and every other distributed file system ever.
Reproducibility They describe all major components of the system at a high level, but do not go into implementation details.
Question They say that their hardware fails a lot, but they don't quantify this. What is the relationship between disk failure rates, number of chunkservers assigned a chunk, and availability? How about network failure rates?
Criticism Buh.. guh .. pretty good paper, honestly. Same issues as with any publication from a company instead of a research lab; they want to protect their IP to some degree. Main thing I saw missing I already addressed in my question above.
Ideas for further work Trying to quantify the optimal replication rate given a hardware failure rate and an availability goal (and maybe a budget).
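
A back-of-envelope sketch of that idea, under the simplifying (and debatable) assumption that each replica is lost independently within a re-replication window with probability p:

# Smallest replica count meeting an availability goal, assuming independent replica
# loss with probability p before re-replication completes (numbers are illustrative).
def replicas_needed(p_fail: float, availability_goal: float) -> int:
    """Return the smallest r such that 1 - p_fail**r >= availability_goal."""
    r = 1
    while 1 - p_fail ** r < availability_goal:
        r += 1
    return r

# e.g. a 1% chance of losing a replica before re-repair, five-nines durability target
print(replicas_needed(0.01, 0.99999))  # -> 3, matching GFS's default of three replicas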



Dimitar

Sep 20, 2010, 11:12:43 PM
to CSCI2950-u Fall 10 - Brown
The Google File System

Author: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Date: 2003

Novel Idea: This paper presents the Google File System, a scalable
distributed file system for
large data-intensive applications. The main goals of GFS are to provide
scalability, reliability,
performance, and availability at the cost of consistency. It is also
a cost-effective solution
because it runs on commodity hardware.

Main Result(s): The Google File System is widely deployed at Google to
support many production and
research services.

Impact: The paper doesn't present any revolutionary ideas, but it
shows how to combine proven,
useful techniques to get a production-ready distributed file system.
The authors demonstrated the
usefulness of write leases while mutating a chunk and of record append
(atomic appends) to a file, which
are essential parts of their system.

Evidence: As mentioned, the system has been fully deployed in a production
environment for several years.
In Section 6.2 the authors present tests showing that their system is
well optimized for appending to
and reading files. They also show that having a single master server will not
bottleneck the system.

Prior Work: The Google File System builds on previous distributed file
systems; however, its design
is based on the needs of Google's applications, which require fast reads
and appends to files, and
occasional random writes.

Competitive Work: The authors mention that the Google File System most
closely resembles the NASD
architecture. The main differences between the systems are:
1. NASD uses network-attached disk drives, while GFS uses commodity machines as
chunkservers.
2. NASD uses variable-size objects, while GFS only supports fixed-size
chunks.
Reproducibility: The work is not reproducible because the authors left
many questions about
load balancing and scalability unanswered. Also, the focus is on the
design, not the implementation,
of the system.

Question: In the Dynamo paper we discussed Merkle trees for detecting
inconsistency while minimizing
data transfer. Could Merkle trees be used instead of checksums and pipelining
in the Google File System
to detect corrupt data and improve performance?
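
A minimal sketch of what that could look like (my own illustration, not anything GFS implements): hash fixed-size blocks of a chunk, build a hash tree over them, and compare roots (or subtrees) between replicas to localize corruption with little data transfer.

# Toy Merkle tree over 64 KB blocks of a chunk; block size borrowed from GFS's
# checksumming granularity, everything else is an assumption for illustration.
import hashlib

BLOCK = 64 * 1024

def merkle_root(data: bytes) -> bytes:
    # Leaf level: one hash per 64 KB block.
    level = [hashlib.sha1(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    while len(level) > 1:
        if len(level) % 2:            # odd count: carry the last hash up unchanged
            level.append(level[-1])
        level = [hashlib.sha1(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

replica_a = b"x" * (256 * 1024)
replica_b = replica_a[:100_000] + b"?" + replica_a[100_001:]  # one corrupted byte
print(merkle_root(replica_a) == merkle_root(replica_b))  # False: corruption detected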

Criticism: The paper lacks experiments on the scalability of the
system, and in some cases it lacks details.
For example, the paper talks about shadow masters as a key component for
providing reliability and
availability to the clients, but it only discusses them briefly. The
paper doesn't mention
how a new master server is elected when the master server goes down.


Zikai

Sep 20, 2010, 4:05:03 PM
to CSCI2950-u Fall 10 - Brown
Paper Title: The Google File System
Author(s): Sanjay Ghemawat (Google) et al.
Date/Conference: SOSP '03

Novel Idea: (1) Explore design points in an environment with large-
scale distributed data-intensive applications. The new design points
(component failure is normal, huge files, sequential reads and
concurrent appending, etc.) are radically different from traditional
ones and guide the design decisions.
(2) Design a system architecture with a single master and multiple
chunkservers. The use of a single master allows global FS operations
like namespace management and locking, replica placement (and
creation, re-replication, rebalancing), and garbage collection to be vastly
simplified. Meanwhile, the master's role is carefully designed so that it
does not become the system bottleneck.
(3) Keep metadata in the main memory of the master to speed up lookups,
while using operation logs and checkpoints to recover from failures.
(4) Relax the consistency model for data mutations (serial and concurrent)
to allow an easy design and efficiency (especially the at-least-once
semantics for concurrent appending; a reader-side sketch of coping with
this appears after this list).
(5) Separate data flow and control flow to achieve high throughput.
(6) Achieve high availability with fast recovery of master and
chunkservers and sophisticated chunk replication policies.
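
To make the at-least-once semantics in (4) concrete, here is a reader-side sketch. The record layout (id + length header plus a trailing CRC) is my own invention; the paper only says that writers put checksums and unique identifiers inside records so readers can discard padding, fragments, and duplicates.

# Self-validating records for at-least-once append: the reader skips corrupt regions
# and duplicate ids. The on-disk format here is an assumption, not the paper's.
import struct, zlib

def encode_record(record_id: int, payload: bytes) -> bytes:
    header = struct.pack("!QI", record_id, len(payload))
    return header + payload + struct.pack("!I", zlib.crc32(header + payload))

def read_records(blob: bytes):
    seen, offset = set(), 0
    while offset + 12 <= len(blob):
        record_id, length = struct.unpack_from("!QI", blob, offset)
        end = offset + 12 + length + 4
        if end > len(blob):
            break
        body = blob[offset:offset + 12 + length]
        (crc,) = struct.unpack_from("!I", blob, offset + 12 + length)
        if crc == zlib.crc32(body) and record_id not in seen:
            seen.add(record_id)
            yield blob[offset + 12:offset + 12 + length]   # a valid, first-seen record
        # Duplicates and corrupt records are skipped; a real format would also need a
        # way to resynchronize after a corrupt length field.
        offset = end

blob = encode_record(1, b"hello") + encode_record(1, b"hello") + encode_record(2, b"world")
print(list(read_records(blob)))  # [b'hello', b'world'] -- the duplicate append is dropped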

Main Results:
(1) Reexamine traditional design choices and explore radically
different design points based on Google application workloads and
technological environment.
(2) Design and implement GFS, a scalable distributed file system for
large distributed data-intensive applications (MapReduce, perhaps). The
system handles frequent component failures well. It is optimized for
huge files that are mostly appended to concurrently and read sequentially.
It is targeted at high aggregate throughput for a great number of
concurrent readers and writers performing a variety of tasks, rather
than low latency.
(3) Evaluate GFS with micro-benchmarks and in real-world clusters. The
evaluation shows both capacities and bottlenecks of GFS architecture
and implementation.

Impact: Google's GFS and MapReduce ushered in a new era in distributed
computing and have had a deep impact on both industry and academia. In
industry, with Hadoop and other implementations of GFS+MapReduce,
companies finally have a powerful and cost-effective computing tool to
handle huge amounts of data. In academia, GFS sets a good example for
future distributed file systems with similar workloads and application
environments.

Evidence: In Section 6.1, micro-benchmarks are used to measure
performance metrics (read rate, write rate, master ops) on a GFS
cluster. In Section 6.2, performance in real-world clusters is measured.
In Section 6.3, a detailed breakdown of workloads on two real-world GFS
clusters is shown.

Prior Work: In Section 8, the authors discuss prior and related work in
detail. For example, GFS spreads a file's data across storage servers
in a way akin to xFS and Swift. Another example is that GFS most
closely resembles the NASD architecture.

Reproducibility: The GFS paper has good reproducibility. It has clear
illustrations and enough details, which has allowed a number of GFS
implementations, like Hadoop's HDFS, that offer most of GFS's
functionality and capabilities.

Question: Why is concurrent appending not implemented in HDFS? Is it
because the GFS paper is not detailed enough, or because concurrent appending puts
some restrictions on applications?



Basil Crow

Sep 20, 2010, 11:57:12 PM
to brown-csci...@googlegroups.com
Title: The Google File System

Authors: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Date: SOSP 2003

Novel Idea: GFS is designed for inexpensive commodity components that often fail and optimized for large (multi-GB) files that are mostly appended to (often concurrently) and then read (rarely randomly). A GFS cluster consists of a single master and multiple "chunkservers," which are monitored using heartbeats and kept consistent via a sophisticated replication procedure.
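
A small sketch of the heartbeat monitoring mentioned above; the class, method names, and timeout value are my own assumptions rather than the paper's protocol.

# Master-side heartbeat bookkeeping: track when each chunkserver last reported in,
# and flag chunks on silent servers as re-replication candidates.
import time

HEARTBEAT_TIMEOUT = 60.0  # seconds of silence before a chunkserver is presumed dead

class MasterMonitor:
    def __init__(self):
        self.last_seen = {}     # chunkserver id -> timestamp of last heartbeat
        self.chunks_on = {}     # chunkserver id -> set of chunk handles it reported

    def heartbeat(self, server_id, chunk_handles):
        self.last_seen[server_id] = time.time()
        self.chunks_on[server_id] = set(chunk_handles)

    def dead_servers(self, now=None):
        now = now or time.time()
        return [s for s, t in self.last_seen.items() if now - t > HEARTBEAT_TIMEOUT]

    def chunks_to_rereplicate(self):
        # Chunks held by dead servers are candidates for re-replication; checking
        # surviving replica counts and choosing placements is omitted in this sketch.
        dead = set(self.dead_servers())
        return set().union(*(self.chunks_on[s] for s in dead)) if dead else set()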

Main Result(s): GFS was able to saturate a 1 Gbps link between two switches while doing reads. After killing two chunkservers, they were restored to at least 2x replication within 2 minutes. The authors claim that GFS has successfully met Google's storage needs and is widely used within Google.

Impact: GFS showed that reexamining traditional file system assumptions in the light of modern Internet-scale systems could yield tangible performance benefits.

Evidence: The authors present a series of "micro-benchmarks" testing reads, writes, and record appends. They also examine the read/write rates and recovery time of two live GFS clusters. Finally, they describe the distribution of operations on chunkservers by size and by type.

Prior Work: GFS builds on several previous large distributed file systems, such as AFS and NASD.

Competitive work: In contrast to previous approaches, such as Minnesota's GFS and NASD, GFS is based on commodity hardware and does not seek to alter the model of the storage device.

Reproducibility: The source to GFS is not available, and it would not be trivial to replicate such a filesystem based only on the details given in the paper.

Matt Mallozzi

Sep 20, 2010, 11:21:53 PM
to brown-csci...@googlegroups.com
Matt Mallozzi
9/20/10

Title:
The Google File System
Date:
2003
Novel Ideas:
A distributed file system with high throughput performance, optimized for
large files, sequential reads, and append writes. Assumes a great amount of
failure in its large pool of commodity hardware, and provides fault
tolerance against it.
Main Results:
A working system that has been successfully deployed in the massive data
centers of Google, and that has survived many real-world, Google-sized
applications and workloads.
Impact:
A significant example of a distributed file system being heavily tailored
to the expected workload (sequential reads and appending writes). Provides
insight into a system which has managed to stand up under one of the biggest
possible deployments in terms of number of machines and amount of data
stored.
Evidence:
The system was widely deployed within Google at the time of this paper,
with cluster sizes in the thousands of machines and total storage in the
hundreds of terabytes. That already is significant evidence that it meets
Google's strong reliability and performance needs. The numbers given are for
both a small, controlled test instance (19 total servers), as well as for
two real-world clusters.
Prior Work:
By this time there were many previous attempts at distributed file systems.
GFS does not seem to directly derive from any one system, but it is related
in ways to previous systems such as xFS, Swift, and NASD.
Competitive Work:
Their Related Work section provides a feature and implementation comparison
to several other systems, but they do not have performance or reliability
comparisons.
Reproducibility:
Although the source code for GFS is not available (this is always a
criticism of mine - I am a big fan of open source, especially under the
GPL), their descriptions of the various parts of the system are quite
detailed. The file system used with Hadoop (HDFS) seems to have reproduced
it decently well, although I am unsure how much of HDFS is based on GFS
implementation details and how much of it is simply inspired by GFS.
Question:
Would it be worth considering porting the daemon code to the Linux kernel to
avoid repeated context switches, as just about every chunkserver operation
requires a system call?
Criticism:
It would have been nice to see a comparison in performance and reliability
for different numbers of chunk replicas - why 3?
Further Work:
Ensuring that the operations of different applications/users stay isolated.


Hammurabi Mendes

Sep 20, 2010, 5:48:45 PM
to brown-csci...@googlegroups.com
Paper Title

The Google File System

Authors

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Date

SOSP'03 - ACM Symposium on Operating Systems Principles, 2003

Novel Idea

The paper presents the Google File System (GFS), a distributed file
system designed to deal with Google's workload characteristics. GFS
provides certain fault tolerance and availability guarantees, while
monitoring data integrity through checksums.

Impact

The authors assume a scenario where component failures are common,
decide to optimize append operations on large files, and aim for high
throughput instead of focusing on latency. They provide a solution
that is structured as follows.

A single master indeed simplifies the implementation and allows it to
make better allocation decisions, as it has global knowledge. The
master stores a small amount of filesystem metadata, in order to keep
everything in memory. It also synchronizes its information with other
replicas. The master replays its operation log to find the current
filesystem state, and periodically checkpoints that state to stable
storage to keep the log small. The system interactions are designed to
avoid placing too much load on the master, which handles only metadata.
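
A minimal sketch of the log-plus-checkpoint recovery described in this paragraph, with an invented namespace representation. (In the real system the master does not persist chunk locations at all; it re-learns them from chunkservers at startup.)

# Log first, apply second; recovery = load checkpoint + replay the log tail.
import json

class MasterState:
    def __init__(self):
        self.namespace = {}       # path -> list of chunk handles
        self.log = []             # stand-in for the on-disk operation log

    def apply(self, op):
        if op["type"] == "create":
            self.namespace[op["path"]] = []
        elif op["type"] == "add_chunk":
            self.namespace[op["path"]].append(op["handle"])

    def mutate(self, op):
        self.log.append(json.dumps(op))   # record the mutation durably first...
        self.apply(op)                    # ...then apply it to the in-memory state

    def checkpoint(self):
        snap = json.dumps(self.namespace)
        self.log = []                     # the log can be truncated after a checkpoint
        return snap

    @classmethod
    def recover(cls, snap, log_tail):
        m = cls()
        m.namespace = json.loads(snap)
        for line in log_tail:             # replay only mutations after the checkpoint
            m.apply(json.loads(line))
        return m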

There is a lease system for performing updates. The way it is
designed, it can leave regions in what the authors call a "consistent but
undefined" state. Therefore, applications usually work around this
problem using self-validating records. Chunks of data are checksummed
and versioned to help with data integrity and consistency.
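
The checksumming is described in Section 5.2 of the paper: each chunk is divided into 64 KB blocks, each with a 32-bit checksum that the chunkserver verifies before returning data. The sketch below is my own simplification (in-memory layout, class name) and omits chunk version numbers.

# Per-block checksums verified on every read, so corruption is never propagated.
import zlib

BLOCK = 64 * 1024

class ChecksummedChunk:
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.sums = [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

    def read(self, offset: int, length: int) -> bytes:
        # Verify every block that overlaps the requested range before returning data.
        first, last = offset // BLOCK, (offset + length - 1) // BLOCK
        last = min(last, len(self.sums) - 1)
        for b in range(first, last + 1):
            block = bytes(self.data[b * BLOCK:(b + 1) * BLOCK])
            if zlib.crc32(block) != self.sums[b]:
                raise IOError(f"checksum mismatch in block {b}: report to the master "
                              f"and restore this chunk from another replica")
        return bytes(self.data[offset:offset + length])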

The master re-replicates chunks if the number of available replicas is
low. It rebalances replicas across chunkservers to increase load
balance.

Evidence

The authors provide measurements on a small cluster and on production
clusters. In the small cluster, they find that their writes are not what
they expected, and that record append operations cause network
congestion.

In the production clusters, they characterize their workload very well,
and indeed justify some of their system design assumptions (for
instance, their choice to optimize record appends). They also show
that the master is indeed relieved of high load, despite the
size of the system.

Prior Work + Competitive Work

The authors cite GFS (another one, "The Global File System") and GPFS
as alternatives that employ distributed algorithms to manage the
filesystem, but they claim their centralized approach is simpler yet
effective (and that centralized knowledge means better decisions).

They also mention NASD, based on network-attached disks, but they argue
that GFS is better suited to their needs and is based on
commodity equipment, which probably has a better cost-benefit
ratio. Lustre is cited as something that aims at similar results, but
they claim GFS is more adequate for their own needs. Other systems are
mentioned, but, again, they point to their own workload characteristics
as the drivers of their design choices.

Reproducibility

It is difficult to reproduce the experiments that happened on
production clusters at Google. However, the paper provides some insight
into their workload. This could be useful if someone wants to compare
performance with GFS, although the data provided in the paper is not a
full characterization (which doesn't seem viable, anyway).

Questions + Criticism

The authors claim that the network stack does not interact well with
their pipeline scheme to replicate data. They do not give further
explanation, but I think this is an important issue, as their
system is focused on throughput (low throughput could deeply affect
their pipeline scheme). Are these problems caused by TCP buffering,
timeouts, or something else? Did they change anything in the network
stack after reaching this conclusion?

They also show that record appends cause network congestion. However,
they optimized this operation in the system, given its relevance to
their own needs. They could explain better why this congestion
appears. OK, the clients are writing to a single file, but why the
variance (the throughput increase) as the number of clients increases
once again beyond 10 or so clients?

Ideas for Further Work

As the system employs a central master, it is perhaps viable to avoid
the problem where concurrent writes can leave the system in a state
where every client sees the same data, but the data itself is a
mixture of fragments from different writes (a consequence of the lease
system). As the authors are
more interested in throughput than in latency, this seems
possible to implement reasonably. Also, as concurrent record
appends are common and critical, perhaps avoiding "duplicate" appends
(a problem that can happen, according to the authors) would be useful.
Perhaps a specialized master just to arbitrate these issues could
solve the problem. The clients, then, would need fewer
workarounds to deal with this potential issue.


Visawee

Sep 20, 2010, 9:55:42 PM
to CSCI2950-u Fall 10 - Brown

Paper Title :
The Google File System

Author(s) :
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Date :
SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA

Novel Idea :
GFS is a distributed file system that supports large-scale data
processing workloads on commodity hardware. The design of GFS
separates file system control from data transfer, and is therefore
able to achieve high transfer rates.
GFS uses an online repair mechanism that regularly and transparently
repairs damage and compensates for lost replicas as soon as
possible, while trying to reduce interference with the real workload.
GFS also supports snapshots.

Main Result(s) :
The system is able to serve large-scale data processing workloads very
well. The read and write rates are very high. It also shows the
capability of recovering from the loss of chunkservers within a short
time.

Impact :
A distributed file system that can support Google-like large-scale
data processing on commodity hardware.

Evidence :
The micro-benchmarks show that the read, write, and append rates
of the system are good. The read rate achieves 75% of the maximum
bandwidth in the case of 16 readers.
The real workloads show that it is able to support large-scale data
processing very well. Another experiment also shows that the recovery
strategy of GFS works very well - chunks are restored to at least 2x replication
within 2 minutes in the case where only one replica is left.

Prior Work :
The general concept of distributed file systems. However, GFS is custom-
made for the workloads at Google.
Its design also closely resembles the NASD architecture.

Competitive work :
NFS : GFS can scale a lot better than NFS both in terms of number of
clients and storage size.

Reproducibility :
The results are not reproducible. The authors didn't mention all the
implementation details of GFS.
Some experiments are also based on real workloads inside Google.

Question :
What will happen if the master server is unable to recover from crash?

Criticism :
- The authors did not show any evidence supporting that the workloads
are well distributed among chunkservers.
- This design is prone to losing all data when the whole data center goes
down. The data should be replicated to different data centers.
- The explanation in the data integrity section is not clear to me.

Ideas for further work :
- The data flow approach between chunkservers in GFS is a greedy
algorithm, which might be improved by using a shortest-path
algorithm. The improvement would reduce write latency.
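
For context, the paper's data flow is already a greedy chain: each machine forwards the data to the "closest" machine that has not yet received it, with distance estimated from IP addresses. A sketch of that ordering, with a toy distance function standing in for the real topology estimate:

# Greedy pipeline ordering: each hop is the nearest machine that hasn't received the data.
def pipeline_order(source, replicas, distance):
    """Return the forwarding chain source -> r1 -> r2 -> ... chosen greedily."""
    chain, current, remaining = [], source, list(replicas)
    while remaining:
        nxt = min(remaining, key=lambda r: distance(current, r))
        chain.append(nxt)
        remaining.remove(nxt)
        current = nxt   # the next hop is measured from the last machine in the chain
    return chain

# Toy distance: machines sharing an address prefix count as "close" (purely illustrative).
def toy_distance(a, b):
    return 0 if a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0] else 1

print(pipeline_order("10.0.1.5", ["10.0.2.7", "10.0.1.9", "10.0.2.8"], toy_distance))
# -> ['10.0.1.9', '10.0.2.7', '10.0.2.8'] (nearby replica first, then the others)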


Sandy Ryza

Sep 21, 2010, 1:00:28 AM
to CSCI2950-u Fall 10 - Brown
Novel Idea:
The authors present a distributed file system meant to be run on
clusters of thousands of commodity machines, optimized for Google's
unusual storage needs - multi-GB files are common, and the most common
write operation is an append.  Maximizing use of bandwidth is
emphasized over minimizing latency.  The system relies on a model in
which files are broken up into large chunks, replicated at a few
different computers called chunkservers and managed by a master, which
holds all the file metadata.

Main Result(s):
The system handles read and write throughput at an acceptable fraction
of the maximum theoretical bandwidth. Read capacity is substantially
more than write capacity.

Evidence:
They ran tests on their system in two different settings: a test
cluster of 16 computers specifically for GFS's purpose and a couple of
real world clusters in use at Google. Their evaluation of their
system primarily concerns itself with bandwidth. They also mention a
few tests concerning how their system deals with chunkserver failure
(results about master failure are not present).

Impact:
The system is successful and in use at Google, serving many petabytes
of data.

Prior Work:
NFS? OceanStore?

Competitive Work:
Comparisons with other work are absent from the paper. GFS occupies a
similar role as Dynamo, providing a distributed storage system to
store the data for an organization. GFS, however, is tailored for
almost opposite storage needs.

Reproducibility:
As with Dynamo, the authors describe the design and give a general
architecture for the system, but omit numerous implementation
details. Reproducing the system would require a fair amount of
creative work.

Question:
My understanding was that due to certain relaxed notions of
consistency with the atomic append, reads of the same data region can
return data in different order. Is my understanding correct, and what
effects does this have on applications?

Criticism:
I would have liked to see more detail on the namespace management and
locking.



James Chin

Sep 20, 2010, 10:53:28 PM
to CSCI2950-u Fall 10 - Brown
Paper Title: “The Google File System”

Authors(s): Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Date: 2003 (SOSP ‘03)

Novel Idea: This paper describes the Google File System, which Google
uses as the storage platform for the generation and processing of data
used by its services, as well as research and development efforts that
require large data sets. The design of GFS has been driven by Google's
observations of its application workloads and technological
environment that reflect a marked departure from some earlier file
system assumptions. GFS treats component failures as the norm rather
than the exception, optimizes for huge files that are mostly appended
to and then read, and both extends and relaxes the standard file
system interface to improve the overall system.

Main Result(s): GFS has successfully met Google’s storage needs and is
widely used within the company as the storage platform for research
and development as well as production data processing. It is an
important tool that enables Google to continue to innovate and attack
problems on the scale of the entire web.

Impact: GFS demonstrates the qualities essential for supporting large-
scale data processing workloads on commodity hardware. While some
design decisions are specific to Google's unique setting, many
apply to data processing tasks of a similar magnitude and cost
consciousness.

Evidence: This paper presents a few micro-benchmarks to illustrate the
bottlenecks inherent in the GFS architecture and implementation. It
also examines real clusters in use at Google in terms of storage,
metadata, read and write rates, the load on the master, and recovery
time.

Prior Work: GFS builds on other distributed file systems like AFS,
xFS, Swift, Frangipani, Intermezzo, Harp, and Lustre. Some of those
systems, like Frangipani and xFS, remove the centralized server and
rely on distributed algorithms for consistency and management. The
authors opt for the centralized approach to simplify the design,
increase its reliability, and gain flexibility. In particular, a
centralized master makes it much easier to implement
sophisticated chunk placement and replication policies, since the
master already has most of the relevant information and controls how
it changes.

Competitive Work: GFS most closely resembles the NASD architecture.
While the NASD architecture is based on network-attached disk drives,
GFS uses commodity machines as chunkservers, as done in the NASD
prototype. Unlike the NASD work, Google’s chunkservers use lazily
allocated fixed-size chunks rather than variable-length objects.
Also, GFS implements features such as rebalancing, replication, and
recovery that are required in a production environment.

Reproducibility: The findings would be reproducible if one were given
access to GFS.

Question: What security issues does GFS present?

Criticism: GFS is a proprietary file system that is only used
internally by Google.

Ideas for further work: Examine more than 2 GFS clusters at a time.



Tom Wall

Sep 20, 2010, 4:16:10 PM
to CSCI2950-u Fall 10 - Brown
The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
SOSP 2003

Novel Idea:
GFS is a scalable, highly fault-tolerant file system optimized
for Google's applications. Unlike traditional file systems, it is
optimized for huge files, with a focus on large sequential reads and
appends (via an atomic record append operation) rather than random
reads and writes.

Main Result:
GFS has proven suitable for Google's needs and has scaled to thousands
of machines offering hundreds of terabytes of storage. Writes are
slower than they would like, but overall the system performs well
enough.

Impact:
GFS is often used hand in hand with MapReduce to simplify distributed
programming. While GFS/MapReduce is proprietary, the open source
implementations of these have been very popular.

Prior Work/Competitive Work:
GFS has taken and adapted many ideas common in the file systems
community. In particular, they note similarities to AFS, xFS,
Frangipani, and GPFS. They also note how their architecture is similar
to CMU's NASD storage solution.

Reproducibility:
They do not go into extreme detail with their design and thus it would
probably be difficult to reproduce their findings. Similarly, we have
to trust in their experimental results because (1) they too are
lacking in some details and (2) few have the resources to create such
large clusters and traffic.

Question:
Just a curiosity - they say GFS works well for them because most of
their apps typically append data rather than overwriting it. Is this due
to the nature of the apps, or were they designed that way? And if
they were designed that way, was it done to accommodate GFS or vice
versa? How usable is GFS for different types of applications?

Criticism:
This paper has problems similar to the Amazon paper's: a lack of
detail in some areas. For example, in Section 6.1.2 they opaquely blame their
network stack for the suboptimal performance of writes. A little
explanation of how their network slowed things down would have been
nice. They also do not show how the system handles concurrent writes
to the same file. Also, it is hard to appreciate the significance of
Table 3 without a more detailed explanation of the work that each of
the two clusters was doing during the measurements. It would have been
better if we saw tests similar to the micro-benchmarks on the real
clusters.


Siddhartha Jain

Dec 13, 2010, 3:32:57 AM
to brown-csci...@googlegroups.com
Novel Idea:
The Google File System is described. It is optimized for the workload Google experiences,
which involves massive amounts of sequential reads and writes, a high failure rate, etc.

Main Results:
The GFS architecture is described in detail. The fault tolerance, redundancy,
and load balancing policies are described. They present the design choices they had to
make to conform to Google's needs, like high availability.

Impact:
Used at Google and has had a lot of impact.

Evidence:
Results are given for how close the read/write rates get to the theoretical optimum based
on the network topology. Depending on the number of clients, they seem to get from one-third to more than one-half
of the theoretical maximum. Some issues that came up while designing GFS are described.

Prior Work:
There are a ton of other distributed file systems. What's special about GFS is that it is the first
one optimized for the needs of a company like Google.

Reproducibility:
While GFS is proprietary, an open-source implementation, the Hadoop Distributed File System (HDFS), is not.

