Review: BigTable

Rodrigo Fonseca

Sep 22, 2010, 9:09:53 PM
to CSCI2950-u Fall 10 - Brown
Hi, as usual, please post your review as a group reply to this
message.

Thanks,
Rodrigo

Matt Mallozzi

Sep 22, 2010, 11:58:07 PM
to brown-csci...@googlegroups.com

Title:
Bigtable: A Distributed Storage System for Structured Data
Authors:
Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, Gruber
Date:
2006
Novel Idea:
A storage system for structured data built on top of a high-throughput
distributed file system (GFS) and a highly available distributed lock
manager (Chubby), which supports loads requiring low latency and/or high
throughput.
Main Results:
A system that had been in production use at Google for a reasonable amount
of time.  The API is abstract enough to be applicable to many Google
products and services, and powerful enough to allow clients to take
advantage of the performance it has to offer.
Impact:
This creates a system that can be widely used by a whole host of
applications - both those that serve data to a user and require low latency,
and those that are involved in batch processing and therefore require high
throughput.
Evidence:
Both test instances and real-world instances. The experiments found that
writing is quite fast, thanks to the commit log that groups multiple
commits into one GFS transaction.
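The group-commit trick mentioned here is easy to sketch. The following Python is purely illustrative (the class and names are mine, not Google's code): writers enqueue mutations, and a single flush writes the whole batch in one append, which is the amortization being described.

```python
class GroupCommitLog:
    """Sketch of group commit: small mutations from many writers are
    batched and flushed as one append to a (GFS-like) log file."""

    def __init__(self, append_to_log):
        self._append_to_log = append_to_log  # one call = one expensive append
        self._pending = []

    def submit(self, mutation):
        # Writers only enqueue; no I/O happens here.
        self._pending.append(mutation)

    def flush(self):
        # One append covers every pending mutation, so the round-trip
        # cost to the file system is paid once per batch, not per write.
        if self._pending:
            self._append_to_log(b"".join(self._pending))
            self._pending.clear()

appends = []                        # stands in for the GFS log file
log = GroupCommitLog(appends.append)
for i in range(100):
    log.submit(b"mutation;")
log.flush()                         # 100 mutations, a single append
```

The point of the sketch is the ratio: many submitted mutations, one physical append.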
Prior Work:
Boxwood was a similar project, although Boxwood exposed lower level details
of the system - it was intended to serve as the foundation for a database
or file system, whereas Bigtable is the basis for applications. Similarities
to distributed databases are noted, along with stark contrasts to
distributed hash tables.
Competitive Work:
Boxwood - Bigtable brings the flexibility of Boxwood to a higher level to
allow common application developers at Google to take advantage of the
system. Distributed databases are the most similar work after Boxwood,
although they provide more complex semantics than Bigtable, semantics that
were determined to be mostly unnecessary for Google's applications.
Reproducibility:
Largely not - many details were omitted. Also, it makes heavy use of other
Google-built services whose reproducibility is also questionable, so
reproducing these results would involve a lot more than this paper
describes.
Question:
To guard against stale client caches, could an idle client perform a
low-frequency, exponential-backoff ping to keep its cache relatively
fresh?  Anything to avoid six round trips over the network...
Criticism:
Each machine in the test instance had enough physical memory to hold the
working set of all running processes - is this a stretch, or are machines
like this common?
Ideas for Further Work:
Experiment with idle client cache-freshening schemes.  See if support for
more arbitrary transactions can be built in.

James Chin

Sep 22, 2010, 9:22:31 PM
to CSCI2950-u Fall 10 - Brown
Paper Title: “Bigtable: A Distributed Storage System for Structured
Data”

Authors(s): Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh,
Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and
Robert E. Gruber

Date: 2006 (OSDI 2006)

Novel Idea: This paper describes Bigtable, a distributed storage
system for managing structured data that is designed to scale to a
very large size: petabytes of data across thousands of commodity
servers. A Bigtable is a sparse, distributed, persistent
multidimensional sorted map.
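That one-sentence definition can be made concrete with a toy sketch. The Python below is illustrative only (the class and method names are invented, not Bigtable's API): a value is addressed by (row, column, timestamp), reads return the newest version by default, and rows sort lexicographically so range scans fall out naturally. The webtable cells are the paper's own example.

```python
class ToyBigtable:
    """Toy sketch (not Google's code) of the Bigtable data model: a
    sparse map from (row, column, timestamp) to an uninterpreted
    string, with rows kept in lexicographic order."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column):
        # By default Bigtable returns the most recent version of a cell.
        versions = self._cells.get((row, column), {})
        return versions[max(versions)] if versions else None

    def scan(self, start_row, end_row):
        # Because keys sort by row, a range scan is an ordered walk.
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)

# The paper's webtable example: a reversed URL as the row key, an
# anchor: column family, and timestamped versions of the page contents.
t = ToyBigtable()
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("com.cnn.www", "contents:", 5, "<html>v1")
t.put("com.cnn.www", "contents:", 6, "<html>v2")
```

A read of the contents: cell returns the timestamp-6 version, and a scan over the "com" row range visits both cells in order.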

Main Result(s): Many Google applications place very different demands
on Bigtable, both in terms of data size and latency requirements.
Despite these varied demands, Bigtable has successfully provided a
flexible, high-performance solution for all of these Google products.
Bigtable has achieved several goals: wide applicability, scalability,
high performance, and high availability.

Impact: Bigtable’s users like the performance and high availability
provided by the Bigtable implementation, and they like how they can
scale the capacity of their clusters by simply adding more machines to
the system as their resource demands change over time.

Evidence: The authors set up a Bigtable cluster with N tablet servers
to measure the performance and scalability of Bigtable as N is
varied. Also, as of August 2006, there were 388 non-test Bigtable
clusters running in various Google machine clusters, with a combined
total of about 24,500 tablet servers. Services like Google
Analytics, Google Earth, and Personalized Search have performed well
on Bigtable clusters.

Prior Work: Bigtable builds on previous “Internet-scale” work,
including that of distributed hash tables that began with projects
such as CAN, Chord, Tapestry, and Pastry.

Competitive Work: The Boxwood project has components that overlap in
some ways with Chubby, GFS, and Bigtable, since it provides for
distributed agreement, locking, distributed chunk storage, and
distributed B-tree storage. Several database vendors have also
developed parallel databases that can store large volumes of data.
For instance, Oracle’s Real Application Cluster database uses shared
disks to store data (Bigtable uses GFS) and a distributed lock manager
(Bigtable uses Chubby).

Reproducibility: It would be difficult to reproduce the findings in
this paper, as Bigtable is built for internal use within Google.

Question: How does Bigtable compare to other distributed storage
systems in terms of applicability, scalability, performance, and
availability?

Criticism: Bigtable is a proprietary storage system that is only used
internally by Google.

Ideas for further work: Finish implementing support for secondary
indices and infrastructure for building cross-data-center replicated
Bigtables with multiple master replicas (if this hasn't been done
already).

Visawee

Sep 22, 2010, 9:27:21 PM
to CSCI2950-u Fall 10 - Brown
Paper Title :
Bigtable: A Distributed Storage System for Structured Data

Author(s) :
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Date :
OSDI 2006

Novel Idea :
Bigtable is a distributed storage system for managing structured data
that can scale to a very large size.
It provides clients with a simple data model that supports dynamic
control over data layout and format (three-dimensional data: row,
column, and timestamp).
Clients communicate directly with tablet servers for reads and writes.
Bigtable also lets clients dynamically control whether data is served
from disk or from memory.

Main Result(s) :
Bigtable scales to serve large workloads very well. The biggest
cluster has 8069 tablet servers and can serve more than 1.2 million
requests per second, with incoming RPC traffic of about 741 MB/s and
outgoing RPC traffic of about 16 GB/s.
Write throughput is better than read throughput because of the design.
Sequential reads achieve better throughput than random reads, while
sequential writes and random writes achieve almost the same
throughput.
Bigtable's simple data model can also serve various kinds of workloads
within Google, from batch-processing workloads to interactive
application workloads.

Impact :
Developers have the option of using a data structure that is more
complex than key-value pairs while still being able to scale to a very
large size.

Evidence :
The authors set up experiments on 1786 machines with various kinds of
workloads. Every machine ran a GFS server; some of the machines also
ran either a tablet server or a client process.
The read and write throughputs in the main-results section were
obtained from these experiments. The results also support the authors'
claim about the scalability of the system: it scales very well, even
though not linearly.
Moreover, the fact that Bigtable is used by many projects within
Google shows that its simple data model and its performance are
sufficient to support various kinds of applications very well.

Prior Work :
Bigtable uses GFS as its persistent storage. It uses Chubby as its
distributed lock service. It also uses Bloom filters to reduce the
number of disk accesses when a tablet server looks for a specific row/
column pair.

Competitive work :
Parallel databases : they provide a complex relational model with
transactions, while Bigtable provides a simple data model and no
general transaction support. However, Bigtable can scale to a very
large size very well, and its simple data model has proved efficient,
as shown by its successful use in many projects within Google.
Distributed hash tables : both distributed hash tables and Bigtable
scale very well. However, distributed hash tables provide only
key-value pairs, while Bigtable provides a more complex data structure
(row, column, and timestamp).

Reproducibility :
Although the experiments are very clear, the results are not
reproducible: the authors did not describe all of Bigtable's
implementation details.

Criticism :
The experiments did not show how fast Bigtable can recover from
failures.

Ideas for further work :
- Secondary index as mentioned in the paper.

Sandy Ryza

Sep 22, 2010, 11:26:23 PM
to CSCI2950-u Fall 10 - Brown
Title:
Bigtable: A Distributed Storage System for Structured Data

Authors:
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Date:
OSDI 2006

Novel Idea:
The authors present a distributed multi-dimensional sorted map for use
in managing enormous data using large clusters of commodity machines.
Their architecture relies on a set of tablet servers, each responsible
for serving the data for a set of rows (a tablet), and a single master
server which is responsible for assigning tablets to tablet servers,
schema modifications, and a few other tasks.

Main Result(s):
The authors state that the system they built achieved their original
goals, i.e. wide applicability, scalability, high performance, and
high availability.

Evidence:
The authors provide a performance evaluation which presents
statistics on how many basic operations (reads and writes, both random
and sequential) the system can handle per second with different
numbers of tablet servers. Tests were run on top of a GFS cluster of
1786 machines. Noticeably absent are tests dealing with failure or
with concurrent basic operations of different types.

Impact:
The system is used heavily at Google, often in conjunction with
MapReduce, for a variety of large-scale data processing tasks. At the
time of the paper's writing, it was used by more than 60 products at
Google.

Prior Work:
The system relies on GFS for persistent storage. It handles locking
using Chubby, a distributed lock service. Google's SSTable file
format, which provides an ordered immutable map, is used to store the
data. It was also designed with MapReduce in mind, and has wrappers
which allow it to be easily used as both an input source and an output
target for MapReduce jobs.
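The SSTable format mentioned above is essentially an immutable, key-sorted map. A rough in-memory sketch (my own illustration; the real SSTable is an on-disk file of sorted blocks plus a block index) shows why the format gives cheap point lookups and ordered range scans:

```python
import bisect

class ToySSTable:
    """Sketch of an SSTable-like structure: an immutable, ordered map.
    This in-memory version only mimics the interface; the real thing
    is a file format with key-sorted blocks and a block index."""

    def __init__(self, mapping):
        pairs = sorted(mapping.items())   # immutable after this point
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        # Sorted keys allow a binary-search point lookup.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        # Sorted keys make a range scan a contiguous slice.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))

sst = ToySSTable({"b": 2, "a": 1, "c": 3})
```

Immutability is the other half of the idea: because an SSTable never changes after it is written, it can be shared, cached, and merged without locking.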

Competitive Work:
Bigtable's most obvious competitors are traditional databases, which
provide fuller transaction support than Bigtable. Bigtable's
advantage lies in better availability and scalability, as well as
being optimized for Google's particular needs. The authors also
mention the Boxwood project, which also provides a distributed storage
platform with agreement and locking mechanisms, but is somewhat lower
level.

Reproducibility:
While the paper does provide more detail than the other corporate
papers I've read, including a number of optimizations and particular
issues ranging from compression to the commit-log implementation,
sufficient detail is still not provided to fully reproduce the system.

Criticism:
I would have liked to see more discussion of the client library and
more about the responsibilities it shoulders (e.g. caching).

Questions & ideas for further work:
Although most accesses do not involve it, I imagine that a single
master eventually imposes some scalability limitations (as in GFS). I
would be interested in what point these limitations would arise at,
and, for further work, what the challenges of a multi-master system
would be.


Basil Crow

Sep 23, 2010, 2:26:57 AM
to brown-csci...@googlegroups.com
Title: Bigtable: A Distributed Storage System for Structured Data

Authors: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Date: OSDI 2006

Novel idea: Bigtable is a database system which trades off conventional RDBMS features (such as a full relational data model and transactions) in exchange for scalability and high performance. The authors describe Bigtable as a "sparse, distributed, persistent multidimensional sorted map" with compression in which clients can control the locality of their data.

Main results: As of August 2006, there were 388 non-test Bigtable clusters running in various Google machine clusters. For example, the Google Earth team uses Bigtable to serve tens of thousands of queries per second with low latency on a 500 GB database table.

Impact: Bigtable allows application developers to store data in a flexible way while also worrying less about potential scalability issues.

Evidence: The authors tested random and sequential reads/writes as well as scans from both single tablet servers in the case of both single and aggregate tablet servers. They found that performance increased significantly (though not linearly) as the number of tablet servers increased. The authors also described the manner in which several internal groups used Bigtable (Google Analytics, Google Earth, and Personalized Search).

Prior work: Bigtable is built on the Google File System (GFS), the Chubby lock service, and a few other Google systems. In terms of prior art, Bigtable goes beyond distributed B-trees and distributed hash tables, which the authors claim are "too limiting."

Competitive work: Bigtable shares similarities with Boxwood, but it presents its API at a higher level (i.e., directly to application developers, whereas Boxwood targets builders of higher-level services). However, it is less flexible than Oracle's Real Application Cluster and IBM's DB2 Parallel Edition (both of which provide a complete relational model with transactions).

Reproducibility: It's not possible to reproduce Bigtable directly, but there are several related/similar open source projects (HBase, Hypertable, Apache Cassandra, Neptune, and KDI).

Question: Is compression as much of a concern nowadays as it was in 2006?

Criticism: The authors discuss in the "Lessons" section that large distributed systems are vulnerable to many types of failures, and even give a few examples (memory and network corruption, large clock skew, hung machines, etc.). However, they don't give any details of these scenarios; it would be nice to have a few. That said, their choice is understandable given the length of the paper.

Zikai

Sep 22, 2010, 9:15:47 PM
to CSCI2950-u Fall 10 - Brown
Paper Title: Bigtable: A Distributed Storage System for Structured
Data
Author(s): Fay Chang (Google) et al.
Date/Conference: OSDI 2006

Novel Idea:
(1) The new data model (a multi-dimensional sorted map) allows
managing structured data much as a relational model does. Moreover, it
supports dynamic control over data layout and format, control over
data locality, and control over whether data is served from memory or
disk.
(2) Perform access control and disk/memory accounting at the
column-family level. This allows applications to customize behavior
and increases flexibility.
(3) Each cell in Bigtable can keep multiple versions of the same data
to allow easy collision resolution.
(4) Use a distributed lock service (Chubby) to assist the control
flow of the master and tablet servers and to decrease the load on the
master.
(5) Keep recent data mutations in a sorted buffer in main memory (the
memtable) and older updates in SSTables on GFS. This allows easy
recovery and fast writes and reads.
(6) Use Bloom filters to keep lookups for non-existent rows/columns
from reaching disk.
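Items (5) and (6) together describe the read path. A minimal sketch of that path (illustrative Python with invented names, not the actual implementation): check the memtable of recent writes first, then older SSTables, letting a Bloom filter veto lookups in any SSTable that cannot contain the key.

```python
from hashlib import blake2b

class BloomFilter:
    """Tiny Bloom filter: answers 'definitely absent' or 'maybe
    present', and never misses a key that was actually added."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = blake2b((key + str(i)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.field |= 1 << p

    def might_contain(self, key):
        return all(self.field >> p & 1 for p in self._positions(key))

def read(key, memtable, sstables):
    """Check recent writes (the memtable) first, then older SSTables
    from newest to oldest, skipping any whose Bloom filter rules the
    key out; in a real tablet server that skip avoids a disk seek."""
    if key in memtable:
        return memtable[key]
    for bloom, table in sstables:
        if bloom.might_contain(key) and key in table:
            return table[key]
    return None

old = {"row1": "v1"}          # stands in for an on-disk SSTable
bloom = BloomFilter()
bloom.add("row1")
memtable = {"row2": "v2"}     # recent, in-memory writes
sstables = [(bloom, old)]
```

A lookup for "row2" is served from the memtable, "row1" falls through to the SSTable, and a key that was never written is (with high probability) rejected by the Bloom filter before any "disk" access.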

Main Results:
(1) Propose the new data model suitable for managing structured data
in distributed storage system.
(2) Based on the data model, design Bigtable system architecture to
utilize advantages of the model and achieve wide applicability,
scalability, high performance and high availability.
(3) Perform a series of optimizations necessary for the design goals.
(4) Implement the design, deploy it extensively in Google production
environments, and evaluate it.

Evidence: In Section 7, random reads, random reads with data in
memory, random writes, sequential reads, sequential writes, and scans
are used as the main performance metrics to evaluate Bigtable. Both
per-tablet-server
performance and aggregate performance (for up to 500 tablet servers)
are tested. The experiments demonstrate high performance and
scalability of the system.

Prior Work: Bigtable directly relies on Google's GFS and Chubby
projects.

Competitive Work: In Section 10, competitive work is discussed (for
example, distributed storage systems, higher-level services over
wide-area networks, parallel databases, etc.).

Reproducibility: (1) Reproducibility of Bigtable relies on the
reproducibility of GFS and Chubby. Fortunately, HDFS is an open-source
implementation of GFS, and ZooKeeper can be used to replace Chubby.
(2) From the Bigtable paper, it is not difficult to implement the
basic Bigtable architecture. However, to achieve the performance of
Google's production system, a number of optimizations have to be made,
and the paper's discussion of these optimizations is not detailed
enough, so great effort would be required to achieve all of them.
(3) The evaluation relies on a huge GFS cluster (1786 nodes), and the
characteristics of the data (workload) used in the benchmark are
unknown.

Question: (1) How can Bigtable be used together with MapReduce? The
API doesn’t seem a good fit for MapReduce programs.
(2) How can users choose the row name, row range, column families and
columns to achieve flexibility of representing data and performance
advantages like locality?

Criticism: (1) In the evaluation, no experiment is set up to test the
availability of Bigtable. For example, failures of the master, tablet
servers, or Chubby servers, and the reassignment and balancing of
tablets, are not tested. Moreover, in the experiments that demonstrate
high performance and scalability, the test data is not described at
all.
(2) Though Bigtable is used in a number of Google projects like Google
Earth and Google Analytics, the wide applicability of the system is
still not proved sufficiently. For example, how should users choose
row and column settings to achieve their specific goals? The lack of
guidelines for building a specific data model is a big obstacle to
usability.




Siddhartha Jain

Sep 22, 2010, 11:57:42 PM
to brown-csci...@googlegroups.com

Title: Bigtable: A Distributed Storage System for Structured Data

Novel Idea:
The paper describes BigTable, the system Google uses to store structured data, designed for very large amounts of data distributed
across thousands of computers. It targets the space between distributed hash tables and distributed B-trees (too limited in the
ways data can be stored) and full RDBMSs.

Main Results:
The authors describe the various aspects of the design in good detail.

Impact:
While not introducing any fundamentally new ideas, the paper is useful for learning about design decisions that may apply
to other situations and about the challenges the authors faced. The authors have an entire section on the lessons they learned
from designing BigTable, which is nice.

Evidence:
The authors give some experimental evidence as to how their design behaves. They also give some statistics
for the type of data stored in BigTable. The experimental evidence for performance seems particularly lacking, though:
they don't compare against anything, and they essentially have a table and a graph to demonstrate the performance of their
system. They do give a few additional numbers and descriptions of the system's behavior as it scales in the text itself.
A strong point is that they describe quite a few applications that use BigTable and how they use it.

Prior Work:
There is a lot of prior work on distributed storage in the form of distributed hash tables, RDBMSs, etc. The authors mention
analogous ideas in these different systems, but they target a new space.

Reproducibility:
Not reproducible, as the code is not available. The design would have to be given in much greater detail for verification to
be possible at all. One might be able to check the wisdom of some design decisions by implementing a similar system and duplicating
an analogous workload, especially since the authors give some idea of how BigTable is used.

Question:
The authors mention that for 0.0047% of the server hours, data was unavailable due to Chubby unavailability. What percentage does
Chubby account for out of the total unavailability, rather than out of the total number of server hours - i.e., how responsible is
Chubby for unavailability in BigTable clusters compared to other causes?
How does performance change as the cap on the size of a tablet changes?

Criticism:
Too little performance analysis - even for an industrial paper. Failure policy is not discussed in much detail at all.

Ideas for future work:
How quickly can the key be changed to move rows closer to one another? A specialized strategy for that might be explored; it could be
useful when storing data like news, where one might want to move similar news articles close to one another to exploit locality.
Each tablet is assigned to only one tablet server at a time. Exploring a strategy for assigning a single tablet to multiple servers
might be worthwhile when, for instance, some rows are accessed together so often that it is better for them to be in the same tablet,
but one server cannot take the load for all of them.

Hammurabi Mendes

Sep 22, 2010, 9:53:43 PM
to brown-csci...@googlegroups.com
Paper Title

Bigtable: A Distributed Storage System for Structured Data

Authors

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.

Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.

Date

7th USENIX Symposium on Operating Systems Design and Implementation

Novel Idea

Bigtable is a distributed storage system in which entries are indexed
by arbitrary row and column identifiers, and a timestamp.

Main Results

The system provides the above interface, while its design allows it
to reach multi-petabyte scale on a cluster of machines. The paper also
describes many refinements, such as locality groups and insightful
compression mechanisms, that improve the system's performance.

Impact

By describing a system conceived to deal with Google's workload, the
paper demonstrates that centralized, structurally simple designs can
be formulated to scale and be applied in practice, if the right design
decisions are taken to alleviate the load on the central parts, and
the overall structure is tightly coupled to the workload in question.

Evidence

The authors decide on using 6 different kinds of operations to analyze
the system performance. Those are random reads (both when the data
must be fetched from GFS and when it is in memory), random writes,
sequential reads and writes, and scans (sequentially reading all the
values within many rows).

The data shows that the system scales. For instance, with 500 tablet
servers, the rate of operations per tablet server drops only to
between about 1/2 and 1/5 of the rate with a single tablet server.

The paper also shows how real Google applications benefit from the
system, showing its practicality.

However, some aspects related to fault tolerance were not explored in
the experimental section (see section "Questions + Criticism").

Prior Work

The paper relates Bigtable's feature of "locality groups" with a
system called C-Store, which organizes data by column. (Others are
also mentioned in this matter: Sybase IQ, SenSage, KDB+.)

They also mention how C-Store and Bigtable have similar ways to store
things, but they make a difference on the API provided by each system
(C-Store provides a read-optimized relational model, while Bigtable
provides simpler, throughput-optimized operations).

Competitive Work

The paper cites the Boxwood project as a possible competitor, but they
note that while this project aims at providing a basis for
higher-level services, Bigtable is directly aimed at Google's
application needs.

The paper also cites industry database vendors, such as Oracle and
IBM, but implies that these aim at different goals (namely, the
transactional relational model). Moreover, they note that DHTs
provide a more limited data model than what Bigtable actually needed.

Reproducibility

If the system itself and enough machines were available, the
experimental methodology is reproducible. Some data provided about
fault tolerance, however, is clearly not reproducible (see below).

Question + Criticism

[Criticism] The authors do not provide experimental results regarding
issues related to faults. They provide some data when they describe
Chubby, but the corresponding experiments are not discussed.
[Questions] Without such analysis, many questions could be asked.
What's the overall system availability? How much does it depend on, or
how is it affected by, the availability of Chubby and GFS?

Ideas for Further Work

I wonder if it is viable that the system could identify locality
groups and decide their tuning parameters dynamically, based on the
requests made by an application. Perhaps the client library could help
with this.

However, it can be the case that the system was designed assuming that
no one besides the application should know better about its optimal
locality group setting. But if the access depend on user's input, this
feature might be useful.


Tom Wall

Sep 22, 2010, 10:35:28 PM
to CSCI2950-u Fall 10 - Brown
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
OSDI 2006

Novel Idea:
Bigtable is a highly scalable storage system built on top of GFS. It
offers a very simple and flexible data model while still remaining
scalable and fast. It gives client applications control over how and
where data is stored.

Main Result:
They have implemented Bigtable and put it to use in a wide variety of
applications which all have varying requirements. It is fast enough
for customer facing applications but also scalable enough to do bulk
data processing. It was nice that they had a section dedicated to
their production systems using Bigtable. The fact that there are 388
Bigtable instances running production systems shows just how effective
the system is.

Impact:
Lots of people have inadvertently benefited from BigTable because it
powers so many Google services.
The level of control client applications have over the storage of
their bigtable data is a nice feature that could be used in other
similar systems.

Evidence:
They provide a set of benchmark tests that run on an average-sized
(for Google) cluster. They show that their throughput is pretty good
in most cases. They don't really touch on latencies with these tests.
They also provide some anecdotal evidence from many production
systems.

Reproducibility:
Not very reproducible. It is a proprietary system that operates on
clusters of unknown size. We have to trust their numbers because there
isn't really a way for anyone else to verify them. Despite this, the
information they do provide would be at least a good starting point
for the creation of a similar system.

Prior/Competitive Work:
The system shares many features with both row and column oriented
databases. They mention the Boxwood project as being similar to the
BigTable/GFS/Chubby suite, albeit at a lower level. Also, there are
various commercial alternatives to their BigTable, including IBM's DB2
and Oracle's Real Application Cluster Database.

Questions/Criticisms:
- They do not go into detail about the central cluster management
system that is necessary for BigTable operation. Does this introduce
a bottleneck/point of failure?
- Their benchmarks are somewhat lacking. For example, they do not do
anything to test their object versioning system.
- It would be nicer if the user had better control over the dynamic
row partitioning. In their example, they achieved higher data locality
by using a reversed URL as the row key, because rows are ordered
lexicographically by key. Why should you have to name your keys in
such a way to get this? It seems awkward.
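For reference, the reversed-URL row-key trick being criticized can be sketched in a couple of lines (illustrative code, not from the paper): hostname components are reversed so that pages from the same domain get lexicographically adjacent keys.

```python
def row_key(url):
    """Reverse the hostname components so pages from the same domain
    get lexicographically adjacent row keys, as in the paper, e.g.
    maps.google.com/index.html -> com.google.maps/index.html."""
    host, sep, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + sep + path

keys = sorted(row_key(u) for u in [
    "maps.google.com/a", "news.yahoo.com/b", "mail.google.com/c",
])
# The two google.com pages now sort next to each other, so they tend
# to land in the same or neighboring tablets.
```

Since tablets are contiguous row ranges, adjacency in key order is what turns into physical locality, which is exactly why the key encoding matters so much.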

Future Work:
Releasing it to the public! There are many open-source projects, in
various states of completion, that aim to provide some of the things
that GFS/Chubby/Bigtable do.

Shah

Sep 22, 2010, 11:25:20 PM
to CSCI2950-u Fall 10 - Brown
Title:

Bigtable: A Distributed Storage System for Structured Data

Authors:

[1] Fay Chang
[2] Jeffrey Dean
[3] Sanjay Ghemawat
[4] Wilson C. Hsieh
[5] Deborah A. Wallach
[6] Mike Burrows
[7] Tushar Chandra
[8] Andrew Fikes
[9] Robert E. Gruber

Source and Date:

OSDI'06: Seventh Symposium on Operating System Design and
Implementation, Seattle, WA, November, 2006.

Novel Ideas:

Bigtable introduces several new ideas, including the use of a
multi-dimensional map indexed by rows, columns, and timestamps to
provide a distributed storage system.

Main Result:

This paper details the functioning of Bigtable - Google's distributed
storage system for managing large amounts of data. It is very scalable
and employs commodity machines as servers. It also resembles a
database.

Impact:

As the authors mention early in the paper, Bigtable is used by more
than sixty Google products, including Google Analytics, Google
Finance, Orkut, and Google Earth. Since all of these products are
widely used, it has a global impact.

Evidence:

The authors perform two series of experiments in order to validate
their claims.

Prior Work:

Though Bigtable is a first in its league, it is built on several
other technologies: the Google File System (GFS) (to store and
retrieve files), Google's SSTable format (which provides an ordered
map from keys to values), and Chubby (a distributed lock service).

Competitive Work:

The authors note some similarity with Boxwood, a comparable project
spearheaded by Microsoft Research. However, they draw a distinction
between the two: Bigtable directly supports client applications that
wish to store data, while Boxwood provides infrastructure for building
higher-level services.

Reproducibility:

Like so many papers written by corporations, the authors tend to omit
important details that would be needed to make the experiments
reproducible.

Question:

N/A

Criticism:

As with other work from corporations, Bigtable is heavily
"Google-centric" and does not necessarily provide clear insight into
its exact workings.

Ideas for Further Work:

N/A

Abhiram Natarajan

Sep 22, 2010, 9:53:32 PM
to brown-csci...@googlegroups.com
Paper Title: Bigtable: A Distributed Storage System for Structured Data

Author(s): Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, 
Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, 
Robert E. Gruber

Date: 2006, OSDI

Novel Idea: A non-relational database built as a distributed sorted
map (not a hash table, as the paper is careful to distinguish from
DHTs), with each value identified (almost) uniquely by row key, column
key, and timestamp
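
On the timestamp point, the paper also describes per-column-family
garbage collection of old versions, e.g. keeping only the last n
versions of a cell. A minimal sketch, with hypothetical names:

```python
# Sketch of Bigtable's keep-last-n version garbage collection (the paper lets
# clients specify this per column family; the function name is mine).

def gc_keep_last_n(versions, n):
    # versions: dict mapping timestamp -> value; keep only the n newest
    newest = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in newest}

cell = {1: "a", 2: "b", 3: "c", 4: "d"}
print(gc_keep_last_n(cell, 2))  # only timestamps 3 and 4 survive
```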

Main Result(s): A sparse, distributed, persistent, multi-dimensional
sorted map. The underlying data is replicated three times by GFS.

Impact: Products like Google Earth, Google Finance, etc., use Bigtable 
to good effect

Evidence: It has high applicability and scalability, superior performance 
and availability; more than 60 Google products and projects use it.

Prior Work: The Boxwood project is of direct relevance. Projects like 
CAN, Chord, Tapestry, Pastry, etc have succeeded in providing 
distributed storage on higher-level services at Internet Scale. Oracle 
and IBM have products that develop parallel databases that can store 
large volumes of data. Other than that, C-STore, AT&T's Daytona 
database, etc are of relevance.

Competitive Work: The suitably elaborate performance evaluation 
inspects various aspects of the system and gives evidence of the 
claimed performance efficiency. In addition, they report good 
compression factors for the tables backing Google Analytics, 
Google Earth, Orkut, Personalized Search, Google Base, etc.

Reproducibility: As in most of the papers we have been reading, only 
parts of the system are reproducible.

Criticism: No criticism, I specifically liked the short-and-sweet 
explanation in the API section (section 3).


Duy Nguyen

unread,
Sep 22, 2010, 11:47:57 PM9/22/10
to brown-csci...@googlegroups.com
Paper Title
"Bigtable: A Distributed Storage System for Structured Data"

Authors

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Date
2006/OSDI

Novel Idea
A distributed storage system that trades the full relational model for a simple scheme in which a (row, column, timestamp) key maps to uninterpreted values.
It is designed to store petabytes of data and is built on top of other Google products: Chubby and GFS.

Main Results/Impact
The ideas described in the paper are straightforward (assuming Chubby and GFS are up and running).
But I think Google's biggest achievement is really knowing how to design and implement many products with specific features,
then combine them to create a scalable system.

Evidence
Google Earth and other products are using Bigtable, so I think the evidence is convincing enough.
They also provide some evaluation, but I think it misses some details.

Prior/Competitive Work
Bigtable is built on Chubby and GFS, and uses a B+-tree-like three-level hierarchy to store tablet locations.
They also mention that Boxwood shares some common features, and briefly discuss CAN, Chord, and
Tapestry, pointing out that these projects address different problems than Bigtable does.
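
The core operation at each level of that tablet-location hierarchy,
mapping a row key to the tablet whose range contains it, amounts to a
search over sorted tablet end keys. A hedged sketch (data and names
are illustrative, not the real METADATA table format):

```python
import bisect

# Tablets partition the row space; each tablet is identified here by its last
# (inclusive) row key. Locating the tablet for a row is then a binary search
# over sorted end keys -- a toy version of one level of the METADATA lookup.

def find_tablet(tablet_end_keys, row):
    i = bisect.bisect_left(tablet_end_keys, row)
    if i == len(tablet_end_keys):
        raise KeyError("row beyond the last tablet")
    return i  # index of the tablet responsible for this row

ends = ["f", "m", "t", "\xff"]      # four tablets covering the row space
print(find_tablet(ends, "google"))  # prints 1: "google" falls in ("f", "m"]
```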

Reproducibility
Not possible.

Question
None


Dimitar

unread,
Sep 22, 2010, 11:29:54 PM9/22/10
to CSCI2950-u Fall 10 - Brown
Bigtable: A Distributed Storage System for Structured Data

Authors: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh,
Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes,
Robert E. Gruber

Novel Idea: Bigtable is a distributed storage system designed
specifically to store structured data. Bigtable's implementation has
much in common with parallel and main-memory databases, which offer
high availability and performance. But the key difference between
them is that Bigtable doesn't support the full relational model.
Instead, it provides clients with a simple data model and format, and
allows clients to control properties of the stored data, such as its
locality and whether it is served from memory or from disk.

Main Results: Bigtable has high availability, scalability and
performance, even with random reads and writes.

Impact: A production-quality distributed storage system that services
many of Google's applications, such as Orkut and Google Earth. In
2006, 368 production clusters were using Bigtable.

Evidence: The authors provide experimental test data and production
workload data to support their claims. In Section 7, they show how the
system performs as the number of tablet servers in the cluster grows.
The data suggest that the system performs well except when doing
random reads; the authors explain that each slow random read transfers
a 64KB SSTable block from GFS. The data also show the high scalability
of the design. In Section 8, the authors give an example of their
production workload: 14 busy clusters with 8069 tablet servers serving
an aggregate of 1.2 million requests per second.

Reproducibility: The design can be reproduced because the paper gives
a detailed explanation of the implementation, but the production
workload results would be hard to reproduce.

Competitive work: C-Store shares many properties with Bigtable, such
as a shared-nothing architecture and the use of two data structures
(one for recent writes, one for long-lived data), but unlike Bigtable,
C-Store supports the relational model.

Question: How does the master server do load balancing when clients
have no direct communication with the master?

Criticism: Load balancing should be better explained, and I believe
that in their performance tests they should have compared Bigtable to
similar distributed storage systems.


Jake Eakle

unread,
Sep 23, 2010, 12:03:38 AM9/23/10
to brown-csci...@googlegroups.com
Paper Title

Bigtable: A Distributed Storage System for Structured Data

Author(s)

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach 

Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Date 2006
Novel Idea A distributed data store, built on top of GFS (with Chubby alongside as a lock service), that provides row/column access for applications that need to structure their data but are willing to trade full SQL access for extreme scalability, low latency, and high availability.
Main Result(s) They describe Bigtable, which uses a number of techniques to achieve its goals. Google wants basically all of its applications to be able to use it, so it provides fairly minimal structure: keys and data are all just uninterpreted bytes; everything is stored lexicographically, and it's up to the application to provide locality if it cares to; and rather than a relationally complete query language, they support regexes on row keys and grouping columns into families, but not much else.
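
That locality point can be illustrated with the reversed-hostname row
keys the paper uses in its Webtable example: reversing the host puts
pages from the same domain adjacent in the sort order. A small sketch
(the helper name is mine):

```python
# Sketch of the reversed-hostname row keys used for the paper's Webtable:
# reversing the host makes pages from the same domain sort adjacently, so
# they end up in nearby tablets (helper name is mine).

def row_key(host, path):
    return ".".join(reversed(host.split("."))) + path

keys = sorted([
    row_key("maps.google.com", "/index.html"),
    row_key("www.cnn.com", "/index.html"),
    row_key("www.google.com", "/index.html"),
])
print(keys)  # the two google.com hosts sort next to each other:
# ['com.cnn.www/index.html', 'com.google.maps/index.html',
#  'com.google.www/index.html']
```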

They want extreme scalability, which they achieve by partitioning tables into tablets and storing different tablets on different machines, managed by a single (but replaceable) master. The master almost never has to interact with clients, and thus cannot become a bottleneck. Bigtable does not need to provide additional data replication, since GFS already replicates the underlying files. It does need to care about fragmentation and tablet-server load, though, so it splits and merges tablets when they cross size thresholds in the appropriate directions.
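
The split/merge behavior can be sketched as a simple size-triggered
maintenance pass. The thresholds below are illustrative, though the
paper notes tablets default to roughly 100-200 MB each:

```python
# Hedged sketch of size-triggered tablet maintenance. The thresholds are
# illustrative; the paper says tablets default to roughly 100-200 MB each.

MB = 1024 * 1024
SPLIT_BYTES = 200 * MB  # split a tablet larger than this
MERGE_BYTES = 50 * MB   # merge neighbors whose combined size stays below this

def maintain(tablet_sizes):
    # tablet_sizes: sizes (bytes) of consecutive tablets, in row-key order
    out = []
    for size in tablet_sizes:
        if size > SPLIT_BYTES:
            out.extend([size // 2, size - size // 2])  # split in half
        elif out and out[-1] + size < MERGE_BYTES:
            out[-1] += size                            # merge into left neighbor
        else:
            out.append(size)
    return out
```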

They devote a good amount of space to describing optimizations on the basic structure, including: a client-configurable compression scheme that makes it easy to take advantage of locality (and in some cases the properties of the data, e.g. web pages from the same domain with boilerplate in common) to achieve extremely high compression ratios very quickly; two kinds of caching, the Scan Cache for commonly accessed key-value pairs and the Block Cache for SSTable blocks near recently read data; and Bloom filters, which for some applications avoid huge numbers of disk accesses by assuring Bigtable that the requested row/column pair definitely isn't in a given SSTable.
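
These summary filters (Bloom filters, in the paper) can be sketched in
a few lines. This is an illustrative implementation, not Bigtable's;
what matters is the one-sided guarantee that makes skipping a disk
read safe:

```python
import hashlib

# Minimal Bloom filter sketch (illustrative; not Bigtable's implementation).
# The one-sided guarantee is what matters: "no" means the key is definitely
# absent, so the SSTable on disk need not be read at all.

class BloomFilter:
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False => definitely absent; True => possibly present
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("com.cnn.www/contents:")
print(bf.might_contain("com.cnn.www/contents:"))  # prints True
```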
Impact I never have anything interesting to write in this section.
Evidence They do pretty in-depth performance analysis on a number of axes at each of 1, 50, 250, and 500 tablet servers, giving a good sense of where the bottlenecks are (random reads) and how each is affected by scaling (most get a lot better; random reads still don't scale well).

They also give a good overview table of actual applications and their bigtable usage statistics. This shows that Bigtable is pretty adaptable, as the applications range widely in space usage, compressibility, column density, etc.
Prior Work The most interesting note in their Related Work section is the one where they say that Bigtable's row/column structure is nearly as lightweight as the key/value structure of Chord, Tapestry, or Pastry (and thus still suitable for widely distributed use), but significantly more powerful and useful for developers. Perhaps those projects would see more widespread use if they adopted Bigtable's model and kept their trademark ability to function in untrusted and highly variable environments?
Reproducibility They give a little bit of code! Otherwise, same old, same old.
Question Why don't more things use this row/column structure?
Criticism It'd be nice if they explained what Chubby was in a bit more detail at first. It becomes sort of clear as they introduce each way it is used, but the first paragraph they mentioned it in pretty much lost me.
Ideas for further work Put row/column structure on top of Tapestry, etc.; see if this is more awesome.




--
A warb degombs the brangy. Your gitch zanks and leils the warb.