Flume comparison to Chukwa (was Re: Flume comparison to Scribe)

Jonathan Hsieh

Jul 8, 2010, 3:53:24 AM
to Kimsterv, Flume Users
Hi Kim, 

Flume and Chukwa have different problem scopes, and slightly different
goals.  Both are intended to help collect logs from clusters.

Roughly the Chukwa story is: 
* Get data to the centralized store, and do periodic near-real-time
  analysis.

At the same level of granularity, the Flume story is: 
* Reliably get data to the centralized store, enable continuous near
  real-time analysis, and enable periodic batch analysis.

1)  Architecture and Near-realtime. 

* Chukwa's near real-time == minutes
* Flume's near real-time == seconds (hopefully milliseconds).

Both systems have an agent-collector topology for nodes.  Architecturally,
Chukwa is a batch/minibatch system. In contrast, Flume is designed 
more as a continuous stream processing system. 

2) Reliability 

Flume's reliability levels are tunable: just pick the appropriate sink
to specify the mode you want to collect data in. It offers three
levels -- best effort, store+retry on failure, and an end-to-end mode
that uses acks and a write-ahead log.
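
Roughly, picking a level is a one-word change in the node's dataflow
spec -- modulo exact sink names, something like (host, port, and file
names made up):

  node1 : tail("/var/log/app.log") | agentBESink("collector1", 35853) ;   <- best effort
  node2 : tail("/var/log/app.log") | agentDFOSink("collector1", 35853) ;  <- store on failure + retry
  node3 : tail("/var/log/app.log") | agentE2ESink("collector1", 35853) ;  <- end-to-end (WAL + acks)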

AFAICT Chukwa is best effort from agent to collector, writes to local
disk at the pre-demux collector, and then finally becomes reliable
when written to HDFS. This seems stronger than Scribe's reliability
mechanisms (equivalent to Flume's store-on-failure mode), but weaker
than Flume's end-to-end reliability mode (write-ahead log and acks).

3) Manageability

Flume just requires the deployment of a master (or set of masters) and
nodes. It then provides a centralized management point from which you
can configure nodes and reconfigure the data flow topology dynamically.
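
As a sketch of what that looks like (command syntax approximate, names
made up), from the flume shell you can push configs at the master like:

  exec config webnode1   'tail("/var/log/httpd/access_log")' 'agentE2ESink("collector1", 35853)'
  exec config collector1 'collectorSource(35853)' 'collectorSink("hdfs://namenode/flume/web/", "access-")'

and the affected nodes pick up their new roles without a redeploy.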

Chukwa's deployment story is restrictive and more complicated than
Flume's. It only supports an agent/collector topology. Despite this
restriction, it requires the deployment of more distinct programs than
Flume -- agents, collectors, a console daemon. For Chukwa to work as
intended, it also depends on a Hadoop cluster (both HDFS and
MapReduce) and a MySQL database.

4) Support.
Cloudera packages Flume as part of its distribution of Hadoop and will
provide commercial support for users of the system. 

In two years at Cloudera, we've seen zero production installations of 
Chukwa at enterprises and no commercial support vendors.

Hope it helps!

Jon.

On Tue, Jul 6, 2010 at 6:12 PM, Kimsterv <kim....@gmail.com> wrote:
And perhaps chukwa too?

http://wiki.apache.org/hadoop/Chukwa


--
// Jonathan Hsieh (shay)
// j...@cloudera.com

Ari Rabkin

Jul 13, 2010, 5:01:08 PM
to Flume Users
Howdy!

Found this post via links from the Cloudera blog post announcing
Flume. Wanted to supply a few notes and corrections about Chukwa.

On Jul 8, 12:53 am, Jonathan Hsieh <j...@cloudera.com> wrote:
> Hi Kim,
>
> Flume and Chukwa have different problem scopes, and slightly different
> goals.  Both are intended to help collect logs from clusters.
>
> 1)  Architecture and Near-realtime.
>
> * Chukwa's near real-time == minutes
> * Flume's near real-time == seconds (hopefully milliseconds).

This isn't quite right. Chukwa has an option for fast delivery --
seconds to milliseconds.

> 2) Reliability
>
> Flume's reliability levels are tunable: just pick the appropriate sink
> to specify the mode you want to collect data in. It offers three
> levels -- best effort, store+retry on failure, and an end-to-end mode
> that uses acks and a write-ahead log.
>
> AFAICT Chukwa is best effort from agent to collector, writes to local
> disk at the pre-demux collector, and then finally becomes reliable
> when written to HDFS. This seems stronger than Scribe's reliability
> mechanisms (equivalent to Flume's store-on-failure mode), but weaker
> than Flume's end-to-end reliability mode (write-ahead log and acks).

This isn't right at all. Chukwa supports a range of reliability
mechanisms, including write-ahead logging at the agent.
There is not now, nor was there ever, a write to local disk at the
collector.

> 3) Manageability
>
> Chukwa's deployment story is restrictive and more complicated than
> Flume's. It only supports an agent/collector topology. Despite this
> restriction, it requires the deployment of more distinct programs than
> Flume -- agents, collectors, a console daemon. For Chukwa to work as
> intended, it also depends on a Hadoop cluster (both HDFS and
> MapReduce) and a MySQL database.

You don't need a console daemon. You do not need MySQL. You do need
HDFS.

--Ari

Jonathan Hsieh

Jul 13, 2010, 7:09:55 PM
to Ari Rabkin, Flume Users
Hi Ari!  

This comparison was based on information gleaned from the Chukwa web pages. We actually talked with Eric Yang on Friday and learned more about Chukwa's current state. After seeing it, I was impressed by the amount of work invested in the monitoring and visualization portions of the Chukwa system.

I think there are some fairly major design differences between the two systems. Flume has so far been focused on the reliable collection and manageability story. More details inline.

On Tue, Jul 13, 2010 at 2:01 PM, Ari Rabkin <asra...@gmail.com> wrote:
Howdy!

Found this post via links from the Cloudera blog post announcing
Flume. Wanted to supply a few notes and corrections about Chukwa.

On Jul 8, 12:53 am, Jonathan Hsieh <j...@cloudera.com> wrote:
> Hi Kim,
>
> Flume and Chukwa have different problem scopes, and slightly different
> goals.  Both are intended to help collect logs from clusters.
>
> 1)  Architecture and Near-realtime.
>
> * Chukwa's near real-time == minutes
> * Flume's near real-time == seconds (hopefully milliseconds).

This isn't quite right. Chukwa has an option for fast delivery --
seconds to milliseconds.


We designed the Flume system to support low-latency operation first, with optional batching for tuning throughput versus latency. Flume also supports "in flow" operations such as data demultiplexing and some basic extraction/processing (regex extraction, projection, field splitting/extraction, sampling).
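
As a sketch (decorator names approximate, from memory), these in-flow
operations compose into the flow spec itself, e.g. sampling a stream
before it hits the sink:

  collector1 : collectorSource(35853) |
      { intervalSampler(100) => collectorSink("hdfs://namenode/flume/sampled/", "web-") } ;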

Flume also has a concept of isolated flows that allows us to bypass demultiplexing altogether. We see no need to physically multiplex data from many sources into one stream only to later have to demultiplex it in a batch or streaming manner. Different logs that come from different sources stay separated from each other, which enables us to allocate different amounts of resources to each flow of data.

Making the system extensible for different sources of data and for different data sinks was a goal and not a special case. Initially we focused on writing data reliably to HDFS. However, it is really easy for us to write to other destinations such as HBase. HBase is not a special option -- it is just another sink (https://issues.cloudera.org/browse/FLUME-6) that Flume can deliver data to. I haven't seen the code for this yet, but apparently someone has put together an initial Voldemort (LinkedIn's key-value store) connector.
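
To illustrate the "just another sink" point: once such a connector
exists, using it is a one-line change in the flow spec. A sketch, with
hbase(...) as a purely hypothetical name for the FLUME-6 sink:

  collector1 : collectorSource(35853) | hbase("log_events") ;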

Eric mentioned an HBase connector patch for Chukwa, and I think it was recently mentioned on the mailing list. Up until this point, Chukwa seemed fundamentally tied to the demux MapReduce job and the latency that firing off MapReduce jobs incurs. This is a fairly new story and it seems pretty cool!

While the story with the Chukwa MR job seems fairly clear, with the new HBase story, does Chukwa depend on HBase to demux different kinds of data from different sources? Is it based on the data's key somehow?

Does Chukwa depend on getting data into HBase before performing extraction or analytic operations?  

> 2) Reliability
>
> Flume's reliability levels are tunable: just pick the appropriate sink
> to specify the mode you want to collect data in. It offers three
> levels -- best effort, store+retry on failure, and an end-to-end mode
> that uses acks and a write-ahead log.
>
> AFAICT Chukwa is best effort from agent to collector, writes to local
> disk at the pre-demux collector, and then finally becomes reliable
> when written to HDFS. This seems stronger than Scribe's reliability
> mechanisms (equivalent to Flume's store-on-failure mode), but weaker
> than Flume's end-to-end reliability mode (write-ahead log and acks).

This isn't right at all. Chukwa supports a range of reliability
mechanisms, including write-ahead logging at the agent.
There is not now, nor was there ever, a write to local disk at the
collector.


As I alluded to, I couldn't find much about Chukwa's reliability story on its web pages -- and I stand corrected about a Chukwa collector writing to local disk. Do you have a pointer or write-up on the different modes? Is the WAL mode the default?

Flume's logs are durable, and the system is intended to tolerate and recover from multiple failures across the cluster of machines. Flume supports a hot failover mechanism to different collectors (I'd imagine that Chukwa supports this). Moreover, Flume also supports dynamic reconfiguration of nodes -- this allows us to allocate more collectors at the system's master and take load from other collectors in an automated fashion.
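
Hot failover is likewise expressed right in the sink spec. A sketch
(failover syntax approximate, names made up):

  agent1 : tail("/var/log/app.log") |
      < agentE2ESink("collector1", 35853) ? agentE2ESink("collector2", 35853) > ;

If collector1 becomes unreachable, the node fails over to collector2.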

> 3) Manageability
>
> Chukwa's deployment story is restrictive and more complicated than
> Flume's. It only supports an agent/collector topology. Despite this
> restriction, it requires the deployment of more distinct programs than
> Flume -- agents, collectors, a console daemon. For Chukwa to work as
> intended, it also depends on a Hadoop cluster (both HDFS and
> MapReduce) and a MySQL database.

You don't need a console daemon.  You do not need MySQL. You do need
HDFS.


Chukwa's original core mechanism still requires MapReduce jobs to run to demux data. I can see that if you drop the console/monitoring portion then you can drop the MySQL part. Chukwa still has specialized agent daemons and collector daemons.

Flume essentially has a data plane with one kind of daemon (nodes) and a control plane with another (masters). With the control plane we can dynamically change the role of a node and its behavior from a centralized point. This means a Flume master can react to and deal with newly provisioned nodes in the system, or re-purpose nodes to become collectors in different data flows.
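
Re-purposing a node is then just pushing a new spec at the master. A
sketch (shell command syntax approximate, names made up) that turns a
spare node into a collector for a second flow:

  exec config spare1 'collectorSource(35853)' 'collectorSink("hdfs://namenode/flume/flowB/", "data-")'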

Jon

macroadster

Jul 15, 2010, 11:13:16 AM
to Flume Users


On Jul 13, 4:09 pm, Jonathan Hsieh <j...@cloudera.com> wrote:

> While the story with the Chukwa MR job seem fairly clear, with the new HBase
> story, does Chukwa depend on HBase  to demux different kinds of data from
> different sources?  Is it based on the data's key somehow?

The new HBase Writer uses the same MapReduce demux parser classes,
which run on the collector to extract and filter data in real time.
The composition of the row key is based on decisions made in the demux
parser.

> Does Chukwa depend on getting data into HBase before performing extraction
> or analytic operations?

It can do both, or send raw bytes, but it is preferable to perform
extraction before sending to HBase to trim down the bytes stored.

> > 2) Reliability
>
> > > Flume's reliability levels are tunable: just pick the appropriate sink
> > > to specify the mode you want to collect data in. It offers three
> > > levels -- best effort, store+retry on failure, and an end-to-end mode
> > > that uses acks and a write-ahead log.
>

All three methods exist in Chukwa. The first attempt, SeqFileWriter,
uses the HDFS client to write data with flush, then acks to the source
once the data has been deposited. This method is fairly slow because
HDFS sync cannot be called too frequently; it was decided to sync every
15 seconds. If a collector crashes without closing the file, the data
is still lost because it was never written. The second method was to
use LocalWriter, where data is written to the collector's local disk
and uploaded to HDFS in one shot once the chunk file is closed after n
minutes. This is also a problem when the data arrival rate is faster
than HDFS can write: data buffers faster than it can be sunk to HDFS,
which causes the collector's disk to fill up. The improved version has
the agent write ahead and retry if no ack is received; the collector
does not buffer any data. This allows a collector to go down without
losing data, and opens up the possibility of implementing other writers
like HbaseWriter. This model works the best for us so far. I think both
teams have done the same research, just in a different order.
End-to-end reliability mode did not work for Chukwa because HDFS flush
was a no-op in Hadoop 0.18, and it hasn't been improved in 0.20.

> As I alluded to, I couldn't find much about Chukwa's reliability story on
> its web pages -- and I stand corrected about a Chukwa collector writing to
> local disk.  Do you have a pointer or write up on the different modes?  Is
> the WAL mode the default?

There are SeqFileWriter (HDFS), SocketTeeWriter (in memory),
LocalWriter (local disk), and soon HbaseWriter (in memory). All of
them use the agent's write-ahead log and retry when no ack is received.

> Flume's logs are durable and is intended to allow and recover from multiple
> failures and  the cluster of machines.   Flume supports a hot failover
> mechanism to different collectors ( I'd imagine that Chukwa supports this).
>  Moreover, Flume also supports  dynamic reconfiguration of nodes -- this
> allows us to allocate more collectors at the system's master and take load
> from other collectors in an automated fashion.

Chukwa doesn't have dynamic allocation of more collectors, but this
could easily be implemented.

> > 3) Manageability
>
> Chukwa's core original mechanisms still requires  MapReduce jobs to run to
> demux data.  I can see that if you drop the console/monitoring portion then
> you can drop the mysql part.    Chukwa still has have specialized an agent
> daemons  and a collectors daemons.

The separation of agent and collector is important because the
collector generally works in a push model, while the agent can use
either a pull or a push model. This has several benefits. First, this
model reduces the chance of an operator error flipping the data flow
in the wrong direction. Second, an agent only opens one TCP connection
to one collector at any given time, which reduces the number of open
TCP connections and avoids the TCP incast problem. Third, the source
can decide to buffer data for reliability reasons without that
decision being made in the middle of the pipe, where it would cause
spills. Fourth, each component is optimized for its assigned task
without generalization trade-offs: the collector does not need to
implement logic for tracking rotated logs, and the agent does not need
to worry about filtering or mirroring data, which reduces the memory
footprint on source nodes. A Chukwa agent can run with only 4 MB of
JVM heap.

> Flume essentially has a data plane that have one kind of daemon (nodes) and
> a control plane that has another daemon (masters).  With the control plan we
> can dynamically change the role of a nodes and its behavior from a
> centralized point.  This means a flume master can react and deal with newly
>  provisioned nodes in the system, or re-purpose nodes to be come collectors
> in different data flows.

Chukwa has left the management aspect open-ended; it could be added in
five minutes by using ZooKeeper to keep track of the collector list.
Hopefully there will be more experience sharing and collaboration
between Flume and Chukwa.

regards,
Eric