Flume comparison to Scribe

Otis

Jul 2, 2010, 2:02:27 AM
to Flume Users
Hi,

Could somebody compare and contrast Flume and Scribe?

Scribe: http://wiki.github.com/facebook/scribe/

Thanks,
Otis

Eric Wadsworth

Jul 6, 2010, 5:55:45 PM
to Flume Users
Yes please!

Henry Robinson

Jul 6, 2010, 7:25:03 PM
to Otis, Flume Users
Hi -

We're planning a blog post or two at cloudera.com/blog in the near future to help people figure out how Flume differentiates itself from Scribe. While we're getting that in order, here are a few interesting points of comparison to be going on with:

1. Flume allows you to configure your Flume installation from a central point, without having to ssh into every machine, update a configuration variable and restart a daemon or two. You can start, stop, create, delete and reconfigure logical nodes on any machine running Flume from any command line in your network with the Flume jar available.

2. Flume also has centralised liveness monitoring. We've heard a couple of stories of Scribe processes failing silently and lying undiscovered for days, until the rest of the Scribe installation starts creaking under the increased load. Flume allows you to see the health of all your logical nodes in one place (note that this is different from machine liveness monitoring; often the machine stays up while the process fails).

3. Flume supports three distinct types of reliability guarantees, allowing you to make tradeoffs between resource usage and reliability. In particular, Flume supports fully ACKed reliability, with the guarantee that all events will eventually make their way through the event flow.

4. Flume is also very extensible - it's easy to write your own source or sink and integrate almost any system with Flume. If rolling your own is impractical, it's often very straightforward to have your applications output events in a form that Flume can understand (Flume can run Unix processes, for example, so if you can use a shell script to get at your data, you're golden).
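
To make point 4 a little more concrete, here is a minimal sketch of an application-side script that writes one event per line to stdout, which is the kind of output a process-running source could pick up. Nothing here is a Flume API: the function names, the JSON layout and the "myapp" source label are all assumptions made up for illustration.

```python
import json
import sys
import time


def format_event(message, source="myapp", timestamp=None):
    """Serialize one event as a single line of JSON.

    json.dumps escapes embedded newlines, so each event stays on one line.
    The field names (ts, source, msg) are hypothetical, not a Flume format.
    """
    if timestamp is None:
        timestamp = time.time()
    return json.dumps({"ts": timestamp, "source": source, "msg": message})


def emit(message, stream=sys.stdout):
    """Write one event per line and flush so a reader sees it immediately."""
    stream.write(format_event(message) + "\n")
    stream.flush()


if __name__ == "__main__":
    for line in ("service started", "user login ok"):
        emit(line)
```

Flushing after every event matters when another process is reading the pipe, since otherwise events can sit in the writer's buffer for a long time.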

This isn't an exhaustive list of benefits to using Flume - I haven't touched on using decorators for lightweight transformation or metadata extraction, the configuration language, the ability to run several logical nodes in a single Flume process, automatic bucketing and rolling of log files in HDFS... there's lots more about Flume that we're looking forward to sharing with everyone.
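
For anyone wondering what the "fully ACKed" guarantee in point 3 means in practice, the general idea can be sketched as a sender that holds on to every event until the receiving side acknowledges it. This is a toy illustration of the technique only, not Flume's implementation; the class name, method names and the send callback are all hypothetical.

```python
from collections import OrderedDict


class AckingSender:
    """Toy sender that keeps each event until the receiver acknowledges it."""

    def __init__(self, send):
        # send is a hypothetical callable(event_id, payload) -> bool,
        # returning True when the receiver has acknowledged the event.
        self._send = send
        self._pending = OrderedDict()
        self._next_id = 0

    def submit(self, payload):
        """Queue an event; it stays pending until it is acknowledged."""
        event_id = self._next_id
        self._next_id += 1
        self._pending[event_id] = payload
        return event_id

    def flush(self):
        """Try each pending event once, in order; drop only the ACKed ones.

        Returns the number of events still waiting for an ACK.
        """
        for event_id, payload in list(self._pending.items()):
            if self._send(event_id, payload):
                del self._pending[event_id]
        return len(self._pending)
```

Calling flush() repeatedly until it returns 0 gives the "eventually delivered" behaviour. The cost is that unacknowledged events take up memory or disk on the sender, and a lost ACK can cause a duplicate delivery, which is the resource-versus-reliability tradeoff mentioned above.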

cheers,
Henry
--
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Kimsterv

Jul 6, 2010, 9:12:54 PM
to Flume Users
And perhaps Chukwa too?

http://wiki.apache.org/hadoop/Chukwa