Comparing flume to message queues (f.ex. rabbit+riak)

1,140 views
Skip to first unread message

bradford

unread,
Jul 10, 2010, 2:00:14 PM7/10/10
to Flume Users
I want to take a look at the relative advantages, disadvantages and
tradeoffs of Flume vs. something like rabbitmq + riak.

I am building a very large scale crawling and event-driven system that
i want to build on a nice messaging backbone. Thought I ought to
consider Flume as well.

Henry Robinson

unread,
Jul 10, 2010, 2:49:17 PM7/10/10
to bradford, Flume Users
Hi Bradford - 

The difference between Flume and a message queue like rabbitmq, 0mq etc. lies in the delivery guarantees.

Message queues usually make strong guarantees about the order in which messages are delivered (usually at least preserving the order of production on an individual client), and the number of times a single message is delivered (typically exactly once).  They also may offer transactional guarantees about the delivery of groups of messages.

Flume makes weaker guarantees in the interest of moving data around more quickly and to enable cheaper fault tolerance*. In Flume's end-to-end reliability mode, events are delivered at least once, but with no ordering guarantees. We've found this sufficient for using Flume as a data conduit, since messages can be de-duplicated either at write time or by a post-hoc batch process. However, this means that Flume is harder to use as a message passing or eventing framework unless your application is setup to be idempotent wrt duplicate events and there is no causal relationship between events that is required to be preserved upon delivery. 

*The idea is to minimise the amount of state that Flume has to keep. Replicated state is what makes fault-tolerance hard, and makes reasoning about failure conditions difficult. 

cheers,
Henry
--
Henry Robinson
Software Engineer
Cloudera
415-994-6679

AshwinJay

unread,
Jul 24, 2010, 8:42:07 PM7/24/10
to Flume Users
I personally think that the "ordered delivery" requirement is over
rated. Most messaging systems these days dump data into a pool of
Application servers in a round robin fashion. Ordering is lost anyway.
The apps are built to take care of events arriving out of order.
That's why App servers and related technologies like JPA/Hibernate
have 1 and 2 level caches to hold and re-assemble the messages after
arrival.


On Jul 10, 11:49 am, Henry Robinson <he...@cloudera.com> wrote:
> Hi Bradford -
>
> The difference between Flume and a message queue like rabbitmq, 0mq etc.
> lies in the delivery guarantees.
>
> Message queues usually make strong guarantees about the order in which
> messages are delivered (usually at least preserving the order of production
> on an individual client), and the number of times a single message is
> delivered (typically exactly once).  They also may offer transactional
> guarantees about the delivery of groups of messages.
>
> Flume makes weaker guarantees in the interest of moving data around more
> quickly and to enable cheaper fault tolerance*. In Flume's end-to-end
> reliability mode, events are delivered at least once, but with no ordering
> guarantees. We've found this sufficient for using Flume as a data conduit,
> since messages can be de-duplicated either at write time or by a post-hoc
> batch process. However, this means that Flume is harder to use as a message
> passing or eventing framework unless your application is setup to be
> idempotent wrt duplicate events and there is no causal relationship between
> events that is required to be preserved upon delivery.
>
> *The idea is to minimise the amount of state that Flume has to keep.
> Replicated state is what makes fault-tolerance hard, and makes reasoning
> about failure conditions difficult.
>
> cheers,
> Henry
>

Henry Robinson

unread,
Jul 25, 2010, 11:33:28 PM7/25/10
to AshwinJay, Flume Users
Yes, if you have a total order on events you can always reconstruct ordering at the point of delivery - *if* you know exactly what you're expecting and you have enough buffer somewhere in the network to wait for any events that you missed. We could make Flume support ordered delivery at the cost of some performance, but I agree, OOO delivery is usually completely sufficient for the kinds of use cases we're interested in. In particular, we really don't want collectors to be waiting for the arrival of a particular message before they can forward any other messages; this would kill throughput in the current design.

Henry

Berkay

unread,
Aug 6, 2010, 3:46:37 PM8/6/10
to Flume Users
Can you elaborate little more on what you mean by "weaker guarantees"?
I'm trying to understand in what circumstances order may be altered.
Would some reliability configurations offer more guarantee than
others?
When processing syslog messages, order can be significant. For
example, syslog messages from network devices can indicate an
interface being up or down. Processing them in the wrong order would
result in the wrong conclusion.

Flume looks great for sure, just trying to understand whether/where it
fits

Regards,

Berkay

On Jul 25, 11:33 pm, Henry Robinson <he...@cloudera.com> wrote:
> Yes, if you have a total order on events you can always reconstruct ordering
> at the point of delivery - *if* you know exactly what you're expecting and
> you have enough buffer somewhere in the network to wait for any events that
> you missed. We could make Flume support ordered delivery at the cost of some
> performance, but I agree, OOO delivery is usually completely sufficient for
> the kinds of use cases we're interested in. In particular, we really don't
> want collectors to be waiting for the arrival of a particular message before
> they can forward any other messages; this would kill throughput in the
> current design.
>
> Henry
>

Henry Robinson

unread,
Aug 6, 2010, 6:41:52 PM8/6/10
to Berkay, Flume Users
Hi - 

There are two ways that events may be re-ordered:

1. They are transmitted in DFO or E2E modes, and a failure delays them until after the successful delivery of some chronologically later events. The agent will try and retransmit unacknowledged events, but that could happen after some events get delivered just fine. 

2. The network reorders the packets. That can't happen with current TCP protocols (i.e. there's buffering and reordering done at the receiver), but I can't rule out us going to UDP, precisely because we don't need those guarantees.

You can always reconstruct causal order after all events are delivered by looking at their timestamps, but at the time of delivery you don't know if there are events that you missed, unless you attach sequence numbers to each. If you are using Flume for alerting then you just need to track when the last interesting state was Say you received an ERROR notification with timestamp t - just make sure you save t and silently drop any messages that arrive after it with timestamps < t. 

In BE mode, currently, events should arrive in order but it's possible they could be delivered to different collectors, if you have more than one. You have to be aware of the possibility that events could be arbitrarily delayed, as well, although the delay you see for BE should be less than for DFO or E2E (i.e. events are usually delivered quickly, or not at all).

Does this help? Let me know if you'd like me to go into more detail on anything.

cheers,
Henry

Patrick Hunt

unread,
Aug 6, 2010, 7:57:47 PM8/6/10
to Henry Robinson, Berkay, Flume Users
Great reply Henry! Perhaps a good one for the FAQ?

Patrick

> <mailto:mbe...@gmail.com>> wrote:
>
> Can you elaborate little more on what you mean by "weaker guarantees"?
> I'm trying to understand in what circumstances order may be altered.
> Would some reliability configurations offer more guarantee than
> others?
> When processing syslog messages, order can be significant. For
> example, syslog messages from network devices can indicate an
> interface being up or down. Processing them in the wrong order would
> result in the wrong conclusion.
>
> Flume looks great for sure, just trying to understand whether/where it
> fits
>
> Regards,
>
> Berkay
>
> On Jul 25, 11:33 pm, Henry Robinson <he...@cloudera.com

> <mailto:he...@cloudera.com>> wrote:
> > Yes, if you have a total order on events you can always
> reconstruct ordering
> > at the point of delivery - *if* you know exactly what you're
> expecting and
> > you have enough buffer somewhere in the network to wait for any
> events that
> > you missed. We could make Flume support ordered delivery at the
> cost of some
> > performance, but I agree, OOO delivery is usually completely
> sufficient for
> > the kinds of use cases we're interested in. In particular, we
> really don't
> > want collectors to be waiting for the arrival of a particular
> message before
> > they can forward any other messages; this would kill throughput
> in the
> > current design.
> >
> > Henry
> >
> > On 24 July 2010 17:42, AshwinJay <ashwin.jayaprak...@gmail.com

> <mailto:ashwin.jayaprak...@gmail.com>> wrote:
> >
> >
> >
> >
> >
> > > I personally think that the "ordered delivery" requirement is over
> > > rated. Most messaging systems these days dump data into a pool of
> > > Application servers in a round robin fashion. Ordering is lost
> anyway.
> > > The apps are built to take care of events arriving out of order.
> > > That's why App servers and related technologies like JPA/Hibernate
> > > have 1 and 2 level caches to hold and re-assemble the messages
> after
> > > arrival.
> >
> > > On Jul 10, 11:49 am, Henry Robinson <he...@cloudera.com

Berkay Mollamustafaoglu

unread,
Aug 8, 2010, 2:25:25 PM8/8/10
to Flume Users
Henry, 

Thanks!, this is indeed very helpful. It looks like we may have some challenges with alerting use case, but there may be ways to overcome the issues. 
BE mode would not be suitable due to chance of  missing events, and dropping events that arrive with earlier timestamps can mean to miss clear events, but adding sequence number to each event sounds like a viable option.
Looking forward to dig deeper into Flume!

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
Reply all
Reply to author
Forward
0 new messages