Embedding Flume or Flume components into an application

Pete Berry

Jul 12, 2010, 3:44:24 PM
to Flume Users
I am planning to embed Flume in my application, but the user guide says
it is incomplete. Can anyone share their experiences? Is there any
sample code for doing this?

-- Pete

Jonathan Hsieh

Jul 12, 2010, 4:56:58 PM
to Pete Berry, Flume Users
Pete,

The level of complexity this requires depends on the level of data reliability you want. Let's start with some questions and see where this goes:

What level of reliability do you need?
Are you OK with best-effort delivery from the application to a Flume node running on the same machine?
What kind of application do you want to embed Flume in (Java, or another language)?
How static is the setup of your application/network?

If you are extremely concerned about losing any logs and want to do something like synchronous logging, or have highly dynamic configurations, a full embed of a Flume node is possible in Java. We currently do this in test cases for nodes: these have multiple FlumeNode instances (physical nodes) embedded in a single execution.

Jon.
--
// Jonathan Hsieh (shay)
// j...@cloudera.com

Pete Berry

Jul 12, 2010, 6:58:54 PM
to Flume Users
Thanks Jon.

My data source is a Java application, and I will be running a Flume
agent in the same VM as the application. Reliability is very
important and the configuration is static. I need to be able to
reliably transfer data to HDFS, and I can work from there.

I am still getting my hands dirty with Flume. I am sure I will be
able to answer better after I have played with it for a while.

-- Pete

Pete Berry

Jul 13, 2010, 1:52:02 PM
to Flume Users
Jon, reading the user guide and looking at the source code
really helped. I will look at the node-related test cases.

-- Pete

Jonathan Hsieh

Jul 13, 2010, 1:56:45 PM
to Pete Berry, Flume Users
Pete,

Great!

There are roughly two things that embedding Flume in your program can buy you:
1) offload responsibility for reliability (logging, retries, etc.) from your application to the embedded Flume library.
2) allow the configuration of your logging to be dynamically controlled by a centralized source.

It sounds like you are willing to do some coding. If you need both #1 and #2, a full embed seems required (instantiating a FlumeNode). If you only need the first, we can do a lighter-weight embedding (instantiating a LogicalNode) with a hard-coded config or maybe a file-based config.

There are many test cases that instantiate LogicalNodes or use the LogicalNodeManager. These would be good things to look at.
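
To make #1 a bit more concrete, here is a minimal Java sketch of what offloading reliability looks like from the application's side: the application calls log() and moves on, while the embedded component buffers events and retries delivery. This is not Flume code. Sink, ReliableForwarder, and FlakySink are hypothetical names made up for illustration, and the buffer is only in memory, so it would not survive a crash.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; none of these types are part of the Flume API. */
public class EmbeddedForwarderSketch {

    /** Stand-in for whatever delivers events downstream. */
    interface Sink {
        void append(String event) throws IOException;
    }

    /** Buffers events and retries delivery so the application does not have to. */
    static class ReliableForwarder {
        private final Deque<String> pending = new ArrayDeque<>();
        private final Sink sink;
        private final int maxAttempts;

        ReliableForwarder(Sink sink, int maxAttempts) {
            this.sink = sink;
            this.maxAttempts = maxAttempts;
        }

        /** Called by the application; delivery failures are absorbed here. */
        void log(String event) {
            pending.addLast(event);
            flush();
        }

        /** Drains the buffer in order; stops at the first event that cannot be delivered. */
        void flush() {
            while (!pending.isEmpty()) {
                if (!deliver(pending.peekFirst())) {
                    return; // keep the event buffered for a later flush
                }
                pending.removeFirst();
            }
        }

        /** Tries one event up to maxAttempts times. */
        private boolean deliver(String event) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    sink.append(event);
                    return true;
                } catch (IOException e) {
                    // assumed transient: fall through and retry
                }
            }
            return false;
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Simulated sink that fails its first N appends. */
    static class FlakySink implements Sink {
        final List<String> delivered = new ArrayList<>();
        private int failuresLeft;

        FlakySink(int failures) {
            this.failuresLeft = failures;
        }

        @Override
        public void append(String event) throws IOException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("simulated outage");
            }
            delivered.add(event);
        }
    }

    public static void main(String[] args) {
        FlakySink sink = new FlakySink(3);
        ReliableForwarder forwarder = new ReliableForwarder(sink, 2);
        forwarder.log("event-1"); // both attempts fail, so event-1 stays buffered
        forwarder.log("event-2"); // the outage ends; both events go out in order
        System.out.println(sink.delivered + " pending=" + forwarder.pendingCount());
    }
}
```

With a sink that fails its first three appends and a limit of two attempts per flush, event-1 stays buffered after the first call, and both events are delivered in order on the second. An actual embed would instantiate a LogicalNode or FlumeNode instead, and the test cases are the place to see how.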

Feel free to shoot more questions!

Jon.