Is there a way to embed Flume in my Java application? I would like to write a sink that will make in-process method calls to other objects/classes.

ilan.ilje

Nov 9, 2010, 7:28:14 AM

to Flume Users

I would like to embed a Flume agent in my Java application so that
every message that arrives is passed to my application
classes for further processing.
If this is possible, please explain how to do it.

Many thanks in advance
Ilan.

Jonathan Hsieh

Nov 9, 2010, 12:28:01 PM

to ilan.ilje, Flume Users

Ilan,

I think the easiest thing to do is to instantiate an rpcSource in your application. This starts a server on your application machine on a port you pick. Your data sources just need to point their data at the app machine and that port. The application would then just have a thread that pulls data from the source by calling the next() method and then does its custom processing.

If you only use best effort or disk failover modes, this is about all you have to do!
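The pull loop described above could look roughly like the sketch below. It does not use the Flume jars: PullSource, QueueSource, and EventPump are hypothetical stand-ins invented here so the example runs on its own, and the real rpcSource class and its next() signature should be checked against your Flume version.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for a Flume source; NOT the real Flume interface.
interface PullSource {
    // Blocks until an event body is available; returns null once the source is closed.
    byte[] next() throws InterruptedException;
}

// In-memory source (an assumption) that plays the role of the embedded rpcSource.
final class QueueSource implements PullSource {
    private static final byte[] EOF = new byte[0]; // sentinel, compared by identity
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

    void offer(byte[] body) { queue.add(body); }

    void close() { queue.add(EOF); }

    @Override
    public byte[] next() throws InterruptedException {
        byte[] body = queue.take();
        return body == EOF ? null : body;
    }
}

// Pulls events one at a time and hands each body to application code.
final class EventPump implements Runnable {
    private final PullSource source;
    private final Consumer<byte[]> handler;

    EventPump(PullSource source, Consumer<byte[]> handler) {
        this.source = source;
        this.handler = handler;
    }

    @Override
    public void run() {
        try {
            byte[] body;
            while ((body = source.next()) != null) {
                handler.accept(body); // in-process call into your own classes
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }
}
```

In an application you would typically run the pump on its own thread, e.g. new Thread(new EventPump(source, handler)).start(), where handler is whatever method does your custom processing.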

Jon

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera

// j...@cloudera.com

ilan.ilje

Nov 10, 2010, 2:18:35 AM

to Flume Users

Hi Jon,
Can you please show me how to embed Flume in my Java application? Do
you have an example that I can look at?
If I embed Flume in my application, will the agent still be
listening to and controlled by the Flume Master?

Many thanks for your help
Ilan.

Jonathan Hsieh

Nov 18, 2010, 1:17:26 AM

to ilan.ilje, Flume Users

Ilan,

I'm a little confused -- do you want to have Flume data sent to your application where it can be processed, or do you want your application to generate data and send it to a Flume node elsewhere?

If you want to embed Flume into your application and use it to send data, there are roughly three ways to do it.

1) Embed a sink. Instantiate an rpcSink, create events, and append them to the sink. There are no heartbeats (thus no central config and no direct e2e), and it runs in the same thread as your app.

2) Embed a logical node. You'd have a custom source. There is no heartbeat.

3) Embed a physical node. This would include the heartbeating mechanism and allow the node to be centrally configured and work with e2e mode.

I'd suggest option 1 since it is the simplest. It could be used to feed a "normal" flume node on the same machine.
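As a rough illustration of the shape of option 1, here is a sketch of an application that opens a sink, appends events to it on the caller's thread, and closes it. AppendSink, RecordingSink, and AppLogger are hypothetical names made up for this example, not Flume classes; with real Flume you would replace RecordingSink with the rpcSink and build proper Flume events, checking the exact API in your Flume version.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Flume sink; NOT the real Flume interface.
interface AppendSink {
    void open();
    void append(byte[] body);
    void close();
}

// In-memory sink (an assumption) standing in for an rpcSink that would
// forward events to a Flume node over the network.
final class RecordingSink implements AppendSink {
    final List<String> received = new ArrayList<>();
    private boolean open;

    @Override
    public void open() { open = true; }

    @Override
    public void append(byte[] body) {
        if (!open) {
            throw new IllegalStateException("sink is not open");
        }
        received.add(new String(body, StandardCharsets.UTF_8));
    }

    @Override
    public void close() { open = false; }
}

// Application-side wrapper: every log call appends directly to the sink,
// in the same thread as the caller, with no heartbeat or central config.
final class AppLogger {
    private final AppendSink sink;

    AppLogger(AppendSink sink) { this.sink = sink; }

    void start() { sink.open(); }

    void log(String message) {
        sink.append(message.getBytes(StandardCharsets.UTF_8));
    }

    void stop() { sink.close(); }
}
```

Because append runs in the caller's thread, a slow or unreachable downstream node would block the application in a real setup, which is one reason to feed a local "normal" Flume node as suggested above.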

Jon
