Kafka Log4j Appender

Thomas Söhngen

Aug 3, 2012, 10:18:27 AM

to storm...@googlegroups.com

Hi everyone,

Congratulations and a big thank you to Nathan and all contributors for
the new release! I think with Trident, Storm will become a real
alternative to Hadoop for more and more use cases.

I have an unrelated question about using Kafka as a log4j appender. I
added this dependency
> <dependency>
> <groupId>storm</groupId>
> <artifactId>kafka</artifactId>
> <version>0.7.0-incubating</version>
> </dependency>
to my pom.xml and the following definitions to the storm.log.properties:
> log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
> log4j.appender.KAFKA.Host={my-ip}
> log4j.appender.KAFKA.Port=9092
> log4j.appender.KAFKA.Topic=storm-log
When starting nimbus with this configuration, I get an exception:
> log4j:ERROR Could not instantiate class
> [kafka.producer.KafkaLog4jAppender].
> java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
Is there anything else to do in order to get Kafka to run as a Log4j
appender in Storm?

Regards
Thomas

Nathan Marz

Aug 6, 2012, 12:31:55 AM

to storm...@googlegroups.com

I don't know; you probably have something misconfigured. You'll need to put the Kafka jar in the lib/ dir before starting Nimbus.
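
One way to narrow down a ClassNotFoundException like this is to check whether the appender class can be resolved from a given classpath at all. The following is only a minimal sketch: the helper class name `ClasspathCheck` is made up for illustration, and it only tells you about the classpath it is launched with, which is an assumption about how closely that matches what Nimbus actually uses.

```java
// Hypothetical diagnostic helper (not part of Storm or Kafka).
final class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    // The class is located but not initialized (second argument is false).
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the appender class named in the log4j configuration above.
        String name = args.length > 0 ? args[0] : "kafka.producer.KafkaLog4jAppender";
        System.out.println(name + " loadable: " + isLoadable(name));
        // Print the effective classpath so a missing jar is easy to spot.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Compiled next to the Storm installation, this could be run with something like `java -cp "lib/*:." ClasspathCheck`, assuming the jars live in lib/. If it reports false, the jar in lib/ probably does not contain that class; if it reports true, the process that fails is likely started with a different classpath.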

--
Twitter: @nathanmarz
http://nathanmarz.com

Thomas Söhngen

Aug 6, 2012, 7:49:51 AM

to storm...@googlegroups.com

I placed the jar in the lib dir, without success. Do I have to compile Storm with Kafka to get it running?

Jeryl Cook

Aug 6, 2012, 4:15:08 PM

to storm...@googlegroups.com

@Thomas,

Hi! When you say "Storm will become a real alternative to Hadoop", what do you mean exactly? From my understanding, Hadoop solves batch processing on massive amounts of stored data, and Storm handles stream processing over continuous streams of data: two different tools solving two different use cases.

Michael Rose

Aug 6, 2012, 4:45:39 PM

to storm...@googlegroups.com

A lot of batch processes are only batch because the facilities don't exist to do the computation in real time. What if instead of aggregating your logs every hour and running jobs for rollups, you had all of your data in real time?

Obviously, Storm isn't going to be able to kill the standard ETL jobs, but given the right environment it can replace many latency-ridden processes.

--

Michael Rose (@Xorlev)

Senior Backend Engineer, FullContact
mic...@fullcontact.com

Thomas Söhngen

Aug 6, 2012, 5:23:25 PM

to storm...@googlegroups.com

I think this is true for almost any data processing. As soon as you have the data, you can process it. I don't see the need to batch things except for timed reports, and even those you can implement easily with Storm.

Trident is a powerful abstraction layer that lets you define topologies in much the same way MapReduce job flows are defined. Sure, you have no HDFS out of the box, but you can use HBase or whatever you want to store your state. I can't say anything about performance, but I would really like to see a benchmark of the throughput of Storm and Hadoop on ETL jobs on a comparable cluster.
