Realtime vs Long Term Data Analysis with Storm/Hadoop/Cassandra

Gary Malouf

Feb 25, 2013, 11:01:20 PM

to storm...@googlegroups.com

I've been experimenting with Storm + Cassandra for our realtime ad serving analytics platform. While doing research, I came across a 2011 blog post from Nathan: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html. One of the things that stood out to me was the suggestion that everything in Cassandra be transient, as the 'correct' computation will take place in Hadoop within a few hours. Is this how Twitter handles their data analytics - keeping the past few hours of almost-accurate data in Cassandra but replacing it with Hadoop's batch processing results as time goes on?

Basically, I'm trying to decide whether I should keep things like impression/click counts JUST in Cassandra or if I should be recomputing everything in Hadoop.

Connection to Storm - as demonstrated in the ETE 2012 presentation, I am using Storm to validate data and run partial ETL (translating third-party data to our internal API). Once these steps are done, I both append to Hadoop and update the minute buckets for the ad in a Cassandra column family.
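
A minimal sketch of the minute-bucket update described above, assuming an in-memory dict stands in for the Cassandra column family. The names `minute_bucket` and `record_event`, the `(ad_id, event_type)` row key, and the `YYYYMMDDHHMM` column naming are all hypothetical and not taken from this thread.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Stand-in for a Cassandra column family: row key -> {minute bucket -> count}.
# A real deployment would issue counter increments against Cassandra instead.
counts = defaultdict(lambda: defaultdict(int))


def minute_bucket(ts: datetime) -> str:
    """Truncate a timezone-aware timestamp to its UTC minute (sortable string)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


def record_event(ad_id: str, event_type: str, ts: datetime) -> None:
    """Increment the counter for one validated impression or click."""
    counts[(ad_id, event_type)][minute_bucket(ts)] += 1


# Two impressions in the same minute, one click a minute later.
t = datetime(2013, 2, 25, 23, 1, 20, tzinfo=timezone.utc)
record_event("ad-42", "impression", t)
record_event("ad-42", "impression", t.replace(second=59))
record_event("ad-42", "click", t.replace(minute=2))
```

Keeping the bucket name sortable means a range of minutes for one ad can be read as a contiguous slice of columns, which is one common reason to choose this layout.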

Nathan Marz

Feb 26, 2013, 12:19:02 AM

to storm...@googlegroups.com

Yes, that's how we're doing much of our data analytics. The realtime computation is still correct, but having the majority of our rollups served via Hadoop-computed indices gives us a lot of flexibility to recompute when necessary. It also lets us keep our Cassandra cluster much smaller, which is a huge win.
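
One way to picture serving rollups from both layers is the sketch below. It assumes plain dicts as stand-ins for the Hadoop-computed index and the Cassandra view, integer hour buckets, and a `batch_cutoff` marking how far the batch layer has caught up. The function names `merged_count` and `expire_realtime` are hypothetical, and this is not a description of Twitter's actual implementation.

```python
def merged_count(batch_view, realtime_view, batch_cutoff, start, end):
    """Count events in [start, end): batch results before batch_cutoff,
    realtime results from batch_cutoff onward."""
    total = 0
    for bucket in range(start, end):
        view = batch_view if bucket < batch_cutoff else realtime_view
        total += view.get(bucket, 0)
    return total


def expire_realtime(realtime_view, batch_cutoff):
    """Drop realtime buckets that the batch layer now covers."""
    for bucket in [b for b in realtime_view if b < batch_cutoff]:
        del realtime_view[bucket]


batch_view = {0: 10, 1: 12, 2: 9}
# Bucket 2 exists in both views; the batch value wins, e.g. after a recompute.
realtime_view = {2: 8, 3: 11, 4: 7}
batch_cutoff = 3

total = merged_count(batch_view, realtime_view, batch_cutoff, 0, 5)
expire_realtime(realtime_view, batch_cutoff)
```

Because old realtime buckets are dropped once the batch layer covers them, the realtime store only ever holds a few hours of data, which fits the point about keeping the Cassandra cluster small.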

--
Twitter: @nathanmarz
http://nathanmarz.com

Gary Malouf

Mar 5, 2013, 4:16:29 PM

to storm...@googlegroups.com, nat...@nathanmarz.com

Is it possible? Yes. I am setting up Storm (as Twitter does) to validate incoming data and then append it to Hadoop, along with updating my DHT, Cassandra. I think the only difference in what you are describing is to update HBase instead of Cassandra, right?

On Tuesday, March 5, 2013 6:34:03 AM UTC-5, SHOBIN JOSEPH wrote:

Can we integrate Hadoop and Storm? Is it possible to fetch data in real time from Hadoop's HBase through Storm, and can we update tables in HBase using Storm?

Viral Bajaria

Mar 5, 2013, 4:17:34 PM

to storm...@googlegroups.com, nat...@nathanmarz.com

Yes, it is possible to integrate HBase and Storm. You can use HBase as your final database and Storm as your processing engine. I am not sure what you meant by integrating Hadoop. If you were only asking about HBase, then yes, that is possible. If you meant integrating MapReduce, then no (or at least I don't know of a way), but you can definitely write to HDFS.

Ted Dunning

Mar 5, 2013, 10:25:45 PM

to storm...@googlegroups.com, nat...@nathanmarz.com

Funny you should use just those words.

http://www.slideshare.net/tdunning/buzz-words-2012-realtime-learning

You don't even necessarily need something like HBase or Cassandra.

SHOBIN JOSEPH

Apr 26, 2013, 5:26:43 AM

to storm...@googlegroups.com

Hi,

Can Storm and Kafka coexist with Apache Hadoop (in the same cluster or a different one)?
Can HBase support both MapReduce and Storm-based data processing?

Ted Dunning

Apr 26, 2013, 12:08:07 PM

to storm...@googlegroups.com

Your questions are hard to understand.

What is it that you are actually asking?

SHOBIN JOSEPH

Apr 29, 2013, 12:18:02 AM

to storm...@googlegroups.com

Hi, I currently have a Hadoop cluster available. Can I configure Storm on the same cluster? I am planning to use Apache Kafka and Storm for real-time big data analytics on the same Hadoop cluster; is this architecture possible? Can I pass the data in HBase (in the above-mentioned cluster) to Storm, and/or write the output of Storm to HBase?

manoj jannu

Jul 13, 2013, 11:07:03 PM

to storm...@googlegroups.com

Hi, I am very new to Storm and Cassandra. Can anyone please send me working steps for Storm-Cassandra integration?

storm dev

Jul 14, 2013, 9:33:31 PM

to storm...@googlegroups.com, nat...@nathanmarz.com

Please help me join this group, thanks.
