Hadoop vs Storm in Twitter analytics

anahap

Sep 16, 2011, 6:41:17 AM
to storm-user
Hi Nathan,
Could you tell me why you need Hadoop for Twitter analytics if Storm
is so performant at processing large data? Why not only Storm? I'm not
criticizing, I just want to understand the use cases.

seanm

Sep 16, 2011, 2:41:11 PM
to storm-user
In my mind there is a key tradeoff between the two:

Hadoop: makes it very easy to sort/join/pivot large data sets with
very little work, but does not address any real-time needs.

Storm: a similar paradigm, but makes real-time processing very easy.

Sean

nathanmarz

Sep 16, 2011, 3:03:14 PM
to storm-user
Batch processing and realtime processing have different properties and
tradeoffs. With Hadoop, you can run idempotent functions on all your
data at once, but with high latency. With Storm, you can run
incremental functions very quickly, but since you don't look at the
whole dataset at once you can't run the same range of functions.

It turns out that these two paradigms complement each other extremely
well. All of our applications have both a batch processing component
and a realtime processing component.
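As a rough illustration of how the two paradigms can complement each other, here is a minimal Python sketch. It is not Storm or Hadoop code and is not taken from the slides; the event format, `batch_view`, `RealtimeView`, and `query` are all hypothetical names chosen for the example. It assumes the batch job periodically recomputes counts over all historical data, the realtime side keeps incremental counts for events the batch job has not yet seen, and a query adds the two together.

```python
from collections import Counter

# Hypothetical event format: (url, user) pairs, e.g. one per tweet with a URL.


def batch_view(events):
    """Batch-style: recompute counts from scratch over the full dataset."""
    return Counter(url for url, _user in events)


class RealtimeView:
    """Realtime-style: update counts incrementally, one event at a time."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        url, _user = event
        self.counts[url] += 1


def query(url, batch, realtime):
    """Combine the (stale but complete) batch view with the recent increments."""
    return batch[url] + realtime.counts[url]


# Data already covered by the last batch run (assumed sample values).
historical = [("a.com", "u1"), ("a.com", "u2"), ("b.com", "u1")]
# Events that arrived after the batch run started.
recent = [("a.com", "u3"), ("c.com", "u2")]

batch = batch_view(historical)
rt = RealtimeView()
for event in recent:
    rt.update(event)
```

Calling `query("a.com", batch, rt)` adds the batch count for that URL to the increments seen since the batch run, so high-latency batch results and low-latency updates are both reflected in the answer.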

You can see the slides for a presentation I gave about this
batch/realtime approach here:
http://www.slideshare.net/nathanmarz/the-secrets-of-building-realtime-big-data-systems

-Nathan

