Storm for a BI tool

Madan Thangavelu

Jan 24, 2012, 1:16:47 PM
to storm...@googlegroups.com
I am looking at building an application like Google Analytics (standard reporting) backed by Storm. Can Storm fit the bill if I saved the data in Cassandra and used Storm DRPC to run complex join/filter/aggregate queries triggered from a web app? Responsiveness is key. The space of possible queries is too large to maintain denormalized data for every case a user might want. Any ideas about the responsiveness of Storm DRPC?

Thanks,
Madan

Gustavo Arjones

Jan 24, 2012, 3:24:29 PM
to storm-user
Hi Madan,
We developed an analytics platform to count tweets for each political
candidate in the last Argentine elections: http://elecciones2011.socialmetrix.com/
All filters and reports are precomputed and stored as aggregated data
in Redis.

From my point of view, your use case is better suited to Hadoop,
since each action would trigger a "batch process" to resolve the
selected query, filters, slicing, etc.

I am interested in similar solutions as well. If you want to build
some kind of proof of concept, I would be glad to join an
open-source project.

Cheers,
Gustavo
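To make the pre-aggregation approach concrete, here is a minimal sketch, assuming each event (say, a tweet) is described by a few dimension values such as candidate, province and day. The class name, the "*" wildcard convention and the in-memory map standing in for Redis counters (which would be bumped with INCR) are all hypothetical and not taken from the Socialmetrix platform. Every supported filter becomes a single key lookup at query time, but each event writes 2^n counters for n dimensions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of pre-aggregation: an in-memory map stands in for
 * Redis counters (one INCR per combination). Not the Socialmetrix code.
 */
class PreAggregatedCounts {
    private final Map<String, Long> counters = new HashMap<>();

    /** Records one event (e.g. one tweet) described by its dimension values. */
    void ingest(String... dims) {
        int n = dims.length;
        // Each dimension is either its concrete value or "*" (any), so one event
        // updates 2^n counters and any supported filter is one lookup later.
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder key = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) key.append('|');
                key.append((mask & (1 << i)) != 0 ? dims[i] : "*");
            }
            counters.merge(key.toString(), 1L, Long::sum);
        }
    }

    /** Reads a precomputed count; pass "*" for a dimension the query ignores. */
    long count(String... dims) {
        return counters.getOrDefault(String.join("|", dims), 0L);
    }

    /** Number of stored counters; grows with every new combination of values. */
    int size() {
        return counters.size();
    }
}
```

A free-form filter such as a name prefix has no precomputed key here, and distinct counts such as reach cannot simply be summed across keys, which is the limitation raised in the original question.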

Madan Thangavelu

Jan 24, 2012, 4:17:54 PM
to storm...@googlegroups.com, natha...@gmail.com
Hi Gustavo,

The link you provided looks sleek. What will become interesting is when that page becomes interactive.

Standard Hadoop would work great for batch processing, but for the use case I described, we do not have aggregated data stored. My use case is more in line with the "Reach" calculation. To make it clearer: how responsive would the "Reach" calculation be if I wanted to consider only tweets from California and New York by people whose names start with "jos", with the user controlling which locations and which name prefix are selected?

I am experimenting with storm on a single machine and building bolts for the above use-case.

@Nathan,
I emailed you earlier about building responsive web apps backed by Storm. Do you think it is an option for the use case I described above?

Thanks,
Madan
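A minimal, single-process sketch of the filtered reach query described above, assuming reach means the number of distinct followers of the users who tweeted a URL. The Tweet class, the sample locations and the in-memory maps are hypothetical stand-ins for data that would live in Cassandra; in a Storm DRPC topology the follower lookups would be spread across bolts rather than run in one loop.

```java
import java.util.*;

/**
 * Hypothetical sketch: reach of a URL computed on the fly, restricted to
 * tweets from selected locations by users whose name starts with a prefix.
 * In-memory collections stand in for the data store (e.g. Cassandra).
 */
class FilteredReach {
    /** Hypothetical tweet record. */
    static final class Tweet {
        final String url;
        final String user;
        final String location;

        Tweet(String url, String user, String location) {
            this.url = url;
            this.user = user;
            this.location = location;
        }
    }

    static int reach(String url, Set<String> locations, String namePrefix,
                     List<Tweet> tweets, Map<String, Set<String>> followers) {
        String prefix = namePrefix.toLowerCase();
        Set<String> exposed = new HashSet<>();
        for (Tweet t : tweets) {
            // Filters chosen by the user in the web app.
            if (!t.url.equals(url)) continue;
            if (!locations.contains(t.location)) continue;
            if (!t.user.toLowerCase().startsWith(prefix)) continue;
            // Each follower of a matching tweeter counts once toward reach.
            exposed.addAll(followers.getOrDefault(t.user, Collections.emptySet()));
        }
        return exposed.size();
    }
}
```

Because the filters are applied at query time, nothing has to be precomputed per combination of locations and prefixes; the cost moves to the number of follower lookups each request performs.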

Gustavo Arjones

Jan 25, 2012, 9:09:17 AM
to storm-user
Hi Madan,
We rely on Redis to hold this kind of information and resolve
those queries inside the db.

For reference on what we're currently working on, we learned a lot
from these two blog posts:

http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/
http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs

I realize it won't completely solve your problem, since you have a
string-matching query (name = jos*), but it will handle the city
filter well.

Cheers,
- Gustavo
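The bitmap technique from the first post can be sketched with java.util.BitSet standing in for Redis strings, assuming every user has been assigned a small integer id. Each location keeps a bitmap with one bit per user; a multi-location filter is a union (Redis BITOP OR), an extra condition such as "active today" is an intersection (BITOP AND), and a unique-user count is a population count (BITCOUNT). The class and method names are hypothetical. As noted above, a prefix filter such as name = jos* does not map onto these operations directly.

```java
import java.util.*;

/**
 * Sketch of per-location user bitmaps using java.util.BitSet in place of
 * Redis (SETBIT / BITOP / BITCOUNT). Names are illustrative only.
 */
class CityBitmaps {
    private final Map<String, BitSet> usersByCity = new HashMap<>();

    /** Marks that user userId tweeted from city (like SETBIT key userId 1). */
    void record(String city, int userId) {
        usersByCity.computeIfAbsent(city, c -> new BitSet()).set(userId);
    }

    /** Union of the selected locations' bitmaps (like BITOP OR); returns a fresh copy. */
    BitSet usersIn(Collection<String> cities) {
        BitSet union = new BitSet();
        for (String city : cities) {
            BitSet users = usersByCity.get(city);
            if (users != null) union.or(users);
        }
        return union;
    }

    /** Unique users in the selected locations who are also set in other (BITOP AND + BITCOUNT). */
    int countUsersIn(Collection<String> cities, BitSet other) {
        BitSet result = usersIn(cities);
        result.and(other); // modifies only the copy, not the stored bitmaps
        return result.cardinality();
    }
}
```

Each query touches only one bitmap per selected location, so its cost depends on the number of locations and the user-id range, not on how many tweets were recorded.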


Nathan Marz

Jan 26, 2012, 7:58:24 PM
to Madan Thangavelu, storm...@googlegroups.com
Hi Madan,

Storm DRPC is intended for things like responsive web apps that need to do intense on-the-fly queries. So yes, I think it is fine for your use case. Of course, DRPC isn't the whole story: you also need to store your data in a way that lets the db handle the throughput of requests.

BTW, sorry for the late response. I've been cranking away on 0.7.0 and fell behind on email.

-Nathan
--
Twitter: @nathanmarz
http://nathanmarz.com
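The point about database throughput can be illustrated with a minimal sketch of the fan-out inside one on-the-fly query, assuming the query needs many independent lookups (for example, one per tweeter) and issues them through a bounded thread pool. The class, the fetchAll helper and the maxInFlight counter are hypothetical; the lookup function stands in for a database call. The pool size is the trade-off: a larger pool finishes sooner but puts more concurrent requests on the store, and that load is multiplied by the number of users querying at once.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch of one request's fan-out: independent lookups run
 * concurrently, with at most `parallelism` in flight against the store.
 */
class ParallelLookup {
    /** Highest number of lookups observed running at once (for illustration). */
    static final AtomicInteger maxInFlight = new AtomicInteger();
    private static final AtomicInteger inFlight = new AtomicInteger();

    static <K, V> Map<K, V> fetchAll(Collection<K> keys, Function<K, V> lookup, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            Map<K, Future<V>> pending = new LinkedHashMap<>();
            for (K key : keys) {
                pending.put(key, pool.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    maxInFlight.accumulateAndGet(now, Math::max);
                    try {
                        return lookup.apply(key); // one database call in a real system
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }));
            }
            // Collect results in the original key order.
            Map<K, V> results = new LinkedHashMap<>();
            for (Map.Entry<K, Future<V>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In a Storm topology the parallelism would come from the number of bolt tasks rather than a local thread pool, but the load on the store grows the same way.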

Nathan Marz

Jan 26, 2012, 8:00:27 PM
to Madan Thangavelu, storm...@googlegroups.com
Oh, and to answer your second question, when we implemented reach on Storm, we were able to get the latency down to about 2s for URLs with very large reach (millions). This computation involved almost 1000 database calls.  URLs with smaller reach have much lower latency, of course.
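A reach computation like this one can be split into stages: fetch the users who tweeted the URL, fetch each one's followers, deduplicate, count. The storm-starter ReachTopology example is structured along these lines. Below is a plain-Java simulation of the deduplication step under that assumption, with no Storm dependency: each follower is hashed to one of several hypothetical "partial uniquer" tasks, so the same follower is always handled by the same task and the per-task distinct counts can be summed. The data and task counts are illustrative only.

```java
import java.util.*;

/**
 * Plain-Java simulation (no Storm) of a partitioned reach count: followers
 * are routed by hash to one of several tasks, each task keeps its own
 * distinct set, and an aggregator sums the partial counts.
 */
class PartitionedReach {
    static int reach(List<String> tweeters, Map<String, List<String>> followersOf, int tasks) {
        List<Set<String>> partials = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            partials.add(new HashSet<>());
        }
        for (String tweeter : tweeters) {
            // In a real deployment this lookup is a database call per tweeter.
            for (String follower : followersOf.getOrDefault(tweeter, Collections.emptyList())) {
                // Same follower -> same task, so no follower is counted by two tasks.
                int task = Math.floorMod(follower.hashCode(), tasks);
                partials.get(task).add(follower);
            }
        }
        // Aggregation step: the partial distinct sets are disjoint, so sizes add up.
        int total = 0;
        for (Set<String> partial : partials) {
            total += partial.size();
        }
        return total;
    }
}
```

In Storm terms, the hash routing corresponds to a fields grouping on the follower; with a shuffle grouping the same follower could reach two tasks and be counted twice.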

Madan Thangavelu

Jan 26, 2012, 8:41:30 PM
to Nathan Marz, storm...@googlegroups.com
Hi Nathan,

Thank you for your response. It is definitely exciting that you could get the latency down to 2s in such large use-cases. 

@Gustavo, from my perspective Redis is just the data storage part of the equation, while Storm definitely sounds like the computation unit of the system.

Thanks all,
Madan
