Combining job that runs more frequently than daily

Jason Bodnar

Feb 24, 2014, 2:20:51 PM
to dat...@googlegroups.com
Is it possible to have a combining job that runs more frequently than daily?

I'm looking at using Hourglass to calculate the number of gifts and total dollars received by participants in fundraising events. With hundreds of clients each running hundreds of events, and 1,000 or more participants at each event, I feel like Hadoop would be a good tool for this. However, our clients prefer that we update stats more often than daily or even hourly. Our current process updates them every 15 minutes.
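To make the aggregation concrete, here is a minimal sketch in plain Python of folding each new batch of gifts into running per-participant totals. It does not use the Hourglass API; the function name `merge_totals`, the record shapes, and the sample IDs are all made up for illustration.

```python
from decimal import Decimal


def merge_totals(previous, new_gifts):
    """Fold a batch of new gifts into running per-participant totals.

    previous:  {participant_id: (gift_count, total_dollars)}  (hypothetical shape)
    new_gifts: iterable of (participant_id, amount) pairs      (hypothetical shape)
    Returns a new dict; the input dict is not modified.
    """
    totals = dict(previous)
    for participant_id, amount in new_gifts:
        count, dollars = totals.get(participant_id, (0, Decimal("0")))
        totals[participant_id] = (count + 1, dollars + Decimal(amount))
    return totals


# Totals carried over from the previous run, plus one 15-minute batch.
running = {"p1": (2, Decimal("75.00"))}
batch = [("p1", "25.00"), ("p2", "10.00"), ("p2", "5.50")]
running = merge_totals(running, batch)
```

Only the new batch is scanned on each run, so the work per update depends on the batch size, not on the full gift history.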

Looking through the Hourglass API, I didn't see any obvious way to set the interval for a job. While I realize that running jobs on a periodic basis is the responsibility of a scheduler like Oozie, I would think Hourglass needs to know something about the interval in order to create the proper directories for combined data. Is this possible with Hourglass?
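As a rough illustration of what I mean by interval-aware directories, here is a plain Python sketch that maps a timestamp to a partition path. The `yyyy/MM/dd` layout for daily data is my assumption about the usual convention, and the sub-daily `HHmm` suffix, the `partition_path` helper, and the `/data/gifts` base path are hypothetical; Hourglass itself may not support anything like this.

```python
from datetime import datetime


def partition_path(base, ts, interval_minutes=None):
    """Return the partition directory that a timestamp falls into.

    With interval_minutes=None the path is daily (base/yyyy/MM/dd).
    Otherwise the day is split into fixed buckets and the bucket's
    start time is appended as HHmm (a hypothetical layout).
    """
    day = f"{base}/{ts:%Y/%m/%d}"
    if interval_minutes is None:
        return day
    if interval_minutes <= 0 or 1440 % interval_minutes != 0:
        raise ValueError("interval must evenly divide a day")
    minutes = ts.hour * 60 + ts.minute
    start = minutes - minutes % interval_minutes  # floor to bucket start
    return f"{day}/{start // 60:02d}{start % 60:02d}"


ts = datetime(2014, 2, 24, 14, 20, 51)
daily = partition_path("/data/gifts", ts)
quarter_hour = partition_path("/data/gifts", ts, interval_minutes=15)
```

Here `daily` is `/data/gifts/2014/02/24` and `quarter_hour` is `/data/gifts/2014/02/24/1415`.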

Thanks,

Jason Bodnar

Matthew Hayes

Feb 25, 2014, 12:55:07 PM
to dat...@googlegroups.com
MapReduce jobs in general may not be the right choice if you need low-latency updates, such as every 15 minutes. It probably comes down to how you're performing your aggregation and whether you have a dedicated cluster to run on. As for Hourglass, the granularity of the inputs is currently expected to be daily. I do want to add support for other intervals, such as hourly, but I haven't gotten to it yet.

-Matt
