Does S4 use a filesystem as HDFS and no sql database as HBase?


donal

Nov 9, 2010, 12:02:25 PM
to s4-project
As in the title.

Anand Kesari

Nov 9, 2010, 4:54:05 PM
to s4-project
S4 is a framework for consuming a stream of events and performing
computations on the stream (e.g. find the most frequent keyword in a
stream of messages). For this, it provides facilities for distributing
the computation and for routing events between nodes. Storage of data
(persistence) is not a part of the core functionality of S4.
Persisters can be plugged into S4. These provide functionality such as
writing to the local filesystem, to a database, HDFS, etc.
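
The keyword example above can be pictured with a minimal sketch. This is plain Python, not S4's API: `NUM_NODES`, `route`, `count_keywords`, and `most_frequent` are hypothetical names, and the "nodes" are just in-process counters. It only illustrates the idea of routing events by key so that each key is always counted in one place.

```python
from collections import Counter
import zlib

NUM_NODES = 3  # hypothetical cluster size, chosen for illustration


def route(keyword: str) -> int:
    """Pick a node with a stable hash so the same keyword always lands on the same node."""
    return zlib.crc32(keyword.encode("utf-8")) % NUM_NODES


def count_keywords(messages):
    """Split messages into keyword events, route each event by key, and count per node."""
    node_counts = [Counter() for _ in range(NUM_NODES)]
    for message in messages:
        for keyword in message.lower().split():
            node_counts[route(keyword)][keyword] += 1
    return node_counts


def most_frequent(node_counts):
    """Merge the per-node counters and return the top (keyword, count) pair."""
    merged = Counter()
    for counts in node_counts:
        merged.update(counts)
    return merged.most_common(1)[0]


stream = ["s4 streams events", "events flow through s4", "s4 routes by key"]
print(most_frequent(count_keywords(stream)))
```

Nothing here is stored once the process exits, which is the gap a persister would fill.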

--
Anand

On Nov 9, 9:02 am, donal <donal0...@gmail.com> wrote:
> rt

donal

Nov 9, 2010, 5:57:32 PM
to s4-project
I have a structured message stream queued in ActiveMQ at a rate of about 200 messages per second.
I want to do computations on the stream over several time intervals, e.g. 5 min, 1 hour, 12 hours, 1 day, and so on.
I would also like to save the computed results for later lookup.
 
So S4 will do the pre-defined computations on the stream.
What if I want to run further computations on the old messages?
 
 
2010-11-09

donal

From: Anand Kesari
Sent: 2010-11-09 22:54:11
To: s4-project
Cc:
Subject: Re: Does S4 use a filesystem as HDFS and no sql database as HBase?

Ted Dunning

Nov 9, 2010, 6:02:26 PM
to s4-pr...@googlegroups.com
Persist them and run map-reduce on a collection of old messages.

Ian Holsman

Nov 9, 2010, 6:37:11 PM
to s4-pr...@googlegroups.com, s4-pr...@googlegroups.com
Do you only want historical numbers, or real-time windows showing aggregated numbers for the current period?

I'm not sure if m/r would do the window-based approach cleanly.

---
Ian Holsman - 703 879-3128

I saw the angel in the marble and carved until I set him free -- Michelangelo

Ted Dunning

Nov 9, 2010, 6:43:10 PM
to s4-pr...@googlegroups.com
Current-period aggregates are definitely easier in S4, but Hadoop can do them just fine. You just emit each record several times, once for
each window it appears in, and reduce on window id.
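
A rough sketch of that emit-per-window idea, written as plain Python rather than an actual Hadoop job. The window length, slide, record layout `(timestamp, value)`, and the function names are all assumptions made for illustration; the sketch also assumes timestamps start at 0 and that `WINDOW` is a multiple of `SLIDE`.

```python
from collections import defaultdict

WINDOW = 300  # window length in seconds (5 min), illustrative value
SLIDE = 60    # a new window starts every 60 seconds, illustrative value


def window_ids(timestamp: int):
    """Return the start of every window [start, start + WINDOW) that contains timestamp."""
    last_start = (timestamp // SLIDE) * SLIDE
    first_start = max(0, last_start - WINDOW + SLIDE)
    return range(first_start, last_start + 1, SLIDE)


def map_record(record):
    """Map step: emit the record's value once per window it falls into."""
    timestamp, value = record
    for wid in window_ids(timestamp):
        yield wid, value


def reduce_by_window(pairs):
    """Reduce step: sum the values that share a window id."""
    totals = defaultdict(int)
    for wid, value in pairs:
        totals[wid] += value
    return dict(totals)


records = [(10, 1), (70, 2), (400, 5)]
pairs = [pair for record in records for pair in map_record(record)]
totals = reduce_by_window(pairs)
print(totals)
```

The cost of this approach is duplication: each record is emitted `WINDOW / SLIDE` times in the steady state, so finer slides mean more intermediate data.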

donal

Nov 10, 2010, 3:46:06 AM
to s4-project
Thank you guys!
I want the real-time numbers to support some decisions.
I also want to plot these numbers for any period.
 
Maybe I can use S4 and store the results in a database like Cassandra for later lookup,
and also store the original messages in case I want to run new kinds of analysis on them.
If only S4 could do all of these things.
 
 
2010-11-10

donal

From: Ted Dunning
Sent: 2010-11-10 00:49:31
To: s4-project
Cc:
Subject: Re: Does S4 use a filesystem as HDFS and no sql database as HBase?

On Tue, Nov 9, 2010 at 3:37 PM, Ian Holsman <i...@holsman.net> wrote:

Ted Dunning

Nov 10, 2010, 2:12:04 PM
to s4-pr...@googlegroups.com
I think it will do all this pretty easily.

Put a persister on the input (or on a short-time aggregate of the input) and put another persister on the final aggregator. That gives you everything you are asking for: the stored input covers later re-analysis, and the stored aggregates cover lookups and plots.
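
The two-persister layout can be sketched as follows. This is not S4 code: `ListPersister`, its `save` method, and `run_pipeline` are hypothetical names, the events are assumed to be `(timestamp, payload)` pairs, and the persister keeps rows in memory where a real one would write to a file, HDFS, or a database.

```python
from collections import defaultdict


class ListPersister:
    """Hypothetical stand-in for a persister; it only appends rows to a list."""

    def __init__(self):
        self.rows = []

    def save(self, key, value):
        self.rows.append((key, value))


def run_pipeline(events, input_persister, result_persister, window=300):
    """Persist every raw event, count events per window, then persist the final counts."""
    counts = defaultdict(int)
    for timestamp, payload in events:
        input_persister.save(timestamp, payload)  # persister on the input
        counts[(timestamp // window) * window] += 1
    for window_start in sorted(counts):
        # persister on the final aggregator
        result_persister.save(window_start, counts[window_start])


raw, results = ListPersister(), ListPersister()
run_pipeline([(5, "a"), (120, "b"), (310, "c")], raw, results)
print(raw.rows)
print(results.rows)
```

Keeping the raw rows and the aggregated rows in separate stores means new analyses can replay the raw data without touching the numbers already used for lookups.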

Serguei Boldyrev

May 18, 2012, 5:32:05 AM
to s4-pr...@googlegroups.com
Hi,

I just came across this post and would like to know whether anyone has followed up with an S4-to-HDFS or SQL persister.
______
Serguei.