How complicated can behaviors be?

todd

Nov 4, 2010, 2:14:30 PM

to s4-project

I was wondering how complicated the functions for an event can be. The
examples seem to focus on consuming feeds, but S4 could work equally well as
an application infrastructure based on events driving distributed
objects. What kind of application infrastructure can I build with it?

Anish Nair

Nov 4, 2010, 4:04:07 PM

to s4-project

Todd:

The PEs implement both state and behavior. The behavior can be as
complicated as you want; the tradeoff is in how much traffic you can
handle. We've played with things like simple aggregation, complex
online adaptation, search personalization and so on.
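A minimal sketch of that split between state and behavior, written in plain Java because the thread does not show S4's actual PE API. `ProcessingElement`, `RunningMeanPE`, and `PEContainer` are hypothetical names, and the container is only a single-process stand-in for the platform's key-based routing (it also shows a PE being instantiated by the first event that carries its key):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical PE contract (not S4's API): one instance per key,
// holding state and behavior.
interface ProcessingElement {
    void onEvent(String key, double value);
}

// Simple aggregation: running count and mean for a single key.
class RunningMeanPE implements ProcessingElement {
    private long count;   // state
    private double sum;   // state

    @Override
    public void onEvent(String key, double value) {  // behavior
        count++;
        sum += value;
    }

    long count() { return count; }
    double mean() { return count == 0 ? 0.0 : sum / count; }
}

// Hypothetical single-process stand-in for the platform's routing: the
// first event with a given key instantiates the PE for that key, and later
// events with the same key reach the same instance.
class PEContainer<P extends ProcessingElement> {
    private final Map<String, P> instances = new HashMap<>();
    private final Supplier<P> factory;

    PEContainer(Supplier<P> factory) { this.factory = factory; }

    void dispatch(String key, double value) {
        instances.computeIfAbsent(key, k -> factory.get()).onEvent(key, value);
    }

    P get(String key) { return instances.get(key); }
    int size() { return instances.size(); }
}
```

Making the behavior heavier only changes `onEvent`; the cost shows up as fewer events per second per node, which is the tradeoff described above.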

I'm not exactly sure what you mean by "events driving distributed
objects". PEs are distributed objects, and they are instantiated via
events. Do you mean you want the PEs to be served directly to an
external system?

Todd Hoff

Nov 4, 2010, 5:11:11 PM

to s4-pr...@googlegroups.com

On Thu, Nov 4, 2010 at 1:04 PM, Anish Nair <a...@s4.io> wrote:
> I'm not exactly sure what you mean by "events driving distributed
> objects". PE's are distributed objects, and they are instantiated via
> events. Do you mean you want the PE's to be served directly to an
> external system?

For example, say I have a user data structure containing a set of
friends, and an operation is executed for a Like between that user and
a resource. In this case an edge needs to be added to the social
graph. The user would be a persistent object in memory. The operation
would be an event that was routed to the user by key. The event causes
an action to be executed that adds the edge. That edge being added
causes update events to be sent out to anyone interested in such
events and a response event is deposited back at the key of the
original requester.
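The flow above can be sketched in a few lines of Java. This is a single-process illustration under stated assumptions, not S4 code: `Msg`, `User`, and `LikeFlow` are hypothetical, update listeners are called directly instead of receiving routed events, and responses are collected in a per-key inbox map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical message, delivered to the object that owns `key`.
final class Msg {
    final String key;      // routing key (user id, requester id, ...)
    final String type;     // e.g. "LIKE", "EDGE_ADDED", "LIKE_ACK"
    final String payload;  // e.g. the liked resource id
    final String replyTo;  // key where the response should be deposited

    Msg(String key, String type, String payload, String replyTo) {
        this.key = key;
        this.type = type;
        this.payload = payload;
        this.replyTo = replyTo;
    }
}

// Persistent in-memory user object; its likes are its slice of the graph.
final class User {
    final Set<String> likes = new HashSet<>();
}

// Hypothetical router for the Like flow, all in one process.
final class LikeFlow {
    final Map<String, User> users = new HashMap<>();
    final List<Consumer<Msg>> updateListeners = new ArrayList<>();
    final Map<String, List<Msg>> inboxes = new HashMap<>();  // responses by key

    void route(Msg m) {
        if (!"LIKE".equals(m.type)) {
            return;  // this sketch only handles Likes
        }
        // 1. Route by key to the user object (created on first use).
        User user = users.computeIfAbsent(m.key, k -> new User());
        // 2. The action: add the user -> resource edge (no-op if present).
        boolean added = user.likes.add(m.payload);
        // 3. Fan an update event out to everyone interested.
        if (added) {
            Msg update = new Msg(m.payload, "EDGE_ADDED", m.key, null);
            for (Consumer<Msg> listener : updateListeners) {
                listener.accept(update);
            }
        }
        // 4. Deposit a response at the original requester's key.
        Msg ack = new Msg(m.replyTo, "LIKE_ACK", m.key + "->" + m.payload, null);
        inboxes.computeIfAbsent(m.replyTo, k -> new ArrayList<>()).add(ack);
    }
}
```

A repeated Like still gets an acknowledgement but adds no second edge and triggers no second update, which keeps retries from duplicating graph changes.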

That's the kind of flow I'm talking about.

thanks

Anish Nair

Nov 4, 2010, 7:50:33 PM

to s4-project

This is an excellent use-case.

The S4 implementation for this would have something like a UserPE for
storing state about the user and operations like adding edges. Every
time the UserPE gets a request to add an edge, it would dispatch an
event that would get routed to an appropriate EdgePE (keyed on a pair
of users; or a user and a resource). Both of these PE's would be in
the form of objects distributed across the cluster. The routing of
events to the appropriate PE (object) is handled by the S4 platform.
The UserPE and EdgePE can also dispatch events whenever their state is
updated, or at regular intervals, or for every input event (it really
is pretty generic!). These output 'event streams' can be subscribed to
by whoever is interested (another PE, or an external system).
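A toy version of this UserPE-to-EdgePE chain, assuming a hash-mod-N partitioner to show which node would own a key (S4's real partitioning scheme and API may differ). `KeyedEvent`, `ToyCluster`, `UserPE`, `EdgePE`, and `EdgeWiring` are hypothetical; everything runs in one process, so `nodeFor` only reports ownership rather than shipping events over the network:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical keyed event on a named stream.
final class KeyedEvent {
    final String stream;
    final String key;
    final String body;

    KeyedEvent(String stream, String key, String body) {
        this.stream = stream;
        this.key = key;
        this.body = body;
    }
}

// Toy stand-in for the platform: maps keys to nodes and delivers each
// emitted event to every subscriber of its stream.
final class ToyCluster {
    private final int nodes;
    private final Map<String, List<Consumer<KeyedEvent>>> subscribers = new HashMap<>();

    ToyCluster(int nodes) { this.nodes = nodes; }

    // Assumed partitioning: hash of the key modulo the node count.
    int nodeFor(String key) { return Math.floorMod(key.hashCode(), nodes); }

    void subscribe(String stream, Consumer<KeyedEvent> consumer) {
        subscribers.computeIfAbsent(stream, s -> new ArrayList<>()).add(consumer);
    }

    void emit(KeyedEvent e) {
        for (Consumer<KeyedEvent> c : subscribers.getOrDefault(e.stream, List.of())) {
            c.accept(e);
        }
    }
}

// UserPE: turns an add-edge request into an event keyed on (user, resource).
final class UserPE {
    private final String userId;
    private final ToyCluster cluster;

    UserPE(String userId, ToyCluster cluster) { this.userId = userId; this.cluster = cluster; }

    void onAddEdge(String resource) {
        cluster.emit(new KeyedEvent("edgeRequests", userId + "|" + resource, "add"));
    }
}

// EdgePE: one instance per (user, resource) key; emits on "edgeUpdates"
// only when its state actually changes.
final class EdgePE {
    private final String key;
    private final ToyCluster cluster;
    private boolean exists;

    EdgePE(String key, ToyCluster cluster) { this.key = key; this.cluster = cluster; }

    void onEvent(KeyedEvent e) {
        if (!exists) {
            exists = true;
            cluster.emit(new KeyedEvent("edgeUpdates", key, "created"));
        }
    }
}

// Hypothetical wiring (platform configuration in a real deployment):
// one EdgePE is created on demand per key seen on "edgeRequests".
final class EdgeWiring {
    static Map<String, EdgePE> wire(ToyCluster cluster) {
        Map<String, EdgePE> edges = new HashMap<>();
        cluster.subscribe("edgeRequests",
            e -> edges.computeIfAbsent(e.key, k -> new EdgePE(k, cluster)).onEvent(e));
        return edges;
    }
}
```

Keying EdgePE on the `user|resource` pair sends every request for the same edge to the same instance, and an external consumer can subscribe to `edgeUpdates` exactly the way the PEs subscribe to each other's streams.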

So yes, quite doable.

--
Anish

todd

Nov 8, 2010, 4:46:13 PM

to s4-project

Thanks for the description. Any thoughts on how to handle consistency
in case of partial failure? And do you see this as something that could
support a workflow like the SEDA architecture?

thanks

Anish Nair

Nov 10, 2010, 5:08:05 PM

to s4-project

You could do some sort of state checkpointing to recover from partial
failures, via an external persister. We're working on decentralized
fault tolerance, will keep this forum posted about it.
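One way to read "state checkpointing via an external persister" is sketched below, assuming a key-value store behind a hypothetical `Persister` interface (an in-memory map stands in for it here) and a fixed checkpoint interval. The hypothetical `CheckpointedCounterPE` restores its last checkpoint when it is recreated after a failure:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical persister interface; a real one would wrap an external store.
interface Persister {
    void save(String key, long state);
    Optional<Long> load(String key);
}

// In-memory stand-in for the external store (it outlives the PE in this demo).
final class InMemoryPersister implements Persister {
    private final Map<String, Long> store = new HashMap<>();

    @Override
    public void save(String key, long state) { store.put(key, state); }

    @Override
    public Optional<Long> load(String key) { return Optional.ofNullable(store.get(key)); }
}

// Counter PE that checkpoints its state every `interval` events.
final class CheckpointedCounterPE {
    private final String key;
    private final Persister persister;
    private final int interval;
    private long count;
    private int sinceCheckpoint;

    CheckpointedCounterPE(String key, Persister persister, int interval) {
        this.key = key;
        this.persister = persister;
        this.interval = interval;
        // On (re)creation, resume from the last checkpoint, if any.
        this.count = persister.load(key).orElse(0L);
    }

    void onEvent() {
        count++;
        if (++sinceCheckpoint >= interval) {
            persister.save(key, count);
            sinceCheckpoint = 0;
        }
    }

    long count() { return count; }
}
```

This bounds the loss rather than eliminating it: with an interval of 3, a PE that fails after 7 events comes back with a count of 6, and the events since the last checkpoint are gone unless the upstream source can replay them.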

We haven't yet looked beyond Spring for describing apps, but we're quite
open to suggestions for DSLs.

--
Anish

Ted Dunning

Nov 10, 2010, 6:15:30 PM

to s4-pr...@googlegroups.com

Have you looked at Plume (a clone of FlumeJava)?

That seems like a very provocative possibility in that it gives most of the higher-level expressivity of a language like Pig together with the Turing completeness of Java.

todd

Nov 11, 2010, 12:38:23 PM

to s4-project

Related post on Flume: https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/c4e47978ea44854c#

Ted, do you have a source for good information that is not held
captive behind an ACM paywall?

Ted Dunning

Nov 11, 2010, 2:03:26 PM

to s4-pr...@googlegroups.com

First off, Flume != FlumeJava; FlumeJava == Plume.

The best I can offer is a smaller paywall via deepdyve: http://www.deepdyve.com/lp/association-for-computing-machinery/flumejava-easy-efficient-data-parallel-pipelines-wwPgFt2hWB

You can read on-line for free with a trial subscription or for 99 cents if you have already exhausted your free trial reads.

I have a conflict of interest here because I used to be CTO at Deepdyve and am still an advisor there.

Todd Hoff

Nov 11, 2010, 2:17:12 PM

to s4-pr...@googlegroups.com

Thanks, but it's a little hard to consider something that you can't
learn about readily.

Ted Dunning

Nov 11, 2010, 3:47:16 PM

to s4-pr...@googlegroups.com

I don't understand your comment.

Plume is open source and you can learn about that easily.

The ACM paper is available for free or near free.

Neither is too complicated to understand.

What do you mean?

Ashwin Jayaprakash

Nov 20, 2010, 3:14:05 AM

to s4-project

If your application doesn't involve high-volume streams of events, perhaps a
more modest "compute grid" would serve your purpose. Have a look at
GridGain if you haven't: http://www.gridgainsystems.com/wiki/display/GG15UG/Examples+Gallery
(open source + Java).

Or JBoss Infinispan (http://jboss.org/infinispan) for a simpler data
grid with eventing. Again - Open source + Java.

Regards,
Ashwin.

PS: Sorry to divert attention from the S4 topic.
