Some basic questions


Patrick Wright

Oct 17, 2009, 4:42:42 PM
to swarm-...@googlegroups.com
To start a discussion and pin some things down, in a prototypical
Swarm application

1- Who launches a Swarm continuation? Where would this fit into an
application lifecycle?

2- How does a continuation report back on the results of its work? In
the current demos, a new thread is spawned, and it may end by moving
the continuation elsewhere. How do we get ahold of the end results?

3- What happens if a continuation hits an exception while it is
a) on its home node, but running in a separate thread
b) not on its home node

4- What happens if a continuation wants to move to another node and
a) that node is not responding
b) communicating with that node throws an exception

5- On any node, is there any way to give the continuation access to
the surrounding execution context, other than the current approach of using
static (or in Scala, object) methods and fields?

Thanks

Rick R

Oct 17, 2009, 5:42:05 PM
to swarm-...@googlegroups.com
This is the perfect discussion to take to an IRC/SILC chatroom. It would greatly decrease the latency of responses to ideas.  That said, I'll have a go at these.

To start, I'll say that my favorite interfaces for distributed processing are purely asynchronous. A good example is the E programming language.  http://www.erights.org/
In addition to offering a capabilities-based security model, E performs processing by returning Promises for every function call. The receiver of a Promise can choose to block on that Promise until it is fulfilled, or register callbacks on the success and error conditions and continue asynchronously. The people who have done a lot of research in distributed systems seem to agree that asynchrony is terribly important. We should build tools that leverage this.
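The block-or-callback choice E offers can be sketched with Scala's later `scala.concurrent` library; this is illustrative only, and `remoteCall` is a made-up stand-in for a remote invocation, not a Swarm or E API:

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object PromiseSketch extends App {
  // Simulate a "remote" call that fulfills a Promise from another thread.
  def remoteCall(x: Int): Future[Int] = {
    val p = Promise[Int]()
    Future { p.success(x * 2) } // the "remote node" replying asynchronously
    p.future
  }

  // Option 1: register callbacks and continue asynchronously.
  remoteCall(21).onComplete {
    case Success(result) => println(s"got $result")
    case Failure(err)    => println(s"error: $err")
  }

  // Option 2: block on the Promise for its fulfillment.
  val answer = Await.result(remoteCall(21), 1.second)
}
```

Either way, the caller rather than the callee decides whether the interaction is synchronous, which is the property Rick is pointing at.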

Also, I hear people are having reasonable success with this Erlang thing.

On Sat, Oct 17, 2009 at 4:42 PM, Patrick Wright <pdou...@gmail.com> wrote:

To start a discussion and pin some things down, in a prototypical
Swarm application

1- Who launches a Swarm continuation? Where would this fit into an
application lifecycle?

This would be the data-storage/processing back-end for large, busy systems, like the back-end of a data-processing-intensive website or multiplayer game. I am assuming that it could itself drive a web framework, or that one would interact with it in the usual ways one interacts with back-ends (REST, RPC, etc.).

2- How does a continuation report back on the results of its work? In
the current demos, a new thread is spawned, and it may end by moving
the continuation elsewhere. How do we get ahold of the end results?

I would like to see a continuation report back to the Actor that spawned it via some sort of standard Complete message. This would include the ID of the request to which it is responding.

3- What happens if a continuation hits an exception while it is
a) on its home node, but running in a separate thread
b) not on its home node

 Instead of a Complete message, it would be an Error message. This would be the case whether it was local or remote.
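A minimal sketch of that reply protocol; `Complete`, `Error`, `TaskReply`, and `Ref` are hypothetical names for the proposal, not existing Swarm types:

```scala
object ReplyProtocol {
  // The ID ties the reply back to the originating request.
  case class Ref(value: Any)
  sealed trait TaskReply { def id: Int }
  case class Complete(id: Int, result: Ref) extends TaskReply
  case class Error(id: Int, reason: String) extends TaskReply

  // The spawning actor dispatches the same way whether the task
  // finished locally or on a remote node.
  def handle(reply: TaskReply): String = reply match {
    case Complete(id, Ref(v)) => s"task $id finished with $v"
    case Error(id, why)       => s"task $id failed: $why"
  }
}
```

The single sealed trait keeps the protocol closed, so the compiler can check that every reply case is handled.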

4- What happens if a continuation wants to move to another node and
a) that node is not responding
b) communicating with that node throws an exception

I am assuming there are two primary reasons for wanting to move to that node:
1. There is data there that this task needs.
2. There are available CPU cycles on that node.

In either case, the node becoming unresponsive invalidates the reason for moving there. I guess this should cause the system to seek out other nodes. (We need to be careful with this; I've found through painful experience that failover systems are never as simple as one originally thinks.)

5- On any node, is there any way to give the continuation access to
the surrounding execution context other than the current one of using
static (or in Scala, object) methods and fields?

I'm not sure of the reasoning for this. And I'm not sure of the answer.


If we choose not to use the actor model (I have no idea why we would) I would argue strongly in favor of the Promise model used in E.

If you do wish to chat, we can jump onto  irc.freenode.net/#swarm at a specified time. I should be pretty tied up this weekend (except for late evening EST) Other than that, I can make time whenever.

Ian Clarke

Oct 17, 2009, 5:54:13 PM
to swarm-...@googlegroups.com
On Sat, Oct 17, 2009 at 3:42 PM, Patrick Wright <pdou...@gmail.com> wrote:
> To start a discussion and pin some things down, in a prototypical
> Swarm application
>
> 1- Who launches a Swarm continuation? Where would this fit into an
> application lifecycle?

I think it would depend on the application. In a web framework, I
could see a number of small nodes each listening on port 80. If a
connection comes in to one of these nodes (perhaps via a
load-balancer) then HttpRequest and HttpResponse objects are created
and placed inside Refs (since these objects are not Serializable).

Processing occurs, which may involve the continuation jumping around a
bit, until eventually you need to send the response, which requires
accessing the HttpResponse object, which automatically causes the
continuation to jump back to the originating Swarm node.

It seems quite neat to me.

> 2- How does a continuation report back on the results of its work? In
> the current demos, a new thread is spawned, and it may end by moving
> the continuation elsewhere. How do we get ahold of the end results?

See previous answer.

> 3- What happens if a continuation hits an exception while it is
> a) on its home node, but running in a separate thread
> b) not on its home node

That will depend on what happens within the exception handling code I
guess. I'm not sure how the Scala continuations plugin handles
exceptions to be honest.

> 4- What happens if a continuation wants to move to another node and
> a) that node is not responding
> b) communicating with that node throws an exception

In this case Swarm.moveTo() throws an exception? Hopefully this type
of thing can be mitigated through data redundancy.

> 5- On any node, is there any way to give the continuation access to
> the surrounding execution context other than the current one of using
> static (or in Scala, object) methods and fields?

Not sure, perhaps running the continuation from a subclass of Thread
which contains additional fields. So it would be something like:

Thread.currentThread().asInstanceOf[SwarmThread].getContext()

Where getContext() is implemented in SwarmThread, itself a subclass of Thread.
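That idea can be fleshed out as below; `SwarmThread` and `SwarmContext` are hypothetical sketches, and only `Thread`, `currentThread()`, and `asInstanceOf` are real APIs here:

```scala
object SwarmThreadSketch {
  // Per-node state that the launching node wants the continuation to see.
  class SwarmContext(val nodeName: String)

  // A Thread subclass that carries the context alongside the body it runs.
  class SwarmThread(ctx: SwarmContext, body: Runnable) extends Thread(body) {
    def getContext(): SwarmContext = ctx
  }

  // Called from inside the continuation: reach the context via the
  // thread that is executing us.
  def currentContext(): SwarmContext =
    Thread.currentThread().asInstanceOf[SwarmThread].getContext()

  // Demonstration: a continuation body reading its node's name.
  def demo(): String = {
    var seen = ""
    val t = new SwarmThread(new SwarmContext("node-A"), new Runnable {
      def run(): Unit = { seen = currentContext().nodeName }
    })
    t.start(); t.join()
    seen
  }
}
```

The cast fails with a `ClassCastException` if the continuation is ever run on a plain thread, which is arguably a feature: it surfaces misconfiguration immediately.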

Ian.

--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674

Rick R

Oct 17, 2009, 6:31:01 PM
to swarm-...@googlegroups.com
According to the docs on delimited continuations, they don't yet support exceptions, but it should be easy to add.

Patrick Wright

Oct 17, 2009, 6:35:52 PM
to swarm-...@googlegroups.com
> This is the perfect discussion to take to an IRC/SILC chatroom. It would
> greatly decrease the latency of responses to ideas.  That said, I'll have a
> go at these.

Never got used to IRC, but there's always time for something new.

>
> To start, I'll say that my favorite interfaces for distributed processing
> are purely asynchronous. A good example is the E programming language.
> http://www.erights.org/

Will look at it.

> In addition to offering a capabilities based security model. It performs
> processing by returning Promises for every function call. The receiver of
> that Promise can choose to block on that Promise for its fulfillment, or
> register a callback on the success or error conditions and continue in its
> asynchrony. I think the people that have done a lot of research in the realm
> of distributed systems seem to agree that asynchrony is terribly important.
> We should build tools that leverage this.

Promise: sounds interesting...

Asynchrony: sure


> I would like to see a continuation report back to the Actor that spawned it
> via some sort of standard Complete message. This would include the ID of the
> request to which it is responding.

It's the "report back to" that I'm not clear on. Our continuation is
now on node N in the cluster. It needs to get back to node A and
access its Actor...how? How does it know which Actor instance to
communicate with?


>  Instead of a Complete message, it would be an Error message. This would be
> the case whether it was local or remote.

Sure, I'm just not clear on the return path.


> I am assuming that the primary reason for wanting to move to that node would
> be one of 2 reasons:
> 1. There is data there that this task needs.
> 2. There are available CPU cycles on that node.
>
> In either case, becoming unresponsive seems to invalidate both cases. I
> guess this should cause the entire system to seek other nodes.  (We need to
> be careful with this, I've found through painful experience that failover
> systems are never as simple as one originally thinks)

I agree that it may invalidate the progress of the continuation.
Again, we could handle it with your Error case (or ErrorIncomplete),
but again we need to get back home...


>> 5- On any node, is there any way to give the continuation access to
>> the surrounding execution context other than the current one of using
>> static (or in Scala, object) methods and fields?
>
> I'm not sure of the reasoning for this. And I'm not sure of the answer.

Reasoning: static state is essentially global within a classloader.
It's not (any longer) a standard way to provide access to anything
other than something like singleton instances or completely immutable
(fixed) data. I'm not sure we can guarantee we will end up in the
right classloader, or with visibility into that classloader (due to
security controls). I'd like some way to be able to inject/make
available the hooks the continuation needs to access state outside
itself.


> If we choose not to use the actor model (I have no idea why we would) I
> would argue strongly in favor of the Promise model used in E.

I'm agnostic about whether Actors are the best approach. It will be fun to try them out.


> If you do wish to chat, we can jump onto  irc.freenode.net/#swarm at a
> specified time. I should be pretty tied up this weekend (except for late
> evening EST) Other than that, I can make time whenever.

Sure. I'm on CET. We can try to coordinate next week some time.

Patrick Wright

Oct 17, 2009, 7:01:20 PM
to swarm-...@googlegroups.com
> I think it would depend on the application.  In a web framework, I
> could see a number of small nodes each listening on port 80.  If a
> connection comes in to one of these nodes (perhaps via a
> load-balancer) then HttpRequest and HttpResponse objects are created
> and placed inside Refs (since these objects are not Serializable).
>
> Processing occurs, which may involve the continuation jumping around a
> bit, until eventually you need to send the response, which requires
> accessing the HttpResponse object, which automatically causes the
> continuation to jump back to the originating Swarm node.
>
> It seems quite neat to me.

That is a neat approach. I don't like the idea of "parking" the
request for an unknown amount of time, but there you go.


> In this case Swarm.moveTo() throws an exception?  Hopefully this type
> of thing can be mitigated through data redundancy.

And what happens then? The exception is thrown on node N...


>> 5- On any node, is there any way to give the continuation access to
>> the surrounding execution context other than the current one of using
>> static (or in Scala, object) methods and fields?
>
> Not sure, perhaps running the continuation from a subclass of Thread
> which contains additional fields.  So it would be something like:
>
> Thread.currentThread().asInstanceOf[SwarmThread].getContext()
>
> Where getContext() is implemented in SwarmThread, itself a subclass of Thread.

Hmm. We could use thread locals, which would be like an untyped map,
but that's better than using statics, IMO. It would give us more
control (on the receiving node) over what the continuation has access to.
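A sketch of the thread-local alternative, assuming a hypothetical `ContinuationContext` holder; the receiving node installs whatever map it wants the continuation to see before resuming it:

```scala
object ContinuationContext {
  // Untyped map, as noted above: values come back as Any.
  private val tl = new ThreadLocal[Map[String, Any]] {
    override def initialValue(): Map[String, Any] = Map.empty
  }

  // The node sets the context just before running the continuation...
  def set(ctx: Map[String, Any]): Unit = tl.set(ctx)

  // ...and the continuation reads only what the node chose to expose.
  def get: Map[String, Any] = tl.get
}
```

Because each node controls the `set` call, it can vary what is exposed per node, which is exactly the control statics cannot provide.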


Some notes of my own:
The current demo code operates on a "simple" send (e.g. an IO write,
with no response). An alternative would be send-and-reply (like a
remote method call), but this seems incorrect as the receiver would
not have a reply until the continuation was done, or would always
reply with nothing (meaning "acknowledged"). As the continuation can
wander across several nodes, I think it needs to operate as a sort of
event which arrives within the node, performs some work, then possibly
moves to another node, or back "home". That model seems better than
modeling this as sends or remote method calls; I point that out
because distribution via RPC would be one (classic) approach.

Here's a queue-based or space-based approach. Every time a
continuation needs to move, it becomes a task in the queue/space. The
task has a unique id (UUID) and a "target", which is the node it wants
to move to. There are then three cases:
- a task finishes successfully, and enqueues a result
- a task reaches an exception, and enqueues an error
- a task is waiting to reach a node which it can't reach

The original sender posts the task, then waits for either a result,
error, or a timeout on the task (e.g. it checks for there still being
an enqueued task with that UUID). If the sender itself reaches a
timeout, it can either dequeue the task itself, or if the task is on a
node, enqueue a kill message, indicating the task should hop no more.
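The lifecycle above can be sketched with a toy in-memory "space"; a real one would be a shared, persistent queue, and every name here (`Space`, `TaskResult`, `TaskError`) is hypothetical:

```scala
import java.util.UUID
import scala.collection.mutable

object Space {
  sealed trait Outcome
  case class TaskResult(value: Any) extends Outcome
  case class TaskError(reason: String) extends Outcome

  private val pending  = mutable.Map[UUID, String]() // task id -> target node
  private val finished = mutable.Map[UUID, Outcome]()

  // The sender posts a task addressed at a target node and gets its UUID.
  def enqueue(targetNode: String): UUID = {
    val id = UUID.randomUUID(); pending(id) = targetNode; id
  }

  // A node that picked the task up posts either a result or an error.
  def complete(id: UUID, outcome: Outcome): Unit = {
    pending.remove(id); finished(id) = outcome
  }

  // The sender polls: Some(result or error), or None while the task is
  // still enqueued -- after a timeout it could dequeue the task itself
  // or post a kill message.
  def poll(id: UUID): Option[Outcome] = finished.get(id)
}
```

Note how "node never picked it up" and "node still working" look identical to the sender; distinguishing them is exactly where the timeout/kill machinery comes in.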

An advantage of this is that while we add the extra hops to and from
the queue/space, each node can act asynchronously and doesn't have to
respond to incoming synchronous requests. A node can die, meaning
tasks targeted at it never get picked up and eventually time out, and
either the enqueue, the continuation processing on a node, or the
requeue can fail (they are transactional between the node and the
queue/space) without changing the fundamental state of the
continuation as a whole.

Tasks don't need to be specifically targeted at a node, either; they
can be keyed by meta data (of the data the continuation needs) and
just get picked up by whatever node(s) happen to have that data at
that time.

This would all fall apart pretty badly if continuations were modifying
data on nodes. I have no idea how to coordinate that in this approach.

Note that while queues/spaces are a point of failure, you can make
them persistent and redundant, and far fewer of them are needed than
the number of active nodes to achieve reliability.


Man, it's way too late. Hope this makes sense.


Patrick

Rick R

Oct 17, 2009, 7:48:47 PM
to swarm-...@googlegroups.com

It's the "report back to" that I'm not clear on. Our continuation is
now on node N in the cluster. It needs to get back to node A and
access its Actor...how? How does it know which Actor instance to
communicate with?


The swarm will actually be of type RemoteActor, which behaves as you might expect. I am almost done with a prototype that uses RemoteActors for the Swarm. It even transparently handles the object de/serialization. In short, it behaves exactly like the current system; it just abstracts away the details.

The task would simply maintain a reference to the RemoteActor that initiated it, and send a message when it's done. The message would look like:

case class Success(id: Int, result: Ref)

or

case class Failure(id: Int, reason: String)


One upside of this approach is that the actor model is simple. The downside is that the actor may have spawned several Tasks but receives the same message type for all results, so it has to use the ID to close out the transaction it has open.

If we go the Promise approach, then we can assign different callbacks to different promises, removing the need to dispatch on request IDs manually.

In either case, the Task needs to maintain a reference to the actor (or swarm location/port) that initiated it.
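The ID-based bookkeeping described above might look like this; all names are illustrative (`TaskSuccess` stands in for the `Success` message to avoid clashing with `scala.util.Success`):

```scala
import scala.collection.mutable

object IdDispatch {
  case class Ref(value: Any)
  case class TaskSuccess(id: Int, result: Ref)

  // The initiating actor keys its open transactions by request ID, so a
  // single result message type can close the right one.
  val open = mutable.Map[Int, String]()

  // On receiving a result, remove and return the matching transaction;
  // None means an unknown or already-closed ID.
  def closeOut(msg: TaskSuccess): Option[String] = open.remove(msg.id)
}
```

With Promises instead, this map disappears: each callback is already bound to exactly one outstanding request.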

Patrick Wright

Oct 18, 2009, 5:04:40 AM
to swarm-...@googlegroups.com
It will be interesting to see the code for the Actor/RemoteActor design.

It seems you can't (or won't want to) retain a reference to the Actor
itself within the task, because then you would serialize it as you move
the task across nodes. You mentioned retaining the home node's location
instead, which seems to make more sense to me.

Ian Clarke

Oct 18, 2009, 10:56:39 AM
to swarm-...@googlegroups.com
On Sat, Oct 17, 2009 at 6:01 PM, Patrick Wright <pdou...@gmail.com> wrote:
> That is a neat approach. I don't like the idea of "parking" the
> request for an unknown amount of time, but there you go.

Right, but the HttpRequest and Response need to be "parked" in a
conventional web framework too. One assumes that they wouldn't be
parked for long unless something goes wrong.

>> In this case Swarm.moveTo() throws an exception?  Hopefully this type
>> of thing can be mitigated through data redundancy.
>
> And what happens then? The exception is thrown on node N...

I guess that depends on what happens in the "catch" block, if there is
a catch block.

Rick R

Oct 18, 2009, 12:22:34 PM
to swarm-...@googlegroups.com
Seaside (and I'm fairly sure Lift as well) already "park" HttpResponses in continuations. It's a handy session management feature.  Lift has a rather convoluted "garbage collection" feature to destroy unused (infinitely parked) sessions. But it seems to work.

Patrick Wright

Oct 18, 2009, 12:36:02 PM
to swarm-...@googlegroups.com
My question wasn't about whether request parking was possible, but
rather about whether it's the best way to approach the problem. An
alternate design is that a request is parked for either no time or a
very limited time, and the continuation's completion produces an
update which a subsequent request comes back later to check on. Or one
could use Comet-style pushes out to the client when the continuation
completes. In those alternate models, the request is a trigger to
start a process, but is not modeled around a blocking send-and-wait.

It's more a matter of what the first set of goals are. Some of Ian's
proposed uses for Swarm (e.g. matching two people on a social network)
could take some time to complete. If requests are parked, then the
model is roughly that the entire continuation will take no more than
<some acceptable time for a web request to block> to complete. If the
request just fires off a continuation and comes back later to check on
it, the continuations can be correspondingly more complex and take
longer. That's all.

Rick R

Oct 18, 2009, 12:43:52 PM
to swarm-...@googlegroups.com
That makes sense. I tend towards the non-monitoring solution myself. The job will complete when it completes, the author should build in the appropriate system to handle a job that takes excessively long.  But if we can easily support both, that's great too.

Rick R

Oct 19, 2009, 1:39:44 PM
to swarm-...@googlegroups.com
I have Ian's code refactored to use RemoteActor. I am running into an issue: either due to the actors and continuations (unlikely) or some mistake on my part (more likely), the remote node is recursively re-applying the moveTo when it executes a remote Ref storage. I'll post the code tonight.