1- Who launches a Swarm continuation? Where would this fit into an
application lifecycle?
2- How does a continuation report back on the results of its work? In
the current demos, a new thread is spawned, and it may end by moving
the continuation elsewhere. How do we get ahold of the end results?
3- What happens if a continuation hits an exception while it is
a) on its home node, but running in a separate thread
b) not on its home node
4- What happens if a continuation wants to move to another node and
a) that node is not responding
b) communicating with that node throws an exception
5- On any node, is there any way to give the continuation access to
the surrounding execution context other than the current one of using
static (or in Scala, object) methods and fields?
Thanks
To start a discussion and pin some things down, in a prototypical
Swarm application
> 1- Who launches a Swarm continuation? Where would this fit into an
> application lifecycle?
I think it would depend on the application. In a web framework, I
could see a number of small nodes each listening on port 80. If a
connection comes in to one of these nodes (perhaps via a
load-balancer) then HttpRequest and HttpResponse objects are created
and placed inside Refs (since these objects are not Serializable).
Processing occurs, which may involve the continuation jumping around a
bit, until eventually you need to send the response, which requires
accessing the HttpResponse object, which automatically causes the
continuation to jump back to the originating Swarm node.
It seems quite neat to me.
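To make that flow concrete, here is a minimal sketch. `NodeId`, `Ref`, and the jump-home behaviour are stand-ins for the real Swarm API, not its actual signatures; real Swarm would transparently move the continuation rather than throw.

```scala
// Hypothetical stand-ins; not the real Swarm API.
case class NodeId(name: String)

// A Ref pins a non-serializable value (e.g. an HttpResponse) to its home
// node. In real Swarm, dereferencing it elsewhere would move the
// continuation back home; this sketch only enforces the invariant.
class Ref[T](val home: NodeId, value: T) {
  def apply(currentNode: NodeId): T = {
    require(currentNode == home, s"must jump back to node ${home.name} first")
    value
  }
}
```

Processing can then hop between nodes freely; only the final dereference of the response Ref forces the jump back to the originating node.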
> 2- How does a continuation report back on the results of its work? In
> the current demos, a new thread is spawned, and it may end by moving
> the continuation elsewhere. How do we get ahold of the end results?
See previous answer.
> 3- What happens if a continuation hits an exception while it is
> a) on its home node, but running in a separate thread
> b) not on its home node
That will depend on what happens within the exception handling code I
guess. I'm not sure how the Scala continuations plugin handles
exceptions to be honest.
> 4- What happens if a continuation wants to move to another node and
> a) that node is not responding
> b) communicating with that node throws an exception
In this case Swarm.moveTo() throws an exception? Hopefully this type
of thing can be mitigated through data redundancy.
> 5- On any node, is there any way to give the continuation access to
> the surrounding execution context other than the current one of using
> static (or in Scala, object) methods and fields?
Not sure, perhaps running the continuation from a subclass of Thread
which contains additional fields. So it would be something like:
Thread.currentThread().asInstanceOf[SwarmThread].getContext()
Where getContext() is implemented in SwarmThread, itself a subclass of Thread.
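A minimal sketch of that idea, with `SwarmThread` and `getContext()` as the hypothetical names from above:

```scala
// Hypothetical SwarmThread: the execution context rides on the thread
// that runs the continuation on each node.
class SwarmThread(ctx: Map[String, Any], body: Runnable) extends Thread(body) {
  def getContext(): Map[String, Any] = ctx
}

// From inside the continuation:
//   Thread.currentThread().asInstanceOf[SwarmThread].getContext()
```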
Ian.
--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674
Never got used to IRC, but it's always time for something new...
>
> To start, I'll say that my favorite interfaces for distributed processing
> are purely asynchronous. A good example is the E programming language.
> http://www.erights.org/
Will look at it.
> In addition to offering a capabilities based security model. It performs
> processing by returning Promises for every function call. The receiver of
> that Promise can choose to block on that Promise for its fulfillment, or
> register a callback on the success or error conditions and continue in its
> asynchrony. I think the people that have done a lot of research in the realm
> of distributed systems seem to agree that asynchrony is terribly important.
> We should build tools that leverage this.
Promise: sounds interesting...
Asynchrony: sure
> I would like to see a continuation report back to the Actor that spawned it
> via some sort of standard Complete message. This would include the ID of the
> request to which it is responding.
It's the "report back to" that I'm not clear on. Our continuation is
now on node N in the cluster. It needs to get back to node A and
access its Actor...how? How does it know what Actor instance to
communicate with?
> Instead of a Complete message, it would be an Error message. This would be
> the case whether it was local or remote.
Sure, I'm just not clear on the return path.
> I am assuming that the primary reason for wanting to move to that node would
> be one of 2 reasons:
> 1. There is data there that this task needs.
> 2. There are available CPU cycles on that node.
>
> In either case, becoming unresponsive seems to invalidate both cases. I
> guess this should cause the entire system to seek other nodes. (We need to
> be careful with this, I've found through painful experience that failover
> systems are never as simple as one originally thinks)
I agree that it may invalidate the progress of the continuation.
Again, we could handle it with your Error case (or ErrorIncomplete),
but again we need to get back home...
>> 5- On any node, is there any way to give the continuation access to
>> the surrounding execution context other than the current one of using
>> static (or in Scala, object) methods and fields?
>
> I'm not sure of the reasoning for this. And I'm not sure of the answer.
Reasoning: static state is essentially global within a classloader.
It's no longer a standard way to provide access to anything other
than singleton instances or completely immutable (fixed) data. I'm
not sure we can guarantee we will end up in the
right classloader, or with visibility into that classloader (due to
security controls). I'd like some way to be able to inject/make
available the hooks the continuation needs to access state outside
itself.
> If we choose not to use the actor model (I have no idea why we would) I
> would argue strongly in favor of the Promise model used in E.
Am agnostic about whether Actors are the best approach. Fun to try them out.
> If you do wish to chat, we can jump onto irc.freenode.net/#swarm at a
> specified time. I should be pretty tied up this weekend (except for late
> evening EST) Other than that, I can make time whenever.
Sure. I'm on CET. We can try to coordinate next week some time.
That is a neat approach. I don't like the idea of "parking" the
request for an unknown amount of time, but there you go.
> In this case Swarm.moveTo() throws an exception? Hopefully this type
> of thing can be mitigated through data redundancy.
And what happens then? The exception is thrown on node N...
>> 5- On any node, is there any way to give the continuation access to
>> the surrounding execution context other than the current one of using
>> static (or in Scala, object) methods and fields?
>
> Not sure, perhaps running the continuation from a subclass of Thread
> which contains additional fields. So it would be something like:
>
> Thread.currentThread().asInstanceOf[SwarmThread].getContext()
>
> Where getContext() is implemented in SwarmThread, itself a subclass of Thread.
Hmm. We could use thread locals, which would be like an untyped map.
But it's better than using statics, IMO. Would give us more control
(on the receiving node) as to what the continuation has access to.
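A sketch of that thread-local variant, assuming the receiving node installs the context before resuming the continuation and clears it afterwards (all names here are illustrative):

```scala
// Illustrative: the receiving node installs an untyped context map
// before running the continuation, and removes it when done.
object SwarmContext {
  private val local = new ThreadLocal[Map[String, Any]] {
    override def initialValue(): Map[String, Any] = Map.empty
  }
  def run(ctx: Map[String, Any])(body: => Unit): Unit = {
    local.set(ctx)
    try body finally local.remove()
  }
  def current: Map[String, Any] = local.get()
}
```

Because the map is installed per thread, each node decides exactly what the arriving continuation can see, which is the extra control mentioned above.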
Some notes of my own:
The current demo code operates on a "simple" send (e.g. an IO write,
with no response). An alternative would be send-and-reply (like a
remote method call), but this seems incorrect as the receiver would
not have a reply until the continuation was done, or would always
reply with nothing (meaning "acknowledged"). As the continuation can
wander across several nodes, I think it needs to operate as a sort of
event which arrives within the node, performs some work, then possibly
moves to another node, or back "home". That model seems better than
modeling this as sends or remote method calls; I point that out
because distribution via RPC would be one (classic) approach.
Here's a queue-based or space-based approach. Every time a
continuation needs to move, it becomes a task in the queue/space. The
task has a unique id (UUID) and a "target", which is the node it wants
to move to. There are then three cases:
- a task finishes successfully, and enqueues a result
- a task reaches an exception, and enqueues an error
- a task is waiting to reach a node which it can't reach
The original sender posts the task, then waits for either a result,
error, or a timeout on the task (e.g. it checks for there still being
an enqueued task with that UUID). If the sender itself reaches a
timeout, it can either dequeue the task itself, or if the task is on a
node, enqueue a kill message, indicating the task should hop no more.
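The protocol above can be sketched with an in-memory, single-threaded stand-in for the queue/space (every name here is illustrative, not a proposed API):

```scala
import java.util.UUID
import scala.collection.mutable

case class Task(id: UUID, target: String)
sealed trait Outcome
case class Completed(id: UUID, result: String) extends Outcome
case class Failed(id: UUID, error: String) extends Outcome

// Stand-in for the queue/space.
class Space {
  val tasks = mutable.Map[UUID, Task]()
  val outcomes = mutable.Map[UUID, Outcome]()
  val kills = mutable.Set[UUID]()

  def post(t: Task): Unit = tasks(t.id) = t

  // A node picks up a task targeted at it and enqueues the outcome,
  // success and error alike.
  def workOn(node: String)(run: Task => Outcome): Unit =
    for (t <- tasks.values.find(_.target == node) if !kills(t.id)) {
      tasks -= t.id
      outcomes(t.id) = try run(t) catch { case e: Exception => Failed(t.id, e.getMessage) }
    }

  // The sender polls for an outcome; on timeout it dequeues the task
  // itself, or marks it killed so it hops no more.
  def await(id: UUID, attempts: Int): Option[Outcome] = {
    var n = attempts
    while (n > 0 && !outcomes.contains(id)) n -= 1
    outcomes.get(id).orElse {
      if (tasks.contains(id)) tasks -= id else kills += id
      None
    }
  }
}
```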
An advantage of this is that while we add the extra hops to and from
the queue/space, each node can act asynchronously and doesn't have to
respond to incoming synchronous requests. A node can die, meaning
tasks targeted at it never get picked up and eventually time out. The
enqueue, the continuation processing on a node, or the requeue can
each fail (each step is transactional between the node and the
queue/space) without changing the fundamental state of the
continuation as a whole.
Tasks don't need to be specifically targeted at a node, either; they
can be keyed by metadata (describing the data the continuation needs) and
just get picked up by whatever node(s) happen to have that data at
that time.
This would all fall apart pretty badly if continuations were modifying
data on nodes. I have no idea how to coordinate that in this approach.
Note that while queues/spaces are a point of failure, you can make
them persistent and redundant, and far fewer of them are needed,
relative to the number of active nodes, to achieve reliability.
Man, it's way too late. Hope this makes sense.
Patrick
> It's the "report back to" that I'm not clear on. Our continuation is
> now on node N in the cluster. It needs to get back to node A and
> access its Actor...how? How does it know what Actor instance to
> communicate with?
It seems you can't (or wouldn't want to) retain a reference to the Actor
itself within the task, because then you would serialize it as you move
the task across nodes. You mentioned retaining the home node's location
instead, which seems to make more sense to me.
Right, but the HttpRequest and Response need to be "parked" in a
conventional web framework too. One assumes that they wouldn't be
parked for long unless something goes wrong.
>> In this case Swarm.moveTo() throws an exception? Hopefully this type
>> of thing can be mitigated through data redundancy.
>
> And what happens then? The exception is thrown on node N...
I guess that depends on what happens in the "catch" block, if there is
a catch block.
It's more a matter of what the first set of goals are. Some of Ian's
proposed uses for Swarm (e.g. matching two people on a social network)
could take some time to complete. If requests are parked, then the
model is roughly that the entire continuation will take no more than
<some acceptable time for a web request to block> to complete. If the
request just fires off a continuation and comes back later to check on
it, the continuations can be correspondingly more complex and take
longer. That's all.