What if I do something like:

    var dog = dogRef() // Dereference dogRef to get the dog it points to
    ...stuff...
    ...more stuff...
    // At this point in time, the load balancer decides to move the
    // object dogRef points to to a different computer
    ...more stuff...
    dog.makeDead() // we call a method on dog that mutates the dog,
                   // i.e. we kill the dog

BUT - the actual dog pointed to by dogRef() is no longer the same
object that was placed in the "dog" variable! We've mutated the wrong
object!

Does this make sense?
Ian.
--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674
Bayani's notion of a tombstone reference could be used passively, e.g.
you get an exception if a node you try to access within your process
has been moved and the reference is now a tombstone.
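To make the passive tombstone idea concrete, here is a minimal Scala sketch. The names (MovedException, tombstone) and the shape of Ref are my assumptions for illustration, not Swarm's actual API:

```scala
// Sketch of a passive tombstone: a Ref either holds its value locally
// or has been hollowed out into a tombstone recording which node now
// owns the data. Accessing a tombstoned Ref throws, rather than
// silently returning stale data.
case class MovedException(newNode: String)
  extends RuntimeException(s"data moved to $newNode")

class Ref[T](initial: T) {
  private var value: Option[T] = Some(initial)
  private var movedTo: Option[String] = None

  // Called by the load balancer when it relocates the data.
  def tombstone(newNode: String): Unit = {
    value = None
    movedTo = Some(newNode)
  }

  // Local access: fails loudly if the data has been moved away.
  def apply(): T = value.getOrElse(throw MovedException(movedTo.get))
}

val dogRef = new Ref("Rex")
assert(dogRef() == "Rex")
dogRef.tombstone("node-42")
val threw = try { dogRef(); false }
            catch { case MovedException(n) => n == "node-42" }
assert(threw)
```

The point is only that the failure surfaces at the access site, so the process can catch it and re-route the continuation.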
Just thinking out loud, it seems we would want to reduce the chance of
failing transactions due to data moves. One way to do this is to run
the operations on the data on a different time cycle, so to speak,
than the data-reorg process. So a reorg which attempts to co-locate
data happens on a very slow, leisurely schedule, while the operations
against the data run at full speed. That might reduce the
contention between the two; an additional advantage is that moving
data will likely be (relatively) slow and expensive, and you may want
to defer it to off-peak times.
It seems there has to be some sort of prioritization, where the
request to write and the request to move are both balanced against
each other. Part of the prioritization would include elapsed time,
e.g. a request to move gets greater priority the longer it remains
unfulfilled. At some point, the node says, "Sorry, guys, I know you
need to work on this data, but it's needed elsewhere as well," and
moves it.
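That balancing act can be sketched in a few lines of Scala. The names and weights below are invented purely for illustration (a real policy would weigh queue depth, data size, and so on):

```scala
// Toy sketch of the prioritization idea: a pending move request gains
// priority the longer it waits, and once it outweighs the (fixed, in
// this sketch) priority of incoming writes, the node moves the data
// anyway. All names and constants here are assumptions.
case class MoveRequest(requestedAtMs: Long)

val writePriority = 10.0

// Priority grows linearly with elapsed time; the rate is arbitrary.
def movePriority(req: MoveRequest, nowMs: Long): Double =
  (nowMs - req.requestedAtMs) / 1000.0

def shouldMoveNow(req: MoveRequest, nowMs: Long): Boolean =
  movePriority(req, nowMs) > writePriority

val req = MoveRequest(requestedAtMs = 0L)
assert(!shouldMoveNow(req, nowMs = 5000L))  // after 5s, writes still win
assert(shouldMoveNow(req, nowMs = 20000L))  // after 20s, the move wins
```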
Just a thought, but I think their approach has advantages in a distributed
system. You have to address "the fallacies of distributed computing".
Nodes can crash (either the program or the hardware), admins can muck
up the network config and make nodes unreachable, you can have a failure
at any stage while moving data between nodes, and so on. If you decentralize
everything, you need ways to recover when things go wonky. Central
authorities can help track the state of the overall system, and let
you know which processes were incomplete so you can restart them, for
example.
First, I don't think you can erase the distinction between local and
distributed computing. [1]
That aside, I think there's a pretty fundamental problem with the
current Ref design, which I pointed out in another thread. It's fine
if the continuation can stop on reaching an unapply for a Ref, and be
pushed over to another node where that Ref resides. But as soon as we
access the Ref in its local space (the node where it resides) and
assign it explicitly or implicitly to a local variable in the
continuation, we are going to carry it with us if and when we move to
other nodes. The more nodes we access, the more data we potentially
carry with us. That's expensive, potentially too expensive if we
happen to hold on to the root of a large object graph. I think we have
to somehow make an explicit distinction between data which is local to
the continuation and data which we can query, but will not actually
want to hold on to.
I think that in some ideal form, the idea of the wandering
continuation fits best to some accumulator model. That is, the purpose
and goal of the continuation is to visit nodes and collect data (say,
statistics) as it visits, without actually retrieving data in the
normal case. At most, it may retrieve a set of "keys", which it
delivers somewhere before it exits.
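Here is a rough Scala sketch of what that accumulator model might look like. NodeStats, Acc, and visit are hypothetical names; the point is only the shape of the computation:

```scala
// Sketch of the accumulator model: a wandering continuation visits
// nodes, folding per-node statistics into a small accumulator it
// carries with it, and collects at most a set of keys -- never the
// data itself, so nothing heavy travels between nodes.
case class NodeStats(itemCount: Int, hotKeys: List[String])

case class Acc(totalItems: Int = 0, keys: List[String] = Nil)

// What the continuation does at each node: fold stats into the
// accumulator without retrieving any actual data.
def visit(acc: Acc, stats: NodeStats): Acc =
  Acc(acc.totalItems + stats.itemCount, acc.keys ++ stats.hotKeys)

// Standing in for a hop across three nodes:
val cluster = List(NodeStats(3, List("a")), NodeStats(5, List("b", "c")))
val result = cluster.foldLeft(Acc())(visit)
assert(result.totalItems == 8)
assert(result.keys == List("a", "b", "c"))
```

The accumulator is the only state that travels; the keys it delivers at the end are identities, not object graphs.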
One approach I've thought of is that a Ref would not actually have a
normal unapply to access its data. Rather, you can either get the key
to the Ref (its UUID, so to speak), or you can ask the Ref to perform
some operation for you, using a closure. While we can't prevent you
from accidentally holding on to some remote data within the closure,
we could enforce some constraints, via an (as of now imaginary)
compiler plugin:
- from within a closure executed by a Ref, one is provided (via a
parameter) access to the data pointed to by the Ref, but are not
allowed to reference Refs directly within the block
- the return from the closure must not be a Ref, nor contain a
reference to a Ref
something like that. People may still make mistakes, but maybe we can
keep them away from the most egregious ones.
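A sketch of what that Ref might look like in Scala. Everything here (the key field, the perform method, the Dog example) is assumed for illustration; the constraints the imaginary compiler plugin would enforce can only be stated as comments:

```scala
import java.util.UUID

// Sketch of the proposed Ref: no unapply/dereference at all. You can
// get the Ref's key, or hand it a closure to run where the data
// lives. The (imagined) compiler plugin's rules -- no Refs referenced
// inside the closure, no Ref in the return value -- are not
// enforceable in this sketch and appear only as comments.
class Ref[T](private val data: T) {
  val key: UUID = UUID.randomUUID() // the Ref's identity; cheap to carry

  // Run `op` against the data on the owning node; only the (small,
  // Ref-free) result travels back with the continuation.
  def perform[R](op: T => R): R = op(data)
}

case class Dog(name: String, var alive: Boolean = true)

val dogRef = new Ref(Dog("Rex"))

// Allowed: return a small plain value, not the Dog or another Ref.
val name = dogRef.perform(d => d.name)
assert(name == "Rex")

// Mutation still happens where the data lives, inside the closure.
dogRef.perform(d => d.alive = false)
assert(dogRef.perform(_.alive) == false)
```

Note this also sidesteps the stale-dereference problem in Ian's example above: there is never a local "dog" variable to go stale, only operations shipped to wherever the dog currently lives.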
What I'm a little more interested in right now is the issue of the
conceptual model of programming within Swarm, rather than the issues
of concurrency control or balancing data across nodes. I think what
Ian has demo'ed so far is exciting, but needs a more solid conceptual
model before I'd find it useful.
Patrick
1: "A Note on Distributed Computing",
http://research.sun.com/techrep/1994/abstract-29.html
> First, I don't think you can erase the distinction between local and
> distributed computing. [1]
> That aside, I think there's a pretty fundamental problem with the
> current Ref design, which I pointed out in another thread. It's fine
> if the continuation can stop on reaching an unapply for a Ref, and be
> pushed over to another node where that Ref resides. But as soon as we
> access the Ref in its local space (the node where it resides) and
> assign it explicitly or implicitly to a local variable in the
> continuation, we are going to carry it with us if and when we move to
> other nodes. The more nodes we access, the more data we potentially
> carry with us.
> One approach I've thought of is that a Ref would not actually have a
> normal unapply to access its data. Rather, you can either get the key
> to the Ref (its UUID, so to speak), or you can ask the Ref to perform
> some operation for you, using a closure.
> ...
> What I'm a little more interested in right now is the issue of the
> conceptual model of programming within Swarm, rather than the issues
> of concurrency control or balancing data across nodes. I think what
> Ian has demo'ed so far is exciting, but needs a more solid conceptual
> model before I'd find it useful.
I think I replied to the wrong part of your earlier message, and maybe
I'm misunderstanding your point there. What I was replying to was "is
to make the distributed system as transparent as possible. So
whatever we come up with, the net effect is that it needs to be no
different than a completely unencumbered local application." It's that
point of view that the article I pointed to disagrees with.
I don't think it's possible to treat local calls and remote calls
the same, and I don't think it's useful to aim for it, except in the
sense that, sure, we can hide the details of data and computation
transfer from the end developer. It can be a comfortable API to use.
And sure, central to the proposal for Swarm is distributed
computation. But we can't (and shouldn't) obscure the underlying
difficulties.
I want to communicate the right tone here: yes, distributed; yes, easy
to use; yes, scalable. Sure. But we will have errors: for example,
somewhere along its travels between nodes, a continuation can't
move to the next node (the node is unreachable; the transfer breaks halfway
through; the transfer succeeds but the connection is lost before the client
realizes this). Or the next node isn't accepting more requests. Or
there's a serialization error (because the target node doesn't have
the correct version of the classfiles). Or the target node starts
working, then crashes. And so on. Those are just realities. I see them
all the time where I work, despite our best efforts and intentions.
So the challenge as I see it is how to make Swarm relatively easy to
use, while still letting Swarm users build reliable distributed
systems.
> For that reason, as well as a myriad of others, things would be much easier
> if this were a pure functional system. No mutable state means no locking or
> access control required.
>
> Perhaps it would work as partially functional system like Erlang, in which
> the workers can read and copy state, but they can't modify it directly. They
> have to ask an external database (like mnesia) which does offer its own
> access control.
This is just a riff, but it makes me think of XSL, in some way. A
computation visits the nodes of a tree that it's interested in, but
its goal is not to modify the tree, but to produce a tree of its own
as output.
Maybe this would be one type of computation in Swarm, a sort of pure
visitor which simply could not have side-effects on the system it was
visiting. Updates would happen through a different API. It somewhat
inverts the Actor model (we aren't sending data, we are sending
computations), but keeps immutability as a core principle.
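A minimal Scala sketch of such a pure visitor, assuming a toy tree type (Tree, Leaf, Branch, and mapTree are all invented here, not anything Swarm provides):

```scala
// Sketch of the "pure visitor" idea: a computation walks a tree it is
// visiting and produces a brand-new tree as its output, never
// mutating the original -- the XSL-like inversion described above.
sealed trait Tree
case class Leaf(value: Int) extends Tree
case class Branch(left: Tree, right: Tree) extends Tree

// The visitor: builds a fresh tree, leaving the input untouched.
def mapTree(t: Tree)(f: Int => Int): Tree = t match {
  case Leaf(v)      => Leaf(f(v))
  case Branch(l, r) => Branch(mapTree(l)(f), mapTree(r)(f))
}

val original = Branch(Leaf(1), Branch(Leaf(2), Leaf(3)))
val doubled  = mapTree(original)(_ * 2)
assert(doubled == Branch(Leaf(2), Branch(Leaf(4), Leaf(6))))
// The visited structure is unchanged -- no side-effects on the system:
assert(original == Branch(Leaf(1), Branch(Leaf(2), Leaf(3))))
```

In Swarm the input tree would live across nodes and the output would be delivered through a separate update API, but the user-facing shape could look much like this.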
> This is just a riff--but makes me think of XSL, in some way. A
> computation visits the nodes of a tree that it's interested in, but
> its goal is not to modify the tree, but to produce a tree of its own
> as output.
> Maybe this would be one type of computation in Swarm, a sort of pure
> visitor which simply could not have side-effects on the system it was
> visiting. Updates would happen through a different API. It somewhat
> inverts the Actor model (we aren't sending data, we are sending
> computations), but keeps immutability as a core principle.
Can you write some Scala code to demonstrate what this might look like
in practice (i.e. not the code to implement it, but some "user" code
that assumes it's already been implemented)?
I think this would give us a good sense of the "ergonomics" of the
approach you are suggesting.
Right, but how do we do this within the constraints of the JVM? How
do we make simply using a value in a variable throw an exception?