How to access data on a remote server?

Patrick Wright

Oct 9, 2009, 9:30:44 AM

to swarm-...@googlegroups.com

Hi

One question that's been bugging me the last couple of days vis-a-vis
swarm is: when I've sent my serialized continuation from server A to
server B, how is the restarted continuation to access data on server
B? It's not like a class which gets deserialized and into which we can
inject some state when it arrives on server B. If the point is for the
continuation to move to where the data is, we have to have some way of
locating the data and working with it...

I'm probably missing something here.

Thanks
Patrick

Ian Clarke

Oct 9, 2009, 11:39:13 AM

to swarm-...@googlegroups.com

Hi Patrick,

Swarm uses the "Ref" class (see [1]) to refer to data which may reside
on a remote computer. Within Ref, this data is indexed by a long
integer uid.

If you try to retrieve the data from a Ref class (see Ref.apply()),
and the data is on a remote computer, then the continuation will be
moved automatically such that the data is now local, and it can be
retrieved via the integer uid from the Store [2].
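
To make the lookup concrete, here is a minimal sketch. It is not Swarm's actual code: Location, Node, ToyRef and deref are hypothetical stand-ins, the port numbers are arbitrary, and the "move" is only modelled by returning the node that holds the data rather than by shipping a serialized continuation.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration; Swarm's real Location, Store and Ref differ.
case class Location(host: String, port: Int)

// One node with its own local store, keyed by a long uid.
class Node(val location: Location) {
  private val store = mutable.Map[Long, Any]()
  private var nextUid = 0L

  // Stores a value locally and returns the uid it was filed under.
  def put(value: Any): Long = {
    nextUid += 1
    store(nextUid) = value
    nextUid
  }

  def get(uid: Long): Option[Any] = store.get(uid)
}

// A reference is just "which node" plus "which uid on that node".
case class ToyRef(location: Location, uid: Long)

object RefSketch {
  // Returns the node the computation ends up on, plus the value found there.
  // In Swarm the continuation itself would be moved; here the move is only
  // modelled by picking the target node.
  def deref(ref: ToyRef, current: Node, cluster: Map[Location, Node]): (Node, Option[Any]) = {
    val target = if (ref.location == current.location) current else cluster(ref.location)
    (target, target.get(ref.uid))
  }

  def main(args: Array[String]): Unit = {
    val a = new Node(Location("localhost", 9998)) // arbitrary port
    val b = new Node(Location("localhost", 9997))
    val cluster = Map(a.location -> a, b.location -> b)

    val remote = ToyRef(b.location, b.put("test remote string"))
    val (where, value) = deref(remote, a, cluster)
    println("now on " + where.location + ", value = " + value)
  }
}
```

The point of the sketch is only that the reference carries its own location, so whoever holds it knows where the uid has to be looked up.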

I hope that answers your question,

Ian.

[1] http://github.com/sanity/Swarm/blob/master/src/swarm/Ref.scala
[2] http://github.com/sanity/Swarm/blob/master/src/swarm/Store.scala

--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674

Patrick Wright

Oct 11, 2009, 4:07:45 PM

to swarm-...@googlegroups.com

Hi Ian

> Swarm uses the "Ref" class see [1] to refer to data which may reside
> on a remote computer. Within Ref, this data is indexed by a long
> integer uid.
>
> If you try to retrieve the data from a Ref class (see Ref.apply()),
> and the data is on a remote computer, then the continuation will be
> moved automatically such that the data is now local, and it can be
> retrieved via the integer uid from the Store [2].

Your Scala-fu is more advanced than mine. What I'm missing in Ref is
"...and the data is on a remote computer". What I see in the sources
is that apply(), without a location, uses Swarm.myLocation, which is
the local server, correct? Where do we maintain the reference from a
Ref id to a node?

I can wait until you've written a demo for it, though, that may clear it up.

Thanks
Patrick

Ian Clarke

Oct 11, 2009, 5:26:46 PM

to swarm-...@googlegroups.com

On Sun, Oct 11, 2009 at 3:07 PM, Patrick Wright <pdou...@gmail.com> wrote:

>> Swarm uses the "Ref" class see [1] to refer to data which may reside
>> on a remote computer. Within Ref, this data is indexed by a long
>> integer uid.
>>
>> If you try to retrieve the data from a Ref class (see Ref.apply()),
>> and the data is on a remote computer, then the continuation will be
>> moved automatically such that the data is now local, and it can be
>> retrieved via the integer uid from the Store [2].
>
> Your Scala-fu is more advanced than mine. What I'm missing in Ref is
> "...and the data is on a remote computer". What I see in the sources
> is that apply(), without a location, uses Swarm.myLocation, which is
> the local server, correct?

Correct.

> Where do we maintain the reference from a
> Ref id to a node?

It's in the "location" field, which is declared in the class constructor (line 15 of Ref.scala):

@serializable class Ref[Type](val typeClass : Class[Type], val location : Location, val uid : Long) {

Unlike Java, where you have to declare fields explicitly, Scala allows you to declare them in the constructor per the above. So the line above creates three fields in Ref: typeClass, location, and uid. Saves time and typing, but I can see how it would be confusing to Scala newcomers.
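
A small standalone illustration of that constructor syntax, using a made-up Point class (hypothetical, not part of Swarm): parameters marked val become readable fields, while an unmarked parameter stays internal to the class.

```scala
// Hypothetical class for illustration; not part of Swarm.
class Point(val x: Int, val y: Int, label: String) {
  // x and y are public read-only fields; label is only visible inside the class.
  def describe: String = label + "(" + x + ", " + y + ")"
}

object PointDemo {
  def main(args: Array[String]): Unit = {
    val p = new Point(1, 2, "p")
    println(p.x + p.y)  // prints 3
    println(p.describe) // prints p(1, 2)
    // p.label would not compile: it was not declared with val
  }
}
```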

> I can wait until you've written a demo for it, though, that may clear it up.

A demo for...?

Ian.

Patrick Wright

Oct 12, 2009, 12:57:11 AM

to swarm-...@googlegroups.com

Thanks for the clarification. I misunderstood your earlier
explanation: I thought that, given
Ref("users.dat")
or
Ref("users.dat", 12345)

Then a Ref would (internally) know the location to which the users.dat
or uid 12345 was mapped, or would know how to look the location up
from somewhere else.

By demo I mean--there are no demos for either Ref or TreeMap at the
moment, so I'm not sure what this would look like in client code.
Given that a Ref must be given a location, I assume a TreeMap will
have hard-coded ref-location entries, or would download a TreeMap from
another server to know what refs were available from that server?

Thanks
Patrick

Ian Clarke

Oct 12, 2009, 9:51:38 AM

to swarm-...@googlegroups.com

On Sun, Oct 11, 2009 at 11:57 PM, Patrick Wright <pdou...@gmail.com> wrote:

> Thanks for the clarification. I misunderstood your earlier
> explanation: I thought that, given
> Ref("users.dat")
> or
> Ref("users.dat", 12345)
>
> Then a Ref would (internally) know the location to which the users.dat
> or uid 12345 was mapped, or would know how to look the location up
> from somewhere else.

Hmm. Ref(object) creates a reference to the object using Swarm.myLocation. So if that Ref travels to a remote node, then because it contains this computer's location, it can always find its way back here if the code needs to retrieve that object.

> By demo I mean--there are no demos for either Ref or TreeMap at the
> moment, so I'm not sure what this would look like in client code.

TreeMap isn't complete yet; it's a work in progress, but Ref is used in the ForceRemoteRef.scala demo - is that what you meant?

> Given that a Ref must be given a location, I assume a TreeMap will
> have hard-coded ref-location entries, or would download a TreeMap from
> another server to know what refs were available from that server?

Typically the Ref constructor will not be given a location, rather it will use Swarm.myLocation to indicate that the object in question is on the computer the code is currently executing on. Refs are always created locally, but then they may find themselves being transferred to remote nodes (since Refs are serializable, even though the objects they point to don't need to be).

I hope that helps to clarify, let me know if it doesn't.

Patrick Wright

Oct 12, 2009, 4:07:20 PM

to swarm-...@googlegroups.com

Hi

This does help. I had actually seen ForceRemoteRef before, but I
thought it was a non-working demo or stub, since I didn't get how Refs
worked.

That demo is pretty mind-blowing; impressive how much it shows in so
little code. Kudos.

A Ref is still a bit conceptually confusing, though, or rather,
there's a feature I wasn't expecting.

val vLoc = Ref("test local string");

Creates a ref with location = this server.

vRem = Ref(new Location(myLocation.address, 9997), "test remote string");

Creates a ref with location = remote server, then serializes itself to
the remote server, saves the content of the ref on the remote server,
and continues executing on the remote server.

I'm not sure why just creating a ref with a remote location would
cause the transfer, and why server A should be able to push data into
the Store in server B. Like all of this, it's interesting, but what
are you aiming at with this functionality?

Also, to what extent should Swarm expose or make clear operations
which may cause a move, and to what extent should it be hidden (though
possibly easier to use)?

Adding this line
println(format("%s:%s:%s:%s:%s",vRem(), vLoc(), vRem(), vLoc(),vRem()))

to the demo causes multiple hops between servers; imagine iterating
Refs inside a list.

Very interesting stuff, though. Hope I am closer to grokking it.

Patrick

Ian Clarke

Oct 12, 2009, 4:22:25 PM

to swarm-...@googlegroups.com

On Mon, Oct 12, 2009 at 3:07 PM, Patrick Wright <pdou...@gmail.com> wrote:

> vRem = Ref(new Location(myLocation.address, 9997), "test remote string");
>
> Creates a ref with location = remote server, then serializes itself to
> the remote server, saves the content of the ref on the remote server,
> and continues executing on the remote server.
>
> I'm not sure why just creating a ref with a remote location would
> cause the transfer,

So that is calling Ref.apply(location, value) defined on line 30 of Ref.scala. Note that on line 32 there is a call to Swarm.moveTo() - this is why the continuation moves to the remote server if location is non-local (of course, if location is the local node then Swarm.moveTo() has no effect).

> and why server A should be able to push data into
> the Store in server B.

You wouldn't; typically the decision to locate a piece of data on a remote server would be made automatically, perhaps for purposes of load-balancing.

I do it manually in this demo just to demonstrate how the code will automatically be moved to the location of a Ref, because the stuff which would automatically move data around isn't implemented yet.

> Also, to what extent should Swarm expose or make clear operations
> which may cause a move, and to what extent should it be hidden (though
> possibly easier to use)?

In theory, the programmer shouldn't need to be aware of when a move occurs; it should be entirely transparent to them. It is explicit in this demo (and the other one) simply so that I can demonstrate this functionality, because the stuff to handle this automatically is yet to be implemented.

> Adding this line
> println(format("%s:%s:%s:%s:%s",vRem(), vLoc(), vRem(), vLoc(),vRem()))
>
> to the demo causes multiple hops between servers; imagine iterating
> Refs inside a list.

Yup. In practice, a load balancing mechanism would try to avoid something like that by ensuring that all those Refs are on the same machine.

> Very interesting stuff, though. Hope I am closer to grokking it.

I think so, hope this helps.

Ian.

Patrick Wright

Oct 13, 2009, 4:14:00 PM

to swarm-...@googlegroups.com

> Typically the Ref constructor will not be given a location, rather it will
> use Swarm.myLocation to indicate that the object in question is on the
> computer the code is currently executing on. Refs are always created
> locally, but then they may find themselves being transferred to remote nodes
> (since Refs are serializable, even though the objects they point to don't
> need to be).

One comment about the serialization aspect--when a field is not
serializable, my experience is that you will typically receive a null
on the receiving end, possibly without any exception being thrown in
the process. The Serializable interface doesn't demand transitivity,
although I believe a number of code-checkers will warn you if you have
non-serializable fields inside a class marked serializable. But if you
receive data from someone else, all bets are off--you just won't know
if you can transport it across the wire.

I mention this because it will be a design issue for people working
with Swarm, and it's one thing I've had to worry about in working with
serialization over the last couple of years.

Patrick

Ian Clarke

Oct 13, 2009, 5:28:04 PM

to swarm-...@googlegroups.com

Shouldn't this throw a NotSerializableException?

Ian.

Patrick Wright

Oct 14, 2009, 12:47:12 AM

to swarm-...@googlegroups.com

> Shouldn't this throw a NotSerializableException?

That's a good question; maybe I misspoke. I will see if I can come up
with a scenario where no exception is thrown. I thought I had seen
that happen before, but I may be confusing two different issues.

Patrick Wright

Oct 14, 2009, 3:47:14 AM

to swarm-...@googlegroups.com

> One comment about the serialization aspect--when a field is not
> serializable, my experience is that you will typically receive a null
> on the receiving end, possibly without any exception being thrown in
> the process. The Serializable interface doesn't demand transitivity,
> although I believe a number of code-checkers will warn you if you have
> non-serializable fields inside a class marked serializable. But if you
> receive data from someone else, all bets are off--you just won't know
> if you can transport it across the wire.

Please file this in the "I didn't mean to say that" category. The
first part is wrong: you will cause an exception if you try to
serialize a non-serializable class, or one with a non-serializable
field. You won't end up with a serialized object that has a null field
in place of the value that couldn't be serialized. I was thinking about
nulls when I wrote that because of the error I ran into with JavaSpaces,
where the serialization code for JS ignores any non-public fields. I
feel stupid :).

What I meant to say :) was that the problem is this: when
receiving an object from someone else, or even one of your own, you
can't know, without examining the object graph, whether it's actually
serializable or not. You can declare a class Serializable and include
in it non-transient fields for object types that are not
serializable--that will compile. It will fail on serialization,
though. So if you receive some type of reference from a method call
you can't know if you can actually send it across the wire or not. You
would have to use reflection (I think) to check the graph all the way
down.
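
A rough sketch of the failure described above, with made-up classes (Handle and Holder are hypothetical). Rather than walking the graph with reflection, it simply attempts the serialization into a throwaway buffer and reports whether java.io.NotSerializableException was thrown, which is a different and cruder technique.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical classes for illustration only.
class Handle(val name: String)                                 // not serializable
class Holder(val handle: Handle) extends java.io.Serializable // compiles anyway

object SerializationProbe {
  // Tries to serialize into a throwaway buffer; true if the whole graph made it.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(canSerialize("a plain string"))
    println(canSerialize(new Holder(new Handle("db connection"))))
  }
}
```

A probe like this only speaks for the values present at the moment it runs: a field that happens to be null, or a collection that happens to be empty, will pass even if it could later hold something non-serializable.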

I think that everyone knows that is an issue when working with
something like RMI or Jini, that the method calls must take parameters
and return values that are serializable. But if we are working in
Swarm, we need to know that the data we are accessing on any given
server is serializable, otherwise, if we assign it to a "local"
variable (e.g. local to the continuation, in the method or block we
are serializing), when we try to hop over to the next server, it will
fail. That would imply that the data sets we work with in Swarm can't
be any old data set available in our applications, right? We would
have to separate out the part that may not be serializable, and should
not be referenced from within a continuation, from the part we can use
within a continuation.

Am I thinking about this correctly?
Patrick
