Now works with JavaSpaces

Patrick Wright

Oct 11, 2009, 9:25:12 AM
to swarm-...@googlegroups.com
Hi

I wanted to report on my attempt to get Swarm to work with JavaSpaces
[1] [2]. To wit: it works.

A whole bunch of details follow.

A JavaSpace instance is hosted by a server; each space can be located
anywhere on the network and can be identified by metadata, so you can
select one space or another. Communication with the space is handled
by Jini, which means you can choose the transport, serialization
protocol, security controls, etc. for each situation. Clients write
"entries" (classes implementing the marker interface Entry); each Entry
has a set of public fields that represent either metadata or content.
All fields are object (non-primitive) types, and all must be
serializable. Clients can read entries, or read and remove them, and
can select entries by some combination of metadata fields. Both
reading and writing can be wrapped in a transaction. Entries are
written with a "lease", which determines how long they persist in the
space before being removed automatically. Clients can also receive
callback notifications when entries matching their search appear in
the space.
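To make the lease and read/take semantics concrete, here is a toy, single-process Java sketch. It is not the JavaSpaces API: LeasedSpace and its methods are hypothetical, matching uses a plain Predicate instead of an entry template, and the current time is passed in explicitly so expiry is easy to follow.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Toy, single-process stand-in for a space (NOT the JavaSpaces API): each entry
// remembers when its lease runs out, and expired entries are dropped before
// every read or take. "now" is passed in explicitly to keep it deterministic.
class LeasedSpace<E> {
    private static final class Slot<T> {
        final T entry;
        final long expiresAt; // absolute time in ms; Long.MAX_VALUE = never
        Slot(T entry, long expiresAt) { this.entry = entry; this.expiresAt = expiresAt; }
    }

    private final List<Slot<E>> slots = new ArrayList<>();

    // Like write(entry, txn, lease); here Long.MAX_VALUE plays the role of Lease.FOREVER.
    synchronized void write(E entry, long leaseMs, long now) {
        long expiresAt = (leaseMs > Long.MAX_VALUE - now) ? Long.MAX_VALUE : now + leaseMs;
        slots.add(new Slot<>(entry, expiresAt));
    }

    // Non-blocking read: returns a match but leaves it in the space, or null.
    synchronized E read(Predicate<E> match, long now) {
        expire(now);
        for (Slot<E> s : slots) {
            if (match.test(s.entry)) return s.entry;
        }
        return null;
    }

    // Non-blocking take: returns a match and removes it, or null.
    synchronized E take(Predicate<E> match, long now) {
        expire(now);
        for (Iterator<Slot<E>> it = slots.iterator(); it.hasNext(); ) {
            Slot<E> s = it.next();
            if (match.test(s.entry)) {
                it.remove();
                return s.entry;
            }
        }
        return null;
    }

    // Drops every entry whose lease has run out at time "now".
    private void expire(long now) {
        slots.removeIf(s -> s.expiresAt <= now);
    }
}
```

An entry written with a 1000 ms lease at time 0 is visible at time 500 and gone at time 1000, while one written "forever" stays until someone takes it.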

Once the various JavaSpaces (JS) servers are started, using JS from
Swarm takes three basic steps:
- locate a space (an instance of JavaSpace, or its
batch-operation-enabling subinterface, JavaSpace05)
- write entries into the space by creating an Entry instance, with
fields set to the data to transfer, and calling space.write(...)
- read entries from the space by calling space.read(...),
space.take(...) or space.takeIfExists(...)

The read and take operations accept a "template" instance of the
entry, which the space uses to find matches: template fields set to
null act as wildcards, and non-null fields must match the entry's.
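A rough Java sketch of how template matching behaves, under simplifying assumptions: TemplateMatcher, NameEntry and BeeEntry are hypothetical classes (plain Java, no Jini types), and field values are compared with equals() rather than by serialized form as the JavaSpaces spec does.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Toy template matching in the spirit of JavaSpaces (not the real algorithm):
// the entry must be an instance of the template's class (or a subclass), and
// every non-null public field of the template must equal the entry's value.
// Null template fields are wildcards. The JavaSpaces spec compares fields by
// their serialized form; equals() is a simplification here.
final class TemplateMatcher {
    private TemplateMatcher() {}

    static boolean matches(Object template, Object entry) {
        if (!template.getClass().isInstance(entry)) return false;
        try {
            // getFields() returns public fields, including inherited ones.
            for (Field f : template.getClass().getFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue; // not entry data
                Object wanted = f.get(template);
                if (wanted != null && !Objects.equals(wanted, f.get(entry))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical stand-ins for Name and NewBee.
class NameEntry {
    public String name;
    public NameEntry() {}
    public NameEntry(String name) { this.name = name; }
}

class BeeEntry extends NameEntry {
    public Object task;
    public BeeEntry() {}
    public BeeEntry(String name) { super(name); }
}
```

A template with only name set finds any bee addressed to that ip:port, whatever its task; a template of the superclass type also matches subclass entries.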

One catch: JavaSpaces and Jini base their metadata on public fields in
Java (the Artima interview with Ken Arnold addresses why). It seems
there is no way to generate the bytecode for "real" public fields in
Scala, so my Entry is a Java class that looks like this:
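    import net.jini.lookup.entry.Name;

    // A JavaSpaces entry: public object-typed fields plus a public no-args constructor.
    public class NewBee extends Name {
        // The serialized continuation (typed as Object; see below).
        public Object task;

        // Required by the entry rules.
        public NewBee() {}

        // Sets the inherited public String field "name".
        public NewBee(String name) { super(name); }

        @Override
        public String toString() {
            return "NewBee(" + task + ") :> " + super.toString();
        }
    }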

Name is a basic entry provided with the JS API; it has a single String
field, name. I'm using the name field to identify a server that can
send or receive Bees. The task field holds the serialized
continuation; it's supposed to be a Function1 reference, but I
couldn't get the generic signature right across Java and Scala. Note
that apart from public fields, you also need a public no-args
constructor. The space itself doesn't use getters and setters, if you
declare any. Entries can also have methods, which readers or writers
can call as usual.
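A hypothetical reflection-based check for the entry rules mentioned so far (public object-typed fields, a public no-args constructor). EntryRules, GoodBee and BadBee are made up for illustration and are not part of Jini; the check does not cover serializability or anything else the spec requires.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists violations of the two entry rules discussed in
// this thread. It is not a complete validator for the JavaSpaces spec.
final class EntryRules {
    private EntryRules() {}

    static List<String> problems(Class<?> c) {
        List<String> out = new ArrayList<>();
        try {
            c.getConstructor(); // throws if there is no public no-args constructor
        } catch (NoSuchMethodException e) {
            out.add("no public no-arg constructor");
        }
        for (Field f : c.getFields()) { // public fields, including inherited ones
            if (Modifier.isStatic(f.getModifiers())) continue; // statics are not entry data
            if (f.getType().isPrimitive()) out.add("primitive field: " + f.getName());
        }
        return out;
    }

    // Sample classes to run the check against.
    public static class GoodBee { public String name; public Object task; public GoodBee() {} }
    public static class BadBee { public int port; public BadBee(int port) { this.port = port; } }
}
```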

The write operation (Swarm.execute()) looks roughly like this:
    case IsBee(contFunc, location) => {
      val space = JiniUtil.getSpace()
      // The entry's name identifies the target (listener) process.
      val bee = new NewBee("swarm(%s:%s)".format(location.address, location.port))
      bee.task = contFunc
      try {
        // No transaction (null); Lease.FOREVER keeps the entry until someone takes it.
        space.write(bee, null, Lease.FOREVER)
        log("wrote entry %s into space".format(bee))
      } catch {
        case e: Exception => e.printStackTrace()
      }
    }

The name field is set to a string representing the IP:port of the
target (listener) process. It could be anything. You can have as many
fields to match on as you like; the only requirement is that they be
public, non-primitive fields.

The read operation (Swarm.listen()) creates a listener thread like this:
    var listenThreadSpace = new Thread() {
      override def run() = {
        while (true) {
          try {
            // Template: any NewBee whose name is our own ip:port.
            val tmpl = new NewBee("swarm(%s:%s)".format(myLocation.address, myLocation.port))
            log("Waiting for entries in Space using template: %s".format(tmpl))
            // Block for at most 10 seconds; take returns null if nothing matched.
            val bee = space.take(tmpl, null, 10 * 1000).asInstanceOf[NewBee]
            if (bee == null) {
              log("nothing found; sleeping")
              Thread.sleep(1000)
            } else {
              log("READ: Got bee %s from Space; executing continuation".format(bee))
              if (bee.task == null) {
                log("hmm, task is null: " + bee)
              } else {
                val task = bee.task.asInstanceOf[Unit => Bee]
                Swarm.run((Unit) => shiftUnit(task()))
              }
            }
          } catch {
            case e: Exception => e.printStackTrace()
          }
        }
      }
    }

Because I'm using an Object type for the continuation, there's a cast
(asInstanceOf[]) when taking the entry. Note the template at the top:
we tell the space we are looking for an instance of NewBee whose name
is our ip:port (as a string). Also, this take is written as "block
for no more than 10 seconds"; it could instead be non-blocking (a
timeout of 0, JavaSpace.NO_WAIT), returning null immediately if there
is nothing to take, or it could block forever if we want to wait
until something appears. I used a limited timeout for debugging
purposes.
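The three timeout choices can be sketched with a local java.util.concurrent.BlockingQueue standing in for the space (an assumption for illustration; TimedTake is hypothetical). In this toy, 0 means return immediately, like JavaSpace.NO_WAIT, and Long.MAX_VALUE stands for waiting forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the three waiting policies for a take, using a local queue:
//   timeoutMs == 0              -> return immediately (like JavaSpace.NO_WAIT)
//   timeoutMs == Long.MAX_VALUE -> block until something arrives
//   otherwise                   -> block for at most timeoutMs, then return null
final class TimedTake {
    private TimedTake() {}

    static <T> T take(BlockingQueue<T> queue, long timeoutMs) {
        try {
            if (timeoutMs == 0) return queue.poll();
            if (timeoutMs == Long.MAX_VALUE) return queue.take();
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        System.out.println(take(q, 0));      // empty: null, immediately
        System.out.println(take(q, 100));    // waits about 100 ms, then null
        q.add("bee");
        System.out.println(take(q, 10_000)); // an entry is there: returned without waiting
    }
}
```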

Looking up the space to work with is handled by JiniUtil.getSpace(). I
used the simplest approach possible; there's actually a huge amount of
flexibility (for better or worse) in that part of the Jini APIs.
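    import java.io.IOException
    import java.rmi.RMISecurityManager
    import net.jini.core.discovery.LookupLocator
    import net.jini.core.lookup.{ServiceRegistrar, ServiceTemplate}
    import net.jini.space.JavaSpace05

    def getSpace(): JavaSpace05 = {
      // RMI only downloads remote proxy classes when a security manager is installed.
      if (System.getSecurityManager() == null) {
        System.setSecurityManager(new RMISecurityManager())
      }
      try {
        // Unicast discovery of the lookup service on a known host.
        val reg: ServiceRegistrar = new LookupLocator("jini://cgbspender/").getRegistrar()
        // Match any service implementing JavaSpace05 (no service ID, no attribute templates).
        val template = new ServiceTemplate(null, Array[Class[_]](classOf[JavaSpace05]), null)
        val space = reg.lookup(template).asInstanceOf[JavaSpace05]
        if (space == null) throw new RuntimeException("no JavaSpace05 found")
        space
      } catch {
        case e: IOException => throw new RuntimeException(e)
        case e: ClassNotFoundException => throw new RuntimeException(e)
      }
    }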

Here I'm using a LookupLocator, which performs a unicast lookup; that's
useful when you know the target server (you can specify more than one,
in an array) or when you can't use multicast. Multicast discovery, if
your network supports it, is more convenient.

The ServiceRegistrar is a reference to a server called a Lookup
Service in Jini terminology. It acts as a central registry for all
Jini services running on the network. It is also, haha, a Jini
service, which means you can look up the SR using metadata, can have
multiple of them running, etc.
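As a mental model only, here is a toy in-process registry in Java: services register with attributes, and clients look them up by type plus attribute values, roughly what a ServiceTemplate expresses. ToyRegistrar is hypothetical and shares nothing with the real ServiceRegistrar API beyond the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy, in-process stand-in for a Jini lookup service (NOT the ServiceRegistrar
// API): services register with attributes, and clients look them up by type
// plus optional attribute values.
final class ToyRegistrar {
    private static final class Registration {
        final Object service;
        final Map<String, String> attrs;
        Registration(Object service, Map<String, String> attrs) {
            this.service = service;
            this.attrs = attrs;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    synchronized void register(Object service, Map<String, String> attrs) {
        registrations.add(new Registration(service, Map.copyOf(attrs)));
    }

    // Returns the first service that is an instance of "type" and carries every
    // requested attribute value, or null if none does.
    synchronized <T> T lookup(Class<T> type, Map<String, String> wanted) {
        for (Registration r : registrations) {
            if (type.isInstance(r.service)
                    && r.attrs.entrySet().containsAll(wanted.entrySet())) {
                return type.cast(r.service);
            }
        }
        return null;
    }
}
```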

Note that the client code, in Listen.scala and ExplicitMoveTo.scala,
isn't affected at all by any of this.

This is an asynchronous approach. Asynchrony has some advantages, one
of which is that the target server (where we want the continuation to
go next) can more easily throttle incoming requests. Also, any entry
written into the space stays there until it is taken or its lease
expires; that means server A can post a continuation even if server B
is down, and vice versa. A synchronous approach would be possible by
registering a Jini service; much of the infrastructure is the same,
and the main difference is that you receive an instance of some Java
interface, say ContinuationExecutor, and just call a method on it,
passing your continuation as an argument.
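An in-process sketch of the difference, with a queue standing in for the space (no Jini involved). Board and its method names are hypothetical; ContinuationExecutor is just the example interface name from the paragraph, given a made-up signature here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// In-process sketch contrasting the two styles (no Jini involved).
final class AsyncVsSync {
    private AsyncVsSync() {}

    // Synchronous style: the caller holds a (remote) reference and calls it
    // directly, so the receiver has to be up at that moment.
    interface ContinuationExecutor {
        Object execute(Supplier<?> continuation);
    }

    // Asynchronous style: the sender only posts; continuations wait on the
    // board (standing in for the space) until a receiver takes them.
    static final class Board {
        private final ConcurrentLinkedQueue<Supplier<?>> pending = new ConcurrentLinkedQueue<>();

        void post(Supplier<?> continuation) {
            pending.add(continuation);
        }

        // The receiver throttles itself by choosing how many to run per call.
        List<Object> drain(int max) {
            List<Object> results = new ArrayList<>();
            Supplier<?> next;
            while (results.size() < max && (next = pending.poll()) != null) {
                results.add(next.get());
            }
            return results;
        }

        int size() {
            return pending.size();
        }
    }
}
```

Posting succeeds even while nobody is draining the board, which is the "server B is down" case; the receiver decides how much work to pick up at a time.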

Sorry for the brain dump, this chewed up most of my weekend and I want
to rejoice :).


Cheers
Patrick

[1] You may want to read up on tuple spaces, the more general concept
behind JavaSpaces: http://en.wikipedia.org/wiki/Tuple_space
[2] Lots of (mostly older) docs on the web; see
http://java.sun.com/developer/technicalArticles/jini/javaspaces/,
http://www.javaworld.com/javaworld/jw-11-1999/jw-11-jiniology.html,
and a great set of interviews with Ken Arnold on Artima.com,
http://www.artima.com/intv/perfect.html

Ian Clarke

Oct 11, 2009, 10:37:46 AM
to swarm-...@googlegroups.com
On Sun, Oct 11, 2009 at 8:25 AM, Patrick Wright <pdou...@gmail.com> wrote:
> I wanted to report on my attempt to get Swarm to work with JavaSpaces
> [1] [2]. To wit: it works.

Awesome! What advantages do you see in using JavaSpaces? I assume persistence is one of them; are there others?

Do you think Swarm should always use JavaSpaces, or should it be pluggable?

Really great to have another committer on the project :-)

Ian.

--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674

Patrick Wright

Oct 11, 2009, 11:08:50 AM
to swarm-...@googlegroups.com
Hi Ian

Starting with your second question first--I think the mechanism to
discover and move continuations should be pluggable. There will
probably be a bunch of experimentation--including Razie's Home
Cloud/agents--before anyone knows what works best. We'd need to figure
out the minimum API an implementation would support--e.g. accepting
serialized continuations, supporting the Ref and TreeMap, etc.

Also, during development, distributed programming is a nightmare; it's
easy to end up fighting the infrastructure when you want to test the
functionality of your code. Being pluggable means we can support an
in-memory/mocked remote service and write unit tests against that,
then integration tests against the running servers.
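One possible shape for a minimal pluggable API, sketched in Java: a JavaSpaces-backed implementation and an in-memory one for unit tests could sit behind the same interface. ContinuationTransport and InMemoryTransport are hypothetical names, not part of Swarm, and a real API would also need to cover things like the Ref and the data map.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal transport API for moving serialized continuations.
interface ContinuationTransport {
    void send(String location, Object continuation);
    Object receive(String location, long timeoutMs); // null on timeout
}

// In-memory implementation for unit tests: one queue per target location.
final class InMemoryTransport implements ContinuationTransport {
    private final Map<String, BlockingQueue<Object>> queues = new ConcurrentHashMap<>();

    private BlockingQueue<Object> queueFor(String location) {
        return queues.computeIfAbsent(location, k -> new LinkedBlockingQueue<>());
    }

    @Override
    public void send(String location, Object continuation) {
        queueFor(location).add(continuation);
    }

    @Override
    public Object receive(String location, long timeoutMs) {
        try {
            return queueFor(location).poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```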

I see JavaSpaces as having advantages and disadvantages.
Advantages
- real asynchrony; continuations can be "posted" to be handled at some
later time
- real independence: the sender doesn't need to know who will handle the request
- allows for easy throttling of incoming tasks on receiving server
- very simple client API
- support for transactions; a target server could crash or for some
reason fail to complete the execution, and the entry would remain in
the space for another server to pick up (or itself, on a retry).
Transactions can also span multiple takes and writes against the space
- real distributed architecture: run multiple spaces, identify them
using metadata, locate them either with fixed addresses or via
multicast
- multiple serialization and transport protocols; could run over
HTTP(S), for example
- flexibility: any server can register itself on the network at any
time, along with the data it owns. That means that the Map of data to
service could be updated dynamically. In fact, when applying a Ref,
you could use the Ref's ID as a key to an entry in the space, locate
the owner of the data, and move there (lots of options)
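To illustrate the transaction point, here is a toy in-memory Java sketch (TxSpace and Tx are hypothetical, not the Jini transaction API): a take under a transaction is only tentative, commit makes it permanent, and abort, e.g. when the worker crashes, puts the entries back for someone else to take.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy illustration of taking under a transaction (not the Jini API).
final class TxSpace<E> {
    private final Deque<E> entries = new ArrayDeque<>();

    final class Tx {
        private final List<E> taken = new ArrayList<>();
        private boolean done;

        // Tentatively removes the oldest entry, or returns null if empty.
        E take() {
            if (done) throw new IllegalStateException("transaction finished");
            synchronized (TxSpace.this) {
                E e = entries.pollFirst();
                if (e != null) taken.add(e);
                return e;
            }
        }

        // Makes the removals permanent.
        void commit() {
            done = true;
            taken.clear();
        }

        // Puts taken entries back at the front, in their original order.
        void abort() {
            done = true;
            synchronized (TxSpace.this) {
                for (int i = taken.size() - 1; i >= 0; i--) entries.addFirst(taken.get(i));
            }
            taken.clear();
        }
    }

    synchronized void write(E e) { entries.addLast(e); }
    synchronized int size() { return entries.size(); }
    Tx begin() { return new Tx(); }
}
```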

Disadvantages
- Infrastructure complexity. I think I can talk anyone through a plain
vanilla setup (or push it into Maven configuration) but the learning
curve for the infrastructure parts of JavaSpaces and Jini is pretty
steep. Lots of flexibility built in to try and address the challenges
of distributed programming.
- Not a lot of options for the spaces server implementation: there's
the RI (called Outrigger), Blitz JavaSpaces, and GigaSpaces/Rio.
There may be a few others. All of these should be stable, but it's not
like there are a dozen to choose from.
- JavaSpaces depends on Jini for the networking stuff, which means
enormous flexibility, but high complexity, especially around security.
It's easy to run with security off, but if you start to enable it, you
need to think and plan. The security model is partly enforced by using
special classloaders, which can be difficult to debug if something
isn't working.

Basically: as long as you stick to a very basic infrastructure
configuration, JavaSpaces has a really simple, but conceptually very
powerful API.


> Really great to have another committer on the project :-)

Glad to be aboard.


Cheers,
Patrick
