I'm wondering if anyone has put thought into how any given node in a
Swarm finds out where a given data item or data set is located at the
moment?
I think at a minimum there are two problems to address:
- how, within a continuation, we identify which data we want to access
- how to have an up-to-date view on each node of where data is located
As far as identification goes, the simplest thing I can think of to start
with is that we identify data by a class name (type) and a single key.
The key's value must be unique to that type across the cluster, and is
opaque (or blind).
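
To make the (type, key) idea a bit more concrete, here is a rough
Python sketch. The names DataId and LocalStore are made up for
illustration and are not part of Swarm; the only thing taken from the
proposal above is that a data item is named by a type plus an opaque
key that is unique within that type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataId:
    """Hypothetical identifier: a class name (type) plus an opaque key."""
    type_name: str
    key: str  # opaque: never parsed, only compared for equality


class LocalStore:
    """Hypothetical per-node store that enforces key uniqueness per type."""

    def __init__(self):
        self._items = {}

    def put(self, data_id, value):
        # The same (type, key) pair may only be registered once.
        if data_id in self._items:
            raise ValueError(f"duplicate id: {data_id}")
        self._items[data_id] = value

    def get(self, data_id):
        return self._items[data_id]
```

Because DataId is frozen it is hashable, so it can be used directly as
a dictionary key, and the same key value under two different types
names two different items.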
There are a bunch of ways to propagate the knowledge of where the data
is. For example, we could have a handshake when nodes start up, in
which they exchange their key and type lists, and then a broadcast
whenever data moves. But I haven't worked out anything concrete yet.
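
Just to show the shape of the handshake-plus-broadcast option, here is
a small in-process Python sketch. Everything in it (the Node class, its
method names, identifying data by a (type, key) tuple, and modelling
the network as direct method calls) is an assumption for illustration,
not a worked-out design.

```python
class Node:
    """Hypothetical node holding a local view of where each item lives."""

    def __init__(self, name):
        self.name = name
        self.local = set()    # ids of items stored on this node
        self.locations = {}   # id -> name of the node believed to hold it
        self.peers = []       # nodes we have shaken hands with

    def store(self, data_id):
        self.local.add(data_id)
        self.locations[data_id] = self.name

    def handshake(self, other):
        # On startup, both sides exchange the ids they hold.
        for data_id in other.local:
            self.locations[data_id] = other.name
        for data_id in self.local:
            other.locations[data_id] = self.name
        if other not in self.peers:
            self.peers.append(other)
        if self not in other.peers:
            other.peers.append(self)

    def move(self, data_id, dest):
        # Relocate the item, then tell every known peer where it went.
        self.local.remove(data_id)
        dest.local.add(data_id)
        for node in [self, dest] + self.peers:
            node.locations[data_id] = dest.name
```

In this sketch only the peers of the sending node hear about a move, so
it assumes every node has shaken hands with every other node.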
Thoughts?
Patrick
Well, as Peter points out, it would be the continuation that is
shipped to the data; the data doesn't move.
But apart from that, I like your idea. Here is a bit more depth on
how it could be implemented:
Every piece of data in Swarm has a UID, perhaps a 128 bit number
randomly chosen on object creation (meaning that the probability of
two given objects colliding is 1/(2^128), and you'd need on the order
of 2^64 objects before any collision becomes likely - ie. very low).
References to data contain the UID, and optionally the last known
location of the data.
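
As a rough Python sketch of what such a reference might look like (the
names new_uid, Ref and collision_probability are invented for this
example, and the node name is assumed to be a plain string):

```python
import secrets
from dataclasses import dataclass
from typing import Optional

UID_BITS = 128


def new_uid() -> int:
    """Pick a random 128 bit UID at object creation."""
    return secrets.randbits(UID_BITS)


@dataclass
class Ref:
    """Hypothetical reference: the UID plus an optional location hint."""
    uid: int
    last_known_location: Optional[str] = None  # None means "no idea"


def collision_probability(n_objects: int, bits: int = UID_BITS) -> float:
    # Birthday approximation: about n(n-1)/2 pairs, each colliding with
    # probability 2^-bits. Only meaningful while the result is small.
    return n_objects * (n_objects - 1) / 2 / 2**bits
```

With this approximation, two objects collide with probability 2^-128,
and the estimate only approaches 1/2 at around 2^64 objects.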
If you need to ship a continuation, you can then use the last known
location. If it's not there, then that peer tries to get the
continuation to its destination, using its own last known location
for the data.
If you have no last known location, or perhaps if you've already tried
to route this continuation and it's found its way back to you, then you
do a broadcast to all nodes to find out where the data is. The response
should also be a broadcast, and all nodes that receive it should update
their last known location based on it.
The idea is that these broadcasts should be rare.
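
Here is a small single-process Python sketch of that routing scheme,
mostly to check that the pieces fit together. The Node class, the
shared cluster dictionary standing in for the network, and the visited
tuple used to notice a continuation coming back around are all
assumptions made for the example.

```python
class Node:
    """Hypothetical node: forwards continuations using location hints."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster   # shared dict: node name -> Node
        self.data = {}           # uid -> value held on this node
        self.last_known = {}     # uid -> name of last known holder
        self.broadcasts = 0      # how often this node had to broadcast
        cluster[name] = self

    def ship(self, uid, continuation, visited=()):
        # If the data is here, run the continuation next to it.
        if uid in self.data:
            return continuation(self.data[uid])
        hint = self.last_known.get(uid)
        # No hint, or the hint would send us in a circle: broadcast.
        if hint is None or hint == self.name or hint in visited:
            hint = self.locate_by_broadcast(uid)
        return self.cluster[hint].ship(
            uid, continuation, visited + (self.name,))

    def locate_by_broadcast(self, uid):
        self.broadcasts += 1
        owner = next(
            (n.name for n in self.cluster.values() if uid in n.data), None)
        if owner is None:
            raise LookupError(f"no node holds uid {uid}")
        # The answer is itself broadcast, so every node updates its hint.
        for node in self.cluster.values():
            node.last_known[uid] = owner
        return owner
```

After one broadcast every node holds a fresh hint, so later
continuations for the same UID go straight to the holder, which is how
broadcasts stay rare in this sketch.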
Thoughts?
Ian.
--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588