I
am trying to fully understand the previous work on which the SAFE
network is built and I am currently looking at the Kademlia
Distributed Hash Table, which served as inspiration for the
MaidSafe DHT.
In the Kademlia paper by Petar Maymounkov and David Mazières, it is said that the XOR distance is a valid non-Euclidian metric with limited explanations as to why each of the properties of a valid metric are necessary and/or interesting, namely:
Why
is it important for a metric to have these properties in general?
Why is each of these properties necessary in the context of
routing queries in the Kademlia Distributed Hash Table
implementation?
In
addition, the paper mentions that unidirectionality (for a given
x, and a distance l, there exist only a single y for which d(x,y)
= l) guarantees that all queries will converge along the same
path. Why is that so?
The question was posted on Stack
Overflow, so feel free to reply there directly or here and
I'll reformat the answer.
Thanks,
Erick
--
You received this message because you are subscribed to the Google Groups "MaidSafe-Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maidsafe-develop...@googlegroups.com.
To post to this group, send email to maidsafe-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/maidsafe-development.
To view this discussion on the web, visit https://groups.google.com/d/msgid/maidsafe-development/540FA859.8040200%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
ChristopheCheers,In practice the problem appears to be even nicer than that, although that's a purely empirical observation: simulating unidirectional interest with exactly 64 nodes (rather than, say, between 9 and 256) results in a distribution that is eerily similar to that of a whole network of millions of nodes.Studying the behavior of neighborhood problems in the whole network is therefore mostly equivalent to studying the behavior of a small number of nodes over many configurations - in other words, neighborhoods relations appear to be roughly ergodic.For instance, a ratio of 64 nodes to each prefix group has the great majority of node distributions verifying that each prefix group contains at least 9 nodes. One then merely needs to study the behavior of between 9 nodes and up in one single prefix group to have a pretty good idea of the distribution of unidirectional interest relations.Assuming n >> 1, one can set c in the equation above to make the probability of having one prefix group hold less than k nodes arbitrarily close to 0. The probability mass of any closeness-related outcome (say, having a node with a unidirectional interest count of 10) is then almost fully contained in instances of the outcome happening within prefix groups.Note in particular that the number of nodes Tk required is O(n log n), which is consistent with the observation that worst-case unidirectional interest count is O(log node count), although this requires a bit of an explanation:which expresses the asymptotic behavior of the probability of needing less than a given number of nodes in the address space relative to the number of prefix groups (n in this case).This problem is a generalization of the 'coupon collector's problem', with g coupons. In particular: the following formula applies:So now the question is: how does one partition the address space so that there is a high probability that each prefix group contains g nodes or more?It follows that neighborhood problems (and unidirectional interest in particular) are mostly 'local'. For instance, dividing the address space in 2^p prefix groups - that is, groups of addresses prefixed by the same p bits - one can study closeness within each individual prefix group with very good precision so long as each prefix group is very likely to contain a sufficient number of nodes.|bc| = |ab| ^ |ac| starts with m zeros.if a and b are close, and c is far from a (that is, |ab|'s binary representation starts with n zeros, and |ac| starts with m < n zeros), then c is far from b:Now representing the network relative to one node (that is, as a set of distances from 0 to 2^512), we find that two close nodes have very close views:|ab| ^ |bc| = |ac|Here are some thoughts on the XOR metric, starting from the easy ones:Denoting |ab| as the distance between a and b, i.e. a^b
a and b therefore see c in the same bucket.
In the case of unidirectional interest, that number of nodes is 8+1: in a prefix group of 9 nodes, each node will find its 8 closest, and conversely each other node which finds it in its closest group is in the prefix group.
The ratio Tk/n (the average number of nodes per prefix group) therefore gives a good idea of the order of magnitude of the worst-case unidirectional interest count, and it follows from above that Tk/n is O(log n).
The '9 nodes and up' above part may seem a bit problematic, but the studied prefix group can be further divided in two for a sufficient number of nodes, at which point we are back to the same problem. It is therefore most likely sufficient to study the cases where the node count is between g and G, where G > g but G/g < 100.
---
The above analysis is not restricted to unidirectional interest, but rather applies to any network feature that is based on local groups.Well, congratulations if you've read all that.
I think this may explain it a wee bit, let me know http://metaquestions.me/2014/08/01/shortest-distance-between-two-points-is-not-always-a-straight-line/Basically each hop if it were only one bit at a time in a fully populated network (extreme) then would have twice the knowledge of the previous hop. As you converge the knowledge is greater until you get to the closest nodes whose knowledge is ultimate in the network.
On Wed, Sep 10, 2014 at 2:24 AM, Erick Lavoie <erick....@gmail.com> wrote:
I am trying to fully understand the previous work on which the SAFE network is built and I am currently looking at the Kademlia Distributed Hash Table, which served as inspiration for the MaidSafe DHT.
In the Kademlia paper by Petar Maymounkov and David Mazières, it is said that the XOR distance is a valid non-Euclidian metric with limited explanations as to why each of the properties of a valid metric are necessary and/or interesting, namely:
- d(x,x) = 0
- d(x,y) > 0, if x != y
- forall x,y : d(x,y) = d(y,x) -- symmetry
- d(x,z) <= d(x,y) + d(y,z) -- triangle inequality
Why is it important for a metric to have these properties in general? Why is each of these properties necessary in the context of routing queries in the Kademlia Distributed Hash Table implementation?
In addition, the paper mentions that unidirectionality (for a given x, and a distance l, there exist only a single y for which d(x,y) = l) guarantees that all queries will converge along the same path. Why is that so?
The question was posted on Stack Overflow, so feel free to reply there directly or here and I'll reformat the answer.
Thanks,
Erick
--
You received this message because you are subscribed to the Google Groups "MaidSafe-Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maidsafe-development+unsub...@googlegroups.com.
To post to this group, send email to maidsafe-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/maidsafe-development.
To view this discussion on the web, visit https://groups.google.com/d/msgid/maidsafe-development/540FA859.8040200%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
To view this discussion on the web, visit https://groups.google.com/d/msgid/maidsafe-development/CAGYUzraarifCcCfrMTq%2BF%3DUV%2BvzxZNtak4htDsAgAk_YAYy1_w%40mail.gmail.com.
To unsubscribe from this group and stop receiving emails from it, send an email to maidsafe-develop...@googlegroups.com.
To post to this group, send email to maidsafe-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/maidsafe-development.
To view this discussion on the web, visit https://groups.google.com/d/msgid/maidsafe-development/dd320c2f-284a-46f0-a7c3-498408a65728%40googlegroups.com.