Understanding the XOR metric

491 views
Skip to first unread message

Erick Lavoie

unread,
Sep 9, 2014, 9:24:49 PM9/9/14
to maidsafe-d...@googlegroups.com

I am trying to fully understand the previous work on which the SAFE network is built and I am currently looking at the Kademlia Distributed Hash Table, which served as inspiration for the MaidSafe DHT.

In the Kademlia paper by Petar Maymounkov and David Mazières, it is said that the XOR distance is a valid non-Euclidian metric with limited explanations as to why each of the properties of a valid metric are necessary and/or interesting, namely:

  • d(x,x) = 0
  • d(x,y) > 0, if x != y
  • forall x,y : d(x,y) = d(y,x) -- symmetry
  • d(x,z) <= d(x,y) + d(y,z) -- triangle inequality

Why is it important for a metric to have these properties in general? Why is each of these properties necessary in the context of routing queries in the Kademlia Distributed Hash Table implementation?

In addition, the paper mentions that unidirectionality (for a given x, and a distance l, there exist only a single y for which d(x,y) = l) guarantees that all queries will converge along the same path. Why is that so?

The question was posted on Stack Overflow, so feel free to reply there directly or here and I'll reformat the answer.

Thanks,

Erick

David Irvine

unread,
Sep 9, 2014, 9:42:37 PM9/9/14
to maidsafe-d...@googlegroups.com
I think this may explain it a wee bit, let me know http://metaquestions.me/2014/08/01/shortest-distance-between-two-points-is-not-always-a-straight-line/

Basically each hop if it were only one bit at a time in a fully populated network (extreme) then would have twice the knowledge of the previous hop. As you converge the knowledge is greater until you get to the closest nodes whose knowledge is ultimate in the network. 


--
You received this message because you are subscribed to the Google Groups "MaidSafe-Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maidsafe-develop...@googlegroups.com.
To post to this group, send email to maidsafe-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/maidsafe-development.
To view this discussion on the web, visit https://groups.google.com/d/msgid/maidsafe-development/540FA859.8040200%40gmail.com.
For more options, visit https://groups.google.com/d/optout.



--

David Irvine
twitter: @metaquestions

Christophe Aguettaz

unread,
Sep 10, 2014, 5:33:56 AM9/10/14
to maidsafe-d...@googlegroups.com
This reminds me that I said I would post the following message on the public mailing list, and then never did until now. It may be of interest:

Here are some thoughts on the XOR metric, starting from the easy ones:

Denoting |ab| as the distance between a and b, i.e. a^b

|ab| ^ |bc| = |ac|

Now representing the network relative to one node (that is, as a set of distances from 0 to 2^512), we find that two close nodes have very close views:

if a and b are close, and c is far from a (that is, |ab|'s binary representation starts with n zeros, and |ac| starts with m < n zeros), then c is far from b:
|bc| = |ab| ^ |ac| starts with m zeros.

a
and b therefore see c in the same bucket.

It follows that neighborhood problems (and unidirectional interest in particular) are mostly 'local'. For instance, dividing the address space in 2^p prefix groups - that is, groups of addresses prefixed by the same p bits - one can study closeness within each individual prefix group with very good precision so long as each prefix group is very likely to contain a sufficient number of nodes.

In the case of unidirectional interest, that number of nodes is 8+1: in a prefix group of 9 nodes, each node will find its 8 closest, and conversely each other node which finds it in its closest group is in the prefix group.

So now the question is: how does one partition the address space so that there is a high probability that each prefix group contains g nodes or more?

This problem is a generalization of the 'coupon collector's problem', with g coupons. In particular: the following formula applies:



which expresses the asymptotic behavior of the probability of needing less than a given number of nodes in the address space relative to the number of prefix groups (n in this case).

Note in particular that the number of nodes Tk required is O(n log n), which is consistent with the observation that worst-case unidirectional interest count is O(log node count), although this requires a bit of an explanation:

Assuming n >> 1, one can set c in the equation above to make the probability of having one prefix group hold less than k nodes arbitrarily close to 0. The probability mass of any closeness-related outcome (say, having a node with a unidirectional interest count of 10) is then almost fully contained in instances of the outcome happening within prefix groups.

For instance, a ratio of 64 nodes to each prefix group has the great majority of node distributions verifying that each prefix group contains at least 9 nodes. One then merely needs to study the behavior of between 9 nodes and up in one single prefix group to have a pretty good idea of the distribution of unidirectional interest relations.

The ratio Tk/n (the average number of nodes per prefix group) therefore gives a good idea of the order of magnitude of the worst-case unidirectional interest count, and it follows from above that Tk/n is O(log n).

---

The '9 nodes and up' above part may seem a bit problematic, but the studied prefix group can be further divided in two for a sufficient number of nodes, at which point we are back to the same problem. It is therefore most likely sufficient to study the cases where the node count is between g and G, where G > g but G/g < 100.

Studying the behavior of neighborhood problems in the whole network is therefore mostly equivalent to studying the behavior of a small number of nodes over many configurations - in other words, neighborhoods relations appear to be roughly ergodic.

In practice the problem appears to be even nicer than that, although that's a purely empirical observation: simulating unidirectional interest with exactly 64 nodes (rather than, say, between 9 and 256) results in a distribution that is eerily similar to that of a whole network of millions of nodes.

The above analysis is not restricted to unidirectional interest, but rather applies to any network feature that is based on local groups.

Well, congratulations if you've read all that.

Cheers,

Christophe

On Wednesday, September 10, 2014 3:42:37 AM UTC+2, David Irvine wrote:
I think this may explain it a wee bit, let me know http://metaquestions.me/2014/08/01/shortest-distance-between-two-points-is-not-always-a-straight-line/

Basically each hop if it were only one bit at a time in a fully populated network (extreme) then would have twice the knowledge of the previous hop. As you converge the knowledge is greater until you get to the closest nodes whose knowledge is ultimate in the network. 

On Wed, Sep 10, 2014 at 2:24 AM, Erick Lavoie <erick....@gmail.com> wrote:

I am trying to fully understand the previous work on which the SAFE network is built and I am currently looking at the Kademlia Distributed Hash Table, which served as inspiration for the MaidSafe DHT.

In the Kademlia paper by Petar Maymounkov and David Mazières, it is said that the XOR distance is a valid non-Euclidian metric with limited explanations as to why each of the properties of a valid metric are necessary and/or interesting, namely:

  • d(x,x) = 0
  • d(x,y) > 0, if x != y
  • forall x,y : d(x,y) = d(y,x) -- symmetry
  • d(x,z) <= d(x,y) + d(y,z) -- triangle inequality

Why is it important for a metric to have these properties in general? Why is each of these properties necessary in the context of routing queries in the Kademlia Distributed Hash Table implementation?

In addition, the paper mentions that unidirectionality (for a given x, and a distance l, there exist only a single y for which d(x,y) = l) guarantees that all queries will converge along the same path. Why is that so?

The question was posted on Stack Overflow, so feel free to reply there directly or here and I'll reformat the answer.

Thanks,

Erick

--
You received this message because you are subscribed to the Google Groups "MaidSafe-Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maidsafe-development+unsub...@googlegroups.com.

To post to this group, send email to maidsafe-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/maidsafe-development.
To view this discussion on the web, visit https://groups.google.com/d/msgid/maidsafe-development/540FA859.8040200%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Erick Lavoie

unread,
Sep 10, 2014, 10:10:58 AM9/10/14
to maidsafe-d...@googlegroups.com
@David, thanks for the blog post link which I was not aware of. The examples were useful, as well as the insight that the fact that the distance is symmetric does not imply that closeness also is (I think we had a similar discussion about it last year on the mailing list but did not made the link until now).

There is also an interesting paradigm shift suggested by saying that every node has a local and personal view of the network, which is only partially shared by other nodes.  I get that the XOR metric has properties that supports that point of view. I'll also try to see how reasoning by 'travelling' to each node and see the network from its perspective helps in thinking about the behaviour of algorithms and the design of the system.

Erick

Erick Lavoie

unread,
Sep 10, 2014, 10:16:44 AM9/10/14
to maidsafe-d...@googlegroups.com
Thanks for sharing. I am not there yet to understand the partitioning problem and possible solutions, we'll see once I am more familiar with the XOR metric.

Erick
To unsubscribe from this group and stop receiving emails from it, send an email to maidsafe-develop...@googlegroups.com.

To post to this group, send email to maidsafe-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/maidsafe-development.

Erick Lavoie

unread,
Sep 10, 2014, 10:20:51 AM9/10/14
to maidsafe-d...@googlegroups.com
BTW, thanks to Fraser for answering on stack overflow, this is exactly what I was looking for. I'll try to get in touch with a Math Prof at McGill to understand better metric spaces in general and I'll share any supplementary insights I might get from that.

Erick
Reply all
Reply to author
Forward
0 new messages