A Case for an ODRI Network

nano

unread,

May 25, 2006, 5:04:02 AM5/25/06

to okopipi-dev

Here is the interesting paper in question:
http://www.cs.uiowa.edu/~ghosh/sigcomm2003.pdf

I encourage you to read the paper, but even if you don't, my
description here should make sense.

Introduction

First, an introduction to de Bruijn graphs. De Bruijn graphs (usually
with a lower-case "de") are a class of graph
(http://mathworld.wolfram.com/Graph.html) that has had much recent
attention in mathematical circles because of their nearly optimal
routing capabilities. They are special kinds of graphs that are built
in such a way that the maximum number of hops from one node to another
is always a constant factor, no matter how many nodes are in the
network, and yet the degree of the graph (the number of out-bound
connections per node) is also constant. That means that, for example,
with a constant degree of ten, the maximum number of hops for any path
in the network is a constant number, like six (for a million nodes in
the network). This makes it roughly two to three times faster than
other p2p networks like Chord, CAN, Pastry, etc., which enables us to
halve the degree of the network from the usual ~20 peers to ~10 (which
lowers network overhead), while still maintaining nearly 100% faster
routing.

Down the Rabbit Hole

Here we go. The following is an in-depth description of de Bruin
graphs, which will in turn be followed by a description of ODRI, a p2p
network topology which is based on a de Bruijn graph topology.

A de Bruijn graph has two variables: the degree of the graph (k), and
the diameter of the graph (D). The number of nodes in the graph is N,
where N = k^D. The classic example is that of a network with k=2 and
D=3. Each node in the graph has a string label, where the string is D
characters long, and each character is a digit with base k. So, in our
(2,3) de Bruijn graph, each node has a label like: 000, 001, 010, etc.

Here is an image of our (2,3) graph:
http://www.unifr.ch/econophysics/minority/debruijnm3.png

As you may notice, de Bruijn graphs are "directed" as opposed to
"undirected" graphs, which means that edges between nodes are
uni-directional as opposed to bi-directional. In the image, for
instance, node "000" is connected to "001", but "001" is not directly
connected to "000". They are connected in a very particular way, which
is absolute genius once it's explained to you, and appears to be utter
madness until then. So, let me explain it:

1) Take this node's string label
2) Remove the first character
3) This node connects to all nodes that begin with this string

For example, take node "000". If we remove the first zero, we have
"00". That means that "000" connects to all nodes that begin with
"00"--it connects to "000" (itself) and "001". The "101" node connects
to all nodes that begin with "01", so it connects to "010" and "011".
Take a look at the image and verify that the nodes are connected based
on this simple algorithm.

So, big deal, right? Well, the cool part is the routing algorithm (let
me warn you in advance--it's genius). The path from one node to another
is expressed as a string, where each successive substring of length D
is a node in the graph. For instance, "000111" breaks down into "000",
"001", "011", "111". To get the path from one node to another in the
graph, we simply take the label of the first node and concatenate it
with the label of the second node. THEN, we merge the longest possible
substring between the suffix of the first label and the prefix of the
second. For instance, to get from "000" to "010", instead of using the
path "000010", we merge the overlapping "0" from the end of "000" and
the beginning of "010", so that the path is "00010" (which breaks down
into "000", "001", "010"). If we're trying to get from "111" to "110",
instead of "111110" it's simply "1110" (breaks into "111", "110"). See
how this works? Try it out yourself!

If you're up to it, grab a notebook and pencil and draw out a small de
Bruijn graph where k=2 and D=2, or k=3 and D=2, and try routing between
the nodes. Just don't draw out a network of, say, k=4, D=2, because
there are 16 nodes and 64 edges, and the resulting pencil mish-mash is
hard on the eyes--trust me.

(You may be wondering about those self-connected nodes, or "loop"
edges. They are a byproduct of the formal mathematical definition of a
de Bruijn graph. In a practical network, these self-connected nodes
would instead be connected to each other, so that each node of the form
"hhh" (where h is in the range [0,k)) is connected to the node "ggg"
(where g is h+1 modulus k). The paper calls this a "chain-linked" de
Bruijn graph.)

Alright, So What Do We Do With It

Still with me? I hope you didn't skip the last section, because it took
me forever to write.

At first blush, de Bruijn graphs don't seem to make sense for a p2p
network, since they are very strict in that the number of nodes in the
network is a fixed amount (recall: N = k^D). Obviously, the number of
peers connected to a p2p network is going to be a number that varies
wildly from moment to moment, so we have to have a network that can
grow and shrink easily.

This brings us to the last part of the paper; the part that actually
introduces ODRI. Let's start with a de Bruijn graph that is truly
massive. Say, k=10 and D=38 (this means that each node will have a
label that is roughly equivalent to a 128-bit key--this network has
10^38 nodes in it). Then, define each node in the graph as a slot in a
DHT table (I'll assume you understand distributed hash tables). So,
instead of each peer on the network acting as one node in the graph,
each peer acts as a virtual node that is in control of a large range of
the underlying de Bruijn graph. Peers connect to each other based on
the underlying graph.

Complicated Example:
Say we have a graph of k=2 and D=8 (a graph with 256 nodes). The labels
look like this: 00000000, 00000001, 00000010, 00000011, ..., 11111111.

But say there are only 16 peers in the network. Then, one of our peers
might be in charge of all nodes that begin with "0000". So he is in
charge of 00000000 through 00001111 (16 nodes). It is clear that all of
the nodes he is in charge of will route to nodes beginning with "0000"
or "0001". If there is another peer in the network that is in charge of
all nodes beginning with "0001", then this peer only needs to connect
to one other peer. (In a chain-linked de Bruijn graph, it would also
connect to "1111".)

The more you play around with this, the more you realize that, in this
case, our (2,8) graph that only has 16 peers in it will be connected at
the network layer exactly like a (2,4) de Bruijn graph, assuming each
peer has control of 16 nodes in the underlying graph.

Peer-to-Peer Balancing Act

Tune in next time for the nifty solution to balancing the number of
nodes each peer controls (read this paper on k-Choices if you're
impatient: http://www.eecs.harvard.edu/~jonathan/lb/kchoices05.pdf --
although the solution I propose is much simpler than that in the
paper).

I'll write more tomorrow, including algorithms for joining and leaving
an ODRI network, handling node crashes, and data redundancy. I'll
probably tackle security the day after that, as I still have to read
some more papers on distributed certification of public keys.

Spy der Mann

unread,

May 25, 2006, 3:12:30 PM5/25/06

to okopipi-dev

Nano: The diagram you proposed for a de Bruijn graph is fine for
mathematicians, but looks too complicated for our mere mortal minds
(specially the press).

I think a more generalized example that is much more pleasing to the
eyes (even if more complex) would be easier to understand.

So here it is :) A de Bruijn Graph.

http://mathworld.wolfram.com/deBruijnGraph.html

Spy der Mann

unread,

May 25, 2006, 3:54:18 PM5/25/06

to okopipi-dev

I made a tiny sketch of a (2,3) de Bruijn graph, if we put the nodes in
a circumference it makes much more sense to the mere mortals :).

Here goes:

000 100

001 110

010 111

101 011

(Note that the numbers are NOT in numeric order, this is because of the
prefix routing).

A one-way circumference is formed (counterclockwise in this example) by
"connecting the dots":

000 -> 001 -> 010 -> 101 -> 011 -> 111 -> 110 -> 100 -> 000. (Plus some
shortcut connections that follow a complicated math sequence that
guarantees an efficient routing)

The routing paths and their organization have the following properties:

(Nano, please correct me if I'm wrong in any of these)

1) All the nodes, besides a direct connection to their "successor",
have at least one "shortcut connection" to another node across the
circumference. (Exceptions be the nodes that point to themselves, but
that's a separate issue)

2) Even if all "shortcut connections" fail, the message can still pass
thru the circumference, where it could reach (more slowly) its
destination.

3) To reach a node, you might as well just reach the node's predecessor
and let it handle the message.

4) Asumming that the message is spread to all the neighboring nodes, it
spreads exponentially: At first it follows 2 edges, then 4 edges (or
more), and so on.

5) Due to the exponential spreading of the message, the number of hops
(which means successive nodes reached) required for it to reach the
destination is minimal.

6) If a connection (edge) fails, the message has passed thru many
routes, so the probability of the message getting lost is also minimal.

jdsh...@gmail.com

unread,

May 25, 2006, 4:11:08 PM5/25/06

to okopipi-dev

Nano, Make sure to present and explain this to the developers. This is
very important in their algorithm design. This would be great as
psudocode.

nano

unread,

May 26, 2006, 1:35:25 AM5/26/06

to okopipi-dev

Spy, most of the points you make don't have to do with de Bruijn graphs
specifically. Numbers 1, 2, 3, 4, and 6 apply to any fully-connected
graph of degree 2 or more with routing capabilities, including every
p2p network topography out there. I don't really understand number 5
(what does "exponential spreading" have to do with speed of routing?).

You seem to be missing the point of what makes de Bruijn graphs
special. I linked the image I linked instead of the Mathworld page on
purpose. The images on Mathworld make it hard to see what is unique
about the graph. When you look at the image I linked (which is exactly
like the image in the paper itself), you can see the precise structure.
For instance, you see the two end nodes, "000", and "111" at each end,
with self-loops, and you see "010" and "101" in the middle, as the only
two nodes that have a bidirectional connection (a feature that is very
hard to pick out in the Mathworld image).

A de Bruijn graph is not just "a circular graph with shortcut edges."
But it also isn't very complicated. You don't have to be able to
visualize a giant million-node de Bruijn graph. You just have to know
how it is connected, and how it routes.

Maybe a giant explanation of the algorithm was unnecessary. Let's just
say that it works, and it works well, with nearly optimal speed AND
resilience.

The downside is that it has no implementation. The upside is that I
think we can write a new p2p network implementation that will be
incredibly fast and strong, and we would be the first to implement a
network of this kind (which would be of considerable use to the
community). Also, designing our own network will allow us to consider
our application's specific needs during the design process.

I really think ODRI is a great idea. Raincheck on the k-Choices
algorithm till tomorrow.

Spy der Mann

unread,

May 26, 2006, 2:46:07 AM5/26/06

to okopipi-dev

Nano: See the wiki for my contribution pic to the network design.

Spy der Mann

unread,

May 26, 2006, 2:47:31 AM5/26/06

to okopipi-dev

Oops, OK, I see now. Sorry for the babbling.

nano

unread,

May 28, 2006, 12:16:31 AM5/28/06

to okopipi-dev

Sorry, I didn't mean to come off sounding imperious. :O

Spy der Mann

unread,

May 28, 2006, 1:12:28 AM5/28/06

to okopipi-dev

Great news everyone!!

Nano and I have been brainstorming, and finally we managed to make of
ODRI a 100% anonymous peer-to-peer infrastructure!

With anonymity we mean that the nodes in a message route do not know
neither the true destination nor the originator of a message, just a
path to follow. Messages are encrypted with the destination node's
public key, so the content of the message isn't known either.

This will guarantee that important nodes (i.e. administrators) cannot
be identified when they send a message. Therefore, the probability of a
DDOS attack is minimal. On top of that, adding the k-choices load
balancing algorithm will optimize the traffic routing.

What's so interesting is that ODRI's routing infrastructure made it so
natural not only for optimal routing, but also to add anonymity. Nano
will write more on this.

The network infrastructure would be called AODRI (Anonymous Optimal
Distance Routing Infrastructure), but for phonetic reasons, it will be
called "Audrey: The anonymous ODRI network".

Nano plans to publish our finds and make a generalized library for the
networking community.

nano

unread,

May 28, 2006, 1:35:58 AM5/28/06

to okopipi-dev

> Nano plans to publish our finds and make a generalized library for the
> networking community.

Well, I'm writing a design document. I'm not gonna try to get it
published. ;)

This really is good stuff, though. This network has the distinct
possibility of being the fastest and most reliable p2p network that has
ever been implemented. With the added anonymity and message encryption,
it also becomes very secure.

Cyberwolf

unread,

May 30, 2006, 2:58:55 PM5/30/06

to okopipi-dev

Hi

The idea of an overlay network based on a de Bruijn graph sounds great!

I've tried to read the paper but I'm not really an expert in
mathematics, so I tried to put the theory into practice and I made a
little simulation of a network based on a de Bruijn graph in PHP. It
really made some things more clear to me. I'm still finishing the user
interface. When that's done and someone is interested I can put it
online (and you can get the source code too of course). Don't expect
too much, it is just a basic form to input some properties (degree and
diameter of the graph) for the simulated network, afterwards you can
choose a node as the sender and a destination node, and finally it will
calculate the path to use to send messages between those two nodes. It
would get really nice if I can let the simulator make a drawing of the
simulated network together with the calculated path (by using PHP's
image functions), but I haven't started yet to implement that.

Ehm

unread,

May 30, 2006, 6:29:06 PM5/30/06

to okopipi-dev

awesome, I'd love to see that

nano

unread,

May 30, 2006, 11:15:06 PM5/30/06

to okopipi-dev

Cyberwold: Awesome! A simulator is actually on my to-do list. :) I'll
probably whip something up in Java that shows the network in a circle
formation and can display routing paths and simulate nodes joining,
leaving, and sending messages.

hedwards

unread,

May 31, 2006, 11:19:36 PM5/31/06

to okopipi-dev

Great, I get most of what you have described, but I am not quite sure
what happens when a node disconnects.

>From what I gather each de Bruijn graph requires all of the nodes in
order to function properly. Although it looks as if one missing won't
kill the network, it does limit the paths.

It sounds like unless these nodes are recycled into a new graph that
there would be an increased risk of figuring out where the admin nodes
are.

For those reasons, I figure that there is probably already a mechanism
for dealing with that, but I don't think I have seen anything to
indicate how nodes are moved into new segments as clients join and
leave.

Other than that I found your explanation to be pretty easy to follow. I
look forward to more on this, I imagine that the security and anonymity
stuff is going to be interesting.

Cheers,
Thomas

Spy der Mann

unread,

Jun 1, 2006, 12:47:50 AM6/1/06

to okopipi-dev

That's specified in the "LEAVE" algorithm. Before leaving, the node
sends a signal to its neighbors (or something) that he's leaving, and
then transfers the control of his zone to other nodes.

When the node just disconnects, it's called a "crash".

nano

unread,

Jun 1, 2006, 3:36:57 AM6/1/06

to okopipi-dev

As has been mentioned elsewhere, I'm working on a draft of a p2p
network design based on ODRI called Audrey. Sorry about the wait! I'm
hoping to complete it by Saturday.

Cyberwolf

unread,

Jun 2, 2006, 3:26:33 AM6/2/06

to okopipi-dev

Here we go:
http://www.kristofcoomans.be/okopipi/index.php

For obvious reasons I've limited the degree and diameter you can input.
The drawing of the path doesn't work yet, but it's easy to add this
with the code I already have so you can expect this functionality
tomorrow evening.

Geometry is fun! :-)

Spy der Mann

unread,

Jun 2, 2006, 11:25:30 AM6/2/06

to okopipi-dev

w00t, it looks great! :D

nano

unread,

Jun 2, 2006, 3:35:32 PM6/2/06

to okopipi-dev

Yeah, it does look great. Thanks Cyberwolf, that's a great
visualization tool. :)

Cyberwolf

unread,

Jun 3, 2006, 6:20:48 AM6/3/06

to okopipi-dev

I added the path calculation stuff. Check it out!

I was wondering what happens if you have a network of degree 2 and
diameter 3, then if 110 sends something to 101, the message goes from:

110 to 101
101 to 010
010 to 101

http://www.coomanskristof.be/okopipi/?degree=2&diameter=3&width=800&node_1_0=1&node_1_1=1&node_1_2=0&node_2_0=1&node_2_1=0&node_2_2=1&draw=

The last two steps look quite stupid, don't they?

Cyberwolf

unread,

Jun 3, 2006, 6:45:09 AM6/3/06

to okopipi-dev

But it's probably the way it works, to keep anonymity?

nano

unread,

Jun 3, 2006, 2:33:57 PM6/3/06

to okopipi-dev

No, if you are going from 110 to 101, you merge the suffix of the first
with the prefix of the second, so that the path becomes 1101, which is
a direct connection from 110 to 101.

So, when you are calculating the path, make sure to merge the longest
common substring between the end of the first and the beginning of the
second, as described in the original post.

Cyberwolf

unread,

Jun 4, 2006, 5:35:54 AM6/4/06

to okopipi-dev

Indeed, you're right, I made a mistake. It's corrected now. I'll
release the source code under the GPL one of these days.

When there's more information on the join/leave algorithms, I'll try to
add it to the simulation.

Reply all

Reply to author

Forward