Hi all,
today I found some time to check in more detail how edges behave at the
moment.
When creating an edge, you can store arbitrary payload information along
with the edge, e.g. the "some-stuff" attribute for the following edge
between vertices v1 & v2:
edges.e.save(v1, v2, { "some-stuff" : true });
Edges are not explicitly directed or undirected in ArangoDB. There is no
standard way to tell the system an edge is directed or undirected. You
can use any non-standard attribute for this info, however, it's hard to
filter on that later.
When an edge gets created, it's currently inserted into an edges index
of the edge collection.
In the above case, there will be following inserts into the edges index:
- vertex: v1, direction: OUT
- vertex: v2, direction: IN
This is reasonable and looks like the edge indeed carries some direction
with it (v1 -> v2, or _from -> _to). You can later look up the incoming
and outgoing edges using the .inEdges(vertex) or .outEdges(vertex)
functions.
However, the following index entries are also made when creating an
edge, in addition to the two entries above:
- vertex: v1, direction: ANY
- vertex: v2, direction: ANY
The only use case for these additional index entries is to quickly look
up all edges connected to a vertex, regardless of whether the original
edges were in or out edges.
That means four index entries were made per edge instead of just two
(exception: only three entries were made for edges that had identical
_from and _to values, but that should have been rare cases).
That of course bloats the index, and still edge direction is only implicit.
After thinking a bit about this, the solution I have come up with is the
following:
Only create two index entries per edge:
- vertex: v1, direction: OUT
- vertex: v2, direction: IN
These should do it, and if you need an ANY query (i.e. .edges(vertex)),
we can merge the results of an OUT and an IN query.
This prevents index bloat.
Additionally, I think we should store per edge whether it is directed or
undirected. This info can be passed to ArangoDB on edge creation.
My suggestion is to use the following syntax for this:
edges.e.save(v1, v2, { "some-stuff" : true, "_bidirectional" : true });
The _bidirectional attribute would be a new system attribute available
for edges. It is stored internally, and if set to true, it will cause
two additional index entries to be made:
- vertex: v2, direction: OUT
- vertex: v1, direction: IN
That's the reverse of the above entries and seems plausible, as the edge
is bidirectional.
To summarize so far, if an edge is unidirectional, two index entries
will be made. If it's bidirectional, four entries will be made.
The default behavior of ArangoDB should be to use unidirectional edges,
so when not specifying the "_bidirectional" attribute, everything is
fully downwards compatible.
When returning edges from a collection or a query, an edge currently has
_from and _to attributes.
This should be extended with the _bidirectional attribute. This will
always have a boolean value (with false being the default if nothing was
specified on creation).
When a unidirectional edge is returned, we should return the OUT vertex
in the _from attribute, and the IN vertex in the _to attribute as we did
before. This again is downwards-compatible.
When a bidirectional edge is returned, we can return the two vertices it
connects in an array named _vertices. This is more appropriate than
_from and _to because there is no order or hierarchy between the two
connected vertices, but this was suggested by the names _from and _to.
Some example graph follows, with a vertex collection v and an edge
collection e:
// clean up
db._drop("v");
edges._drop("e");
// create collections
db._create("v");
edges._create("e");
// save vertices
a = db.v.save({ "name" : "a" });
b = db.v.save({ "name" : "b" });
c = db.v.save({ "name" : "c" });
d = db.v.save({ "name" : "d" });
e = db.v.save({ "name" : "e" });
f = db.v.save({ "name" : "f" });
// save edges
edges.e.save(a, b, { "what": "a<->b", "_bidirectional" : true });
edges.e.save(a, b, { "what": "a->a", "_bidirectional" : false });
edges.e.save(a, c, { "what": "a->c", "_bidirectional" : false });
edges.e.save(d, a, { "what": "d->a", "_bidirectional" : false });
edges.e.save(c, d, { "what": "c->d", "_bidirectional" : false });
edges.e.save(f, d, { "what": "f<->d", "_bidirectional" : true });
edges.e.save(f, e, { "what": "f->e", "_bidirectional" : false });
edges.e.save(e, e, { "what": "e->e", "_bidirectional" : false });
As you can see I created both unidirectional and bidirectional edges in
the example. When querying all the edges, we'd get the following result:
arangod> edges.e.all().toArray();
[
{
"_bidirectional" : false,
"_from" : "7588030/10537150",
"_to" : "7588030/10275006",
"what" : "d->a"
},
{
"_bidirectional" : false,
"_from" : "7588030/10275006",
"_to" : "7588030/10471614",
"what" : "a->c"
},
{
"_bidirectional" : false,
"_from" : "7588030/10668222",
"_to" : "7588030/10602686",
"what" : "f->e"
},
{
"_bidirectional" : true,
"_vertices" : ["7588030/10668222", "7588030/10537150"],
"what" : "f<->d"
},
{
"_bidirectional" : false,
"_from" : "7588030/10275006",
"_to" : "7588030/10406078",
"what" : "a->a"
},
{
"_bidirectional" : false,
"_from" : "7588030/10602686",
"_to" : "7588030/10602686",
"what" : "e->e"
},
{
"_bidirectional" : false,
"_from" : "7588030/10471614",
"_to" : "7588030/10537150",
"what" : "c->d"
},
{
"_bidirectional" : true,
"_vertices" : ["7588030/10275006", "7588030/10406078"],
"what" : "a<->b"
}
]
As you can see from the above, we return _from and _to for
unidirectional edges, and _vertices for bidirectional edges.
I hope this suggestion is more intuitive than the current solution in
ArangoDB 1.0 and 1.1.
I also think it can be more efficient because for directed edges there
will be less index entries. Apart from that, edges direction is now
explicit.
I everybody is happy with this suggestion, you can already start playing
around with it. The above output is from the devel branch.
The change should not be made in ArangoDB 1.0 or 1.1, as it will require
some changes to the data file organisation.
ArangoDB 1.2 will introduce some changes for data file organisation
anyway due to other features (transactions), so integrating the change
into ArangoDB version 1.2 seems best from my point of view.
What do you think?
Best regards
Jan