Hi Eugene,
## What is Related?
The similarity to ActiveRecord is mostly superficial. It adheres to
the ActiveModel interface and gives the added benefit of a convenient
way of doing things like validations, callbacks and serialization. But
conceptually Related is actually closer to FlockDB (
https://github.com/
twitter/flockdb). While FlockDB is a partitioned and replicated graph
interface on top of MySQL, Related is instead more or less a fairly
thin object oriented wrapper on top of Redis sets and zsets. You could
obviously use Redis directly for simple stuff like basic set
operations, in the same way that you could use MySQL joins directly to
implement the same functionality as FlockDB. The added benefit of
using Related is that you get a consistent way of interacting with
Redis as if it was a distributed graph database. Related also
implements some basic graph algorithms and convenience modules (like
the Follower module). My intention is to add more interesting graph
algorithms and query capability in the future. Right now it only has
#path_to and #shortest_path_to.
## How does Related work under the hood?
An edge in the graph is stored as an entry in two sorted sets and two
ordinary sets and a single KV. Like this:
Related.redis.zadd(r_key(:out), self.class.weight_for(self, :out),
self.id)
Related.redis.zadd(r_key(:in), self.class.weight_for(self, :in),
self.id)
Related.redis.sadd(n_key(:out), self.end_node_id)
Related.redis.sadd(n_key(:in), self.start_node_id)
Related.redis.set(dir_key,
self.id)
The above is a copy/paste directly from the source. The sorted sets
are used to lookup the edges (in both) directions. The unsorted sets
are an optimization to quickly lookup nodes without going through the
corresponding edges first. They also make it possible to do efficient
set operations like intersect, diff and union, which are not possible
to do with zsets without storing the intermediate result. The last
line in the code above (dir_key) stores the ID of the edge with the
key "{start node id}:{edge type}:{end node id}" and is an optimization
to make deleting (or manipulating) a specific edge much faster and
scale O(1) instead of O(N). It is for example used in
Related::Follower#unfollow! to avoid having to iterate through all the
follow edges to find the one to delete.
## The Follower module
The Follower module is a simple convenience module that is pretty easy
to understand simply by looking at the code (
https://github.com/
sutajio/related/blob/master/lib/related/follower.rb). The interesting
part is perhaps the #friends method that computes an intersect of the
incoming and outgoing follow edges. By default, when running a
distributed setup where the graph is distributed over several Redis
nodes, the sets that store the incoming and outgoing edges for a node
is collocated on the same Redis node. If you for some reason opt to
distribute the keys in a different way Related will transparently be
able to handle that and instead do the intersect on the client side,
which is not much slower in practice for most use cases but is bound
by network bandwidth.
## Shortest path to
The #shortest_path_to method uses the Dijkstra's algorithm to find the
shortest path in the graph. It uses the sismember and smembers
commands to check set membership and to fetch the set of edges from a
node (only the IDs). The first gut feeling I had was that doing it
like that would be too slow. But I have done a lot testing and it
turns out it is very fast in practice, since Redis is so fast. But
obviously you will be limited by network bandwidth and latency when
doing it for real on a large scale (instead of memory bandwidth and
memory latency when doing naive benchmarks).
N