Hey Matthias,
Funny to see you on this list, as I was posting on the Blueprints list just a few weeks ago asking some questions that eventually led me to Galaxy. One of my first thoughts on Titan was in fact "how do we keep it all in memory?" so it's exciting to see that you're investigating that and you ended up at Galaxy as well.
I've been considering a (potentially naive) method for graph sharding that I'd love to get feedback on from either or both of you. The basic premise is to apply the graph to a modified version of an existing visual layout algorithm.
ForceAtlas2, for example, seems particularly well suited due of its use of the
Barnes-Hut simulation, but may need to be modified to be better balanced.
If you imagine the entire graph laid out in some two dimensional space over time as vertices/edges are added and removed, you begin to see how this could lead to a distribution technique. By picking a point near the center of space and drawing a circle around all vertices we can establish our bounds. From there, we subdivide the circle into n same-sized pie-shaped slices which represent our shards. Our layout algorithm has already done the work of balancing the graph and grouping highly connected vertices, so we simply need to draw the lines that define our shards, probably with some consideration not to intersect large clusters.
In Galaxy terms, each shard would own the edges and vertices in its slice of the graph. For redundancy and traversal optimization, we might be able to "bleed" the edges determined in the sharding step. For example, for each vertex on a given shard we could share any vertex which is within our known application-specific step count and in one of the two neighboring slices.
An alternative (and seemingly more complicated) redundancy technique could be to modify the layout algorithm to use the point we drew in the very first step (the center of the circle) as a center of gravity. This point would repel any lonely vertices and attract highly-connected vertices or clusters. Then we would establish some threshold, a smaller circle with the same center as our first one. Anything within the smaller circle gets shared with all shards.
As you can probably tell I've been thinking about this for some time. Unfortunately due to lack a of time and experience I haven't been able to actually test any of this. At the moment I'm practicing a few prerequisites in my spare time and hope to start tinkering with some real demos within a few weeks. If it seems like I'm going down the wrong path (surely there's a simpler solution) please let me know!
Thanks,
Brendan