[Blueprints] collaborative graph editing with GitGraph


Joshua Shinavier

Apr 13, 2011, 5:20:30 PM
to gremli...@googlegroups.com
Hi everyone,

I would like to draw attention to a new utility for Blueprints which
was motivated as follows. Lately, I have been faced with the problem
of trying to synchronize graph-y data between a mobile phone and a
desktop application. This is hard not only because the data model I
had in mind, RDF, is complicated, but also because of some basic requirements
(on top of just getting the data to look the same on both devices):

1) it should be possible to load only a portion of the data on the
phone, and to push and pull changes to that portion without corrupting
the overall graph
2) it should be easy to revert changes, and it would be nice to be
able to branch
3) collaborators should be able to contribute changes to the graph, as well

This morning, it occurred to me that we could have these features in
Blueprints if we just serialize graphs in a way which plays well with
Git. I then spent all day coding, and the result is GitGraph, a
persistent Graph implementation (currently layered on top of
TinkerGraph) which stores its data in a hierarchy of canonically
ordered, diff-friendly plain text files. You can check a GitGraph
directory into GitHub, fork, edit and merge it just as you would a
piece of software. Also cool:

1) you can load subdirectories of a GitGraph as standalone graphs, and
edit them independently of the rest of the graph
2) placing two or more GitGraphs in the same directory creates a
super-GitGraph which you can load as one graph. You can then create
edges which span the two graphs and create new top-level vertices.
You can go back to a view of the individual graphs at any time.
3) no additional API, apart from the GitGraph constructor

I have created a sandbox graph here:

https://github.com/tinkerpop/gitgraph-sandbox

Take a look at the README for a usage example. Pull requests are
welcome :-) At the moment, the GitGraph source is available in a
feature/gitgraph branch of Blueprints:

https://github.com/tinkerpop/blueprints/tree/feature/gitgraph

It will be merged into Blueprints proper or made into a separate
project after we have had some time to experiment with it.


Best,

Josh

Peter Neubauer

Apr 13, 2011, 6:05:32 PM
to gremli...@googlegroups.com

Josh,
You crank it! This is a very cool approach to offline long running concurrent edits. If you then can define a baseline below which you transform things into a resulting performant graph from the files, it rocks.

Sent from my phone.

Joshua Shinavier

Apr 13, 2011, 6:20:45 PM
to gremli...@googlegroups.com, Peter Neubauer
Hi Peter,


On Thu, Apr 14, 2011 at 6:05 AM, Peter Neubauer
<neubaue...@gmail.com> wrote:
> Josh,
> You crank it! This is a very cool approach to offline long running
> concurrent edits.


Thanks!

> If you then can define a baseline below which you
> transform things into a resulting performant graph from the files, it rocks.


Right now, only TinkerGraph is supported as the actual graph
implementation. However, you'll notice a second GitGraph constructor
(currently private) which lets you pass in any IndexableGraph.
If/when GitGraph can be made to scale (e.g. by replacing its current
memory-intensive operations with disk-based ones), you will be able to
use another persistent graph such as Neo4jGraph for moment-to-moment
storage and only load or save via GitGraph when you want to pull or
push changes.


NOTE: I have moved the sandbox graph into Tinkubator, where I should
have put it in the first place:

https://github.com/tinkerpop/tinkubator/tree/master/gitgraph-sandbox

The other repo will be going away shortly.

Josh

Peter Neubauer

Apr 13, 2011, 6:24:27 PM
to Joshua Shinavier, gremlin-users
Josh,
yes indeed, this would be a good balance between fine-grained diffs
for recent changes and fast, disk-based baseline graphs. I like!

/peter

Marko Rodriguez

Apr 13, 2011, 7:42:35 PM
to gremli...@googlegroups.com
WoW. This sounds crazy... I haven't grokked it fully... Will definitely need to pick your brain on IM!

Question -- What are you pushing -- GraphML or Java serialization of the TinkerGraph object? -- or your own format?

Marko.

http://markorodriguez.com

stephen mallette

Apr 13, 2011, 8:13:32 PM
to Gremlin-users
Josh, cool stuff! Definitely going to check this out in more detail.

Steve

Joshua Shinavier

Apr 14, 2011, 2:31:09 AM
to gremli...@googlegroups.com, Marko Rodriguez
On Thu, Apr 14, 2011 at 7:42 AM, Marko Rodriguez <okram...@gmail.com> wrote:
> WoW. This sounds crazy... I haven't grokked it fully... Will definitely need to pick your brain on IM!
>
> Question --  What are you pushing -- GraphML or Java serialization of the TinkerGraph object? -- or your own format?


Right now, GitGraph uses a plain text format with one vertex, edge, or
property definition per line. That makes it easy to put the lines of
each file in a well-defined order (by sorting) so as to keep diffs
neat. GraphML should also be possible (and would be nice), though more
complicated code-wise. We would need a couple of indices to provide
ordered traversals of vertices and edges, based on a Comparator which
takes the element ID hierarchy into account (easy), and a customized
GraphMLWriter which orders and formats the XML tree in a deterministic
fashion (seems doable).
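To make the sorting idea concrete, here is a minimal Java sketch of a line-per-element text format whose output order does not depend on the order in which elements were added. It is an illustration under assumptions, not GitGraph's actual format: the tab-separated `vertex`/`edge`/`property` lines and the names `GraphLine`, `CanonicalLines`, and `ID_ORDER` are hypothetical. The comparator compares slash-separated IDs segment by segment (numerically where both segments are digits), which is one plausible way for a Comparator to take the ID hierarchy into account.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical: one line of the text format, tagged with the element it belongs to.
final class GraphLine {
    final String elementId; // path-like ID, e.g. "misc/13"
    final int rank;         // 0 = vertex/edge definition, 1 = property line
    final String text;      // the line exactly as written to disk

    GraphLine(String elementId, int rank, String text) {
        this.elementId = elementId;
        this.rank = rank;
        this.text = text;
    }
}

final class CanonicalLines {

    // Numeric segments sort before non-numeric ones; digits compare by value, ties by text.
    static int compareSegments(String s, String t) {
        boolean sNum = s.matches("\\d+");
        boolean tNum = t.matches("\\d+");
        if (sNum != tNum) {
            return sNum ? -1 : 1;
        }
        if (sNum) {
            int c = new BigInteger(s).compareTo(new BigInteger(t));
            if (c != 0) {
                return c;
            }
        }
        return s.compareTo(t);
    }

    // Compares IDs segment by segment, so "2" < "10" and "misc" < "misc/13".
    static final Comparator<String> ID_ORDER = (a, b) -> {
        String[] x = a.split("/");
        String[] y = b.split("/");
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = compareSegments(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    };

    // Element ID first, then definition before properties, then the raw text.
    static final Comparator<GraphLine> LINE_ORDER = (p, q) -> {
        int c = ID_ORDER.compare(p.elementId, q.elementId);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(p.rank, q.rank);
        return c != 0 ? c : p.text.compareTo(q.text);
    };

    // Sorts a copy of the lines and joins them; the input list is left untouched.
    static String serialize(List<GraphLine> lines) {
        List<GraphLine> sorted = new ArrayList<>(lines);
        sorted.sort(LINE_ORDER);
        StringBuilder sb = new StringBuilder();
        for (GraphLine l : sorted) {
            sb.append(l.text).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<GraphLine> lines = new ArrayList<>();
        lines.add(new GraphLine("misc/13", 0, "vertex\tmisc/13"));
        lines.add(new GraphLine("42", 0, "edge\t42\t2\tknows\tmisc/13"));
        lines.add(new GraphLine("2", 0, "vertex\t2"));
        System.out.print(serialize(lines));
    }
}
```

Because every line is emitted in the same order however the graph was built, editing one element changes only a few adjacent lines in the file, which is what keeps Git diffs and merges small.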


Josh

Marko Rodriguez

Apr 14, 2011, 7:11:09 AM
to Joshua Shinavier, gremli...@googlegroups.com
Cool. Could you make it YAGFF (Yet Another Graph File Format)? -- Like GraphMLReader/Writer, ...

Marko.

Joshua Shinavier

Apr 14, 2011, 12:21:11 PM
to gremli...@googlegroups.com
On Thu, Apr 14, 2011 at 7:11 PM, Marko Rodriguez <okram...@gmail.com> wrote:
> Cool. Could you make it YAGFF (Yet Another Graph File Format)? -- Like GraphMLReader/Writer, ...


Could do that. Note that it wouldn't necessarily write to a single
file, though; it creates a directory tree based on the IDs of vertices
and edges. For example, an edge with the ID "42" would be defined in
the top-level directory, but it might reference a vertex "misc/13"
which is defined in a subdirectory called "misc". If you load the
subdirectory as a GitGraph, that vertex appears as "13" and the
top-level edge is not visible.

That way, an application which only cares about the graph data in
"misc" doesn't need to deal with a monolithic file for the entire
graph. When it is done making changes, it just writes back to "misc"
and its children, ignoring the rest of the tree.
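The ID-to-directory mapping described above can be sketched as a few string helpers. This is a minimal illustration under assumptions, not GitGraph's code: `IdPaths` and its methods are hypothetical names, an element is assumed to be defined in the directory named by everything before the last `/` of its ID, and an edge is assumed to be visible in a subdirectory view only when the edge and both of its endpoints lie inside that subdirectory.

```java
// Hypothetical helpers for path-like element IDs such as "misc/13".
final class IdPaths {

    // Directory (relative to the graph root) where the element is defined; "" means top level.
    static String directoryOf(String id) {
        int slash = id.lastIndexOf('/');
        return slash < 0 ? "" : id.substring(0, slash);
    }

    // The ID seen when only `subdir` is loaded, or null if the element lies outside it.
    static String localId(String id, String subdir) {
        String prefix = subdir + "/";
        return id.startsWith(prefix) ? id.substring(prefix.length()) : null;
    }

    // Inverse of localId: the ID a subgraph element gets when viewed from the parent directory.
    static String parentId(String localId, String subdir) {
        return subdir + "/" + localId;
    }

    // Assumption: an edge appears in a subdirectory view only if it and both endpoints are inside.
    static boolean edgeVisible(String edgeId, String outId, String inId, String subdir) {
        return localId(edgeId, subdir) != null
                && localId(outId, subdir) != null
                && localId(inId, subdir) != null;
    }

    public static void main(String[] args) {
        // The thread's example: top-level edge "42" referencing vertex "misc/13".
        // The edge's other endpoint, "1", is a hypothetical top-level vertex.
        System.out.println(directoryOf("misc/13"));                     // misc
        System.out.println(localId("misc/13", "misc"));                 // 13
        System.out.println(localId("42", "misc"));                      // null
        System.out.println(edgeVisible("42", "1", "misc/13", "misc"));  // false
    }
}
```

Loading "misc" on its own then amounts to keeping the elements whose IDs start with "misc/" and stripping that prefix; writing back re-adds it with `parentId`, so the rest of the tree is never touched.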


Josh

Robert Elves

Mar 18, 2012, 12:10:12 PM
to gremli...@googlegroups.com
Hi Josh,

This looks like cool stuff. I see the gitgraph-sandbox but is the GitGraph source code still kicking around?

Thanks,

-Rob

Joshua Shinavier

Mar 18, 2012, 12:58:07 PM
to Gremlin-users
Hi Rob,


On Mar 18, 12:10 pm, Robert Elves <rel...@gmail.com> wrote:
> Hi Josh,
>
> This looks like cool stuff. I see the gitgraph-sandbox but is the GitGraph
> source code still kicking around?


GitGraph hasn't gotten much attention in a while, but I'm happy to
maintain it if I know someone is using it. Since I started this
thread, I have merged the GitGraph idea into GraphMLWriter and a tool
called IdIndexGraph (in Tinkubator). This combo gives you all of the
collaborative editing, forking and merging functionality of GitGraph.
The only GitGraph feature it doesn't have is the directory hierarchy.
Here's a usage example:

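// Import a dump from Git
IndexableGraph base = ...
base.clear();                 // start from an empty graph
IdIndexGraph g = new IdIndexGraph(base);
GraphMLReader r = new GraphMLReader(g);
InputStream in = new FileInputStream(new File("/tmp/dump.xml"));
try {
    r.inputGraph(in);
} finally {
    in.close();
}

// Export a dump to Git (run separately from the import above)
IndexableGraph base = ...
IdIndexGraph g = new IdIndexGraph(base);
GraphMLWriter w = new GraphMLWriter(g);
w.setNormalize(true);         // deterministic element order, friendly to line diffs
OutputStream out = new FileOutputStream(new File("/tmp/dump.xml"));
try {
    w.outputGraph(out);
} finally {
    out.close();
}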


Best,

Josh



>
> Thanks,
>
> -Rob

Robert Elves

Mar 18, 2012, 1:20:50 PM
to gremli...@googlegroups.com
Hey, thanks Josh. I'm interested in the collaborative editing aspect, so this will likely suit my needs. Thanks for the usage example, and for sharing this in the first place!

-Rob