unified graph views with MultiGraph

265 views
Skip to first unread message

Joshua Shinavier

unread,
Nov 27, 2011, 2:33:24 PM11/27/11
to gremli...@googlegroups.com
Hi everyone,

Next in line in this parade of new Blueprints utilities: a Graph
implementation called MultiGraph which has just been pushed to
Tinkubator:

https://github.com/tinkerpop/tinkubator

MultiGraph wraps multiple, lower-level Graph implementations and
provides a combined graph view, unifying vertices and edges by id.
So, for example, if you have a vertex with an id of "Arthur" in graph
#1 and another vertex with an id of "Arthur" in graph #2, and you put
those graphs into a MultiGraph, the unified vertex with the id
"Arthur" will have all of the properties of either vertex, as well as
all of the edges to or from either vertex. Any vertices and edges
which exist in some graphs but not in others will also exist in the
MultiGraph view.

Here's a more detailed example:

Graph base1 = new TinkerGraph();
Graph base2 = new TinkerGraph();

Graph graph = new MultiGraph(base1, base2);

Vertex arthur1 = base1.addVertex("Arthur");
Vertex ford1 = base1.addVertex("Ford");
Vertex earth1 = base1.addVertex("Earth");

ford1.setProperty("comment", "a little odd");

base1.addEdge("Arthur knows Ford", arthur1, ford1, "knows");
base1.addEdge("Ford's home planet", ford1, earth1, "home planet");

Vertex ford2 = base2.addVertex("Ford");
Vertex zaphod2 = base2.addVertex("Zaphod");
Vertex betelgeuse2 = base2.addVertex("Betelgeuse");

ford2.setProperty("comment", "he really knows where his towel is");
ford2.setProperty("nickname", "Ix");

base2.addEdge("Ford knows Zaphod", ford2, zaphod2, "knows");
base2.addEdge("Ford's home planet", ford2, betelgeuse2, "home planet");

Now, if you retrieve the "Ford" vertex from the MultiGraph, you will
find edges from "Arthur" (of the base1 graph) and to "Zaphod" and
"Betelgeuse" (of the base2 graph). There are conflicting "Ford's home
planet" edges in the graphs, in terms of in-vertex, so the in-vertex
of the first graph (base1) takes precedence. The "nickname" property
on "Ford" is "Ix" (of base2), but the "comment" property is "a little
odd" (of base1, not "he really knows where his towel is", of base2),
due to order-precedence.

In combination with IdIndexGraph, MultiGraph allows you to seamlessly
integrate data from different sources, residing in different data
stores, about common objects of interest (such as Semantic Web
resources identified by URIs, or anything else with a common id
scheme). This extends to Property Graphs one of the great advantages
of RDF's statement-based and URI-based data model. MultiGraph is
especially geared towards Semantic Web crossover, but it's also
generally useful for working with data which spans multiple Graph
impls.

For now, you can build MultiGraph from source, then in Maven:

<dependency>
<groupId>com.tinkerpop.tinkubator</groupId>
<artifactId>multigraph</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>


Share and enjoy.

Josh

Peter Neubauer

unread,
Nov 27, 2011, 2:50:02 PM11/27/11
to gremli...@googlegroups.com
Very cool,
thinking of how to use this already, good idea Josh!

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org              - NOSQL for the Enterprise.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.

Pierre De Wilde

unread,
Nov 27, 2011, 6:01:23 PM11/27/11
to gremli...@googlegroups.com
Hey Josh,

Brilliant idea. If I understand you correctly, it's like an automatic owl:sameAs between vertices/edges of different graphs:

graph1:vertex1 owl:sameAs graph2:vertex1

Is there a way to manually specify the match, I mean, when the vertices/edges don't have the same id but still reference the same resource?

graph1:vertex1 owl:sameAs graph2:my_vertex_1

Anyway, graph databases and Semantic Web are getting closer by your great contributions.

Thanks,
Pierre

James Thornton

unread,
Nov 27, 2011, 6:41:00 PM11/27/11
to gremli...@googlegroups.com
Hey Josh -

This is great. A few weeks ago I posted something about doing Gremlin mashups (https://groups.google.com/d/topic/gremlin-users/XO76HY-rTSs/discussion) where you combine a Neo4j graph that receives real-time updates with a distributed, read-only graph that's regenerated periodically.

It looks like MultiGraph would allow you to do the mashup part. Peter and others that have been looking into graph sharding -- how much easier would it be to shard a read-only graph vs a graph that receives real-time updates?

- James


Joshua Shinavier

unread,
Nov 27, 2011, 7:00:20 PM11/27/11
to gremli...@googlegroups.com
Hi Pierre,


On Sun, Nov 27, 2011 at 6:01 PM, Pierre De Wilde
<pierre...@gmail.com> wrote:
> Hey Josh,
> Brilliant idea. If I understand you correctly, it's like an automatic
> owl:sameAs between vertices/edges of different graphs:
> graph1:vertex1 owl:sameAs graph2:vertex1


Yes, this is reminiscent of owl:sameAs smushing in that it superposes
the metadata of two nodes, in two different graphs, onto a single
node. That's pretty much where the similarity ends, as this is more
of a syntactic operation on the nuts-and-bolts (vertices and edges) of
a graph, rather than a semantic one on a graph of concepts. Opinions
differ on what the semantics of owl:sameAs actually are, of course :-)

> Is there a way to manually specify the match, I mean, when the
> vertices/edges don't have the same id but still reference the same resource?
> graph1:vertex1 owl:sameAs graph2:my_vertex_1


I can imagine something like that being useful, perhaps in this tool
or perhaps in a separate one. The complexity would be in the mapping
of vertices to vertices and edges to edges, which might not be
one-to-one if you allow arbitrary pairs as your example suggests.


Thanks.

Josh


Btw. for anyone who went through my example with fine-toothed comb,
it's actually the "Earth" edge which should show up in the MultiGraph,
not the "Betelgeuse" edge. The code gets it right, in any case.

Joshua Shinavier

unread,
Nov 27, 2011, 7:16:37 PM11/27/11
to gremli...@googlegroups.com
On Sun, Nov 27, 2011 at 6:41 PM, James Thornton
<james.t...@gmail.com> wrote:
> Hey Josh -
> This is great. A few weeks ago I posted something about doing Gremlin
> mashups
> (https://groups.google.com/d/topic/gremlin-users/XO76HY-rTSs/discussion)
> where you combine a Neo4j graph that receives real-time updates with a
> distributed, read-only graph that's regenerated periodically.


Ah, yes. That would be a cool application of a MultiGraph.

> It looks like MultiGraph would allow you to do the mashup part. Peter and
> others that have been looking into graph sharding -- how much easier would
> it be to shard a read-only graph vs a graph that receives real-time updates?


Well... a read-only graph doesn't require you to route updates to the
appropriate shard, so...
You could provide a unified view over read-only shards using
MultiGraph, although a more specialized tool might give better
performance in that case (MultiGraph queries *each* graph for each
read operation, whereas a sharded graph would presumably know a
single, specific graph to consult for each operation).


Best regards,

Josh

> - James
>
>

Alexandre Blanquart

unread,
Nov 28, 2011, 3:56:33 AM11/28/11
to gremli...@googlegroups.com
Hi Josh,

This seems to be similar to the concept of named graphs from the Web Semantic. The difference would be in the possibility to mix multiple graphs of different implementations ?

Regards,
Alex

Alfredo Serafini

unread,
Nov 28, 2011, 7:51:41 AM11/28/11
to gremli...@googlegroups.com
HI it's really interesting!

i don't understand if the merged graph is prepared to be used in read-only mode, or if there is a way to make the modification persistent in the original graph?

If that is the case, maybe this tool could be interesting not only for the ontology-level mapping?
i wonder if another use case could be more trivial than the named graphs: if one have to import from a batched process a serie of vertices multiple times, maybe it's possible to introduce a simple "temporary" graph for the import process, than merge it into the current graph (for example for update purposes) if the process it's ok by some point of view, in a transaction fashion.
I think about one of our use case: we have to populate a subgraph by the data parsed from remote web resources. We also need to schedule this task in order to do the updates on the vertex properties...
(just an idea :-)

how do you manage the "identity" of vertex? have you plan some kind of matching criteria other than the id, for example based also on property values?

Alfredo

James Thornton

unread,
Nov 28, 2011, 2:30:11 PM11/28/11
to gremli...@googlegroups.com


On Sunday, November 27, 2011 6:16:37 PM UTC-6, Joshua Shinavier wrote:


Well... a read-only graph doesn't require you to route updates to the
appropriate shard, so...


Hey Josh -

Yeah, and I was also looking at it from the perspective of being able to optimize traversals. 

A few month's back Jim Webber posted, "The holy grail of graph algorithms is to balance a graph across database instances by creating a minimum point cut for a graph, where graph nodes are placed such that there are few relationships that span shards. The trouble with this picture is that it's hard to achieve in practice, especially since connected graphs can mutate rapidly and unpredictably at runtime" (http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx).

 
Keeping the distributed portion read only and not mutable would make finding and maintaining the minimum cut point easier. 

And if you were to regenerate the distributed portion once per day, you could keep the real-time portion in memory by using something like an event-sourcing Distruptor pattern (http://code.google.com/p/disruptor/) where "replaying a days worth of journals - takes less than a minute" (http://martinfowler.com/articles/lmax.html). 

Having the real-time portion in memory means you wouldn't have to contend with cloud disk IO so it would improve the write performance. 

Anyway, these are just some thoughts that have been in the back of my mind for a few weeks. You posted MultiGraph, and a few days ago I noticed that Neo4j has been experimenting with the Disruptor pattern (https://github.com/neo4j/fast-http) so I'd thought I'd share.

BTW: Here's a post on how to make the Disruptor work with HTTP (https://groups.google.com/d/topic/lmax-disruptor/3dw6Tsmwijg/discussion).
 
- James


 

Joshua Shinavier

unread,
Nov 30, 2011, 10:46:15 PM11/30/11
to gremli...@googlegroups.com
Hi Alex,


On Mon, Nov 28, 2011 at 3:56 AM, Alexandre Blanquart
<alex.bl...@gmail.com> wrote:
> Hi Josh,
> This seems to be similar to the concept of named graphs from the Web
> Semantic.


Yes indeed. It's similar to RDF graphs, and graph merges (which are
basically unions of sets of edges/statements).

> The difference would be in the possibility to mix multiple graphs
> of different implementations ?


Well, you can have RDF graphs which are stored/exposed by different
implementations, as well. RDF is great for seamlessly combining data
from different sources, and MultiGraph basically gives us those same
advantages in Blueprints. The main differences lie in the data model:
in Property Graphs, you can have at most one value for a given
property and element (e.g. the vertex for Marko can't have two "name"
property values, i.e. properties aren't relationships) and each edge
has exactly one in-vertex and exactly one out-vertex (i.e. property
graphs are not hypergraphs). This contrasts with RDF graphs which, as
a set of statements, have no such cardinality restrictions. The
order-precedence rules I mentioned serve to deal with those
restrictions. Otherwise, merging property graphs is pretty similar to
merging RDF graphs.

Best regards,

Josh


> Regards,
> Alex
>

Alexandre Blanquart

unread,
Dec 2, 2011, 5:23:42 AM12/2/11
to gremli...@googlegroups.com
Hi Josh,

Thanks for the explanation. This is clearer to me now. Very interesting.

Regards,
Alex
Reply all
Reply to author
Forward
0 new messages