Next in line in this parade of new Blueprints utilities: a Graph
implementation called MultiGraph which has just been pushed to
Tinkubator:
https://github.com/tinkerpop/tinkubator
MultiGraph wraps multiple, lower-level Graph implementations and
provides a combined graph view, unifying vertices and edges by id.
So, for example, if you have a vertex with an id of "Arthur" in graph
#1 and another vertex with an id of "Arthur" in graph #2, and you put
those graphs into a MultiGraph, the unified vertex with the id
"Arthur" will have all of the properties of either vertex, as well as
all of the edges to or from either vertex. Any vertices and edges
which exist in some graphs but not in others will also exist in the
MultiGraph view.
Here's a more detailed example:
// Two independent base graphs, combined in precedence order (base1 first)
Graph base1 = new TinkerGraph();
Graph base2 = new TinkerGraph();
Graph graph = new MultiGraph(base1, base2);

// base1: Arthur knows Ford; Ford's home planet is Earth
Vertex arthur1 = base1.addVertex("Arthur");
Vertex ford1 = base1.addVertex("Ford");
Vertex earth1 = base1.addVertex("Earth");
ford1.setProperty("comment", "a little odd");
base1.addEdge("Arthur knows Ford", arthur1, ford1, "knows");
base1.addEdge("Ford's home planet", ford1, earth1, "home planet");

// base2: Ford knows Zaphod; Ford's home planet is Betelgeuse
Vertex ford2 = base2.addVertex("Ford");
Vertex zaphod2 = base2.addVertex("Zaphod");
Vertex betelgeuse2 = base2.addVertex("Betelgeuse");
ford2.setProperty("comment", "he really knows where his towel is");
ford2.setProperty("nickname", "Ix");
base2.addEdge("Ford knows Zaphod", ford2, zaphod2, "knows");
base2.addEdge("Ford's home planet", ford2, betelgeuse2, "home planet");
Now, if you retrieve the "Ford" vertex from the MultiGraph, you will
find edges from "Arthur" (of the base1 graph) and to "Zaphod" and
"Betelgeuse" (of the base2 graph). There are conflicting "Ford's home
planet" edges in the graphs, in terms of in-vertex, so the in-vertex
of the first graph (base1) takes precedence. The "nickname" property
on "Ford" is "Ix" (of base2), but the "comment" property is "a little
odd" (of base1, not "he really knows where his towel is", of base2),
due to order-precedence.
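To make the order-precedence rule concrete, here is a minimal sketch in plain Java. It is not MultiGraph's actual code and does not use the Blueprints API; the class and method names (PrecedenceMergeSketch, firstWins) are made up for illustration, and plain maps stand in for vertex properties and for the edge-id-to-in-vertex relation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models "first graph wins" with plain maps.
final class PrecedenceMergeSketch {

    /** Merges maps so that, for each key, the value from the earliest map wins. */
    static <K, V> Map<K, V> firstWins(List<Map<K, V>> mapsInPrecedenceOrder) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (Map<K, V> map : mapsInPrecedenceOrder) {
            for (Map.Entry<K, V> entry : map.entrySet()) {
                // Keep the value already present; only fill in missing keys.
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Properties of "Ford" in base1 and base2.
        Map<String, String> ford1 = new LinkedHashMap<>();
        ford1.put("comment", "a little odd");
        Map<String, String> ford2 = new LinkedHashMap<>();
        ford2.put("comment", "he really knows where his towel is");
        ford2.put("nickname", "Ix");

        Map<String, String> ford = firstWins(Arrays.asList(ford1, ford2));
        System.out.println(ford.get("comment"));
        System.out.println(ford.get("nickname"));

        // Edge id -> in-vertex id, one map per base graph.
        Map<String, String> inVertex1 = new LinkedHashMap<>();
        inVertex1.put("Ford's home planet", "Earth");
        Map<String, String> inVertex2 = new LinkedHashMap<>();
        inVertex2.put("Ford's home planet", "Betelgeuse");

        Map<String, String> inVertex = firstWins(Arrays.asList(inVertex1, inVertex2));
        System.out.println(inVertex.get("Ford's home planet"));
    }
}
```

Under these assumptions the "comment" comes from base1, the "nickname" from base2 (base1 has none), and the conflicting edge resolves to the in-vertex of base1.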
In combination with IdIndexGraph, MultiGraph allows you to seamlessly
integrate data from different sources, residing in different data
stores, about common objects of interest (such as Semantic Web
resources identified by URIs, or anything else with a common id
scheme). This extends to Property Graphs one of the great advantages
of RDF's statement-based and URI-based data model. MultiGraph is
especially geared towards Semantic Web crossover, but it's also
generally useful for working with data which spans multiple Graph
impls.
For now, you can build MultiGraph from source and then declare it as a
Maven dependency:
<dependency>
    <groupId>com.tinkerpop.tinkubator</groupId>
    <artifactId>multigraph</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>
Share and enjoy.
Josh
On Sun, Nov 27, 2011 at 6:01 PM, Pierre De Wilde
<pierre...@gmail.com> wrote:
> Hey Josh,
> Brilliant idea. If I understand you correctly, it's like an automatic
> owl:sameAs between vertices/edges of different graphs:
> graph1:vertex1 owl:sameAs graph2:vertex1
Yes, this is reminiscent of owl:sameAs smushing in that it superposes
the metadata of two nodes, in two different graphs, onto a single
node. That's pretty much where the similarity ends, as this is more
of a syntactic operation on the nuts-and-bolts (vertices and edges) of
a graph, rather than a semantic one on a graph of concepts. Opinions
differ on what the semantics of owl:sameAs actually are, of course :-)
> Is there a way to manually specify the match, I mean, when the
> vertices/edges don't have the same id but still reference the same resource?
> graph1:vertex1 owl:sameAs graph2:my_vertex_1
I can imagine something like that being useful, perhaps in this tool
or perhaps in a separate one. The complexity would be in the mapping
of vertices to vertices and edges to edges, which might not be
one-to-one if you allow arbitrary pairs as your example suggests.
Thanks.
Josh
Btw, for anyone who went through my example with a fine-toothed comb,
it's actually the "Earth" edge which should show up in the MultiGraph,
not the "Betelgeuse" edge. The code gets it right, in any case.
Ah, yes. That would be a cool application of a MultiGraph.
> It looks like MultiGraph would allow you to do the mashup part. Peter and
> others that have been looking into graph sharding -- how much easier would
> it be to shard a read-only graph vs a graph that receives real-time updates?
Well... a read-only graph doesn't require you to route updates to the
appropriate shard, so...
You could provide a unified view over read-only shards using
MultiGraph, although a more specialized tool might give better
performance in that case (MultiGraph queries *each* graph for each
read operation, whereas a sharded graph would presumably know a
single, specific graph to consult for each operation).
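As a rough illustration of that cost difference, here is a sketch in plain Java. It is an assumption-laden toy, not how MultiGraph or any real sharded graph is implemented: the class ReadRoutingSketch, its methods, and the hash-based routing are all hypothetical, and plain maps stand in for the underlying graphs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: compares a fan-out read with a routed read.
final class ReadRoutingSketch {
    private final List<Map<String, String>> stores;
    int storesConsulted = 0; // counts how many stores have been asked so far

    ReadRoutingSketch(List<Map<String, String>> stores) {
        this.stores = stores;
    }

    /** Assumed routing rule: the hash of the id picks exactly one store. */
    int shardFor(String id) {
        return Math.floorMod(id.hashCode(), stores.size());
    }

    /** Writes go to the single store chosen by the routing rule. */
    void put(String id, String value) {
        stores.get(shardFor(id)).put(id, value);
    }

    /** Combined-view read: every store is asked; the first answer found wins. */
    String readFromAll(String id) {
        String result = null;
        for (Map<String, String> store : stores) {
            storesConsulted++;
            String value = store.get(id);
            if (result == null) {
                result = value;
            }
        }
        return result;
    }

    /** Shard-style read: only the store chosen by the routing rule is asked. */
    String readFromShard(String id) {
        storesConsulted++;
        return stores.get(shardFor(id)).get(id);
    }

    public static void main(String[] args) {
        List<Map<String, String>> stores = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            stores.add(new HashMap<String, String>());
        }
        ReadRoutingSketch sketch = new ReadRoutingSketch(stores);
        sketch.put("Ford", "a little odd");

        sketch.readFromAll("Ford");
        System.out.println("after fan-out read: " + sketch.storesConsulted);
        sketch.readFromShard("Ford");
        System.out.println("after routed read: " + sketch.storesConsulted);
    }
}
```

With three stores, the fan-out read touches all three while the routed read touches one. The routed read only works because every write went through the same routing rule, which is exactly what a general-purpose combined view cannot assume.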
Best regards,
Josh
> - James
> Well... a read-only graph doesn't require you to route updates to the
> appropriate shard, so...
Hey Josh -
Yeah, and I was also looking at it from the perspective of being able to optimize traversals.
A few months back Jim Webber posted, "The holy grail of graph algorithms is to balance a graph across database instances by creating a minimum point cut for a graph, where graph nodes are placed such that there are few relationships that span shards. The trouble with this picture is that it's hard to achieve in practice, especially since connected graphs can mutate rapidly and unpredictably at runtime" (http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx).
On Mon, Nov 28, 2011 at 3:56 AM, Alexandre Blanquart
<alex.bl...@gmail.com> wrote:
> Hi Josh,
> This seems to be similar to the concept of named graphs from the
> Semantic Web.
Yes indeed. It's similar to RDF graphs, and graph merges (which are
basically unions of sets of edges/statements).
> The difference would be in the possibility to mix multiple graphs
> of different implementations ?
Well, you can have RDF graphs which are stored/exposed by different
implementations, as well. RDF is great for seamlessly combining data
from different sources, and MultiGraph basically gives us those same
advantages in Blueprints. The main differences lie in the data model:
in Property Graphs, you can have at most one value for a given
property and element (e.g. the vertex for Marko can't have two "name"
property values, i.e. properties aren't relationships) and each edge
has exactly one in-vertex and exactly one out-vertex (i.e. property
graphs are not hypergraphs). This contrasts with RDF graphs which, as
a set of statements, have no such cardinality restrictions. The
order-precedence rules I mentioned serve to deal with those
restrictions. Otherwise, merging property graphs is pretty similar to
merging RDF graphs.
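Here is a small sketch of that contrast in plain Java, under the assumption that a statement can be modelled as a three-element list (subject, predicate, object). The class CardinalitySketch and its methods are hypothetical names for illustration; neither method is taken from MultiGraph or from any RDF library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: statements are (subject, predicate, object) lists.
final class CardinalitySketch {

    /** RDF-style merge: a graph is a set of statements, so merging is set union. */
    static Set<List<String>> unionStatements(Set<List<String>> g1, Set<List<String>> g2) {
        Set<List<String>> merged = new LinkedHashSet<>(g1);
        merged.addAll(g2);
        return merged;
    }

    /**
     * Property-graph-style merge: at most one value per (element, key) pair,
     * so the value from the earlier graph takes precedence.
     */
    static Map<List<String>, String> singleValued(Set<List<String>> g1, Set<List<String>> g2) {
        Map<List<String>, String> merged = new LinkedHashMap<>();
        for (Set<List<String>> graph : Arrays.asList(g1, g2)) {
            for (List<String> statement : graph) {
                List<String> elementAndKey = Arrays.asList(statement.get(0), statement.get(1));
                merged.putIfAbsent(elementAndKey, statement.get(2));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<List<String>> g1 = new LinkedHashSet<>();
        g1.add(Arrays.asList("marko", "name", "Marko"));
        Set<List<String>> g2 = new LinkedHashSet<>();
        g2.add(Arrays.asList("marko", "name", "Marko A. Rodriguez"));

        // The union keeps both "name" statements.
        System.out.println(unionStatements(g1, g2).size());
        // The single-valued merge keeps only the value from g1.
        System.out.println(singleValued(g1, g2).get(Arrays.asList("marko", "name")));
    }
}
```

The union keeps both "name" values, which is fine for a set of statements, while the single-valued merge has to pick one, and here it picks the value from the first graph.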
Best regards,
Josh
> Regards,
> Alex
>