I am considering adapting the JUNG2 graph library to Gremlin as a 'com.tinkerpop.gremlin.db.jung' package. I use JUNG2 in several projects and notice differences between its API, Gremlin's Graph Model, and TinkerGraph implementation.
Here is JUNG2's Directed Graph interface:
Notice that any "hashably unique" Java object can be used as a Vertex or Edge, making special Vertex or Edge wrapper class unnecessary. All of the vertex and edge access is via a single graph object - and the same element instances can be re-used in multiple graph instances.
Here is JUNG's "DirectedSparseMultigraph.java" implementation:
Notice that it instantiates two Map's. In contrast:
- Each TinkerGraph instantiates one Map
- Each TinkerVertex instantiates two Set's
- Each Edge references two Vertices
Another project with a Graph system is dANN, a Java artificial-intelligence system:
btw - dANN Graph takes a Walk parameter. (I think this is similar to Boris's mentioning of a path interface.)
Very cool. Yes---I used JUNG for many years until I found iGraph. Java is good for graph database stuff---not for real-time, user-driven analysis. For that, R: and iGraph fit the bill much better -- you get a full on stats environment when you do your graph analysis. Something Java (or Groovy for real-time) is just poor at.
Here is JUNG's "DirectedSparseMultigraph.java" implementation:
Notice that it instantiates two Map's. In contrast:
- Each TinkerGraph instantiates one Map
- Each TinkerVertex instantiates two Set's
- Each Edge references two Vertices
A DirectedSparseMultiGraph is very close to what a the property graphs of Gremlin are. The property graphs are best defined as Directed, Labeled, Attributed, Multi-Graphs. So, the question is, does DirectedSparseMultiGraph allow for property maps on the edges and vertices? Or can you at least extend DirectedSparseMultiGraph to do so.
To clarify, notice that JUNG generally instantiates two Map's for the vertex and edge incidences. In contrast:
- Each TinkerElement instantiates the property map (as mentioned above), and has a string ID. It additionally references a TinkerIndex.
- In JUNG, the vertex or edge instance itself can serve as the ID, provided it is unique and ideally hashable.
- Each TinkerGraph instantiates one HashMap for incidence
- Each TinkerVertex instantiates two HashSet's for incidence. Isn't this 2x memory redundancy, because a reference in an incoming set will appear in another vertice's output set? Alternatively JUNG's per-graph incidence structure stores this in one data structure.
Instead, I imagine one could rewrite the graph model to eliminate separate Edge and Vertex classes in moving their functionality to the Graph class. (Graph would also be responsible for what the Index/TinkerIndex classes do.) To operate on a specific graph element, one provides the graph and a reference to the element, for example:
BTW I recognized this pattern in Neo4J's API too (since it was first released years ago) in its separate Node (ie. Vertex) and Relationship (ie. Edge) classes and couldn't understand why its API wasn't more similar to JUNG.
- vertex.getInEdges() -> graph.getInEdges(vertex)
- edge.getInVertex() -> graph.getInVertex(edge)
> Yes---its possible to do Graph<V,E> and make Graph responsible for
> handling all the methods. The problem I had with this is then Gremlin
> doesn't have a clean type system. V and E can be anything and then it
> starts to mess up how paths and functions are applied. This is why I
> hid non-gremlin types behind these interfaces. For example:
>
> $g/V could return a collection of anything (for exaggeration, lets say
> it returns HTTPConnections), then $g/V/outE wouldn't know what to do
> with those HTTPConnections as they are not vertices, they are just
> objects... what is the outE of an HTTPConnection?? Gremlin would need
> to get a reference back to the original graph from which the
> HTTPConnection came from....then stuff gets all complicated.... :(
well, i'd like to see an instance's (ex: HTTPConnection) methods and
especially getters, by reflection, in its outE. also any public final
fields would explicitly point to other instances too. :)
though wouldn't you already have a reference to its origin graph, which
could be passed as a parameter to methods, or injected via a set... to
HTTPConnection if something needs to be "graph aware"?
thanks for the explanations!
>> $g/V could return a collection of anything (for exaggeration, lets
>> say it returns HTTPConnections), then $g/V/outE wouldn't know what
>> to do with those HTTPConnections as they are not vertices, they are
>> just objects... what is the outE of an HTTPConnection?? Gremlin
>> would need to get a reference back to the original graph from which
>> the HTTPConnection came from....then stuff gets all
>> complicated.... :(
> well, i'd like to see an instance's (ex: HTTPConnection) methods and
> especially getters, by reflection, in its outE. also any public
> final fields would explicitly point to other instances too. :)
Ha! ... Yea, wouldn't it be nice if everything was possible with
Gremlin. Be cool if it did your homework, brushed your teeth, ...
In all seriousness, yes, its very possible to make use of
introspection to allow you to walk a graph of your Java object space.
As you say, you can introspect object fields and methods that return
objects.... But now its just a mess---the Gremlin type system is
basically the Java type system ... and then people are going to want
to be able to construct objects and call methods that take parameters
and do what you can do in Java --- then people will say, but Gremlin
doesn't handle generics so then you have to add generic functionality
to Gremlin... dunno--its just not clean and concise anymore...?? Its
basically a contorted script-language on top of Java----and isn't this
what Groovy is all about?
> though wouldn't you already have a reference to its origin graph,
> which could be passed as a parameter to methods, or injected via a
> set... to HTTPConnection if something needs to be "graph aware"?
You have a reference at $g, but not at the path segment outE. You
could assume that the working graph $_g is the graph, but then it gets
complicated when you are working with multiple graphs in Gremlin. If
Gremlin only allowed one graph at a time, then yes, it would be okay
to do that model (I think). But then again, my previous paragraph
thoughts come up again.
> thanks for the explanations!
Thanks for digging deeper and as that CNN guy, Anderson Cooper, always
says: keepin' me real.
See ya,
Marko.
See ya,
Marko.
it's because i prototype a general purpose graph-based
browser/editor/communicator using Java/JUNG/Swing. relevant to your
wanting to interface Gremlin with filesystems and XML documents, and i
think java reflection of the actual objects and source-code involved is
important too. ultimately a C++ implementation will be necessary to
support the larger and faster graphs (ex: word-level detail of twitter
messages and web content) as well as visualizing them with high
frame-rates and detail.
i think it would be great to include the Neno/Fhat material somehow...
>> i'm considering it. JUNG is used in several scientific
>> applications ( http://sourceforge.net/apps/trac/jung/wiki/ProjectsUsingJUNG
>> ) and has a lot of built-in algorithms (like pagerank and maxflow)
>> and visualizations (Swing and Java3D). also IMO is the lowest-
>> common-denominator graph and hypergraph API, and in order to use
>> its features with Gremlin i would additionally need a Gremlin ->
>> JUNG adapter (ex: to visualize Gremlin output as I'd like to do).
I need to write some documentation on how to use Gremlin from within a
Java application. Its relatively easy:
GremlinEvaluator ge = new GremlinEvaluator();
List results = ge.evaluate("./outE/inV/...")
This way, people could query their JUNG graphs using Gremlin and still
have their JUNG graph to manipulate in their other Java apps.
Going the other way, it would be cool to have direct support for JUNG
graphs in Gremlin, as you say, for this reason --- e.g.:
gremlin> $_g := jung:open()
gremlin> g:load('some-graph-data.xml')
gremlin> jung:pagerank(0.85)
In short, have jung: namespaced Gremlin functions for each of the
algorithms in their algorithms package.
http://jung.sourceforge.net/doc/api/edu/uci/ics/jung/algorithms/importance/package-summary.html
When I was building Confluence (a particle swarm package in Java that
makes use of property graphs), I had a class to convert Confluence
graphs to JUNG graphs --- see: http://www.markorodriguez.com/docs/conf/api/conf/util/JungGraph.html
So then something that converts any property graph into a JUNG graph
may be cool:
gremlin> $_g := jung:convert($h)
And then $_g can be subjected to the graph analysis algorithms
provided by JUNG --- e.g. jung:pagerank($_g, 0.85). However, JUNG
graph algorithms are single-relational so PageRank doesn't make much
sense for multi-relational graphs.... BUT!!! you can always do this http://arxiv.org/abs/0806.2274
.. this is known as graph rewriting http://wiki.github.com/tinkerpop/gremlin/graph-rewriting
...
Thus, you can make use of that wonderful suite of JUNG algorithms for
processing Gremlin graphs (of course---this will only work for in-
memory graphs---or will it?! ... Perhaps making a wrapper to the
PropertyGraph model and JUNG will make it so that convert() is not
necessary and thus, for ANY property graph, you can do:
gremlin> jung:pagerank($_g, 0.85)
Now that would be awesome and totally worth the effort!
> it's because i prototype a general purpose graph-based browser/
> editor/communicator using Java/JUNG/Swing. relevant to your wanting
> to interface Gremlin with filesystems and XML documents, and i think
> java reflection of the actual objects and source-code involved is
> important too. ultimately a C++ implementation will be necessary to
> support the larger and faster graphs (ex: word-level detail of
> twitter messages and web content) as well as visualizing them with
> high frame-rates and detail.
> i think it would be great to include the Neno/Fhat material somehow...
Ha! I love that you love Neno/Fhat [ http://neno.lanl.gov ]. You don't
know how many people make fun of me for that monstrosity. :) ... I
love it though---its like a Frankenstein mutant from another dimension.
> Ha! I love that you love Neno/Fhat [ http://neno.lanl.gov ]. You don't
> know how many people make fun of me for that monstrosity. :) ... I
> love it though---its like a Frankenstein mutant from another dimension.
yes i absolutely do and i wonder why i must live in a world where Neno
isn't the universal architecture for computer hardware and software!
:)
is there any more Neno/Fhat documentation or code that's not on the website?
I just created an implementation of JUNGs Graph interface:
Now its possible to use the Jung algorithms package on Gremlin graphs!
Even disk-based graphs like Neo4j graphs!
Next, I'm going to create a function library called JungFunctions that
implements each of the Jung algorithm:
e.g. PageRank, KStepMarkov, Betweenness, Closeness, etc. etc.
Then we can do stuff like this:
gremlin> $_g := neo4j:open('/tmp/neograph')
gremlin> jung:pagerank($_g, 0.85)
Pretty cool, eh?
I haven't tested the implementation, so.... still need to do that. And
yes---I'll think about how this would work over LinkedProcess/
Gargamel-----my wheels are turning....
Take care,
Marko.
http://markorodriguez.com
http://linkedprocess.org
> yes i absolutely do and i wonder why i must live in a world where
> Neno isn't the universal architecture for computer hardware and
> software!
Exactly. One address space for all of computing --- everyone's
physical machines support the execution of a massive pool of
distributed abstract virtual machines executing distributed code
processing distributed data structures. Its a trip --- imagine how
different our computing lives would be!
*** Unfortunately, I never wrote more about it. I sorta of talk about
it more in this article http://arxiv.org/abs/0905.3378 .. and kinda,
maybe hint at it in this article http://arxiv.org/abs/0908.0373
See ya,
Marko.
Cheers,
/peter neubauer
COO and Sales, Neo Technology
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer
http://www.neo4j.org - Your high performance graph database.
http://gremlin.tinkerpop.com - The terminal to the Giant Global Graph.