Gremlin/TinkerGraph vs. JUNG2

S H

unread,

Feb 4, 2010, 9:53:12 PM2/4/10

to gremli...@googlegroups.com

I am considering adapting the JUNG2 graph library to Gremlin as a 'com.tinkerpop.gremlin.db.jung' package. I use JUNG2 in several projects and notice differences between its API, Gremlin's Graph Model, and TinkerGraph implementation.

Here is JUNG2's Directed Graph interface:

http://jung.sourceforge.net/doc/api/edu/uci/ics/jung/graph/DirectedGraph.html

Notice that any "hashably unique" Java object can be used as a Vertex or Edge, making special Vertex or Edge wrapper class unnecessary. All of the vertex and edge access is via a single graph object - and the same element instances can be re-used in multiple graph instances.

Here is JUNG's "DirectedSparseMultigraph.java" implementation:

http://jung.cvs.sourceforge.net/viewvc/jung/jung2/jung-graph-impl/src/main/java/edu/uci/ics/jung/graph/DirectedSparseMultigraph.java?view=markup

Notice that it instantiates two Map's. In contrast:

Each TinkerGraph instantiates one Map
Each TinkerVertex instantiates two Set's
Each Edge references two Vertices

Here is the complete JUNG2 API:

http://jung.sourceforge.net/doc/api/index.html

Also relevant are JUNG's Hypergraph class and algorithms packages.

Another project with a Graph system is dANN, a Java artificial-intelligence system:

http://bugs.syncleus.com/trac.cgi/dANN/browser/trunk/projects/java_dann/src/com/syncleus/dann/graph/Graph.java

btw - dANN Graph takes a Walk parameter. (I think this is similar to Boris's mentioning of a path interface.)

http://bugs.syncleus.com/trac.cgi/dANN/browser/trunk/projects/java_dann/src/com/syncleus/dann/graph/Walk.java

Marko A. Rodriguez

unread,

Feb 5, 2010, 10:06:56 AM2/5/10

to gremli...@googlegroups.com

Hey Seth,

I am considering adapting the JUNG2 graph library to Gremlin as a 'com.tinkerpop.gremlin.db.jung' package. I use JUNG2 in several projects and notice differences between its API, Gremlin's Graph Model, and TinkerGraph implementation.

Here is JUNG2's Directed Graph interface:

http://jung.sourceforge.net/doc/api/edu/uci/ics/jung/graph/DirectedGraph.html

Notice that any "hashably unique" Java object can be used as a Vertex or Edge, making special Vertex or Edge wrapper class unnecessary. All of the vertex and edge access is via a single graph object - and the same element instances can be re-used in multiple graph instances.

Very cool. Yes---I used JUNG for many years until I found iGraph. Java is good for graph database stuff---not for real-time, user-driven analysis. For that, R: and iGraph fit the bill much better -- you get a full on stats environment when you do your graph analysis. Something Java (or Groovy for real-time) is just poor at.

Here is JUNG's "DirectedSparseMultigraph.java" implementation:

http://jung.cvs.sourceforge.net/viewvc/jung/jung2/jung-graph-impl/src/main/java/edu/uci/ics/jung/graph/DirectedSparseMultigraph.java?view=markup

Notice that it instantiates two Map's. In contrast:

Each TinkerGraph instantiates one Map

Each TinkerVertex instantiates two Set's

Each Edge references two Vertices

A DirectedSparseMultiGraph is very close to what a the property graphs of Gremlin are. The property graphs are best defined as Directed, Labeled, Attributed, Multi-Graphs. So, the question is, does DirectedSparseMultiGraph allow for property maps on the edges and vertices? Or can you at least extend DirectedSparseMultiGraph to do so.

Another project with a Graph system is dANN, a Java artificial-intelligence system:

http://bugs.syncleus.com/trac.cgi/dANN/browser/trunk/projects/java_dann/src/com/syncleus/dann/graph/Graph.java

btw - dANN Graph takes a Walk parameter. (I think this is similar to Boris's mentioning of a path interface.)

http://bugs.syncleus.com/trac.cgi/dANN/browser/trunk/projects/java_dann/src/com/syncleus/dann/graph/Walk.java

This looks cool too. Yes---if you want to make all this work with Gremlin, it seems very simple to do so as these graphs tend to have many more methods than the very basic graph model used by Gremlin. A thing that I see being a potential issue is Index.java --- if these graphs don't assume a property set, then you can't index their properties. Which is fine, you simply throw an UnsupportedOperationException for getIndex() --- at which point, g:key() doesn't do anything (well it throws an EvaluationError).

If you need any help, I'm interested in getting JUNG and Gremlin going. --- I don't know much about dANN besides the links you sent...

Take care,

Marko.

http://markorodriguez.com

http://tinkerpop.com

S H

unread,

Feb 5, 2010, 11:20:35 AM2/5/10

to gremli...@googlegroups.com

Very cool. Yes---I used JUNG for many years until I found iGraph. Java is good for graph database stuff---not for real-time, user-driven analysis. For that, R: and iGraph fit the bill much better -- you get a full on stats environment when you do your graph analysis. Something Java (or Groovy for real-time) is just poor at.

I've found Java can certainly be useful for real-time interactive graph apps... and much of it translates directly to the compiled D language which offers performance near C++.

Here is JUNG's "DirectedSparseMultigraph.java" implementation:

http://jung.cvs.sourceforge.net/viewvc/jung/jung2/jung-graph-impl/src/main/java/edu/uci/ics/jung/graph/DirectedSparseMultigraph.java?view=markup

Notice that it instantiates two Map's. In contrast:

Each TinkerGraph instantiates one Map

Each TinkerVertex instantiates two Set's

Each Edge references two Vertices

A DirectedSparseMultiGraph is very close to what a the property graphs of Gremlin are. The property graphs are best defined as Directed, Labeled, Attributed, Multi-Graphs. So, the question is, does DirectedSparseMultiGraph allow for property maps on the edges and vertices? Or can you at least extend DirectedSparseMultiGraph to do so.

With JUNG, if you want to associate each edge and vertex with a property map, the vertex or edge element will need to contain the property map object itself, like in the TinkerGraph model.

To clarify, notice that JUNG generally instantiates two Map's for the vertex and edge incidences. In contrast:

Each TinkerElement instantiates the property map (as mentioned above), and has a string ID. It additionally references a TinkerIndex.

In JUNG, the vertex or edge instance itself can serve as the ID, provided it is unique and ideally hashable.

Each TinkerGraph instantiates one HashMap for incidence
Each TinkerVertex instantiates two HashSet's for incidence. Isn't this 2x memory redundancy, because a reference in an incoming set will appear in another vertice's output set? Alternatively JUNG's per-graph incidence structure stores this in one data structure.
Each TinkerEdge references two Vertices, an ID, label, and the TinkerIndex.

For example, since I wanted to adapt JUNG2 to Gremlin, I would need to create more Java wrapper objects containing multiply redundant references and this would consume several times more memory than the actual JUNG data I wanted to wrap. So I was concerned about redundant data in Gremlin / TinkerGraph's API.

Instead, I imagine one could rewrite the graph model to eliminate separate Edge and Vertex classes in moving their functionality to the Graph class. (Graph would also be responsible for what the Index/TinkerIndex classes do.) To operate on a specific graph element, one provides the graph and a reference to the element, for example:

vertex.getInEdges() -> graph.getInEdges(vertex)
edge.getInVertex() -> graph.getInVertex(edge)

BTW I recognized this pattern in Neo4J's API too (since it was first released years ago) in its separate Node (ie. Vertex) and Relationship (ie. Edge) classes and couldn't understand why its API wasn't more similar to JUNG.

???

Marko A. Rodriguez

unread,

Feb 5, 2010, 11:51:20 AM2/5/10

to gremli...@googlegroups.com

Hi,

To clarify, notice that JUNG generally instantiates two Map's for the vertex and edge incidences. In contrast:

Each TinkerElement instantiates the property map (as mentioned above), and has a string ID. It additionally references a TinkerIndex.

In JUNG, the vertex or edge instance itself can serve as the ID, provided it is unique and ideally hashable.

Each TinkerGraph instantiates one HashMap for incidence

Each TinkerVertex instantiates two HashSet's for incidence. Isn't this 2x memory redundancy, because a reference in an incoming set will appear in another vertice's output set? Alternatively JUNG's per-graph incidence structure stores this in one data structure.

Yes. You are correct----storing it in one data structure would be more efficient memory wise, but a slower retrieval wise. It depends on how you want to index edges---if you store it like this in Graph:

Map<Vertex, Set<Edge>> outgoingEdges

where Set<Edge> is the vertex's outgoing edges, then how do you get the incoming edges? -- searching through all the other vertices (this is slow). To speed things up, you can then create another map to index incoming edges:

Map<Vertex, Set<Edge>> incomingEdges

and then you have the same model as TinkerGraph, save for how TinkerGraph lets each vertex care for its own edges. So, in TinkerGraph, I sacrifice memory for speed. Perhaps there is some clever indexing model that I'm not aware of that would solve this?

Instead, I imagine one could rewrite the graph model to eliminate separate Edge and Vertex classes in moving their functionality to the Graph class. (Graph would also be responsible for what the Index/TinkerIndex classes do.) To operate on a specific graph element, one provides the graph and a reference to the element, for example:

vertex.getInEdges() -> graph.getInEdges(vertex)

edge.getInVertex() -> graph.getInVertex(edge)

BTW I recognized this pattern in Neo4J's API too (since it was first released years ago) in its separate Node (ie. Vertex) and Relationship (ie. Edge) classes and couldn't understand why its API wasn't more similar to JUNG.

Yes---its possible to do Graph<V,E> and make Graph responsible for handling all the methods. The problem I had with this is then Gremlin doesn't have a clean type system. V and E can be anything and then it starts to mess up how paths and functions are applied. This is why I hid non-gremlin types behind these interfaces. For example:

$g/V could return a collection of anything (for exaggeration, lets say it returns HTTPConnections), then $g/V/outE wouldn't know what to do with those HTTPConnections as they are not vertices, they are just objects... what is the outE of an HTTPConnection?? Gremlin would need to get a reference back to the original graph from which the HTTPConnection came from....then stuff gets all complicated.... :(

S H

unread,

Feb 5, 2010, 12:29:35 PM2/5/10

to gremli...@googlegroups.com

Marko A. Rodriguez wrote:
>
> Map<Vertex, Set<Edge>> outgoingEdges
>
> where Set<Edge> is the vertex's outgoing edges, then how do you get
> the incoming edges? -- searching through all the other vertices (this
> is slow). To speed things up, you can then create another map to index
> incoming edges:
>
> Map<Vertex, Set<Edge>> incomingEdges

agreed - for bidirectionality it ends up being nearly the same, and this
is what JUNG is doing with Map<V, Pair<Set<E>>>.

> Yes---its possible to do Graph<V,E> and make Graph responsible for
> handling all the methods. The problem I had with this is then Gremlin
> doesn't have a clean type system. V and E can be anything and then it
> starts to mess up how paths and functions are applied. This is why I
> hid non-gremlin types behind these interfaces. For example:
>
> $g/V could return a collection of anything (for exaggeration, lets say
> it returns HTTPConnections), then $g/V/outE wouldn't know what to do
> with those HTTPConnections as they are not vertices, they are just
> objects... what is the outE of an HTTPConnection?? Gremlin would need
> to get a reference back to the original graph from which the
> HTTPConnection came from....then stuff gets all complicated.... :(

well, i'd like to see an instance's (ex: HTTPConnection) methods and
especially getters, by reflection, in its outE. also any public final
fields would explicitly point to other instances too. :)

though wouldn't you already have a reference to its origin graph, which
could be passed as a parameter to methods, or injected via a set... to
HTTPConnection if something needs to be "graph aware"?

thanks for the explanations!

Marko A. Rodriguez

unread,

Feb 5, 2010, 12:55:35 PM2/5/10

to gremli...@googlegroups.com

Hey Seth,

>> $g/V could return a collection of anything (for exaggeration, lets
>> say it returns HTTPConnections), then $g/V/outE wouldn't know what
>> to do with those HTTPConnections as they are not vertices, they are
>> just objects... what is the outE of an HTTPConnection?? Gremlin
>> would need to get a reference back to the original graph from which
>> the HTTPConnection came from....then stuff gets all
>> complicated.... :(
> well, i'd like to see an instance's (ex: HTTPConnection) methods and
> especially getters, by reflection, in its outE. also any public
> final fields would explicitly point to other instances too. :)

Ha! ... Yea, wouldn't it be nice if everything was possible with
Gremlin. Be cool if it did your homework, brushed your teeth, ...

In all seriousness, yes, its very possible to make use of
introspection to allow you to walk a graph of your Java object space.
As you say, you can introspect object fields and methods that return
objects.... But now its just a mess---the Gremlin type system is
basically the Java type system ... and then people are going to want
to be able to construct objects and call methods that take parameters
and do what you can do in Java --- then people will say, but Gremlin
doesn't handle generics so then you have to add generic functionality
to Gremlin... dunno--its just not clean and concise anymore...?? Its
basically a contorted script-language on top of Java----and isn't this
what Groovy is all about?

> though wouldn't you already have a reference to its origin graph,
> which could be passed as a parameter to methods, or injected via a
> set... to HTTPConnection if something needs to be "graph aware"?

You have a reference at $g, but not at the path segment outE. You
could assume that the working graph $_g is the graph, but then it gets
complicated when you are working with multiple graphs in Gremlin. If
Gremlin only allowed one graph at a time, then yes, it would be okay
to do that model (I think). But then again, my previous paragraph
thoughts come up again.

> thanks for the explanations!

Thanks for digging deeper and as that CNN guy, Anderson Cooper, always
says: keepin' me real.

See ya,
Marko.

http://tinkerpop.com

Marko A. Rodriguez

unread,

Feb 5, 2010, 2:38:13 PM2/5/10

to gremli...@googlegroups.com

Seth----if you want to work on creating an implementation of the
General Graph model based on JUNG, that'd be cool. Perhaps extends
DirectedSparseMultiGraph (extend it to support property maps on
vertices and edges). However, I don't know what the benefit of it
would be relative to TinkerGraph---both are in-memory graphs with the
same data model. I guess you could always manipulate your JUNG graph
in Gremlin and in Java and thus, use other tools that work with JUNG
graphs... ??

See ya,
Marko.

S H

unread,

Feb 5, 2010, 3:48:59 PM2/5/10

to gremli...@googlegroups.com

Marko A. Rodriguez wrote:
> Seth----if you want to work on creating an implementation of the
> General Graph model based on JUNG, that'd be cool. Perhaps extends
> DirectedSparseMultiGraph (extend it to support property maps on
> vertices and edges). However, I don't know what the benefit of it
> would be relative to TinkerGraph---both are in-memory graphs with the
> same data model. I guess you could always manipulate your JUNG graph
> in Gremlin and in Java and thus, use other tools that work with JUNG
> graphs... ??

i'm considering it. JUNG is used in several scientific applications (
http://sourceforge.net/apps/trac/jung/wiki/ProjectsUsingJUNG ) and has a
lot of built-in algorithms (like pagerank and maxflow) and
visualizations (Swing and Java3D). also IMO is the
lowest-common-denominator graph and hypergraph API, and in order to use
its features with Gremlin i would additionally need a Gremlin -> JUNG
adapter (ex: to visualize Gremlin output as I'd like to do).

it's because i prototype a general purpose graph-based
browser/editor/communicator using Java/JUNG/Swing. relevant to your
wanting to interface Gremlin with filesystems and XML documents, and i
think java reflection of the actual objects and source-code involved is
important too. ultimately a C++ implementation will be necessary to
support the larger and faster graphs (ex: word-level detail of twitter
messages and web content) as well as visualizing them with high
frame-rates and detail.

i think it would be great to include the Neno/Fhat material somehow...

Marko A. Rodriguez

unread,

Feb 5, 2010, 4:10:15 PM2/5/10

to gremli...@googlegroups.com

Hi Seth,

>> i'm considering it. JUNG is used in several scientific
>> applications ( http://sourceforge.net/apps/trac/jung/wiki/ProjectsUsingJUNG
>> ) and has a lot of built-in algorithms (like pagerank and maxflow)
>> and visualizations (Swing and Java3D). also IMO is the lowest-
>> common-denominator graph and hypergraph API, and in order to use
>> its features with Gremlin i would additionally need a Gremlin ->
>> JUNG adapter (ex: to visualize Gremlin output as I'd like to do).

I need to write some documentation on how to use Gremlin from within a
Java application. Its relatively easy:

GremlinEvaluator ge = new GremlinEvaluator();
List results = ge.evaluate("./outE/inV/...")

This way, people could query their JUNG graphs using Gremlin and still
have their JUNG graph to manipulate in their other Java apps.

Going the other way, it would be cool to have direct support for JUNG
graphs in Gremlin, as you say, for this reason --- e.g.:

gremlin> $_g := jung:open()
gremlin> g:load('some-graph-data.xml')
gremlin> jung:pagerank(0.85)

In short, have jung: namespaced Gremlin functions for each of the
algorithms in their algorithms package.

http://jung.sourceforge.net/doc/api/edu/uci/ics/jung/algorithms/importance/package-summary.html

When I was building Confluence (a particle swarm package in Java that
makes use of property graphs), I had a class to convert Confluence
graphs to JUNG graphs --- see: http://www.markorodriguez.com/docs/conf/api/conf/util/JungGraph.html

So then something that converts any property graph into a JUNG graph
may be cool:

gremlin> $_g := jung:convert($h)

And then $_g can be subjected to the graph analysis algorithms
provided by JUNG --- e.g. jung:pagerank($_g, 0.85). However, JUNG
graph algorithms are single-relational so PageRank doesn't make much
sense for multi-relational graphs.... BUT!!! you can always do this http://arxiv.org/abs/0806.2274
.. this is known as graph rewriting http://wiki.github.com/tinkerpop/gremlin/graph-rewriting
...

Thus, you can make use of that wonderful suite of JUNG algorithms for
processing Gremlin graphs (of course---this will only work for in-
memory graphs---or will it?! ... Perhaps making a wrapper to the
PropertyGraph model and JUNG will make it so that convert() is not
necessary and thus, for ANY property graph, you can do:

gremlin> jung:pagerank($_g, 0.85)

Now that would be awesome and totally worth the effort!

> it's because i prototype a general purpose graph-based browser/
> editor/communicator using Java/JUNG/Swing. relevant to your wanting
> to interface Gremlin with filesystems and XML documents, and i think
> java reflection of the actual objects and source-code involved is
> important too. ultimately a C++ implementation will be necessary to
> support the larger and faster graphs (ex: word-level detail of
> twitter messages and web content) as well as visualizing them with
> high frame-rates and detail.
> i think it would be great to include the Neno/Fhat material somehow...

Ha! I love that you love Neno/Fhat [ http://neno.lanl.gov ]. You don't
know how many people make fun of me for that monstrosity. :) ... I
love it though---its like a Frankenstein mutant from another dimension.

S H

unread,

Feb 5, 2010, 4:39:16 PM2/5/10

to gremli...@googlegroups.com

Marko A. Rodriguez wrote:
> Thus, you can make use of that wonderful suite of JUNG algorithms for
> processing Gremlin graphs (of course---this will only work for

> in-memory graphs---or will it?! ... Perhaps making a wrapper to the

> PropertyGraph model and JUNG will make it so that convert() is not
> necessary and thus, for ANY property graph, you can do:
>
> gremlin> jung:pagerank($_g, 0.85)
>
> Now that would be awesome and totally worth the effort!

well maybe Linked Processes can be used to make distributed equivalents
of JUNG's algorithms?

> Ha! I love that you love Neno/Fhat [ http://neno.lanl.gov ]. You don't
> know how many people make fun of me for that monstrosity. :) ... I
> love it though---its like a Frankenstein mutant from another dimension.

yes i absolutely do and i wonder why i must live in a world where Neno
isn't the universal architecture for computer hardware and software!

:)

is there any more Neno/Fhat documentation or code that's not on the website?

Marko A. Rodriguez

unread,

Feb 5, 2010, 5:10:51 PM2/5/10

to gremli...@googlegroups.com

Hey Seth,

I just created an implementation of JUNGs Graph interface:

http://github.com/tinkerpop/gremlin/blob/538201b3665583cf8531e572f7d092176233a113/src/main/java/com/tinkerpop/gremlin/models/ggm/jung/JungGraph.java

Now its possible to use the Jung algorithms package on Gremlin graphs!
Even disk-based graphs like Neo4j graphs!

Next, I'm going to create a function library called JungFunctions that
implements each of the Jung algorithm:

e.g. PageRank, KStepMarkov, Betweenness, Closeness, etc. etc.

Then we can do stuff like this:

gremlin> $_g := neo4j:open('/tmp/neograph')
gremlin> jung:pagerank($_g, 0.85)

Pretty cool, eh?

I haven't tested the implementation, so.... still need to do that. And
yes---I'll think about how this would work over LinkedProcess/
Gargamel-----my wheels are turning....

Take care,
Marko.

http://markorodriguez.com
http://linkedprocess.org

Marko A. Rodriguez

unread,

Feb 5, 2010, 5:23:38 PM2/5/10

to gremli...@googlegroups.com

Seth,

> yes i absolutely do and i wonder why i must live in a world where
> Neno isn't the universal architecture for computer hardware and
> software!

Exactly. One address space for all of computing --- everyone's
physical machines support the execution of a massive pool of
distributed abstract virtual machines executing distributed code
processing distributed data structures. Its a trip --- imagine how
different our computing lives would be!

*** Unfortunately, I never wrote more about it. I sorta of talk about
it more in this article http://arxiv.org/abs/0905.3378 .. and kinda,
maybe hint at it in this article http://arxiv.org/abs/0908.0373

See ya,
Marko.

http://arxiv.org/abs/0704.3395

Peter Neubauer

unread,

Feb 7, 2010, 3:22:52 PM2/7/10

to gremli...@googlegroups.com

Yeah,
Neno is utterly cool, if you like Frankensteins. And so will Gargamel be :)

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://gremlin.tinkerpop.com - The terminal to the Giant Global Graph.

Reply all

Reply to author

Forward