Gremlin over MongoDB

Marko A. Rodriguez

Dec 27, 2009, 7:25:02 PM

to gremli...@googlegroups.com

Hi everyone,

I've spent the last couple of days implementing the General Graph Model [ http://wiki.github.com/tinkerpop/gremlin/the-general-graph-model ] for the document key/value database MongoDB [ http://www.mongodb.org/ ].

In Mongo, there are two collections: a vertex and an edge collection. The JSON data model used to model a property graph can be inferred from

http://github.com/tinkerpop/gremlin/raw/master/trunk/src/test/resources/com/tinkerpop/gremlin/db/mongo/graph-example-1.json

which is diagrammed in http://tinkerpop.com/docs/graph-example-1.jpg .
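
For readers who cannot open the JSON file, the two-collection layout can be sketched roughly as follows. This is only an illustration in plain Python with no MongoDB driver involved: the field names `_id`, `label`, `outV`, and `inV` and the helper functions are assumptions, not necessarily what graph-example-1.json uses. The vertex and edge ids are the ones printed by the console session later in this post.

```python
# Two "collections", modeled here as plain lists of dicts.
# Field names (_id, label, outV, inV) are hypothetical.
vertices = [{"_id": i} for i in range(1, 7)]

edges = [
    {"_id": 7, "label": "knows", "outV": 1, "inV": 2},
    {"_id": 8, "label": "knows", "outV": 1, "inV": 4},
    {"_id": 9, "label": "created", "outV": 1, "inV": 3},
    {"_id": 10, "label": "created", "outV": 4, "inV": 5},
    {"_id": 11, "label": "created", "outV": 4, "inV": 3},
    {"_id": 12, "label": "created", "outV": 6, "inV": 3},
]


def out_edges(vertex_id, label=None):
    """Return the edges leaving vertex_id, optionally filtered by label."""
    return [
        e for e in edges
        if e["outV"] == vertex_id and (label is None or e["label"] == label)
    ]


def out_neighbors(vertex_id, label=None):
    """Return the ids of the vertices reached by following out-edges."""
    return [e["inV"] for e in out_edges(vertex_id, label)]


def format_edge(e):
    """Render an edge in the console's style, e.g. e[7][1-knows->2]."""
    return "e[{}][{}-{}->{}]".format(e["_id"], e["outV"], e["label"], e["inV"])
```

Note that every traversal step in this layout is a query against the edge collection rather than a pointer dereference, which is one plausible reason a document store is slower here than a native graph store.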

The current implementation passes all the TestSuite cases [ http://wiki.github.com/tinkerpop/gremlin/the-general-graph-model-test-suite ], though the Index.java interface has yet to be implemented. Note that the implementation is not yet efficient, and many methods take a long time to complete. This is partly because MongoDB is not a "true" graph data structure, and probably also because of some poor choices in my first implementation.

If you are in the Gremlin console, and have MongoDB running, you can use the following mongo:open() function:

gremlin> $g := mongo:open('127.0.0.1', 27017, 'mongo_tests')
==>mongograph[db:mongo_tests]
gremlin> g:load($g, 'src/test/resources/com/tinkerpop/gremlin/model/parser/graph-example-1.xml')
==>true
gremlin> $g/V
==>v[5]
==>v[6]
==>v[2]
==>v[1]
==>v[4]
==>v[3]
gremlin> $g/E
==>e[7][1-knows->2]
==>e[8][1-knows->4]
==>e[9][1-created->3]
==>e[10][4-created->5]
==>e[11][4-created->3]
==>e[12][6-created->3]

Next steps are to optimize the implementation, implement the indexing interface, and provide documentation. Finally, once the MongoDB implementation is solid and well understood, I think implementing a CouchDB graph will be good.

Any thoughts, suggestions, etc. are welcome.

Take care,

Marko.

http://markorodriguez.com

http://tinkerpop.com

Peter Neubauer

Dec 28, 2009, 7:27:41 AM

to gremli...@googlegroups.com

Awesome,
anything that propagates more Thinking in Graphs is good IMHO. The
speed and other aspects of using MongoDB and CouchDB for graphs are of
course another story, but it is great to have some means of showing
the generality of graph concepts in the NOSQL space.

Good work!

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.

Alexy Khrabrov

Dec 28, 2009, 11:32:12 AM

to gremli...@googlegroups.com

Marko -- this is great, glad my Mongo example was useful. I'm pretty sure in-memory Mongo will not be much slower than Neo, if at all, as long as the graph is properly represented and, most importantly, indexed. If you index each collection on its key, it will be much faster than before. See ensureIndex() in the Mongo docs.
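
To illustrate why indexing each collection on its key matters, here is a rough Python sketch (plain dicts, no MongoDB involved) comparing a full scan of an edge collection with a lookup through a prebuilt index on the out-vertex key. The field names `outV` and `inV` and all helper names are hypothetical. The dict is only a stand-in: MongoDB's own indexes are B-trees created through ensureIndex(), not hash tables built by hand.

```python
from collections import defaultdict


def scan_out_edges(edges, vertex_id):
    """Unindexed lookup: examines every edge document, O(number of edges)."""
    return [e for e in edges if e["outV"] == vertex_id]


def build_index(edges, key):
    """Group edge documents by the value of `key`, like a single-field index."""
    index = defaultdict(list)
    for e in edges:
        index[e[key]].append(e)
    return index


def indexed_out_edges(index, vertex_id):
    """Indexed lookup: one probe, independent of the collection size."""
    return index.get(vertex_id, [])


# A synthetic chain graph 0 -> 1 -> 2 -> ... used only for illustration.
edges = [{"_id": i, "outV": i, "inV": i + 1} for i in range(100000)]
out_index = build_index(edges, "outV")

# Both lookups return the same documents; only the cost differs.
assert scan_out_edges(edges, 500) == indexed_out_edges(out_index, 500)
```

A graph traversal repeats this lookup at every step, so the difference between scanning and probing an index compounds quickly.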

Mongo does not guarantee the I in ACID: for in-RAM speed, data is flushed to disk as needed or on demand. It is very fast, though, building an in-memory cache for the data in use that expands to as much RAM as is available.

Since it talks over a port via a protocol, clients exist for the Java world (Java, Scala, Clojure) as well as for C, Ruby, and more.

I started using Mongo earlier for a JSON graph representation of Twitter data, and find it a natural fit for my algorithms. I believe it complements Neo; both have use cases where they shine.
We'll continue using document databases for graph storage and querying.

Cheers,
Alexy

Marko A. Rodriguez

Dec 28, 2009, 1:01:40 PM

to gremli...@googlegroups.com

Hi,

Yes---thanks for your Twitter graph use case in MongoDB. It made me think "outside the box" about using Gremlin over data management systems other than those designed specifically for graphs. VertexDB over Tokyo Cabinet [ http://github.com/stevedekorte/vertexdb ] was another point of inspiration.

Note though, as it stands right now, the Gremlin/MongoDB connector is not a "general purpose connector" that will work over ANY Mongo data structure---it's particular to one type of data model (the one exemplified in http://github.com/tinkerpop/gremlin/raw/master/trunk/src/test/resources/com/tinkerpop/gremlin/db/mongo/graph-example-1.json ). If we can find a "general purpose connector", that would be most excellent, as it would make Gremlin more generally useful to the MongoDB community... Thoughts?

Finally, Emil pointed me to MongoDB references as a way to get direct pointers between JSON documents --- http://www.mongodb.org/display/DOCS/DB+Ref . I haven't looked into it yet, but things like this could help yield a usable MongoDB/Gremlin system.
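
As a rough idea of what such references might look like for a graph: by convention, a MongoDB DBRef is a small sub-document with a `$ref` field (the collection name) and an `$id` field (the `_id` of the target document). The Python sketch below resolves such pointers against in-memory dicts standing in for collections. The collection names, the `outV` and `inV` fields, and the `resolve` helper are assumptions made for illustration, not part of the Gremlin connector.

```python
# In-memory stand-ins for MongoDB collections, keyed by _id.
db = {
    "vertices": {1: {"_id": 1}, 2: {"_id": 2}},
    "edges": {
        7: {
            "_id": 7,
            "label": "knows",
            # DBRef-style pointers to the tail and head vertices.
            "outV": {"$ref": "vertices", "$id": 1},
            "inV": {"$ref": "vertices", "$id": 2},
        }
    },
}


def resolve(db, ref):
    """Follow a DBRef-style pointer and return the referenced document."""
    return db[ref["$ref"]][ref["$id"]]


edge = db["edges"][7]
head = resolve(db, edge["inV"])
print(head)
```

Resolving a reference is still a lookup by `_id` in the named collection, so this would tidy up the data model rather than remove the per-step query.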

Take care,

Marko.

Peter Neubauer

Dec 28, 2009, 1:48:55 PM

to gremli...@googlegroups.com

Hi Alexy,
do you have the Twitter dataset available? We were a bit late to
download it, if I remember right. Maybe you could put it somewhere or
send it over for some tests with Neo4j?

Also, I think it would be great to have more connectors to Gremlin.
After all, Gremlin is concentrating on solving real problems and
questions the graphy way. And it may actually become the first query
language to support all NOSQL data stores! That would start
shifting attention from NOSQL and the pure scaling aspect of things
onto the data model and its manipulations - graphs being the only
really valuable and workable abstraction besides the relational model in
practice.

Cheers,

/peter neubauer


Alexy Khrabrov

Jan 5, 2010, 9:14:00 PM

to gremli...@googlegroups.com


Peter -- that old dataset is history. Twitter has long since provided a Streaming API, which lets you get a significant portion of all tweets, or indeed all tweets for 50,000 users, or any tweets matching a filter. We're doing that, and slicing and dicing the data in a variety of ways. You can easily subscribe and very quickly gather a significant dataset. My plans include testing Neo4j, too. My graphs are subsets, and if Twitter allows, I'd share them as part of published research. I'm going to use Gremlin for algorithms that identify patterns on Twitter, and it's very interesting to leverage the graph model and come up with a simple set of operations for exploratory data analysis on graphs.

Cheers,
Alexy

Peter Neubauer

Jan 6, 2010, 5:57:30 AM

to gremli...@googlegroups.com

Cool Alexy,
it would be great if some of these could find their way into the Gremlin
wiki as examples. Please keep us updated on your progress!

Cheers,

/peter neubauer


shishya

Feb 17, 2014, 8:54:01 AM

to gremli...@googlegroups.com

Hi,

I know this is quite an old topic, but I am trying to connect to MongoDB from OrientDB's Gremlin,
and the syntax above is not valid.

gremlin> g := mongo:open('192.168.1.15', 10050, 'mongo_tests')
groovysh_parse: 47: unexpected token: = @ line 47, column 4.
g := mongo:open('192.168.1.15', 10050, 'mongo_tests')
   ^

1 error
Display stack trace? [yN]

Marko Rodriguez

Feb 17, 2014, 10:19:19 AM

to gremli...@googlegroups.com

Hello,

:= is not valid assignment syntax. Please use =. Moreover, what is mongo:open? That will not be valid syntax either. I recommend reading over the Gremlin/Groovy syntax, as you have some basic problems with your code:

http://gremlindocs.com

http://gremlin.tinkerpop.com

And in particular, please see Stephen Mallette's post on Gremlin/Groovy + MongoDB.

http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/

Good luck,

Marko.

http://markorodriguez.com
