Have Gremlin Talk to Redis in Real Time while It's Walking the Graph


James Thornton

Aug 23, 2011, 2:34:44 AM
to gremli...@googlegroups.com
Instead of using Gremlin to do batch queries where you issue a query and update a map or group counter and wait for the results, you can create a user-defined Gremlin step that updates and interacts with Redis in real time. 

Of course you're not limited to just Redis -- this will work for Memcached or any external datastore. 

But Redis is blazing fast and supports a variety of data structures (http://redis.io/topics/data-types-intro), such as hashes, lists, and sets. And it has a rich set of commands (http://redis.io/commands) that allow you to do stuff like increment counters and append to lists so it's one of the more interesting options.
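
A minimal sketch of the idea in plain Java, not the redis.groovy gist itself: a two-hop walk that pushes counters and list entries to a store as each element is reached. `FakeRedis` is a hypothetical in-memory stand-in so the sketch runs without a server; with Jedis the corresponding calls would be `jedis.incr(key)` and `jedis.rpush(key, value)`. The key names and the small adjacency map are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a Redis connection, so the sketch runs
// without a server. With Jedis, the matching calls would be jedis.incr(key)
// and jedis.rpush(key, value).
class FakeRedis {
    final Map<String, Long> counters = new HashMap<>();
    final Map<String, List<String>> lists = new HashMap<>();

    // Mirrors INCR: add one to the counter and return the new value.
    long incr(String key) {
        return counters.merge(key, 1L, Long::sum);
    }

    // Mirrors RPUSH: append a value to the list stored at key.
    void rpush(String key, String value) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}

class SideEffectWalk {
    // Walks two hops out from start. Every vertex reached on the second hop
    // is recorded in the store immediately, rather than collected into a map
    // that is only returned once the whole traversal has finished.
    static void walk(Map<String, List<String>> out, String start, FakeRedis store) {
        for (String first : out.getOrDefault(start, Collections.<String>emptyList())) {
            for (String second : out.getOrDefault(first, Collections.<String>emptyList())) {
                store.incr("visits:" + second);
                store.rpush("paths:" + start, first + "->" + second);
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative adjacency lists (outgoing edges only, labels ignored).
        Map<String, List<String>> out = new HashMap<>();
        out.put("marko", Arrays.asList("vadas", "josh", "lop"));
        out.put("josh", Arrays.asList("ripple", "lop"));

        FakeRedis store = new FakeRedis();
        walk(out, "marko", store);

        System.out.println(store.counters);
        System.out.println(store.lists);
    }
}
```

Because the store is updated inside the loop, another process reading the same keys could see partial results while the walk is still in progress, which is the point of doing this as a side effect.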

You could also use something like Octobot (https://github.com/cscotta/Octobot) to send stuff to a task queue as Gremlin finds things along the way.

To see how this works, you need Jedis, the recommended Java/Groovy client (http://redis.io/clients). 

This is easy to set up and start playing with -- the only thing you need to do is download the Jedis jar file and copy it into your Gremlin lib directory. Here are the steps to try this out:

Make sure you have Redis up and running, and then...

Download the latest Jedis jar file from here (https://github.com/xetorthio/jedis/downloads)

Copy the jedis jar to your Gremlin lib directory
$ cp jedis-2.0.0.jar gremlin/target/gremlin-1.2-SNAPSHOT-standalone/lib

Download the redis.groovy example gist: https://gist.github.com/1164389
$ cd gremlin

Run the example script
$ ./gremlin.sh -e redis.groovy

 - James

Peter Neubauer

Aug 23, 2011, 2:45:05 AM
to gremli...@googlegroups.com

Nice demonstration of a side effect!

/peter

Sent from my phone.

James Thornton

Aug 24, 2011, 2:20:21 AM
to gremli...@googlegroups.com
> Of course you're not limited to just Redis -- this will work for Memcached or any external datastore.

Speaking of memcached, Amazon just announced the ElastiCache service (http://aws.amazon.com/about-aws/whats-new/2011/08/22/announcing-amazon-elasticache/) for EC2, which means you won't have to mess with hosting or distributing memcached yourself -- makes things super easy.

- James 

Marko Rodriguez

Aug 24, 2011, 4:30:10 PM
to gremli...@googlegroups.com
Hey,

> Instead of using Gremlin to do batch queries where you issue a query and update a map or group counter and wait for the results, you can create a user-defined Gremlin step that updates and interacts with Redis in real time.

That is pretty sweet. It's a way to make persistent/scalable "sideEffect" data structures using Gremlin. Super hot.

Question (to play devil's advocate): Why not just use a Java persistent collection? E.g. http://code.google.com/p/pcollections/ ... For example, for large graph ranking (1 million+ vertices), I use JDBM2's persistent HashMap so the groupCount(m) can scale.

Thoughts?,
Marko.

Peter Neubauer

Aug 24, 2011, 4:36:15 PM
to gremli...@googlegroups.com

+ 1 on that!

/peter

Sent from my phone.

stephen mallette

Aug 24, 2011, 4:46:18 PM
to gremli...@googlegroups.com
Marko, I thought James' work here was pretty hot too.  I know I've used external apps to warm a cache.  If I've got a graph and an architecture with Redis/memcached hooked into it, I could use James' approach and a host of scheduled Gremlin jobs to preload a cache and make it available to other services.  It actually opened up a new line of thinking for me in regards to what side-effects can do.

Stephen

Marko Rodriguez

Aug 24, 2011, 4:51:33 PM
to gremli...@googlegroups.com
> Marko, I thought James' work here was pretty hot too. I know I've used external apps to warm a cache. If I've got a graph and an architecture with Redis/memcached hooked into it, I could use James' approach and a host of scheduled Gremlin jobs to preload a cache and make it available to other services. It actually opened up a new line of thinking for me in regards to what side-effects can do.

Ah. Within a "services architecture," yes, that is pretty sweet. A way for the results of a Gremlin traversal to be exposed to services. Classy.

Huh... that's actually pretty crazy, as it sorta bypasses the need for Rexster's REST interface in some situations. For example, trigger a Gremlin query through Rexster that updates a Redis server. Then, when it completes, don't get the results streamed back as JSON through Rexster; simply ping Redis for the data. Also, multiple different services could have access to those side-effect results :/ ... hmmm... perhaps too spaghetti?

What else are you thinking?

Marko.

http://markorodriguez.com

stephen mallette

Aug 24, 2011, 5:22:19 PM
to gremli...@googlegroups.com
Yeah...James and I have kicked around this idea in the context of Rexster in conjunction with external caching like Redis.  Those ideas have finally found a spot in Kibbles:


but the general point of the idea lies in:


How "spaghetti" the Redis side gets, I guess, largely depends on the complexity of the solution, but generally speaking it seems like a reasonably sane thing to do.

The other thing I thought about in regards to side-effects and James' approach is that this seems like a perfectly good way to push data to a datamart for more traditional business intelligence/reporting tools to crunch the Gremlin output. It's largely the same process, just a different place for the output.

Daniel Quest

Aug 24, 2011, 8:42:57 PM
to gremli...@googlegroups.com
cool,

Something else to consider: multiple users can all push data from
Gremlin queries to Redis (or any cache, for that matter). That could make
targeted mashups possible. Given blueredis, in theory you could then query the
results again using Gremlin. Feels too computer science theory?

Daniel

Russell Jurney

Aug 24, 2011, 8:57:07 PM
to gremli...@googlegroups.com
I think Gremlin and Pacer should rely on a caching layer like this for
all kinds of things.

Russell Jurney
twitter.com/rjurney
russell...@gmail.com
datasyndrome.com

James Thornton

Aug 25, 2011, 4:25:17 AM
to gremli...@googlegroups.com
> What else are you thinking?

Instead of just writing to Redis, you could also have Gremlin query it and do element-level operations inline (comparisons, updates, filtering, etc.) instead of waiting for Gremlin to return and then doing the operations in your application code.

When you have multiple disparate datastores, using something like Redis as a bridge gives you a way to "join" the graph to the others while the query is running.  
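
A rough sketch of such an inline "join", under some loud assumptions: `FakeRedisSets` is a hypothetical in-memory stand-in for Redis sets (with Jedis the matching calls would be `jedis.sadd(key, member)` and `jedis.sismember(key, member)`), and the set is imagined to be filled by some other system. The traversal keeps only the neighbors that appear in that set, checking each element as it passes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Redis sets. With Jedis the matching calls would
// be jedis.sadd(key, member) and jedis.sismember(key, member).
class FakeRedisSets {
    private final Map<String, Set<String>> sets = new HashMap<>();

    void sadd(String key, String member) {
        sets.computeIfAbsent(key, k -> new HashSet<>()).add(member);
    }

    boolean sismember(String key, String member) {
        return sets.getOrDefault(key, Collections.<String>emptySet()).contains(member);
    }
}

class InlineJoinSketch {
    // Returns the out-neighbors of start that another system has flagged in
    // the shared store. The membership check happens per element, while the
    // traversal is running, not in application code afterwards.
    static List<String> flaggedNeighbors(Map<String, List<String>> out, String start,
                                         FakeRedisSets store, String setKey) {
        List<String> kept = new ArrayList<>();
        for (String neighbor : out.getOrDefault(start, Collections.<String>emptyList())) {
            if (store.sismember(setKey, neighbor)) {
                kept.add(neighbor);
            }
        }
        return kept;
    }
}
```

For example, if a billing system had called `sadd("active", "josh")`, then `flaggedNeighbors(out, "marko", store, "active")` would keep only `josh` among marko's neighbors. The set name `active` is invented for the example.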

- James

stephen mallette

Aug 25, 2011, 7:17:04 AM
to gremli...@googlegroups.com
James...I'm constantly looking for those bridges...this is a good one.

Pranav Shah

Aug 25, 2011, 11:06:59 AM
to gremli...@googlegroups.com
Question from a newbie:

I am going to use the Graph from "Defining a Property Graph".

The question is: Give me everything that was created('name') by 'marko' and anyone he 'knows'.
Assume that we are always looking for the same piece of information.
The answer will always be lop(3), ripple(5).

Is it better to do this once, save the result in Redis (or anything along those lines), and always look at Redis first, or is it better to keep querying through Gremlin every time?

Are there any drawbacks to this?

Thanks
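
The "look at Redis first" option in the question above is usually called cache-aside. A minimal sketch, assuming a plain `HashMap` in place of Redis (with Jedis: `jedis.get(key)` and `jedis.set(key, value)`) and a hypothetical `traversal` function in place of the actual Gremlin query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CacheAsideSketch {
    // Stand-in for Redis; with Jedis these would be jedis.get(key) and jedis.set(key, value).
    private final Map<String, String> cache = new HashMap<>();
    // Hypothetical stand-in for running the Gremlin traversal.
    private final Function<String, String> traversal;
    // Counts how often the (expensive) traversal actually ran.
    int traversalRuns = 0;

    CacheAsideSketch(Function<String, String> traversal) {
        this.traversal = traversal;
    }

    // Check the cache first; only walk the graph on a miss, then store the answer.
    String lookup(String queryKey) {
        String cached = cache.get(queryKey);
        if (cached != null) {
            return cached;
        }
        traversalRuns++;
        String result = traversal.apply(queryKey);
        cache.put(queryKey, result);
        return result;
    }

    // Must be called when the underlying graph changes, or lookups go stale.
    void invalidate(String queryKey) {
        cache.remove(queryKey);
    }

    public static void main(String[] args) {
        CacheAsideSketch sketch = new CacheAsideSketch(key -> "lop,ripple");
        System.out.println(sketch.lookup("created-by-marko-and-friends"));
        System.out.println(sketch.lookup("created-by-marko-and-friends"));
        System.out.println("traversal runs: " + sketch.traversalRuns);
    }
}
```

The main drawback of this pattern is staleness: once the graph changes, the cached answer stays wrong until something calls `invalidate`.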

Marko Rodriguez

Aug 25, 2011, 11:12:55 AM
to gremli...@googlegroups.com
Hi,

One of the pipes I plan to add is MemoizationPipe, which implements MetaPipe. This Pipe has an internal Map<Object,Object> that stores previous computations; when the same object comes through again, it simply looks up the result instead of computing it.

For example:

gremlin> g.v(1).out.out.name.toString()
==>[OutPipe, OutPipe, PropertyPipe(name)]
gremlin> g.v(1).out.out.name.memoize(2).toString()
==>[OutPipe, MemoizationPipe[OutPipe, PropertyPipe(name)]]
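
The planned pipe does not exist yet, so the following is only a sketch of the memoization idea in plain Java, not the Pipes API. The wrapped section of the pipeline is modeled as a `Function`, and the class name `MemoizingFunction` and its `computations` counter are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Wraps an expensive function and remembers the result for each input, so a
// repeated input is answered from the map instead of being recomputed.
class MemoizingFunction<S, E> implements Function<S, E> {
    private final Function<S, E> inner;
    private final Map<S, E> results = new HashMap<>();
    // Counts how often the wrapped function was really invoked.
    int computations = 0;

    MemoizingFunction(Function<S, E> inner) {
        this.inner = inner;
    }

    @Override
    public E apply(S input) {
        if (results.containsKey(input)) {
            return results.get(input);
        }
        computations++;
        E result = inner.apply(input);
        results.put(input, result);
        return result;
    }

    public static void main(String[] args) {
        MemoizingFunction<Integer, Integer> doubled = new MemoizingFunction<>(x -> x * 2);
        System.out.println(doubled.apply(3));
        System.out.println(doubled.apply(3)); // served from the map
        System.out.println("computations: " + doubled.computations);
    }
}
```

Inputs are used as map keys, so they need sensible `equals` and `hashCode` implementations for a repeated object to be recognized.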

While not related to your Redis-specific question, I thought you might like the thought. Also, it's a way for me to make a ticket and write an email at the same time :) -- two birds.

See ya,
Marko.

Marko Rodriguez

Aug 25, 2011, 11:13:59 AM
to gremli...@googlegroups.com
Oh. To extend this: perhaps the MemoizationPipe's internal map could be stored in Redis, if Jedis has a Map implementation that is backed by Redis (I'm not familiar with Jedis, so I speculate).

Marko.

Pranav Shah

Aug 25, 2011, 11:35:11 AM
to gremli...@googlegroups.com
Hi Marko,
  That sounds great. I am also assuming that there will be some sort of switch or something that cleans the cache because the Graph has changed, even if this is a manual process and does not happen automatically.

  I am working on a scenario where there will be a lot of reads / searches through traversals, but some parts of the data would change on a monthly basis. At that point, the cached results will not necessarily be true.

Thanks,
Pranav

Daniel Quest

Aug 25, 2011, 9:12:27 PM
to gremli...@googlegroups.com
One thing to think about before we build too much around this cool
idea. As far as I understand it, you start up the Redis server, then
connect with your Redis library. We don't want people to have to
install dependencies before they can just start using Gremlin. The
way it comes to you now is so great. Please correct me if
there is an easy way to have Redis (a C program) installed and
started before you run Maven and all the unit tests. There is
probably something about the packaging mechanism that I don't
understand that would enable this.

On another note, a memory pipe is a great idea Marko. Quick
question... let's say you run a traversal, and as you traverse you
modify or add nodes to the graph. How would you make sure that the
MemoizationPipe's map is in sync? Or is the objective to not be in
sync but instead hold the state of the DB at the time of the previous
traversal?

-Daniel

Marko Rodriguez

Aug 26, 2011, 11:02:02 AM
to gremli...@googlegroups.com
Hi,

> On another note, a memory pipe is a great idea Marko.  Quick
> question... let's say you run a traversal, and as you traverse you
> modify or add nodes to the graph.  How would you make sure that the
> MemoizationPipe's map is in sync?  Or is the objective to not be in
> sync but instead hold the state of the DB at the time of the previous
> traversal?

You would have to be smart to "reset" the MemoizationPipe. MemoizationPipe would be mainly for traversals that are non-modifying, that is, read-only traversals. And if the graph is being modulated by another thread, well, then that's life.
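
To make the "reset" point concrete, here is a small sketch in plain Java, with no claim about how the eventual pipe will do it. `ResettableMemo` is a hypothetical class that memoizes a vertex's out-degree over a mutable adjacency map; after the graph changes, it keeps returning the old answer until `reset()` is called.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ResettableMemo {
    // Live, mutable adjacency lists standing in for the graph.
    private final Map<String, List<String>> out;
    // Remembered answers; these can go stale when the graph is modified.
    private final Map<String, Integer> memo = new HashMap<>();

    ResettableMemo(Map<String, List<String>> out) {
        this.out = out;
    }

    // Returns the remembered out-degree if present, otherwise computes and stores it.
    int outDegree(String vertex) {
        Integer cached = memo.get(vertex);
        if (cached != null) {
            return cached;
        }
        int degree = out.getOrDefault(vertex, Collections.<String>emptyList()).size();
        memo.put(vertex, degree);
        return degree;
    }

    // Forget everything, so the next call reads the current state of the graph.
    void reset() {
        memo.clear();
    }
}
```

The memo has no way of noticing a modification on its own, so whoever changes the graph is responsible for calling `reset()`.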

Marko.

Daniel Quest

Aug 26, 2011, 12:46:05 PM
to gremli...@googlegroups.com
Agreed, thanks for clearing that up.

-Daniel
