[Gremlin/Pipes] Convert a "SQL" query to Gremlin

901 views
Skip to first unread message

Rob Elsner

unread,
May 4, 2012, 6:31:56 PM5/4/12
to gremli...@googlegroups.com
Hi,

I'm just looking for some ideas about the following structure and what a psuedo-SQL query would be:

Node {
   // various properties here, not relevant
}

Edge {
   HasA : Boolean
   HasB : Boolean
   Level : Float
}

Here is the sample "SQL" query:
select Node n where Edge.HasA = true and Edge.HasB = true and Edge.Level between currentLevel and currentLevel + ProximityDelta
 order by Edge.Level ASC;

This is what I send to the Gremlin groovy engine:

graph.E().filter { it.LEVEL >= 2.0 && it.LEVEL <= 4.0 && it.HasA == true && it.HasB == true }.sort { it.LEVEL }.collect { it.inVertex }

I'm curious what ideas anyone has on optimizing this conversion, or a better way to retrieve the inVertices when their Edges match certain properties?  We want all of the nodes from the entire graph, where they are the inVertex of an Edge matching given criteria.

Thank you in advance,
Rob Elsner

Nikhil

unread,
May 5, 2012, 1:22:45 AM5/5/12
to gremli...@googlegroups.com
Hey Rob,

I've been working on writing a Rails ActiveRecord connection adapter for Neo4j. Since ActiveRecord and Active Relation (AREL) is tightly coupled with SQL syntax, I had to fit my Gremlin queries within high-level DSL written initially for SQL.

I would suggest some sytax changes as per new syntax introduced with Gremlin 1.5.

Start with all edges (Although that isn't a good idea if your graph bloats in size, I would reduce my search space either by starting from a specific node or an edge using identifier(s) or indices). Further, it would also help in labeling these edges, so that you can have heterogeneous relationships in your graph.

g.E

Name this step as 'edges', we might want to come back
.as('edges')

Filter by LEVEL less than or equal to 4.0
.has('LEVEL', T.lte, 4.0)

Go back to 'edges', this shall now prune the result set to the ones satisfying above condition
.back('edges')

Filter by LEVEL greater than or equal to 2.0
.has('LEVEL', T.gte, 2.0)

Go back to 'edges'
.back('edges')

Filter by HasA condition, although I feel this could be renamed to 'A' for readability
.has('HasA', true)

Go back to 'edges'
.back('edges')

Filter by HasB criteria
.has('HasB', true)

Go back to 'edges'. This should emit all edges satisfying above 4 criteria
.back('edges')

Sort them by LEVEL
.sort{it.LEVEL}

Run an iterator to get edge iterator instead of a groovy list
._()

Get all end vertices
.inV

Final query would look like

g.E.as('edges').has('LEVEL', T.lte, 4.0).back('edges').has('LEVEL', T.gte, 2.0).back('edges').has('HasA', true).back('edges').has('HasB', true).back('edges').sort{it.LEVEL}._().inV

This query looks a little longer due to the back() steps. But it really helps when you're trying to achieve maintainability. Assume that your filter criteria are not fixed and can change over time. You might not want to go back and dig into your Gremlin query. Instead you could simply use a conditions Array and join them using back() steps at runtime. Later, when new conditions are added or existing ones are removed, it's just a matter of modifying your conditions Array!

HTH

--
Nikhil
--
Nikhil Lanjewar
Engineering Lead at YourNextLeap
http://yournextleap.com

Salvatore Piccione

unread,
May 5, 2012, 2:04:03 PM5/5/12
to gremli...@googlegroups.com
Hello Nikhil,

your reply is very helpful, anyway I cannot fully understand the usage of back() steps in your solutions. After a has() step, you already have the edges that are compliant with the filtering criterion you specified so you don't need to go back every time.
About the maintanability and the usage of a collection (array) of criteria, you could do something like this:

i = 0;
g.E.has(conditions[i].propertyName,conditions[i].condition,conditions[i].).sideEffect{i++}.loop(2){it.loops < conditions.length - 1}._().inV

WDYT?

Salvatore

2012/5/5 Nikhil <nikhil...@gmail.com>

Rob Elsner

unread,
May 6, 2012, 11:36:29 AM5/6/12
to gremli...@googlegroups.com
On Friday, May 4, 2012 11:22:45 PM UTC-6, Nikhil Lanjewar wrote:
Hey Rob,
 
Hi Nikhil!  Thank you for your input.
 
For the graphs we currently have they run around 5000 edges, they're relatively static.  An index on the two boolean properties would be easy, but I don't see a way in Gremlin to provide an index which lets me select a range of values from it so I think at some point it will be a linear scan anyway, though the other two properties filter out most of the edges so this shouldn't be much of a concern.
Filter by HasA condition, although I feel this could be renamed to 'A' for readability
 
 
The real names are different, I just wanted to use something simple. :)
 
 
g.E.as('edges').has('LEVEL', T.lte, 4.0).back('edges').has('LEVEL', T.gte, 2.0).back('edges').has('HasA', true).back('edges').has('HasB', true).back('edges').sort{it.LEVEL}._().inV

This query looks a little longer due to the back() steps. But it really helps when you're trying to achieve maintainability. Assume that your filter criteria are not fixed and can change over time. You might not want to go back and dig into your Gremlin query. Instead you could simply use a conditions Array and join them using back() steps at runtime. Later, when new conditions are added or existing ones are removed, it's just a matter of modifying your conditions Array!
 
 
This is one of the eventual goals, to remove the tight coupling between the caller and the query as we have now.  I'm a bit confused about the .back() calls but I will look at the documentation.
 
I like your solution quite a bit because it can be done in Java instead of through Groovy. 
 
Thank you for the information!
 
Rob

Marko A. Rodriguez

unread,
May 6, 2012, 12:09:16 PM5/6/12
to gremli...@googlegroups.com, gremli...@googlegroups.com
Hi,

> I like your solution quite a bit because it can be done in Java instead of through Groovy.

Would people be interested in a blog post on Gremlin Java? What you can do in Gremlin Groovy can be done in Gremlin Java with the only drawback being that closures (e.g. filter{}) are represented as anonymous inner classes (e.g. filter(new PipeFunction<Vertex,Boolean>() { public Boolean compute(Vertex arg) { ... })).

Thoughts?,
Marko.

http://markorodriguez.com

Pierre De Wilde

unread,
May 6, 2012, 12:49:00 PM5/6/12
to gremli...@googlegroups.com

Hi Marko,

I think so. A good guide to start with Gremlin Java for non-java developers will be very helful.

Thanks,
Pierre

Rob Elsner

unread,
May 6, 2012, 1:10:07 PM5/6/12
to gremli...@googlegroups.com

Would people be interested in a blog post on Gremlin Java? What you can do in Gremlin Groovy can be done in Gremlin Java with the only drawback being that closures (e.g. filter{}) are represented as anonymous inner classes (e.g. filter(new PipeFunction<Vertex,Boolean>() { public Boolean compute(Vertex arg) { ... })).

The reason I like the Groovy way is because filter{} is so clean.  I'm trying to abstract the Groovy and graph traversal, something akin to Hibernate Criteria queries, where you can pass in a higher-level form the search criteria and this gets mapped on to the graph filtering functions Gremlin provides.
 
Rob

pShah

unread,
May 6, 2012, 6:47:01 PM5/6/12
to gremli...@googlegroups.com
Hi Marko,
  I would definitely appreciate such a guide.  Most of the examples are in Gremlin and even after using the cheat sheet and the Java implementation document, it is hard to translate a Gremlin/Groovy code to Java.
  This is what I am currently using for references:
  1. Gremlin Steps (Cheat Sheet)
  2. Gremlin Methods (Cheat Sheet)
  3. JVM Language Implementations (Cheat Sheet)
  4. Using Gremlin through Java

  Thanks.

Nikhil

unread,
May 7, 2012, 3:47:03 AM5/7/12
to gremli...@googlegroups.com
Hello Salvatore,

back() steps might not seem directly useful in this case, but IMO, it's a good practice to have back() steps to desired step just before you want to process something on a particular result set. This is useful in conditions involving sub-condition pipes that emit something other than your result set (vertices, properties, etc) like:

g.E.as('edges').inV.has('id', 123).back('edges').outV.has('id', 456).back('edges')

Quoting my experience, I'm firing Gremlin traversals on a Neo4j REST server from a Ruby on Rails application. Here, I'm constructing the traversal String which is ultimately passed as a POST parameter to Neo4j REST server. In such a case, having back() steps for all conditions makes my traversals much robust and condition-agnostic. I would simply need to visit an already constructed chained query written in AREL in a binary tree fashion to generate a final traversal without bothering about the result set of my dynamic conditions.

I'd be happy to elaborate on my implementation of a Rails connection adapter for Gremlin traversals if needed. :)

Nikhil

unread,
May 7, 2012, 3:49:54 AM5/7/12
to gremli...@googlegroups.com
Hi Rob,

I have replied to Salvatore's email elaborating on a use-case where back() steps can be applied more effectively. Although, I find Gremlin's documentation coupled with Marko and Pierre's support more than sufficient for enlightenment! :)

Salvatore Piccione

unread,
May 7, 2012, 4:10:21 AM5/7/12
to gremli...@googlegroups.com
Thanks Nikhil,

this sheds light on your first reply! Indeed this is an approach that I commonly use for more complex traversals where the back() step is a must: for example I use it when my data are structured in a way that the attributes of an entity are stored as a tree (something like a RDF Graph) so, in order to select the entities that meet a set of criteria, I have to go back to the entity roots every time I finish to check a criterion so that I can start a new traversal to check the next criterion. For trivial traversals (basically the check of some conditions on Vertex or Edges) I rely on the query functionalities provided by the Tinkerpop implementation I use (OrientDB).

BTW, I'd like to see some more details (or simply the indication of some URLs) about the integration of Groovy and Java classes in the same project. Personally, I found some issues in integrating Groovy classes in a Java project and, mainly because of the short time I had for further investigations, I dediced to use Gremlin Java which is more efficient than using Gremlin through the Java Script Engine...

2012/5/7 Nikhil <nikhil...@gmail.com>
Reply all
Reply to author
Forward
0 new messages