Recursive Ancestor Query Terminating With Ancestors With No Input Edges

887 views
Skip to first unread message

Geary Sutterfield

unread,
Feb 9, 2017, 6:29:54 PM2/9/17
to Gremlin-users
I'm still trying to understand Gremlin, and am having trouble with constructing a query to retrieve all ancestors of a vertex, terminating with ancestors without input edges. I'm hoping someone can point me in the right direction. I've looked at several examples on the Web, but each one was different enough to cause them not to work for my case.

I'm working with a SIMPLE graph that consists of vertices labeled "event" and "data". Edges from "event" to "data" vertices represent output of a processing event. Edges from "data" to "event" vertices represent input to a processing event. A processing "event" can have one or more "data" input edges (for example, a zip process event), and also one or more "data" output edges (for example, an unzip process event).

I'm starting with a final output "data" vertex, and I want to first retrieve all "event" vertices with an outgoing edge to the this "data" vertex. These are the immediate ancestor processing events that created the data. Then I want to recursively retrieve all ancestors of these "event" vertices, terminating with "event" vertices without any input edges. Examples of these "event" vertices without input edges would be data that was ingested into the system from external sources. Multiple ingested data files may exist in the ancestor tree to the final output "data" event that I use as initial input to the query. So the full ancestor graph may resemble a large V shape.

I'm wondering if this could be described as an "all ancestor" query. Possibly the same thing. The key is traversing backwards, i.e., in the ancestor direction. The graph I'm working with has only one edge between any two vertices. A "data" to "event" edge is an input, and a "event" to "data" edge is an output. So I'm hoping that will simplify the query.

So far, I tried queries that look similar to this one without getting all the results I need:
g.V(4308).repeat(until(outE().count().is(0))

I can supply a more detailed description if that would be useful. Any help or pointers on how to approach the query would be appreciated.

Thanks

Marko Rodriguez

unread,
Feb 9, 2017, 6:53:27 PM2/9/17
to gremli...@googlegroups.com
Hi,

I didn’t read your email, but I read the title and maybe, just maybe, the query below explains to you what you want to know.

gremlin> g.V().until(outE().count().is(0)).
               repeat(out()).
           values('name')
==>lop
==>vadas
==>ripple
==>lop
==>vadas
==>lop
==>ripple
==>lop
==>ripple
==>lop
gremlin> g.V().repeat(out()).
               until(outE().count().is(0)).
           values('name')
==>lop
==>vadas
==>ripple
==>lop
==>ripple
==>lop
==>lop
gremlin>

Repeat() is cool in that if the until() is before it, it means: “while-do” and if the until() is after it, it means “do-while.”

Note that in the traversal above you are only emitting those vertices that have no outgoing edges (i.e. descendent-less ancestors).

Did I get luckily and solve your problem by only reading the title?

Marko.
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/010abf1c-e8c4-4819-b3e5-283a5a0008f9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Geary Sutterfield

unread,
Feb 10, 2017, 10:31:26 AM2/10/17
to Gremlin-users
Thanks, Marko. I learned something new from your examples, but I don't believe it's quite what I need. I created a diagram of the test graph that I'm using, and added examples of the queries I need and what they should return. (A picture is worth a thousand words.)

The node id inputs to the queries represent dataset nodes for which a FULL processing history is needed, all the way back to when the ancestor datasets were ingested into the system. (The example graph includes only one ingested dataset in the history, but, in general, there could be more than one.) The processing history can be complex, but, at this point, we don't expect any cycles.

The graph will be a simple, directed graph, as shown in the first example. (I didn't add the edge arrows on the last two examples.)

I hope this helps in understanding the nature of the graph and the query we're trying to create.

Geary Sutterfield

unread,
Feb 10, 2017, 10:33:59 AM2/10/17
to Gremlin-users

Sorry, forgot to attach the file with graph and examples queries. Here it is.


On Thursday, February 9, 2017 at 6:53:27 PM UTC-5, Marko A. Rodriguez wrote:
queries.png

Geary Sutterfield

unread,
Feb 10, 2017, 2:55:50 PM2/10/17
to Gremlin-users
Made some progress on this query. I've attached a better visual of the graph I'm using for testing that includes vertex ids. (Produced from Gephi.) I used the subgraph step example in the TinkerPop documentation to create what I need, I think. The only issue is that I have to include a times(nn) step to ensure I get all of the nodes in the subgraph, and I'm not sure how to generalize it to get all the nodes, regardless of the size of the graph tree.

Here's what I did with starting node id 4264. Looks like it returns the list of nodes I'm looking for (all ancestors).

gremlin> subgraph = g.V(4264).repeat(__.inE().subgraph('subGraph').outV()).times(15).cap('subGraph').next()

==>tinkergraph[vertices:6 edges:5]

gremlin> sg = subgraph.traversal(standard())

==>graphtraversalsource[tinkergraph[vertices:6 edges:5], standard]

gremlin> sg.V().sort()

==>v[4184]

==>v[4256]

==>v[4264]

==>v[4280]

==>v[8280]

==>v[8352]


Same query except starting with node id 12376


gremlin> subgraph = g.V(12376).repeat(__.inE().subgraph('subGraph').outV()).times(15).cap('subGraph').next()

==>tinkergraph[vertices:8 edges:7]

gremlin> sg = subgraph.traversal(standard())

==>graphtraversalsource[tinkergraph[vertices:8 edges:7], standard]

gremlin> sg.V().sort()

==>v[4184]

==>v[4256]

==>v[4264]

==>v[4280]

==>v[4288]

==>v[8280]

==>v[8352]

==>v[12376]


Geary


On Thursday, February 9, 2017 at 6:53:27 PM UTC-5, Marko A. Rodriguez wrote:
test-graph.png

Marko Rodriguez

unread,
Feb 10, 2017, 3:40:54 PM2/10/17
to gremli...@googlegroups.com
Hi.

Whoa. That is the craziest thing I’ve ever seen. You are projecting a graph into a subgraph, then analyzing that subgraph’s vertices via a traversal on that subgraph.  Wow. Its amazing what people come up with.

Here is a more elegant way of doing what you want to do:

g.V(4264).emit().repeat(out()).times(15)

I see you are ordering on id with your Groovy sort(), if so, do:

g.V(4264).emit().repeat(out()).times(15).
  order().by(id)
Marko.

For more options, visit https://groups.google.com/d/optout.
<test-graph.png>

Geary Sutterfield

unread,
Feb 12, 2017, 12:58:21 PM2/12/17
to Gremlin-users
Hi Marko

The test graph I'm traversing is a simple, directed graph, as shown in the previous attached test-graph.png. The query you suggested is giving me the child vertices in my graph, not ancestor vertices. Details below.

I've just been working with Titan/TinkerPop/Gremlin 2-3 weeks. So I'm a novice and still learning. How to construct the Gremlin queries is still elusive for me. I understand the concepts as outlined in several of your presentations, transformations, composite functions, etc., since I was a math major. And I understand the traversal engine steps individually. But I seem to have trouble putting the steps together in the right order. In the Gremlin console, I get a lot of zero results returned or exceptions/stack traces, but the messages don't seem helpful in correcting the query.

I was able to get the query you suggested working with sort:

gremlin> g.V(4264)

==>v[4264]

gremlin> g.V(4264).emit().repeat(out()).times(15).sort()

==>v[4112]

==>v[4208]

==>v[4208]

==>v[4264]

==>v[4288]

==>v[4336]

==>v[8304]

==>v[8304]

==>v[12376]


BUT, those are the CHILD vertices (to the right of 4264 in the test-graph.png diagram). I'm looking for ANCESTOR vertices (to the left of 4264, including 4264 itself). If starting from 4264, I need these vertices to be output with no duplicates.

==>v[4184]

==>v[4256]

==>v[4264]

==>v[4280]

==>v[8280]

==>v[8352]


Except for the subgraph query, I haven't been able to construct a query to get this output. But I realize I should NOT have to create a subgraph. Should be able to traverse the original graph.


Thanks for your help.


Geary

Marko Rodriguez

unread,
Feb 12, 2017, 1:47:51 PM2/12/17
to gremli...@googlegroups.com
Hello,

Instead of out(), try in()? That is go “up” instead of “down” the tree?

Also, sort() is a Groovy meta-method on Iterator. You should use order().by(id) instead.

HTH,
Marko.

Geary Sutterfield

unread,
Feb 13, 2017, 8:46:47 PM2/13/17
to Gremlin-users
Well, in my gremlin console, replacing out() with in() or inE() doesn't seem to work:

gremlin> g.V(4264).emit().repeat(in()).times(15)

groovysh_parse: 2: unexpected token: in @ line 2, column 25.

g.V(4264).emit().repeat(in()).times(15)

^

1 error

Display stack trace? [yN] n

  

gremlin> g.V(4264).emit().repeat(inE()).times(15)

==>v[4264]

==>e[6q3-6e0-4r9-3ag][8280-OUT->4264]

Expected traverser of Titan vertex but found: e[6q3-6e0-4r9-3ag][8280-OUT->4264]

Display stack trace? [yN] n

gremlin>


But the combo of inE() and outV() seems to work:


gremlin> g.V(4264).emit().repeat(inE().outV()).times(15)

==>v[4264]

==>v[8280]

==>v[8352]

==>v[4256]

==>v[4280]

==>v[4184]


Also. using an until clause to stop when a vertex is reached with no input edges seems to work. I don't know if this is more efficient or preferable to using the times() step. Is this the best way when you don't know how many iterations are needed to reach vertices without input edges?


gremlin> g.V(4264).emit().repeat(inE().outV()).until(inE().count().is(0))

==>v[4264]

==>v[8280]

==>v[8352]

==>v[4256]

==>v[4280]

==>v[4184]


Geary



Marko Rodriguez

unread,
Feb 13, 2017, 8:50:14 PM2/13/17
to gremli...@googlegroups.com
repeat(__.in()) cause “in” is a reserved word in Groovy.

Marko.


Reply all
Reply to author
Forward
0 new messages