How to get all nodes and edges in a k-hop ego network of a node?

3,379 views
Skip to first unread message

Apple Little

unread,
Jun 30, 2015, 12:10:06 PM6/30/15
to gremli...@googlegroups.com
Hi. I am new to Gremlin-Tinkerpop 3. I am reading through the documentation of Tinkerpop 3 and feel a little overwhelmed. I am looking to create a gremlin query that can return all the nodes and edges in a k-hop ego network of a node, where the definition of a k-hop ego network of a node is (1) all the neighbors (and edges) within k hops when performing breadth-first-search from this node, and (2) all edges between the nodes in this network. I am not sure how I can achieve this with a gremlin query (or any Tinkerpop 3 alternative such as vertexprogram). The tricky part seems to be (2). 

Can anyone help me out here? Many thanks!!

Daniel Kuppitz

unread,
Jun 30, 2015, 12:15:03 PM6/30/15
to gremli...@googlegroups.com
Can you give an example of the expected output? Maybe use the modern graph or another small sample graph to keep it simple.

Cheers,
Daniel


On Tue, Jun 30, 2015 at 6:05 PM, Apple Little <littl...@gmail.com> wrote:
Hi. I am new to Gremlin-Tinkerpop 3. I am reading through the documentation of Tinkerpop 3 and feel a little overwhelmed. I am looking to create a gremlin query that can return all the nodes and edges in a k-hop ego network of a node, where the definition of a k-hop ego network of a node is (1) all the neighbors (and edges) within k hops when performing breadth-first-search from this node, and (2) all edges between the nodes in this network. I am not sure how I can achieve this with a gremlin query (or any Tinkerpop 3 alternative such as vertexprogram). The tricky part seems to be (2). 

Can anyone help me out here? Many thanks!!

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/dcaee5c9-1de9-4c6f-ae90-3987450e1ab7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Apple Little

unread,
Jun 30, 2015, 1:48:08 PM6/30/15
to gremli...@googlegroups.com
Sure. Using the modern graph as an example. The ego network ignores edge directions.

The 1-hop ego network of v1 is:
v1, e7, v2
v1, e8, v4
v1, e9, v3
v4, e11, v3

The 2-hop ego network of v1 is:
v1, e7, v2
v1, e8, v4
v1, e9, v3
v4, e11, v3
v4, e10, v5
v6, e12, v3

It would be great if the result puts all (unique) vertices into a list, and all (unique) edges into another list, but it is not required. The most important thing is to get all vertices and edges correctly.

Thanks!!

Apple Little

unread,
Jun 30, 2015, 2:08:52 PM6/30/15
to gremli...@googlegroups.com
Is subgraph (http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#subgraph-step) starting from a vertex the one I should use? Do I only need to replace ".inE()" to ".bothE()" and "times(3)" to "time(k)"?

Thanks!!

gremlin> subGraph = g.V(3).repeat(__.inE().subgraph('subGraph').outV()).times(3).cap('subGraph').next() //(1) ==>tinkergraph[vertices:4 edges:4] gremlin> sg = subGraph.traversal(standard()) ==>graphtraversalsource[tinkergraph[vertices:4 edges:4], standard] gremlin> sg.E() ==>e[8][1-knows->4] ==>e[9][1-created->3] ==>e[11][4-created->3] ==>e[12][6-created->3]

Daniel Kuppitz

unread,
Jun 30, 2015, 2:31:42 PM6/30/15
to gremli...@googlegroups.com
Almost. You have to add a few more steps to get (2) working.

1-Hop:

gremlin> sg = g.V(1).store("v").repeat(
                  bothE().store("e").subgraph("sg").otherV().store("v")
         ).times(1).cap("v").unfold().bothE().where(without("e")).as("edge").otherV().where(within("v")).select("edge").subgraph("sg").cap("sg").next()

==>tinkergraph[vertices:4 edges:4]
gremlin> sg.traversal().V().outE().inV().path().by(id)
==>[1, 9, 3]
==>[1, 7, 2]
==>[1, 8, 4]
==>[4, 11, 3]

2-Hop:

gremlin> sg = g.V(1).store("v").repeat(
                  bothE().store("e").subgraph("sg").otherV().store("v")
         ).times(2).cap("v").unfold().bothE().where(without("e")).as("edge").otherV().where(within("v")).select("edge").subgraph("sg").cap("sg").next()

==>tinkergraph[vertices:6 edges:6]
gremlin> sg.traversal().V().outE().inV().path().by(id)
==>[1, 9, 3]
==>[1, 7, 2]
==>[1, 8, 4]
==>[4, 10, 5]
==>[4, 11, 3]
==>[6, 12, 3]

It would be great if the result puts all (unique) vertices into a list, and all (unique) edges into another list

That's obviously pretty easy: sg.V().toList() and sg.E().toList().

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.

Daniel Kuppitz

unread,
Jun 30, 2015, 8:31:29 PM6/30/15
to gremli...@googlegroups.com
While waiting for some Hadoop jobs I came up with a slightly different (nicer) approach:

hops = 2
sg = g.V(1).emit().repeat(
    bothE().store("e").subgraph("sg").otherV()
).times(hops).aggregate("v").bothE().
    where(without("e")).as("edge").otherV().

    where(within("v")).select("edge").subgraph("sg").
    cap("sg").next()

And if all you want is the list of vertices and edges:

g.V(1).emit().repeat(
    bothE().dedup().store("edges").otherV()
).times(hops).dedup().aggregate("vertices").bothE().
    where(without("edges")).as("edge").otherV().
    where(within("vertices")).select("edge").store("edges").
    cap("vertices", "edges")

In practice it looks like that:

gremlin> (1..2).each { def hops ->
gremlin>     println "\n=== ${hops}-hops ===\n"
gremlin>     g.V(1).emit().repeat(
gremlin>         bothE().dedup().store("edges").otherV()
gremlin>     ).times(hops).dedup().aggregate("vertices").bothE().
gremlin>         where(without("edges")).as("edge").otherV().
gremlin>         where(within("vertices")).select("edge").store("edges").
gremlin>         cap("vertices", "edges").each(System.out.&println)
gremlin> }; null


=== 1-hops ===

[vertices: [v[1], v[3], v[2], v[4]],
 edges:    [e[9][1-created->3], e[7][1-knows->2], e[8][1-knows->4], e[11][4-created->3]]]


=== 2-hops ===

[vertices: [v[1], v[3], v[4], v[6], v[2], v[5]],
 edges:    [e[9][1-created->3], e[7][1-knows->2], e[8][1-knows->4], e[11][4-created->3], e[12][6-created->3], e[10][4-created->5]]]



Cheers,
Daniel

Apple Little

unread,
Jul 1, 2015, 9:44:52 AM7/1/15
to gremli...@googlegroups.com
Great! Thank you very much! I wouldn't be able to come up with these myself given my current expertise level of Gremlin. ;-)

Apple Little

unread,
Jul 6, 2015, 11:40:45 AM7/6/15
to gremli...@googlegroups.com
Is there a way to request the traversal to stop when the number of vertices in the ego network reaches a threshold X, and/or to request the traversal to only go through up to Y edges of each vertex? 
The use case is when the graph is very large, we may not want all vertices and edges in the ego network to be returned. So we may want to limit the number of vertices and/or the number of edges. 

Many thanks!!

Daniel Kuppitz

unread,
Jul 6, 2015, 12:23:10 PM7/6/15
to gremli...@googlegroups.com
... request the traversal to only go through up to Y edges of each vertex

This one is easy:

g.V(1).emit().repeat(
    local(bothE().limit(Y)).dedup().store("edges").otherV()
).times(hops).dedup().aggregate("vertices").bothE().
    where(without("edges")).as("edge").otherV().
    where(within("vertices")).select("edge").store("edges").
    cap("vertices", "edges")
 request the traversal to stop when the number of vertices in the ego network reaches a threshold X

This one is harder, but it should work somehow. However, I guess limiting the number of traversed edges per vertex may already help, hence I'm not going to spend time on the second query (yet).

Cheers,
Daniel



Apple Little

unread,
Jul 6, 2015, 1:57:18 PM7/6/15
to gremli...@googlegroups.com
Thanks!!

For limiting the total number of vertices, is it possible to use limit(X) following aggregate("vertices") if I want to first traverse all the vertices in the ego network and then just select a subset of these vertices and get all edges between them? 

What if I want to  get a random sample (up to X vertices) of the vertices in the ego network and all edges between the sampled vertices, can I use sample(X) in place of limit(X)? 

Similarly can I use local(bothE().sample(Y))?

Is sample a lot slower than limit?

Thanks a lot!

Daniel Kuppitz

unread,
Jul 6, 2015, 5:01:43 PM7/6/15
to gremli...@googlegroups.com
For limiting the total number of vertices, is it possible to use limit(X) following aggregate("vertices")

I haven't tried that, but I don't think that it would make much sense, since aggregate uses eager evaluation. See:

gremlin> g.V(1).out()
==>v[3]
==>v[2]
==>v[4]
gremlin> g.V(1).out().aggregate("x").limit(1).cap("x").unfold()
==>v[3]
==>v[2]
==>v[4]

can I use sample(X) in place of limit(X)? 

You can, but in the end your edges collection would contain edges without corresponding vertices in the vertices collection.

Similarly can I use local(bothE().sample(Y))?

This would make more sense, yeah.

Is sample a lot slower than limit?

Slower yes, but a lot? I don't think so. Try it out...

Cheers,
Daniel



Apple Little

unread,
Oct 16, 2015, 4:10:15 PM10/16/15
to Gremlin-users
Hi. I am back at it after a few months of other work. Now I have a new question/problem. I need the ego network query result to be in a nice JSON format like the following:

{

"nodes":[

{"id":"6","name":"peter","age":35},

{"id":"5","name":"ripple","lang":"java"},

{"id":"3","name":"lop","lang":"java"},

{"id":"4","name":"josh","age":32},

{"id":"2","name":"vadas","age":27},

{"id":"1","name":"marko","age":29}

],

"edges":[

{"source":"1","target":"2","eid":"0","label":"knows","weight":0.500000},

{"source":"1","target":"4","eid":"2","label":"knows","weight":1.000000},

{"source":"1","target":"3","eid":"1","label":"created","weight":0.400000},

{"source":"4","target":"3","eid":"3","label":"created","weight":0.400000},

{"source":"4","target":"5","eid":"4","label":"created","weight":1.000000},

{"source":"6","target":"3","eid":"5","label":"created","weight":0.200000}

],

"summary":[{"number of nodes":6,"number of edges":6}],

}


Is it possible to do it using the Gremlin query? It is not clear to me whether/how I can use GraphSONWriter or something else to do it. 

Can the experts here help me out? Thank you very much!
Reply all
Reply to author
Forward
0 new messages