TinkerPop 3.1.1 (Spark-Gremlin): query to group nodes with their attached nodes

Jen

May 11, 2016, 7:25:03 AM

to Gremlin-users

Hi,

I am trying to execute a grouping query on TinkerPop 3.1.1 in Spark-Gremlin (OLAP). On the tinkerpop-modern.json graph, I would like to group each person's name with the names of the people they know via outgoing knows edges, so the result would be:

==>[name:marko, knows:[josh,vadas]]

I can execute a query that returns the count of attached people for each person:

results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').count().as('counts')).
  select('name','counts').fold().next()

==>[name:josh, counts:0]
==>[name:marko, counts:2]
==>[name:peter, counts:0]
==>[name:vadas, counts:0]
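The per-person counts above can be reproduced off-cluster with a plain-Python model of the same data, which makes the expected shape easy to check. This is a sketch under stated assumptions, not Gremlin: `PERSONS` and `KNOWS` are a hand-written stand-in for the tinkerpop-modern persons and knows edges, and `knows_counts` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hand-written model of the tinkerpop-modern "person" vertices and "knows" edges.
# This is an assumption for illustration: the real graph also has "software"
# vertices and "created" edges, which the query above never touches.
PERSONS: List[str] = ["marko", "vadas", "josh", "peter"]
KNOWS: List[Tuple[str, str]] = [("marko", "vadas"), ("marko", "josh")]  # (out, in)


def knows_counts(persons: List[str], knows: List[Tuple[str, str]]) -> Dict[str, int]:
    """Mimic out('knows').count() per person.

    count() always emits a number, so a person with no outgoing knows
    edge still produces a row with a count of 0.
    """
    counts = {name: 0 for name in persons}
    for src, _dst in knows:
        counts[src] += 1
    return counts


if __name__ == "__main__":
    for name, count in sorted(knows_counts(PERSONS, KNOWS).items()):
        print(f"==>[name:{name}, counts:{count}]")
```

Because count() emits 0 rather than nothing, every person keeps a row here, whereas the queries below that end a pattern in out('knows') only return rows for marko.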

But when I add an extra fold() step to group the names, it doesn't work: I get a separate single-element list for each known person instead of one list per person:

results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').values('name').fold().as('knows')).
  select('name','knows').fold().next()

==>[name:marko, knows:[josh]]
==>[name:marko, knows:[vadas]]

And if I try to fold the vertices instead, I get a serialization error:

results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').fold().as('knows')).
  select('name').fold().next()

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 225.0 (TID 148) had a not serializable result: org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex

Serialization stack:

Any ideas would be appreciated.

Jen

Daniel Kuppitz

May 11, 2016, 9:47:18 AM

to gremli...@googlegroups.com

Hi Jen,

I think your fold() is misplaced. Try this:

g.V().hasLabel('person').match(
    __.as('p').values('name').as('name'),
    __.as('p').out('knows').values('name').fold().as('knows')
  ).select('name','knows')

Cheers,

Daniel

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/14992bd0-b696-455c-86a8-e86b8e4072af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marko Rodriguez

May 11, 2016, 10:15:54 AM

to gremli...@googlegroups.com

I don't think you can use fold() in a match() pattern like that. I believe MatchStep will realize that the pattern ends with a BarrierStep (fold()) and turn it into a local child traversal that is fully processed and reset() after each incoming "p".

https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java#L131-L135
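The effect described above can be modeled in a few lines of Python. This is a simplified sketch, not TinkerPop's MatchStep code: `fold_local_reset` and `fold_global` are hypothetical helpers, and the input is assumed to be the names reached from marko's two outgoing knows edges. Resetting the barrier after every incoming traverser is what turns one `[josh, vadas]` list into the two single-element lists in Jen's output.

```python
from typing import List


def fold_local_reset(traversers: List[str]) -> List[List[str]]:
    """fold() run as a local child that is reset after each incoming traverser.

    Every traverser is folded on its own, so each yields a one-element list.
    """
    results = []
    for t in traversers:
        barrier: List[str] = []  # fresh barrier state, i.e. the reset()
        barrier.append(t)        # fully process just this traverser
        results.append(barrier)
    return results


def fold_global(traversers: List[str]) -> List[List[str]]:
    """fold() as an ordinary barrier: all traversers end up in a single list."""
    return [list(traversers)]


if __name__ == "__main__":
    # Names reached from marko's two outgoing knows edges (order as in Jen's output).
    knows_names = ["josh", "vadas"]
    print(fold_local_reset(knows_names))  # two single-element lists
    print(fold_global(knows_names))       # one list with both names
```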

Marko.

http://markorodriguez.com

Jen

May 11, 2016, 11:07:12 AM

to Gremlin-users

Daniel, if I do it the way you suggested, I get a local star-graph error:

Local traversals on GraphComputer may not traverse past the local star-graph: [VertexStep(OUT,[knows],vertex), PropertiesStep([name],value), FoldStep]

Marko, is there a recommended way to do aggregation within a graph traversal in OLAP, other than pulling out the individual pieces and doing a Groovy groupBy at the end, like this?

results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').values('name').as('knows')).
  select('name','knows').fold().next()

==>[name:marko, knows:josh]
==>[name:marko, knows:vadas]

grouped = results.groupBy({it -> it.name})

==>marko=[{name=marko, knows=josh}, {name=marko, knows=vadas}]

Jen

Marko Rodriguez

May 11, 2016, 11:47:25 AM

to gremli...@googlegroups.com

Hello,

> Daniel - if I try to do it the way you suggested, I get a local star graph error:
> Local traversals on GraphComputer may not traverse past the local star-graph: [VertexStep(OUT,[knows],vertex), PropertiesStep([name],value), FoldStep]

Yeah, you can't leave the star graph in local traversals in OLAP. That is the one remaining distinction between Gremlin OLAP and OLTP that I have no general solution for :|.
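The star-graph restriction can be pictured with a toy model: a vertex's star graph holds its own properties and its incident edges, but an adjacent vertex is known only by its id, so out('knows') works locally while values('name') on the neighbours does not. This is a simplified sketch, not Spark-Gremlin's data structures; `StarGraph`, `StarGraphError` and `local_out_names` are hypothetical, and the ids follow the modern toy graph (marko 1, vadas 2, josh 4, peter 6).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class StarGraphError(Exception):
    """Raised when a local traversal tries to read data outside its star graph."""


@dataclass
class StarGraph:
    """Simplified model of one vertex's star graph (an assumption, not Spark-Gremlin's
    actual classes): the vertex's own properties plus its incident edges, where each
    adjacent vertex is known only by its id."""
    vertex_id: int
    properties: Dict[str, str]
    out_edges: List[Tuple[str, int]] = field(default_factory=list)  # (label, adjacent id)

    def out_ids(self, label: str) -> List[int]:
        """out(label): allowed, because adjacent ids are stored on the incident edges."""
        return [vid for lbl, vid in self.out_edges if lbl == label]

    def value(self, vid: int, key: str) -> str:
        """values(key): only the centre vertex's own properties are local."""
        if vid != self.vertex_id:
            raise StarGraphError(
                f"vertex {vid} is outside the local star graph of vertex {self.vertex_id}"
            )
        return self.properties[key]


def local_out_names(star: StarGraph, label: str) -> List[str]:
    """Model of the local traversal out(label).values('name').fold()."""
    return [star.value(vid, "name") for vid in star.out_ids(label)]


# marko's star graph in the modern toy graph; ids 2 and 4 are vadas and josh.
MARKO = StarGraph(1, {"name": "marko"}, [("knows", 2), ("knows", 4)])

if __name__ == "__main__":
    print(MARKO.out_ids("knows"))  # [2, 4]: the neighbours' ids are available
    try:
        local_out_names(MARKO, "knows")
    except StarGraphError as err:
        print("local traversal failed:", err)
```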

> Marko - is there a recommended way to do aggregation within a graph traversal in OLAP? Other than pulling out the individual pieces and doing a groovy groupBy at the end?
> results = g.V().match(
> __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
> __.as('personV').out('knows').as('knowsV'),
> __.as('knowsV').values('name').as('knows')).
> select('name','knows').fold().next()
> ==>[name:marko, knows:josh]
> ==>[name:marko, knows:vadas]
> grouped = results.groupBy({it -> it.name})
> ==>marko=[{name=marko, knows=josh}, {name=marko, knows=vadas}]

Can you just do this?

g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').values('name').as('knows')).
  group().
    by(select('name')).
    by(select('knows').fold())

It's not until 3.2.0 that you get highly efficient group() reductions. Though, since you just fold() the values, there is no real savings on this particular query between 3.1.x and 3.2.x.
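What the group().by().by() combination computes can be sketched in plain Python: the first by() picks the key for each match() row and the second by() projects the row and folds the projections that share a key into one list. This is a model of the result shape, not TinkerPop's GroupStep; `group_fold` is a hypothetical helper and `ROWS` copies the (name, knows) rows from Jen's output.

```python
from collections import defaultdict
from typing import Dict, List

# The (name, knows) rows produced by the match() pattern, as in Jen's output above.
ROWS: List[Dict[str, str]] = [
    {"name": "marko", "knows": "josh"},
    {"name": "marko", "knows": "vadas"},
]


def group_fold(rows: List[Dict[str, str]], key: str, value: str) -> Dict[str, List[str]]:
    """Model of group().by(select(key)).by(select(value).fold()).

    The first by() picks the group key; the second by() projects each row and
    folds the projections that share a key into one list.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)


if __name__ == "__main__":
    print(group_fold(ROWS, "name", "knows"))  # {'marko': ['josh', 'vadas']}
```

Unlike the client-side groupBy above, which keeps the whole row maps per key, the second by() reduces each group to just the list of names.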

HTH,

Marko.

Jen

May 11, 2016, 12:00:25 PM

to Gremlin-users

Thanks, Marko! That works for me!

Jen

Daniel Kuppitz

May 11, 2016, 12:01:24 PM

to gremli...@googlegroups.com

> Can you just do this?
> g.V().match(
> __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
> __.as('personV').out('knows').as('knowsV'),
> __.as('knowsV').values('name').as('knows'))
> .group()
> .by(select('name'))
> .by(select('knows').fold())

I was going to post that as an answer earlier, but then realized that the result is significantly different: person vertices without outgoing knows edges will no longer be part of the result.
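Daniel's caveat shows up clearly in a plain-Python sketch of client-side post-processing (an assumption for illustration; this is not an OLAP Gremlin query). `group_match_rows` mirrors grouping only the rows that match() emitted, while `group_all_persons` first seeds every person, assumed here to come from a separate list of person names, with an empty list. Both helpers are hypothetical.

```python
from typing import Dict, List

PERSONS: List[str] = ["marko", "vadas", "josh", "peter"]
# Rows from the match() pattern: only persons with an outgoing knows edge appear.
ROWS: List[Dict[str, str]] = [
    {"name": "marko", "knows": "josh"},
    {"name": "marko", "knows": "vadas"},
]


def group_match_rows(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group only what match() emitted: persons without knows edges never show up."""
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row["name"], []).append(row["knows"])
    return groups


def group_all_persons(persons: List[str], rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Seed every person with an empty list first, so persons without knows edges survive."""
    groups: Dict[str, List[str]] = {name: [] for name in persons}
    for row in rows:
        groups[row["name"]].append(row["knows"])
    return groups


if __name__ == "__main__":
    print(group_match_rows(ROWS))            # only marko
    print(group_all_persons(PERSONS, ROWS))  # marko plus vadas, josh and peter with []
```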

Cheers,

Daniel