TinkerPop 3.1.1 (Spark-Gremlin) query to group nodes and attached nodes

Jen

May 11, 2016, 7:25:03 AM
to Gremlin-users
Hi,

I am trying to execute a grouping query on TinkerPop 3.1.1 in Spark-Gremlin (OLAP). On the tinkerpop-modern.json graph, I would like to group each person's name with the names of the people they reach over outgoing knows edges. So the result would be
==>[name:marko, knows:[josh,vadas]]

I can execute a query to verify the counts of attached people:
results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').count().as('counts')).
  select('name','counts').fold().next()
==>[name:josh, counts:0]
==>[name:marko, counts:2]
==>[name:peter, counts:0]
==>[name:vadas, counts:0]

But when I try to add an extra fold step to group the names, it doesn't work:
results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').values('name').fold().as('knows')).
  select('name','knows').fold().next()
==>[name:marko, knows:[josh]]
==>[name:marko, knows:[vadas]]

And if I try to fold the vertices instead, I get a serializer error:
results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').fold().as('knows')).
  select('name').fold().next()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 225.0 (TID 148) had a not serializable result: org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex
Serialization stack:

Any ideas would be appreciated.

Jen

Daniel Kuppitz

May 11, 2016, 9:47:18 AM
to gremli...@googlegroups.com
Hi Jen,

I think your fold() is misplaced. Try this:

g.V().hasLabel('person').match(
    __.as('p').values('name').as('name'),
    __.as('p').out('knows').values('name').fold().as('knows')
  ).select('name','knows')

Cheers,
Daniel


Marko Rodriguez

May 11, 2016, 10:15:54 AM
to gremli...@googlegroups.com
I don't think you can use fold() in a match() pattern like that. I believe MatchStep will realize you are ending with a BarrierStep (fold()) and turn it into a local child that fully processes and reset()s after each incoming "p".

Jen

May 11, 2016, 11:07:12 AM
to Gremlin-users
Daniel - if I try to do it the way you suggested, I get a local star graph error:
Local traversals on GraphComputer may not traverse past the local star-graph: [VertexStep(OUT,[knows],vertex), PropertiesStep([name],value), FoldStep]

Marko - is there a recommended way to do aggregation within a graph traversal in OLAP, other than pulling out the individual pieces and doing a Groovy groupBy at the end?
results = g.V().match(
    __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
    __.as('personV').out('knows').as('knowsV'),
    __.as('knowsV').values('name').as('knows')).
  select('name','knows').fold().next()
==>[name:marko, knows:josh]
==>[name:marko, knows:vadas]
grouped = results.groupBy({it -> it.name})
==>marko=[{name=marko, knows=josh}, {name=marko, knows=vadas}]
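The groupBy result above still carries the full rows for each name. As a minimal sketch in plain Groovy (no TinkerPop calls), the rows could be reduced to just the names of the known people. The `results` list here is a hypothetical stand-in for the folded OLAP result, and the variable name `byName` is an assumption made for illustration:

```groovy
// Hypothetical stand-in for the folded OLAP result shown above.
results = [
    [name: 'marko', knows: 'josh'],
    [name: 'marko', knows: 'vadas']
]

// Group the rows by person name, then keep only the 'knows' value of each row.
byName = results.groupBy { it.name }.collectEntries { name, rows ->
    [(name): rows.collect { it.knows }]
}

println byName
```

This runs entirely on the client after the traversal returns, so the star-graph restriction does not apply to it.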

Jen

Marko Rodriguez

May 11, 2016, 11:47:25 AM
to gremli...@googlegroups.com
Hello,

> Daniel - if I try to do it the way you suggested, I get a local star graph error:
> Local traversals on GraphComputer may not traverse past the local star-graph: [VertexStep(OUT,[knows],vertex), PropertiesStep([name],value), FoldStep]

Yeah, you can't leave the star graph in local traversals in OLAP. That is the final distinction between Gremlin OLAP and OLTP that I have no general solution to :|.

> Marko - is there a recommended way to do aggregation within a graph traversal in OLAP? Other than pulling out the individual pieces and doing a groovy groupBy at the end?
> results = g.V().match(
> __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
> __.as('personV').out('knows').as('knowsV'),
> __.as('knowsV').values('name').as('knows')).
> select('name','knows').fold().next()
> ==>[name:marko, knows:josh]
> ==>[name:marko, knows:vadas]
> grouped = results.groupBy({it -> it.name})
> ==>marko=[{name=marko, knows=josh}, {name=marko, knows=vadas}]

Can you just do this?

g.V().match(
  __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
  __.as('personV').out('knows').as('knowsV'),
  __.as('knowsV').values('name').as('knows'))
    .group()
      .by(select('name'))
      .by(select('knows').fold())

It's not until 3.2.0 that you get super efficient group()-reductions, though since you just fold(), there is no real savings on this particular query between 3.1.x and 3.2.x.

HTH,
Marko.



Jen

May 11, 2016, 12:00:25 PM
to Gremlin-users
Thanks Marko! That works for me!

Jen

Daniel Kuppitz

May 11, 2016, 12:01:24 PM
to gremli...@googlegroups.com
> Can you just do this?
> g.V().match(
>   __.as('allV').hasLabel('person').as('personV').values('name').as('name'),
>   __.as('personV').out('knows').as('knowsV'),
>   __.as('knowsV').values('name').as('knows'))
>     .group()
>       .by(select('name'))
>       .by(select('knows').fold())

I was going to post that as an answer earlier, but then realized that the result is significantly different. Vertices without knows-relations will no longer be part of the result.
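If the people without knows-relations are still needed, one possible workaround is to fill them back in on the client side. The following is only a sketch in plain Groovy under stated assumptions: `allNames` and `grouped` are hypothetical stand-ins for the list of all person names and for the map returned by the group() traversal, and `complete` is a name chosen for illustration:

```groovy
// Hypothetical stand-ins: every person name, and the map produced by the group() traversal.
allNames = ['josh', 'marko', 'peter', 'vadas']
grouped = [marko: ['josh', 'vadas']]

// Give each person an entry, falling back to an empty list when group() produced none.
complete = allNames.collectEntries { name ->
    [(name): grouped[name] ?: []]
}

println complete
```

This needs a second result that lists every person name, so it trades one extra query for a complete map.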

Cheers,
Daniel
