Hi,
I am trying to execute a grouping query on TinkerPop 3.1.1 with Spark-Gremlin (OLAP). On the tinkerpop-modern.json graph, I would like to group each person's name with the names of the people reachable over that person's outgoing knows edges, so the result would be:
==>[name:marko, knows:[josh, vadas]]
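To pin down the expected result, here is a minimal plain-Python sketch. It is not Gremlin and does not use any TinkerPop API. It models the tinkerpop-modern vertices and edges as a hypothetical dict and edge list, then groups the knows targets per person:

```python
# Hypothetical plain-data model of tinkerpop-modern: id -> (label, name)
VERTICES = {
    1: ("person", "marko"),
    2: ("person", "vadas"),
    3: ("software", "lop"),
    4: ("person", "josh"),
    5: ("software", "ripple"),
    6: ("person", "peter"),
}
# (out-vertex id, edge label, in-vertex id)
EDGES = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]

def group_knows(vertices, edges):
    """Map each person's name to the sorted names reachable over outgoing knows edges."""
    result = {name: [] for label, name in vertices.values() if label == "person"}
    for out_id, label, in_id in edges:
        if label == "knows":
            result[vertices[out_id][1]].append(vertices[in_id][1])
    return {name: sorted(known) for name, known in result.items()}

print(group_knows(VERTICES, EDGES))
# → {'marko': ['josh', 'vadas'], 'vadas': [], 'josh': [], 'peter': []}
```

Only marko has a non-empty list, which agrees with the counts returned by the query below.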
I can execute a query that verifies the number of people each person knows:
results = g.V().match(
__.as('allV').hasLabel('person').as('personV').values('name').as('name'),
__.as('personV').out('knows').count().as('counts')).
select('name','counts').fold().next()
==>[name:josh, counts:0]
==>[name:marko, counts:2]
==>[name:peter, counts:0]
==>[name:vadas, counts:0]
But when I add a fold() step to group the names, it doesn't work. I get one row per known person, each with a single-element list, instead of one row per person:
results = g.V().match(
__.as('allV').hasLabel('person').as('personV').values('name').as('name'),
__.as('personV').out('knows').as('knowsV'),
__.as('knowsV').values('name').fold().as('knows')).
select('name','knows').fold().next()
==>[name:marko, knows:[josh]]
==>[name:marko, knows:[vadas]]
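The output above appears consistent with each match() solution binding a single knowsV vertex, so the fold() in the third pattern only ever sees that one vertex's name. The plain-Python sketch below (not Gremlin; the edge list is a hypothetical model of the knows edges in tinkerpop-modern) reproduces that per-binding behaviour and then shows the per-person merge that the query never performs. It also shows why josh, peter and vadas disappear: with no outgoing knows edge they produce no binding at all.

```python
# Hypothetical model: (person name, known person name) per outgoing knows edge
KNOWS_EDGES = [("marko", "vadas"), ("marko", "josh")]

def match_rows(knows_edges):
    """One row per (personV, knowsV) binding; fold() sees only that one knowsV."""
    return [{"name": person, "knows": [known]} for person, known in knows_edges]

def group_rows(rows):
    """Merge the per-binding rows by name, i.e. the grouping actually wanted."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["name"], []).extend(row["knows"])
    return grouped

rows = match_rows(KNOWS_EDGES)
print(rows)              # [{'name': 'marko', 'knows': ['vadas']}, {'name': 'marko', 'knows': ['josh']}]
print(group_rows(rows))  # {'marko': ['vadas', 'josh']}
```

The row order here follows the edge list; in the real OLAP job, the order of solutions is not guaranteed.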
And if I fold the vertices instead, I get a serialization error:
results = g.V().match(
__.as('allV').hasLabel('person').as('personV').values('name').as('name'),
__.as('personV').out('knows').as('knowsV'),
__.as('knowsV').fold().as('knows')).
select('name').fold().next()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 225.0 (TID 148) had a not serializable result: org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerVertex
Serialization stack:
Any ideas would be appreciated.
Jen