How to write gremlin query to calculate values by groups

Christine Li

Jan 13, 2016, 11:56:54 PM
to Gremlin-users
This is a follow-up to my previous question.

In SQL we often write: select avg("age"), sum("score"), count(*) from aTable group by ("gender")

g.V().group().by("gender").by(values("age").mean()) returns only one of these values per group. Can a single Gremlin traversal compute avg("age"), sum("score"), and count(*) together?

Thanks,

Daniel Kuppitz

Jan 14, 2016, 7:19:35 AM
to gremli...@googlegroups.com
That's pretty easy - fold() the group values and use match() to apply several queries over the values:

Example:

g = TinkerFactory.createModern().traversal()
g.V().hasLabel("person").property("gender","m")
g.V().has("name","marko").as("m").addV(label,"person","name","christine","age",30,"gender","f").addE("knows").to("m")
g.V().hasLabel("person").group().by("gender").by(
      fold().match(__.as("p").count(local).as("total"),
                   __.as("p").unfold().values("age").mean().as("avg_age"),
                   __.as("p").order(local).by("age").as("o"),
                   __.as("o").limit(local, 1).values("name").as("youngest"),
                   __.as("o").tail(local, 1).values("name").as("oldest")
                  ).select("total","avg_age","youngest","oldest"))

Result:

gremlin> g.V().hasLabel("person").group().by("gender").by(
gremlin>       fold().match(__.as("p").count(local).as("total"),
gremlin>                    __.as("p").unfold().values("age").mean().as("avg_age"),
gremlin>                    __.as("p").order(local).by("age").as("o"),
gremlin>                    __.as("o").limit(local, 1).values("name").as("youngest"),
gremlin>                    __.as("o").tail(local, 1).values("name").as("oldest")
gremlin>                   ).select("total","avg_age","youngest","oldest")).next()
==>f={total=1, avg_age=30.0, youngest=christine, oldest=christine}
==>m={total=4, avg_age=30.75, youngest=vadas, oldest=peter}

Cheers,
Daniel
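
For the SQL in the original question, a closer one-to-one shape may be possible with project() on TinkerPop versions that provide that step. The following is a minimal sketch, not part of Daniel's answer. It assumes the Gremlin Console, builds the same data with the addV("person").property(...) form, and substitutes a sum over "age" for sum("score") because the modern toy graph has no "score" property. The keys avg_age, sum_age, and total are illustrative names.

```groovy
// Setup: the modern toy graph plus a "gender" property and one extra person,
// mirroring the data used in the example above.
g = TinkerFactory.createModern().traversal()
g.V().hasLabel("person").property("gender", "m").iterate()
g.V().has("name", "marko").as("m").
  addV("person").
    property("name", "christine").
    property("age", 30).
    property("gender", "f").
  addE("knows").to("m").iterate()

// Rough equivalent of:
//   select avg(age), sum(age), count(*) from person group by gender
// fold() turns each group's vertices into one list, and every by() of
// project() then computes one aggregate over that list.
g.V().hasLabel("person").
  group().
    by("gender").
    by(fold().
       project("avg_age", "sum_age", "total").
         by(unfold().values("age").mean()).
         by(unfold().values("age").sum()).
         by(count(local)))
```

Each by() after project() lines up with one column of the SQL select list, which may be easier to read than match() when the aggregates do not depend on each other.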


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/9dc3eca7-2188-4167-91f9-e65092343f03%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christine Li

Jan 14, 2016, 9:11:25 AM
to Gremlin-users
Cool, I need to re-read the documentation on traversal steps. Thanks very much for the help, Daniel.
I remember that at one point there was a SQL-to-Gremlin document to help people understand Gremlin (I can't remember whether it was from Titan or TP3). Do we still have it somewhere? Google found http://sql2gremlin.com/, which looks like a third-party website, not part of TP3. It would be nice if TP3 could provide some SQL2Gremlin samples.

Thanks,

Daniel Kuppitz

Jan 14, 2016, 9:43:13 AM
to gremli...@googlegroups.com
Take a closer look at the TinkerPop home page: http://www.tinkerpop.com
There's a TinkerPop Tutorials entry in the main menu.

Cheers,
Daniel


Christine Li

Jan 14, 2016, 1:47:15 PM
to Gremlin-users
Daniel, you are the BEAST!!!

Thanks a lot,

Andrew Coulson

Sep 20, 2023, 4:44:30 PM
to Gremlin-users
Oh, please still be following this :-)....

OK, what if I now also want to include properties from a vertex further up the traversal in each result? I tried a projection, but it breaks the grouping logic.

So, something like:
g.V().as("aPerson").out("knows").
  project("name", "aggregates").
    by(select("aPerson").values("name")).
    by(
      group().by("gender").by(
          fold().match(__.as("p").count(local).as("total"),
                       __.as("p").unfold().values("age").mean().as("avg_age"),
                       __.as("p").order(local).by("age").as("o"),
                       __.as("o").limit(local, 1).values("name").as("youngest"),
                       __.as("o").tail(local, 1).values("name").as("oldest")
                      ).select("total","avg_age","youngest","oldest")))

This may not make sense with the modern data, but it's the pattern I'm trying to implement: a couple of properties from "up the chain", combined with aggregates from "down the chain", on each line.
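
One likely reason the projection breaks the grouping: a by() modulator runs its child traversal once per incoming traverser, so when project() comes after out("knows"), each group() only ever sees a single friend. A sketch of one way around this, assuming the Gremlin Console, a TinkerPop version that provides project(), and the same data as Daniel's example, is to project from the "up the chain" vertex and move out("knows") inside the by(), so that group() receives all of that person's friends. The result keys are illustrative, and project() is used for the aggregates only for brevity; the fold().match(...) block from earlier in the thread should slot into the same position.

```groovy
// Setup: the modern toy graph plus a "gender" property and one extra person,
// mirroring the data used earlier in the thread.
g = TinkerFactory.createModern().traversal()
g.V().hasLabel("person").property("gender", "m").iterate()
g.V().has("name", "marko").as("m").
  addV("person").
    property("name", "christine").
    property("age", 30).
    property("gender", "f").
  addE("knows").to("m").iterate()

// One result per person: the person's own name plus aggregates over the
// people they know, grouped by gender. The traverser entering project() is
// the person, so the "up the chain" property is read directly and the
// "down the chain" hop happens inside the second by().
g.V().hasLabel("person").
  project("name", "aggregates").
    by("name").
    by(out("knows").
       group().
         by("gender").
         by(fold().
            project("total", "avg_age", "youngest", "oldest").
              by(count(local)).
              by(unfold().values("age").mean()).
              by(unfold().order().by("age").limit(1).values("name")).
              by(unfold().order().by("age").tail(1).values("name"))))
```

People with no outgoing "knows" edges should simply get an empty map under "aggregates". If the vertex of interest sits more than one step back, labelling it with as("aPerson") and reading it with select("aPerson") inside a by() should work the same way, as long as the grouping itself stays inside a single by().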