Gremlin - To group(), count(), mean()


Balaji R

Mar 22, 2018, 12:04:49 PM
to Gremlin-users
Hi,

How can I group the vertices by label and, at the same time, count the out and in vertices and compute the mean of a property for each group?

I tried the query below, but it is not working.

g.V().hasLabel('Topics').
    has('topicType','Dead Battery').
    as('topicType', 'avg_sentiment', 'resp_count', 'msg_count').
    in().out().hasLabel('Messages').in().hasLabel('ChatSessions').
    as('sessionId', 'date', 'time', 'status').
    select('topicType', 'avg_sentiment', 'resp_count', 'msg_count', 'sessionId', 'date', 'time', 'status').
        by('topicType').
        by(__.in().values('sentiment').mean()).
        by(__.in().values('response').count()).
        by(__.in().out().hasLabel('Messages').values('message').count()).
        by('sessionId').
        by('date').
        by('time').
        by('status')

Thanks,
Balaji R

Daniel Kuppitz

Mar 22, 2018, 1:17:47 PM
to gremli...@googlegroups.com
Your query doesn't group anything, so I'm not quite sure if my answer is going to be what you're looking for, but I'll just rely on the initial question.

How can I group the vertices by label and, at the same time, count the out and in vertices and compute the mean of a property for each group?

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().
           group().                                        /* group by label                        */
             by(label).
           unfold().as('kv').                              /* unfold and ...                        */
           select(values).
           project("i","o","m").                           /* ... compute statistics for each label */
             by(unfold().inE().count()).
             by(unfold().outE().count()).
             by(unfold().bothE().values('weight').mean()).
           group().                                        /* regroup by label                      */
             by(select('kv').select(keys)).
             by(fold().unfold())
==>[software:[i:4,o:0,m:0.5],person:[i:2,o:6,m:0.625]]
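
To check the numbers in that result by hand, here is a minimal plain-Python sketch of the same per-label statistics. It does not use TinkerPop at all: `VERTICES` and `EDGES` are a hand-copied stand-in for the six vertices and six weighted edges of the `TinkerFactory.createModern()` toy graph, and `label_stats` is a hypothetical helper name chosen for this illustration.

```python
from statistics import fmean

# Hand-copied stand-in for the "modern" toy graph: vertex id -> label.
VERTICES = {1: "person", 2: "person", 3: "software",
            4: "person", 5: "software", 6: "person"}

# (out vertex, in vertex, weight) for the six edges of the toy graph.
EDGES = [(1, 2, 0.5), (1, 4, 1.0), (1, 3, 0.4),
         (4, 5, 1.0), (4, 3, 0.4), (6, 3, 0.2)]


def label_stats(vertices, edges):
    """Per label: incoming edge count, outgoing edge count, mean incident weight."""
    stats = {label: {"i": 0, "o": 0, "weights": []}
             for label in sorted(set(vertices.values()))}
    for out_v, in_v, weight in edges:
        source, target = stats[vertices[out_v]], stats[vertices[in_v]]
        # An edge is an out-edge of its source and an in-edge of its target,
        # so its weight is collected once per endpoint (like bothE() per vertex).
        source["o"] += 1
        source["weights"].append(weight)
        target["i"] += 1
        target["weights"].append(weight)
    return {label: {"i": s["i"], "o": s["o"],
                    "m": round(fmean(s["weights"]), 3) if s["weights"] else None}
            for label, s in stats.items()}


print(label_stats(VERTICES, EDGES))
# {'person': {'i': 2, 'o': 6, 'm': 0.625}, 'software': {'i': 4, 'o': 0, 'm': 0.5}}
```

The mean for `person` is 0.625 rather than a plain average of the six edge weights because an edge between two `person` vertices contributes its weight twice, once for each endpoint.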

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/45b53d46-b1b7-4dd7-bb40-654fff5c9c89%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Balaji R

Mar 23, 2018, 2:08:55 AM
to Gremlin-users
Thanks a lot Daniel

Using your query I was able to get the data for all users. What I need is the users grouped under one topic. I expected something like this:

sessionId  topicType  productType           avg_sentiment  resp_count  msg_count  date       time         status
CS0001     Topic 1    Product 1             1.5            4           4          3/22/2018  12:06:56 PM  ACTIVE
CS0002     Topic 2    Product 1, Product 2  3              3           3          3/22/2018  12:06:56 PM  CLOSED

Please help me with this. I have attached my graph model.

Thanks,
Balaji
ChatBot_AI.jpg

Daniel Kuppitz

Mar 23, 2018, 11:13:42 AM
to gremli...@googlegroups.com
First, what's your current query? Second, I don't see users in your model. Please clarify that and provide a small sample dataset and the expected result.

Thanks.

Cheers,
Daniel



Balaji R

Mar 26, 2018, 1:54:50 AM
to Gremlin-users
Sorry Daniel,

I have attached a sample graph and the expected results.

Thanks for your support.

Regards,
Balaji
Expected_Result.JPG
Sample_Graph.jpg

Daniel Kuppitz

Mar 26, 2018, 1:09:19 PM
to gremli...@googlegroups.com
For future questions: it would be nice if you could provide a script that creates your sample graph, like this:

g = TinkerGraph.open().traversal()
g.addV('User').property(id,'USR01').property('status','Active').as('u1').
  addV('User').property(id,'USR02').property('status','Closed').as('u2').
  addV('User').property(id,'USR03').property('status','Active').as('u3').
  addV('Message').property(id,'Message 1').as('m1').
  addV('Message').property(id,'Message 2').as('m2').
  addV('Message').property(id,'Message 3').as('m3').
  addV('Message').property(id,'Message 4').as('m4').
  addV('Response').property(id,'Response 1').property('rating',4).as('r1').
  addV('Response').property(id,'Response 2').property('rating',3).as('r2').
  addV('Response').property(id,'Response 3').property('rating',4).as('r3').
  addV('Response').property(id,'Response 4').property('rating',4).as('r4').
  addV('Response').property(id,'Response 5').property('rating',4).as('r5').
  addV('Topic').property(id,'Banking').as('t1').
  addV('Topic').property(id,'Insurance').as('t2').
  addV('Topic').property(id,'Investments').as('t3').
  addV('Product').property(id,'Prod 1').as('p1').
  addV('Product').property(id,'Prod 2').as('p2').
  addE('posted').from('u1').to('m1').
  addE('posted').from('u1').to('m2').
  addE('posted').from('u2').to('m3').
  addE('posted').from('u3').to('m4').
  addE('responded').from('r1').to('m1').
  addE('responded').from('r2').to('m2').
  addE('responded').from('r3').to('m2').
  addE('responded').from('r4').to('m3').
  addE('responded').from('r5').to('m4').
  addE('belongs').from('r1').to('t1').
  addE('belongs').from('r2').to('t2').
  addE('belongs').from('r3').to('t2').
  addE('belongs').from('r4').to('t2').
  addE('belongs').from('r5').to('t3').
  addE('about').from('r1').to('p1').
  addE('about').from('r2').to('p1').
  addE('about').from('r3').to('p2').
  addE('about').from('r4').to('p2').
  addE('about').from('r5').to('p2').
  iterate()

The query you're looking for could look similar to this one:

gremlin> g.V('Insurance').as('t').
           in('belongs').as('r').
           out('responded').as('m').
           in('posted').
           group('ratings').
             by().
             by(select('r').fold()).
           group('messages').
             by().
             by(select('m').fold()).
           group('products').
             by().
             by(select('r').out('about').fold()).
           barrier().
           dedup().as('u').
           project('userId','topicType','rating','resp_count','msg_count','productType','status').
             by(id).
             by(select('t').by(id)).
             by(select('ratings').unfold().
                  where(select(keys).as('u')).
                select(values).unfold().
                values('rating').mean()).
             by(select('ratings').unfold().
                  where(select(keys).as('u')).
                select(values).count(local)).
             by(select('messages').unfold().
                  where(select(keys).as('u')).
                select(values).unfold().
                dedup().count()).
             by(select('products').unfold().
                  where(select(keys).as('u')).
                select(values).unfold().
                dedup().id().fold()).
             by('status')
==>[userId:USR01,topicType:Insurance,rating:3.5,resp_count:2,msg_count:1,productType:[Prod 1,Prod 2],status:Active]
==>[userId:USR02,topicType:Insurance,rating:4.0,resp_count:1,msg_count:1,productType:[Prod 2],status:Closed]
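
As a cross-check of the two result rows above, the same per-user aggregation can be sketched in plain Python. This is only an illustration, not a replacement for the traversal: the edge lists are hand-copied from the sample-graph script, `topic_report` is a hypothetical helper name, and the sketch assumes (as in the sample data) that every response answers exactly one message and every message has exactly one author.

```python
from statistics import fmean

# Hand-copied from the sample-graph script; edge lists are (from, to) pairs.
RATING = {"Response 1": 4, "Response 2": 3, "Response 3": 4,
          "Response 4": 4, "Response 5": 4}
STATUS = {"USR01": "Active", "USR02": "Closed", "USR03": "Active"}
POSTED = [("USR01", "Message 1"), ("USR01", "Message 2"),
          ("USR02", "Message 3"), ("USR03", "Message 4")]
RESPONDED = [("Response 1", "Message 1"), ("Response 2", "Message 2"),
             ("Response 3", "Message 2"), ("Response 4", "Message 3"),
             ("Response 5", "Message 4")]
BELONGS = [("Response 1", "Banking"), ("Response 2", "Insurance"),
           ("Response 3", "Insurance"), ("Response 4", "Insurance"),
           ("Response 5", "Investments")]
ABOUT = [("Response 1", "Prod 1"), ("Response 2", "Prod 1"),
         ("Response 3", "Prod 2"), ("Response 4", "Prod 2"),
         ("Response 5", "Prod 2")]


def topic_report(topic):
    """Return one row per user who has at least one response under `topic`."""
    author_of = {message: user for user, message in POSTED}
    message_of = dict(RESPONDED)  # assumption: one message per response
    per_user = {}
    for response, response_topic in BELONGS:
        if response_topic != topic:
            continue
        message = message_of[response]
        bucket = per_user.setdefault(
            author_of[message], {"responses": [], "messages": set(), "products": []})
        bucket["responses"].append(response)
        bucket["messages"].add(message)
        bucket["products"] += [p for r, p in ABOUT if r == response]
    return [{"userId": user,
             "topicType": topic,
             "rating": fmean([RATING[r] for r in b["responses"]]),
             "resp_count": len(b["responses"]),
             "msg_count": len(b["messages"]),
             # dict.fromkeys() removes duplicates while keeping first-seen order
             "productType": list(dict.fromkeys(b["products"])),
             "status": STATUS[user]}
            for user, b in per_user.items()]


for row in topic_report("Insurance"):
    print(row)
```

For `'Insurance'` this yields a row for USR01 (rating 3.5, two responses, one message, Prod 1 and Prod 2) and a row for USR02 (rating 4.0, one response, one message, Prod 2), matching the traversal output.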

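There's a performance issue, though: every by() modulator unfolds the entire side-effect map and filters it just to find the current user's entry. Currently that can only be avoided by using lambdas that look the entry up by key: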

gremlin> g.V('Insurance').as('t').
           in('belongs').as('r').
           out('responded').as('m').
           in('posted').
           group('ratings').
             by().
             by(select('r').fold()).
           group('messages').
             by().
             by(select('m').fold()).
           group('products').
             by().
             by(select('r').out('about').fold()).
           barrier().
           dedup().as('u').
           project('userId','topicType','rating','resp_count','msg_count','productType','status').
             by(id).
             by(select('t').by(id)).
             by(flatMap {it.getSideEffects().get('ratings').get(it.get()).iterator()}.
                values('rating').mean()).
             by(map {it.getSideEffects().get('ratings').get(it.get()).size()}).
             by(map {it.getSideEffects().get('messages').get(it.get()).toSet().size()}).
             by(map {it.getSideEffects().get('products').get(it.get()).toSet()*.id()}).
             by('status')
==>[userId:USR01,topicType:Insurance,rating:3.5,resp_count:2,msg_count:1,productType:[Prod 1,Prod 2],status:Active]
==>[userId:USR02,topicType:Insurance,rating:4.0,resp_count:1,msg_count:1,productType:[Prod 2],status:Closed]
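
The difference between the two variants can be illustrated with a small plain-Python sketch. This is an interpretation of where the cost comes from, not a measurement: `ratings` is a hypothetical stand-in for the 'ratings' side-effect map (user id to that user's rating values, copied from the sample data), and both helper names are invented for this example.

```python
# Hypothetical stand-in for the 'ratings' side-effect map built by group('ratings').
ratings = {"USR01": [3, 4], "USR02": [4]}


def find_by_scan(side_effect, user):
    """Lambda-free style: walk every map entry and keep the one whose key matches."""
    touched = 0
    matches = []
    for key, values in side_effect.items():  # roughly unfold()
        touched += 1
        if key == user:                      # roughly where(select(keys).as('u'))
            matches.extend(values)           # roughly select(values).unfold()
    return matches, touched


def find_by_key(side_effect, user):
    """Lambda style: one direct lookup, like getSideEffects().get(...).get(it.get())."""
    return list(side_effect.get(user, [])), 1


for user in ratings:
    print(user, find_by_scan(ratings, user), find_by_key(ratings, user))
```

Both helpers return the same values, but the scan touches every entry of the map for every user, so its total work grows with the square of the number of users, while the keyed lookup touches one entry per user.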

Cheers,
Daniel




Balaji R

Mar 27, 2018, 3:38:03 AM
to Gremlin-users
Thanks a lot Daniel. :)