Re: GroupCount and order using Java


Daniel Kuppitz

May 17, 2013, 12:04:13 PM
to gremli...@googlegroups.com
Hi,

The number of nodes in the last step will explode even in small graphs. I suspect you won't be able to get any real-time results for this query. I would set up a cron job that starts a Faunus job each hour/day/week (depending on how accurate your data needs to be). With Faunus you can create an edge between the source and target node, something like this:

g.addEdge(source, target, 'common', ['degree':calculatedDegree])

Once this job is running frequently and has finished at least once, you can query the top 200 like this:

source.outE('common').order{it.b.degree <=> it.a.degree}[0..199].inV()

That's just the theory. In practice you'll have more work to do (e.g. check whether common edges already exist, create an index for the degree properties, ...). However, I don't know whether Faunus is an option for you, which is why I didn't go into much detail here.

Cheers,
Daniel


On Friday, May 17, 2013 at 16:45:55 UTC+2, Prishant Mantrao wrote:
Hi


In the following query, if I want it to return the most common companies (basically ordered by descending group count), what would be the best way to do that using Java Gremlin Pipes?

If I add the order pipe at the end, it just takes forever. I want to get the top n results that are most commonly followed in a member's network. For some members, depending on the density of the graph, it can return thousands of companies, and I just want to return the top 200 results.


pipe1.start(v).out("connectedTo").in("connectedTo").dedup().except(Arrays.asList(v)).out("connectedTo").except(neighbors).groupCount(r).next(200);
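
To make explicit what the pipeline above counts, here is a minimal plain-Java sketch (Java 8+) over in-memory adjacency sets instead of Gremlin Pipes. The class `CommonFollows`, the method `countCandidates`, and the `Map<String, Set<String>>` graph representation are hypothetical and chosen only for illustration. It also assumes that `neighbors` holds whatever `v` is already connected to and that there are no parallel edges.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CommonFollows {

    /**
     * out maps each node to the nodes it is "connectedTo" (outgoing edges).
     * Returns, for the given start node, how many distinct peers point at
     * each candidate the start node is not already connected to.
     */
    static Map<String, Integer> countCandidates(Map<String, Set<String>> out, String start) {
        // Reverse index: target -> nodes with an edge to it (the in("connectedTo") step).
        Map<String, Set<String>> in = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : out.entrySet()) {
            for (String target : e.getValue()) {
                in.computeIfAbsent(target, k -> new HashSet<>()).add(e.getKey());
            }
        }

        // Assumption: "neighbors" in the query is everything start already points at.
        Set<String> neighbors = out.getOrDefault(start, Collections.emptySet());

        // out -> in -> dedup -> except(start): everyone sharing a neighbor with start.
        Set<String> peers = new HashSet<>();
        for (String n : neighbors) {
            peers.addAll(in.getOrDefault(n, Collections.emptySet()));
        }
        peers.remove(start);

        // out -> except(neighbors) -> groupCount: one count per peer per candidate.
        Map<String, Integer> counts = new HashMap<>();
        for (String peer : peers) {
            for (String candidate : out.getOrDefault(peer, Collections.emptySet())) {
                if (!neighbors.contains(candidate)) {
                    counts.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The returned map plays the role of `r` in the query: each key is a candidate company and each value is the number of distinct peers connected to it.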


Best


PM

Daniel Kuppitz

May 17, 2013, 12:06:02 PM
to gremli...@googlegroups.com
Oh, I just realized that I'm in the Gremlin group, so I don't even know if Titan is an option for you. Which graph DB are you using?

Cheers,
Daniel

Prishant Mantrao

May 17, 2013, 3:23:27 PM
to gremli...@googlegroups.com
Hi

I am currently prototyping using OrientDB and I have around 30 million company nodes and 3 million member nodes. The edge count is around 19 million. Yes, I am not going to serve this data in real time from this query; rather, the dataset is built periodically and pushed to another store to serve real-time applications.

I am not looking for under-10 ms performance from this query, but still something under a second. The densest graph I have in this data set returns around 35,000 suggestions in about 3-4 seconds. I don't need all 35,000 suggestions, which is why I am limiting the result to 200, but I want the top 200.

Best

PM

Daniel Kuppitz

May 17, 2013, 6:10:29 PM
to gremli...@googlegroups.com
I see, I misunderstood your initial post. I thought you had up to 1000 edges per company/member. If you have only 35,000 at the last step, then it should be quite easy. In Gremlin I would do:

member.as("me").out("connectedTo").in("connectedTo").dedup().except("me").out("connectedTo").except(neighbors).groupCount().cap().orderMap(T.decr)[0..199]

You Java guys know better how to write it in Java. I gave up trying it :).
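
For the Java side, one possibility (a sketch, not a tested Pipes solution) is to leave the ordering out of the pipeline: drain the pipeline completely so that `groupCount(r)` has filled the map `r`, then pick the top entries in plain Java. Pulling only the first 200 objects with `next(200)` would leave `r` with partial counts. The class `TopCounts` and the method `topN` below are hypothetical names; the code assumes Java 8+ and whole-number counts. It keeps a min-heap of at most n entries, so 35,000 candidates never need to be fully sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopCounts {

    /** Returns up to n entries with the largest counts, highest count first. */
    static <K, V extends Number> List<Map.Entry<K, V>> topN(Map<K, V> counts, int n) {
        if (n <= 0 || counts.isEmpty()) {
            return new ArrayList<>();
        }
        // Assumption: counts are whole numbers, so comparing as long is safe.
        Comparator<Map.Entry<K, V>> byCount =
                Comparator.comparingLong(e -> e.getValue().longValue());

        // Min-heap of the best entries seen so far; its head is the weakest of them.
        int capacity = Math.max(1, Math.min(n, counts.size()));
        PriorityQueue<Map.Entry<K, V>> heap = new PriorityQueue<>(capacity, byCount);
        for (Map.Entry<K, V> entry : counts.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry);
            } else if (byCount.compare(entry, heap.peek()) > 0) {
                heap.poll();      // drop the current weakest entry
                heap.add(entry);  // keep the stronger one instead
            }
        }

        // Sort only the survivors, descending by count.
        List<Map.Entry<K, V>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }
}
```

With the map from the original query this would be called as `TopCounts.topN(r, 200)`; entries with equal counts come back in no particular order.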

Cheers,
Daniel