Gremlin: OR queries / nested queries / arrays


Bhargav Raut

Feb 25, 2015, 5:39:35 PM
to aureliu...@googlegroups.com
Hi,

I have a simple Gremlin query question.

Assume that I have vertices (label: country), each with a property country_name (String) and another property GDP (float).

I also have a huge number of vertices (label: citizen), each with "knows" edges to other citizens and a country_name (String) property.

Here is the problem.
I need to get all the countries whose GDP falls within a certain range, then get all the citizens who belong to those countries, and sort those citizens so that the ones with the most connections come first.

My problem is that the citizens are not connected to the country vertices by edges.

So how do I tell Gremlin to get the citizens given an array of countries?

My code so far:

STEP ONE: get the country_names with a GDP between 12.2 and 14.5

m = []
g.V.interval("GDP",12.2,14.5).country_name.fill(m)
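To make STEP ONE concrete, here is a minimal in-memory sketch in Python (not Gremlin) of what that traversal is meant to collect. The country records and the helper name countries_in_gdp_range are hypothetical. The bounds assume that Gremlin 2.x interval() keeps values with start <= GDP < end (start inclusive, end exclusive); check that against the docs for your Gremlin version.

```python
# Hypothetical in-memory stand-in for the country vertices (not a Gremlin API).
countries = [
    {"country_name": "A", "GDP": 12.0},
    {"country_name": "B", "GDP": 12.2},
    {"country_name": "C", "GDP": 13.7},
    {"country_name": "D", "GDP": 14.5},
]

def countries_in_gdp_range(vertices, low, high):
    """Collect country_name for every country with low <= GDP < high.

    Assumes interval() semantics are start-inclusive, end-exclusive.
    Returns a set so duplicate names collapse.
    """
    return {v["country_name"] for v in vertices if low <= v["GDP"] < high}

m = countries_in_gdp_range(countries, 12.2, 14.5)
print(sorted(m))  # ['B', 'C']
```

With these assumed bounds, a country whose GDP is exactly 14.5 is left out, so pick the upper bound accordingly.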

STEP TWO: get the citizens who belong to the above countries and sort them by number of connections (most first).

Some thoughts I have had:

NESTED QUERY?

g.V.has("country_name",(g.V.interval("GDP",12.2,14.5).country_name))

OR

Iterate over the array?

m.eachWithIndex{obj,k->
   g.V.has("country_name",obj).outE()......
}

Regards,
Bhargav.

Matt Frantz

Feb 27, 2015, 11:31:03 PM
to aureliu...@googlegroups.com
Interesting problem. I, too, have these kinds of "join" use cases, where I essentially want to infer an edge rather than materialize it.

Have you considered posting this on gremlin-users? That list tends to respond quickly to these kinds of "how do I do this in Gremlin" questions. I don't have a good handle on the Gremlin 2.x API, and I'm only a few months into Gremlin 3.0, but I have seen gurus answer 2.x questions there. Just make sure you say which version of Titan you're using so they know how to answer.

My guess is that your first approach (the nested query) would perform better, because the country data set is small enough to fit in memory and be checked repeatedly, whereas the second approach would essentially scan all of the vertices once per country.
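The scan-count argument can be illustrated with a small Python sketch (not Gremlin) that counts how many citizen vertices each strategy touches. It assumes there is no index on country_name, so every has() is a full scan. With an index, a per-country lookup would no longer be a full scan and the comparison changes. The data and function names are hypothetical.

```python
# Hypothetical toy data: citizen id -> country_name (stands in for citizen vertices).
citizens = {1: "B", 2: "C", 3: "A", 4: "B", 5: "D", 6: "C"}
wanted = ["B", "C"]  # country names produced by step one

def one_pass_membership(citizens, wanted):
    """Nested-query style: scan every citizen once and test set membership."""
    wanted_set = set(wanted)
    visits = 0
    hits = []
    for cid, country in citizens.items():
        visits += 1
        if country in wanted_set:
            hits.append(cid)
    return sorted(hits), visits

def one_pass_per_country(citizens, wanted):
    """Loop style: rescan every citizen once for each country name."""
    visits = 0
    hits = []
    for name in wanted:
        for cid, country in citizens.items():
            visits += 1
            if country == name:
                hits.append(cid)
    return sorted(hits), visits

print(one_pass_membership(citizens, wanted))   # ([1, 2, 4, 6], 6)
print(one_pass_per_country(citizens, wanted))  # ([1, 2, 4, 6], 12)
```

Both strategies return the same citizens, but without an index the loop style touches (number of countries) x (number of citizens) vertices instead of (number of citizens).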

Daniel Kuppitz

Mar 3, 2015, 2:45:33 AM
to aureliu...@googlegroups.com
Hi Bhargav,

I guess the best way is this one:

countries = g.V.has("label", "country").interval("GDP",12.2,14.5).country_name.toSet().toArray()
g.V.has("label", "citizen").has("country_name", T.in, countries)

Cheers,
Daniel
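The query above filters the citizens but does not yet sort them by connections. Here is a minimal Python sketch (not Gremlin) of the whole pipeline on in-memory data. It assumes that "connections" means knows edges in either direction. All of the data and the name rank_citizens are hypothetical.

```python
from collections import Counter

# Hypothetical toy graph standing in for the Titan vertices and edges.
countries = {"A": 12.0, "B": 12.2, "C": 13.7, "D": 14.5}   # country_name -> GDP
citizen_country = {1: "B", 2: "C", 3: "A", 4: "B", 5: "D"}  # citizen id -> country_name
knows = [(4, 1), (4, 2), (4, 3), (4, 5), (2, 1), (2, 3)]    # directed knows edges

def rank_citizens(countries, citizen_country, knows, low, high):
    # Step 1: country names with low <= GDP < high, collected into a set (like toSet()).
    names = {name for name, gdp in countries.items() if low <= gdp < high}
    # Step 2: keep citizens whose country_name is in that set (like has(..., T.in, ...)).
    members = [cid for cid, c in citizen_country.items() if c in names]
    # Step 3: count knows edges in either direction, i.e. treat them as undirected.
    degree = Counter()
    for out_v, in_v in knows:
        degree[out_v] += 1
        degree[in_v] += 1
    # Most connections first; ties broken by id so the order is deterministic.
    return sorted(members, key=lambda cid: (-degree[cid], cid))

print(rank_citizens(countries, citizen_country, knows, 12.2, 14.5))  # [4, 2, 1]
```

If only outgoing knows edges should count as connections, increment only degree[out_v] in step 3.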