[gremlin-python] multiple 'has' steps on vertex work in gremlin-groovy but not in gremlin-python/goblin


John B

Nov 21, 2018, 6:05:42 PM
to Gremlin-users
I'm trying to select a vertex based on the values of two of its (indexed) properties. The following works perfectly in the gremlin-groovy shell:

    gremlin> g.V().has('vlabel', 'truck_loc').V().has('vehicle_id', "3042").next()

This query finds a vertex that has the properties:  vlabel = truck_loc and vehicle_id = 3042. It runs quickly.

In gremlin-python (version 3.3.4), using aiogremlin/goblin, the same query seems to run forever and uses up increasing amounts of memory:

    truck_loc = await session.g.V().has('vlabel', 'truck_loc').V().has('vehicle_id', "3042").next()

However, the following query, without the second V()., returns None:

    truck_loc = await session.g.V().has('vlabel', 'truck_loc').has('vehicle_id', "3042").next()

The backend for this graph is janusgraph/cassandra. The entire graph has millions of vertices and edges.

Does anyone know why the query won't work in gremlin-python, and how I might get it to work? Is there a better way to filter on multiple vertex properties?




Benjamin Ross

Nov 21, 2018, 8:11:20 PM
to Gremlin-users
Have you considered restructuring your graph?

V() will load all vertices into memory, which you say is a very large number, and then searching by property will scan through every single one.

Perhaps you can have a vertex for each truck location where its ID is the "truck_loc_id" in your example and its vertex label is "truck location". Then have outgoing edges connected to vehicle vertices you are querying for with edge label "contains vehicle".

So your query becomes: g.V('truck_loc').out('contains vehicle') 

and you essentially turn an O(n) scan operation into an O(1) query.
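The O(n) versus O(1) distinction above can be sketched with a toy in-memory model. This is only an illustration in plain Python, not JanusGraph or TinkerPop code; the names `vertices`, `full_scan`, and `by_id` are made up for the example.

```python
# Toy model: vertex id -> properties. Not a real graph database.
vertices = {
    vid: {"vlabel": "truck_loc", "vehicle_id": str(3000 + vid)}
    for vid in range(100)
}

def full_scan(prop, value):
    """O(n): look at every vertex and keep the ones whose property matches."""
    return [vid for vid, props in vertices.items() if props.get(prop) == value]

def by_id(vid):
    """O(1) on average: a single keyed access, like starting from g.V(id)."""
    return vertices.get(vid)

print(full_scan("vehicle_id", "3042"))
print(by_id(42))
```

The scan touches all 100 entries to find one match, while the keyed access goes straight to the entry. A property index aims for the same effect as the keyed access without changing the graph's structure.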

Further, you can use filter operations to filter by vehicle properties, but again I'd suggest some sort of restructure, as searching by properties (even on what is now a subset of your graph) is slow, so avoid it if you can.

I'm curious why you made "vehicle_id" a property on a vehicle vertex rather than its actual ID, because "ID" normally implies that it is unique to that vertex, which your example suggests it is not. Again, this ID "property" can be another vertex with outgoing edges, and now filtering operations can be made faster.

As to why the queries are not working, there are experts on this forum who can be of more help under the hood, but my guess would be that it is due to inefficiencies of the query relative to the size of the graph, and some implementations can just handle the scale better than others.

John B

Nov 21, 2018, 8:37:30 PM
to Gremlin-users
This might be related to my problem. The following query uses a non-existing vertex id:

    g.V(111417214712).has('vlabel', 'truck_loc').count()

It runs as expected in the gremlin-groovy console, quickly returning an answer of 0.

The same query in gremlin-python via aiogremlin/goblin hangs and eats up memory:

    truck_loc = (await session.g.V(111417214712).has('vlabel', 'truck_loc').count().toList())

Does anyone understand why these queries in gremlin-groovy and gremlin-python are behaving so differently?

John B

Nov 21, 2018, 9:14:45 PM
to Gremlin-users
Thanks Benjamin:

I do have a vertex for each truck location, and can create a fast query, as you suggest, when I know the ID of the vertex I am looking for. That is, g.V(ID)....etc. My task is to find that ID for the vertex that has the properties I desire.

As an aside, when I use only the first has() in my query, it runs just fine and fast in both implementations:

    await session.g.V().has('vlabel', 'truck_loc').next()

It is the second has(), or perhaps .V().has(), that seems to be causing the problem in gremlin-python. Both properties for the has's are indexed. Does a query like g.V().has(prop1, prop_value1).has(prop2, prop_value2).next()  require a single composite index with two keys?  That is, wouldn't two one-key composite indexes also work for this query (although somewhat slower than a single two-key index)?
John
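The difference between `.has().has()` and `.has().V().has()` can be sketched with plain Python generators. This is a toy model, not TinkerPop's implementation; it assumes the usual TinkerPop behaviour that a mid-traversal V() emits the whole vertex set once for every incoming traverser. The helper names `V` and `has` and the sample data are made up for the illustration.

```python
# Toy vertex set; ids and properties are invented for the example.
VERTICES = [
    {"id": 1, "vlabel": "truck_loc", "vehicle_id": "3042"},
    {"id": 2, "vlabel": "truck_loc", "vehicle_id": "7001"},
    {"id": 3, "vlabel": "truck_loc", "vehicle_id": "7002"},
    {"id": 4, "vlabel": "depot"},
]

def V(traversers=None):
    """Start step, or mid-traversal V(): all vertices once per incoming traverser."""
    if traversers is None:
        yield from VERTICES
        return
    for _ in traversers:
        yield from VERTICES

def has(traversers, key, value):
    """Keep only the traversers whose vertex has key == value."""
    for v in traversers:
        if v.get(key) == value:
            yield v

# g.V().has(...).has(...): both conditions apply to the same vertex (logical AND).
chained = list(has(has(V(), "vlabel", "truck_loc"), "vehicle_id", "3042"))

# g.V().has(...).V().has(...): every truck_loc restarts a pass over all vertices.
restarted = list(has(V(has(V(), "vlabel", "truck_loc")), "vehicle_id", "3042"))

# next() on the lazy pipeline stops at the first hit instead of draining everything.
first = next(has(V(has(V(), "vlabel", "truck_loc")), "vehicle_id", "3042"))

print([v["id"] for v in chained], [v["id"] for v in restarted], first["id"])
```

In this model the restarted form does work proportional to (matching truck_loc vertices) x (all vertices) when fully drained, yet returns quickly when only the first result is pulled lazily. Whether that explains the aiogremlin behaviour described above is not established in this thread; the sketch only shows that the two query shapes are not equivalent.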

John B

Nov 21, 2018, 11:36:25 PM
to Gremlin-users
Perhaps I'm not understanding what the has steps are doing. Consider these three queries in gremlin-groovy, again where both vlabel and vehicle_id are properties of a (truck_loc) vertex:

    gremlin> g.V().has('vlabel', 'truck_loc').has('vehicle_id', '3042').count()
    ==>0

    gremlin> g.V(417214712).has('vlabel', 'truck_loc').has('vehicle_id', '3042').count()
    ==>1

    gremlin> g.V(417214712).has('vlabel', 'truck_loc').has('vehicle_id', '3042').next()
    ==>v[417214712]

Why would the first query say there are no vertices that meet the two equality conditions, yet when I choose a specific (truck_loc) vertex to start from a match for the equality conditions can be found?

Benjamin Ross

Nov 23, 2018, 7:40:49 AM
to Gremlin-users
Interesting. I have tried your queries on AWS Neptune and the repeated has steps seem to work, so I think something strange might be happening on your end that perhaps one of the experts on this forum can comment on. With that said, searching by properties is slow. Think about whether it makes sense to create another set of vertices for each property and have edges connect to the vertices that would otherwise contain these properties.

    g.addV('test').property(id,'testNode').property('prop1','p1').property('prop2','p2')

    g.V().has('prop1','p1').has('prop2','p2').count()

    >> [{"@type":"g:Int64","@value":1}]

Robert Dale

Nov 23, 2018, 9:40:13 AM
to Gremlin-users
Has steps represent some set of constraints.  A graph implementation is free to optimize and apply those steps however it sees fit.  If I had to guess, I would say Janusgraph is using an index lookup and this item isn't in that index. You should be able to use the profile() step to see what it's doing.

From Gremlin Console: g.V().has('vlabel', 'truck_loc').has('vehicle_id', '3042').count().profile()

Hopefully there's enough information there to find the problem.

If the problem persists, I recommend taking this up on the Janusgraph user list.
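The guess above (an index lookup that does not contain the vertex) can be illustrated with a toy model. This is plain Python, not JanusGraph code, and it only shows how an id-based query and an index-based query could disagree; the names `store`, `composite_index`, `query_via_index`, and `query_via_id` are hypothetical.

```python
# Toy vertex store keyed by id; the vertex exists with both properties set.
store = {
    417214712: {"vlabel": "truck_loc", "vehicle_id": "3042"},
}

# Hypothetical index on (vlabel, vehicle_id) that never recorded the vertex,
# for example because it was created after the data was loaded.
composite_index = {}

def query_via_index(vlabel, vehicle_id):
    """Answer from the index alone; a missing entry looks like 'no such vertex'."""
    return composite_index.get((vlabel, vehicle_id), [])

def query_via_id(vid, vlabel, vehicle_id):
    """Fetch by id, then check the properties on the stored vertex itself."""
    props = store.get(vid)
    if props and props.get("vlabel") == vlabel and props.get("vehicle_id") == vehicle_id:
        return [vid]
    return []

print(len(query_via_index("truck_loc", "3042")))
print(len(query_via_id(417214712, "truck_loc", "3042")))
```

Here the index-based query finds nothing while the id-based query finds the vertex, which mirrors the 0 versus 1 counts reported earlier in the thread. The profile() output would be needed to confirm whether this is what is actually happening.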