Migrating queries from Gremlin Console to Python


wjim0324

Jun 21, 2020, 1:46:42 PM
to Gremlin-users
I am trying to migrate a graph and queries from Gremlin Console to a Python app, and I am having several issues.

I'd like to be self-sufficient... I've searched the Tinkerpop docs and the Practical Guide. I'm scouring this forum, StackOverflow and others.  

Is there a guide or reference to use in constructing python queries from gremlin console queries?

Background
I built a directed graph and a set of queries using Gremlin Console. I deployed the graph to Gremlin Server, and I am able to issue the queries from Gremlin Console. My next task is to migrate the queries to a Python web app that interacts with the graph on Gremlin Server.

I am running into three types of issues...
1. identifying what modules need to be imported
2. determining the needed changes to the queries based on python reserved words; functions, etc.
3. debugging errors and python error messages

As an example for #1, I found that I needed additional imports over and above the information provided in [ https://tinkerpop.apache.org/docs/3.4.7/reference/#python-imports ]. (The asyncio import and parameters are required because my implementation is on Windows 10.)
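import socket
import sys
import asyncio
# Windows only: needed since the default event loop changed in Python 3.8 (python-3.8.0a4)
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
import gremlin_python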

As an example for #2, I found a posting that indicated I needed to change any "element" that matches a Python reserved word to "element_", such as '.is' to '.is_'.
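
A rough sketch of that renaming rule, using only the standard library. `rename_reserved_steps` is a hypothetical helper, not part of gremlin_python. It naively appends an underscore to any called name that is a Python keyword (`with`, `is`, `as`, `from`, `in`, `not`, ...), and it does nothing about other differences such as `true` versus `True`, names that clash with Python built-ins, or missing imports.

```python
import keyword
import re

# Matches a called name such as the "with" in ".with(" or the "is" in "is(".
_STEP_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(")


def rename_reserved_steps(groovy_query: str) -> str:
    """Append '_' to every called name that is a Python keyword."""

    def fix(match):
        name = match.group(1)
        if keyword.iskeyword(name):
            return name + "_("
        return match.group(0)

    return _STEP_CALL.sub(fix, groovy_query)


console_fragment = ".with(ShortestPath.distance,'weight').filter(count(local).is(gt(1)))"
print(rename_reserved_steps(console_fragment))
# .with_(ShortestPath.distance,'weight').filter(count(local).is_(gt(1)))
```

Because it works on raw text, it would also rewrite a keyword followed by "(" inside a string literal, so treat it as a first pass to review by hand rather than a translator.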

As an example of #3, I have a rather long query to enumerate the shortest paths through the graph based on weights. All I get is "invalid syntax".

Here is the command from Gremlin Console to Gremlin Server and response..

gremlin> g.withComputer().V(1).shortestPath().with(ShortestPath.distance,'weight').with(ShortestPath.includeEdges, true).with(ShortestPath.target, hasId(29)).filter(count(local).is(gt(1))).group().by(project('from','to').by(limit(local, 1)).by(tail(local, 1))).unfold().order().by(select(keys).select('from').id()).by(select(keys).select('to').id()).toList()

==>{from=v[1], to=v[29]}=[path[v[1], e[36][1-route->2], v[2], e[51][2-route->7], v[7], e[55][7-route->11], v[11], e[76][11-route->24], v[24], e[80][24-route->29], v[29]], path[v[1], e[36][1-route->2], v[2], e[51][2-route->7], v[7], e[65][7-route->16], v[16], e[78][16-route->24], v[24], e[80][24-route->29], v[29]], path[v[1], e[35][1-route->2], v[2], e[51][2-route->7], v[7], e[55][7-route->11], v[11], e[76][11-route->24], v[24], e[80][24-route->29], v[29]], path[v[1], e[35][1-route->2], v[2], e[51][2-route->7], v[7], e[65][7-route->16], v[16], e[78][16-route->24], v[24], e[80][24-route->29], v[29]]]

Here is the same command from Python console to Gremlin Server.... and error message

>>> g.withComputer().V(1).shortestPath().with(ShortestPath.distance,'weight').with(ShortestPath.includeEdges, true).with(ShortestPath.target, hasId(29)).filter(count(local).is(gt(1))).group().by(project('from','to').by(limit(local, 1)).by(tail(local, 1))).unfold().order().by(select(keys).select('from').id()).by(select(keys).select('to').id()).toList()
 
File "<stdin>", line 1
    g.withComputer().V(1).shortestPath().with(ShortestPath.distance,'weight').with(ShortestPath.includeEdges, true).with(ShortestPath.target, hasId(29)).filter(count(local).is(gt(1))).group().by(project('from','to').by(limit(local, 1)).by(tail(local, 1))).unfold().order().by(select(keys).select('from').id()).by(select(keys).select('to').id()).toList()
                                         ^
SyntaxError: invalid syntax
>>>

Here is the modified command to replace python reserved words....  ".with" ==> ".with_" and ".is" ==> ".is_"

>>> g.withComputer().V(1).shortestPath().with_(ShortestPath.distance,'weight').with_(ShortestPath.includeEdges, true).with_(ShortestPath.target, hasId(29)).filter(count(local).is_(gt(1))).group().by(project('from','to').by(limit(local, 1)).by(tail(local, 1))).unfold().order().by(select(keys).select('from').id()).by(select(keys).select('to').id()).toList()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ShortestPath' is not defined

Stephen Mallette

Jun 22, 2020, 2:24:59 PM
to gremli...@googlegroups.com
> Is there a guide or reference to use in constructing python queries from gremlin console queries?

I'm afraid that there are no other guides out there beyond what is written in the Reference Documentation. If you'd like to help improve our documentation we can take PRs on that:


Expanding the Common Imports with other imports that you found missing would be a good/easy start. I'm not sure what more we should do with the renamed steps, they are listed here:


but folks seem to miss that. I was thinking that maybe the actual Traversal Steps should have examples in their variants or at least a callout box with the alternative step name. Do you have other ideas?

We also have an open issue 


for native translators in Python, to the extent that this would help... technically, we'd want a Translator written in Groovy that would write the Python syntax so that we could do Gremlin Console conversions as a command of some sort. 

I'd suggested to someone else with a similar issue that perhaps you could develop your Gremlin in a Python console so that you write it in the form you intend to use it. Maybe that helps with the copy/paste from the console to your code. Of course, the raw python console isn't quite as nice as the Gremlin Console.











--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/6604eab9-1cca-422f-a76e-dc518fcd9b32o%40googlegroups.com.

wjim0324

Jun 24, 2020, 9:04:45 AM
to Gremlin-users


On Monday, June 22, 2020 at 2:24:59 PM UTC-4, Stephen Mallette wrote:

> Do you have other ideas?

I do have an idea of what would help me with my current problem. I also think that I would need this in the event that I teach the topic.

As I continued to search for solutions, I came upon the following great document.... http://gremlindocs.spmallette.documentup.com/ . I've not found a more current version of this content.

I'm thinking that would be a good starting point. It would be great if (a) the vocabulary was expanded, (b) it was expanded to include "typical usage" and "common errors", and (c) linked to standard Tinkerpop documentation if it is not already.

For expanded vocabulary.... I cannot find references on how to use   .project(a,b,c)

I've included "typical usage" examples below that include a brief description and examples from Gremlin and Python.

As for "common errors", I keep encountering two and I have not found how to deal with them...
  • Python output that seems to represent "byte code" rather than query results (as shown in g.V() below)
  • Python output... TypeError: 'GraphTraversal' object is not callable

Suggested typical usage & common errors....

g.V()

g is the TraversalSource, by specifying V() you're saying that you are going to start at a set of elements that are of type Vertex, just like by specifying E() would specify starting at a set of elements that are of type Edge

Gremlin Console to Gremlin Server

Python Command Line to Gremlin Server

gremlin> g.V()

==>v[0]

==>v[1]

==>v[2]

==>v[3]

==>v[4]

… for all V

>>> g.V()

[['withStrategies', VertexProgramStrategy]][['V']]

 

>>> g.V().tolist()

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

TypeError: 'GraphTraversal' object is not callable



g.V().both().toList()

Get both adjacent vertices of the vertex, the in and the out.

Gremlin Console to Gremlin Server

Python Command Line to Gremlin Server

gremlin> g.V().both().toList()

==>v[2]

==>v[2]

==>v[2]

==>v[2]

==>v[7]

==>v[20]

==>v[7]

==>v[20]

==>v[7]

==>v[20]

....

gremlin>

>>> g.V().both().toList()

[v[11], v[11], v[11], v[11], v[11], v[11], v[11], v[24], v[24], v[24], v[24], v[24], v[24], v[24], v[24], v[24], v[29], v[29], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[16], v[20], v[20], v[20], v[20], v[20], v[20], v[22], v[22], v[22], v[22], v[1], v[1], v[1], v[1], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[2], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7], v[7]]

>>> 












Stephen Mallette

Jun 24, 2020, 10:25:34 AM
to gremli...@googlegroups.com
Good ol' GremlinDocs.... :)

GremlinDocs was basically the model for the current Reference Documentation. Every single step is listed there in the same way it was for GremlinDocs with the same basic fashion - a description and examples:


If you scroll down from the above link to get to the individual steps I'd say that matches quite closely to GremlinDocs style. And on the left in the table of contents, all the steps are listed in the same way as GremlinDocs, though they are alphabetical  rather than grouped by type (we should perhaps do the grouping...). 

> I cannot find references on how to use   .project(a,b,c)

Well, I guess it's this:


but, perhaps more verbiage and examples would probably help now that I look at it from your perspective. 

> Python output that seems to represent "byte code" rather than query results (as shown in g.V() below)

This situation is not an "error" and it's described in a few places in our reference docs and elsewhere. The "g" is a GraphTraversalSource. Its job is to spawn or construct traversals. So when you do:

g.V();

it doesn't execute anything! It's really just creating a Traversal object which on its own does nothing. It is essentially doing this:

Traversal t = g.V();

You need to "iterate" the Traversal object to make it actually execute and you do that with a terminating step:


which means something like:

t.toList()
t.iterate()

you can obviously avoid the Traversal object declaration and shorthand a bit to:

g.V().toList()
g.V().iterate()
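
In plain Python terms the idea looks roughly like the toy model below. `ToyTraversal` is made up for illustration and is not gremlin_python code: it only records steps, its `repr` shows those recorded steps (much like the bracketed output a Python prompt shows for an un-iterated traversal), and nothing is read until the terminal `toList()` is called.

```python
class ToyTraversal:
    """Toy model only: records steps and reads data only in a terminal step."""

    def __init__(self, vertices, steps=()):
        self._vertices = vertices
        self._steps = tuple(steps)

    def V(self):
        # Non-terminal step: returns a new traversal and executes nothing.
        return ToyTraversal(self._vertices, self._steps + ("V",))

    def __repr__(self):
        # What an interactive prompt displays: the recorded steps, not results.
        return "".join("[['{}']]".format(step) for step in self._steps)

    def toList(self):
        # Terminal step: only now is the data actually read.
        return list(self._vertices) if "V" in self._steps else []


g = ToyTraversal(["v[0]", "v[1]", "v[2]"])
t = g.V()
print(repr(t))     # [['V']]
print(t.toList())  # ['v[0]', 'v[1]', 'v[2]']
```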

This is the same for all the languages - python, java, groovy, clojure and so on. The only place it is different is Gremlin Console where we automagically iterate the results for you. Gremlin Console provides this convenience (at the risk of confusion) so that you can save a bit of typing and so your session flows a bit more naturally. This is described in a number of places, but is done especially well here:


We could perhaps add some more content to the Python docs so that folks who use the python console can 

> Python output... TypeError: 'GraphTraversal' object is not callable

I'm not sure offhand what causes that. Do you have a full example with the full stack trace?
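
One plausible cause, judging only from the `g.V().tolist()` call shown earlier in the thread: the terminal step is spelled `toList`, and a TypeError (rather than an AttributeError) suggests that looking up the unknown name `tolist` returned another traversal object, which was then called. The toy class below is a stand-in built on that assumption, not gremlin_python code. It reproduces the same kind of error.

```python
class ToyTraversal:
    """Toy stand-in: unknown attributes yield a new traversal (an assumption)."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def toList(self):
        # Correctly spelled terminal step.
        return []

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for the misspelled 'tolist'.
        return ToyTraversal(self.steps + (("values", name),))


t = ToyTraversal()
print(t.toList())  # []
try:
    t.tolist()  # the attribute lookup succeeds, the call does not
except TypeError as err:
    print(err)  # 'ToyTraversal' object is not callable
```

If that is what is happening, correcting the capitalization to `toList()` should be enough.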

> Suggested typical usage & common errors....

I like your suggestion to have the side-by-side of Gremlin in the console along with the Python representation. That's basically the reason we created those code tabs so that users could switch from the Gremlin Console representation to the Python/Javascript/etc representation. I think this gets more important going forward as I'd like to change the Python library to feel more Pythonic so that it falls more in line with how .NET and Clojure implement Gremlin. I feel pretty strongly that Gremlin should look like the language you are coding in more than it looks like Groovy/Java. Unfortunately, I feel like we really need some automation there as we have so many code examples I'd be concerned about maintaining and testing them all. That needs more thought and I'm not sure I really have time to work on it directly myself at the moment.

Thank you for your thoughtful feedback. I'm happy to continue the discussion and if you'd like to help contribute some improvements that everyone can agree upon that would be great!




wjim0324

Jun 24, 2020, 3:17:57 PM
to Gremlin-users

On Wednesday, June 24, 2020 at 10:25:34 AM UTC-4, Stephen Mallette wrote:
> Good ol' GremlinDocs.... :)

Thanks for responding to my questions and comments. I appreciate the organization of GremlinDocs to group elements by function. It provided both guidance and context. I am on the fence about whether to have a language-specific or language-agnostic implementation. A one-to-one port of queries to Python or a simple translation / migration is best for me.

To put my questions in context, I have a proof-of-concept using a directed graph that was developed with Gremlin Console and Gremlin Server. I relied heavily on the Practical Gremlin tutorial. I am very pleased with the output. I now need to implement it on a web-enabled software platform. I am having trouble finding relevant scenarios, recipes and reference information for working with directed graphs using Python.

Over the course of my project I expect to learn about and exercise gremlin console, gremlin server, gremlin python... and python.

I look forward to the opportunity to share my experiences.

Stephen Mallette

Jun 25, 2020, 10:09:15 AM
to gremli...@googlegroups.com
> Over the course of my project I expect to learn about and exercise gremlin console, gremlin server, gremlin python... and python.

That is a lot to consume especially if you include python itself in that mix. In that sense I would understand that you would like to copy/paste Gremlin line for line from one programming language to the next, but I tend to feel strongly that developers should be comfortable in the programming idioms of their environment when working with Gremlin. I think that leads to better productivity. I realize your experience so far hasn't quite done that but I sense that having to learn about python on top of all the other components combined with our documentation organization and context have all compounded the difficulty. At least your list didn't also include JanusGraph+Cassandra+ElasticSearch :)  

I think our documentation approach is "learn Gremlin first by way of the Console and then apply that understanding to whatever programming language you're using". If folks just want to go right to Python or Javascript or whatever then the documentation (inside TinkerPop as well as outside) doesn't lend itself too well to that. Anyway, it would be nice to see some meaningful contributions to our documentation in this area soon. 



wjim0324

Jun 26, 2020, 8:40:24 AM
to Gremlin-users
I Gremlin provides great functionality and I am at the prospect of completing my project .

Please take my next comments as constructive.

> That is a lot to consume especially if you include python itself in that mix. In that sense I would understand that you would like to copy/paste Gremlin line for line from one programming language to the next, but I tend to feel strongly that developers should be comfortable in the programming idioms of their environment when working with Gremlin. I think that leads to better productivity. I realize your experience so far hasn't quite done that but I sense that having to learn about python on top of all the other components combined with our documentation organization and context have all compounded the difficulty. At least your list didn't also include JanusGraph+Cassandra+ElasticSearch :) 

It's true that my experience so far has been mixed. From my perspective, having too many prerequisites before being productive represents a barrier, especially when completing the proof-of-concept was pretty straightforward. My wish would be that there was a simple implementation path for gremlin on python (and other languages) that was a simple API with wrapped input and output. Assign the query string as a variable, invoke gremlin and process the resultant string and return code. (Maybe that's too old school... but it would be much more productive for me.) Maybe two types of interfaces.... old school API, and new school language integration.

> I think our documentation approach is "learn Gremlin first by way of the Console and then apply that understanding to whatever programming language you're using". If folks just want to go right to Python or Javascript or whatever then the documentation (inside TinkerPop as well as outside) doesn't lend itself too well to that. Anyway, it would be nice to see some meaningful contributions to our documentation in this area soon.

My experience is that the approach to implementing gremlin with language variants does not provide out-of-the-box portability of learning from Console to the target language.

My sense is that language specific gremlin-client to gremlin-server communications could lead to a large code maintenance work stream in the Gremlin project. 

I agree that the documentation needs to be consolidated, and I will try to provide input as I can.

Thanks for the great work of the Gremlin Project!
 

Jose Motta

Jun 27, 2020, 11:59:00 AM
to Gremlin-users
Agreed! I'm also trying this same Python + Gremlin path and I am very excited about all the good stuff we have for graphs today. But I also noticed that a lot of research is required until you can build a starter project.

I have just started choosing the software tools, also checking the Datastax Graph Queries and Graph Fluent API, more at the link below:

Tinkerpop is a great option and seems to be a reliable standard to create distributed architectures using docker-compose and/or kubernetes clusters. 

Kelvin Lawrence

Jun 27, 2020, 8:12:54 PM
to Gremlin-users
I have an issue open against myself to add coverage of all the GLV languages to Practical Gremlin and I plan to start with Python but time so far has not allowed me to get there. I would say the most common issues people run into with moving from the Gremlin Console to Python are:

  1. Remembering to use a terminal step like iterate, next and toList
  2. Discovering the reserved words that need a suffix added such as "as_"
  3. Ensuring global replacements of keywords like range are avoided (I believe the clients have already had this addressed)
  4. The fact that Python dict keys must be hashable (immutable), so a group() step that generates, say, a map as the key will fail unless special action is taken.
  5. Taking note that certain things like per query timeout need the app to build its own Request object.
If you need to port queries more quickly remember that you can submit text strings as well as use in-line code (byte code)
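
A minimal sketch of the text-string route, wrapped so that the caller gets a success flag plus either the results or the error text. `run_query` and `StubClient` are hypothetical names. The stub exists only so the sketch runs without a server, and it mimics the `submit(...).all().result()` call chain that the driver client is used with elsewhere in this thread.

```python
def run_query(client, query):
    """Submit a Gremlin script as text and return (ok, payload)."""
    try:
        # submit() gives a result set, all() a future, result() the final list.
        return True, client.submit(query).all().result()
    except Exception as err:  # broad on purpose: report the failure, do not crash
        return False, str(err)


class StubClient:
    """Hypothetical stand-in for a driver client, so the sketch runs offline."""

    def __init__(self, canned):
        self._canned = canned

    def submit(self, query):
        if not query.strip():
            raise ValueError("empty query")
        return self  # plays the role of the result set

    def all(self):
        return self  # plays the role of the future

    def result(self):
        return self._canned


ok, payload = run_query(StubClient([6]), "g.V().count()")
print(ok, payload)  # True [6]
ok, payload = run_query(StubClient([6]), "   ")
print(ok, payload)  # False empty query
```

With a real client the query string can stay in Gremlin Console syntax, since it is evaluated on the server rather than parsed by Python.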

If you have run into issues other than the ones I listed above please do share them.

I will get to this in Practical Gremlin at some point but as with the TinkerPop docs I am happy to take PRs.

I have already added several Python client samples at [1]


Cheers
Kelvin

wjim0324

Jul 3, 2020, 11:37:53 AM
to Gremlin-users

> If you need to port queries more quickly remember that you can submit text strings as well as use in-line code (byte code)

> If you have run into issues other than the ones I listed above please do share them.


Thank you.

I found your "basic-client.py" sample on GitHub and adapted it to my python test program.

I implemented the "client.submit(query)" snippet successfully for simple queries....

query = """
g.E().project("id","descE").by(id()).by("descE")
"""

result = client.submit(query)
future_results = result.all()
results = future_results.result()
print('Results from client submit of g.E().project("id","descE").by(id()).by("descE")')
print('')
print(results)
print('')

I am able to get responses from Gremlin Server for several of the queries from Gremlin Console. I am now working out how to handle the query output in my Python test program; converting the returned list into usable arrays required installing and importing numpy.
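
For simple reshaping, plain list and dict operations may be enough before reaching for numpy. The sketch below assumes each result arrives as a dict keyed by the names passed to project(). The sample rows are made up for illustration and are not output from the actual graph.

```python
# Hypothetical rows shaped like the output of project("id","descE")
results = [
    {"id": 36, "descE": "route A"},
    {"id": 51, "descE": "route B"},
]

# Pull one column out as a list.
ids = [row["id"] for row in results]

# Index the rows by id for direct lookup.
desc_by_id = {row["id"]: row["descE"] for row in results}

print(ids)             # [36, 51]
print(desc_by_id[51])  # route B
```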

I am not able to use this method to migrate all queries.... specifically those using shortest path.

I can't tell whether the problem is related to:
1. the complexity of the queries
2. the communication between client and the server
3. missing imports
4. my lack of understanding on the data format from the server, or how to display and process the server response  

Suggestions ?


Gremlin Console version....

gremlin> g.withComputer().V(1).shortestPath().with(ShortestPath.distance,'weight').with(ShortestPath.includeEdges, true).with(ShortestPath.target, hasId(29)).filter(count(local).is(gt(1))).group().by(project('from','to').by(limit(local, 1)).by(tail(local, 1))).unfold().order().by(select(keys).select('from').id()).by(select(keys).select('to').id()).toList()

==>{from=v[1], to=v[29]}=[path[v[1], e[36][1-route->2], v[2], e[51][2-route->7], v[7], e[55][7-route->11], v[11], e[76][11-route->24], v[24], e[80][24-route->29], v[29]], path[v[1], e[36][1-route->2], v[2], e[51][2-route->7], v[7], e[65][7-route->16], v[16], e[78][16-route->24], v[24], e[80][24-route->29], v[29]], path[v[1], e[35][1-route->2], v[2], e[51][2-route->7], v[7], e[55][7-route->11], v[11], e[76][11-route->24], v[24], e[80][24-route->29], v[29]], path[v[1], e[35][1-route->2], v[2], e[51][2-route->7], v[7], e[65][7-route->16], v[16], e[78][16-route->24], v[24], e[80][24-route->29], v[29]]]

Python Console version...

>>> query = """
... g.withComputer().V(1).shortestPath().with(ShortestPath.distance,'weight').with(ShortestPath.includeEdges, true).with(ShortestPath.target, hasId(29)).filter(count(local).is(gt(1))).group().by(project('from','to').by(limit(local, 1)).by(tail(local, 1))).unfold().order().by(select(keys).select('from').id()).by(select(keys).select('to').id()).toList()
... """
>>> result = client.submit(query)
>>> future_results = result.all()
>>> results = future_results.result()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Program Files\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\driver\resultset.py", line 90, in cb
    f.result()
  File "C:\Program Files\Python38\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Program Files\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Program Files\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\driver\connection.py", line 83, in _receive
    status_code = self._protocol.data_received(data, self._results)
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\driver\protocol.py", line 83, in data_received
    message = self._message_serializer.deserialize_message(message)
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\driver\serializer.py", line 163, in deserialize_message
    return self._graphson_reader.toObject(msg)
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 129, in toObject
    return dict((self.toObject(k), self.toObject(v)) for k, v in obj.items())
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 129, in <genexpr>
    return dict((self.toObject(k), self.toObject(v)) for k, v in obj.items())
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 129, in toObject
    return dict((self.toObject(k), self.toObject(v)) for k, v in obj.items())
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 129, in <genexpr>
    return dict((self.toObject(k), self.toObject(v)) for k, v in obj.items())
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 126, in toObject
    return self.deserializers[obj[GraphSONUtil.TYPE_KEY]].objectify(obj[GraphSONUtil.VALUE_KEY], self)
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 432, in objectify
    new_list.append(reader.toObject(obj))
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 126, in toObject
    return self.deserializers[obj[GraphSONUtil.TYPE_KEY]].objectify(obj[GraphSONUtil.VALUE_KEY], self)
  File "C:\Program Files\Python38\lib\site-packages\gremlin_python\structure\io\graphsonV3d0.py", line 484, in objectify
    new_dict[reader.toObject(l[x])] = reader.toObject(l[x + 1])
TypeError: unhashable type: 'dict'
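
The last frame of the traceback builds a Python dict from alternating key and value entries. The stdlib-only sketch below mimics that step with made-up data to show why a key that is itself a dict cannot be stored, and what a shape that Python can hold looks like. `pairs_to_dict` and `pairs_to_list` are hypothetical helpers, not gremlin_python functions.

```python
def pairs_to_dict(flat):
    """Build a dict from [k1, v1, k2, v2, ...], like the failing frame does."""
    out = {}
    for i in range(0, len(flat), 2):
        out[flat[i]] = flat[i + 1]  # raises TypeError when the key is a dict
    return out


def pairs_to_list(flat):
    """Keep the same data as (key, value) tuples, which needs no hashing."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


# Made-up data: one map entry whose key is itself a map.
flat = [{"from": "v[1]", "to": "v[29]"}, ["path-1", "path-2"]]

try:
    pairs_to_dict(flat)
except TypeError as err:
    print("dict failed:", err)

for key, paths in pairs_to_list(flat):
    print(key["from"], key["to"], len(paths))  # v[1] v[29] 2
```

With the driver the failure happens while the response is being deserialized, so the client cannot patch it up afterwards. The result has to leave the server in a shape whose map keys are hashable in Python.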



Stephen Mallette

Jul 7, 2020, 6:58:22 AM
to gremli...@googlegroups.com
I think you have hit a limitation of Python with your query - you're returning a dict with an unhashable key:


The unhashable key is itself a dict. You need to restructure your data so that it can be deserialized properly to a usable data structure in Python. This blog post discusses this issue further:


 
