String concatenation in gremlin-python, with or without lambdas?

Pierre-Yves de Brito

Feb 15, 2018, 9:59:10 AM
to Gremlin-users
Hi Gremlin fans!

I've been working with Gremlin for a year now (and I love this language!!).
I'm a bit stuck on a query. This is why I'm posting here.

I'm basically trying to aggregate strings (vertex properties) in a large gremlin-python query in order to populate newly created vertices with the aggregated strings.
(This field is later used in Solr for search purposes.)

Here is a basic schema for testing the query:

schema.propertyKey("name").Text().single().create()
schema.propertyKey("location").Text().single().create()
schema.vertexLabel("legal_entity").properties("name", "location").create()
schema.config().option("graph.allow_scan").set("true")

g.addV("legal_entity").property("name","renault").property("location","FR")
g.addV("legal_entity").property("name","bmw").property("location","DE")

Here is a working query in Gremlin, with a lambda:
g.V().values('location').dedup().fold(""){a,b->a+' '+b}
It returns the expected " FR DE".
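
The leading space in that result comes from the empty-string seed: fold with a seed and a two-argument function behaves like a left reduce, so the first step computes "" + ' ' + 'FR'. A minimal plain-Python sketch of the same reduction follows (no graph involved; the input list is a stand-in for the deduplicated location values):

```python
from functools import reduce

# Stand-in for the deduplicated values produced by values('location').dedup().
locations = ["FR", "DE"]

# Left reduce with an empty-string seed, mirroring fold(""){a,b -> a+' '+b}.
folded = reduce(lambda a, b: a + " " + b, locations, "")

# Joining without a seed avoids the leading space.
joined = " ".join(locations)

print(repr(folded))
print(repr(joined))
```

If the leading space matters for the search field, the join form is the simpler one to reason about.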

Ideally, I would like to translate it into pure Gremlin to avoid the lambda. If that is not possible, I'd like to translate the lambda into gremlin-python.
I tried the latter without success:
g.V().values('location').dedup().fold("",lambda: ("lambda x, y: x+' '+y", "gremlin-python")).toList()
returns an empty list, while
result = g.V().values('location').dedup().fold().next()
returns the expected ['FR', 'DE'].
I also tried the other gremlin-python syntaxes described at the link below, without success:
http://tinkerpop.apache.org/docs/current/reference/#gremlin-variants

Do you have any ideas?

Thanks for your help!
Pierre-Yves de Brito

PS: I'm working with DSE 5.1.3.

Stephen Mallette

Feb 15, 2018, 10:07:17 AM
to Gremlin-users
You can't use a Python lambda string in gremlin-python with DSE Graph. DSE Graph doesn't bundle gremlin-python on the server, so it doesn't know how to eval that string. If you want to use a lambda, you have to write it as Groovy, thus:

>>> g.V().values('name').dedup().fold("",lambda: ("a,b->a + ' ' + b","gremlin-groovy")).toList()
[u' marko vadas lop josh ripple peter']

Before you go down that route, you should question whether you need to do that simple string transformation on the server, or whether you could get by with doing it on the client once the data is returned. It's always best to avoid lambdas if you can.
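
As a rough sketch of the client-side alternative: fetch the distinct values with a lambda-free traversal and join them in Python. The helper name `join_values` is made up for illustration, and the traversal appears only in a comment because it needs a live connection.

```python
def join_values(values, sep=" "):
    """Join already-fetched property values into one search string."""
    # Drop duplicates while keeping first-seen order, in case dedup()
    # was not applied on the server side.
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return sep.join(seen)


# With a live traversal source g, the input would come from something like:
#   values = g.V().values('location').dedup().toList()
values = ["FR", "DE", "FR"]
print(join_values(values))
```

The joined string would then have to be written back with a second traversal, which is the extra round trip to weigh against the lambda.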

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/f338b8e3-752a-4629-9e17-af471e588b08%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pierre-Yves de Brito

Feb 15, 2018, 11:39:46 AM
to Gremlin-users
Thanks a lot!
The aggregated string is stored on a vertex on the fly, so I don't want to go back and forth between the server and the local script.
I've just tried with:

.fold("",lambda: ("a,b->a + ' ' + b", "gremlin-groovy")).toList()
Unfortunately, it throws:
dse.InvalidRequest: Error from server: code=2200 [Invalid query] message="Could not locate method: DefaultGraphTraversal.fold([, a,b->a + ' ' + b])"

Do you have any idea why?

Best regards,
Pierre-Yves

Stephen Mallette

Feb 15, 2018, 11:54:21 AM
to Gremlin-users
hmm - no. not sure why that would happen. perhaps try that lambda syntax with a more simplistic traversal? like perhaps use one of the examples in the docs:


Stephen Mallette

Feb 15, 2018, 11:55:27 AM
to Gremlin-users
sorry - hit send accidentally with hot keys before i was done typing. here was the link i wanted to provide:


and maybe you just do a simple:

g.V().out().map(lambda: ("it.get().value('name').length()", "gremlin-groovy"))

and see if that works
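
For reference, in the examples used in this thread the zero-argument Python lambda does not run on the server; it only returns the script text and the name of the language that should evaluate it. A small sketch of that shape, where `groovy_lambda` is a hypothetical helper and not part of gremlin-python:

```python
def groovy_lambda(script):
    """Wrap a Groovy script in the zero-argument callable form used above."""
    # The callable returns a (script, language) tuple when invoked.
    return lambda: (script, "gremlin-groovy")


length_of_name = groovy_lambda("it.get().value('name').length()")
print(length_of_name())

# With a live traversal source g, this would be passed as:
#   g.V().out().map(length_of_name)
```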

Pierre-Yves de Brito

Feb 15, 2018, 12:09:10 PM
to Gremlin-users
Same error here:
dse.InvalidRequest: Error from server: code=2200 [Invalid query] message="Could not locate method: DefaultGraphTraversal.map([it.get().value('location_search').length()])"

Maybe a missing import?

Here is what I use:

import csv
import sys
from dse.cluster import Cluster
from dse.policies import DCAwareRoundRobinPolicy
from dse.policies import TokenAwarePolicy
from dse.auth import PlainTextAuthProvider
from dse_graph import DseGraph
from dse_graph.predicates import Search
from gremlin_python.process.graph_traversal import __
from gremlin_python import statics
import ast

Pierre-Yves de Brito

Feb 15, 2018, 12:15:19 PM
to Gremlin-users
If it helps, here is how I execute the query:

servers = ast.literal_eval(str(sys.argv[4]))
graph = str(sys.argv[5])
cluster = Cluster(contact_points=servers)
session = cluster.connect()
g = DseGraph.traversal_source(session, graph)

q = """result = g.V("{~label=legal_entity, source=rmpm, id=12456789}").both().hasLabel("legal_entity").map(lambda: ("it.get().value('location_search').length()","gremlin-groovy")).toList()"""
ldict = locals()
exec(q,globals(),ldict)
print(ldict['result'])

Daniel Kuppitz

Feb 15, 2018, 1:06:39 PM
to gremli...@googlegroups.com
Are you trying to get a global aggregation of certain values and store them on a single other vertex? That would be easy, and it would require neither an index nor any String concatenation.

// Schema:
schema.propertyKey("name").Text().single().create()
schema.propertyKey("location").Text().single().create()
schema.propertyKey("locations").Text().multiple().create()
schema.vertexLabel("legal_entity").properties("name", "location").create()
schema.vertexLabel("legal_entity_stats").properties("locations").create()
schema.config().option("graph.allow_scan").set("true")

// Sample vertices
g.addV("legal_entity").property("name","renault").property("location","FR").
  addV("legal_entity").property("name","bmw").property("location","DE").iterate()

// Aggregation and storage of location values
g.V().hasLabel('legal_entity_stats').fold().
  coalesce(
    unfold().sideEffect(properties('locations').drop()),
    addV('legal_entity_stats')).as('stats').
  V().hasLabel('legal_entity').
    values('location').dedup().as('location').
  select('stats').
    property(list, 'locations', select('location')).iterate()

Result: [Inline image 1, not preserved]

Cheers,
Daniel
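
To make the effect of that traversal concrete, here is a plain-Python model of the data before and after. It is an illustration only, not Gremlin; the dictionaries and the "stale" value are stand-ins invented for the example.

```python
# Stand-ins for the two sample legal_entity vertices.
legal_entities = [
    {"name": "renault", "location": "FR"},
    {"name": "bmw", "location": "DE"},
]

# Stand-in for the single legal_entity_stats vertex (created if missing).
stats = {"label": "legal_entity_stats", "locations": ["stale"]}

# Equivalent of dropping the old 'locations' properties first.
stats["locations"] = []

# Equivalent of values('location').dedup() followed by one
# property(list, 'locations', ...) call per distinct value.
for entity in legal_entities:
    if entity["location"] not in stats["locations"]:
        stats["locations"].append(entity["location"])

print(stats["locations"])
```

The point of the model is that the stats vertex ends up with one property value per distinct location rather than one concatenated string.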


Pierre-Yves de Brito

Feb 15, 2018, 2:22:31 PM
to Gremlin-users
Thanks Daniel for coming in!
This concatenated field is indexed in Solr; I use it to search for companies.
The company receiving the concatenated field is a "hat" company that groups together the images of the company from all the systems I loaded into the graph.
Then, when searching, I only look for hat companies.
I wanted to concatenate the location strings of the companies into an indexed search field. (Solr works really well for this use case at our data volumes.)
Your query builds a list, if I understand correctly. Do you think the strings can be aggregated in pure Gremlin?

Two other options:
- retrieve the strings in Python and push them back, which is slower and not as clean
- index lists instead of strings, if that works, but we'll need full Solr capabilities on that, including facets. I haven't investigated this so far (and it's a major schema change).

Thanks!
Pierre-Yves

Daniel Kuppitz

Feb 15, 2018, 3:26:29 PM
to gremli...@googlegroups.com
I don't know about facets, but I just tried to create a search index over the multi-valued property and it just worked:

schema.vertexLabel('legal_entity_stats').index('search').search().by('locations').add()

Plus the following query returned the expected result:

g.V().has('legal_entity_stats', 'locations', Search.token('DE'))

Cheers,
Daniel


Pierre-Yves de Brito

Feb 15, 2018, 4:58:50 PM
to Gremlin-users
Thanks a lot Daniel!

I'll investigate this further; it seems promising and clean!
Let's hope everything goes well on the Solr side.

Pierre-Yves

Pierre-Yves de Brito

Feb 16, 2018, 5:42:19 AM
to Gremlin-users
The document is not stored in Solr because, with the default schema, Solr expects single-valued fields:

ERROR [minimal.legal_entity_stats_p Index WorkPool work thread-8] 2018-02-16 10:16:19,034  Cql3SolrSecondaryIndex.java:724 - Exception writing document id ["1301494912","3"] to the index; possible analysis error: DocValuesField "locations" appears more than once in this document (only one value is allowed per field)

This is consistent with the schema definition:
    <field docValues="true" indexed="true" multiValued="false" name="locations" stored="true" type="StrField"/>
    <field indexed="true" multiValued="false" name="locations_analyzed" stored="true" type="TextField"/>
    <copyField dest="locations_analyzed" source="locations"/>

I tried tweaking the schema to use a multi-valued field, but that doesn't seem to work either:
ERROR [minimal.legal_entity_stats_p Index WorkPool work thread-8] 2018-02-16 10:38:39,732  Cql3SolrSecondaryIndex.java:724 - Exception writing document id ["1301494912","3"] to the index; possible analysis error: cannot change DocValues type from SORTED to SORTED_SET for field "locations"

The only way I managed to get an answer to the search query was to run it directly after the creation query, in which case I assume that Solr is bypassed (?).

Can you try running your search query separately from the creation of the vertex to check this assumption?

Thanks!
Pierre-Yves