Populating array property value via gremlin-python

Scott Friedman

Oct 11, 2018, 5:23:45 PM
to Gremlin-users
Good afternoon,

I've successfully created and stored arrays (of floats) as properties via the Gremlin console (with a JanusGraph/Cassandra backend).  For example, here is the property vec4, explicitly defined with a float[] data type:

mgmt.makePropertyKey('vec4').dataType(float[]).cardinality(Cardinality.SINGLE).make()
...
g.addV().property('vec4', [1, 2, 3, 4] as float[]).next()

I've verified that I can retrieve the vector value as well.  This is great.  However, from my Python application, if I merely try to populate this property via ...property('vec4', [1,2,3,4]) from the gremlin-python GLV, I get an error from the Gremlin server:

gremlin_python.driver.protocol.GremlinServerError: 500: Value [[1, 2, 3, 4]] is not an instance of the expected data type for property key [vec5] and cannot be converted. Expected: class [F, found: class java.util.ArrayList

I understand what this error means: my application has not given Gremlin Server explicit information that my value is a floating-point array, so it's interpreting it as an ArrayList.  Unfortunately, I'm not sure how to specify the type (or cast) from the gremlin-python GLV.

Thanks for any insight!

Scott

Stephen Mallette

Oct 12, 2018, 7:49:01 AM
to gremli...@googlegroups.com
I think that your error message is coming from JanusGraph. I'm not sure what you can do about that from TinkerPop's perspective. We don't have a specific type in GraphSON that deals with primitive arrays, so when your request gets to the server it arrives as a List and you get your error.

I tend to feel like graphs that have schemas should attempt to coerce values to their required types. Perhaps you could open an issue in JanusGraph for that (perhaps it's been brought up before though and shot down for some reason, because I seem to remember this issue going back to Titan days). Maybe JanusGraph folks following along here could comment....

There aren't a ton of workarounds that I can think of other than to submit a script rather than use Gremlin bytecode-based requests. With a script you can send the "as float[]" part, which would solve the problem, but sending scripts - gross. You could go with LIST cardinality, I guess, in defining the schema. Then regular bytecode-based requests would work.
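
The script workaround described above could be sketched from Python roughly as follows. This is a minimal sketch, not a tested JanusGraph recipe: the helper names, the binding names `vid` and `vec`, the server URL, and the vertex id are all assumptions made for illustration. It relies on the gremlin-python driver's `Client.submit(script, bindings)` call and on Groovy's `as float[]` cast being applied server-side.

```python
def build_float_array_script(key):
    """Return a Gremlin-Groovy script that sets `key` to a float[] value.

    `vid` and `vec` are binding names chosen for this sketch; the cast
    `vec as float[]` is evaluated by the server, not by Python.
    """
    if not key.isidentifier():
        # The key is interpolated into the script text, so keep it simple.
        raise ValueError("property key must be a plain identifier")
    return "g.V(vid).property('%s', vec as float[]).next()" % key


def set_float_array(client, vertex_id, key, values):
    """Submit the script through a gremlin-python driver Client (assumed)."""
    script = build_float_array_script(key)
    bindings = {"vid": vertex_id, "vec": [float(v) for v in values]}
    return client.submit(script, bindings).all().result()


# Hypothetical usage; URL and vertex id are placeholders:
# from gremlin_python.driver import client
# c = client.Client('ws://localhost:8182/gremlin', 'g')
# set_float_array(c, 4104, 'vec4', [1, 2, 3, 4])
```

Passing the vector as a binding rather than formatting it into the script text keeps the script constant across calls, which is usually friendlier to server-side script caching.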




Florian Hockmann

Oct 12, 2018, 8:26:12 AM
to Gremlin-users
Scott mentioned in a similar post on JanusGraph users that he wants to store vectors with 50-300 dimensions in these properties. I recommended against simply using LIST cardinality for such a vector use case, but that was mostly a gut feeling on my side that the cardinality seems to be intended for cases other than storing vectors like these. Is that correct, or would you still recommend using LIST cardinality in this case? (Currently it seems to be the only option besides sending raw scripts. I wondered more about whether LIST cardinality is in general a good fit when someone wants to store a vector of primitive values.)
Unfortunately, I didn't remember at that time that we don't have any handling for arrays in GraphSON, which led him to run into this problem.

I guess we could add support for serialization of arrays in JanusGraph and the new libraries that extend the GLVs like Gremlin-Python which are currently under development (this PR adds the first version of JanusGraph-Python).

Scott Friedman

Oct 12, 2018, 10:53:02 AM
to Gremlin-users
FWIW, I'm building a large-scale knowledge graph layer that includes vectors (e.g., derived from neural word embeddings) to support AI/ML R&D.  JanusGraph seems like a sensible solution, and TinkerPop/gremlin-python offer ample query expressivity.

I've verified that I can pass a script to the JanusGraph gremlin server and add a new vector as a property value of an existing vertex (thanks, Stephen!).

I'd be thrilled with Stephen's suggestion of server-side schema-directed type coercion in JanusGraph, and Florian's suggestion of array serialization in JanusGraph.  I think this would support future knowledge-heavy Python AI/ML projects, but I understand from managing R&D projects that some simple-sounding features like type coercion and serialization can have many devils in the details.

Stephen Mallette

Oct 12, 2018, 11:04:27 AM
to gremli...@googlegroups.com
>   I recommended against simply using LIST cardinality for such a vector use case

This is why I dislike multiproperties - when is it a list cardinality and when is it a list. Too many choices for users imo. Anyway, I suppose I'd agree that LIST cardinality isn't as nice here as float[] given the nature of the use case. I only mentioned it b/c it was a workaround. 



Stephen Mallette

Oct 12, 2018, 3:42:27 PM
to gremli...@googlegroups.com
Kevin Gallardo was asking me about how to append items to list values using just the traversal API. I came up with this:

gremlin> g.addV().property('list',[1,2,3])
==>v[0]
gremlin> g.V().property('list',union(values('list').unfold(),constant(4)).fold())
==>v[0]
gremlin> g.V().valueMap()
==>[list:[[1,2,3,4]]]

Maybe there's an easier way, Kuppitz? Perhaps this is easier to follow:

gremlin> g.V().property('list',values('list').unfold().inject(5).fold())
==>v[0]
gremlin> g.V().valueMap()
==>[list:[[5,1,2,3,4]]]

Either way, looks hard for a graph to optimize that one if they had the ability to. 
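
The two console sessions above differ in where the new element lands: the union() form appends it, while the inject() form puts it first. A plain-Python model of that ordering, assuming only what the console output shows (this is not Gremlin itself, and the function names are made up for the sketch):

```python
from itertools import chain


def union_then_fold(existing, new_item):
    """Model of union(values('list').unfold(), constant(new_item)).fold().

    The unfolded elements are emitted first, then the constant.
    """
    return list(chain(existing, [new_item]))


def inject_then_fold(existing, new_item):
    """Model of values('list').unfold().inject(new_item).fold().

    The injected value is emitted ahead of the unfolded elements.
    """
    return list(chain([new_item], existing))


print(union_then_fold([1, 2, 3], 4))      # [1, 2, 3, 4]
print(inject_then_fold([1, 2, 3, 4], 5))  # [5, 1, 2, 3, 4]
```

If element order matters, as it does for a vector, the union() form is the one that preserves it.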



Florian Hockmann

Oct 15, 2018, 3:46:41 AM
to Gremlin-users
Since JanusGraph already supports arrays for property types, JanusGraph should also support using them via GraphSON. I just created issues for this: JanusGraph/janusgraph#1295, JanusGraph/janusgraph-python#12 and JanusGraph/janusgraph-dotnet#4.

Scott, I would probably go forward with using arrays by submitting scripts from gremlin-python until we have handled the GraphSON serialization of arrays. That way, you can simply stop using scripts as soon as the serialization works.

Daniel Kuppitz

Oct 15, 2018, 9:42:38 AM
to gremli...@googlegroups.com
> Maybe there's an easier way, Kuppitz?

Sure...

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property('list',[1,2,3])
==>v[0]
gremlin> g.V().property('list', sack(assign).by('list').
                                sack(addAll).by(constant([4])).
                                sack())
==>v[0]
gremlin> g.V().valueMap()
==>[list:[[1,2,3,4]]]

Cheers,
Daniel


Kevin Gallardo

Oct 15, 2018, 10:12:01 AM
to gremli...@googlegroups.com
Interesting, the `addAll` operator also supports maps:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property('map', [1:'1'])
==>v[0]
gremlin> g.V().properties('map')
==>vp[map->{1=1}]
gremlin> g.V().property('map', sack(assign).by('map').sack(addAll).by(constant([2:'2'])).sack())
==>v[0]
gremlin> g.V().properties('map')
==>vp[map->{1=1, 2=2}]

If anybody's interested.


--
Kévin Gallardo.
Software Engineer at DataStax.


Stephen Mallette

Oct 15, 2018, 10:14:55 AM
to gremli...@googlegroups.com
Any advantage to using sack() over the other options? I mean, it's arguable whether any of these options is "easier" than the others, so is there anything that makes sack() stand out as the best way?

Daniel Kuppitz

Oct 15, 2018, 10:25:13 AM
to gremli...@googlegroups.com
The metrics. It's much more expensive to unfold and fold again as you'll create a whole bunch of new traversers for all these operations. sack() on the other hand only operates on the plain value.
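
For gremlin-python users, the sack() variant might be written roughly like this. It is a sketch under assumptions: the helper names are hypothetical, `g` is taken to be an already connected GraphTraversalSource, and the thread only shows this pattern running against TinkerGraph in the console, so behavior on other graphs is not confirmed here.

```python
def append_with_sack(key, item):
    """Build an anonymous traversal yielding the list at `key` plus `item`."""
    # Imported lazily so this module loads without gremlin-python installed.
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.process.traversal import Operator

    return (__.sack(Operator.assign).by(key)                 # sack := current list
              .sack(Operator.addAll).by(__.constant([item]))  # sack += [item]
              .sack())                                        # emit the sack value


def append_item(g, key, item):
    """Append `item` to the list under `key` on every vertex reachable by g.V().

    `g` is assumed to be a connected GraphTraversalSource.
    """
    return g.V().property(key, append_with_sack(key, item)).iterate()
```

Because the list travels inside the sack as a single value, no per-element traversers are created, which is the cost difference described above.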

Cheers,
Daniel


Daniel Kuppitz

Oct 15, 2018, 10:29:35 AM
to gremli...@googlegroups.com
Sorry, sent too early, I wanted to include the metrics:

union:

gremlin> g.V().property('list', union(values('list').unfold(), constant(4)).fold()).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[])                                             1           1           0.051     1.02
AddPropertyStep({value=[[UnionStep([[Properties...                     1           1           5.013    98.98
  UnionStep([[PropertiesStep([list],value), Pro...                     4           4           2.159
    PropertiesStep([list],value)                                       1           1           0.022
    UnfoldStep                                                         3           3           0.088
    NoOpBarrierStep(2500)                                              3           3           0.098
    EndStep                                                            3           3           0.077
    ConstantStep(4)                                                    1           1           0.008
    EndStep                                                            1           1           0.036
  FoldStep                                                             1           1           0.388
                                            >TOTAL                     -           -           5.064        -

sack:

gremlin> g.V().property('list', sack(assign).by('list').sack(addAll).by(constant([4])).sack()).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[])                                             1           1           0.053    14.00
AddPropertyStep({value=[[SackValueStep(assign,v...                     1           1           0.325    86.00
  SackValueStep(assign,value(list))                                    1           1           0.020
  SackValueStep(addAll,[ConstantStep([4]), Prof...                     1           1           0.063
    ConstantStep([4])                                                  1           1           0.008
  SackStep                                                             1           1           0.073
                                            >TOTAL                     -           -           0.379        -

Cheers,
Daniel
