Best practice for storing and using large (50-300 dimension) vector properties on vertices

Scott Friedman

Oct 10, 2018, 4:00:34 PM
to JanusGraph users
If I have a property whose value will be a vector (i.e., ordered list, with fixed length) of floating-point values, what is the recommended way to store this in JanusGraph?  

By my read, the most fitting would be dataType(Float.class), cardinality(Cardinality.LIST).

Furthermore, is there any precedent for defining server-side vector math, e.g., to embed dot product and L2 norm computations into the graph traversal?  Thanks for any pointers!

Regards,
Scott

Florian Hockmann

Oct 11, 2018, 3:57:29 AM
to JanusGraph users
I don't think LIST cardinality is what you want here. Cardinality is meant for properties that can hold more than one separate value; customer addresses are a good example of where that makes sense.

I can't find any references to that in the docs right now, but it should also be possible to define arrays as property types with JanusGraph. It should look like this if I remember it correctly: dataType(Float[].class).

TinkerPop now includes a math() step which supports some mathematical functions. I haven't used it myself yet, so I'm not sure whether it can be used for vector computations.

In general, if there is something you want to achieve with a graph traversal but there is no Gremlin step for it, you can fall back to lambdas, which let you use basically everything Groovy / Java (or Python, if you have enabled that on the server) can do. Lambdas can also be used from Gremlin-Python, which seems to be the GLV you are using.
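As a rough sketch of the lambda route from Gremlin-Python: gremlin-python accepts a zero-argument Python lambda that returns a `(script, language)` tuple, and the script is evaluated on the server. The helper name `dot_product_lambda`, the property key, and the Groovy expression itself are illustrative assumptions, not something tested in this thread; the server must also allow lambdas for this to work.

```python
def dot_product_lambda(key, query):
    """Build a gremlin-python lambda computing a dot product server-side.

    Hypothetical helper: `key` is assumed to name a float[] property and
    `query` a sequence of finite numbers of the same length. `key` is
    inserted into the script verbatim, so it must not contain quotes.
    """
    # Render the query vector as a Groovy list literal, e.g. [1.0, 0.0].
    literal = ", ".join(repr(float(x)) for x in query)
    script = (
        "def v = it.get().value('%s'); def q = [%s]; "
        "(0..<v.length).collect { i -> v[i] * q[i] }.sum()" % (key, literal)
    )
    # gremlin-python calls this lambda and ships the script to the server.
    return lambda: (script, "gremlin-groovy")


# Intended use (assumes an existing traversal source `g`):
#   g.V().has('vec4').map(dot_product_lambda('vec4', [1, 0, 0, 0])).toList()
```

Building the script as a string keeps the vector math on the server, at the cost of the usual caveats about server-side lambdas (security settings and portability across graph providers).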

Of course you have to decide for yourself whether it makes sense in general to do these computations as part of the graph traversal or whether you can also just retrieve the data and then do the computations in your application.
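A minimal sketch of the client-side option, assuming the vector property has already been retrieved and is available as a plain Python sequence of floats (the function names `dot` and `l2_norm` are illustrative, not part of any library):

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # fsum reduces floating-point accumulation error on long vectors.
    return math.fsum(x * y for x, y in zip(a, b))


def l2_norm(a):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(dot(a, a))


vec = [1.0, 2.0, 3.0, 4.0]
print(dot(vec, vec))         # 30.0
print(l2_norm([3.0, 4.0]))   # 5.0
```

For 50-300 dimensions this is cheap per vector; the main cost of the client-side approach is transferring the vectors out of the graph.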

Scott Friedman

Oct 11, 2018, 2:48:23 PM
to JanusGraph users
Thanks, Florian. I agree that this wasn't accessible via the JanusGraph or TinkerPop docs.  Your suggestion of "Float[].class" didn't work as expected, but after a bit of perusing the JanusGraph code on GitHub, I was able to create a floating-point array property like this:

mgmt.makePropertyKey('vec4').dataType(float[]).cardinality(Cardinality.SINGLE).make()

Then I was able to create a vertex with this property like this:

g.addV().property('vec4', [1, 2, 3, 4] as float[]).next()

And I verify like this:

gremlin> g.V().has('vec4').valueMap()
13:38:17 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[vec4:[[1.0, 2.0, 3.0, 4.0]]]

I'll take a look at the math() and lambda solutions next, and assess some differences in timing.  The TinkerPop documents don't mention vector math (i.e., dot product), but I'll play around and dig deeper.  Thanks!

Scott Friedman

Oct 11, 2018, 3:34:59 PM
to JanusGraph users
One remaining issue is that I can't determine how to issue this command from my Python GLV:

g.addV().property('vec4', [1, 2, 3, 4] as float[]).next()

If I merely try to send ...property('vec4', [1,2,3,4]) from gremlin-python, I get an error from the Gremlin server:

gremlin_python.driver.protocol.GremlinServerError: 500: Value [[1, 2, 3, 4]] is not an instance of the expected data type for property key [vec5] and cannot be converted. Expected: class [F, found: class java.util.ArrayList

I'm so close!  I'm not sure if this type-specification/casting is allowable from my GLV.
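One possible workaround, offered only as a sketch and not confirmed anywhere in this thread, is to skip bytecode for this one call and submit the Groovy statement as a script, so that the `as float[]` cast runs on the server. The helper `add_vertex_script` is hypothetical, and the server URL in the comment is a placeholder.

```python
def add_vertex_script(key, values):
    """Build a Groovy script that adds a vertex with a float[] property.

    Hypothetical helper: `values` is assumed to hold finite numbers, and
    `key` is inserted verbatim, so it must not contain quotes.
    """
    # Render the Python sequence as a Groovy list literal.
    literal = ", ".join(repr(float(v)) for v in values)
    return "g.addV().property('%s', [%s] as float[]).next()" % (key, literal)


print(add_vertex_script('vec4', [1, 2, 3, 4]))
# g.addV().property('vec4', [1.0, 2.0, 3.0, 4.0] as float[]).next()

# Submitting the script (assumes a Gremlin Server at this placeholder URL):
#   from gremlin_python.driver import client
#   c = client.Client('ws://localhost:8182/gremlin', 'g')
#   c.submit(add_vertex_script('vec4', [1, 2, 3, 4])).all().result()
```

This trades the type safety of the GLV for a string-built script, so it is best limited to values you control.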

Thanks,
Scott

Florian Hockmann

Oct 12, 2018, 8:29:28 AM
to JanusGraph users
For anyone following here: Scott also posted this question in gremlin-users, so I'd suggest we continue the discussion there, especially since part of it concerns GraphSON serialization, which is a general TinkerPop issue (although we might want to add support for array serialization in JanusGraph in the end).