Confused about GraphSON edges definition


Laura Morales

Sep 2, 2021, 6:14:23 AM
to gremli...@googlegroups.com
I'm looking at this example from TinkerPop https://tinkerpop.apache.org/docs/current/dev/io/#graphson

{"id":{"@type":"g:Int32","@value":1},"label":"person","outE":{"created":[{"id":{"@type":"g:Int32","@value":9},"inV":{"@type":"g:Int32","@value":3},"properties":{"weight":{"@type":"g:Double","@value":0.4}}}],"knows":[{"id":{"@type":"g:Int32","@value":7},"inV":{"@type":"g:Int32","@value":2},"properties":{"weight":{"@type":"g:Double","@value":0.5}}},{"id":{"@type":"g:Int32","@value":8},"inV":{"@type":"g:Int32","@value":4},"properties":{"weight":{"@type":"g:Double","@value":1.0}}}]},"properties":{"name":[{"id":{"@type":"g:Int64","@value":0},"value":"marko"}],"age":[{"id":{"@type":"g:Int64","@value":1},"value":{"@type":"g:Int32","@value":29}}]}}
{"id":{"@type":"g:Int32","@value":2},"label":"person","inE":{"knows":[{"id":{"@type":"g:Int32","@value":7},"outV":{"@type":"g:Int32","@value":1},"properties":{"weight":{"@type":"g:Double","@value":0.5}}}]},"properties":{"name":[{"id":{"@type":"g:Int64","@value":2},"value":"vadas"}],"age":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:Int32","@value":27}}]}}
{"id":{"@type":"g:Int32","@value":3},"label":"software","inE":{"created":[{"id":{"@type":"g:Int32","@value":9},"outV":{"@type":"g:Int32","@value":1},"properties":{"weight":{"@type":"g:Double","@value":0.4}}},{"id":{"@type":"g:Int32","@value":11},"outV":{"@type":"g:Int32","@value":4},"properties":{"weight":{"@type":"g:Double","@value":0.4}}},{"id":{"@type":"g:Int32","@value":12},"outV":{"@type":"g:Int32","@value":6},"properties":{"weight":{"@type":"g:Double","@value":0.2}}}]},"properties":{"name":[{"id":{"@type":"g:Int64","@value":4},"value":"lop"}],"lang":[{"id":{"@type":"g:Int64","@value":5},"value":"java"}]}}
{"id":{"@type":"g:Int32","@value":4},"label":"person","inE":{"knows":[{"id":{"@type":"g:Int32","@value":8},"outV":{"@type":"g:Int32","@value":1},"properties":{"weight":{"@type":"g:Double","@value":1.0}}}]},"outE":{"created":[{"id":{"@type":"g:Int32","@value":10},"inV":{"@type":"g:Int32","@value":5},"properties":{"weight":{"@type":"g:Double","@value":1.0}}},{"id":{"@type":"g:Int32","@value":11},"inV":{"@type":"g:Int32","@value":3},"properties":{"weight":{"@type":"g:Double","@value":0.4}}}]},"properties":{"name":[{"id":{"@type":"g:Int64","@value":6},"value":"josh"}],"age":[{"id":{"@type":"g:Int64","@value":7},"value":{"@type":"g:Int32","@value":32}}]}}
{"id":{"@type":"g:Int32","@value":5},"label":"software","inE":{"created":[{"id":{"@type":"g:Int32","@value":10},"outV":{"@type":"g:Int32","@value":4},"properties":{"weight":{"@type":"g:Double","@value":1.0}}}]},"properties":{"name":[{"id":{"@type":"g:Int64","@value":8},"value":"ripple"}],"lang":[{"id":{"@type":"g:Int64","@value":9},"value":"java"}]}}
{"id":{"@type":"g:Int32","@value":6},"label":"person","outE":{"created":[{"id":{"@type":"g:Int32","@value":12},"inV":{"@type":"g:Int32","@value":3},"properties":{"weight":{"@type":"g:Double","@value":0.2}}}]},"properties":{"name":[{"id":{"@type":"g:Int64","@value":10},"value":"peter"}],"age":[{"id":{"@type":"g:Int64","@value":11},"value":{"@type":"g:Int32","@value":35}}]}}

I don't understand two things, can anyone help me understand them?

- why do I need an "outE" *and* "inE" definition for the same edge? Why can't I just define one or the other? If I define both, the edge is created when importing the file, otherwise if I only use "outE" the edge is not created

- why is everything given an id? Including edges and properties (for example "properties":{"name":[{"id":{"@type":"g:Int64","@value":0},"value":"marko"}). Removing all the "id" except for nodes IDs seems to work fine

Stephen Mallette

Sep 2, 2021, 7:02:34 AM
to gremli...@googlegroups.com
>  why do I need an "outE" *and* "inE" definition for the same edge? Why can't I just define one or the other? If I define both, the edge is created when importing the file, otherwise if I only use "outE" the edge is not created

I believe that the reason for including both has to do with the need for maintaining a complete star graph for OLAP. GraphSONWriter has a finer degree of control for that if instead of writeGraph() you choose writeVertices(OutputStream, Iterator<Vertex>, Direction) where that last argument lets you do Direction.IN/OUT rather than BOTH. Each is still valid GraphSON from a format perspective. It's more a question of what your application needs from the GraphSON in order to operate properly. So that much is up to you.

> - why is everything given an id? Including edges and properties (for example "properties":{"name":[{"id":{"@type":"g:Int64","@value":0},"value":"marko"}). Removing all the "id" except for nodes IDs seems to work fine

Every Element object (Vertex, Edge, VertexProperty) has an ID. The ID is a primary key for the Element, and all graph providers support such a value, which allows fast lookup as in g.V(id) or g.E(id). While many graphs generate that ID for you, some graphs allow ID assignment by the user, so having an "id" in the GraphSON allows that to happen. The GraphSONReader will try id assignment only if the graph allows it. If it does not, the GraphSONReader lets the graph generate the id.

You found that "id" is required for vertices. Assuming your graph generates ids, it should be fine to produce GraphSON that doesn't provide an "id" for edges or vertex properties. Obviously, you must have vertex ids, or else the GraphSONReader won't understand how the vertices connect to one another through your edges. If you are happy to let the graph generate the ids for everything else, then that's perfectly fine.
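To make the point about optional ids concrete, here is a minimal sketch in plain Python (standard library only, not TinkerPop code) that removes the "id" from edges and vertex properties while leaving the vertex id and the "inV"/"outV" references untouched. The function name strip_optional_ids is hypothetical, and the sketch assumes the target graph generates those ids itself.

```python
import json

def strip_optional_ids(line: str) -> str:
    """Drop "id" from edges and vertex properties of one GraphSON vertex line.

    The vertex "id" and the edge "inV"/"outV" references are kept, because
    they are what ties the edges to their vertices.
    """
    vertex = json.loads(line)
    for direction in ("outE", "inE"):
        for edges in vertex.get(direction, {}).values():
            for edge in edges:
                edge.pop("id", None)
    for values in vertex.get("properties", {}).values():
        for prop in values:
            prop.pop("id", None)
    return json.dumps(vertex, separators=(",", ":"))

# A shortened version of the "vadas" line from the example above.
sample = (
    '{"id":{"@type":"g:Int32","@value":2},"label":"person",'
    '"inE":{"knows":[{"id":{"@type":"g:Int32","@value":7},'
    '"outV":{"@type":"g:Int32","@value":1},'
    '"properties":{"weight":{"@type":"g:Double","@value":0.5}}}]},'
    '"properties":{"name":[{"id":{"@type":"g:Int64","@value":2},"value":"vadas"}]}}'
)
stripped = json.loads(strip_optional_ids(sample))
```

Whether the resulting file imports cleanly still depends on the graph generating the missing ids, as described above.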




--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/trinity-f2fdf87e-5cfd-4fd3-9f02-a26f6dbeb79d-1630564061620%403c-app-mailcom-lxa14.

Laura Morales

Sep 2, 2021, 7:45:40 AM
to gremli...@googlegroups.com, gremli...@googlegroups.com
> I believe that the reason for including both has to do with the need for maintaining a complete star graph for OLAP. GraphSONWriter has a finer degree of control for that if instead of writeGraph() you choose writeVertices(OutputStream, Iterator<Vertex>, Direction) where that last argument lets you do Direction.IN/OUT rather than BOTH. Each is still valid GraphSON from a format perspective. It's more a question of what your application needs from the GraphSON in order to operate properly. So that much is up to you.

I'm coming from JanusGraph, and my problem is not so much with writing a GraphSON file as with importing one. I'm creating a GraphSON file from another process, one node per line, and I would just like to load this file into Janus. In my use case I'm appending all the nodes from several processes into one GraphSON file, and then I load it into Janus like this:

graph = JanusGraphFactory.open("graph.properties")
graph.io(graphson()).readGraph("data.graphson")
graph.tx().commit()

It works great, except that for the import to work I have to find all the "inE" of every node. This is very expensive, because I need to maintain some kind of index of all the edges and then add the "inE" to all the nodes. It would be immensely simpler if I could use only "outE" instead of both. Curiously, the GraphSON file seems to load fine (with all vertices and edges) when I use only "inE" and no "outE". But with only "outE" and no "inE", the edges are not created. It would also work for me if I could split nodes and edges into two files, then load the nodes first and the edges after that.
Any suggestions on whether this is possible? I'd really appreciate any help!
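
For reference, the index-based workaround described above can be sketched in plain Python (standard library only, not TinkerPop code). It assumes every input line carries only "outE", that every edge target also appears as a vertex line in the same input, and that everything fits in memory; the function name out_edges_to_in_edges is hypothetical. It produces the "inE"-only layout that was observed to load.

```python
import json
from collections import defaultdict

def out_edges_to_in_edges(lines):
    """Rewrite outE-only vertex lines so each edge appears as inE on its target.

    Edges whose target vertex is missing from the input are silently dropped.
    """
    vertices = [json.loads(line) for line in lines if line.strip()]

    # Pass 1: index every outgoing edge by its target vertex id.
    incoming = defaultdict(lambda: defaultdict(list))  # target id -> label -> edges
    for vertex in vertices:
        for label, edges in vertex.pop("outE", {}).items():
            for edge in edges:
                flipped = {k: v for k, v in edge.items() if k != "inV"}
                flipped["outV"] = vertex["id"]
                incoming[json.dumps(edge["inV"], sort_keys=True)][label].append(flipped)

    # Pass 2: attach the indexed edges to their target vertices as inE.
    result = []
    for vertex in vertices:
        in_e = incoming.get(json.dumps(vertex["id"], sort_keys=True))
        if in_e:
            vertex["inE"] = dict(in_e)
        result.append(json.dumps(vertex, separators=(",", ":")))
    return result

lines = [
    '{"id":{"@type":"g:Int32","@value":1},"label":"person",'
    '"outE":{"knows":[{"id":{"@type":"g:Int32","@value":7},'
    '"inV":{"@type":"g:Int32","@value":2},'
    '"properties":{"weight":{"@type":"g:Double","@value":0.5}}}]}}',
    '{"id":{"@type":"g:Int32","@value":2},"label":"person"}',
]
converted = [json.loads(line) for line in out_edges_to_in_edges(lines)]
```

The in-memory index is exactly the cost mentioned above, so this is only an illustration of the transformation, not a recommendation for large files.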

Stephen Mallette

Sep 2, 2021, 8:16:09 AM
to gremli...@googlegroups.com
Just as I suggested using the GraphSONWriter directly to call writeVertices(), you can do the same with GraphSONReader and readVertices() (or various other methods). You can specify the Direction you want the GraphSONReader to access. When you do 

graph.io(graphson()).readGraph("data.graphson")

it's basically calling readGraph(), which uses the default of Direction.IN. That is why it requires the inE, as you are seeing.
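
The effect of that direction setting can be illustrated with a toy model in plain Python. This is not the GraphSONReader implementation, only a sketch of a two-pass loader that attaches edges from one direction; the function name load_edges and its return shape are made up for the illustration.

```python
import json

def load_edges(lines, direction="IN"):
    """Toy two-pass loader returning (vertex ids, edges as (outV, label, inV)).

    Only the edge lists for the chosen direction are consulted, so an
    outE-only file yields no edges when direction is "IN".
    """
    vertices = [json.loads(line) for line in lines if line.strip()]

    # Pass 1: "create" all vertices.
    ids = {v["id"]["@value"] for v in vertices}

    # Pass 2: attach edges, looking only at one side of each vertex.
    key, other = ("inE", "outV") if direction == "IN" else ("outE", "inV")
    edges = []
    for v in vertices:
        for label, items in v.get(key, {}).items():
            for e in items:
                here, there = v["id"]["@value"], e[other]["@value"]
                edges.append((there, label, here) if direction == "IN" else (here, label, there))
    return ids, edges

# An outE-only file: vertex 1 knows vertex 2, with no inE on vertex 2.
out_only = [
    '{"id":{"@type":"g:Int32","@value":1},"label":"person",'
    '"outE":{"knows":[{"inV":{"@type":"g:Int32","@value":2}}]}}',
    '{"id":{"@type":"g:Int32","@value":2},"label":"person"}',
]
ids_in, edges_in = load_edges(out_only, "IN")
ids_out, edges_out = load_edges(out_only, "OUT")
```

In this model the "IN" pass finds both vertices but no edges, while the "OUT" pass finds the single "knows" edge, which mirrors the behaviour reported earlier in the thread.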


