Add property to all vertices and edges


Peter Storm

Jan 22, 2016, 9:23:26 AM
to Aurelius
Hi,

we want to allow users to insert vertices and edges in our graph so that only they can access those vertices and edges. At the same time, all users should be able to access the data that we generate during our automatic processing. For this reason, we would like to add a property UserIDs to each vertex and each edge, initialized with a default UserID that belongs to our automatic processing. (All data so far was inserted by our automatic processing, so we don't have to filter here.)
For a test graph with 100k vertices and 200k edges, we could simply accomplish this with the following query:
uids = new int[1]
uids[0] = 0
g.V().property('UserIDs', uids).outE().property('UserIDs', uids).iterate()

In our production environment, however, the query failed after about an hour with this exception:
org.apache.thrift.transport.TTransportException: Frame size (152315166) larger than max length (15728640)!

I assume this is related to https://groups.google.com/forum/#!topic/aureliusgraphs/WOUNNKf6Q8c and to the fact that outE() makes Titan return all edges of the current vertex, which can get very large for super nodes. So, is there a way to iterate over all vertices and edges and add a property to each one without actually retrieving all vertices and edges, or at least by retrieving only one vertex/edge at a time?

BTW: Does the exception really mean that we have a super node that has a size of 150 MB?
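A quick check of the figures in the exception, taking 1 MiB as 2^20 bytes. Note that this only sizes one Thrift response frame; the exception by itself does not say whether that frame held a single vertex's edges or several rows fetched in one request.

```latex
\[
\frac{152\,315\,166\ \text{B}}{2^{20}\ \text{B/MiB}} \approx 145.3\ \text{MiB}\ (\approx 152.3\ \text{MB}),
\qquad
\frac{15\,728\,640\ \text{B}}{2^{20}\ \text{B/MiB}} = 15\ \text{MiB},
\qquad
\frac{152\,315\,166}{15\,728\,640} \approx 9.7
\]
```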

Regards,

Stephen Mallette

Jan 26, 2016, 7:20:30 AM
to Aurelius
How big is your graph? If it's in the tens of millions of edges, you probably shouldn't be trying to iterate over all edges this way. I think you'd want to execute some form of OLAP-based job to do this. Of course, you might still hit that frame size error, in which case you just need to bump these settings:

cassandra.thrift.framed.size_mb
cassandra.thrift.message.max_size_mb
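A minimal sketch of bumping those two settings, assuming they are added to the properties file the HadoopGraph is opened with (for example conf/hadoop-graph/read-cassandra.properties, as used later in this thread) so that they reach the Cassandra input format's Hadoop configuration. The value 256 is a hypothetical choice that only serves to clear the roughly 145 MiB frame from the original error; it is not a tuned or recommended setting.

```properties
# Hypothetical values for illustration only (assumption, not a recommendation).
# Both keys are in megabytes; 256 is chosen to exceed the ~145 MiB frame
# reported in the TTransportException.
cassandra.thrift.framed.size_mb=256
cassandra.thrift.message.max_size_mb=256
```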



--
You received this message because you are subscribed to the Google Groups "Aurelius" group.

Peter Storm

Jan 26, 2016, 7:28:22 AM
to Aurelius
It probably has around 20 million vertices and 30 million edges at the moment, and the graph is stored on 5 nodes.

So it is not possible to add a property to each edge without retrieving all edges of a vertex? Is that because all edges of a vertex are stored in one row in Cassandra?

But I will try it again with Hadoop and higher limits.

Stephen Mallette

Jan 26, 2016, 7:53:47 AM
to Aurelius
My memory on this is fuzzy for some reason, but I seem to remember having success reading less data when I used additional filters on my traversals to limit the edge data coming back. For instance, does it help if you batch your work by label? In other words, instead of trying to do all edges at once, do:

g.V().outE('knows').property(....
g.V().outE('created').property(....

You iterate over V again and again for each label you have, but if you are intent on doing it OLTP-style, that might work.

Peter Storm

Jan 28, 2016, 10:11:03 AM
to Aurelius
I just tried it with Spark, but I get the following error:
The following step is currently not supported by GraphComputer traversals: AddPropertyStep

Here is what I did:
gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cassandra.properties')
==>hadoopgraph[cassandrainputformat->nulloutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().property('UserIDs',uids).outE().property('UserIDs',uids).iterate()

Maybe you guys could make a tutorial for Gremlin-Hadoop that explains how to read data from a graph stored in Cassandra, manipulate it, and then write the results back to Cassandra. I know that some (or even most) of the necessary steps are explained somewhere in the TinkerPop or Titan documentation, but most of the examples there concentrate on importing or exporting data.