Bulk Update of Existing Edges


sambhajic...@gmail.com

Apr 4, 2017, 7:51:07 AM
to Aurelius
I am using Titan 1.0.0 with a Cassandra backend and Elasticsearch for indexing. Our graph currently has 100 million nodes and 170 million edges, stored with a replication factor of 2 across a cluster of 3 Cassandra nodes. I now want to add a property to about 30 million existing edges. I have extracted the edge IDs and property values into a text file.

We currently use a Scala script that loops through the file, retrieves each edge, and adds the property using the code below. Each transaction updates 1000 edges, after which a new transaction is opened.

        // buf, graphDb, tx, count (starting at 0) and write (starting as true) are defined earlier in the script
        while (write) {
            val lineJustFetched: String = buf.readLine()  // Read one line from the input file
            if (lineJustFetched == null) {
                write = false
            } else {
                // Strip the surrounding parentheses and split the line into its comma-separated fields
                val properties: Array[String] = lineJustFetched.stripPrefix("(").stripSuffix(")").split(",")

                tx.traversal().E().hasLabel("undertakes").has("undertakesId", properties(0)).property("startTimestamp", properties(1)).iterate()
                count = count + 1

                if (count % 1000 == 0) { // Commit after every 1000 updated edges
                    tx.commit()
                    tx = graphDb.newTransaction()
                }
            }
        }
        tx.commit() // Commit the final, partially filled batch

Using the above code, we are able to update about 1 million edges per day, which is far below our required speed. Is there a better way to do this?

Daniel Kuppitz

Apr 4, 2017, 8:46:16 AM
to aureliu...@googlegroups.com
It would be much faster if you could also provide the out-vertex ID in your input file. Furthermore, you should create a single traversal source for each transaction and reuse it.

if (count % 1000 == 0) {
  tx.commit()
  tx = graphDb.newTransaction()
  txg = tx.traversal()
}
...
txg.V(vertexId).outE("undertakes").has("undertakesId", properties(0)).property("startTimestamp", properties(1)).iterate()

Also, if you have a vertex-centric index on undertakesId, it would be as fast as possible.

Cheers,
Daniel
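
The batching pattern discussed above can be sketched independently of Titan. The following is only an illustration under stated assumptions: the input line format "(vertexId,undertakesId,startTimestamp)", the EdgeUpdate record, and the BulkUpdate object with its open/update/commit callbacks are all hypothetical names, not part of the original script or of any Titan API. The session type S stands in for whatever a batch needs, for example a transaction together with the traversal source created from it.

```scala
// Hypothetical record for one input line of the form "(vertexId,undertakesId,startTimestamp)".
final case class EdgeUpdate(outVertexId: Long, undertakesId: String, startTimestamp: String)

object BulkUpdate {
  // Parse one line; malformed lines yield None instead of throwing.
  def parse(line: String): Option[EdgeUpdate] =
    line.trim.stripPrefix("(").stripSuffix(")").split(",").map(_.trim) match {
      case Array(v, id, ts) => scala.util.Try(EdgeUpdate(v.toLong, id, ts)).toOption
      case _                => None
    }

  // Process updates in fixed-size batches. A session of type S is opened once
  // per batch, reused for every update in that batch, and committed at the end,
  // so the final partial batch is committed as well.
  // Returns the number of batches committed.
  def run[S](lines: Iterator[String], batchSize: Int)(
      open: () => S,
      update: (S, EdgeUpdate) => Unit,
      commit: S => Unit): Int = {
    var batches = 0
    lines.flatMap(l => parse(l).iterator).grouped(batchSize).foreach { batch =>
      val session = open()
      batch.foreach(u => update(session, u))
      commit(session)
      batches += 1
    }
    batches
  }
}
```

With the calls shown in this thread, open would presumably wrap graphDb.newTransaction() and tx.traversal(), update would run the txg.V(vertexId).outE("undertakes")... traversal, and commit would call tx.commit(). Keeping those as callbacks means the batching logic can be exercised without a running cluster.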


--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraphs+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/16ad4a57-d1ed-43f4-95c6-4efb2dbc216f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Samik R

Apr 5, 2017, 2:57:46 AM
to Aurelius
Hi Daniel,
What does the iterate() command do here? Does it just execute the preceding statements? Would next() have also worked?
I couldn't find much about it on the http://tinkerpop.apache.org/docs/3.0.1-incubating/ page.
Regards,
-Samik

Daniel Kuppitz

Apr 5, 2017, 7:03:43 AM
to aureliu...@googlegroups.com
iterate: execute, expect no result / ignore result
next: execute, expect one result
toList: execute, expect many results

Cheers,
Daniel
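
A traversal is lazy: nothing is executed until a terminal step pulls results through it. The sketch below is only an analogy using a plain Scala Iterator, not TinkerPop code; the TerminalSteps object and its methods are hypothetical names chosen for illustration. The side effect recorded in log plays the role of the property() mutation.

```scala
import scala.collection.mutable.ArrayBuffer

object TerminalSteps {
  // Build a lazy pipeline; the "mutation" is a side effect recorded in `log`.
  // Nothing is appended to `log` until the iterator is consumed.
  def pipeline(log: ArrayBuffer[Int]): Iterator[Int] =
    Iterator(1, 2, 3).map { x => log += x; x }

  // Analogous to iterate(): drain every element and discard the results.
  def iterate(it: Iterator[Int]): Unit = it.foreach(_ => ())
}
```

In this analogy, calling pipeline(log) alone leaves log empty, next() on the result processes only the first element, and iterate or toList processes all of them. By the same reasoning, next() on the update traversal would only do as much work as is needed to produce the first result, which is why iterate() is the natural choice when the result is not needed.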



Samik Raychaudhuri

Apr 5, 2017, 8:12:10 AM
to aureliu...@googlegroups.com
Thanks Daniel.

Samik R

Apr 6, 2017, 4:21:33 AM
to Aurelius
Hi Daniel,
Coming back to the original post: can this also be done using OLAP, e.g., with SparkGraphComputer and CassandraInputFormat? Would that be faster for a graph of the size mentioned? (I am on the same team as the OP.)
Regards.

-Samik


On Tuesday, April 4, 2017 at 6:16:16 PM UTC+5:30, Daniel Kuppitz wrote:

Daniel Kuppitz

Apr 6, 2017, 6:59:55 AM
to aureliu...@googlegroups.com
Out of the box, OLAP jobs are not able to mutate the graph. You can write a custom VertexProgram to do that, though (which is most likely the best approach for global graph operations).
The input for your VertexProgram would be the same file that you're currently reading line by line, and the input format would be ScriptInputFormat.

This may sound easy, but you should plan to spend some time on this approach if you've never written a VertexProgram before.

Cheers,
Daniel

