I am using Titan 1.0.0 with a Cassandra backend and Elasticsearch for indexing. Our graph currently has 100 million nodes and 170 million edges. It is stored with a replication factor of 2 and distributed across a cluster of 3 Cassandra nodes. I now want to add a property to about 30 million existing edges. I have extracted the edge IDs and property values into a text file.
We currently use a Scala script that loops through the file, retrieves each edge, and adds the property using the code below. Each transaction updates 1000 edges, after which it is committed and a new transaction is opened.
    while (write) {
        val lineJustFetched: String = buf.readLine() // Read the next line from the input file
        if (lineJustFetched == null) {
            write = false
        } else {
            if (count % 1000 == 0) { // In each transaction, we update 1000 edges
                tx.commit()
                tx = graphDb.newTransaction()
            }
            val properties: Array[String] = lineJustFetched.stripPrefix("(").stripSuffix(")").split(",")
            tx.traversal().E().hasLabel("undertakes").has("undertakesId", properties(0)).property("startTimestamp", properties(1)).iterate()
            count = count + 1
        }
    }
With the above code we are able to update about 1 million edges per day, which is far below the speed we need. Is there a better way to do this?
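For reference, here is a minimal, self-contained sketch of the same batching pattern with the Titan calls factored out, so the control flow can be checked in isolation. The names `BatchedEdgeUpdate`, `parseLine`, and `runInBatches` are hypothetical and are not part of Titan or TinkerPop. The sketch assumes each input line has the form `(edgeId,value)`. It is not a claim about throughput. It only restates the loop so that a commit happens once per batch, including the final partial batch (the snippet above does not show a commit after the last line is read).

```scala
object BatchedEdgeUpdate {
  // Parses one input line of the assumed form "(edgeId,value)" into a pair.
  // A line without a comma will throw, which is left unhandled in this sketch.
  def parseLine(line: String): (String, String) = {
    val parts = line.trim.stripPrefix("(").stripSuffix(")").split(",", 2)
    (parts(0).trim, parts(1).trim)
  }

  // Calls `update` for every non-empty line and `commit` once after each batch,
  // including the last, possibly smaller, batch.
  // Returns the number of lines processed.
  def runInBatches(lines: Iterator[String], batchSize: Int)
                  (update: (String, String) => Unit)
                  (commit: () => Unit): Int = {
    var processed = 0
    lines.filter(_.trim.nonEmpty).grouped(batchSize).foreach { batch =>
      batch.foreach { line =>
        val (id, value) = parseLine(line)
        update(id, value)
        processed += 1
      }
      commit()
    }
    processed
  }
}

// Possible wiring with the objects from the question (untested assumption;
// "edges.txt" is a placeholder file name):
//
//   var tx = graphDb.newTransaction()
//   BatchedEdgeUpdate.runInBatches(scala.io.Source.fromFile("edges.txt").getLines(), 1000) {
//     (id, ts) =>
//       tx.traversal().E().hasLabel("undertakes").has("undertakesId", id)
//         .property("startTimestamp", ts).iterate()
//   } { () => tx.commit(); tx = graphDb.newTransaction() }
```

Passing the update and commit steps in as functions means the batching logic can be exercised against an in-memory stub before it is pointed at the real graph.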