Edge creation really slow

Raj

Mar 14, 2017, 3:17:16 AM
to OrientDB

I am using TinkerPop Blueprints to create an OrientDB graph for a dataset with millions of nodes and 100M edges, on a laptop with 16 GB of memory, 64-bit Ubuntu and a 64-bit JVM.


Here are the results from our benchmarking: the vertices get added fine, but each edge addition takes almost a second.

Can you suggest what we are doing wrong with respect to edge creation, and how we can bring it down to a more reasonable latency?



Here's the code associated with the above table:

package orientdbtest

import com.orientechnologies.orient.core.metadata.schema.OType
import com.tinkerpop.blueprints.{Direction, Edge, Vertex}
import com.tinkerpop.blueprints.impls.orient._
import com.orientechnologies.orient.core.intent.OIntentMassiveInsert

object Example {

  def addRandomVertex(graph: OrientGraphNoTx, uuid: String): Vertex =
    graph.addVertex("class:Random", "uuid", uuid)

  // The graph is passed in explicitly so that this helper compiles on its own.
  // The fourth argument of addEdge is the edge label, so every edge created
  // here gets its own label (the stringified edge id).
  def addRandomEdge(graph: OrientGraphNoTx, source: Vertex, target: Vertex, id: String) =
    graph.addEdge(null, source, target, id)

  def createRandomNetworkDatabase(graph: OrientGraphNoTx, numNodes: Int, numEdges: Int, useLightWeightEdges: Boolean): (Long, Long) = {

    if (graph.getVertexType("Random") == null) {
      println("Creating random type")
      val random_vertex_type: OrientVertexType = graph.createVertexType("Random")
      random_vertex_type.createProperty("id", OType.STRING)
    } else {
      println("Random type exists")
    }

    val timeStartNodeCreation = System.currentTimeMillis()

    val nodeList: List[Vertex] = Range(0, numNodes).toList.map(x => addRandomVertex(graph, x.toString()))

    val timeEndNodeCreation = System.currentTimeMillis()

    println("Time to create " + numNodes + " is " + (timeEndNodeCreation - timeStartNodeCreation))

    val nodeListFirstHalf = nodeList.slice(0, nodeList.length / 2)
    val nodeListSecondHalf = nodeList.slice(nodeList.length / 2 + 1, nodeList.length)

    var edgeID = 1

    if (useLightWeightEdges) graph.setUseLightweightEdges(true)
    val timeStartEdgeCreation = System.currentTimeMillis()

    // Create edges from the first half to the second half.
    // Note: the while loop runs to completion for the first (source, target)
    // pair, so all edges end up connecting the same two vertices.
    nodeListFirstHalf.foreach(sourceVertex => {
      nodeListSecondHalf.foreach(targetVertex => {
        while (edgeID < numEdges) {
          addRandomEdge(graph, sourceVertex, targetVertex, edgeID.toString())
          edgeID = edgeID + 1
          graph.commit()
        }
      })
    })

    val timeEndEdgeCreation = System.currentTimeMillis()
    println("Time to create " + edgeID + " is " + (timeEndEdgeCreation - timeStartEdgeCreation))

    (0L, 0L)
  }

  def main(args: Array[String]): Unit = {

    val numNodes = 10
    val numEdges = 25
    val useLightWeightEdges = false

    val uri: String = "plocal:target/database/random_sample_" + numNodes + "_" + numEdges + useLightWeightEdges.toString()
    val graph: OrientGraphNoTx = new OrientGraphNoTx(uri)

    graph.setKeepInMemoryReferences(false)
    graph.getRawGraph().getLocalCache().setEnable(false)
    graph.declareIntent(new OIntentMassiveInsert())

    try {
      createRandomNetworkDatabase(graph, numNodes, numEdges, useLightWeightEdges)
      graph.declareIntent(null)
    } finally {
      graph.shutdown()
    }
    println("Adios")
  }
}
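
To see which vertex pairs actually receive edges in the code above, here is a small plain-Java simulation of the loop structure only. It does not touch OrientDB, and the class and method names are made up for illustration. It assumes the same index ranges as the posted slices (first half `0 until n/2`, second half `n/2+1 until n`) and the shared edge counter starting at 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EdgeLoopSketch {

    // Mirrors the posted nesting: the while loop sits inside both foreach
    // loops and shares a single edge counter across all pairs.
    static Map<String, Integer> edgesPerPairAsPosted(int numNodes, int numEdges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int edgeId = 1;
        for (int source = 0; source < numNodes / 2; source++) {
            for (int target = numNodes / 2 + 1; target < numNodes; target++) {
                while (edgeId < numEdges) {
                    counts.merge(source + "->" + target, 1, Integer::sum);
                    edgeId++;
                }
            }
        }
        return counts;
    }

    // One possible alternative: at most one edge per (source, target) pair,
    // stopping as soon as the edge counter reaches numEdges.
    static Map<String, Integer> edgesPerPairSpread(int numNodes, int numEdges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int edgeId = 1;
        for (int source = 0; source < numNodes / 2 && edgeId < numEdges; source++) {
            for (int target = numNodes / 2 + 1; target < numNodes && edgeId < numEdges; target++) {
                counts.merge(source + "->" + target, 1, Integer::sum);
                edgeId++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(edgesPerPairAsPosted(10, 25)); // prints {0->6=24}
        System.out.println(edgesPerPairSpread(10, 25).size() + " distinct pairs when spread out");
    }
}
```

With numNodes = 10 and numEdges = 25, the posted nesting puts all 24 edges between the same two vertices, so the benchmark may be measuring a single heavily connected pair rather than a random network.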



Raj

Mar 18, 2017, 2:30:20 AM
to OrientDB
Can someone at OrientDB kindly help us resolve this issue? We chose OrientDB for a client project, but unless we get past this simple issue we won't be able to continue with OrientDB and will have to choose some other solution.

Thanks!
Raj

Christian MICHON

Mar 18, 2017, 2:57:41 PM
to OrientDB
Maybe you're calling graph.commit() too often?

Have you tried using ETL?
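
As a rough illustration of committing less often, the sketch below commits once per batch instead of once per edge. `EdgeSink` is a hypothetical stand-in for a transactional graph, not an OrientDB type, and the batch size of 10 is arbitrary. Batching like this only matters when the graph instance is transactional.

```java
public class BatchedCommitSketch {

    /** Hypothetical stand-in for a transactional graph; not an OrientDB type. */
    interface EdgeSink {
        void addEdge(int source, int target);
        void commit();
    }

    /**
     * Adds every edge, committing once per batchSize edges and once more for
     * any remainder. Returns the number of commits issued.
     */
    static int addEdgesInBatches(EdgeSink sink, int[][] edges, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (int[] edge : edges) {
            sink.addEdge(edge[0], edge[1]);
            pending++;
            if (pending == batchSize) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the final, partially filled batch.
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        int[][] edges = new int[25][2];
        for (int i = 0; i < edges.length; i++) {
            edges[i][0] = i;
            edges[i][1] = i + 1;
        }
        EdgeSink noop = new EdgeSink() {
            public void addEdge(int source, int target) { }
            public void commit() { }
        };
        // prints "3 commits for 25 edges"
        System.out.println(addEdgesInBatches(noop, edges, 10) + " commits for " + edges.length + " edges");
    }
}
```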

s.l...@orientdb.com

Mar 19, 2017, 11:26:16 AM
to OrientDB
Hi,

Thanks for considering OrientDB. I am able to create 100K vertices and 1M edges in about 9 minutes using version 2.2.17 with OrientDB's Write Ahead Log (WAL) enabled. I can get better numbers with the WAL disabled, but I wouldn't suggest disabling it unless you know why you are doing so and what the consequences may be, e.g. in case of a crash.

Please don't take this as an official benchmark. This is on my personal laptop with 32 GB of RAM, using the Java code below, slightly modified from your code. I set -XX:MaxDirectMemorySize=32585m when running the program. Setting this option is mandatory (please adapt its value to your own situation).
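
If it is unclear whether the option actually reached the JVM, one way to check from inside the program is to inspect the JVM input arguments. This is only a sketch using the standard `java.lang.management` API; the class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

public class DirectMemoryFlagCheck {

    static final String FLAG = "-XX:MaxDirectMemorySize=";

    // Returns the value given to -XX:MaxDirectMemorySize, if the option is present.
    // If the option is repeated, the last occurrence is returned.
    static Optional<String> findDirectMemorySetting(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(FLAG))
                .map(arg -> arg.substring(FLAG.length()))
                .reduce((first, second) -> second);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(findDirectMemorySetting(jvmArgs)
                .map(value -> "MaxDirectMemorySize is set to " + value)
                .orElse("MaxDirectMemorySize is not set on the command line"));
    }
}
```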

javac CreateEdge.java

java -XX:MaxDirectMemorySize=32585m OrientDBExamples.CreateEdge
mar 19, 2017 3:43:42 PM com.orientechnologies.common.log.OLogManager log
INFORMAZIONI: OrientDB auto-config DISKCACHE=23.294MB (heap=7.243MB direct=32.585MB os=32.585MB)
Creating random type
Time to create 100 vertices is 0.04 seconds
Time to create 1000 edges is 0.511 seconds
Adios

javac CreateEdge.java

java -XX:MaxDirectMemorySize=32585m OrientDBExamples.CreateEdge
mar 19, 2017 3:44:10 PM com.orientechnologies.common.log.OLogManager log
INFORMAZIONI: OrientDB auto-config DISKCACHE=23.294MB (heap=7.243MB direct=32.585MB os=32.585MB)
Creating random type
Time to create 1000 vertices is 0.202 seconds
Time to create 10000 edges is 2.42 seconds
Adios

javac CreateEdge.java

java -XX:MaxDirectMemorySize=32585m OrientDBExamples.CreateEdge
mar 19, 2017 3:44:30 PM com.orientechnologies.common.log.OLogManager log
INFORMAZIONI: OrientDB auto-config DISKCACHE=23.294MB (heap=7.243MB direct=32.585MB os=32.585MB)
Creating random type
Time to create 10000 vertices is 0.758 seconds
Time to create 100000 edges is 50.626 seconds
Adios

javac CreateEdge.java

java -XX:MaxDirectMemorySize=32585m OrientDBExamples.CreateEdge
mar 19, 2017 3:45:43 PM com.orientechnologies.common.log.OLogManager log
INFORMAZIONI: OrientDB auto-config DISKCACHE=23.294MB (heap=7.243MB direct=32.585MB os=32.585MB)
Creating random type
Time to create 100000 vertices is 3.629 seconds
Time to create 1000000 edges is 554.207 seconds
Adios
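
To compare the four runs above, it helps to turn each timing into edges per second. The snippet below does that arithmetic with the figures copied from the log output; the class name is made up for illustration.

```java
import java.util.Locale;

public class EdgeThroughput {

    // Average insertion rate over a whole run.
    static double edgesPerSecond(long edges, double seconds) {
        return edges / seconds;
    }

    public static void main(String[] args) {
        // Edge counts and elapsed times taken from the four runs above.
        long[] edgeCounts = {1_000L, 10_000L, 100_000L, 1_000_000L};
        double[] seconds = {0.511, 2.42, 50.626, 554.207};
        for (int i = 0; i < edgeCounts.length; i++) {
            System.out.printf(Locale.ROOT, "%d edges in %.3f s: about %.0f edges/s%n",
                    edgeCounts[i], seconds[i], edgesPerSecond(edgeCounts[i], seconds[i]));
        }
    }
}
```

This works out to roughly 1,960, 4,130, 1,980 and 1,800 edges per second for the four runs, compared with about one edge per second in the original report.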


Java Code - a bit modified from the code you posted.  Please don't take this as "perfect" code. I didn't invest too much time on it - this is just an example:

package OrientDBExamples;

import com.orientechnologies.orient.core.metadata.schema.OType;
import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Vertex;

import com.tinkerpop.blueprints.impls.orient.*;
import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;

import java.util.ArrayList;
import java.util.List;

public class CreateEdge {

  private static Vertex addRandomVertex(OrientGraphNoTx graph, int id) {
    Vertex myVertex = graph.addVertex("class:Random", "PropertyId", id);
    //System.out.println(" Created: " + myVertex);
    return myVertex;
  }

  // All edges share the single label "EdgeClass".
  private static void addRandomEdge(OrientGraphNoTx graph, Vertex source, Vertex target) {
    Edge myEdge = graph.addEdge(null, source, target, "EdgeClass");
    //System.out.println(" Created: " + myEdge);
  }

  private static void createRandomNetworkDatabase(OrientGraphNoTx graph, int numNodes, int numEdges, boolean useLightWeightEdges) {

    // Create the vertex class and its property once, if they do not exist yet.
    if (graph.getVertexType("Random") == null) {
      System.out.println("Creating random type");
      OrientVertexType random_vertex_type = graph.createVertexType("Random");

      graph.getRawGraph().getMetadata().getSchema().getClass("Random").createProperty("PropertyId", OType.LONG);
    } else {
      System.out.println("Random type exists");
    }

    double timeStartNodeCreation = System.currentTimeMillis();
    List<Vertex> nodeList = new ArrayList<Vertex>();

    for (int j = 1; j <= numNodes; j++) {
      nodeList.add(addRandomVertex(graph, j));
    }
    double timeEndNodeCreation = System.currentTimeMillis();

    System.out.println("Time to create " + numNodes + " vertices is " + (timeEndNodeCreation - timeStartNodeCreation) / 1000 + " seconds");

    List<Vertex> nodeListFirstHalf = nodeList.subList(0, numNodes / 2);
    List<Vertex> nodeListSecondHalf = nodeList.subList((numNodes / 2), numNodes);

    double timeStartEdgeCreation = System.currentTimeMillis();
    int edgeID = 0;
    // Connect the j-th vertex of the first half to the j-th vertex of the
    // second half, repeating whole passes until at least numEdges edges exist.
    while (edgeID < numEdges) {
      for (int j = 0; j <= numNodes / 2 - 1; j++) {
        addRandomEdge(graph, nodeListFirstHalf.get(j), nodeListSecondHalf.get(j));
        edgeID = edgeID + 1;
      }
    }

    double timeEndEdgeCreation = System.currentTimeMillis();
    System.out.println("Time to create " + edgeID + " edges is " + (timeEndEdgeCreation - timeStartEdgeCreation) / 1000 + " seconds");
  }

  public static void main(String[] args) {

    int numNodes = 100000;
    int numEdges = 1000000;
    boolean useLightWeightEdges = false;

    String uri = "plocal:random_sample_" + numNodes + "_" + numEdges + useLightWeightEdges;
    OrientGraphNoTx graph = new OrientGraphNoTx(uri);

    graph.setKeepInMemoryReferences(false);
    graph.getRawGraph().getLocalCache().setEnable(false);
    graph.declareIntent(new OIntentMassiveInsert());
    try {
      createRandomNetworkDatabase(graph, numNodes, numEdges, useLightWeightEdges);
      graph.declareIntent(null);
    } finally {
      graph.shutdown();
    }
    System.out.println("Adios");
  }
}

This is the graph I got for the case 100 vertices and 1000 edges:

 
  
Note that OrientDB Ltd offers Support options that can help if you need fast replies or assistance while developing your application. There are also production support options.

Hope this helps,