I need to load a very large amount of data representing a social network graph, stored in CSV files.
So far I have written a Java program that creates the schema and loads vertices and edges using Gremlin.
The problem is that this method is very slow.
Is there a way to perform bulk loading into HBase in order to significantly reduce the loading time?
We are having similar performance issues loading graph data into Janus backed by HBase. I agree with Jason; we didn't have any issues with doing all the mgmt calls in one go.
One thing that we did was to multi-thread the Java code, which certainly helped performance. HBase seems to respond well to multiple calls at once. For example, in your loadVerticies method, you may want to hand the work inside the main for loop to a worker task backed by a pool of maybe 32 threads (depending on the machine you're running on). I use the Java ExecutorService, like:
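import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Fixed pool of workers plus a semaphore capping the tasks in flight
ExecutorService doWork = Executors.newFixedThreadPool(MAX_WORK_CALLS);
Semaphore smDoWork = new Semaphore(MAX_WORK_CALLS);

// Inside the main for loop: wait for a free slot, then submit the work
try {
    smDoWork.acquire();
} catch (InterruptedException ex) {
    log.error("Interrupt: " + ex);
    Thread.currentThread().interrupt(); // restore the interrupt flag
}
Runnable someThread = new doJanusStuff(this); // doJanusStuff must implement Runnable
doWork.execute(someThread);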
Just make sure to release the semaphore when the thread is completed.
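A minimal, self-contained sketch of the semaphore-and-pool pattern above, with the release placed in a finally block so a failed batch cannot leak a permit. The names BoundedLoader, loadAll and loadBatch are hypothetical and not part of JanusGraph; in a real loader, loadBatch would presumably add the batch's vertices or edges in one transaction and commit it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class BoundedLoader {
    // Assumed limit on concurrent batches; tune it for the ingesting machine.
    static final int MAX_WORK_CALLS = 32;

    // Runs loadBatch on every batch with at most MAX_WORK_CALLS in flight.
    // Returns the number of batches that completed without throwing.
    static <T> int loadAll(List<List<T>> batches, Consumer<List<T>> loadBatch) {
        ExecutorService doWork = Executors.newFixedThreadPool(MAX_WORK_CALLS);
        Semaphore smDoWork = new Semaphore(MAX_WORK_CALLS);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (List<T> batch : batches) {
                smDoWork.acquire(); // blocks while all permits are taken
                doWork.execute(() -> {
                    try {
                        loadBatch.accept(batch); // hypothetical: write + commit
                        completed.incrementAndGet();
                    } finally {
                        smDoWork.release(); // always free the slot
                    }
                });
            }
            doWork.shutdown();
            // Wait for the remaining in-flight batches to finish.
            doWork.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            doWork.shutdownNow();
            throw new IllegalStateException("Loading interrupted", ex);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Stand-in data: 10 batches of 3 fake rows each.
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            batches.add(List.of("row" + i + "a", "row" + i + "b", "row" + i + "c"));
        }
        AtomicInteger rows = new AtomicInteger();
        int done = loadAll(batches, batch -> rows.addAndGet(batch.size()));
        System.out.println(done + " batches, " + rows.get() + " rows");
    }
}
```

Acquiring the permit in the submitting loop, rather than inside the task, keeps the executor's queue from growing without bound when the CSV reader is faster than HBase.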
All that said, performance was then limited by the one machine
doing the ingesting, and still seemed slower than one would
expect. In our case, generating a graph with 154 million nodes and
~275 million edges took 3 days on a 5-node Hadoop cluster.
-Joe
--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/bb4a6e00-b069-4c5b-a87c-77580decde75%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Time needed for loading schema into the graph in milliseconds: 94592
Time needed for loading data into the graph in milliseconds: 4587774
Time needed for loading vertices into the graph in milliseconds: 302718
Time needed for loading properties into the graph in milliseconds: 13071
Time needed for loading edges into the graph in milliseconds: 4271985
Total duration in milliseconds: 4682366
Time Elapsed for loading schema into the graph: 000h.01m.34s
Time Elapsed for loading data into the graph: 001h.16m.27s
Total duration: 001h.18m.2s
vertices: 3181724, edges: 17436661
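To put the timings above in terms of throughput, here is a small sketch that divides the element counts by the per-phase durations from the log. The class and method names (LoadStats, perSecond) are made up for illustration.

```java
class LoadStats {
    // Elements written per second for a phase that took elapsedMillis.
    static double perSecond(long count, long elapsedMillis) {
        return count / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Counts and durations copied from the log output above.
        long vertices = 3_181_724L, vertexMillis = 302_718L;
        long edges = 17_436_661L, edgeMillis = 4_271_985L;
        long dataMillis = 4_587_774L;

        System.out.printf("vertices/s: %.0f%n", perSecond(vertices, vertexMillis));
        System.out.printf("edges/s:    %.0f%n", perSecond(edges, edgeMillis));
        System.out.printf("edge share of data load: %.0f%%%n",
                100.0 * edgeMillis / dataMillis);
    }
}
```

This works out to roughly 10,500 vertices per second and about 4,080 edges per second, with the edge phase taking about 93% of the data-loading time.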
I've made an open source Java library and created a separate repository.
Check it out at https://github.com/mpolonioli/janusgraph-csv-importer