I have a JanusGraph database, backed by HBase. Into this database, I'm trying to load a number of GraphML files, which range in size from a few hundred KB to a couple of GB. I'm doing this with the following Java code:
graph.io(IoCore.graphml()).reader().create().readGraph(new FileInputStream(file), graph);
where the graph is opened with the following configuration file:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=my_host
storage.hbase.table=my_table
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=lucene
index.search.directory=data/index
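For context, the graph is opened from that configuration roughly like this (a sketch; the properties-file path and variable names are assumptions, not from my actual code):

```java
import org.apache.tinkerpop.gremlin.structure.io.IoCore;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import java.io.File;
import java.io.FileInputStream;

public class GraphMLLoad {
    public static void main(String[] args) throws Exception {
        // Open the graph using the properties file above (path is hypothetical)
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties");

        // Stream the GraphML file into the open graph
        File file = new File(args[0]);
        try (FileInputStream in = new FileInputStream(file)) {
            graph.io(IoCore.graphml()).reader().create().readGraph(in, graph);
        }

        graph.close();
    }
}
```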
Loading the smaller files works, but loading the large files produces out-of-memory errors whenever the table already contains data. Loading a large file into an empty table seems to work (although I do run into HBase/ZooKeeper timeouts). It feels as if JanusGraph is loading the existing data into memory before ingesting the new data. Is this correct? If so, is there any way to avoid this, or to load my data differently? If not, any ideas what is going on?
Thanks,
James