Is batch-loading=true valid when HBase is the underlying DB?


杨荣

Aug 6, 2019, 8:42:48 AM
to JanusGraph users
My JanusGraph instance uses HBase as its underlying DB, and bulk loading is really slow. I know many factors affect the speed, but I have not yet found the exact cause (I am still investigating). I have also heard that batch-loading is only supported for Cassandra. Is that true? Or did that limitation only exist in older versions?

Abhay Pandit

Aug 6, 2019, 10:57:33 AM
to janusgra...@googlegroups.com
Hi,

Can you show all the configuration you are using for batch loading?

Thanks,
Abhay

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/6a64b157-8d5a-486f-93f3-4d5699a83ed1%40googlegroups.com.

杨荣

Aug 7, 2019, 3:52:55 AM
to JanusGraph users
Here is my bulk load code, run in the Gremlin Console:

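// Load the script that defines defineGratefulDeadSchema()
:load data/grateful-dead-janusgraph-schema.groovy

// Create the schema in the target JanusGraph (HBase + Elasticsearch), then close it
graph = JanusGraphFactory.open('conf/janusgraph-hbase-es-grateful.properties')
defineGratefulDeadSchema(graph)
graph.close()

// Open the GraphSON input as a HadoopGraph and point Spark at the jars on HDFS
conf = new PropertiesConfiguration('conf/hadoop-graph/hadoop-graphson.properties')
conf.setProperty('spark.yarn.jars', 'hdfs:///user/janusgraph/spark-lib/*')
graph = GraphFactory.open(conf)

// Run BulkLoaderVertexProgram on Spark, writing into the JanusGraph configured above
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph('conf/janusgraph-hbase-es-grateful.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()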

"conf/janusgraph-hbase-es-grateful.properties" is:

gremlin.graph=org.janusgraph.core.JanusGraphFactory

storage.backend=hbase
storage.batch-loading=true
storage.hostname=xxx
storage.hbase.table=test
storage.hbase.region-count=50
storage.write-time=1000000
storage.buffer-size=10240
storage.hbase.compression-algorithm=SNAPPY

index.search.backend=elasticsearch
index.search.hostname=xxx

ids.block-size=30000000
ids.authority.wait-time=3600000
ids.renew-timeout=3600000
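
Before submitting a long-running load, it can help to confirm that the file on disk really contains the batch-loading flag with the intended value. The following is a minimal sketch in plain Java, assuming the file follows standard java.util.Properties syntax. The class BatchLoadingCheck and its methods are hypothetical names used only for illustration and are not part of JanusGraph. The check only inspects the text of the file and says nothing about how a given backend treats the option at runtime.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of JanusGraph: it only inspects properties text.
class BatchLoadingCheck {

    // Parse text written in java.util.Properties syntax (key=value per line).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True only when storage.batch-loading is present and equals "true" (case-insensitive).
    static boolean isBatchLoadingEnabled(Properties props) {
        String value = props.getProperty("storage.batch-loading", "false");
        return Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        // A few keys copied from the configuration above.
        String text = String.join("\n",
            "storage.backend=hbase",
            "storage.batch-loading=true",
            "storage.hbase.table=test");
        Properties props = parse(text);
        System.out.println("backend = " + props.getProperty("storage.backend"));
        System.out.println("batch-loading enabled = " + isBatchLoadingEnabled(props));
    }
}
```

To check the real file rather than an inline string, its contents could be read first (for example with java.nio.file.Files.readString on Java 11 or later) and passed to parse.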

"conf/hadoop-graph/hadoop-graphson.properties" is:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=~/janusgraph_data_2
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

gremlin.vertexProgram=org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram

spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=15g
spark.executor.instances=35
spark.executor.cores=2
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.am.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.executor.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native

hadoop.home.dir=/opt/cloudera/parcels/CDH/lib/hadoop/

spark.network.timeout=2400s
spark.shuffle.file.buffer=512k
spark.files.fetchTimeout=600s
spark.shuffle.io.retryWait=20s
spark.shuffle.io.maxRetries=10
spark.reducer.maxSizeInFlight=96m
spark.rpc.numRetries=10
spark.rpc.retry.wait=10s
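
When a load is slow, one number worth having at hand is the upper bound on concurrent Spark tasks that these settings request. The sketch below is plain arithmetic over values copied from the configuration above; the class SparkSlots and its methods are hypothetical names. It assumes YARN actually grants all requested executors, which is not guaranteed, and the number of tasks that really run also depends on how the input is split.

```java
// Hypothetical helper: plain arithmetic over values copied from the Spark settings above.
class SparkSlots {

    // Upper bound on tasks running at once: executors times cores per executor.
    static int taskSlots(int executorInstances, int executorCores) {
        return executorInstances * executorCores;
    }

    // Total executor memory requested across the job, in gigabytes.
    static int totalExecutorMemoryGb(int executorInstances, int memoryPerExecutorGb) {
        return executorInstances * memoryPerExecutorGb;
    }

    public static void main(String[] args) {
        int instances = 35; // spark.executor.instances
        int cores = 2;      // spark.executor.cores
        int memoryGb = 15;  // spark.executor.memory=15g

        System.out.println("max concurrent tasks = " + taskSlots(instances, cores));
        System.out.println("total executor memory (GB) = " + totalExecutorMemoryGb(instances, memoryGb));
    }
}
```

With the values above this gives at most 70 concurrent tasks and 525 GB of requested executor memory, which can be compared against storage.hbase.region-count=50 and the capacity of the HBase cluster.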


杨荣

Aug 7, 2019, 9:09:56 PM
to JanusGraph users
Hi, I have already posted my code. Could you give me some advice? Thanks in advance.

