Here is my bulk-load code, run in the Gremlin Console:
:load data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open('conf/janusgraph-hbase-es-grateful.properties')
defineGratefulDeadSchema(graph)
graph.close()
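For context, `defineGratefulDeadSchema` comes from the loaded script. The real script ships with the JanusGraph distribution; the sketch below is only a hypothetical illustration of the kind of schema it defines (the labels and keys shown are assumptions, not the script's actual contents):

```groovy
// Hypothetical sketch of a schema-definition function like the one in
// data/grateful-dead-janusgraph-schema.groovy. Labels/keys are assumed.
def defineGratefulDeadSchema(janusGraph) {
    mgmt = janusGraph.openManagement()
    mgmt.makeVertexLabel('song').make()
    mgmt.makeVertexLabel('artist').make()
    mgmt.makeEdgeLabel('sungBy').make()
    mgmt.makePropertyKey('name').dataType(String.class).make()
    mgmt.commit()
}
```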
conf = new PropertiesConfiguration('conf/hadoop-graph/hadoop-graphson.properties')
conf.setProperty('spark.yarn.jars', 'hdfs:///user/janusgraph/spark-lib/*')
graph = GraphFactory.open(conf)
blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph('conf/janusgraph-hbase-es-grateful.properties').
        create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
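After the Spark job finishes, one way to sanity-check the load is to reopen the target JanusGraph instance and count elements. This is a minimal sketch assuming the same properties file as above; the expected counts depend on whatever the GraphSON input contained:

```groovy
// Hedged verification sketch: reopen the write graph and count elements.
graph = JanusGraphFactory.open('conf/janusgraph-hbase-es-grateful.properties')
g = graph.traversal()
println g.V().count().next()   // total vertices loaded
println g.E().count().next()   // total edges loaded
graph.close()
```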
The file "conf/janusgraph-hbase-es-grateful.properties" is:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
storage.hostname=xxx
storage.hbase.table=test
storage.hbase.region-count=50
storage.write-time=1000000
storage.buffer-size=10240
storage.hbase.compression-algorithm=SNAPPY
index.search.backend=elasticsearch
index.search.hostname=xxx
ids.block-size=30000000
ids.authority.wait-time=3600000
ids.renew-timeout=3600000
The file "conf/hadoop-graph/hadoop-graphson.properties" is:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=~/janusgraph_data_2
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.vertexProgram=org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram
spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=15g
spark.executor.instances=35
spark.executor.cores=2
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.am.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.executor.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
hadoop.home.dir=/opt/cloudera/parcels/CDH/lib/hadoop/
spark.network.timeout=2400s
spark.shuffle.file.buffer=512k
spark.files.fetchTimeout=600s
spark.shuffle.io.retryWait=20s
spark.shuffle.io.maxRetries=10
spark.reducer.maxSizeInFlight=96m
spark.rpc.numRetries=10
spark.rpc.retry.wait=10s
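Note that the `spark.yarn.jars` setting in the console code points at `hdfs:///user/janusgraph/spark-lib/*`, so the JanusGraph/TinkerPop jars must already be staged in HDFS. A hedged one-time setup sketch (both the local `lib/` directory and the HDFS path are assumptions matching the configuration above):

```shell
# Stage the jars that spark.yarn.jars expects; paths are assumed.
hdfs dfs -mkdir -p /user/janusgraph/spark-lib
hdfs dfs -put lib/*.jar /user/janusgraph/spark-lib/
```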
On Tuesday, August 6, 2019 at 10:57:33 PM UTC+8, Abhay Pandit wrote: