Here is my bulk-load code, run in the Gremlin Console:
:load data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open('conf/janusgraph-hbase-es-grateful.properties')
defineGratefulDeadSchema(graph)
graph.close()
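For context, `defineGratefulDeadSchema` comes from the loaded script. The real script ships with the JanusGraph distribution; the sketch below is only a hypothetical illustration of the kind of schema it defines (the labels and keys shown are assumptions, not the script's actual contents):

```groovy
// Hypothetical sketch of a schema-definition function like the one in
// data/grateful-dead-janusgraph-schema.groovy. Labels/keys are assumed.
def defineGratefulDeadSchema(janusGraph) {
    mgmt = janusGraph.openManagement()
    mgmt.makeVertexLabel('song').make()
    mgmt.makeVertexLabel('artist').make()
    mgmt.makeEdgeLabel('sungBy').make()
    mgmt.makePropertyKey('name').dataType(String.class).make()
    mgmt.commit()
}
```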
conf = new PropertiesConfiguration('conf/hadoop-graph/hadoop-graphson.properties')
conf.setProperty('spark.yarn.jars', 'hdfs:///user/janusgraph/spark-lib/*')
graph = GraphFactory.open(conf)
blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph('conf/janusgraph-hbase-es-grateful.properties').
        create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
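After the Spark job finishes, one way to sanity-check the load is to reopen the target JanusGraph instance and count elements. This is a minimal sketch assuming the same properties file as above; the expected counts depend on whatever the GraphSON input contained:

```groovy
// Hedged verification sketch: reopen the write graph and count elements.
graph = JanusGraphFactory.open('conf/janusgraph-hbase-es-grateful.properties')
g = graph.traversal()
println g.V().count().next()   // total vertices loaded
println g.E().count().next()   // total edges loaded
graph.close()
```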
The file "conf/janusgraph-hbase-es-grateful.properties" is:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
storage.hostname=xxx
storage.hbase.table=test
storage.hbase.region-count=50
storage.write-time=1000000
storage.buffer-size=10240
storage.hbase.compression-algorithm=SNAPPY
index.search.backend=elasticsearch
index.search.hostname=xxx
ids.block-size=30000000
ids.authority.wait-time=3600000
ids.renew-timeout=3600000
The file "conf/hadoop-graph/hadoop-graphson.properties" is:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=~/janusgraph_data_2
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.vertexProgram=org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram
spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=15g
spark.executor.instances=35
spark.executor.cores=2
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.am.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.executor.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
hadoop.home.dir=/opt/cloudera/parcels/CDH/lib/hadoop/
spark.network.timeout=2400s
spark.shuffle.file.buffer=512k
spark.files.fetchTimeout=600s
spark.shuffle.io.retryWait=20s
spark.shuffle.io.maxRetries=10
spark.reducer.maxSizeInFlight=96m
spark.rpc.numRetries=10
spark.rpc.retry.wait=10s
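Note that the `spark.yarn.jars` setting in the console code points at `hdfs:///user/janusgraph/spark-lib/*`, so the JanusGraph/TinkerPop jars must already be staged in HDFS. A hedged one-time setup sketch (both the local `lib/` directory and the HDFS path are assumptions matching the configuration above):

```shell
# Stage the jars that spark.yarn.jars expects; paths are assumed.
hdfs dfs -mkdir -p /user/janusgraph/spark-lib
hdfs dfs -put lib/*.jar /user/janusgraph/spark-lib/
```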
On Tuesday, August 6, 2019 at 10:57:33 PM UTC+8, Abhay Pandit wrote: