HBase to Bigtable: Spark job failing


Neeraj Verma

May 26, 2018, 7:21:31 PM
to Google Cloud Dataproc Discussions
Hi ,

I have an HBase Spark job running on an AWS EMR cluster. Recently we moved to GCP, and I transferred all the HBase data to Bigtable. Now I am running the same Spark Java/Scala job on Dataproc. The Spark job is failing because it is looking for the spark.hbase.zookeeper.quorum setting. Please let me know how I can make my Spark job run successfully against Bigtable without code changes.

Regards,
Neeraj Verma

Karthik Palaniappan

May 29, 2018, 12:31:40 AM
to Google Cloud Dataproc Discussions
Hi Neeraj,

You need the HBase client libraries for Bigtable: https://cloud.google.com/bigtable/docs/bigtable-and-hbase. Essentially, to your Spark job, Bigtable will appear as HBase.

Assuming you're using the Spark HBase connector, this is a good example to start from: https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/bigtable-shc.

--Karthik

Neeraj Verma

May 29, 2018, 12:44:22 PM
to Google Cloud Dataproc Discussions
Hi Karthik,

Thanks for the information. Is it possible to pass HBase properties like the ones below?

Properties:
spark.time.interval.date = 2018/03/30/14
spark.boost.input.path = gs://bigdata-sandbox/ml-sort-boosts-prepared-json/year=2018/month=03/day=30/hour=14
spark.input.path = gs://bigdata-sandbox/ml-sort-boosts-prepared-json/year=2018/month=03/day=30/hour=14
spark.output.path = gs://bigdata-sandbox/latest/temp
spark.query.input.path = gs://bigdata-sandbox/ml-sort-search-prepared-json/original/year=2018/month=03/day=30/hour=14
spark.search.filter = requestUriStem = '/api/queryresults/browse/womens-shoes'
spark.app.name.suffix = Test_spark-hbase_2018/03/30/14
spark.time.interval.format = yyyy/MM/dd/HH
spark.output.delete.previous.trigger = true
spark.hbase.zookeeper.quorum = cluster-bigdata-test2-m-2,cluster-bigdata-test2-m-0,cluster-bigdata-test2-m-1
spark.hbase.client.connection.impl = com.google.cloud.bigtable.hbase1_x.BigtableConnection

Karthik Palaniappan

May 29, 2018, 3:44:31 PM
to Google Cloud Dataproc Discussions
The Bigtable-related properties (in particular spark.hbase.client.connection.impl) look correct to me, assuming that the Bigtable client is on the classpath.

I assume you shouldn't need to pass in the spark.hbase.zookeeper.quorum property, since you're not actually using HBase's ZooKeeper. Have you tried running the job without it?

Karthik Palaniappan

May 29, 2018, 8:14:46 PM
to Google Cloud Dataproc Discussions
Actually, which library are you using for Spark <-> HBase? How does it accept properties to pass through? I realize that spark.google.* probably won't get passed through, since a library would likely look for spark.hbase.*. Instead of passing them in via spark, can you instead set those properties in hbase-site.xml? Take a look at the in-progress initialization action for Bigtable for example: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/pull/267.
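For reference, the Bigtable settings discussed in this thread could be placed in hbase-site.xml like this (the project and instance IDs are the ones from the messages below; the property keys are the same ones the poster later sets in code):

```xml
<configuration>
  <!-- Route HBase client connections to Bigtable instead of ZooKeeper/HBase. -->
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase1_x.BigtableConnection</value>
  </property>
  <property>
    <name>google.bigtable.project.id</name>
    <value>fbfblok3vejkytb5bdbbzxkr3krxw3</value>
  </property>
  <property>
    <name>google.bigtable.instance.id</name>
    <value>bigdata-hbase</value>
  </property>
</configuration>
```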

Neeraj Verma

May 29, 2018, 9:57:04 PM
to Google Cloud Dataproc Discussions
Hi Karthik,
Issue resolved. I modified the existing Spark job by adding the Bigtable jar to pom.xml as a dependency and making the code change below. Thanks for your help.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

case class HBaseConnectionWithClosure(zookeeperQuorum: String, zookeeperPort: String) {
  def getConnection(): Connection = {
    val c = HBaseConfiguration.create
    // ZooKeeper is not used by Bigtable, so these settings are no longer needed:
    // c.set("hbase.zookeeper.quorum", zookeeperQuorum)
    // c.set("hbase.zookeeper.property.clientPort", zookeeperPort)
    c.set("hbase.meta.replicas.use", "true")
    c.set("google.bigtable.project.id", "fbfblok3vejkytb5bdbbzxkr3krxw3")
    c.set("google.bigtable.instance.id", "bigdata-hbase")
    c.set("hbase.client.connection.impl", "com.google.cloud.bigtable.hbase1_x.BigtableConnection")
    ConnectionFactory.createConnection(c)
  }
}
and the pom.xml dependency:

<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-1.x-shaded</artifactId>
  <version>1.3.0</version>
</dependency>


