I have a Cassandra-to-Bigtable Dataproc Serverless (PySpark) script, and it works great. The write side looks like this:
output_data.write.format("org.apache.hadoop.hbase.spark").options(
    catalog=bt_catalog
).option("hbase.spark.use.hbasecontext", "false").mode("overwrite").save()
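For context, bt_catalog is a JSON string in the HBase Spark connector's catalog format, mapping DataFrame columns to a Bigtable table and column family. A minimal sketch (the table name, column names, and family "cf" here are illustrative, not my real schema):

```python
import json

# Hypothetical catalog: one rowkey column plus one data column in family "cf".
# Replace "my_table" and the column names with your actual schema.
bt_catalog = json.dumps({
    "table": {"namespace": "default", "name": "my_table"},
    "rowkey": "key",
    "columns": {
        "key": {"cf": "rowkey", "col": "key", "type": "string"},
        "name": {"cf": "cf", "col": "name", "type": "string"},
    },
})
```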
Now I would like to read from Bigtable (again, using Python), but there are no examples to be found. I've tried this, among many other iterations:
df = (
    spark.read.options(catalog=catalog)
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.spark.use.hbasecontext", "false")
    .load()
)
print(f"Count: {df.count()}")
For some reason it tries to connect to localhost, even though we submit a container image with hbase-site.xml set correctly.
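For reference, our hbase-site.xml follows the standard Bigtable HBase client setup, roughly like this (the project and instance IDs are placeholders, and the connection class assumes the HBase 2.x flavor of the bigtable-hbase client):

```xml
<configuration>
  <!-- Route HBase connections to Bigtable instead of ZooKeeper/HBase -->
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase2_x.BigtableConnection</value>
  </property>
  <property>
    <name>google.bigtable.project.id</name>
    <value>my-project</value>
  </property>
  <property>
    <name>google.bigtable.instance.id</name>
    <value>my-instance</value>
  </property>
</configuration>
```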
Here's some of the output we get:
INFO ZooKeeper: Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$2533/0x0000000801216840@337930b7
...
23/05/18 20:56:06 WARN ClientCnxn: Session 0x0 for server localhost/127.0.0.1:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
What am I missing?
TIA