Hello,
I am trying to write a table from a DataFrame to Bigtable and am running into an error. I am following this notebook.
Jar:
bigtable_spark_connector_jar="gs://spark-lib/bigtable/spark-bigtable_2.12-0.1.0.jar"
Session:
spark = SparkSession.builder \
    .appName('df-to-bigtable') \
    .config("spark.jars", bigtable_spark_connector_jar) \
    .config("spark.jars.packages", "org.slf4j:slf4j-reload4j:1.7.36") \
    .getOrCreate()
Spark version:
3.5.0
Scala version:
Scala code runner version 2.12.18
Writing data:
df.repartition(10) \
    .write \
    .format("bigtable") \
    .options(catalog=catalog) \
    .option("spark.bigtable.project.id", bigtable_project_id) \
    .option("spark.bigtable.instance.id", bigtable_instance_id) \
    .option("spark.bigtable.create.new.table", create_new_table) \
    .save()
It errors out with:
Py4JJavaError: An error occurred while calling o338.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: bigtable. Please find packages at `https://spark.apache.org/third-party-projects.html`.
at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:724)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:647)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:697)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:863)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:257)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:248)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: bigtable.DefaultSource
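One thing that can be checked from the details above is whether the Scala binary version in the connector jar name agrees with the Scala version of the runtime. Below is a minimal sketch of that check. It assumes the jar follows the usual Scala 2.x naming convention of <artifact>_<scalaBinaryVersion>-<version>.jar, and the helper names are hypothetical (they are not part of Spark or the connector):

```python
import re


def scala_binary_from_jar(jar_path):
    """Return the Scala binary version encoded in a jar file name, or None.

    Assumes the Scala 2.x convention <artifact>_<major.minor>-<version>.jar.
    """
    name = jar_path.rsplit("/", 1)[-1]
    match = re.search(r"_(\d+\.\d+)-\d", name)
    return match.group(1) if match else None


def scala_binary_from_runtime(scala_version):
    """Reduce a full Scala version such as '2.12.18' to its binary version '2.12'."""
    return ".".join(scala_version.split(".")[:2])


# Values copied from the post above.
jar = "gs://spark-lib/bigtable/spark-bigtable_2.12-0.1.0.jar"
runtime = "2.12.18"

jar_scala = scala_binary_from_jar(jar)
runtime_scala = scala_binary_from_runtime(runtime)
print(jar_scala, runtime_scala)
print("match" if jar_scala == runtime_scala else "mismatch")
```

Both values resolve to 2.12 here, so the jar suffix looks consistent with the runtime. This only compares names; it does not show whether the jar was actually picked up by the session.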
Can someone please take a look?
Thanks,
Venkat