Error while writing Dataframe to BigTable using PySpark Connector

venkata suresh gummadillli

May 10, 2024, 8:44:12 AM
to Google Cloud Bigtable Discuss
Hello, 

I am trying to write a DataFrame to a Bigtable table with the Bigtable Spark connector from PySpark, and the write fails with the error below. I am following this notebook.

Jar:
bigtable_spark_connector_jar="gs://spark-lib/bigtable/spark-bigtable_2.12-0.1.0.jar"

Session:
spark = SparkSession.builder \
  .appName('df-to-bigtable')\
  .config("spark.jars", bigtable_spark_connector_jar) \
  .config("spark.jars.packages", "org.slf4j:slf4j-reload4j:1.7.36") \
  .getOrCreate()

Spark version:
3.5.0
Scala version:
Scala code runner version 2.12.18

Writing data:
df.repartition(10) \
  .write \
  .format("bigtable") \
  .options(catalog=catalog) \
  .option("spark.bigtable.project.id", bigtable_project_id) \
  .option("spark.bigtable.instance.id", bigtable_instance_id) \
  .option("spark.bigtable.create.new.table", create_new_table) \
  .save()

It's erroring out:
Py4JJavaError: An error occurred while calling o338.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: bigtable. Please find packages at `https://spark.apache.org/third-party-projects.html`.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:724)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:647)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:697)
    at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:863)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:257)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:248)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: bigtable.DefaultSource
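
A note on reading the trace: the final `ClassNotFoundException: bigtable.DefaultSource` suggests Spark found no data source registered under the name "bigtable" and fell back to looking for a class of that name, which usually means the connector classes are not on the classpath of the running session. One possible cause, stated here as an assumption and not a confirmed diagnosis, is that `getOrCreate()` returned a session that already existed in the notebook, so the `spark.jars` setting never took effect. A minimal sketch of a check follows; `jar_is_configured` is a hypothetical helper, and the placeholder value stands in for what the live session would report.

```python
def jar_is_configured(spark_jars_value, jar_path):
    """Return True if jar_path is listed in a comma-separated spark.jars value."""
    entries = [entry.strip() for entry in (spark_jars_value or "").split(",")]
    return jar_path in entries


bigtable_spark_connector_jar = "gs://spark-lib/bigtable/spark-bigtable_2.12-0.1.0.jar"

# In the notebook, the live value would come from the running session, e.g.:
#   configured = spark.sparkContext.getConf().get("spark.jars", "")
# A placeholder is used here so the sketch runs without Spark installed.
configured = ""  # assumption: what a pre-existing session without the jar might report

if not jar_is_configured(configured, bigtable_spark_connector_jar):
    print("Connector jar is not listed in spark.jars for this session.")
```

If the jar turns out to be missing from the live value, stopping the existing session (or supplying the jar when the cluster or notebook kernel is launched) before building a new one would be the next thing to try.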
Can someone please take a look?
Thanks,
Venkat
