Dataproc Serverless template development: Cassandra to Bigtable error interpretation


Adam Scott

Feb 16, 2023, 5:56:54 PM
to Google Cloud Dataproc Discussions
I am working on a new template for Cassandra to Bigtable. After I submit the job, I receive the following error (only part of the message is copied, for space).

Could this be a malformed catalog specification or a version mismatch?
Also, how do I tell it which GCP runtime to use for the Spark stack?

```
  File "/tmp/srvls-batch-73398c59-2085-4050-a18c-da17e3528d2a/main.py", line 127, in run_template
    template_instance.run(spark=spark, args=args)
  File "/tmp/srvls-batch-73398c59-2085-4050-a18c-da17e3528d2a/dataproc_templates_distribution.egg/dataproc_templates/cassandra/cassandra_to_bigtable.py", line 155, in run
    input_data = spark.sql(query)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o73.sql.
: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce
    at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3373)
    at java.base/java.lang.Class.getConstructor0(Class.java:3578)
    at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2754)
    at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:59)
```
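
One way to check the version-mismatch hypothesis: `scala/collection/GenTraversableOnce` exists in Scala 2.12 but was removed in Scala 2.13, so this error commonly appears when a jar built for Scala 2.12 is loaded on a Scala 2.13 runtime. The following is a minimal sketch, not part of the template. The helper names and the example jar file name are hypothetical, and it assumes the jar follows the usual `_<scala binary version>` naming convention.

```python
import re
from typing import Optional


def scala_suffix(jar_name: str) -> Optional[str]:
    """Return the Scala binary version encoded in a jar name, e.g. '2.12'.

    Assumes the conventional '<artifact>_<scala version>-<version>.jar' naming.
    Returns None when no such suffix is present.
    """
    match = re.search(r"_(2\.\d+)(?=[-.]|$)", jar_name)
    return match.group(1) if match else None


def is_scala_compatible(jar_name: str, runtime_scala: str) -> bool:
    """Compare a jar's Scala suffix with the runtime's Scala version.

    runtime_scala is a full version string such as '2.13.10'; only the
    first two components (the binary version) are compared.
    Jars without a Scala suffix are treated as compatible.
    """
    suffix = scala_suffix(jar_name)
    if suffix is None:
        return True
    runtime_binary = ".".join(runtime_scala.split(".")[:2])
    return suffix == runtime_binary


# Hypothetical example jar name, used only for illustration.
connector_jar = "spark-cassandra-connector_2.12-3.2.0.jar"
mismatch_on_213 = not is_scala_compatible(connector_jar, "2.13.10")
```

Inside a running PySpark session, the runtime's Scala version can usually be read with `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()` and passed in as `runtime_scala`; treat that call as an assumption to verify on your runtime.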

Mich Talebzadeh

Apr 21, 2023, 5:48:22 AM
to Google Cloud Dataproc Discussions
Which version of Spark is this running on, and which version of Java is on Dataproc?

Mich Talebzadeh

Apr 21, 2023, 6:14:46 AM
to Google Cloud Dataproc Discussions
I have tested Google connectivity to BigQuery using VM hosts.

This is the version of Java that works OK with BigQuery and Spark:

 java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)



If you have a higher version of Java, the Spark connector to BigQuery will fail. I tested the latest Spark (3.4). As long as your VM host runs Java 8, it should work regardless of the Spark version.

So you have the option of installing Java 8 and running your job again.
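
To check which Java major version a host reports before rerunning the job, something like the following could be used. It is a minimal sketch: the function name is hypothetical, and it assumes the text passed in looks like the `java -version` output shown above (legacy `1.8.0_292` style or the newer `11.x` / `17.x` style).

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.

    Handles the legacy scheme ('1.8.0_292' means Java 8) and the
    newer scheme ('17.0.6' means Java 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = match.group(1), match.group(2)
    # Legacy versions start with '1.'; the major version is the second number.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


# Sample text taken from the output quoted in this thread.
sample = 'openjdk version "1.8.0_292"'
sample_major = java_major_version(sample)
```

Note that `java -version` writes to standard error, so when capturing it from a script (for example with `subprocess.run(["java", "-version"], capture_output=True, text=True)`), read the `stderr` field rather than `stdout`.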

HTH