Dataproc Apache Iceberg: iceberg is not a valid Spark SQL Data Source

Punyawee Posri

May 16, 2023, 1:54:55 AM

to Google Cloud Dataproc Discussions

Hi Google Cloud Dataproc Team!

I'm trying to use Apache Iceberg in my NestJS project to speed up database queries for some of our features, so I've been testing Iceberg's query speed over SSH on Google Dataproc. However, I've run into an issue when using Iceberg with Spark in the Dataproc SSH session.

What I've done so far:

1. Created a Dataproc cluster by following Google's official documentation: https://cloud.google.com/dataproc-metastore/docs/apache-iceberg
2. Connected to the VM instance over SSH.
3. Configured spark-shell (following Google's official documentation) and spark-sql (following the official Iceberg documentation: https://iceberg.apache.org/docs/latest/getting-started/).
4. Found that the configuration that worked for me was:
   spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0
5. With the package configuration from step 4, table creation completed successfully.
6. But when I tried to run SELECT * FROM {my table created with 'USING iceberg'}, I got this error:
   java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source

*** My questions are...

What have I missed here?
Is using Apache Iceberg as a new database, in place of PostgreSQL, the right approach for my NestJS project?

*** PS:

I rechecked with SELECT * FROM {another table created without 'USING iceberg'}, and everything works fine.
For this error (java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source), I tried using a new catalog name as discussed here: https://github.com/apache/iceberg/issues/1756. But none of the answers in that thread worked for me.
For example:
- use SELECT * FROM local.default.{myCreatedTable}
- use SELECT * FROM default.{myCreatedTable}
- use SELECT * FROM {myCreatedTable}
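
For comparison with the `--packages`-only command above, here is a minimal sketch of the catalog-based launch described in the Iceberg getting-started guide. The catalog name `local` is taken from the queries tried above. The `hadoop` catalog type, the warehouse path, and the wrapper function name `iceberg_spark_sql` are assumptions for illustration, not the actual cluster setup, and whether this resolves the error on Dataproc is not confirmed here.

```shell
# Assumed values: adjust the catalog name and warehouse path to your cluster.
CATALOG="${CATALOG:-local}"
WAREHOUSE="${WAREHOUSE:-$PWD/warehouse}"

# Hypothetical wrapper: launches spark-sql with the Iceberg runtime,
# the Iceberg SQL extensions, and a named Iceberg catalog.
# Any extra arguments are passed through to spark-sql unchanged.
iceberg_spark_sql() {
  spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf "spark.sql.catalog.${CATALOG}=org.apache.iceberg.spark.SparkCatalog" \
    --conf "spark.sql.catalog.${CATALOG}.type=hadoop" \
    --conf "spark.sql.catalog.${CATALOG}.warehouse=${WAREHOUSE}" \
    "$@"
}
```

With the function defined, `iceberg_spark_sql` opens an interactive session, and a table created under that catalog would be addressed as `local.default.{myCreatedTable}`. A table created earlier in a session without these settings may not be visible through the new catalog, so the CREATE TABLE step may need to be repeated.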