Dataproc and Apache Iceberg: "iceberg is not a valid Spark SQL Data Source"
Punyawee Posri
May 16, 2023, 1:54:55 AM
to Google Cloud Dataproc Discussions
Hi Google Cloud Dataproc Team!
I'm trying to use Apache Iceberg in my NestJS project to speed up database queries for some of our features. To evaluate it, I've been testing Iceberg's query speed over SSH on Google Dataproc. However, I've run into an issue while using Iceberg with Spark in the Dataproc SSH session.
1. I connected to the VM instance over SSH.
2. I configured spark-shell following Google's official documentation.
3. I configured spark-sql following the official Iceberg documentation (https://iceberg.apache.org/docs/latest/getting-started/).
4. In the end, the configuration that worked for me was: spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0
5. With the package configuration from step 4, creating my table worked.
6. But when I tried a plain SELECT * FROM {my created table with 'USING iceberg'}, it failed with this error: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source
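For comparison, the Iceberg getting-started guide linked in step 3 launches spark-sql with catalog settings in addition to --packages. The Python sketch below only assembles that command line and does not run Spark. The configuration keys and class names come from that guide. The warehouse path is a hypothetical placeholder. A commonly reported cause of this error is that a table created with USING iceberg is looked up through Spark's built-in session catalog rather than an Iceberg catalog, and the spark_catalog settings below are meant to address that. Treat this as an assumption to verify on Dataproc, not a confirmed fix.

```python
import shlex

# Iceberg runtime coordinates from the post (Spark 3.3, Scala 2.12, Iceberg 1.0.0).
ICEBERG_PACKAGE = "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0"


def iceberg_spark_sql_args(warehouse: str, package: str = ICEBERG_PACKAGE) -> list[str]:
    """Build an argv list for spark-sql with Iceberg's extensions and catalogs enabled.

    Mirrors the Iceberg getting-started guide:
    - spark_catalog (Spark's session catalog) is wrapped by SparkSessionCatalog,
      backed by the Hive metastore (type=hive);
    - an extra Hadoop-type catalog named "local" keeps tables under `warehouse`.
    """
    confs = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
        "spark.sql.catalog.spark_catalog.type": "hive",
        "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.local.type": "hadoop",
        "spark.sql.catalog.local.warehouse": warehouse,
    }
    args = ["spark-sql", "--packages", package]
    for key, value in confs.items():
        # Each setting becomes a separate "--conf key=value" pair.
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical warehouse directory; replace with a real path.
    print(shlex.join(iceberg_spark_sql_args("/tmp/iceberg-warehouse")))
```

Running the script prints one shell-quoted command that could be pasted into the SSH session. The settings are kept in a dict, so the "local" catalog entries can be dropped if only the session catalog is needed.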
*** My questions:
1. What have I missed here?
2. Is it the right approach to use Apache Iceberg as a new database instead of PostgreSQL for my NestJS project?
*** PS:
I rechecked with SELECT * FROM {another created table without 'USING iceberg'}, and everything works fine.
For this error (java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source), I tried using a new catalog name as discussed in https://github.com/apache/iceberg/issues/1756, but none of the answers in that thread seem to work for me. For example, I tried:
- SELECT * FROM local.default.{myCreatedTable}
- SELECT * FROM default.{myCreatedTable}
- SELECT * FROM {myCreatedTable}
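The three name forms in the PS can resolve to different catalogs depending on which catalogs are configured. The sketch below is a simplified model of how Spark 3 splits a multi-part table name. It is not Spark's actual code. The rules it models:
- A single part uses the current catalog and namespace.
- A leading part that matches a catalog registered through a spark.sql.catalog.<name> setting selects that catalog.
- Anything else is treated as a namespace inside the current catalog.

The function names and the table name my_table are hypothetical, and details such as backtick quoting are ignored.

```python
from typing import NamedTuple


class ResolvedName(NamedTuple):
    catalog: str
    namespace: tuple[str, ...]
    table: str


def catalogs_from_conf(conf: dict[str, str]) -> set[str]:
    """Collect catalog names registered via spark.sql.catalog.<name> keys."""
    prefix = "spark.sql.catalog."
    names = {"spark_catalog"}  # Spark's built-in session catalog always exists
    for key in conf:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            # Skip per-catalog options such as spark.sql.catalog.local.type
            if "." not in rest:
                names.add(rest)
    return names


def resolve_table_name(
    name: str,
    catalogs: set[str],
    current_catalog: str = "spark_catalog",
    current_namespace: tuple[str, ...] = ("default",),
) -> ResolvedName:
    """Simplified model of Spark 3 multi-part name resolution (no quoting support)."""
    parts = tuple(name.split("."))
    if len(parts) == 1:
        # Bare table name: use the current catalog and current namespace
        return ResolvedName(current_catalog, current_namespace, parts[0])
    if parts[0] in catalogs:
        # First part names a registered catalog; the rest is namespace + table
        return ResolvedName(parts[0], parts[1:-1], parts[-1])
    # Otherwise every leading part is a namespace in the current catalog
    return ResolvedName(current_catalog, parts[:-1], parts[-1])


if __name__ == "__main__":
    without_local = catalogs_from_conf({})
    with_local = catalogs_from_conf({
        "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.local.type": "hadoop",
    })
    for name in ("local.default.my_table", "default.my_table", "my_table"):
        print(name, resolve_table_name(name, without_local), resolve_table_name(name, with_local))
```

Under this model, local.default.{myCreatedTable} reaches an Iceberg catalog named "local" only if spark.sql.catalog.local is actually set. Otherwise Spark looks for a namespace called local.default inside the session catalog. Likewise, a table created in one catalog is not visible under another catalog's name.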