Iceberg compaction with external, MySQL Metastore


David Engel

May 27, 2025, 2:55:55 PM
to MR3
I'm trying to test compaction of Iceberg tables and am getting an error
concerning the MySQL JDBC driver. For historical reasons, we still run
the Metastore on one of our old Hadoop cluster nodes and configure the
Hive server on Kubernetes to use it. We can, and eventually will, move
the Metastore to Kubernetes, but it hasn't been a high priority yet.

Here is the error I'm seeing when running an "alter table ... compact
'major'" query:

2025-05-27T17:13:08,623 ERROR [HiveServer2-Background-Pool: Thread-2820] txn.TxnUtils: Unable to instantiate raw store directly in fastpath mode
java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:mysql://kbhadoop01/hive?createDatabaseIfNotExist=true&useSSL=false
at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:331) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:108) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.datasource.HikariCPDataSourceProvider.create(HikariCPDataSourceProvider.java:102) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setupJdbcConnectionPool(TxnHandler.java:984) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:282) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:151) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.ddl.table.storage.compact.AlterTableCompactOperation.execute(AlterTableCompactOperation.java:90) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:192) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:145) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:140) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:234) ~[hive-service-4.0.0.jar:4.0.0]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334) ~[hive-service-4.0.0.jar:4.0.0]
at java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?]
at javax.security.auth.Subject.doAs(Subject.java:439) ~[?:?]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:354) ~[hive-service-4.0.0.jar:4.0.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:299) ~[java.sql:?]
at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:106) ~[hive-exec-4.0.0.jar:4.0.0]
... 32 more
2025-05-27T17:13:08,623 ERROR [HiveServer2-Background-Pool: Thread-2820] exec.Task: Failed
java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:299) ~[java.sql:?]
at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:106) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:331) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:108) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.datasource.HikariCPDataSourceProvider.create(HikariCPDataSourceProvider.java:102) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setupJdbcConnectionPool(TxnHandler.java:984) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:282) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:151) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.ddl.table.storage.compact.AlterTableCompactOperation.execute(AlterTableCompactOperation.java:90) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:192) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:145) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:140) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:234) ~[hive-service-4.0.0.jar:4.0.0]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334) ~[hive-service-4.0.0.jar:4.0.0]
at java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?]
at javax.security.auth.Subject.doAs(Subject.java:439) ~[?:?]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:354) ~[hive-service-4.0.0.jar:4.0.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2025-05-27T17:13:08,623 ERROR [HiveServer2-Background-Pool: Thread-2820] exec.Task: DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.storage.compact.AlterTableCompactOperation
java.lang.RuntimeException: java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:mysql://kbhadoop01/hive?createDatabaseIfNotExist=true&useSSL=false
at org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:156) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.ddl.table.storage.compact.AlterTableCompactOperation.execute(AlterTableCompactOperation.java:90) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:192) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:145) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:140) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:234) ~[hive-service-4.0.0.jar:4.0.0]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334) ~[hive-service-4.0.0.jar:4.0.0]
at java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?]
at javax.security.auth.Subject.doAs(Subject.java:439) ~[?:?]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:354) ~[hive-service-4.0.0.jar:4.0.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:mysql://kbhadoop01/hive?createDatabaseIfNotExist=true&useSSL=false
at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:331) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:108) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.datasource.HikariCPDataSourceProvider.create(HikariCPDataSourceProvider.java:102) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setupJdbcConnectionPool(TxnHandler.java:984) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:282) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:151) ~[hive-exec-4.0.0.jar:4.0.0]
... 24 more
Caused by: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:299) ~[java.sql:?]
at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:106) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:331) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:108) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hive.com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.datasource.HikariCPDataSourceProvider.create(HikariCPDataSourceProvider.java:102) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setupJdbcConnectionPool(TxnHandler.java:984) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:282) ~[hive-exec-4.0.0.jar:4.0.0]
at org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:151) ~[hive-exec-4.0.0.jar:4.0.0]
... 24 more
FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. No suitable driver
2025-05-27T17:13:08,634 INFO [HiveServer2-Background-Pool: Thread-2820] reexec.ReOptimizePlugin: ReOptimization: retryPossible: false
2025-05-27T17:13:08,634 INFO [HiveServer2-Background-Pool: Thread-2820] reexec.ReExecuteLostAMQueryPlugin: Exception is not a TezRuntimeException, no need to check further with ReExecuteLostAMQueryPlugin
2025-05-27T17:13:08,634 INFO [HiveServer2-Background-Pool: Thread-2820] reexec.ReExecutionDagSubmitPlugin: Got exception message: No suitable driver retryPossible: false
2025-05-27T17:13:08,634 INFO [HiveServer2-Background-Pool: Thread-2820] reexec.ReExecuteOnWriteConflictPlugin: Got exception message: No suitable driver retryPossible: false
2025-05-27T17:13:08,634 ERROR [HiveServer2-Background-Pool: Thread-2820] ql.Driver: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. No suitable driver
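For context, the "No suitable driver" message at the root of this trace comes straight from java.sql.DriverManager: it is thrown whenever no JDBC driver registered in the running JVM accepts the URL, which is exactly what happens when the MySQL connector jar is absent from HiveServer2's classpath. A minimal standalone sketch (not Hive code; the class name DriverCheck is made up):

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverCheck {
    public static void main(String[] args) {
        try {
            // getDriver() does not open a connection; it only asks each
            // driver registered with DriverManager whether it accepts the URL.
            DriverManager.getDriver("jdbc:mysql://kbhadoop01/hive");
            System.out.println("driver registered");
        } catch (SQLException e) {
            // With no MySQL connector on the classpath, this prints
            // "No suitable driver" -- the same message as in the trace above.
            System.out.println(e.getMessage());
        }
    }
}
```

Because getDriver() never touches the network, this reproduces the symptom on any machine without the connector jar, independent of the Metastore's location.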

I believe I saw possible linkages between workdir-pv/pvc and the
location for the downloaded MySQL driver. Based on earlier
discussions here, we don't use workdir-pv/pvc and instead point Hive
to a location in HDFS. Should I revert that change, or is there
another place where I should configure/put the MySQL driver? Or
should I go ahead and move the Metastore to Kubernetes now?

David
--
David Engel
da...@istwok.net

Sungwoo Park

May 28, 2025, 3:30:05 AM
to David Engel, MR3
This is a problem we have also observed before, and it is still on our TODO list. In our case, it was a minor compaction. I don't think moving the Metastore to K8s will help, because it seems we reproduced this problem on K8s.


Let me do some more testing later and get back to you.

--- Sungwoo

INFO  : Executing command(queryId=hive_20241021074027_6792818f-c101-49f3-86b8-d9d51cacca5e): ALTER TABLE test_compaction COMPACT 'minor'
INFO  : Starting task [Stage-0:DDL] in serial mode
ERROR : Failed
java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:299) ~[java.sql:?]
  at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:106) ~[hive-exec-4.0.0.jar:4.0.0]
ERROR : DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.storage.compact.AlterTableCompactOperation
java.lang.RuntimeException: java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:mysql://192.168.10.1/hive400mr3?createDatabaseIfNotExist=true&useSSL=false
  at org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:156) ~[hive-exec-4.0.0.jar:4.0.0]
Caused by: java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:mysql://192.168.10.1/hive400mr3?createDatabaseIfNotExist=true&useSSL=false
  at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:114) ~[hive-exec-4.0.0.jar:4.0.0]
Caused by: java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:299) ~[java.sql:?]
  at org.apache.hive.com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:106) ~[hive-exec-4.0.0.jar:4.0.0]

David Engel

May 28, 2025, 12:42:47 PM
to Sungwoo Park, MR3
Okay.

David
--
David Engel
da...@istwok.net

Ill

Jun 19, 2025, 11:40:04 AM
to David Engel, Sungwoo Park, MR3
I actually encountered another issue. When I run compact, the error I get is: UnknownHostException: ${hive.database.host}: Name or service not known. Currently, I can perform CRUD operations on Iceberg tables, but I can't perform the compact operation. It feels strange; it seems the values from env.sh are not fully injected into the JVM environment.
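A tiny sketch of why an uninjected env.sh value surfaces this way: the configuration keeps the literal text ${hive.database.host}, and the JDBC layer then treats that text as the hostname to resolve. (The class name PlaceholderDemo and the host mysql.example.internal below are made-up illustrations, not values from this thread.)

```java
public class PlaceholderDemo {
    public static void main(String[] args) {
        // The JDBC URL template as it looks before substitution.
        String template = "jdbc:mysql://${hive.database.host}/hive?useSSL=false";

        // When the env.sh value is injected, the URL becomes usable:
        String rendered = template.replace("${hive.database.host}", "mysql.example.internal");
        System.out.println(rendered);

        // When injection does not happen, the driver receives the template
        // verbatim and tries to resolve "${hive.database.host}" via DNS,
        // which fails with: UnknownHostException: Name or service not known.
    }
}
```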

Sungwoo Park

Jun 19, 2025, 9:32:48 PM
to MR3
It seems that all these issues are due to the JAR file missing from the classpath. I will try to fix the issue in MR3 2.1.

Sungwoo