Missing jar and UDFs

David Engel

Oct 11, 2022, 7:25:03 PM

to MR3

We have a jar with various UDFs. We load it using the "add jar"
directive whenever we need to use the UDFs, which is in most
sessions(*). I can use the UDFs just fine on simple queries in
beeline. I'm getting a cryptic error, however, when I use them in an
only slightly more complex query, which reads data from one table
and writes the UDF results to another table. Here is the error:

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. java.io.IOException: Previous writer likely failed to write file:/opt/mr3-run/work-dir/hive/hive/_mr3_session_dir/f82d2277/intzsta-1.0-SNAPSHOT.jar. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.exec.mr3.DAGUtils.localizeResource(DAGUtils.java:1371)
at org.apache.hadoop.hive.ql.exec.mr3.DAGUtils.addTempResources(DAGUtils.java:1260)
at org.apache.hadoop.hive.ql.exec.mr3.DAGUtils.localizeTempFilesFromConf(DAGUtils.java:1171)
at org.apache.hadoop.hive.ql.exec.mr3.MR3Task.setupSubmit(MR3Task.java:241)
at org.apache.hadoop.hive.ql.exec.mr3.MR3Task.execute(MR3Task.java:143)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.executeMr3(TezTask.java:148)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2681)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2352)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2029)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1729)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1723)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:229)
at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:87)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

intzsta-1.0-SNAPSHOT.jar is the jar with the UDFs.

David

(*)We still use "add jar" mainly for historical reasons dating back to
when we occasionally updated the jar with new or fixed UDFs. We
almost never need to do that anymore. Does the MR3 implementation of
Hive have a directory from where jars are automatically loaded? I
think the directory or setting was named auxlib or similar in the old
Hadoop version.

--
David Engel
da...@istwok.net

Sungwoo Park

Oct 11, 2022, 10:10:58 PM

to David Engel, MR3

1. For the error (java.io.IOException: Previous writer likely failed to write file: ...), I currently don't understand why it occurs. From the source code of Hive-MR3, the file intzsta-1.0-SNAPSHOT.jar is supposed to be already written and Hive-MR3 should not retry to overwrite the file, but for some reason, it retries to overwrite the file. If you could send the log (by private email) or explain how to reproduce it, let me try again.

2. MR3 supports libUris and auxUris (mr3.lib.uris, mr3.aux.uris, mr3.cluster.additional.classpath), but in the current implementation, they are useful only on Hadoop and should not be used on K8s. Specifically,

1) You can set mr3.lib.uris/mr3.aux.uris in mr3-site.xml, but on K8s, the jar files included in them should be found in the classpath specified in mr3.cluster.additional.classpath.

2) However, the value for mr3.cluster.additional.classpath is hard-coded in mr3-setup.sh:

MR3_ADD_CLASSPATH_OPTS="-Dmr3.cluster.additional.classpath=$REMOTE_BASE_DIR/mr3/mr3lib/*:$REMOTE_BASE_DIR/hive/apache-hive/lib/*:$REMOTE_BASE_DIR/tez/tezjar/*:$REMOTE_BASE_DIR/tez/tezjar/lib/*"

3) As a result, we cannot include additional jar files for MR3 master and workers on K8s.

We didn't anticipate such use cases of including additional jar files in MR3 master and workers. (On the other hand, we sometimes need to include additional jar files in HiveServer2 and Metastore, and this is supported on K8s).

A quick fix is to update mr3-setup.sh and extend the value for MR3_ADD_CLASSPATH_OPTS so that it includes a subdirectory of PersistentVolume. However, this requires us to build a custom Docker image.
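
As a rough sketch of what that change to mr3-setup.sh might look like: the directory /opt/mr3-run/work-dir/lib below is only an assumption about where the PersistentVolume is mounted and where the extra jars would be copied, and EXTRA_JAR_DIR is a hypothetical variable name. REMOTE_BASE_DIR is given a default here just so the fragment runs on its own; in mr3-setup.sh it is already defined.

```shell
# Sketch only, not the actual mr3-setup.sh.
# REMOTE_BASE_DIR is already set in mr3-setup.sh; the default here is an assumption.
REMOTE_BASE_DIR="${REMOTE_BASE_DIR:-/opt/mr3-run}"

# Hypothetical subdirectory of the PersistentVolume that holds the extra jars.
EXTRA_JAR_DIR="/opt/mr3-run/work-dir/lib"

# Same value as before, with one more classpath entry appended at the end.
MR3_ADD_CLASSPATH_OPTS="-Dmr3.cluster.additional.classpath=$REMOTE_BASE_DIR/mr3/mr3lib/*:$REMOTE_BASE_DIR/hive/apache-hive/lib/*:$REMOTE_BASE_DIR/tez/tezjar/*:$REMOTE_BASE_DIR/tez/tezjar/lib/*:$EXTRA_JAR_DIR/*"

echo "$MR3_ADD_CLASSPATH_OPTS"
```

The jar files would then have to be copied into that directory on the PersistentVolume, and the modified script would have to be baked into the custom Docker image.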

Ideally I would like to solve the first problem, but if you are okay with building a Docker image, please let me know.

Cheers,

--- Sungwoo

David Engel

Oct 11, 2022, 11:31:00 PM

to Sungwoo Park, MR3

On Wed, Oct 12, 2022 at 11:10:45AM +0900, Sungwoo Park wrote:
> 1. For the error (java.io.IOException: Previous writer likely failed to
> write file: ...), I currently don't understand why it occurs. From the
> source code of Hive-MR3, the file intzsta-1.0-SNAPSHOT.jar is supposed to
> be already written and Hive-MR3 should not retry to overwrite the file, but
> for some reason, it retries to overwrite the file. If you could send the
> log (by private email) or explain how to reproduce it, let me try again.

Which logs, hiveserver2, mr3worker or both?

> 2. MR3 supports libUris and auxUris (mr3.lib.uris, mr3.aux.uris,
> mr3.cluster.additional.classpath), but in the current implementation, they
> are useful only on Hadoop and should not be used on K8s. Specifically,

> [...]

> We didn't anticipate such use cases of including additional jar files in
> MR3 master and workers. (On the other hand, we sometimes need to include
> additional jar files in HiveServer2 and Metastore, and this is supported on
> K8s).
>
> A quick fix is to update mr3-setup.sh and extend the value for
> MR3_ADD_CLASSPATH_OPTS so that it includes a subdirectory of
> PersistentVolume. However, this requires us to build a custom Docker image.

No worries. All of our tools already have the appropriate add jar
commands. I was just wondering if there was a quick and easy way.
There are plenty of other things to do to get this into production use
first.

David
--
David Engel
da...@istwok.net

Sungwoo Park

Oct 12, 2022, 1:05:07 AM

to David Engel, MR3

On Wed, Oct 12, 2022 at 11:10:45AM +0900, Sungwoo Park wrote:
> 1. For the error (java.io.IOException: Previous writer likely failed to
> write file: ...), I currently don't understand why it occurs. From the
> source code of Hive-MR3, the file intzsta-1.0-SNAPSHOT.jar is supposed to
> be already written and Hive-MR3 should not retry to overwrite the file, but
> for some reason, it retries to overwrite the file. If you could send the
> log (by private email) or explain how to reproduce it, let me try again.

Which logs, hiveserver2, mr3worker or both?