USING UDTF ON MR3 WITH NPE

Carol Chapman

Nov 15, 2022, 1:12:16 AM
to MR3
I am currently having a problem using a UDTF on MR3.
Here is the SQL I executed:

add jar hdfs:///user/hive/resource/jars/logback/hive-udf-1.1.4-SNAPSHOT.jar;
create temporary function json_array_str_explode as 'com.test.plt.hive.udf.JsonArrayExplode';

select
  A, B, C, entity
from (
  select
    A, B, C,
    get_json_object(D, '$.sku_order_list') as orders
  from (
    select
      get_json_object(D, '$.payload') as D, A
    from etl_data.test_table
    where dt >= '2022-10-01'
  ) s
  lateral view json_tuple(D, 'B', 'appointment_ship_time') b as B, C
) base_data
lateral view json_array_str_explode(orders) t as entity;

When I add a LIMIT clause to the subquery, the query runs normally.

However, when I remove the LIMIT clause, the query fails immediately with the following error:

INFO  : Completed executing command(queryId=hue_20221115135809_6b2f2234-d1d2-4ef0-8f5e-1315ddb83d3b); Time taken: 0.555 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Terminating unsuccessfully: Vertex failed, vertex_1658132306403_144655_72966_01. Map 1            1 task           1668491890276 milliseconds: Failed, Some(Failed to create RootInputInitializerManager or VertexManager for Map 1, com.datamonad.mr3.api.common.AMInputInitializerException: WorkerVertex.createRootInputInitializerManager() Map 1
        at com.datamonad.mr3.dag.WorkerVertexImpl.createRootInputInitializerManager(WorkerVertex.scala:634)
        at com.datamonad.mr3.dag.WorkerVertexImpl.transitionToInitializing(WorkerVertex.scala:478)
        at com.datamonad.mr3.dag.WorkerVertexImpl.checkFromCanStartInitializing(WorkerVertex.scala:872)
        at com.datamonad.mr3.dag.WorkerVertexImpl.eventInitialize(WorkerVertex.scala:1062)
        at com.datamonad.mr3.dag.WorkerVertexImpl.handle(WorkerVertex.scala:972)
        at com.datamonad.mr3.dag.WorkerVertex$$anon$1.com$datamonad$mr3$common$AsyncHandling$$super$handle(WorkerVertex.scala:74)
        at com.datamonad.mr3.common.AsyncHandling$$anon$1.run(EventHandler.scala:47)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.datamonad.mr3.api.common.MR3UncheckedException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
        at com.datamonad.mr3.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:57)
        at com.datamonad.mr3.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:26)
        at com.datamonad.mr3.tez.TezInputInitializer$.getInputInitializer(TezInputInitializer.scala:34)
        at com.datamonad.mr3.tez.TezRuntimeEnv.getInputInitializer(TezRuntimeEnv.scala:365)
        at com.datamonad.mr3.dag.InputInitializerRunner.<init>(InputInitializerRunner.scala:39)
        at com.datamonad.mr3.dag.RootInputInitializerManager$$anonfun$3.apply(RootInputInitializerManager.scala:39)
        at com.datamonad.mr3.dag.RootInputInitializerManager$$anonfun$3.apply(RootInputInitializerManager.scala:38)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:296)
        at com.datamonad.mr3.dag.RootInputInitializerManager.<init>(RootInputInitializerManager.scala:38)
        at com.datamonad.mr3.dag.WorkerVertexImpl.createRootInputInitializerManager(WorkerVertex.scala:631)
        ... 11 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedConstructorAccessor508.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.datamonad.mr3.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:45)
        ... 24 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://spacex-hadoop/tmp/hue/hue/_mr3_session_dir/c53f5b44-9de1-45f4-865b-6c405086f61e/hive/_mr3_scratch_dir-ae5f-72290/fb0ee71e-094d-4db2-8bbc-26182a34027b/map.xml
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:504)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:350)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:137)
        ... 28 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.test.plt.hive.udf.JsonArrayExplode
Serialization trace:
genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:184)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:179)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:217)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$MapWorkSerializer.read(SerializationUtilities.java:555)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$MapWorkSerializer.read(SerializationUtilities.java:547)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:209)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:753)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:659)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:636)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:465)
        ... 30 more
Caused by: java.lang.ClassNotFoundException: com.test.plt.hive.udf.JsonArrayExplode
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
        ... 127 more

Sungwoo Park

Nov 15, 2022, 10:34:03 PM
to Carol Chapman, MR3
This is probably a bug in Hive. For example, see a similar bug report: https://issues.apache.org/jira/browse/HIVE-25487

A workaround might be to put the LIMIT clause back with a very large value, so as to effectively emulate the absence of LIMIT.
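
For example, a minimal sketch of this workaround applied to the innermost subquery from your message (assuming that is where you previously added LIMIT; 2147483647 is Integer.MAX_VALUE, so every row still passes through):

select
  get_json_object(D, '$.payload') as D, A
from etl_data.test_table
where dt >= '2022-10-01'
limit 2147483647  -- effectively no LIMIT, but the LIMIT operator stays in the plan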

Sungwoo


Carol Chapman

Nov 18, 2022, 2:15:13 AM
to MR3
However, when I use Hive from HDP 3.1.5, the same SQL runs normally.

Sungwoo Park

Nov 18, 2022, 3:15:10 AM
to MR3
Probably some patch was applied to Hive in HDP, but not to Hive-MR3. If you can find the patch, please let us know.

Sungwoo

Sungwoo Park

Dec 19, 2022, 4:59:56 AM
to MR3
I revisited your problem, and I am now starting to think that it might be due to a bug in MR3. Specifically, MR3 might fail to include local resources (such as hive-udf-1.1.4-SNAPSHOT.jar in your example) in the classpath when executing InputInitializer inside MR3 DAGAppMaster.

If you think you can share the data and queries for reproducing the problem, please let me know (by private mail).

Cheers,

--- Sungwoo

Sungwoo Park

Dec 19, 2022, 5:17:49 AM
to MR3
Actually, this might be the same problem reported in your earlier message of November last year (on "HIVE MR3 can not use JdbcStorageHandler"). If you can still execute the query, could you try again with mr3.am.permit.custom.user.class set to true? Note that setting mr3.am.permit.custom.user.class to true means that a rogue user can execute malicious code inside DAGAppMaster.

Cheers,

Sungwoo

David Engel

Dec 19, 2022, 12:00:01 PM
to Sungwoo Park, MR3
I will try to test this later today.

David


--
David Engel
da...@istwok.net

Sungwoo Park

Dec 19, 2022, 12:23:49 PM
to David Engel, MR3
Thanks for testing this. Any query that calls a UDF inside InputInitializer will do the job. If mr3.am.permit.custom.user.class is set to false, MR3 does not include the jar file with the UDF in the classpath, thus throwing a ClassNotFoundException. In such a case, setting mr3.am.permit.custom.user.class to true will fix the problem.
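
For reference, a minimal sketch of such a test, reusing the jar and function from the original message (the exact query shape is an assumption; the point is only to force the UDTF class to be loaded during split generation):

add jar hdfs:///user/hive/resource/jars/logback/hive-udf-1.1.4-SNAPSHOT.jar;
create temporary function json_array_str_explode as 'com.test.plt.hive.udf.JsonArrayExplode';

-- Expected to fail while initializing Map 1 (ClassNotFoundException) when
-- mr3.am.permit.custom.user.class is false, and to proceed when it is true.
select t.entity
from etl_data.test_table
lateral view json_array_str_explode(
  get_json_object(get_json_object(D, '$.payload'), '$.sku_order_list')) t as entity
where dt >= '2022-10-01';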

Cheers,

Sungwoo

David Engel

Dec 20, 2022, 11:30:11 AM
to Sungwoo Park, MR3
Sorry for taking longer than intended. I was in and out yesterday and
also allowed the tests to run as long as possible. Unfortunately, I
have bad news. The first, relatively small test completed fine in a
reasonable time. However, the second, much larger test never
completed and made no progress at all. Its map stage stayed at
0(+70)/70 for several hours until I killed it. This happened on
multiple attempts. To be clear, I commented out
hive.vectorized.adaptor.usage.mode=chosen and added
mr3.am.permit.custom.user.class=true. Only after I reverted these
changes did the test complete.

David
--
David Engel
da...@istwok.net

Sungwoo Park

Dec 20, 2022, 5:57:44 PM
to David Engel, MR3

It looks like the slow execution is caused by hive.vectorized.adaptor.usage.mode being set to the default value 'all'. If so, I think it is due to a bug in VectorUDFAdaptor for vectorizing UDF calls. To be safe, hive.vectorized.adaptor.usage.mode should be set to 'none' or 'chosen', in which case UDF calls may not be quite as fast but do not stall the query for a long time.
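
As a sketch, assuming your setup allows changing the property per session (it can also be set globally in hive-site.xml):

-- Avoid VectorUDFAdaptor: 'chosen' applies the adaptor only to an approved
-- subset of UDFs, and 'none' disables it entirely.
set hive.vectorized.adaptor.usage.mode=chosen;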

If InputInitializer cannot load custom Java classes because mr3.am.permit.custom.user.class is set to false, the query fails immediately. So, mr3.am.permit.custom.user.class does not affect the execution speed. For your query, I think InputInitializer does not call UDFs.

--- Sungwoo

David Engel

Dec 20, 2022, 6:39:21 PM
to Sungwoo Park, MR3
Yes, that's what you concluded before, and my testing appeared to
confirm it. I initially changed hive.vectorized.adaptor.usage.mode to
all and then later to chosen. Both worked for me.

David
--
David Engel
da...@istwok.net

Ill

Dec 30, 2022, 1:34:58 AM
to MR3
Hi,
Sorry, my previous account (carolcha...@gmail.com) is lost; this is my new account.
I was busy with work for a while, so I did not continue the tests.
I have always set 'mr3.am.permit.custom.user.class' to true. What I observe now is that after the SQL is submitted to MR3, the exception occurs while MR3 executes the map tasks. So I think the UDF JAR is loaded at first, but a problem occurs during execution. That is also why I can execute the UDTF by adding the LIMIT keyword.
If needed, I can share my UDF JAR and dataset.

Sungwoo Park

Dec 30, 2022, 6:22:04 AM
to Ill, MR3
For your case, I think the only solution is to include hive-udf-1.1.4-SNAPSHOT.jar in the list of hive.aux.jars.path in hive-site.xml. Here is why.

After executing 'add jar' and setting mr3.am.permit.custom.user.class to true, the JAR file is registered in the URLClassLoader dedicated to the InputInitializer thread (which is executed inside DAGAppMaster). Note that for security reasons, the JAR file is NOT registered in the default URLClassLoader because the JAR file should be visible only to the query requiring the JAR file.

Hence the JAR file can be accessed directly inside the InputInitializer thread. That is what happens normally when executing simple queries using UDFs.

For your query, however, the class is resolved not through the URLClassLoader dedicated to the InputInitializer thread but through the ClassLoader of Kryo's DefaultClassResolver, which is unaware of the JAR file. As such, you get a ClassNotFoundException:

Caused by: java.lang.ClassNotFoundException: com.test.plt.hive.udf.JsonArrayExplode
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)

I have seen a similar problem while stabilizing Hive 4 on MR3 using Iceberg, and the workaround is the same (i.e., add the JAR file to the list of hive.aux.jars.path). We could revise MR3 so that JAR files are added to the default ClassLoader, but this gives rise to new security problems and I am not sure the revision is worth it.
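
As a sketch of this workaround: once hive.aux.jars.path in hive-site.xml lists hive-udf-1.1.4-SNAPSHOT.jar and HiveServer2 has been restarted, the session no longer needs 'add jar' at all:

-- The jar is already on the default classpath of HiveServer2 and DAGAppMaster,
-- so only the function registration remains.
create temporary function json_array_str_explode as 'com.test.plt.hive.udf.JsonArrayExplode';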

Please let me know if this workaround works for you.

--- Sungwoo



Ill

Jan 4, 2023, 1:56:19 AM
to MR3
Hi.
The solution of putting the UDF jar into 'hive.aux.jars.path' does not suit us, for the following reasons:
1. Security boundaries: in our production environment, SQL developers do not have login permission on the backend ECS machines.
2. UDF jars are updated frequently as business requirements change, so maintaining them in 'hive.aux.jars.path' is too cumbersome.
So we need the solution that revises MR3 so that JAR files are added to the default ClassLoader.
As for the security problems: once we allow users to run custom jars, we can hardly avoid them anyway.

Sungwoo Park

Jan 4, 2023, 2:42:49 AM
to Ill, MR3

If a JAR file is added to the default class loader, everyone can call any UDF included in it, and mr3.am.permit.custom.user.class becomes unnecessary. (It is still not clear to me what extension you would like to see.)

Do you use Ranger to control which user is allowed/disallowed to run UDFs? If this is the case, it might make sense to share JAR files for all users.

Another option is to use 'individual session mode' of MR3.

In individual session mode, each user creates an isolated MR3 master, so JAR files can be safely added to the default class loader. However, workers are not shared among users, so the resource usage is much lower. (It's like running Hive on Tez.)

Sungwoo

Ill

Jan 4, 2023, 3:19:54 AM
to MR3
Hi.
Yes, I use Ranger to control which users are allowed to run UDFs.

Sungwoo Park

Jan 5, 2023, 2:38:19 AM
to MR3
We decided not to change the current implementation. 

1. Adding new URLs to the system class loader is not feasible without ugly hacking. It is not safe, either, because we might keep adding URLs.
2. It is not clear if registering a new JAR with the same name but different contents is okay. The result of executing a query may depend on the time that a class is loaded for the first time.

This problem is unique to Hive-MR3 and does not arise in Hive-Tez (which allocates a new DAGAppMaster for each client). So, my suggestion is almost the same as before.

1. For complex UDFs, include the JAR in hive.aux.jars.path.
2. If the JAR is updated, you need to restart HiveServer2, unfortunately.
3. For simple UDFs, you can use 'add jar' command, as usual.

Sungwoo

Ill

Jan 9, 2023, 2:33:38 AM
to MR3
Hi.
If we add UDF jars to 'hive.aux.jars.path', it can cause a serious problem:
a JAR added by a user may conflict with the version of a JAR shipped with Hive, causing various strange problems.

Therefore, I believe that adding UDF jars to 'hive.aux.jars.path' is not advisable in a production environment. This problem will need to be solved eventually.

If MR3 can solve this problem in 'shared session mode', that would be the best outcome.
If that is really difficult, we need to document the workaround (e.g., use 'individual session mode') and explain why, so that users can quickly understand what is going on.

Sungwoo Park

Jan 9, 2023, 3:34:33 AM
to MR3

In such a case, I guess you could build your JAR package so that it would not conflict with Hive.

Sungwoo
