HIVE ON MR3 cannot execute statements repeatedly with EsStorageHandler


Carol Chapman

Oct 11, 2022, 8:06:54 AM
to MR3
Hi.
We currently need to use EsStorageHandler to write data from Hive to Elasticsearch. I used EsStorageHandler in Hive on MR3 and ran into a very strange problem.
Here are the SQL statements I execute:
add jar hdfs:///user/hive/resource/jars/elasticsearch-hadoop-6.3.2.jar;
add jar hdfs:///user/hive/resource/jars/commons-httpclient-3.1.jar;
CREATE EXTERNAL TABLE test.test01
(
  c1 string,
  c2 string,
  c3 string,
  c4 string,
  c5 string,
  c6 string,
  c7 string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
 'es.nodes' = '192.168.xx,192.168.xx',
 'es.port' = '9200',
 'es.index.read.missing.as.empty' = 'true',
 'es.resource' = 'index_crowd_uniid_mapping/_doc',
 'es.nodes.wan.only' = 'true',
 'es.index.auto.create' = 'false',
 'es.read.metadata' = 'true',
 'es.mapping.names' = 'c1:c1,c2:c2, c3:c3, c4:c4, c5:c5,c6:c6,c7:c7',
 'es.scroll.size' = '5000'
);

insert overwrite table test.test01
select
    cd.c1, 'static1' as c2, cd.c3 as c3, mp.c4, mp.c5, mp.c6, from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss') as c7
from test.test02 as cd
inner join test.test03 as mp on cd.c1=mp.c2
where cd.c1='008185' and mp.c3='qiushi6' and length(mp.s_uni_id)>0 and length(mp.c4)>0;


When I execute the INSERT statement for the first time, the data is written successfully.
But when I execute the same statement a second time, I get the following exception:
Caused by: java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/hive/HiveValueWriter
        at org.elasticsearch.hadoop.hive.EsHiveOutputFormat.getHiveRecordWriter(EsHiveOutputFormat.java:88)
        at org.elasticsearch.hadoop.hive.EsHiveOutputFormat.getHiveRecordWriter(EsHiveOutputFormat.java:42)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:282)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:772)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:723)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:889)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:968)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:968)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.flushOutput(VectorGroupByOperator.java:1176)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:1184)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:735)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:389)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:353)
        ... 10 more


When I run the same statements on Apache Hive (HDP 3.1.4), they always succeed.

Sungwoo Park

Oct 11, 2022, 8:48:23 AM
to Carol Chapman, MR3
This is indeed strange. You could check the working directories of the containers and see whether a subdirectory (e.g., /dag_10664_0000_10_LR) contains the jar file after executing the statement.

In Hive-MR3, 'add jar' should be executed for each session, so if the second execution of the statement is in a new session, 'add jar' should be executed again.
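
As a rough sketch of the per-session check, the jar files can be re-added and then listed at the start of each session. The paths are the ones from the original post, and `list jars` is the standard Hive resource command. Whether its output matches what the MR3 containers actually load is an assumption that still needs to be verified against the container working directories.

```sql
-- Re-register the jars at the start of every new session
-- (paths taken from the original post).
add jar hdfs:///user/hive/resource/jars/elasticsearch-hadoop-6.3.2.jar;
add jar hdfs:///user/hive/resource/jars/commons-httpclient-3.1.jar;

-- Show the jars registered in the current session before running the INSERT.
list jars;
```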

Cheers,

--- Sungwoo



Carol Chapman

Oct 11, 2022, 10:53:40 AM
to MR3
I am running both statements in the same session, in a QA environment.
I assumed the jar files must be present because the INSERT statement is accepted and starts executing. The query runs until the reduce stage and only then begins to report errors. If the jar files were missing, the statement should not have been accepted in the first place.
I will test in more detail next and provide more information.

Sungwoo Park

Oct 11, 2022, 10:55:58 AM
to Carol Chapman, MR3
Another suggestion I could make is to include the jar file in the configuration hive.aux.jars.path in hive-site.xml, e.g.:

<property>
  <name>hive.aux.jars.path</name>
  <value>/home/hive/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-llap-common-3.1.3.jar,/home/hive/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-llap-server-3.1.3.jar,/home/hive/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-llap-tez-3.1.3.jar,/foo/bar/elasticsearch-hadoop-6.3.2.jar,/foo/bar/commons-httpclient-3.1.jar</value>
</property>

You can also use mr3.lib.uris and mr3.aux.uris in mr3-site.xml (by adding HDFS paths).
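
A minimal sketch of the mr3-site.xml variant, assuming mr3.aux.uris accepts a comma-separated list of URIs and reusing the HDFS paths from the original post. If the property already has a value in your mr3-site.xml, extend that value rather than replacing it.

```xml
<!-- mr3-site.xml: illustrative value only; the comma-separated format is an assumption -->
<property>
  <name>mr3.aux.uris</name>
  <value>hdfs:///user/hive/resource/jars/elasticsearch-hadoop-6.3.2.jar,hdfs:///user/hive/resource/jars/commons-httpclient-3.1.jar</value>
</property>
```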

--- Sungwoo


On Tue, Oct 11, 2022 at 11:43 PM Carol Chapman <carolcha...@gmail.com> wrote:
As it stands, I can execute an INSERT statement only once per session unless I restart MR3.

Carol Chapman

Oct 11, 2022, 12:24:06 PM
to MR3
OK, I will try that.