Getting Tez LimitExceededException after DAG execution on a large query


Carol Chapman

Dec 15, 2021, 3:10:37 AM
to MR3
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200
    at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:81)
    at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:88)
    at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:63)
    at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:80)
    at org.apache.tez.common.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:106)
    at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:98)
    at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
    at org.apache.hadoop.hive.ql.exec.mr3.monitoring.DAGSummary.getCounterValueByGroupName(DAGSummary.java:109)
    at org.apache.hadoop.hive.ql.exec.mr3.monitoring.DAGSummary.hiveCounterValue(DAGSummary.java:114)
    at org.apache.hadoop.hive.ql.exec.mr3.monitoring.DAGSummary.vertexSummary(DAGSummary.java:184)
    at org.apache.hadoop.hive.ql.exec.mr3.monitoring.DAGSummary.print(DAGSummary.java:141)
    at org.apache.hadoop.hive.ql.exec.mr3.monitoring.MR3JobMonitor.printSummary(MR3JobMonitor.java:346)
    at org.apache.hadoop.hive.ql.exec.mr3.monitoring.MR3JobMonitor.monitorExecution(MR3JobMonitor.java:287)
    at org.apache.hadoop.hive.ql.exec.mr3.status.MR3JobRefImpl.monitorJob(MR3JobRefImpl.java:59)
    at org.apache.hadoop.hive.ql.exec.mr3.MR3Task.execute(MR3Task.java:184)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.executeMr3(TezTask.java:148)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2675)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2346)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2023)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1721)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1715)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226)
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)





But I set tez.counters.max=10000 in tez-site.xml. Why has the configuration not taken effect?

Carol Chapman

Dec 15, 2021, 4:02:45 AM
to MR3
I saw a log like this:

2021-12-15T13:34:29,759 INFO [HiveServer2-Background-Pool: Thread-79] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200

The parameter values shown in the log are not the values in my configuration file.

Sungwoo Park

Dec 15, 2021, 4:34:30 AM
to MR3
> 2021-12-15T13:34:29,759 INFO [HiveServer2-Background-Pool: Thread-79] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
>
> The parameter values shown in the log are not the values in my configuration file.

Did you run Hive-MR3 in local thread mode when you saw the above message?

You can set tez.counters.max in tez-site.xml, but it affects only DAGAppMaster and ContainerWorker. In the current implementation of Hive (and Hive-MR3), org.apache.tez.common.counters.Limits.setConfiguration() is never called from Hive, so the default value of 1200 is used for tez.counters.max in HiveServer2.
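
To make the point above concrete, here is a rough Java sketch of the behavior. It is a simplified model, not the actual Tez or MR3 source: the class ProcessLimits and its methods are hypothetical, and only the key tez.counters.max, the default of 1200, and the error message format come from this thread. Each instance stands for one process (DAGAppMaster, ContainerWorker, or HiveServer2), and the assumption is that a process which never receives the configuration falls back to the default.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of a per-process counter limit. Hypothetical class for
 * illustration only; it is not the Tez or MR3 implementation.
 */
final class ProcessLimits {
    static final String COUNTERS_MAX_KEY = "tez.counters.max";
    static final int COUNTERS_MAX_DEFAULT = 1200;

    private Map<String, String> conf;  // stays null unless setConfiguration() is called
    private boolean initialized;
    private int maxCounters;

    /** Takes effect only if called before the first limit check in this process. */
    synchronized void setConfiguration(Map<String, String> conf) {
        if (!initialized && this.conf == null && conf != null) {
            this.conf = conf;
        }
    }

    /** Resolves the limit once; falls back to the default when no configuration was set. */
    private void ensureInitialized() {
        if (initialized) {
            return;
        }
        String value = (conf == null) ? null : conf.get(COUNTERS_MAX_KEY);
        maxCounters = (value == null) ? COUNTERS_MAX_DEFAULT : Integer.parseInt(value);
        initialized = true;
    }

    synchronized int maxCounters() {
        ensureInitialized();
        return maxCounters;
    }

    synchronized void checkCounters(int size) {
        ensureInitialized();
        if (size > maxCounters) {
            throw new IllegalStateException("Too many counters: " + size + " max=" + maxCounters);
        }
    }
}

final class LimitsDemo {
    public static void main(String[] args) {
        // Stand-in for the value read from tez-site.xml.
        Map<String, String> tezSite = new HashMap<>();
        tezSite.put(ProcessLimits.COUNTERS_MAX_KEY, "10000");

        // Models DAGAppMaster/ContainerWorker: the configuration is passed in.
        ProcessLimits appMaster = new ProcessLimits();
        appMaster.setConfiguration(tezSite);
        appMaster.checkCounters(1201);  // within the configured limit of 10000

        // Models HiveServer2: setConfiguration() is never called.
        ProcessLimits hiveServer2 = new ProcessLimits();
        try {
            hiveServer2.checkCounters(1201);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // Too many counters: 1201 max=1200
        }
    }
}
```

In this model the same tez-site.xml value is honored by the process that receives it and ignored by the one that does not, which matches the MAX_COUNTERS=1200 line logged from the HiveServer2-Background-Pool thread.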

Not sure if this is intended, or if it is a bug. If it is a bug, we could open a JIRA ticket in Apache Hive. For Hive-MR3, we will try to include a patch in MR3 1.4.

Cheers,

--- Sungwoo

Carol Chapman

Dec 15, 2021, 5:09:45 AM
to MR3
No, I run Hive-MR3 on Hadoop.
I found this:
tez-Limit.jpg
org.apache.tez.common.counters.Limits.setConfiguration() is called in the Tez DAGAppMaster.
So I guess Mr3DagAppMaster does not do something similar.
Maybe we need to check Mr3DagAppMaster to ensure that its behavior is consistent with that of the Tez DAGAppMaster.

By the way, why can't I find org.apache.tez.common.counters.Limits in the mr3-tez project?

Sungwoo Park

Dec 15, 2021, 6:36:14 AM
to MR3
Limits.setConfiguration() is called in MR3 DAGAppMaster (and MR3 ContainerWorker), so it is not the source of the problem. The problem is that Limits.setConfiguration() is not called in HiveServer2, so if it is a bug at all, it is a bug on the Hive side.

org.apache.tez.common.counters.Limits is not found in mr3-tez because it is implemented in MR3 core.

Cheers,

--- Sungwoo