MR3 ApplicationMaster cannot allocate containers


Carol Chapman

Nov 12, 2021, 4:35:14 AM
to MR3
Hi,
I have run into some problems while using Hive on MR3 and need your help.

After MR3 on YARN has been running for some time, we found that the MR3 ApplicationMaster got stuck in a state where it could not obtain resources to run the queries submitted by users. When I run a SQL query, the Map tasks stay in the Running state indefinitely.

According to the YARN UI, the MR3 application has only its ApplicationMaster running, but it cannot obtain any resources to run tasks, even when the cluster is completely idle.

Log excerpt:
2021-11-12 12:00:56,325 [All-In-One] INFO  TaskScheduler [] - workerVertexFinishedScheduling() All-In-One vertex_1633700383948_42192_341_02
2021-11-12 12:00:56,570 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:00:56,871 [IPC Server handler 25 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:00:57,570 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:00:57,874 [IPC Server handler 26 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:00:58,571 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:00:58,875 [IPC Server handler 14 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:00:59,572 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:00:59,876 [IPC Server handler 2 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:01:00,572 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:01:00,877 [IPC Server handler 0 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:01:01,573 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:01:01,877 [IPC Server handler 15 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:01:02,574 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:01:02,878 [IPC Server handler 20 on 2768] INFO  DAG [] - hue_20211112120055_f26d7aec-1cc7-4959-902d-62bc89784a7a:343 waitUntilFinished() returns with false
2021-11-12 12:01:03,575 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]
2021-11-12 12:01:03,879 [IPC Server handler 28 on 2768] INFO 



After I switch the MR3 application to another YARN queue, it can obtain resources and run tasks again.

At present, I don't know how to solve the problem of MR3 suddenly being unable to obtain containers. Could you help me? I can provide the relevant log files.

Looking forward to your reply.
Thank You.

Sungwoo Park

Nov 12, 2021, 5:59:34 AM
to MR3
1) 
Could you check the value of the configuration key mr3.queue.name in mr3-site.xml? It specifies the Yarn queue to use for running MR3. If it is not set, MR3 tries to use the default queue of Yarn.
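For reference, here is a minimal sketch of setting this key in mr3-site.xml; the queue name "production" is only a placeholder, not a queue from this thread:

```xml
<!-- mr3-site.xml: assign MR3 to a specific Yarn queue.
     "production" is a placeholder queue name. -->
<property>
  <name>mr3.queue.name</name>
  <value>production</value>
</property>
```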

2)
If this is the first line in the log that contains "ResourceScheduler [] - YarnResourceScheduler.reschedule()",
2021-11-12 12:00:56,570 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [0MB, 1]

it means that MR3 cannot allocate containers because Yarn reports that only 0MB of memory is usable for MR3.
In this case, my guess is that MR3 should be assigned to a different queue by setting mr3.queue.name in mr3-site.xml.

3)
Normally the DAGAppMaster log would show progress like this:

2021-11-12 02:45:01,086 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [2752512MB, 10752]
<-- right after DAGAppMaster starts, we see that 2752512MB of memory is available.
...
2021-11-12 02:45:48,250 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [24576MB, 96]
<-- after containers are created, we see that 24576MB of memory is left.
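As a rough sketch (not an MR3 tool; the sample lines are copied from the excerpt above), one can pull the reported memory and vcore figures out of these reschedule() lines to track how cluster capacity changes over time:

```shell
# Extract the "[<memory>MB, <vcores>]" figure reported by
# YarnResourceScheduler.reschedule() in a DAGAppMaster log.
# The two sample lines below are copied from this thread.
log='2021-11-12 02:45:01,086 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [2752512MB, 10752]
2021-11-12 02:45:48,250 [YarnResourceScheduler] INFO  ResourceScheduler [] - YarnResourceScheduler.reschedule() [24576MB, 96]'

# -o prints only the matched part, one match per line:
# prints "[2752512MB, 10752]" then "[24576MB, 96]"
printf '%s\n' "$log" | grep -o '\[[0-9]*MB, [0-9]*\]'
```

Running the same extraction over a real log (e.g. with grep on the DAGAppMaster log file) makes it easy to see whether the reported memory ever drops to 0MB and stays there.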

4)
For debugging, AMProcess mode is useful for running DAGAppMaster. In AMProcess mode, DAGAppMaster runs as a process on the host machine rather than in a Yarn container. For example:

$ hive/hiveserver2-service.sh start --tpcds --hivesrc3 --amprocess

The DAGAppMaster log is then found under a directory such as:

/home/hive/mr3-run/hive/hiveserver2-service-result/hive-mr3-63127d5-2021-11-12-11-44-20-76c5c021/application_1629718349310_0248

Cheers,

--- Sungwoo

Carol Chapman

Nov 12, 2021, 8:04:10 AM
to MR3
1. On my cluster, MR3 has high privileges and can submit tasks to any YARN queue.

2. MR3 initializes fine and can execute SQL statements at first. However, after running for about a week, MR3 suddenly fails to run submitted SQL queries. According to my log analysis, this is the first case you mentioned: the MR3 ApplicationMaster cannot obtain any resources from YARN.

3. From my logs, MR3 failed several times before it became unable to obtain resources, and then killed a large number of vertex nodes. Since then, it has been unable to obtain any resources.

I can provide you with the relevant log files. But how do I upload them from here? Or should I send the logs to you by email?

Sungwoo Park

Nov 12, 2021, 10:26:23 AM
to Carol Chapman, MR3
1. Do you run Hive on MR3 in shared session mode, or in individual session mode? Please see the diagrams at:


It seems that you run Hive on MR3 in individual session mode (because you said DAGAppMaster started with no memory available in the cluster), but normally one would run it in shared session mode for efficiency.

2.
From my logs, MR3 failed several times before it became unable to obtain resources, and then killed a large number of vertex nodes.
--> Do you mean that MR3 DAGAppMaster failed to obtain resources from Yarn for creating containers?

3. I would like very much to see your log files. In our internal cluster, we run Hive on MR3 for many weeks without encountering any problem, so it may be a problem either with HiveServer2 or with MR3 DAGAppMaster. Let me send a private email.

Cheers,

--- Sungwoo


To view this discussion on the web visit https://groups.google.com/d/msgid/hive-mr3/2fc5a28b-7449-497f-b685-f169f6cd6ab0n%40googlegroups.com.

Carol Chapman

Nov 12, 2021, 11:06:26 AM
to MR3
I use shared session mode. I have just sent the log files over. Hopefully we can find some clues in them.

Carol Chapman

Jan 28, 2022, 10:53:20 AM
to MR3
Through experiments, I have determined the reason why the AM cannot allocate resources: it is related to YARN node label scheduling. See the thread "Can mr3 support yarn label scheduling?".
MR3 is now working fine.
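For readers who hit the same symptom: when Yarn node labels are enabled under the Capacity Scheduler, a queue can only allocate containers on node labels it has been granted access and capacity for. A sketch of the relevant capacity-scheduler.xml entries; the queue "default" and the label "mr3label" are placeholders, not names from this thread:

```xml
<!-- capacity-scheduler.xml: let queue root.default allocate containers
     on nodes carrying the label "mr3label" (placeholder names). -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>mr3label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.mr3label.capacity</name>
  <value>100</value>
</property>
```

If the queue that MR3 submits to has no capacity on the labels of the idle nodes, the ApplicationMaster's resource requests are never satisfied even though the cluster looks empty, which matches the 0MB reschedule() lines above.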
