MR3 1.10 Release

Sungwoo Park

Mar 12, 2024, 1:11:15 PM
to MR3
We have released MR3 1.10. You can download the Hive-MR3 binaries from GitHub and the Hive-MR3 Docker image from DockerHub (for Java 17 only).

A major change in MR3 1.10 is a new implementation of the shuffle server in the Tez runtime. In previous versions of Hive-MR3 (and also in Apache Hive), each Task runs its own shuffle manager/scheduler for each LogicalInput. As a result, the total number of concurrent fetchers can be controlled only indirectly, through the number of tasks in a ContainerWorker, which allows too many concurrent fetchers in large ContainerWorkers.

In the new implementation in MR3 1.10, a central shuffle server manages all fetchers for both ordered and unordered shuffling. The maximum number of concurrent fetchers can be set with a new configuration parameter, and all fetchers share a common ExecutionContext, which makes much better use of Java resources. As a result, the execution of TPC-DS queries is more stable.
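
To illustrate the idea of a central limit on fetchers, here is a minimal Python sketch. It is not MR3 or Tez code: the class `CentralShuffleServer`, the function `fake_fetch`, and the parameter `max_fetchers` are hypothetical names invented for this example, and `max_fetchers` merely stands in for the new configuration parameter, whose actual name is not given in this post. The sketch only shows that when every task submits its fetches to one shared pool, the number of concurrent fetchers stays bounded regardless of how many tasks are running.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class CentralShuffleServer:
    """Toy model: one shared pool bounds the fetchers of every task."""

    def __init__(self, max_fetchers):
        # max_fetchers is a hypothetical stand-in for the new configuration parameter.
        self._pool = ThreadPoolExecutor(max_workers=max_fetchers)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of fetchers observed running at once

    def _run(self, fetch):
        # Track how many fetchers are active so the bound can be checked.
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            return fetch()
        finally:
            with self._lock:
                self._active -= 1

    def submit(self, fetch):
        # Every task goes through the same pool instead of starting its own fetchers.
        return self._pool.submit(self._run, fetch)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_fetch(task_id, input_id):
    time.sleep(0.01)  # pretend to pull one shuffle partition over the network
    return (task_id, input_id)


def demo(num_tasks=8, inputs_per_task=4, max_fetchers=3):
    server = CentralShuffleServer(max_fetchers)
    futures = [
        server.submit(partial(fake_fetch, t, i))
        for t in range(num_tasks)
        for i in range(inputs_per_task)
    ]
    results = [f.result() for f in futures]
    server.shutdown()
    return results, server.peak
```

Calling `demo()` runs 32 simulated fetches from 8 tasks, yet the observed peak never exceeds `max_fetchers`. In the per-task design described above, the same workload could start a fetcher group per task and per input, so the total would grow with the number of tasks.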

As part of the new implementation of shuffle server, fault tolerance for pipelined shuffling works well.

MR3 1.10 will probably be the last release for the Hive 3.1.3 branch. As Hive 4 is now finally on the horizon, we will shift our focus to Hive 4 on MR3 (and maybe Hive 3.2 on MR3) from now on.

I will formally announce the release of MR3 1.10 after updating the documentation at MR3docs. Because the documentation will be based on MR3 1.10, we are going to remove the MR3 1.9 release from GitHub and DockerHub.

--- Sungwoo


Sungwoo Park

Mar 16, 2024, 6:53:03 AM
to MR3

Sungwoo Park

Mar 17, 2024, 11:43:40 AM
to MR3
Here is a quick summary of the performance comparison between MR3 1.9 and MR3 1.10 on the original TPC-DS benchmark with 99 queries.

MR3 1.9: total = 6473 seconds, geometric mean = 25.01 seconds
MR3 1.10: total = 6138 seconds, geometric mean = 24.42 seconds

For reference,

Trino 435: total = 6950 seconds, geometric mean = 19.18 seconds, query 23 returns wrong results and query 72 fails.
Spark 3.4.1: total = 20647 seconds, geometric mean = 32.03 seconds

All experiments use ORC 10TB and Java 17.
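
For readers who want to put the MR3 1.9 and MR3 1.10 figures side by side, the following Python sketch computes the relative reduction in each metric. The helper names `geometric_mean` and `relative_reduction` are made up for this example. Per-query times are not included in this post, so `geometric_mean` is shown only to make the definition of the metric concrete and is not applied to real data.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive running times, computed via logarithms."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def relative_reduction(old, new):
    """Fraction by which `new` is smaller than `old`."""
    return (old - new) / old


# Figures reported in this post for MR3 1.9 and MR3 1.10.
total_change = relative_reduction(6473, 6138)
geomean_change = relative_reduction(25.01, 24.42)

print(f"total: {total_change:.1%}, geometric mean: {geomean_change:.1%}")
# prints "total: 5.2%, geometric mean: 2.4%"
```

The total running time is dominated by the longest queries, whereas the geometric mean gives every query equal weight, which is why the two metrics can move by different amounts.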

Cheers,

--- Sungwoo