
Hive-MR3 with Celeborn


Sungwoo Park

Oct 24, 2023, 8:06:15 AM
to MR3
Before the impending release of MR3 1.8, we would like to announce the release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn 0.3.1).

Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and Apache Uniffle [3] (which was discussed on the Hive mailing list a while ago). Celeborn officially supports Spark and Flink, and we have implemented an MR3 extension for Celeborn.

In addition to all the benefits of using a remote shuffle service, Hive-MR3-Celeborn supports direct processing of mapper output on the reducer side, which means that reducers do not store mapper output on local disks (for unordered edges). In this way, Hive-MR3-Celeborn eliminated over 95% of local disk writes when tested on the 10TB TPC-DS benchmark. This can be particularly useful when running Hive-MR3 on public clouds, where fast local disk storage is expensive or not available.

We have documented the usage of Hive-MR3-Celeborn in [4]. You can download Hive-MR3-Celeborn from [5].

FYI, MR3 is an execution engine providing native support for Hadoop, Kubernetes, and standalone mode [6]. Hive-MR3, its main application, provides the performance of LLAP yet is very easy to install and operate. If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will give you much higher throughput thanks to its advanced resource sharing model.

We have recently opened a Slack channel. If you are interested, please join the channel and ask any questions about MR3:


Thank you,

--- Sungwoo
