
Performance issue and Hive-MR3 with Java 17


Sungwoo Park

Nov 15, 2023, 12:20:00 AM
to MR3
Some users reported that Hive-MR3 gets slower after running for a while. We reproduced this problem by executing TPC-DS query 4 (1TB scale) 50 times. Query 4 is good for testing the performance of Hive-MR3 because it generates several ordered edges and is shuffle-heavy.

We have been battling this problem for days. The problem started to surface after merging HIVE-20702, but this patch is not the cause because it only helps generate more efficient DAGs. Heap analysis shows no memory leak or any similar symptom, and in the end, we have come to the conclusion that this is a bug in G1RemSet::refine_card() in Java 8. (https://bugs.openjdk.org/browse/JDK-8177707 seems related to this problem.)

In the attached table, we see that with Hive-MR3 1.3 on Java 8, the query starts to get slower around the 25th execution of query 4 (highlighted in red). The same problem is not observed when running Hive-MR3 with Java 17, even after executing the query 200 times.
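
For anyone who wants to check their own runs for the same pattern, here is a minimal sketch of how one might locate the point where repeated executions start to slow down. It is not the script we used: the function name first_slow_run, the warmup and factor parameters, and the timings in the example are all assumptions made up for illustration, not numbers from the attached spreadsheet.

```python
from statistics import median


def first_slow_run(times, warmup=5, factor=1.2):
    """Return the 1-based index of the first run that is slower than
    factor * baseline, where baseline is the median of the first
    `warmup` runs. Return None if no such run exists.

    All parameter names and defaults here are hypothetical choices.
    """
    if len(times) <= warmup:
        return None
    baseline = median(times[:warmup])
    for run_number, elapsed in enumerate(times[warmup:], start=warmup + 1):
        if elapsed > factor * baseline:
            return run_number
    return None


# Illustrative, made-up timings in seconds (NOT measured data):
# 24 steady runs followed by 26 slower runs.
degrading = [100.0] * 24 + [130.0] * 26
steady = [100.0] * 200

print(first_slow_run(degrading))  # 25
print(first_slow_run(steady))     # None
```

Using the median of the first few runs as the baseline keeps a single noisy run from shifting the threshold; the 20% margin is arbitrary and should be adjusted to the run-to-run variance of your cluster.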

From the experiment, we also observe:

1. Hive-MR3 1.8 is noticeably faster than Hive-MR3 1.7.
2. Running Hive-MR3 with Java 17 gives a significant performance boost.

--- Sungwoo
hivemr3.release1.8.java17.performance.xlsx