Some users reported Hive-MR3 getting slower after running for a while. We reproduced this problem by executing TPC-DS query 4 (1TB scale) 50 times. Query 4 is good for testing the performance of Hive-MR3 because it generates several ordered edges and is shuffle-heavy.
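A reproduction loop of this kind can be sketched roughly as follows. This is only an illustration, not the script we actually used: the helper names (time_runs, first_slow_run, run_query4), the 10-run baseline, the 1.5x threshold, and the Beeline JDBC URL and query file name are all assumptions to be adapted to your own setup.

```python
import statistics
import subprocess
import time


def time_runs(run_once, repetitions):
    """Call run_once() `repetitions` times; return elapsed seconds per call."""
    elapsed = []
    for _ in range(repetitions):
        start = time.monotonic()
        run_once()
        elapsed.append(time.monotonic() - start)
    return elapsed


def first_slow_run(elapsed, baseline_runs=10, factor=1.5):
    """Return the 1-based index of the first run slower than `factor` times
    the median of the first `baseline_runs` runs, or None if there is none."""
    baseline = statistics.median(elapsed[:baseline_runs])
    for index, seconds in enumerate(elapsed, start=1):
        if index > baseline_runs and seconds > factor * baseline:
            return index
    return None


def run_query4():
    # Hypothetical invocation: the JDBC URL and the file name are placeholders.
    subprocess.run(
        ["beeline", "-u", "jdbc:hive2://localhost:10000", "-f", "query4.sql"],
        check=True,
    )


# Example (assumed usage): run the query 50 times and report the first slow run.
# timings = time_runs(run_query4, 50)
# print(first_slow_run(timings))
```

Comparing against the median of the early runs, rather than the previous run, keeps a single noisy execution from hiding or faking a gradual slowdown.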
We have been battling this problem for days. The problem started to surface after merging HIVE-20702, but this patch is not the cause because it only helps generate more efficient DAGs. Heap analysis shows no memory leak or any similar symptom, and in the end we have come to the conclusion that this is a bug in G1RemSet::refine_card() in Java 8. (https://bugs.openjdk.org/browse/JDK-8177707 seems related to this problem.)
In the attached table, we see that from Hive-MR3 1.3 with Java 8, query 4 starts to get slower around the 25th execution (highlighted in red). The same problem is not observed when running Hive-MR3 with Java 17, even after executing the query 200 times.
From the experiment, we also observe:
1. Hive-MR3 1.8 is noticeably faster than Hive-MR3 1.7.
2. Running Hive-MR3 with Java 17 gives a significant performance boost.
--- Sungwoo