Grouping Sets rewrite - done so far
-----------------------------------
Last week, I took a stab at replacing GPDB's historical GROUPING SETS
implementation with the one from PostgreSQL 9.5. Why? Because the GPDB
implementation has a number of issues, ranging from assertion failures
to incorrect results. See
https://github.com/greenplum-db/gpdb/issues?q=is%3Aissue+is%3Aopen+label%3A%22Grouping+Sets%22+
The result of that was this PR:
https://github.com/greenplum-db/gpdb/pull/5440. There is still some ORCA
work needed, and there are still a few regression failures, but I
believe they're just a matter of fixing the expected output. It was
relatively straightforward; I first ripped out all the existing grouping
sets code, and then cherry-picked the necessary upstream commits. It works.
However, it's got a few performance issues:
1. It's not very MPP-aware. If you do "ROLLUP(a, b)", it will generate a
Gather node to bring all the data to the QD node, and perform the
grouping there.
2. It only supports sorted aggregates. No hashing.
I'm particularly worried about 1, the lack of MPP-awareness. I'm afraid
users would be very unhappy with that. In the last few days, I've been
thinking about how to fix the performance regressions.
Plan A: merge/cherry-pick stuff
-------------------------------
There are two relevant features in upstream:
1. Parallel aggregate support was added in PostgreSQL 9.6. It would
provide the infrastructure needed for two-stage aggregates. A parallel
aggregate on a single node, with separate Partial and Finalize stages,
is very similar to a two-stage aggregate across nodes.
2. Hashing support was added in PostgreSQL v10.
I considered cherry-picking those two features from PostgreSQL v10.
However, it gets very complicated, very quickly. The partial aggregate
support was added after the so-called Upper Planner Pathification work
in PostgreSQL 9.6. That completely revamped the GROUP BY planning in
PostgreSQL. Backporting that seems infeasible. It might be possible to
cherry-pick just the partial aggregate support without the Upper Planner
Pathification, but it requires some patchwork.
Even if we do that, the upstream Partial aggregate code is quite
different from the current GPDB implementation, especially in the
planner. Partial aggregates are used with regular GROUP BY queries, even
without grouping sets. So replacing the GPDB multi-stage aggragate
implementation with upstream's would involve a big refactoring of the
code in cdbgroup.c. And trying to support two different flavors of
multi-stage aggregates at the same time would be quite schizophrenic.
From a developer's point of view, I think the best marching order would be:
1. Commit the GROUPING SETS re-implementation we have in PR 5440 as it
is, and accept the performance issues for now.
2. Merge whole of GPDB all the way through PostgreSQL 9.6. Once we hit
9.6, as a temporary fix, remove cdbgroup.c. That would cause a massive
performance degradation to all GROUP BY queries, losing the ability to
create MPP-aware plans.
3. Re-implement the logic in cdbgroup.c, now that we have Upper Planner
Pathification, and Parallel Aggregates. That brings back the performance
on GROUP BY queries.
4. Re-implement the MPP-awareness to grouping sets using the Partial
Aggregate code. This brings back the performance we lost in the GROUPING
SETs rewrite in step 1.
5. Cherry-pick the support for hashing with grouping sets from PostgrSQL
v10 (or continue the merge from version 9.6 to 10 as whole).
In other words, we'd take on a technical mortgage in step 1, and pay it
back in steps 2-5.
From a Product Manager's point of view, there's a little problem with
this plan: We would like to release GPDB6 based on PostgreSQL 9.2, or
maybe 9.3 or 9.4. Not 9.6 or 10. If we release GPDB 6 between steps 1
and 2, grouping sets will have a significant performance regression,
compared to GPDB 5.
Plan B
------
Another approach would be to not pick the upstream Partial aggregate
support, but instead, add a bunch of new code to the grouping sets
planner that uses the current GPDB multi-stage aggregate implementation.
so the plan would be:
1. Commit the GROUPING SETS re-implementation we have in PR 5440 as it
is, and accept the performance issues for now.
2. Make that MPP-aware, by adding some code similar to current code in
cdbgroup.c.
We could perhaps reuse some of the old grouping sets code and the code
in cdbgroup.c, or the old grouping sets implementation, for this, but
it's not straightforward. It's going to be a fair amount of refactoring
and new code. And once we reach 9.6 in the PostgreSQL merge, we'll need
to rip out and replace it all anyway, and follow the steps 2-5 above.
The upside with this approach is that we have a good chance of getting
this into GPDB6.
Thoughts?
- Heikki