Hi all,
short version:
2. have community idea if we should backport the above PR to 6X
---------------------
TL;DR (but recommend reading...)
Since Greenplum 5, we have enabled the important feature of SQL: recursive CTE.
There is a GUC gp_recursive_cte to turn on | off this feature.
In Greenplum 5, the GUC is default off, and since Greenplum 6 it is default on.
## Plan shape of Recursive Query
Recursive query's plan has a recursive Union at the top, and its left tree is non-recurisve part,
its right tree is recursive part. Recursive part will be scanned just like a while loop, each new run
the top Recursive Union node will pass the worktable via Parameter to the Recursive part's
worktable scan.
Look at the below example (from Postgres 12):
gpadmin=# explain (costs off) with recursive cte(n) as
(
select 1
union all
select n+1 from cte where n < 10
)
select * from cte;
QUERY PLAN
-------------------------------------------
CTE Scan on cte
CTE cte
-> Recursive Union
-> Result
-> WorkTable Scan on cte cte_1
Filter: (n < 10)
The Result Node (left tree of Recursive Union) is from the `select 1`,
and the worktable scan is the the recurisve part SQL: `select n+1 from cte`.
The execution algorithm is:
1. Recursive Union prepare the init worktable as the result of Result node
2. Recusrive pass the work table to below worktable scan, and use the right
tree plan to generate a new worktable
3. if the new generated worktable is empty, then go to 4, else go to 2.
4. Union all worktable of each run as final result.
## MPP problem in Greenplum
In Greenplum, optimizer might add motion when planning a join of the worktable scan
with other tables. Motion will cut plan into difference slices and thus difference processes.
Current Greenplum MPP architecture don't support pass params across motion. Thus
It might crash (null pointer reference due to param not passed from top).
Greenplum use two ways to get rid of crash:
1. during parse analyze stage, we add much code to throw error when we think it might lead
to MPP plan of the recursive part. This introduces much difference from the upstream.
This is Greenplum specific behavior and unfortunately, it seems we don't have a clear
document on what we support and what we don't.
2. in planning stage, we will mark a path cannot be moved (no motion added) when the path
contain wts (work table scan). See code of `cdbpath_contains_wts`
and `cdbpath_motion_for_join`.
## Problems and Bugs of Greenplum recursive CTE parse-analyze
Greenplum specific logic during parse-analyze introduces much difference from
upstream and leads to some problems:
I hit a recursive SQL failure during pg12.12 merge, and come up with the above PR [2],
the SQL is in the test case of the that PR.
With PR [2] gets in, it introduces a regression which is the above Issue [3]. (Greenplum 6
support the SQL in Issue 2).
So two hard problems here:
* without the fix, pg_upgrade test case will fail and also the problem in PR [1]
* with the fix, it introduces a regression compared to GP6 (Issue [3]).
Previously, Xuejing spend considerable time on fixing the Greenplum-specific code of
parse-analyze to try best to fix both. Code becomes hard to understand, hard to prove
correctness, hard to review. Thus I come up with a new idea.
## New fix idea: lets just keep align with upstream
New fix method is:
* remove all Greenplum specific parse-analyze code and just make it same as upstream
* add a post-plan (or during-plan) check, when detect there is a motion between recursive
union and worktable scan, throw error to get rid of crash.
The algorithm is simple and clear.
My only concern is: the post-plan (or during-plan) method, is hard to have an accurate
document on what kind of SQL might fail. Previous parse-analyze check method might
have well document but we don't. Given we don't hear any real complain, so I think it is
acceptable.
## Should we backport to 6X?
Previous method (continue to hack the parse-analyze check code) are hard to review so I
am very hesitant to backport. But the new algorithm is much clearer, now I tend to backport
but want to have your ideas.
---------------------------
Best,
Zhenghua Lyu