Greenplum Planner uses locus to model the data distribution of paths. Some typical examples are:
Function scan of generate_series is General locus (because the information is complete at every single place)
the scan path of a replicated table is segmenteGeneral locus (because data only stored in segments, not master, and the information is complete at every single segment)
……
General and segmentGeneral locus imply that if the corresponding slice is executed in many different segments should provide the same result data set. Thus, in some cases, General and segmentGeneral can be treated like broadcast.
But what if the segmentGeneral and general locus path contain mutable functions? Mutable functions, by definition, do not guarantee results of different invokes. So for such cases, they lose the property and cannot be treated as *general. Previously, Greenplum planner does not handle these cases correctly.
We first notice this issue when Heikki raised an issue in Github: Replicated table has inconsistent content after UPDATE #10226. And after some investigation, we find other bad cases. Some bad plans are listed below, the complete list can be found here: https://gist.github.com/kainwen/ad43f4ae88dc56f3c6eb73056b195e8b
The root cause is: if a general or segmentGeneral locus paths contain volatile functions, they cannot be treated as general or segmentGeneral.
The idea is: when we find the pattern (a general or segmentGeneral locus paths contain volatile functions), we create a motion path above it to turn its locus to singleQE and then create a projection path. Then the core job becomes how we choose the places to check:
For a single base rel, we should only check its restriction, this is the at bottom of planner, this is at the function set_rel_pathlist
When creating a join path, if the join locus is general or segmentGeneral, check its joinqual to see if it contains volatile functions
When handling subquery, we will invoke set_subquery_pathlist function, at the end of this function, check the targetlist and havingQual
When creating limit path, the check and change algorithm should also be used
Correctly handle make_subplan
OrderBy clause and Group Clause should be included in targetlist and handled by the above Step 3.
A pr is opened: Correct plan of general & segmentGeneral path with volatiole functions. #10418
Update & Delete Statement on a replicated table is special. These statements have to be dispatched to each segment to execute. So if they contain volatile functions in their targetList or where clause, we should reject such statements:
For targetList, we check it at the function create_motion_path_for_upddel
For where clause, they will be handled in the query planner and if we find the pattern and want to fix it, do another check if we are updating or deleting replicated table, if so reject the statement.
For insert statement, if we have correct behavior on planner, it is OK automatically. Refer to the cases in the above gist link and the following gist link.
The cases that matched the above wrong ones are listed below. For the whole list, please refer to https://gist.github.com/kainwen/cd96c8fc35b68c9d87d73af7f269ba99