On window functions in recursive part of RCTE

30 views
Skip to first unread message

Zhenghua Lyu

unread,
Mar 22, 2022, 7:19:53 AM3/22/22
to gpdb...@greenplum.org
Hi,
    Greenplum 5 starts to partially support recursive CTE (under legacy planner, 
    ORCA does not support this feature yet). Since it is not easy to fully support
    recursive CTE the same as Postgres, Greenplum adds some specific checks
    during parse analysis stage, like not support recursive part contains window
    functions in target list (refer to the check function `checkWindowFuncInRecursiveTerm`).

    The discussion is based on:
  • master branch (top commit 7458af05)
  • 6x branch (top commit 4cb2b3b4)

    ## Outline of Current Implementation

  1. first try to generate plan (or path) for non-recursive part
  2. set worktable scan path's locus based the non-recursive part
  3. avoid motion over worktable scan 
  4. generate a recursive union to combine non-recursive part and recursive part
     No motions between RecursiveUnion and worktable scan is very important because
     RecursiveUnion needs to update worktable every iteration, this is sort of like passing
     parameters from parent node to child node. And current Greenplum MPP architecture
     does not support passing parameters across motion nodes.

     If we keep current framework that setting worktable scan's locus using non-recursive part plan,
     it is very hard to avoid motions when target list of recursive part contains window functions.

    ## How to support window functions?

    In my mind there might be 3 kinds of solutions.

    ### Solution 1, simple:
    
    We can force a gather motion in non-recursive part so that force a singleQE locus of
    worktable scan. Thus the recursive part is just like single-node postgres, we will node
    add any motions for window function.

    This might impact performance when we do not need gather: when non-recursive part
     is co-located with the window agg's order by clause.

   ### Solution 2, hard:

   Dynamically determine worktable scan's locus because we can control the locus of the
   non-recursive part. Sort of like the idea of outerlocus, still not very clear. No spike and
   no estimation of amount of work now.
 
   ### Solution 3, super hard
   
   Totally rewrite Recursive Queries in MPP database. No more details than a single sentence now. 
Reply all
Reply to author
Forward
0 new messages