What is the purpose of ShareInputScan?


Craig Harris

Apr 18, 2017, 12:06:00 PM
to Greenplum Developers
Can anyone offer a succinct explanation of the purpose of ShareInputScan plan nodes?

There has been a recent assertion to the effect that ShareInputScan scans across slices.
The code is a bit tricky to follow, and it's not obvious to me that it actually does that.

In most situations, execution of a plan node involves calling ExecProcNode on its "child planstate" to pull an input tuple.

For example, SubqueryNext has "slot = ExecProcNode(node->subplan);"

While ShareInputScan plan nodes have children, the code doesn't use this approach: it appears to "know" that the child can only be a Material or Sort node, and it calls lower-level functions to access the "result" of that child.

If I understood the purpose of ShareInputScan, I might have an easier time following its implementation code.


Haisheng Yuan

Apr 18, 2017, 3:24:10 PM
to Craig Harris, Greenplum Developers
Hi Craig,

ShareInputScan is used to share the output of a subplan.
Queries that contain CTEs or window functions may generate plans that use ShareInputScan.

For example, we have a CTE query like this:

hyuan=# explain with x as (select * from foo) select * from x x1, x x2;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=7.06..13.27 rows=100 width=32)
   ->  Nested Loop  (cost=7.06..13.27 rows=34 width=32)
         ->  Shared Scan (share slice:id 2:0)  (cost=3.21..3.42 rows=4 width=16)
         ->  Materialize  (cost=3.85..4.15 rows=10 width=16)
               ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=3.21..3.82 rows=10 width=16)
                     ->  Shared Scan (share slice:id 1:0)  (cost=3.21..3.42 rows=4 width=16)
                           ->  Materialize  (cost=3.11..3.21 rows=4 width=16)
                                 ->  Seq Scan on foo  (cost=0.00..3.10 rows=4 width=16)
 Settings:  gp_cte_sharing=on
 Optimizer status: legacy query optimizer
(10 rows)

[Inline image: plan graph of the query above, with each slice drawn in a different color.]

ShareInputScan comes in two types: producer and consumer. As you can see in the plan above, the first Shared Scan (the outer child of the Nested Loop) is a consumer; it reads tuples produced by the producer. The second Shared Scan (under the inner child of the Nested Loop), the one that has a child plan, is the producer. The child operator of a ShareInputScan is always either a Material or a Sort node.

ShareInputScan doesn't pull tuples the way other operators do. We need to make sure the producer ShareInputScan reads all the tuples from its child plan before any consumer ShareInputScan starts reading them, to avoid potential deadlocks in a cross-slice environment. If a Material node sits under the shared scan, it fetches all rows and puts them into a tuplestore (implemented by tuplestorenew.c); the consumer ShareInputScans then read tuples from the tuplestore instead of from a child plan.

In fact, in the non-cross-slice scenario, both producer and consumer can pull tuples from the child plan and store them in the tuplestore, just like nodeCteScan does in PostgreSQL. But in GPDB's cross-slice scenario, the consumer needs to wait for the producer to finish reading, and they share the results through a workfile.

I agree the code is tricky to follow; the ShareInputScan, Material, and Sort nodes are tightly coupled at the moment. You may also need to read the source code of nodeMaterial.c and nodeSort.c to understand ShareInputScan.

Feel free to ask if you have further questions.


~ ~ ~
Haisheng Yuan