[Proposal] Plan Hints in Greenplum

186 views
Skip to first unread message

David Kimura

unread,
Sep 22, 2023, 3:27:11 PM9/22/23
to Greenplum Developers
Hi hackers,

I would like to propose introducing plan hints in Greenplum. Plan hints is a
way to override the optimizer in order to coerce a plan shape by disregarding
its estimated cost compared to alternative plans. I think this feature has two
significant benefits:

1) It enables an immediate customer workaround for bugs in the optimizer when
optimizer generates an inferior plan.

2) It enables easy testing of plan performance which in turn can be used to
drive optimizer improvements.

In upstream, this feature has been proposed numerous times [1][2][3] in
different forms, but it has always been struck down. I think the main hurdle is
the fact that code is not free to write or maintain; and upstream prioritizes
fixing the optimizer bugs/limitations as opposed to implementing a workaround
solution. Another argument used against the feature is that, if done
incorrectly, it can promote bad SQL hygiene.

I think upstreams arguments are all valid. But in our case, I think that the
benefits may outweigh the risks. Postgres extension pg_hint_plan [4] has been
used in various other projects and has active community support from core
Postgres developers. Chris Hajas had explored introducing this into Greenplum
for Planner [5]. I propose we introduce a similar syntax and mechanism in
Greenplum for Orca.

In order to achieve that, I think the following problems need to be solved:

1) MPP hints:

One significant difference for Greenplum is its multi-host setup which
adds motions to shuffle data. So, we would need to extend the grammar to
express that. For joining tables the grammar already supports join order.

Example:

/*+
MergeJoin(t1 t2 t3)
Leading((t1 (t2 t3)))
*/

I propose to express motions with a new hint Motion.

Example:

/*+
Motion((none(t1) broadcast(redistribute(t2) any(t3))))
*/

2) Orca hint framework

I think all the hints can be implemented using the required/derived
property framework. I spent time to spike on that approach and so far it
seems quite promising.

3) Leverage existing code

Because pg_hint_plan is an extension, it needed its own parser (instead of
using gram.y). However, Orca is not an extension. Ideally, Orca would
reuse pg_hint_plan's grammar definition and structs without redefining
them somewhere else.

Any thoughts?

Does seem like a worthwhile feature? Does the syntax seems appropriate? Any
issues that I missed?

Thanks,
David

[1] https://www.postgresql.org/message-id/flat/Pine.LNX.4.64.0910062354510.6801%40sn.sai.msu.ru
[2] https://www.postgresql.org/message-id/flat/44D87BB7.4070509%40phlo.org
[3] https://www.postgresql.org/message-id/flat/20061012151439.GT28647%40nasby.net
[4] https://github.com/ossc-db/pg_hint_plan
[5] https://github.com/chrishajas/gpdb/blob/hackday/gpcontrib/pg_hint_plan/README.md

Luis Macedo

unread,
Sep 25, 2023, 5:27:53 PM9/25/23
to Greenplum Developers, David Kimura
David,

I personally dislike hints... We already have GUCs that we can set to force change in the optimizer behavior.

Also, when we set hints, if the data demography changes, the plan will be stuck to a particular hint that its not true anymore, so it can also be harmful.

Honestly I would invest that time in improving statistics collection, specially on multiple columns (eg: joins or filters on 2 or more columns) and propagating the stats on upper slices of the plan, which we currently don't do well and its a big source of bad plans.

I would also invest on a query plan check tool that could provide us with the metadata of the plans so we have a collection of queries were we do not do well and we can work on making those run fast.


Rgds,

Luis Macedo

unread,
Sep 25, 2023, 5:30:36 PM9/25/23
to Greenplum Developers, Luis Macedo, David Kimura
One more thing. Not that giving the database information about the query in a comment is not useful. I would like to see query tags for example. We can use tags to route such queries to specific resource groups for example.

David Kimura

unread,
Sep 25, 2023, 6:19:18 PM9/25/23
to Luis Macedo, Greenplum Developers
Hi Luis,

On Monday, September 25, 2023 2:27 PM Luis Macedo <luis0...@gmail.com> wrote:
> We already have GUCs that we can set to force change in the optimizer
> behavior.

The problem with GUCs is that often they are not granular enough. We have seen
multiple customer escalations where we didn't pick the right index; or we
picked a bad join order; or we performed a join on the coordinator instead of
the segments; or ...

And we've even added GUCs (e.g. optimizer_penalize_broadcast_threshold) to hack
around issues. Personally, I think this is even worse practice as it pollutes
our GUC-space with esoteric use cases for specific customer workloads.

> Also, when we set hints, if the data demography changes, the plan will be
> stuck to a particular hint that its not true anymore, so it can also be
> harmful.

Agreed. That's part of the "bad SQL hygiene" concern I noted. Plan hints should
probably be ephemeral. But I think that's on the DBA to manage. In production,
hints should probably only be used as a workaround until a proper optimizer
fix is shipped. As with many features, there's an risk that it may be used
incorrectly.

> Honestly I would invest that time in improving statistics collection,
> specially on multiple columns (eg: joins or filters on 2 or more columns)
> and  propagating the stats on upper slices of the plan, which we currently
> don't do well and its a big source of bad plans.
>
> I would also invest on a query plan check tool that could provide us with
> the  metadata of the plans so we have a collection of queries were we do not
> do well and we can work on making those run fast.

Noted. Thanks for that feedback.

Thanks,
David

Soumyadeep Chakraborty

unread,
Sep 26, 2023, 3:49:57 PM9/26/23
to David Kimura, Greenplum Developers, luis0...@gmail.com
Hey David,

On Fri, Sep 22, 2023 at 12:27 PM 'David Kimura' via Greenplum
Developers <gpdb...@greenplum.org> wrote:

> I would like to propose introducing plan hints in Greenplum. Plan hints is a
> way to override the optimizer in order to coerce a plan shape by disregarding
> its estimated cost compared to alternative plans. I think this feature has two
> significant benefits:
>
> 1) It enables an immediate customer workaround for bugs in the optimizer when
> optimizer generates an inferior plan.
>
> 2) It enables easy testing of plan performance which in turn can be used to
> drive optimizer improvements.

I am ambivalent about customers using plan hints in production, even as a
workaround. A specific plan hint may seem attractive at one moment, but become
stale quickly if the data changes (and the stats changes - we are also running
auto-analyze by default in 7X). Already-analyzed read-only tables seem to be the
only sensible target for plan hint use.

The 2nd one seems more attractive to me - so we could use it in dev builds and
not ship it in retail builds. I don't know if this is enough reason though to
take on the burden of forking, merging and modifying pg_hint_plan. Can you
elaborate on some of the ways this can enhance testability of our code, CI etc?

If we do decide to ship it, it should not be on by default. I certainly don't
like the fact that pg_hint_plan.enable_hint_table can be set by regular users.
Users could inadvertently set a pattern that may match against other user
queries. So there is some work to be done here.

>
> In order to achieve that, I think the following problems need to be solved:
>
> 1) MPP hints:
>
> One significant difference for Greenplum is its multi-host setup which
> adds motions to shuffle data. So, we would need to extend the grammar to
> express that. For joining tables the grammar already supports join order.
>
> Example:
>
> /*+
> MergeJoin(t1 t2 t3)
> Leading((t1 (t2 t3)))
> */
>
> I propose to express motions with a new hint Motion.
>
> Example:
>
> /*+
> Motion((none(t1) broadcast(redistribute(t2) any(t3))))
> */

Don't we also have to worry about distribution policies, locus, replicated vs
distributed etc?

Would the relstats estimates being sometimes whole-table vs divided by segment
count (the ADJUST_BASESCAN() gymnastics) make things messy?

Also, there is the aspect of access method. Certain hints won't make sense for
AO/CO tables.

There are also a number of operators that are exclusive to GPDB like
Dynamic*Scan and deviations from upstream (such as stream-bottom HashAgg) that
we would have to gradually absorb.

Finally, we would have to augment our existing tools (minirepro) to capture plan
hints used in queries, which may be annoying?

>
> 3) Leverage existing code
>
> Because pg_hint_plan is an extension, it needed its own parser (instead of
> using gram.y). However, Orca is not an extension. Ideally, Orca would
> reuse pg_hint_plan's grammar definition and structs without redefining
> them somewhere else.

Logistics wise, we could ship this as a core extension (like gp_toolkit), and
ORCA can depend on it being installed.


On Mon, Sep 25, 2023 at 3:19PM 'David Kimura' via Greenplum Developers
<gpdb...@greenplum.org> wrote:
>
> Hi Luis,
>
> On Monday, September 25, 2023 2:27 PM Luis Macedo <luis0...@gmail.com> wrote:
> > We already have GUCs that we can set to force change in the optimizer
> > behavior.
>
> The problem with GUCs is that often they are not granular enough. We have seen
> multiple customer escalations where we didn't pick the right index; or we
> picked a bad join order; or we performed a join on the coordinator instead of
> the segments; or ...
>
> And we've even added GUCs (e.g. optimizer_penalize_broadcast_threshold) to hack
> around issues. Personally, I think this is even worse practice as it pollutes
> our GUC-space with esoteric use cases for specific customer workloads.

True, however we have a degree of control over what users can do with the GUC.
With plan hints however, there is no degree of control.

>
> > Also, when we set hints, if the data demography changes, the plan will be
> > stuck to a particular hint that its not true anymore, so it can also be
> > harmful.
>
> Agreed. That's part of the "bad SQL hygiene" concern I noted. Plan hints should
> probably be ephemeral. But I think that's on the DBA to manage. In production,
> hints should probably only be used as a workaround until a proper optimizer
> fix is shipped. As with many features, there's an risk that it may be used
> incorrectly.

Seldom do we have DBAs manage specific queries. In so many cases, DBAs have so
little control over what app devs are doing. This is why we have features such
as workload management so that we can impose global limits (such as resource
groups/queues). Expecting DBAs to have oversight over individual queries can be
a tough sell.


Regards,
Soumyadeep (VMware)

David Kimura

unread,
Sep 26, 2023, 5:11:59 PM9/26/23
to Soumyadeep Chakraborty, Greenplum Developers, luis0...@gmail.com
Hi Soumyadeep,

On Tuesday, September 26, 2023 12:49 PM Soumyadeep Chakraborty <soumyad...@gmail.com> wrote:
> I am ambivalent about customers using plan hints in production, even as a
> workaround. A specific plan hint may seem attractive at one moment, but
> become stale quickly if the data changes

I feel like the same argument can be made against using GUCs to get out of a
plan regression. If we're okay with that workaround, then how are plan hints
different?

> Can you elaborate on some of the ways this can enhance testability of our
> code, CI etc?

One recent case is when the team was doing work to update index costing to account
for various combinations and permutations of index columns. The easiest way to
compare index performance is to force plans that pick each index and compare
result. For large data sets it became impractical to delete and create indexes
while debugging.

Another valuable use case is when debugging customer performance issues. If we
can coerce smaller queries then we can more quickly debug and isolate problems.

> Don't we also have to worry about distribution policies, locus, replicated vs
> distributed etc?

We don't. The hint writer does. If the hint conflicts, then the optimizer will
either error or fail to generate a plan. That's because hints don't force plans,
they just add "preference" to a subset of plans in the valid plan search space.

> Would the relstats estimates being sometimes whole-table vs divided by
> segment count (the ADJUST_BASESCAN() gymnastics) make things messy?

Possibly. I haven't deep dived into that yet.

> Also, there is the aspect of access method. Certain hints won't make sense
> for AO/CO tables.

Again, I think this shouldn't be an issue because hints don't force plans. If
the hint coerced a bad plan then that is probably a bug in the optimizer, not
the hinter.

> There are also a number of operators that are exclusive to GPDB like
> Dynamic*Scan and deviations from upstream (such as stream-bottom HashAgg)
> that we would have to gradually absorb.

Good point.

> Finally, we would have to augment our existing tools (minirepro) to capture
> plan hints used in queries, which may be annoying?

Yeah, it may be annoying. But not insurmountable. Those tools could probably
use a good refactoring anyways. ;)

>>   3) Leverage existing code
>>
>>      Because pg_hint_plan is an extension, it needed its own parser (instead 
>>      of using gram.y). However, Orca is not an extension. Ideally, Orca
>>      would reuse pg_hint_plan's grammar definition and structs without
>>      redefining them somewhere else.
>
> Logistics wise, we could ship this as a core extension (like gp_toolkit), and 
> ORCA can depend on it being installed.

Good to know. Thanks.

On Tuesday, September 26, 2023 12:49 PM Soumyadeep Chakraborty <soumyad...@gmail.com> wrote:
> On Mon, Sep 25, 2023 at 3:19PM 'David Kimura' via Greenplum Developers

>> On Monday, September 25, 2023 2:27 PM Luis Macedo <luis0...@gmail.com> wrote:
>>> We already have GUCs that we can set to force change in the optimizer
>>> behavior.
>>
>> The problem with GUCs is that often they are not granular enough. We have 
>> seen multiple customer escalations where we didn't pick the right index; or
>> we picked a bad join order; or we performed a join on the coordinator
>> instead of the segments; or ...
>>
>> And we've even added GUCs (e.g. optimizer_penalize_broadcast_threshold) to
>> hack around issues. Personally, I think this is even worse practice as it
>> pollutes our GUC-space with esoteric use cases for specific customer
>> workloads.
>
> True, however we have a degree of control over what users can do with the
> GUC.  With plan hints however, there is no degree of control.

What do you mean by "there is no degree of control"?

>>> Also, when we set hints, if the data demography changes, the plan will be
>>> stuck to a particular hint that its not true anymore, so it can also be
>>> harmful.
>>
>> Agreed. That's part of the "bad SQL hygiene" concern I noted. Plan hints
>> should probably be ephemeral. But I think that's on the DBA to manage. In
>> production, hints should probably only be used as a workaround until a
>> proper optimizer fix is shipped. As with many features, there's an risk that
>> it may be used incorrectly.
>
> Seldom do we have DBAs manage specific queries. In so many cases, DBAs have
> so little control over what app devs are doing. This is why we have features
> such as workload management so that we can impose global limits (such as
> resource groups/queues). Expecting DBAs to have oversight over individual
> queries can be a tough sell.

Hmm, that's unfortunate.. Though again, I think same problem exists using a GUC
workaround. If not the DBA then who usually manages that?

Thanks a lot for all this useful feedback!

Thanks,
David

Chris Hajas

unread,
Oct 3, 2023, 1:55:25 PM10/3/23
to Greenplum Developers, David Kimura
This is awesome (though I'm admittedly biased).

I will say that over the years I've become more and more supportive of plan hints. In most cases, the optimizer is really good at selecting plans. But when there's a corner case, it usually involves rewriting the query or using a GUC to serve as a plan hint to sort of get the shape we want.

In the ideal world, plan hints wouldn't be necessary. The optimizer would have perfect cardinality estimates for correlated columns, for predicates and join conditions with multiple complex case statements or text operations, and all data types would support statistics. On the data side, ideally there wouldn't be data skew, or skew information would be held in statistics. On the algorithm side, P=NP and we could compute the optimal join order for 12+ way joins in a reasonable time. And on the optimizer side, sometimes the cost model just isn't accurate enough, but major changes here are a huge effort and can cause many other plans to get worse. And sometimes it is a bug/limitation in the optimizer, and it just takes time to fix that we don't want to rush through. These are all cases we've encountered, and in some cases we rewrite the query a bit, use a GUC if possible (or introduce one if it's not...), or use Orca/Planner to see if that works and try GUCs there to get the plan we want. Which is basically trying to serve the same purpose as plan hints, just being less explicit about it.

I don't want to get too in the weeds of why some of these problems aren't realistically going to be completely solved super soon, but I want to give that context for why, in my opinion, we need hints. Note that these issues aren't limited to Orca or Planner--they're pretty core optimizer problems that are difficult to solve. If I'm generalizing, I'd say that most "bad" plans fall in 3 categories:

1) Incorrect join estimates for deep join trees. We're usually pretty good at estimating the first join. But the higher in the tree we go, the error gets magnified. By the time we're 5-10+ joins up, it's quite difficult to estimate. We would need statistics that track correlation between columns (which is available via extended statistics, but makes analyze take longer), but we would also need more granular statistics to have all of this information. The more complex calculations we make to be more accurate, the longer we spend during optimization.

2) Poor join order. In many cases, this is partially due to (1). But also, we use a simplified cost model when calculating join order, since we need this to be fast. Additionally, there's many permutations of join order, and we need to quickly choose the best one, based on calculations or heuristics. For 10 joins, this is already in the millions and optimizers start heavily simplifying after this many joins just to generate a valid.

3) Choosing, or not choosing an index (or a non-optimal index). This is just a hard cost model problem. Different underlying storage, compression, and index types, different visibility, all influence the cost here, and getting this right all the time can be tricky

I like the idea of leveraging pg_hint_plan since it will hopefully solve the parsing problem, and we'll get hints for the Postgres-based planner for nearly free.

  • Chris


From: 'David Kimura' via Greenplum Developers <gpdb...@greenplum.org>
Sent: Friday, September 22, 2023 12:27 PM
To: Greenplum Developers <gpdb...@greenplum.org>
Subject: [Proposal] Plan Hints in Greenplum
 
!! External Email
[1] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPine.LNX.4.64.0910062354510.6801%2540sn.sai.msu.ru&data=05%7C01%7Cchajas%40vmware.com%7Ce80eb0fd89f54ea767c908dbbba1ec0a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638310076349779742%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=i0BUt7odSExnKS3%2F2O40Ts86YWo60u0nQDTjgM1JqZw%3D&reserved=0
[2] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2F44D87BB7.4070509%2540phlo.org&data=05%7C01%7Cchajas%40vmware.com%7Ce80eb0fd89f54ea767c908dbbba1ec0a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638310076349779742%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=JPJnFlSm9kx74qfdorJTYPiJxFzkLZsRNfZMkP0pofc%3D&reserved=0
[3] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2F20061012151439.GT28647%2540nasby.net&data=05%7C01%7Cchajas%40vmware.com%7Ce80eb0fd89f54ea767c908dbbba1ec0a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638310076349779742%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=UQ1JHNm5qOBMcA7R0yLuDTRj7whSHSS1U3AzqOvzCRM%3D&reserved=0
[4] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fossc-db%2Fpg_hint_plan&data=05%7C01%7Cchajas%40vmware.com%7Ce80eb0fd89f54ea767c908dbbba1ec0a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638310076349779742%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=9UGf63Zql4OTeV%2Bw5uvWWpuzKyxtYNrsHVpEusxI5%2Bg%3D&reserved=0
[5] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchrishajas%2Fgpdb%2Fblob%2Fhackday%2Fgpcontrib%2Fpg_hint_plan%2FREADME.md&data=05%7C01%7Cchajas%40vmware.com%7Ce80eb0fd89f54ea767c908dbbba1ec0a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638310076349779742%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=RelKrRME%2F5qKg0MsKbynA4V5xCnQ7bZ2ZB%2BT51y4BtI%3D&reserved=0



!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

David Kimura

unread,
Feb 5, 2024, 7:39:09 PMFeb 5
to gpdb...@greenplum.org
On Friday, September 22, 2023 at 12:27:11 PM David Kimura wrote:
> I would like to propose introducing plan hints in Greenplum.

I am starting to implement join order hints in ORCA. I think we have a couple
options for how:

1) In CXformExpandNAryJoinExpand*::Transform() and friends, filter outer
expression trees that do not satisfy the join order hints.

2) Before CXformExpandNAryJoinExpand*::Transform() and friends, update the
NAryJoin with the hints.

Let's walk through this example: A query joins tables A, B, and C; the hint
specifies to join A and B first.

In the first option, the join order algorithms may still enumerate over all the
possible join orders. In this case that would be (AxB)xC, (BxC)xA, and (AxC)xB.
But it would only add results that satisfy the hint: (AxB)xC.

In the second option, we fix-up the NAryJoin expression before passing it into
the join order algorithm.

Before fix-up:
+--CLogialNAryJoin
|--CLogicalGet "A"
|--CLogicalGet "B"
|--CLogicalGet "C"
+...

After fix-up:
+--CLogialNAryJoin
|--CLogicalInnerJoin
| |--CLogicalGet "A"
| |--CLogicalGet "B"
| +...
|--CLogicalGet "C"
+...

Then the join order xform has less decisions to make; only option is (AxB)xC.
Besides the potential optimization speedup, another advantage in the second
approach is that the logic can exist in one place (perhaps ORCA preprocessor).

IMO the first option is the safest because it merely filters out plans rather
than constructs nodes and relationships. The speedup provided by the second
option comes with the risk of incorrect plan bugs due to hint code. For this
reason, I am leaning in the direction of option one.


Does anybody have thoughts, questions, or concerns?

Thanks,
David

PS: ORCA scan hints were added in commit e781fe0db4 and a GPDB version of
pg_hint_plan was added in gpcontrib. It follows pg_hint_plan syntax.

--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
Reply all
Reply to author
Forward
0 new messages