I would like to propose introducing plan hints in Greenplum. Plan hints is a
way to override the optimizer in order to coerce a plan shape by disregarding
its estimated cost compared to alternative plans. I think this feature has two
significant benefits:
1) It enables an immediate customer workaround for bugs in the optimizer when
optimizer generates an inferior plan.
2) It enables easy testing of plan performance which in turn can be used to
drive optimizer improvements.
In upstream, this feature has been proposed numerous times [1][2][3] in
different forms, but it has always been struck down. I think the main hurdle is
the fact that code is not free to write or maintain; and upstream prioritizes
fixing the optimizer bugs/limitations as opposed to implementing a workaround
solution. Another argument used against the feature is that, if done
incorrectly, it can promote bad SQL hygiene.
I think upstreams arguments are all valid. But in our case, I think that the
benefits may outweigh the risks. Postgres extension pg_hint_plan [4] has been
used in various other projects and has active community support from core
Postgres developers. Chris Hajas had explored introducing this into Greenplum
for Planner [5]. I propose we introduce a similar syntax and mechanism in
Greenplum for Orca.
In order to achieve that, I think the following problems need to be solved:
1) MPP hints:
One significant difference for Greenplum is its multi-host setup which
adds motions to shuffle data. So, we would need to extend the grammar to
express that. For joining tables the grammar already supports join order.
Example:
/*+
MergeJoin(t1 t2 t3)
Leading((t1 (t2 t3)))
*/
I propose to express motions with a new hint Motion.
Example:
/*+
Motion((none(t1) broadcast(redistribute(t2) any(t3))))
*/
2) Orca hint framework
I think all the hints can be implemented using the required/derived
property framework. I spent time to spike on that approach and so far it
seems quite promising.
3) Leverage existing code
Because pg_hint_plan is an extension, it needed its own parser (instead of
using gram.y). However, Orca is not an extension. Ideally, Orca would
reuse pg_hint_plan's grammar definition and structs without redefining
them somewhere else.
Any thoughts?
Does seem like a worthwhile feature? Does the syntax seems appropriate? Any
issues that I missed?
Thanks,
David
[1] https://www.postgresql.org/message-id/flat/Pine.LNX.4.64.0910062354510.6801%40sn.sai.msu.ru
[2] https://www.postgresql.org/message-id/flat/44D87BB7.4070509%40phlo.org
[3] https://www.postgresql.org/message-id/flat/20061012151439.GT28647%40nasby.net
[4] https://github.com/ossc-db/pg_hint_plan
[5] https://github.com/chrishajas/gpdb/blob/hackday/gpcontrib/pg_hint_plan/README.md
On Monday, September 25, 2023 2:27 PM Luis Macedo <luis0...@gmail.com> wrote:
> We already have GUCs that we can set to force change in the optimizer
> behavior.
The problem with GUCs is that often they are not granular enough. We have seen
multiple customer escalations where we didn't pick the right index; or we
picked a bad join order; or we performed a join on the coordinator instead of
the segments; or ...
And we've even added GUCs (e.g. optimizer_penalize_broadcast_threshold) to hack
around issues. Personally, I think this is even worse practice as it pollutes
our GUC-space with esoteric use cases for specific customer workloads.
> Also, when we set hints, if the data demography changes, the plan will be
> stuck to a particular hint that its not true anymore, so it can also be
> harmful.
Agreed. That's part of the "bad SQL hygiene" concern I noted. Plan hints should
probably be ephemeral. But I think that's on the DBA to manage. In production,
hints should probably only be used as a workaround until a proper optimizer
fix is shipped. As with many features, there's an risk that it may be used
incorrectly.
> Honestly I would invest that time in improving statistics collection,
> specially on multiple columns (eg: joins or filters on 2 or more columns)
> and propagating the stats on upper slices of the plan, which we currently
> don't do well and its a big source of bad plans.
>
> I would also invest on a query plan check tool that could provide us with
> the metadata of the plans so we have a collection of queries were we do not
> do well and we can work on making those run fast.
Noted. Thanks for that feedback.
Thanks,
David
On Tuesday, September 26, 2023 12:49 PM Soumyadeep Chakraborty <soumyad...@gmail.com> wrote:
> I am ambivalent about customers using plan hints in production, even as a
> workaround. A specific plan hint may seem attractive at one moment, but
> become stale quickly if the data changes
I feel like the same argument can be made against using GUCs to get out of a
plan regression. If we're okay with that workaround, then how are plan hints
different?
> Can you elaborate on some of the ways this can enhance testability of our
> code, CI etc?
One recent case is when the team was doing work to update index costing to account
for various combinations and permutations of index columns. The easiest way to
compare index performance is to force plans that pick each index and compare
result. For large data sets it became impractical to delete and create indexes
while debugging.
Another valuable use case is when debugging customer performance issues. If we
can coerce smaller queries then we can more quickly debug and isolate problems.
> Don't we also have to worry about distribution policies, locus, replicated vs
> distributed etc?
We don't. The hint writer does. If the hint conflicts, then the optimizer will
either error or fail to generate a plan. That's because hints don't force plans,
they just add "preference" to a subset of plans in the valid plan search space.
> Would the relstats estimates being sometimes whole-table vs divided by
> segment count (the ADJUST_BASESCAN() gymnastics) make things messy?
Possibly. I haven't deep dived into that yet.
> Also, there is the aspect of access method. Certain hints won't make sense
> for AO/CO tables.
Again, I think this shouldn't be an issue because hints don't force plans. If
the hint coerced a bad plan then that is probably a bug in the optimizer, not
the hinter.
> There are also a number of operators that are exclusive to GPDB like
> Dynamic*Scan and deviations from upstream (such as stream-bottom HashAgg)
> that we would have to gradually absorb.
Good point.
> Finally, we would have to augment our existing tools (minirepro) to capture
> plan hints used in queries, which may be annoying?
Yeah, it may be annoying. But not insurmountable. Those tools could probably
use a good refactoring anyways. ;)
>> 3) Leverage existing code
>>
>> Because pg_hint_plan is an extension, it needed its own parser (instead
>> of using gram.y). However, Orca is not an extension. Ideally, Orca
>> would reuse pg_hint_plan's grammar definition and structs without
>> redefining them somewhere else.
>
> Logistics wise, we could ship this as a core extension (like gp_toolkit), and
> ORCA can depend on it being installed.
Good to know. Thanks.
On Tuesday, September 26, 2023 12:49 PM Soumyadeep Chakraborty <soumyad...@gmail.com> wrote:
> On Mon, Sep 25, 2023 at 3:19PM 'David Kimura' via Greenplum Developers
>> On Monday, September 25, 2023 2:27 PM Luis Macedo <luis0...@gmail.com> wrote:
>>> We already have GUCs that we can set to force change in the optimizer
>>> behavior.
>>
>> The problem with GUCs is that often they are not granular enough. We have
>> seen multiple customer escalations where we didn't pick the right index; or
>> we picked a bad join order; or we performed a join on the coordinator
>> instead of the segments; or ...
>>
>> And we've even added GUCs (e.g. optimizer_penalize_broadcast_threshold) to
>> hack around issues. Personally, I think this is even worse practice as it
>> pollutes our GUC-space with esoteric use cases for specific customer
>> workloads.
>
> True, however we have a degree of control over what users can do with the
> GUC. With plan hints however, there is no degree of control.
What do you mean by "there is no degree of control"?
>>> Also, when we set hints, if the data demography changes, the plan will be
>>> stuck to a particular hint that its not true anymore, so it can also be
>>> harmful.
>>
>> Agreed. That's part of the "bad SQL hygiene" concern I noted. Plan hints
>> should probably be ephemeral. But I think that's on the DBA to manage. In
>> production, hints should probably only be used as a workaround until a
>> proper optimizer fix is shipped. As with many features, there's an risk that
>> it may be used incorrectly.
>
> Seldom do we have DBAs manage specific queries. In so many cases, DBAs have
> so little control over what app devs are doing. This is why we have features
> such as workload management so that we can impose global limits (such as
> resource groups/queues). Expecting DBAs to have oversight over individual
> queries can be a tough sell.
Hmm, that's unfortunate.. Though again, I think same problem exists using a GUC
workaround. If not the DBA then who usually manages that?
Thanks a lot for all this useful feedback!
Thanks,
David