On Clearing Reader Gang's CatCache


Zhenghua Lyu

Aug 30, 2022, 7:07:17 AM
to gpdb...@greenplum.org, David Kimura
Hi,
    recently the test case `qp_union_intersect_optimizer` has been failing frequently.


The failure
    
    ======================================================================
    DIFF FILE: ../gpdb_src/src/test/regress/regression.diffs
    ----------------------------------------------------------------------

    diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/qp_union_intersect_optimizer.out                    
    /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/qp_union_intersect.out
    --- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/qp_union_intersect_optimizer.out     2022-08-30 03:14:24.132309958 +0000
    +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/qp_union_intersect.out    2022-08-30 03:14:24.192313753 +0000
    @@ -1699,8 +1699,7 @@
    (1 row)
 
    UPDATE dml_union_s SET b = (SELECT NULL UNION SELECT NULL)::numeric;
  -DETAIL:  Failing row contains (#####).
  -ERROR:  null value in column "b" violates not-null constraint
  +ERROR:  expected partdefid 134511, but got 0 (partdesc.c:194)
  --SELECT COUNT(DISTINCT(b)) FROM dml_union_s;
  --SELECT DISTINCT(b) FROM dml_union_s;
  -- @description union_update_test30: Negative Tests  more than one row returned by a sub-query used as an expression


  Thanks to @David Kimura's investigation, we know the root cause; let me paste more details here and clarify the problems.
  David has provided a very simple reproduce script, and my discussion is based on the following script (tested on the master branch,
  with top commit 58cb8cf432).


The simple reproduce steps

create table dml_union_s (a int not null, b numeric default 10.00) distributed by (a) partition by range(b);
create table dml_union_s_1_prt_2 partition of dml_union_s for values from (1) to (1001);
create table dml_union_s_1_prt_def partition of dml_union_s default;

insert into dml_union_s select generate_series(1,10), generate_series(1,10);

begin;
drop table dml_union_s_1_prt_def;
-- this SQL needs to run under ORCA to reproduce the issue
select count(distinct(b)) from dml_union_s;
rollback;

-- this SQL falls back to the planner now
update dml_union_s set a = (select null union select null)::numeric;

Why it fails


The root of the bug is that the reader gang's stale catcache is never invalidated.

  1. An explicit transaction is started with begin;.
  2. The first SQL modifies the catalog; in the above example, it drops the default partition.
     Note that this statement is dispatched only to the writer gang.
  3. Then comes an MPP query, which creates several gangs. The reader gang of
     `select count(distinct(b)) from dml_union_s;` does a dynamic scan based on the root
     partition (an ORCA feature), so it builds and caches the partition table's catcache
     entries. The call chain is:
     ExecInitDynamicSeqScan --> ExecOpenScanRelation --> ExecGetRangeTableRelation -->
     table_open --> relation_open --> RelationIdGetRelation --> RelationBuildDesc -->
     RelationBuildPartitionKey
     NOTE:​ we are still inside the begin; block, neither committed nor aborted, so the above
     steps will not call get_default_partition_oid during RelationBuildPartitionDesc.
  4. Then the transaction aborts. By design, abort invalidates the cache only in its own
     process; see the comments in function `AtEOXact_Inval`.
  5. So the reader gang's cached version still thinks this partitioned table has no default
     leaf partition.
  6. Then a new SQL comes in; when it calls `get_default_partition_oid` it gets 0​ because of
     the stale cache built in step 3, and errors out. The call chain for this update in the
     reader gang's scan is:
     InitPlan --> InitResultRelInfo --> RelationGetPartitionQual --> generate_partition_qual --> relation_open ...
For this very specific issue, several points are worth noting:
  1. only ORCA generates dynamic scan nodes that execute in the reader gang
  2. we call InitResultRelInfo even in the reader gang; at first glance, the reader gang seems not to need to do so

Will it impact more?

     The first question that came to my mind after understanding this issue is: will the stale catcache in the reader gang
cause other problems?

     Here is what I have tried:
  1. in a begin; block, drop a column of a table
  2. then run a multi-slice SELECT, so that each reader gang caches a version of the relation
  3. abort the transaction
  4. query the table again
It works with no error. The relation cache entries are cleaned up in function `AtEOXact_RelationCache` (Greenplum
patches this function a bit).

    I tried to go through the syscache and have not found another bad case yet.

How to fix?

David provides the following fix:

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cbb23a6ba1..bb8e23bf7c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -3553,7 +3553,7 @@ AbortTransaction(void)
                AtEOXact_ComboCid_Dsm_Detach();
                AtEOXact_Buffers(false);
                AtEOXact_RelationCache(false);
-               AtEOXact_Inval(false);
+               AtEOXact_Inval(true);
                AtEOXact_MultiXact();
 
                ResourceOwnerRelease(TopTransactionResourceOwner,

The fix seems reasonable to me, even though Postgres says an abort does not need to notify other
backends. In Greenplum, we have MPP sessions, and the reader and writer gangs behave sort of like
a single backend; on abort, we should try to notify the reader gang. And:

  1. correctness is more important than performance
  2. most transactions commit rather than abort
  3. most SQL in Greenplum is OLAP workload
So we should not worry much about the above change.


Any thoughts? Thanks so much!


Best,
Zhenghua Lyu

Soumyadeep Chakraborty

Sep 6, 2022, 6:25:12 PM
to Zhenghua Lyu, gpdb...@greenplum.org, David Kimura
> The fix seem reasonable to me, even Postgres says for abort it does not need to notify other
> backend. In Greenplum, we have MPP session, and reader and writer behaves sort of like
> a single backend, when abort, we should try to notify reader gang. And:

> 1. correctness is more important than performance
> 2. most of the transactions should commit than abort
> 3. most of SQLs in greenplum are OLAP workloads
>
> So we should not worry much about the above change.

I didn't spend enough time on this for an in-depth look but here are my initial
feelings.

Imagine a scenario where we have a malformed ETL/write job that keeps retrying
and failing (e.g. due to a column missing from the csv), running concurrently
with intensive read workloads having many slices. This is where there will be
significant overhead. The higher the number of backends, the more often we will
spend CPU cycles in SIGetDataEntries() (which also grabs a contended
LWLock), which is called by every backend, and very frequently.

I don't have a better solution in mind, but we should perf test this change. If
the dynamic scan is the only thing affected, maybe we can solve for that
specifically instead of making this broad a change.

Regards,
Soumyadeep (VMware)

David Kimura

Sep 7, 2022, 12:51:12 PM
to Soumyadeep Chakraborty, Zhenghua Lyu, gpdb...@greenplum.org
On Tuesday, September 6, 2022 3:24 PM Soumyadeep Chakraborty <soumyad...@gmail.com> wrote:
> Imagine a scenario where we have a malformed ETL/write job that keeps retrying
> and failing (for eg due to a column missing from csv), running concurrently
> with intensive read workloads having many slices.

I see your point about the poor-performance scenario. But it seems to me that
that particular example is a user error: malformed data was provided to the
workflow. If that is true, is it really worth optimizing for that particular
scenario?

Thanks,
David

David Kimura

Sep 7, 2022, 12:55:29 PM
to Soumyadeep Chakraborty, Zhenghua Lyu, David Kimura, gpdb...@greenplum.org

Ah, but a malicious user could probably affect other users' queries. That seems
like a legit issue...

Thanks,
David

Soumyadeep Chakraborty

Sep 7, 2022, 1:04:12 PM
to David Kimura, Zhenghua Lyu, gpdb...@greenplum.org
> > > Imagine a scenario where we have a malformed ETL/write job that keeps retrying
> > > and failing (for eg due to a column missing from csv), running concurrently
> > > with intensive read workloads having many slices.
> >
> > I see your point of a poor performance scenario. But, it seems to me like that
> > particular example is a user error by providing malformed data into the
> > workflow. If that is true, then is it really worth optimizing for that
> > particular scenario?
>
> Ah, but a malicious user could probably affect other users' queries. That seems
> like a legit issue...

Yeah and such syntax ERRORs are common, not to mention the ERROR could be some
internal ERROR outside the control of the user.

Regards,
Soumyadeep (VMware)

Zhenghua Lyu

Sep 7, 2022, 6:57:40 PM
to Soumyadeep Chakraborty, gpdb...@greenplum.org, David Kimura
Hi, 

> Imagine a scenario where we have a malformed ETL/write job that keeps retrying
> and failing (for eg due to a column missing from csv), running concurrently
> with intensive read workloads having many slices. This is where there will be
> significant overhead. Higher the number of backends, more often will we be
> spending CPU cycles in SIGetDataEntries() (which also grabs a contentious
> LWLock), which is called by every backend and very frequently.

This is no worse than commit, since a commit will always invalidate the catcache.
And normally, in a begin;...end block, there will not be many DDLs.



Ashwin Agrawal

Sep 8, 2022, 12:51:53 PM
to Zhenghua Lyu, gpdb...@greenplum.org, David Kimura
Is it possible to restrict the invalidation in abort cases to only when a reader gang exists for the session (if that is easy to check)?


--
Ashwin Agrawal (VMware)

Zhenghua Lyu

Sep 26, 2022, 10:34:54 AM
to Ashwin Agrawal, gpdb...@greenplum.org, David Kimura
> Is it possible to restrict the invalidation in abort cases to only when a reader gang exists for the session (if that is easy to check)?

Hi Ashwin,

This happens on the segments; all QEs under the same postmaster are equal, and it seems we do not have a light
and easy flag to check whether this postmaster has another backend with Gp_iswrite=True​.

But I think the reader gang should not do such a change; maybe we add some condition to only do this for:
  • QD (to clear up the entry DB)
  • writer gang
Ideas?


Soumyadeep Chakraborty

Oct 24, 2022, 5:34:16 PM
to Zhenghua Lyu, Ashwin Agrawal, gpdb...@greenplum.org, David Kimura
>> Imagine a scenario where we have a malformed ETL/write job that keeps retrying
>> and failing (for eg due to a column missing from csv), running concurrently
>> with intensive read workloads having many slices. This is where there will be
>> significant overhead. Higher the number of backends, more often will we be
>> spending CPU cycles in SIGetDataEntries() (which also grabs a contentious
>> LWLock), which is called by every backend and very frequently.

> This is not worse than commit, since commit will always invalidate the catcache.
> And normally, in begin;end block, there will not be many DDLs.

I see, yeah if there is no DDL, then we will hit this early exit case in
AtEOXact_Inval():

/* Quick exit if no messages */
if (transInvalInfo == NULL)
return;

So the impact shouldn't be felt at all most of the time.

The only DDL I can think of that might consistently appear inside a transaction
block is create|drop temp table.

All that considered, I don't feel that bad about the change.

> But I think ReaderGang should not do such change, maybe we add some condition to only do this for:
> QD (to clear up entry DB)
> writer gang
> Ideas?

For a reader gang, I don't think we will ever have any invalidation messages to
send (as you mentioned they can't make catalog changes no?). Also, we don't need
to do this in utility mode. So maybe it should look like this:

if (Gp_role == GP_ROLE_DISPATCH ||
    (Gp_role == GP_ROLE_EXECUTE && MyProc->mppIsWriter))
    AtEOXact_Inval(true);
else
{
    /*
     * Reader gangs and utility mode sessions don't need to trigger
     * additional cache invalidation.
     */
    AtEOXact_Inval(false);
}

Regards,
Soumyadeep (VMware)