VACUUM FREEZE record in pg_stat_last_operation and implication to pg

Huansong Fu

unread,

Jun 14, 2023, 3:40:29 PM6/14/23

to Greenplum Developers

Hi devs,

We recently found an interesting issue: during VACUUM FREEZE, for every table being

vacuumed & frozen, we would insert an unfrozen row into pg_stat_last_operaiton for

metadata tracking. In some sense that breaks the contract of VACUUM FREEZE (w/o a table name)

which is supposed to freeze everything in that database.

This caused an issue for a PR in which we tried to import system collations into template0

at gpinitsystem time. The issue arose in pg_upgrade after we replaced the target cluster’s

pg_xact directory with the one from source cluster, then we would have trouble checking

visibility of the unfrozen row in pg_stat_last_operation (below is from pg_upgrade log):

Deleting files from new pg_xact ok

Copying old pg_xact to new server ok

…

Setting frozenxid and minmxid counters in new cluster SQL command failed

VACUUM FREEZE

ERROR: could not access status of transaction 640

DETAIL: Could not open file "pg_xact/0000": No such file or directory.

The ERROR happened when we do VACUUM FREEZE for template0. But note that it could

happen even if we simply SELECT FROM pg_stat_last_operation in template0.

I think we have two questions to discuss here:

QUESTION #1:

Should we do anything to the pg_stat_last_operation record for VACUUM FREEZE? Some options:

a) We mark the row frozen as soon as we inserted it to pg_stat_last_operation. We have some ways to do

that - but before we talk about them, is that something worth doing? The contract of VACUUM FREEZE

is broken only if we VACUUM FREEZE the entire database. If we just VACUUM FREEZE individual tables,

there’s no such contract to respect (unless we VACUUM FREEZE pg_stat_last_operation itself).

b) Or, should we avoid metadata-tracking of VACUUM FREEZE for template0? That would solve our bug for sure.

But for that PR we are also thinking about VACUUM FREEZE template1 and postgres too to make everything

equivalent to initdb, so potentially we could have similar bugs in future for template1 and postgres too.

c) Or, should we avoid metadata-tracking of VACUUM FREEZE at all? That might be an over-kill but just for the

thought - is that information useful in real life?

d) Or, we do nothing here. For our bug, we just do not import collations to template0 in gpinitsystem. That’s a bit

unfortunate but I guess not too impactful because firstly rarely do users create database from template0,

and secondly, even if they do it would be just one additional call to pg_import_system_collations() anyway.

QUESTION #2:

Why are we even doing VACUUM FREEZE for template0 in pg_upgrade?

Basically, after we replaced pg_xact, VACUUM FREEZE cannot reliably do the job anymore: it cannot freeze

a row whose xid is missing in pg_xact. So we have to make sure all rows are already frozen before we run

VACUUM FREEZE (all the workarounds for QUESTION #1 are essentially for this purpose). But if we have

made sure of that, then why do we need to VACUUM FREEZE again? If freeze is a requirement, perhaps we

could perform VACUUM FREEZE before we replaced pg_xact?

Thoughts?

Thanks,

Huansong

Ashwin Agrawal

unread,

Jun 15, 2023, 6:56:58 AM6/15/23

to Huansong Fu, Greenplum Developers

On Thu, Jun 15, 2023 at 1:10 AM 'Huansong Fu' via Greenplum Developers <gpdb...@greenplum.org> wrote:

Hi devs,

We recently found an interesting issue: during VACUUM FREEZE, for every table being

vacuumed & frozen, we would insert an unfrozen row into pg_stat_last_operaiton for

metadata tracking. In some sense that breaks the contract of VACUUM FREEZE (w/o a table name)

which is supposed to freeze everything in that database.

This caused an issue for a PR in which we tried to import system collations into template0

at gpinitsystem time. The issue arose in pg_upgrade after we replaced the target cluster’s

pg_xact directory with the one from source cluster, then we would have trouble checking

visibility of the unfrozen row in pg_stat_last_operation (below is from pg_upgrade log):

Deleting files from new pg_xact ok

Copying old pg_xact to new server ok

…

Setting frozenxid and minmxid counters in new cluster SQL command failed

VACUUM FREEZE

ERROR: could not access status of transaction 640

DETAIL: Could not open file "pg_xact/0000": No such file or directory.

The ERROR happened when we do VACUUM FREEZE for template0. But note that it could

happen even if we simply SELECT FROM pg_stat_last_operation in template0.

Please can you share what was the datfrozenxid, age(datfrozenxid) for the template0 database and also

the relfrozenxid, age(relfrozenxid) of pg_stat_last_operation from pg_class after performing VACUUM FREEZE.

I believe these would be reflecting the values correctly on the reality and conveying the age of these have not dropped to 1.

(Understand there is implicit assumption being made by VACUUM FREEZE that all tuples are frozen,

though seems no check/validation exist to make sure that assumption is met where required.

Trying to think through what aspect conveys in PostgreSQL that all tuples in a relation or database

have been frozen (I believe age dropping to 1 is the reality). And with the situation stated about pg_stat_last_operation

this would never happen in GPDB, correct?

I think we have two questions to discuss here:

QUESTION #1:

Should we do anything to the pg_stat_last_operation record for VACUUM FREEZE? Some options:

a) We mark the row frozen as soon as we inserted it to pg_stat_last_operation. We have some ways to do

that - but before we talk about them, is that something worth doing? The contract of VACUUM FREEZE

is broken only if we VACUUM FREEZE the entire database. If we just VACUUM FREEZE individual tables,

there’s no such contract to respect (unless we VACUUM FREEZE pg_stat_last_operation itself).

b) Or, should we avoid metadata-tracking of VACUUM FREEZE for template0? That would solve our bug for sure.

But for that PR we are also thinking about VACUUM FREEZE template1 and postgres too to make everything

equivalent to initdb, so potentially we could have similar bugs in future for template1 and postgres too.

c) Or, should we avoid metadata-tracking of VACUUM FREEZE at all? That might be an over-kill but just for the

thought - is that information useful in real life?

d) Or, we do nothing here. For our bug, we just do not import collations to template0 in gpinitsystem. That’s a bit

unfortunate but I guess not too impactful because firstly rarely do users create database from template0,

and secondly, even if they do it would be just one additional call to pg_import_system_collations() anyway.

Still trying to gain context and understand the situation better before answering these questions.

QUESTION #2:

Why are we even doing VACUUM FREEZE for template0 in pg_upgrade?

Reasoning is instead of pg_upgrade assuming template0 is full in frozen state and directly going ahead to modify

relfrozenxid and such details for relations and database for template0. It's best for pg_upgrade to *not assume* and instead make

it happen to trust the condition. As if any problems later after upgrade is extremely hard to RCA (if similar to current bug gets

slipped to production where template0 is tempered with and not left fully frozen by any operation).

Please refer to commit from PG 94 STABLE https://github.com/greenplum-db/gpdb-postgres-merge/commit/04634d50ba62df936d818a58dead045d1daa2a6c

which will provide some details as the assumption was broken. I don't remember what was the reasoning

for pg_statictics tuples to be not fully frozen by initdb then (possible we had fixed something related to it too).

Basically, after we replaced pg_xact, VACUUM FREEZE cannot reliably do the job anymore: it cannot freeze

a row whose xid is missing in pg_xact. So we have to make sure all rows are already frozen before we run

VACUUM FREEZE (all the workarounds for QUESTION #1 are essentially for this purpose). But if we have

made sure of that, then why do we need to VACUUM FREEZE again? If freeze is a requirement, perhaps we

could perform VACUUM FREEZE before we replaced pg_xact?

I agree with the notion that VACUUM FREEZE needs to happen before replacing pg_xact. We missed that aspect

when placing FREEZE where its happening. Not strictly necessary to have VACUUM FREEZE as upstream doesn't

but then any and all operations on template0 need to make sure they leave template0 in full frozen state - which I am

not sure how we can validate.

Having VACUUM FREEZE for template0 in pg_upgrade helped to catch the current problem I guess, so proves its worth already -)

--

Ashwin Agrawal (VMware)

Ashwin Agrawal

unread,

Jun 15, 2023, 7:19:24 AM6/15/23

to Huansong Fu, Greenplum Developers

On Thu, Jun 15, 2023 at 4:26 PM Ashwin Agrawal <ashwi...@gmail.com> wrote:

Trying to think through what aspect conveys in PostgreSQL that all tuples in a relation or database
have been frozen (I believe age dropping to 1 is the reality). And with the situation stated about pg_stat_last_operation
this would never happen in GPDB, correct?

I tried PostgreSQL, after VACUUM FREEZE for template1 or postgres database age drops to 0.

Whereas I don't see that for GPDB (didn't dig into the details as to why).

PostgreSQL

==========

GPDB - age can increase after VACUUM FREEZE :-(

======

postgres=# select age(datfrozenxid),datname from pg_database where datname='template1';
age | datname
-----+-----------
24 | template1
(1 row)

postgres=# vacuum freeze;
VACUUM

postgres=# select age(datfrozenxid),datname from pg_database where datname='template1';
age | datname
-----+-----------
118 | template1
(1 row)

Why these relations continue to have age greater than 0. Is the problem not just with pg_stat_last_operation?

Or I am doing something wrong.

Side note:

Due to how the freezing is performed now via t_infomask and leaving the transaction

number intact in table makes it very hard to trace not frozen tuples. Need to use pageinpect and all such longer route tool.

Don't think some easy function exist to give such tuples easily.

--

Ashwin Agrawal (VMware)

Huansong Fu

unread,

Jun 15, 2023, 4:27:55 PM6/15/23

to Ashwin Agrawal, Greenplum Developers

Why these relations continue to have age greater than 0

Great finding! It seems GPDB's VACUUM advances XID for each table being vacuumed. Whereas PG’s VACUUM takes no XID (txid_current itself takes one):

PostgreSQL

===========

postgres=# select txid_current();

txid_current

--------------

759

(1 row)

postgres=# vacuum freeze;

VACUUM

postgres=# select txid_current();

txid_current

--------------

760

(1 row)

GPDB

============

postgres=# select txid_current();

txid_current

--------------

1275

(1 row)

postgres=# vacuum freeze;

VACUUM

postgres=# select txid_current();

txid_current

--------------

1367

(1 row)

Firstly, that explains the value of relfrozenxid/datfrozenxid:

* A table’s relfrozenxid is set to the current XID at the time it is being vacuumed. Because the current XID advances, relfrozenxid also advances for each table.

* A database’s datfrozenxid is set to the minimal relfrozenxid in the database.

* So after VACUUM FREEZE, datfrozenxid is set to the relfrozenxid of the first table being vacuumed and it would have an age of (number of table being vacuumed - 1).

Secondly, why does XID advance? It turns out that it is also related to pg_stat_last_operation: because we add/update the table we would request an XID.

Since no XID has been assigned in VACUUM, we would assign a new one. Then we commit the transaction for the current table being vacuumed which completes the XID.

Now, is it a bug? I’m not sure:

* Firstly, it is definitely not ideal that VACUUM takes as many XIDs as the number of tables. But it is expected that we need a standalone transaction for

each table being vacuumed — that aspect is the same as upstream. So as long as we keep metadata tracking of VACUUM of each table, this would be hard to

avoid. Maybe we can just keep one overall VACUUM record in pg_stat_last_operation? Or only update all the records in the last transaction? We can look into that separately.

* Secondly, I think those relfrozenxid/datfrozenxid values still reflect the reality of the table/database: there’s no unfrozen rows below relfrozenxid/datfrozenxid. I think

the key issue is still that we produced new unfrozen rows along with VACUUM FREEZE.

Please refer to commit from PG 94 STABLE https://github.com/greenplum-db/gpdb-postgres-merge/commit/04634d50ba62df936d818a58dead045d1daa2a6c

which will provide some details as the assumption was broken

Thanks for the link! It answers why VACUUM FREEZE is needed in pg_upgrade. And I agree we should keep such a safeguard (but move it prior to replacing pg_xact).

But I wonder why it is done only for template0 though? I guess maybe just template1 and postgres didn’t have the problem at that time. But maybe we should do the same

for template1 and postgres for maximum assurance?

Best regards,

Huansong

On Jun 15, 2023, at 7:19 AM, Ashwin Agrawal <ashwi...@gmail.com> wrote:

!! External Email

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

Ashwin Agrawal

unread,

Jun 21, 2023, 8:02:35 AM6/21/23

to Huansong Fu, Greenplum Developers

In the context of pure VACUUM FREEZE this is not a bug to produce unfrozen rows

(given its correctly reflecting the age for the relation and database). The bug is purely in context

of pg_upgrade and all such operations which blindly rely on the assumption that after VACUUM FREEZE

the age of the database will be 0 without actually checking for the same. Based on that assumption, upgrade

is replacing transaction files and such seems incorrect - as if age is not 0 means the transaction ID will appear

on disk and hence can't perform the operations it's performing.

So, if PostgreSQL code is written with assumption that VACUUM FREEZE with have age as 0 at end we might

have to meet the same in GPDB and that means we have to avoid tracking VACUUM FREEZE transaction ID

on disk (someway). Or if don't do the same generically at least have to do the same for pg_upgrade context -

means avoid tracking VACUUM FREEZE invoked by pg_upgrade time to avoid the problem.

In general though disliking VACUUM FREEZE unnecessarily consuming XIDs equivalent to the number of

tables in the database or schema or partition table for that matter. But information on VACUUM FREEZE was

performed on a table is critical for troubleshooting and has to be maintained. So, need to continue to think how

to accomplish the goal (I have not given to solution side yet)

Please refer to commit from PG 94 STABLE https://github.com/greenplum-db/gpdb-postgres-merge/commit/04634d50ba62df936d818a58dead045d1daa2a6c

which will provide some details as the assumption was broken

Thanks for the link! It answers why VACUUM FREEZE is needed in pg_upgrade. And I agree we should keep such a safeguard (but move it prior to replacing pg_xact).

But I wonder why it is done only for template0 though? I guess maybe just template1 and postgres didn’t have the problem at that time. But maybe we should do the same

for template1 and postgres for maximum assurance?

prepare_new_cluster() step of pg_upgrade runs vacuumdb --all --freeze so it ideally takes care of all the connectable databases,

There is a comment above it for template0 stating initdb creates on frozen rows so no need to include it.

--

Ashwin Agrawal (VMware)

Huansong Fu

unread,

Jun 23, 2023, 12:36:22 AM6/23/23

to Ashwin Agrawal, Greenplum Developers

In the context of pure VACUUM FREEZE this is not a bug to produce unfrozen rows

(given its correctly reflecting the age for the relation and database). The bug is purely in context

of pg_upgrade and all such operations which blindly rely on the assumption that after VACUUM FREEZE

the age of the database will be 0 without actually checking for the same. Based on that assumption, upgrade

is replacing transaction files and such seems incorrect - as if age is not 0 means the transaction ID will appear

on disk and hence can't perform the operations it's performing.

Just to get more clarify about table age vs unfrozen row: it seems to me that there’s a difference in the expectation

from VACUUM FREEZE regarding them - it seems that the age doesn’t matter, but the unfrozen row does. Imagine

after VACUUM FREEZE we consumed some more xid w/o adding more rows into the table (e.g. select xid_current()),

even in upstream that would make table age > 0. But that’s OK right? pg_upgrade does VACUUM FREEZE because

it wants to make sure every row is frozen, not every table has age=0.

So if it is hard to avoid increasing table age after VACUUM FREEZE, we still have a way to fix our issue by avoiding

unfrozen rows. Here’s a thought: if we can make sure pg_stat_last_operation is always the last one to freeze, then

we would freeze all the pg_stat_last_operations rows we just inserted during a VACUUM FREEZE. After that, we may

just skip the VACUUM meta-tracking of pg_stat_last_operation itself. Or, we may freeze pg_stat_last_operation again

and only skip tracking of the second freeze. That way we made sure everything is frozen. How does that sound?

Thanks,

Huansong

On Jun 21, 2023, at 8:02 AM, Ashwin Agrawal <ashwi...@gmail.com> wrote:

!! External Email

Ashwin Agrawal

unread,

Jun 26, 2023, 2:26:43 AM6/26/23

to Huansong Fu, Greenplum Developers

On Fri, Jun 23, 2023 at 10:06 AM Huansong Fu <fhua...@vmware.com> wrote:

In the context of pure VACUUM FREEZE this is not a bug to produce unfrozen rows

(given its correctly reflecting the age for the relation and database). The bug is purely in context

of pg_upgrade and all such operations which blindly rely on the assumption that after VACUUM FREEZE

the age of the database will be 0 without actually checking for the same. Based on that assumption, upgrade

is replacing transaction files and such seems incorrect - as if age is not 0 means the transaction ID will appear

on disk and hence can't perform the operations it's performing.

Just to get more clarify about table age vs unfrozen row: it seems to me that there’s a difference in the expectation
from VACUUM FREEZE regarding them - it seems that the age doesn’t matter, but the unfrozen row does. Imagine

after VACUUM FREEZE we consumed some more xid w/o adding more rows into the table (e.g. select xid_current()),

even in upstream that would make table age > 0. But that’s OK right? pg_upgrade does VACUUM FREEZE because

it wants to make sure every row is frozen, not every table has age=0.

Yes, the problem is due to the VACUUM FREEZE transaction ID appearing in the database table (unfrozen row).

I referred to the discussion more in age context as table age does define the oldest transaction ID appearing in

the table. PostgreSQL doesn't have a mechanism currently to reflect that transaction ID doesn't appear in the

table as still the table's relfrozenxid is updated with VACUUM FREEZE transaction ID. Ideally, upstream if transaction

ID doesn't appear in the table, so the table's age shouldn't increase by consuming transactions

(e.g. select xid_current()) but it does currently. (I understand that making future life easy for table upstream reflects

relfrozenxid this way - irrespective of, that ID appears in the table or not).

So if it is hard to avoid increasing table age after VACUUM FREEZE, we still have a way to fix our issue by avoiding

unfrozen rows. Here’s a thought: if we can make sure pg_stat_last_operation is always the last one to freeze, then

we would freeze all the pg_stat_last_operations rows we just inserted during a VACUUM FREEZE. After that, we may

just skip the VACUUM meta-tracking of pg_stat_last_operation itself. Or, we may freeze pg_stat_last_operation again

and only skip tracking of the second freeze. That way we made sure everything is frozen. How does that sound?

I would prefer the option to evaluate VACUUM FREEZEing pg_stat_last_operation at end during database wide vacuum

operation (even if it happens twice is fine) and not tracking its own VACUUM FREEZE to solve the current aspects

discussed in this thread.

--

Ashwin Agrawal (VMware)

Huansong Fu

unread,

Jun 29, 2023, 11:51:26 AM6/29/23

to Ashwin Agrawal, Greenplum Developers

I would prefer the option to evaluate VACUUM FREEZEing pg_stat_last_operation at end during database wide vacuum

operation (even if it happens twice is fine) and not tracking its own VACUUM FREEZE to solve the current aspects

discussed in this thread.

I will go down this path. Thanks Ashwin for the feedbacks!

Thanks,

Huansong

On Jun 26, 2023, at 2:26 AM, Ashwin Agrawal <ashwi...@gmail.com> wrote:

!! External Email

Reply all

Reply to author

Forward

VACUUM FREEZE record in pg_stat_last_operation and implication to pg_upgrade

Huansong Fu

Ashwin Agrawal

Ashwin Agrawal

Huansong Fu

Ashwin Agrawal

Huansong Fu

Ashwin Agrawal

Huansong Fu