VACUUM FREEZE record in pg_stat_last_operation and implication to pg_upgrade

56 views
Skip to first unread message

Huansong Fu

unread,
Jun 14, 2023, 3:40:29 PM6/14/23
to Greenplum Developers
Hi devs,

We recently found an interesting issue: during VACUUM FREEZE, for every table being 
vacuumed & frozen, we would insert an unfrozen row into pg_stat_last_operaiton for 
metadata tracking. In some sense that breaks the contract of VACUUM FREEZE (w/o a table name) 
which is supposed to freeze everything in that database.

This caused an issue for a PR in which we tried to import system collations into template0 
at gpinitsystem time. The issue arose in pg_upgrade after we replaced the target cluster’s 
pg_xact directory with the one from source cluster, then we would have trouble checking 
visibility of the unfrozen row in pg_stat_last_operation (below is from pg_upgrade log):

Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting frozenxid and minmxid counters in new cluster       SQL command failed
VACUUM FREEZE
ERROR:  could not access status of transaction 640
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.

The ERROR happened when we do VACUUM FREEZE for template0. But note that it could 
happen even if we simply SELECT FROM pg_stat_last_operation in template0.

I think we have two questions to discuss here:

QUESTION #1: 
Should we do anything to the pg_stat_last_operation record for VACUUM FREEZE? Some options:

a) We mark the row frozen as soon as we inserted it to pg_stat_last_operation. We have some ways to do 
that - but before we talk about them, is that something worth doing? The contract of VACUUM FREEZE 
is broken only if we VACUUM FREEZE the entire database. If we just VACUUM FREEZE individual tables, 
there’s no such contract to respect (unless we VACUUM FREEZE pg_stat_last_operation itself).

b) Or, should we avoid metadata-tracking of VACUUM FREEZE for template0? That would solve our bug for sure.
But for that PR we are also thinking about VACUUM FREEZE template1 and postgres too to make everything
equivalent to initdb, so potentially we could have similar bugs in future for template1 and postgres too. 

c) Or, should we avoid metadata-tracking of VACUUM FREEZE at all? That might be an over-kill but just for the
thought - is that information useful in real life?

d) Or, we do nothing here. For our bug, we just do not import collations to template0 in gpinitsystem. That’s a bit
unfortunate but I guess not too impactful because firstly rarely do users create database from template0, 
and secondly, even if they do it would be just one additional call to pg_import_system_collations() anyway.

QUESTION #2: 
Why are we even doing VACUUM FREEZE for template0 in pg_upgrade?

Basically, after we replaced pg_xact, VACUUM FREEZE cannot reliably do the job anymore: it cannot freeze 
a row whose xid is missing in pg_xact. So we have to make sure all rows are already frozen before we run 
VACUUM FREEZE (all the workarounds for QUESTION #1 are essentially for this purpose). But if we have 
made sure of that, then why do we need to VACUUM FREEZE again? If freeze is a requirement, perhaps we 
could perform VACUUM FREEZE before we replaced pg_xact?

Thoughts?

Thanks,
Huansong

Ashwin Agrawal

unread,
Jun 15, 2023, 6:56:58 AM6/15/23
to Huansong Fu, Greenplum Developers
On Thu, Jun 15, 2023 at 1:10 AM 'Huansong Fu' via Greenplum Developers <gpdb...@greenplum.org> wrote:
Hi devs,

We recently found an interesting issue: during VACUUM FREEZE, for every table being 
vacuumed & frozen, we would insert an unfrozen row into pg_stat_last_operaiton for 
metadata tracking. In some sense that breaks the contract of VACUUM FREEZE (w/o a table name) 
which is supposed to freeze everything in that database.

This caused an issue for a PR in which we tried to import system collations into template0 
at gpinitsystem time. The issue arose in pg_upgrade after we replaced the target cluster’s 
pg_xact directory with the one from source cluster, then we would have trouble checking 
visibility of the unfrozen row in pg_stat_last_operation (below is from pg_upgrade log):

Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting frozenxid and minmxid counters in new cluster       SQL command failed
VACUUM FREEZE
ERROR:  could not access status of transaction 640
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.

The ERROR happened when we do VACUUM FREEZE for template0. But note that it could 
happen even if we simply SELECT FROM pg_stat_last_operation in template0.

Please can you share what was the datfrozenxid, age(datfrozenxid) for the template0 database and also
the relfrozenxid, age(relfrozenxid) of pg_stat_last_operation from pg_class after performing VACUUM FREEZE.
I believe these would be reflecting the values correctly on the reality and conveying the age of these have not dropped to 1.
(Understand there is implicit assumption being made by VACUUM FREEZE that all tuples are frozen, 
though seems no check/validation exist to make sure that assumption is met where required.

Trying to think through what aspect conveys in PostgreSQL that all tuples in a relation or database
have been frozen (I believe age dropping to 1 is the reality). And with the situation stated about pg_stat_last_operation 
this would never happen in GPDB, correct?

I think we have two questions to discuss here:

QUESTION #1: 
Should we do anything to the pg_stat_last_operation record for VACUUM FREEZE? Some options:

a) We mark the row frozen as soon as we inserted it to pg_stat_last_operation. We have some ways to do 
that - but before we talk about them, is that something worth doing? The contract of VACUUM FREEZE 
is broken only if we VACUUM FREEZE the entire database. If we just VACUUM FREEZE individual tables, 
there’s no such contract to respect (unless we VACUUM FREEZE pg_stat_last_operation itself).

b) Or, should we avoid metadata-tracking of VACUUM FREEZE for template0? That would solve our bug for sure.
But for that PR we are also thinking about VACUUM FREEZE template1 and postgres too to make everything
equivalent to initdb, so potentially we could have similar bugs in future for template1 and postgres too. 

c) Or, should we avoid metadata-tracking of VACUUM FREEZE at all? That might be an over-kill but just for the
thought - is that information useful in real life?

d) Or, we do nothing here. For our bug, we just do not import collations to template0 in gpinitsystem. That’s a bit
unfortunate but I guess not too impactful because firstly rarely do users create database from template0, 
and secondly, even if they do it would be just one additional call to pg_import_system_collations() anyway.

Still trying to gain context and understand the situation better before answering these questions.
 

QUESTION #2: 
Why are we even doing VACUUM FREEZE for template0 in pg_upgrade?

Reasoning is instead of pg_upgrade assuming template0 is full in frozen state and directly going ahead to modify
relfrozenxid and such details for relations and database for template0. It's best for pg_upgrade to *not assume* and instead make 
it happen to trust the condition. As if any problems later after upgrade is extremely hard to RCA (if similar to current bug gets 
slipped to production where template0 is tempered with and not left fully frozen by any operation).

which will provide some details as the assumption was broken. I don't remember what was the reasoning 
for pg_statictics tuples to be not fully frozen by initdb then (possible we had fixed something related to it too).

Basically, after we replaced pg_xact, VACUUM FREEZE cannot reliably do the job anymore: it cannot freeze 
a row whose xid is missing in pg_xact. So we have to make sure all rows are already frozen before we run 
VACUUM FREEZE (all the workarounds for QUESTION #1 are essentially for this purpose). But if we have 
made sure of that, then why do we need to VACUUM FREEZE again? If freeze is a requirement, perhaps we 
could perform VACUUM FREEZE before we replaced pg_xact?

I agree with the notion that VACUUM FREEZE needs to happen before replacing pg_xact. We missed that aspect
when placing FREEZE where its happening. Not strictly necessary to have VACUUM FREEZE as upstream doesn't
but then any and all operations on template0 need to make sure they leave template0 in full frozen state - which I am
not sure how we can validate.

Having VACUUM FREEZE for template0 in pg_upgrade helped to catch the current problem I guess, so proves its worth already -)

--
Ashwin Agrawal (VMware)

Ashwin Agrawal

unread,
Jun 15, 2023, 7:19:24 AM6/15/23
to Huansong Fu, Greenplum Developers
On Thu, Jun 15, 2023 at 4:26 PM Ashwin Agrawal <ashwi...@gmail.com> wrote:
Trying to think through what aspect conveys in PostgreSQL that all tuples in a relation or database
have been frozen (I believe age dropping to 1 is the reality). And with the situation stated about pg_stat_last_operation 
this would never happen in GPDB, correct?

I tried PostgreSQL, after VACUUM FREEZE for template1 or postgres database age drops to 0.
Whereas I don't see that for GPDB (didn't dig into the details as to why).

PostgreSQL
==========
postgres=# select age(datfrozenxid),datname from pg_database;
 age |  datname  
-----+------------
  16 | postgres
  16 | direntrydb
  16 | template1
  16 | template0
(4 rows)

postgres=# vacuum freeze;
VACUUM
postgres=# select age(datfrozenxid),datname from pg_database;
 age |  datname  
-----+------------
   0 | postgres
  16 | direntrydb
  16 | template1
  16 | template0
(4 rows)

postgres=# \c template1
You are now connected to database "template1" as user "ashwin".
template1=# vacuum freeze;
VACUUM
template1=# select age(datfrozenxid),datname from pg_database;
 age |  datname  
-----+------------
   0 | postgres
  16 | direntrydb
   0 | template1
  16 | template0
(4 rows)


GPDB - age can increase after VACUUM FREEZE :-(
======
postgres=# select age(datfrozenxid),datname from pg_database where datname='template1';
 age |  datname  
-----+-----------
  24 | template1
(1 row)

postgres=# vacuum freeze;
VACUUM

postgres=# select age(datfrozenxid),datname from pg_database where datname='template1';
 age |  datname  
-----+-----------
 118 | template1
(1 row)

postgres=# select age(relfrozenxid), relname from pg_class where relfrozenxid != 0 and relisshared = 'f' order by 1 desc limit 30;
 age |         relname        
-----+-------------------------
  90 | pg_statistic
  90 | pg_toast_2619
  89 | pg_toast_3429
  89 | pg_statistic_ext_data
  88 | pg_toast_1418
  88 | pg_user_mapping
  86 | pg_toast_1247
  86 | pg_type
  85 | pg_attribute
  84 | pg_proc
  84 | pg_toast_1255
  83 | pg_class
  82 | pg_attrdef
  82 | pg_toast_2604
  81 | pg_constraint
  81 | pg_toast_2606
  80 | pg_inherits
  79 | pg_index
  78 | pg_operator
  77 | pg_opfamily
  76 | pg_opclass
  75 | pg_am
  74 | pg_amop
  73 | pg_amproc
  72 | pg_toast_2612
  72 | pg_language
  71 | pg_largeobject_metadata
  70 | pg_toast_2600
  70 | pg_aggregate
  69 | pg_toast_3381
(30 rows)

Why these relations continue to have age greater than 0. Is the problem not just with pg_stat_last_operation?
Or I am doing something wrong.


Side note:
Due to how the freezing is performed now via t_infomask and leaving the transaction 
number intact in table makes it very hard to trace not frozen tuples. Need to use pageinpect and all such longer route tool.
Don't think some easy function exist to give such tuples easily.

-- 
Ashwin Agrawal (VMware)

Huansong Fu

unread,
Jun 15, 2023, 4:27:55 PM6/15/23
to Ashwin Agrawal, Greenplum Developers
Why these relations continue to have age greater than 0

Great finding! It seems GPDB's VACUUM advances XID for each table being vacuumed. Whereas PG’s VACUUM takes no XID (txid_current itself takes one):

PostgreSQL
===========
postgres=# select txid_current();
 txid_current
--------------
          759
(1 row)

postgres=# vacuum freeze;
VACUUM
postgres=# select txid_current();
 txid_current
--------------
          760
(1 row)

GPDB
============
postgres=# select txid_current();
 txid_current
--------------
         1275
(1 row)

postgres=# vacuum freeze;
VACUUM
postgres=# select txid_current();
 txid_current
--------------
         1367
(1 row)

Firstly, that explains the value of relfrozenxid/datfrozenxid:
* A table’s relfrozenxid is set to the current XID at the time it is being vacuumed. Because the current XID advances, relfrozenxid also advances for each table.
* A database’s datfrozenxid is set to the minimal relfrozenxid in the database. 
* So after VACUUM FREEZE, datfrozenxid is set to the relfrozenxid of the first table being vacuumed and it would have an age of (number of table being vacuumed - 1).

Secondly, why does XID advance? It turns out that it is also related to pg_stat_last_operation: because we add/update the table we would request an XID. 
Since no XID has been assigned in VACUUM, we would assign a new one. Then we commit the transaction for the current table being vacuumed which completes the XID.

Now, is it a bug? I’m not sure:

* Firstly, it is definitely not ideal that VACUUM takes as many XIDs as the number of tables. But it is expected that we need a standalone transaction for
each table being vacuumed — that aspect is the same as upstream. So as long as we keep metadata tracking of VACUUM of each table, this would be hard to
avoid. Maybe we can just keep one overall VACUUM record in pg_stat_last_operation? Or only update all the records in the last transaction? We can look into that separately.

* Secondly, I think those relfrozenxid/datfrozenxid values still reflect the reality of the table/database: there’s no unfrozen rows below relfrozenxid/datfrozenxid. I think 
the key issue is still that we produced new unfrozen rows along with VACUUM FREEZE.

which will provide some details as the assumption was broken
Thanks for the link! It answers why VACUUM FREEZE is needed in pg_upgrade. And I agree we should keep such a safeguard (but move it prior to replacing pg_xact).
But I wonder why it is done only for template0 though? I guess maybe just template1 and postgres didn’t have the problem at that time. But maybe we should do the same 
for template1 and postgres for maximum assurance?

Best regards,
Huansong



On Jun 15, 2023, at 7:19 AM, Ashwin Agrawal <ashwi...@gmail.com> wrote:

!! External Email
!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

Ashwin Agrawal

unread,
Jun 21, 2023, 8:02:35 AM6/21/23
to Huansong Fu, Greenplum Developers
In the context of pure VACUUM FREEZE this is not a bug to produce unfrozen rows 
(given its correctly reflecting the age for the relation and database). The bug is purely in context
of pg_upgrade and all such operations which blindly rely on the assumption that after VACUUM FREEZE
the age of the database will be 0 without actually checking for the same. Based on that assumption, upgrade
is replacing transaction files and such seems incorrect - as if age is not 0 means the transaction ID will appear
on disk and hence can't perform the operations it's performing.

So, if PostgreSQL code is written with assumption that VACUUM FREEZE with have age as 0 at end we might
have to meet the same in GPDB and that means we have to avoid tracking VACUUM FREEZE transaction ID 
on disk (someway). Or if don't do the same generically at least have to do the same for pg_upgrade context -
means avoid tracking VACUUM FREEZE invoked by pg_upgrade time to avoid the problem.

In general though disliking VACUUM FREEZE unnecessarily consuming XIDs equivalent to the number of 
tables in the database or schema or partition table for that matter. But information on VACUUM FREEZE was
performed on a table is critical for troubleshooting and has to be maintained. So, need to continue to think how 
to accomplish the goal (I have not given to solution side yet)


which will provide some details as the assumption was broken
Thanks for the link! It answers why VACUUM FREEZE is needed in pg_upgrade. And I agree we should keep such a safeguard (but move it prior to replacing pg_xact).
But I wonder why it is done only for template0 though? I guess maybe just template1 and postgres didn’t have the problem at that time. But maybe we should do the same 
for template1 and postgres for maximum assurance?

prepare_new_cluster() step of pg_upgrade runs vacuumdb --all --freeze so it ideally takes care of all the connectable databases,
There is a comment above it for template0 stating initdb creates on frozen rows so no need to include it.

--
Ashwin Agrawal (VMware)

Huansong Fu

unread,
Jun 23, 2023, 12:36:22 AM6/23/23
to Ashwin Agrawal, Greenplum Developers
In the context of pure VACUUM FREEZE this is not a bug to produce unfrozen rows 
(given its correctly reflecting the age for the relation and database). The bug is purely in context
of pg_upgrade and all such operations which blindly rely on the assumption that after VACUUM FREEZE
the age of the database will be 0 without actually checking for the same. Based on that assumption, upgrade
is replacing transaction files and such seems incorrect - as if age is not 0 means the transaction ID will appear
on disk and hence can't perform the operations it's performing.

Just to get more clarify about table age vs unfrozen row: it seems to me that there’s a difference in the expectation 
from VACUUM FREEZE regarding them - it seems that the age doesn’t matter, but the unfrozen row does. Imagine 
after VACUUM FREEZE we consumed some more xid w/o adding more rows into the table (e.g. select xid_current()), 
even in upstream that would make table age > 0. But that’s OK right? pg_upgrade does VACUUM FREEZE because 
it wants to make sure every row is frozen, not every table has age=0. 

So if it is hard to avoid increasing table age after VACUUM FREEZE, we still have a way to fix our issue by avoiding 
unfrozen rows. Here’s a thought: if we can make sure pg_stat_last_operation is always the last one to freeze, then 
we would freeze all the pg_stat_last_operations rows we just inserted during a VACUUM FREEZE. After that, we may 
just skip the VACUUM meta-tracking of pg_stat_last_operation itself. Or, we may freeze pg_stat_last_operation again
and only skip tracking of the second freeze. That way we made sure everything is frozen. How does that sound?

Thanks,
Huansong


On Jun 21, 2023, at 8:02 AM, Ashwin Agrawal <ashwi...@gmail.com> wrote:

!! External Email

Ashwin Agrawal

unread,
Jun 26, 2023, 2:26:43 AM6/26/23
to Huansong Fu, Greenplum Developers
On Fri, Jun 23, 2023 at 10:06 AM Huansong Fu <fhua...@vmware.com> wrote:
In the context of pure VACUUM FREEZE this is not a bug to produce unfrozen rows 
(given its correctly reflecting the age for the relation and database). The bug is purely in context
of pg_upgrade and all such operations which blindly rely on the assumption that after VACUUM FREEZE
the age of the database will be 0 without actually checking for the same. Based on that assumption, upgrade
is replacing transaction files and such seems incorrect - as if age is not 0 means the transaction ID will appear
on disk and hence can't perform the operations it's performing.

Just to get more clarify about table age vs unfrozen row: it seems to me that there’s a difference in the expectation 
from VACUUM FREEZE regarding them - it seems that the age doesn’t matter, but the unfrozen row does. Imagine 
after VACUUM FREEZE we consumed some more xid w/o adding more rows into the table (e.g. select xid_current()), 
even in upstream that would make table age > 0. But that’s OK right? pg_upgrade does VACUUM FREEZE because 
it wants to make sure every row is frozen, not every table has age=0.

Yes, the problem is due to the VACUUM FREEZE transaction ID appearing in the database table (unfrozen row). 
I referred to the discussion more in age context as table age does define the oldest transaction ID appearing in 
the table. PostgreSQL doesn't have a mechanism currently to reflect that transaction ID doesn't appear in the 
table as still the table's relfrozenxid is updated with VACUUM FREEZE transaction ID. Ideally, upstream if transaction 
ID doesn't appear in the table, so the table's age shouldn't increase by consuming transactions 
(e.g. select xid_current()) but it does currently. (I understand that making future life easy for table upstream reflects 
relfrozenxid this way - irrespective of, that ID appears in the table or not).


So if it is hard to avoid increasing table age after VACUUM FREEZE, we still have a way to fix our issue by avoiding 
unfrozen rows. Here’s a thought: if we can make sure pg_stat_last_operation is always the last one to freeze, then 
we would freeze all the pg_stat_last_operations rows we just inserted during a VACUUM FREEZE. After that, we may 
just skip the VACUUM meta-tracking of pg_stat_last_operation itself. Or, we may freeze pg_stat_last_operation again
and only skip tracking of the second freeze. That way we made sure everything is frozen. How does that sound?

I would prefer the option to evaluate VACUUM FREEZEing pg_stat_last_operation at end during database wide vacuum 
operation (even if it happens twice is fine) and not tracking its own VACUUM FREEZE to solve the current aspects 
discussed in this thread.


--
Ashwin Agrawal (VMware)

Huansong Fu

unread,
Jun 29, 2023, 11:51:26 AM6/29/23
to Ashwin Agrawal, Greenplum Developers
I would prefer the option to evaluate VACUUM FREEZEing pg_stat_last_operation at end during database wide vacuum 
operation (even if it happens twice is fine) and not tracking its own VACUUM FREEZE to solve the current aspects 
discussed in this thread.

I will go down this path. Thanks Ashwin for the feedbacks!

Thanks,
Huansong

On Jun 26, 2023, at 2:26 AM, Ashwin Agrawal <ashwi...@gmail.com> wrote:

!! External Email
Reply all
Reply to author
Forward
0 new messages