gpbackup: Utilize pg_export_snapshot to ensure consistent data visibility across parallel workers

Brent Doil

Feb 3, 2022, 4:15:48 PM
to Greenplum Developers
Hello everyone,

We would like some input on a proposed feature for gpbackup on 6X when using the jobs flag.

Problem Statement
If you specify a jobs value higher than 1, the database must be in a quiescent state at the very beginning, while the utility creates the individual connections, initializes their transaction snapshots, and acquires locks on the tables being backed up. If concurrent database operations are performed on those tables during the transaction snapshot initialization and table locking step, consistency between tables backed up by different parallel workers cannot be guaranteed.

To avoid a COPY deadlock scenario, parallel workers must also acquire an ACCESS SHARE lock on each table before attempting the COPY TO command. These locks are currently not released until the end of the backup. If a parallel worker is unable to get a lock on a table, the worker no longer has a valid distributed snapshot and is terminated.
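
For reference, the per-table flow on each parallel worker today looks roughly like the following plain-SQL sketch (the table name public.sales is a placeholder, and the utility's actual COPY invocation may differ):

  BEGIN;                                                -- one long-lived transaction per worker
  LOCK TABLE public.sales IN ACCESS SHARE MODE NOWAIT;  -- avoid the COPY deadlock scenario
  COPY public.sales TO STDOUT;                          -- worker streams the table data out
  -- locks are held and the transaction stays open until the end of the backup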

Proposal
Guarantee user data consistency by exporting a distributed snapshot that all parallel workers use to synchronize their view of the database.

Goals
  1. Consistency between tables that are backed up is guaranteed when running gpbackup with default parameters.
  2. Reuse parallel worker connections if an error is encountered backing up a table. 
  3. Reduce maximum concurrent locks held during gpbackup operations.

Solution 1. When the jobs flag is specified, establish jobs + 1 total connections. Use connection 0 to export the snapshot. Parallel workers begin and commit a transaction for each table.

connection 0:
  1. Sets and exports the transaction snapshot.
  2. Runs catalog queries required for setup.
  3. Acquires ACCESS SHARE locks on tables in the backup set.
connections 1..n:
  1. Worker gets a table from the set, begins a new transaction, imports the snapshot, attempts a table lock.
  2. If lock succeeds, copy the table out and commit the transaction.
  3. If lock fails, place table into a deferred queue and rollback the transaction.
  4. Repeat step 1.
Connection 0 scans the deferred queue. It already holds the locks, so it copies those tables out.
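
At the SQL level, the flow would look roughly like the sketch below. The upstream pg_export_snapshot() / SET TRANSACTION SNAPSHOT mechanism is shown for illustration; the distributed-snapshot variant we would actually use may differ, and public.sales and the snapshot id are placeholders.

  -- connection 0
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SELECT pg_export_snapshot();                          -- returns an id, e.g. '00000003-0000001B-1'
  LOCK TABLE public.sales IN ACCESS SHARE MODE;         -- repeated for every table in the backup set
  -- transaction stays open so the exported snapshot remains valid

  -- worker connection, repeated per table
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '00000003-0000001B-1';       -- must run before any query in the transaction
  LOCK TABLE public.sales IN ACCESS SHARE MODE NOWAIT;  -- on failure: ROLLBACK and defer the table
  COPY public.sales TO STDOUT;
  COMMIT;                                               -- lock released, connection reused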

Risks/Issues
  1. The connection that exported the snapshot must keep the transaction alive for the duration of the backup. This is no change from the current implementation.
    1. If a copy error occurs on the connection, the snapshot is no longer valid and the backup will fail.
  2. gpbackup implicitly establishes jobs + 1 connections to the database.
  3. Connection 0 must still acquire ACCESS SHARE locks for all tables in a single transaction.
Solution 2. Implement an optional flag that does not acquire locks up front.

Several users have run into the error [CRITICAL]:-ERROR: out of shared memory (SQLSTATE 53200) when attempting to gather the ACCESS SHARE locks. 

Provide users who can guarantee that no MVCC-unsafe statements (i.e. TRUNCATE TABLE or several ALTER TABLE variations) will be run with the ability to take a backup that does not acquire locks up front and instead attempts to ensure consistent data visibility on a best-effort basis.

A user who is aware of the downsides should have the ability to back up an arbitrarily large database without concern for running out of shared memory due to locking, or having to restart the cluster to enable a workaround such as increasing the GUC max_locks_per_transaction.
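
For context on the shared-memory limit: the shared lock table is sized by the upstream formula max_locks_per_transaction * (max_connections + max_prepared_transactions), so a single transaction holding one ACCESS SHARE lock per table can exhaust it on a large schema. A rough way to inspect this, and the restart-based workaround, is sketched below (the value 512 is only an example):

  SHOW max_locks_per_transaction;   -- lock-table slots contributed per allowed transaction
  SHOW max_connections;

  -- locks currently held by this (e.g. the metadata) connection
  SELECT count(*) FROM pg_locks WHERE pid = pg_backend_pid();

  -- current workaround, which requires a cluster restart:
  --   gpconfig -c max_locks_per_transaction -v 512
  --   gpstop -ar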

connection 0:
  1. Sets and exports the transaction snapshot.
  2. Runs catalog queries required for setup.
connections 1..n:
  1. Worker gets a table from the set, begins a new transaction, imports the snapshot, attempts a table lock.
  2. If lock succeeds, copy the table out and commit the transaction.
  3. If lock fails, log a warning and add the table to an errored-tables list.
  4. Repeat step 1.
cleanup:
  1. Output the list of errored tables, if any.
  2. Query pg_catalog.pg_stat_last_operation to determine whether MVCC-unsafe operations were run on tables in the backup set during the backup timeframe. If yes, output a warning with the list of tables.
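
The check in cleanup step 2 could be along these lines. This is only a sketch: the exact staactionname values that pg_stat_last_operation records for MVCC-unsafe operations would need to be verified, and backup_start is a placeholder for the timestamp taken when the backup began.

  SELECT c.relname, o.staactionname, o.stasubtype, o.statime
    FROM pg_catalog.pg_stat_last_operation o
    JOIN pg_catalog.pg_class c ON c.oid = o.objid
   WHERE o.classid = 'pg_catalog.pg_class'::regclass
     AND o.staactionname IN ('ALTER', 'TRUNCATE')
     AND o.statime >= :'backup_start'
   ORDER BY o.statime;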
Risks/Issues
  1. The connection that exported the snapshot must keep the transaction alive for the duration of the backup.
  2. A backup could become inconsistent, or table DDL could be modified, at any point if an MVCC-unsafe operation is run.
  3. Additional documentation and log output required.
One or both of these solutions could be implemented.

Thanks,
Brent

Ashwin Agrawal

Feb 3, 2022, 5:26:43 PM
to Brent Doil, Greenplum Developers
On Thu, Feb 3, 2022 at 1:15 PM Brent Doil <bd...@pivotal.io> wrote:
Hello everyone,

We would like some input on a proposed feature when using gpbackup on 6X when using the jobs flag.

I am guessing 6X and above. We should start designing with the master branch as a first class citizen.
Nice description. In summary, we just need to implement exactly the same semantics as what pg_dump --jobs does; the only difference is that PostgreSQL uses a single-node snapshot while we would be using distributed snapshot export and import. All other aspects of the work should be just the same. Refer to [1], which has a section explaining the functionality plus a warning to users on how backups can fail due to a concurrent exclusive-lock workload.

Please call out and let's discuss if we need to differ from upstream semantics in any way.

I don't feel we should be providing solution 2 at all. However much we document it, providing the option to get a corrupted/inconsistent backup is not a good thing. We should continue to explore and see how to keep optimizing solution 1.


-- 
Ashwin Agrawal (VMware)

Shivram Mani

Feb 3, 2022, 11:42:01 PM
to Ashwin Agrawal, Brent Doil, Greenplum Developers
Yes Solution 1 is something that would be a significant enhancement on the existing architecture.

Ashwin, with regards to the parallel version of pg_dump (with the --jobs flag): in the deadlock scenario where one of the workers is not able to obtain the lock (NOWAIT), upstream simply aborts the entire pg_dump operation.
On the other hand, with gpbackup we attempt to tolerate such cases by simply deferring the table to the main worker thread.

Brent, in the current design the main worker thread waits until all the worker threads have finished, and only then processes the deferred tables.
Do we still need to have this constraint, or can it concurrently process the deferred tables as well?

Regards
Shivram

Ashwin Agrawal

Feb 4, 2022, 12:00:42 PM
to Shivram Mani, Brent Doil, Greenplum Developers
On Thu, Feb 3, 2022 at 8:41 PM Shivram Mani <shiv...@vmware.com> wrote:
Ashwin, with regards to the parallel version of pg_dump (with the --jobs flag): in the deadlock scenario where one of the workers is not able to obtain the lock (NOWAIT), upstream simply aborts the entire pg_dump operation.
On the other hand, with gpbackup we attempt to tolerate such cases by simply deferring the table to the main worker thread.

That's a fantastic enhancement in gpbackup, love it.

Even if we may not contribute code for this enhancement to upstream pg_dump, if interested please go ahead and propose the enhancement idea. Slower but functional is better than a failed dump, as it's a long-running activity.

--
Ashwin Agrawal (VMware)

Brent Doil

Feb 4, 2022, 1:11:45 PM
to Shivram Mani, Ashwin Agrawal, Greenplum Developers
Brent, in the current design the main worker thread waits until all the worker threads have finished, and only then processes the deferred tables.
Do we still need to have this constraint, or can it concurrently process the deferred tables as well?
The proposed implementation uses worker 0 to concurrently process the deferred tables, which will further enhance performance.

Brent

