Idea for nested transactions / savepoints

Bruce Momjian

Aug 5, 2001, 12:41:14 AM
I have been thinking about how to implement nested transactions /
savepoints. As you may remember, Vadim wants to add UNDO to WAL and
thus enable this feature.

Some objected because of the added WAL complexity and the problem with
long running transactions requiring lots of WAL segments.

I have not been able to come up with any solution that doesn't have some
UNDO capability to mark aborted tuples of the current transaction.

My idea is that we not put UNDO information into WAL but keep a List of
rel ids / tuple ids in the memory of each backend and do the undo inside
the backend. We could go around and clear our transaction id from
tuples that need to be undone.

Basically, I am suggesting a per-backend UNDO segment. This seems to
enable nested transactions without the disadvantages of putting it in
WAL.
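
As a rough illustration of the per-backend list described above, here is a minimal sketch. It is not PostgreSQL code: the names Backend, HeapTuple, begin_nested and abort_nested are hypothetical, the heap is a plain dictionary standing in for on-disk tuples, and recording the kind of operation next to each rel id / tuple id is an added assumption so the undo step knows which field to clear.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

TupleId = Tuple[int, int]  # (rel id, tuple id); hypothetical encoding


@dataclass
class HeapTuple:
    xmin: Optional[int] = None  # xid of the inserting transaction
    xmax: Optional[int] = None  # xid of the deleting transaction, if any


@dataclass
class Backend:
    xid: int                        # xid shared by main and nested work
    heap: Dict[TupleId, HeapTuple]  # stand-in for on-disk tuples
    undo_list: List[Tuple[str, TupleId]] = field(default_factory=list)
    in_nested: bool = False

    def begin_nested(self) -> None:
        self.in_nested = True
        self.undo_list.clear()

    def insert(self, tid: TupleId) -> None:
        self.heap[tid] = HeapTuple(xmin=self.xid)
        if self.in_nested:  # only nested work needs an undo entry
            self.undo_list.append(("insert", tid))

    def delete(self, tid: TupleId) -> None:
        self.heap[tid].xmax = self.xid
        if self.in_nested:
            self.undo_list.append(("delete", tid))

    def abort_nested(self) -> None:
        # Walk the list backwards and clear our xid from each touched tuple.
        for op, tid in reversed(self.undo_list):
            tup = self.heap[tid]
            if op == "insert":
                tup.xmin = None  # the insert is treated as never having happened
            else:
                tup.xmax = None  # the delete is treated as never having happened
        self._end_nested()

    def commit_nested(self) -> None:
        # Nothing to undo; the main transaction now owns the changes.
        self._end_nested()

    def _end_nested(self) -> None:
        self.undo_list.clear()
        self.in_nested = False
```

Work done by the main transaction before begin_nested() is never recorded, so aborting the nested part leaves it alone; only a crash or a full abort removes it, through the normal mechanism.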

Am I missing something about why UNDO should be in WAL?

I realize UNDO in WAL would allow UNDO of any transaction, but we don't
need that in our current non-overwriting system. It is only nested
transactions we need to undo, and I don't think we need WAL writing for
that because we are always undoing something before we commit the main
transaction. In crash recovery, the entire transaction is aborted
anyway.

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Tom Lane

Aug 5, 2001, 10:40:47 AM
Bruce Momjian <pg...@candle.pha.pa.us> writes:
> My idea is that we not put UNDO information into WAL but keep a List of
> rel ids / tuple ids in the memory of each backend and do the undo inside
> the backend.

The complaints about WAL size amount to "we don't have the disk space
to keep track of this, for long-running transactions". If it doesn't
fit on disk, how likely is it that it will fit in memory?

regards, tom lane

Bruce Momjian

Aug 5, 2001, 2:37:09 PM
> Bruce Momjian <pg...@candle.pha.pa.us> writes:
> > My idea is that we not put UNDO information into WAL but keep a List of
> > rel ids / tuple ids in the memory of each backend and do the undo inside
> > the backend.
>
> The complaints about WAL size amount to "we don't have the disk space
> to keep track of this, for long-running transactions". If it doesn't
> fit on disk, how likely is it that it will fit in memory?

Sure, we can put it on disk if that is better. I thought the problem
with WAL undo is that you have to keep UNDO info around for all
transactions that are older than the earliest transaction. So, if I
start a nested transaction, and then sit at a prompt for 8 hours, all
WAL logs are kept for 8 hours.

We can create a WAL file for every backend, and record just the nested
transaction information. In fact, once a nested transaction finishes,
we don't need the info anymore. Certainly we don't need to flush these
to disk.

Tom Lane

Aug 5, 2001, 2:50:48 PM
Bruce Momjian <pg...@candle.pha.pa.us> writes:
>> The complaints about WAL size amount to "we don't have the disk space
>> to keep track of this, for long-running transactions". If it doesn't
>> fit on disk, how likely is it that it will fit in memory?

> Sure, we can put it on disk if that is better.

I think you missed my point. Unless something can be done to make the
log info a lot smaller than it is now, keeping it all around until
transaction end is just not pleasant. Waving your hands and saying
that we'll keep it in a different place doesn't affect the fundamental
problem: if the transaction runs a long time, the log is too darn big.

There probably are things we can do --- for example, I bet an UNDO
log kept in this way wouldn't need to include page images. But it's
that sort of consideration that will make or break UNDO, not where
we store the info.

regards, tom lane

Bruce Momjian

Aug 5, 2001, 3:45:01 PM
> Bruce Momjian <pg...@candle.pha.pa.us> writes:
> >> The complaints about WAL size amount to "we don't have the disk space
> >> to keep track of this, for long-running transactions". If it doesn't
> >> fit on disk, how likely is it that it will fit in memory?
>
> > Sure, we can put it on disk if that is better.
>
> I think you missed my point. Unless something can be done to make the
> log info a lot smaller than it is now, keeping it all around until
> transaction end is just not pleasant. Waving your hands and saying
> that we'll keep it in a different place doesn't affect the fundamental
> problem: if the transaction runs a long time, the log is too darn big.

When you said long running, I thought you were concerned about long
running in duration, not large transactions. A long duration in a one-WAL
setup would cause all transaction logs to be kept. Large transactions
are another issue.

One solution may be to store just the relid if many tuples are modified
in the same table. If you stored the command counter for start/end of
the nested transaction, it would be possible to sequentially scan the
table and undo all the affected tuples. Does that help? Again, I am
just throwing out ideas here, hoping something will catch.
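
To make the relid-plus-command-counter idea above concrete, here is a minimal sketch under stated assumptions. It is not PostgreSQL code: HeapTuple and undo_by_scan are hypothetical names, the table is a plain list, and each tuple is assumed to carry the command counter of the insert (cmin) and of the delete (cmax) alongside the transaction ids.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeapTuple:
    xmin: Optional[int]         # xid of the inserting transaction
    cmin: int                   # command counter of the insert
    xmax: Optional[int] = None  # xid of the deleting transaction, if any
    cmax: Optional[int] = None  # command counter of the delete, if any


def undo_by_scan(table: List[HeapTuple], xid: int,
                 start_cid: int, end_cid: int) -> int:
    """Sequentially scan one table and undo every change our transaction
    made between start_cid and end_cid (inclusive). Returns the number of
    tuples that were touched."""
    undone = 0
    for tup in table:
        changed = False
        # An insert made inside the nested transaction: clear our xid.
        if tup.xmin == xid and start_cid <= tup.cmin <= end_cid:
            tup.xmin = None
            changed = True
        # A delete made inside the nested transaction: clear our xid.
        if (tup.xmax == xid and tup.cmax is not None
                and start_cid <= tup.cmax <= end_cid):
            tup.xmax = None
            tup.cmax = None
            changed = True
        if changed:
            undone += 1
    return undone
```

The undo record shrinks to one (relid, start, end) triple per table no matter how many tuples were modified. The price is a full scan of the table at abort time.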

Hannu Krosing

Aug 5, 2001, 4:57:46 PM
Tom Lane wrote:
>
> Bruce Momjian <pg...@candle.pha.pa.us> writes:
> >> The complaints about WAL size amount to "we don't have the disk space
> >> to keep track of this, for long-running transactions". If it doesn't
> >> fit on disk, how likely is it that it will fit in memory?
>
> > Sure, we can put it on disk if that is better.
>
> I think you missed my point. Unless something can be done to make the
> log info a lot smaller than it is now, keeping it all around until
> transaction end is just not pleasant. Waving your hands and saying
> that we'll keep it in a different place doesn't affect the fundamental
> problem: if the transaction runs a long time, the log is too darn big.

Keeping it in a different place does have other benefits: you can
discard each subtransaction's info after it is committed/aborted,
regardless of what the WAL log does. So the chap who did a "begin
transaction" 8 hours ago does not get his subtransactions kept as well,
which postpones the problem a lot.

> There probably are things we can do --- for example, I bet an UNDO
> log kept in this way wouldn't need to include page images.

Not keeping something that does not need to be kept is always a good
idea when preserving space is important.

> But it's that sort of consideration that will make or break UNDO,
> not where we store the info.

But "how long do we need to keep the info" _is_ an important consideration.

--------------
Hannu

Bruce Momjian

Aug 5, 2001, 9:23:33 PM
> > Bruce Momjian <pg...@candle.pha.pa.us> writes:
> > >> The complaints about WAL size amount to "we don't have the disk space
> > >> to keep track of this, for long-running transactions". If it doesn't
> > >> fit on disk, how likely is it that it will fit in memory?
> >
> > > Sure, we can put it on disk if that is better.
> >
> > I think you missed my point. Unless something can be done to make the
> > log info a lot smaller than it is now, keeping it all around until
> > transaction end is just not pleasant. Waving your hands and saying
> > that we'll keep it in a different place doesn't affect the fundamental
> > problem: if the transaction runs a long time, the log is too darn big.
>
> When you said long running, I thought you were concerned about long
> running in duration, not large transactions. A long duration in a one-WAL
> setup would cause all transaction logs to be kept. Large transactions
> are another issue.
>
> One solution may be to store just the relid if many tuples are modified
> in the same table. If you stored the command counter for start/end of
> the nested transaction, it would be possible to sequentially scan the
> table and undo all the affected tuples. Does that help? Again, I am
> just throwing out ideas here, hoping something will catch.

Actually, we need to keep around nested transaction UNDO information
only until the nested transaction exits to the main transaction:

BEGIN WORK;
    BEGIN WORK;
    COMMIT;
    -- we can throw away the UNDO here
    BEGIN WORK;
        BEGIN WORK;
        ...
        COMMIT;
    COMMIT;
    -- we can throw away the UNDO here
COMMIT;

We are using the outside transaction for our ACID capabilities, and just
using UNDO for nested transaction capability.
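
The retention rule in the example above can be sketched as a small depth counter. This is only an illustration: UndoTracker and its methods are hypothetical names, and the undo entries are opaque placeholders rather than real tuple references.

```python
from typing import List


class UndoTracker:
    """Keeps UNDO entries only while we are inside a nested transaction."""

    def __init__(self) -> None:
        self.depth = 0             # 0 = idle, 1 = main transaction, 2+ = nested
        self.undo: List[str] = []  # opaque undo entries (placeholders)

    def begin(self) -> None:
        self.depth += 1

    def record(self, entry: str) -> None:
        # Work done directly in the main transaction needs no UNDO entry;
        # an abort or crash is handled by the outside transaction.
        if self.depth > 1:
            self.undo.append(entry)

    def commit(self) -> None:
        if self.depth == 0:
            raise RuntimeError("no transaction in progress")
        self.depth -= 1
        # Back in the main transaction (or fully committed): nothing can
        # ask us to undo the nested work any more, so throw it away.
        if self.depth <= 1:
            self.undo.clear()
```

An inner COMMIT at depth 3 does not discard anything, because the enclosing nested transaction at depth 2 could still abort and would then need to undo the inner work too.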
