XMIN semantic at peril ?

Karsten Hilbert

unread,

Oct 11, 2007, 9:26:32 AM10/11/07

to

Hi all,

we use XMIN in our application (http://wiki.gnumed.de) as a
lightweight database-provided "data-changed" marker in a
cross-transaction optimistic locking scheme. The one
characteristic of XMIN we rely on in this scheme is that
"XMIN changes when a (writing) transaction commits
successfully". We don't care *how* it changes, just *that*
it does so.

Now, one kind soul has reviewed our model and suggested that
we really shouldn't rely on XMIN at the end of the day for
various fairly good reasons:

- XMIN is an implementation detail of PostgreSQL (similar to row OIDs)
- GNUmed shouldn't rely on PostgreSQL implementation details

I argued that

- GNUmed only relies on a very general assumption about XMIN behaviour
- PG is likely to always need a way to mark tuples committed one
way or another as that is part of tuple visibility (it needn't be
or be named XMIN, though)
- PG is unlikely (given its track record) of obscuring
visibility of that commit marker

Surely, good counter-arguments were brought forth:

- row OIDs did disappear (or, rather, were defaulted to off)
- it is uncertain whether tuple visibility semantics of PG
will in the future continue to lend themselves to our
cross-tx collision detection/optimistic locking scheme
- particularly in the light of 8.3 HOT patches
- more so in the light of the "Skytools extended transaction ID module"

Of course, we do know how to alleviate the situation
technically by just defining our own marker column and
adding appropriate triggers to the relevant tables to update
said marker. Which, in a perfect world of unlimited
resources, we'd just go ahead and do. In the real world,
however, I'd like to assess the likelihood of our need to
embark on such a mission. So, eventually, in the near- to
mid-future:

How likely is XMIN (or equivalent) to become invisible to
SQL level user space ?

How likely is XMIN (or equivalent) to NOT change on each
successful (write) transaction commit anymore ?

Thanks for any insight,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Tom Lane

unread,

Oct 11, 2007, 10:44:17 AM10/11/07

to

Karsten Hilbert <Karsten...@gmx.net> writes:
> How likely is XMIN (or equivalent) to become invisible to
> SQL level user space ?

No one has suggested this. I suppose the argument could be made that
the system columns are an unwarranted intrusion on users' column
namespace, but we'd probably handle that by demoting them to
second-class citizens, not hiding them entirely --- there are far too
many apps that rely on ctid, for instance, and I think some that are
doing like you do with xmin. So as long as you don't create a user
column named xmin in your tables, you could expect to access the system
column.

> How likely is XMIN (or equivalent) to NOT change on each
> successful (write) transaction commit anymore ?

No chance of that, unless we abandon MVCC for something else, which
again seems highly unlikely.

One question I'd have though is whether "freezing" of old tuples is
likely to confuse your app. That process might get more aggressive
in the future (it already is more aggressive in 8.2 than before,
depending on where vacuum_freeze_min_age is set).

The only argument you cited that seems impressive to me is the one
about it being a Postgres-ism. Are you willing to have GNUmed tied
tightly to Postgres?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Karsten Hilbert

unread,

Oct 11, 2007, 11:12:23 AM10/11/07

to

On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote:

> The only argument you cited that seems impressive to me is the one
> about it being a Postgres-ism. Are you willing to have GNUmed tied
> tightly to Postgres?

Personally, yes, no problem and thanks for it :-))

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------

Filip Rembiałkowski

unread,

Oct 11, 2007, 11:18:30 AM10/11/07

to

2007/10/11, Tom Lane <t...@sss.pgh.pa.us>:

> The only argument you cited that seems impressive to me is the one
> about it being a Postgres-ism. Are you willing to have GNUmed tied
> tightly to Postgres?

Well, at least not in all aspects, if I may cut in.

(from http://www.gnumed.org/)
error
insert into WebLog values(586,31,"2007-10-12","80.177.203.190")
Can't open file: 'WebLog.MYD'. (errno: 145)

SCNR

Filip

Karsten Hilbert

unread,

Oct 11, 2007, 11:30:55 AM10/11/07

to

On Thu, Oct 11, 2007 at 04:18:30PM +0100, Filip Rembiałkowski wrote:
> Subject: Re: [GENERAL] XMIN semantic at peril ?

>
> 2007/10/11, Tom Lane <t...@sss.pgh.pa.us>:
> > The only argument you cited that seems impressive to me is the one
> > about it being a Postgres-ism. Are you willing to have GNUmed tied
> > tightly to Postgres?
>
> Well, at least not in all aspects, if I may cut in.
>
> (from http://www.gnumed.org/)
> error
> insert into WebLog values(586,31,"2007-10-12","80.177.203.190")
> Can't open file: 'WebLog.MYD'. (errno: 145)

Well, that's just a tool we use and got nothing to do with
GNUmed proper. We don't have control over the domain since
the current holder doesn't seem willing to hand it over (not
that that's much of a problem, we do have edit rights to
that homegrown mysql based wiki).

This is where all the action is, documentation-wise:

http://wiki.gnumed.de

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Karsten Hilbert

unread,

Oct 11, 2007, 11:41:34 AM10/11/07

to

On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote:

> > How likely is XMIN (or equivalent) to NOT change on each
> > successful (write) transaction commit anymore ?
>
> No chance of that, unless we abandon MVCC for something else, which
> again seems highly unlikely.

I figured as much but I'm just a loney MD.

> One question I'd have though is whether "freezing" of old tuples is
> likely to confuse your app. That process might get more aggressive
> in the future (it already is more aggressive in 8.2 than before,
> depending on where vacuum_freeze_min_age is set).

Well, what we do is this:

- read row including XMIN
- do some UI stuff without open transactions
- update row with "... where pk = ... and XMIN = old_xmin_from_read"

If in the meantime another writer changed the data we
originally read we would detect that by xmin having changed
hence no row to be updated. So, yes, there is a *tiny*
failure condition:

- read row including XMIN
- do UI stuff w/o open transaction
- wait a LONG time in which
- old_xmin_from_read gets frozen
- and recycled on the *very same* row
- attempt to write initial data with ... WHERE XMIN = old_xmin_from_initial_read

In those very circumstances we would not be able to detect
the collision. Now, how likely is that given sane values for
vacuum_freeze_min_age and average ambulatory care patient
encounter times of 5-20 minutes (which would in GNUmed be
equivalent to cross-tx medical record open times - IOW the
time our XMIN based optimistic locks would normally be
held).

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Tom Lane

unread,

Oct 11, 2007, 12:03:47 PM10/11/07

to

Karsten Hilbert <Karsten...@gmx.net> writes:
> On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote:
>> One question I'd have though is whether "freezing" of old tuples is
>> likely to confuse your app.

> Well, what we do is this:

> - read row including XMIN
> - do some UI stuff without open transactions
> - update row with "... where pk = ... and XMIN = old_xmin_from_read"

> If in the meantime another writer changed the data we
> originally read we would detect that by xmin having changed
> hence no row to be updated. So, yes, there is a *tiny*
> failure condition:

Hmm. I think the failure condition is not what you are thinking: in
your example, you'd correctly conclude that some other transaction
modified the row. The problem case is

- read (a rather old) row including XMIN
- VACUUM comes along and decides to set XMIN = FrozenTransactionId

- update row with "... where pk = ... and XMIN = old_xmin_from_read"

- update fails, when there is no need to fail

As long as the failure is "soft", ie, you recover reasonably, this
shouldn't be a big problem. But it's certainly not a scenario you
should dismiss as not credible because of timescales.

Erik Jones

unread,

Oct 11, 2007, 12:47:37 PM10/11/07

to

On Oct 11, 2007, at 11:03 AM, Tom Lane wrote:

> Karsten Hilbert <Karsten...@gmx.net> writes:
>> On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote:
>>> One question I'd have though is whether "freezing" of old tuples is
>>> likely to confuse your app.
>
>> Well, what we do is this:
>
>> - read row including XMIN
>> - do some UI stuff without open transactions
>> - update row with "... where pk = ... and XMIN = old_xmin_from_read"
>
>> If in the meantime another writer changed the data we
>> originally read we would detect that by xmin having changed
>> hence no row to be updated. So, yes, there is a *tiny*
>> failure condition:
>
> Hmm. I think the failure condition is not what you are thinking: in
> your example, you'd correctly conclude that some other transaction
> modified the row. The problem case is
>
> - read (a rather old) row including XMIN
> - VACUUM comes along and decides to set XMIN = FrozenTransactionId
> - update row with "... where pk = ... and XMIN = old_xmin_from_read"
> - update fails, when there is no need to fail
>
> As long as the failure is "soft", ie, you recover reasonably, this
> shouldn't be a big problem. But it's certainly not a scenario you
> should dismiss as not credible because of timescales.

If the query is always based on a primary key + XMIN, and since
vacuum is the only thing that sets FrozenTransactionId, would it be
unsane to change the update to

- update row with "... where pk=... and XMIN IN (old_xmin_from_read,
FrozenTransactionId)

?

Erik Jones

Software Developer | Emma®
er...@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Tom Lane

unread,

Oct 11, 2007, 1:48:10 PM10/11/07

to

Erik Jones <er...@myemma.com> writes:
> If the query is always based on a primary key + XMIN, and since
> vacuum is the only thing that sets FrozenTransactionId, would it be
> unsane to change the update to

> - update row with "... where pk=... and XMIN IN (old_xmin_from_read,
> FrozenTransactionId)

I wouldn't risk it, because that *would* fail in the case Karsten
mentioned of an update sequence taking so long that a concurrent
update got frozen.

regards, tom lane

---------------------------(end of broadcast)---------------------------

Karsten Hilbert

unread,

Oct 11, 2007, 2:07:13 PM10/11/07

to

On Thu, Oct 11, 2007 at 01:48:10PM -0400, Tom Lane wrote:

> > If the query is always based on a primary key + XMIN,

It is.

> > and since
> > vacuum is the only thing that sets FrozenTransactionId, would it be
> > unsane to change the update to
>
> > - update row with "... where pk=... and XMIN IN (old_xmin_from_read,
> > FrozenTransactionId)
>
> I wouldn't risk it, because that *would* fail in the case Karsten
> mentioned of an update sequence taking so long that a concurrent
> update got frozen.

Which is the failure case we should be more worried about:
"not detecting a valid collision" rather than "detecting a
non-collision" as could happen in the scenario Tom pointed
out. False-Negative vs. False-Positive, that is to say ;-)

I was unable to find any reference to FrozenTransationId in
the 8.2 documentation (found it in the source, though).
Hence one would assume its value cannot be derived (at the
SQL level) at runtime and hence it needed to be hardcoded in
our Python code. Which, I dare say, would mean relying on
even more obscure implementation details which was part of
the argument that triggered the whole process of thought.

Ne'ertheless, thanks for the suggestion. I am gaining a much
better understanding of the factors involved.

Karsten Hilbert

unread,

Oct 18, 2007, 5:20:12 PM10/18/07

to

On Thu, Oct 11, 2007 at 12:03:47PM -0400, Tom Lane wrote:

> > If in the meantime another writer changed the data we
> > originally read we would detect that by xmin having changed
> > hence no row to be updated. So, yes, there is a *tiny*
> > failure condition:
>
> Hmm. I think the failure condition is not what you are thinking: in
> your example, you'd correctly conclude that some other transaction
> modified the row.

Not really, notice:

- original XMIN read, transaction is closed
- a loooong time passes:
- original row gets frozen, XMIN changed to FrozenTransactionId
- XMIN wraps
- original XMIN is *reused* on the *exact same* original row by
another concurrent writer without changing the primary key
- the original row is written back to the database
with "... where xmin = original_xmin and pk = original_pk ..."

Now the update succeeds, although the data DID change !

This is a worst-case failure but should be *very* rare.

> The problem case is
>
> - read (a rather old) row including XMIN
> - VACUUM comes along and decides to set XMIN = FrozenTransactionId
> - update row with "... where pk = ... and XMIN = old_xmin_from_read"
> - update fails, when there is no need to fail
>
> As long as the failure is "soft", ie, you recover reasonably, this
> shouldn't be a big problem. But it's certainly not a scenario you
> should dismiss as not credible because of timescales.

Very true. I documented this in our code and set up a TODO
item to switch to a AFTER-trigger updated non-system oplock
column.

Thanks,

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------