Updates, READ COMMITTED, and append-only tables

75 views
Skip to first unread message

Heikki Linnakangas

unread,
Aug 24, 2017, 2:34:23 AM8/24/17
to Greenplum Developers, Daniel Gustafsson
Hi,

Coverity is pointing out an access to an uninitialized variable in
ExecUpdate. With some irrelevant details omitted, it looks like this:

TransactionId update_xmax = InvalidTransactionId;
ItemPointerData update_ctid;

if (rel_is_heap)
{
result = heap_update(resultRelationDesc, tupleid, tuple,
&update_ctid, &update_xmax, ...);
}
else if (rel_is_aorows)
{
result = appendonly_update(resultRelInfo->ri_updateDesc, tuple,
(AOTupleId *) tupleid, &aoTupleId);
}
else if (rel_is_aocols)
{
result = aocs_update(resultRelInfo->ri_updateDesc,
partslot, (AOTupleId *) tupleid, &aoTupleId);

}

switch (result)
{
case HeapTupleSelfUpdated:
/* already deleted by self; nothing to do */
return;

case HeapTupleMayBeUpdated:
break;

case HeapTupleUpdated:
if (!ItemPointerEquals(tupleid, &update_ctid))
{
TupleTableSlot *epqslot;

Assert(update_xmax != InvalidTransactionId);

epqslot = EvalPlanQual(estate,
resultRelInfo->ri_RangeTableIndex,
&update_ctid,
update_xmax);
if (!TupIsNull(epqslot))
{
*tupleid = update_ctid;
partslot = ExecFilterJunk(estate->es_junkFilter, epqslot);
tuple = ExecFetchSlotHeapTuple(partslot);
goto lreplace;
}
}
/* tuple already deleted; nothing to do */
return;

default:
elog(ERROR, "unrecognized heap_update status: %u", result);
return;
}

Coverity is pointing out, that if we're updating an appendonly or an
AOCS table, the update_ctid variable is not set, but it checked in the
"case HeapTupleUpdated" above. Moreover, if we reach that EvalPlanQual()
above, update_xmax will still be invalid, and we will trip the assertion.

So clearly this is wrong, but I wonder what we should do here? To recap,
this is the code that handles UPDATEs in READ COMMITTED mode. In a
normal heap table, if you try to UPDATE a row, but it's been
concurrently updated by another transaction, the system will follow the
ctid chain to find the latest version of the row, re-evaluate the plan
quals (i.e. the WHERE clause) against the latest version, and if it
still passes the quals, it will update the latest version instead.
That's in READ COMMITTED mode; in REPEATABLE READ / SERIALIZABLE mode,
you get a "could not serialize access due to concurrent update" error
instead.

But tuples in append-only tables don't have CTID pointers, so we cannot
do that. If you find that a tuple has been concurrently updated, there's
no way to find the latest version of the same row. So what should we do?
I can see two feasible options:

1. Skip the update. This is what we do for DELETEd tuples. In essence,
this would mean treating an UPDATEd tuple the same as an DELETE+INSERT.

2. Throw an error, like in SERIALIZABLE mode. This has the problem that
I don't think we can even tell an updated and a deleted tuple apart. So
we would have to throw an error even on deleted tuples. That's what
happens in a serializable transaction too.

I'm leaning towards option 1. But I wonder if we have this behavior
documented somewhere already? If not, this really needs more comments.

- Heikki

Pengzhou Tang

unread,
Aug 24, 2017, 2:51:42 AM8/24/17
to Heikki Linnakangas, Greenplum Developers, Daniel Gustafsson

But tuples in append-only tables don't have CTID pointers, so we cannot do that. If you find that a tuple has been concurrently updated, there's no way to find the latest version of the same row. So what should we do? I can see two feasible options:

1. Skip the update. This is what we do for DELETEd tuples. In essence, this would mean treating an UPDATEd tuple the same as an DELETE+INSERT.


 Are you meaning pg_aoseg.pg_aovisimap_<oid>? As README says, it works as an overlay, over the ao data and implements the MVCC visibility for deleted AO tables.

Heikki Linnakangas

unread,
Aug 24, 2017, 2:57:09 AM8/24/17
to Pengzhou Tang, Greenplum Developers, Daniel Gustafsson
I'm talking about updating a row, at a higher-level. Yeah, MVCC for AO
tables is implemented using the visimap, but that's a lower-level
implementation detail.

When you update a heap tuple with heap_update(), and the tuple has been
concurrently updated, it returns a CTID pointing to the latest tuple
version of the same tuple. The corresponding appendonly_update()
function doesn't return that.

- Heikki

Pengzhou Tang

unread,
Aug 24, 2017, 3:26:53 AM8/24/17
to Heikki Linnakangas, Greenplum Developers, Daniel Gustafsson


I'm talking about updating a row, at a higher-level. Yeah, MVCC for AO tables is implemented using the visimap, but that's a lower-level implementation detail.

When you update a heap tuple with heap_update(), and the tuple has been concurrently updated, it returns a CTID pointing to the latest tuple version of the same tuple. The corresponding appendonly_update() function doesn't return that.


Ok, it's about how to handle concurrently update on ao tables. Just a silly question, is it possible to follow the ctid chain of visimap to found the newest version and fetch the tuple it maps to re-execute the plan equal?


- Heikki

Heikki Linnakangas

unread,
Aug 24, 2017, 3:33:31 AM8/24/17
to Pengzhou Tang, Greenplum Developers, Daniel Gustafsson
No, unfortunately not. Following the CTID chain of the visimap tuple
would lead you to the transaction XID that performed the concurrent
update, but it wouldn't tell you which new tuple corresponds to which
old tuple.

- Heikki
Reply all
Reply to author
Forward
0 new messages