Per tuple overhead, cmin, cmax

Manfred Koizar

May 2, 2002, 3:54:19 PM

Hi,

having been chased away from pgsql-novice by Rasmus Mohr, I come here
to try my luck :-) I'm still new to this, so please be patient if I
ask silly questions.

There has been a discussion recently about saving some bytes per tuple
header. I suspect we can eliminate 4 more bytes by using a single
field for t_cmin and t_cmax. Here are my thoughts supporting this
suspicion:

(1) The purpose of the command ids is to prevent a command from
working on the same tuple more than once. 2002-04-20 Tom Lane wrote
in "On-disk Tuple Size":
> You don't want an individual command's effects to become visible
> until you do CommandCounterIncrement.
and
> The command IDs aren't interesting anymore once the originating
> transaction is over, but I don't see a realistic way to recycle the
> space ...

(2) The command ids are meaningful (2a) only during the active
transaction (with the exception of the (ab)use of t_cmin by vacuum);
and (2b) we are only interested in whether t_cxxx is less than the
current command id; if it is, we don't care about the exact value.

(3) From a command's view a tuple can be in one of these states:

(3a) Neither t_xmin nor t_xmax is the current transaction. The tuple
has been neither inserted nor deleted (and thus not updated) by this
transaction, and the command ids are irrelevant.

(3b) t_xmin is the current transaction, t_xmax is
InvalidTransactionId; i.e. the tuple has been inserted by the current
transaction and it has not been deleted (or replaced). In this case
t_cmin identifies the command having inserted the tuple, and t_cmax is
irrelevant.

(3c) t_xmin is some other transaction, t_xmax is the current
transaction; i.e. the current transaction has deleted the tuple.
Then t_cmax identifies the command having deleted the tuple, t_cmin is
irrelevant.

(3d) t_xmin == t_xmax == current transaction. The tuple has been
inserted and then deleted by the current transaction. Then I claim
(but I'm not absolutely sure) that insert and delete cannot have
happened in the same command,
so t_cmin < t_cmax,
so t_cmin < CurrentCommandId,
so the exact value of t_cmin is irrelevant.

So at any moment at most one of the two fields t_cmin and t_cmax is
needed.

(4) If (3) is true, we can have a single field t_cnum in
HeapTupleHeaderData, which means t_cmax if t_xmax is the current
transaction, and t_cmin otherwise.
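
The case analysis in (3) and the rule in (4) can be sketched in C as
follows. This is only an illustration, not code from the PostgreSQL
sources: the enum labels and the functions classify_tuple() and
cnum_meaning() are hypothetical names, and the sketch assumes that a
tuple whose t_xmin is the current transaction and whose t_xmax is
anything other than the current transaction falls under (3b).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical labels for the four states (3a) to (3d). */
typedef enum
{
    STATE_3A_UNTOUCHED,            /* neither inserted nor deleted by us */
    STATE_3B_INSERTED,             /* inserted by us, not deleted by us */
    STATE_3C_DELETED,              /* inserted by someone else, deleted by us */
    STATE_3D_INSERTED_AND_DELETED  /* inserted and deleted by us */
} TupleState;

/* Classify a tuple from the point of view of transaction "current". */
static TupleState
classify_tuple(TransactionId xmin, TransactionId xmax, TransactionId current)
{
    assert(current != InvalidTransactionId);

    if (xmin == current && xmax == current)
        return STATE_3D_INSERTED_AND_DELETED;
    if (xmin == current)
        return STATE_3B_INSERTED;
    if (xmax == current)
        return STATE_3C_DELETED;
    return STATE_3A_UNTOUCHED;
}

/* Hypothetical: which command id the single field t_cnum would hold. */
typedef enum { CNUM_IS_CMIN, CNUM_IS_CMAX } CnumMeaning;

/* Rule (4): t_cnum means t_cmax if t_xmax is the current transaction. */
static CnumMeaning
cnum_meaning(TransactionId xmax, TransactionId current)
{
    return (xmax == current) ? CNUM_IS_CMAX : CNUM_IS_CMIN;
}
```

For example, with current transaction 7, a tuple with t_xmin = 5 and
t_xmax = 7 is in state (3c), and t_cnum would be read as t_cmax.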

t_cmin is used in:
. backend/access/common/heaptuple.c
. backend/access/heap/heapam.c
. backend/access/transam/xlogutils.c
. backend/commands/vacuum.c
. backend/utils/time/tqual.c

t_cmax is used in:
. backend/access/common/heaptuple.c
. backend/access/heap/heapam.c
. backend/utils/time/tqual.c

As far as I have checked these sources (including the abuse of t_cmin
by vacuum), my suggested change should be possible, but as I told you,
I'm new here, so I have the following questions:

(Q1) Is assumption (3d) true? Do you have any counter examples?

(Q2) Is there any possibility of t_cmax being set and t_cmin still
being needed? (Preferred answer: no :-)

(Q3) Are my thoughts WAL compatible?

(Q4) Is it really easy to change the size of HeapTupleHeaderData? Are
the data of this struct only accessed by field names or are there
dirty tricks using memcpy() and pointer arithmetic?

(Q5) Are these thoughts obsolete as soon as nested transactions are
considered?

Thank you for reading this long message.

Servus
Manfred

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Manfred Koizar

May 2, 2002, 8:45:38 PM

Tom,
thanks for answering.

On Thu, 02 May 2002 17:16:38 -0400, you wrote:
>The hole in this logic is that there can be multiple active scans with
>different values of CurrentCommandId (eg, within a function
>CurrentCommandId may be different than it is outside). If you overwrite
>cmin with cmax then you are destroying the information needed by a scan
>with smaller CurrentCommandId than yours.
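
The objection quoted above can be made concrete with a small sketch.
It assumes a simplified visibility rule for a tuple that the current
transaction both inserted and deleted: a scan sees the tuple if the
insert happened before the scan's command id and the delete did not.
The function names visible_to_scan() and merged_cnum() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

/*
 * Simplified rule for a tuple inserted by command cmin and deleted by
 * command cmax of the current transaction, as seen by a scan running
 * with command id scan_cid.
 */
static bool
visible_to_scan(CommandId cmin, CommandId cmax, CommandId scan_cid)
{
    assert(cmin < cmax);        /* assumption (3d) of the proposal */

    if (cmin >= scan_cid)
        return false;           /* inserted at or after the scan's command */
    return cmax >= scan_cid;    /* delete is not yet visible to the scan */
}

/* What a single shared field would hold once the delete overwrote cmin. */
static CommandId
merged_cnum(CommandId cmin, CommandId cmax)
{
    (void) cmin;                /* the insert's command id is lost */
    return cmax;
}
```

Take two tuples, one with (cmin, cmax) = (1, 5) and one with (4, 5),
and a scan still running with command id 3. The scan should see the
first tuple and not the second, yet both would store the same merged
value 5, so the scan could no longer tell them apart.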

Oh, I see :-(
Let me throw in one of my infamous wild ideas in an attempt to rescue
my proposal: We have 4 32-bit-numbers: xmin, cmin, xmax, and cmax.
The only case, where we need cmin *and* cmax, is, when xmin == xmax.
So if we find a single bit to flag this case, we only need 3
32-bit-numbers to store this information on disk.

To keep the code readable we probably would need some accessor
functions or macros to access these fields.

As of 7.2 there are three unused bits in t_infomask.
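
A minimal sketch of the flag-bit idea with accessor functions,
assuming one possible layout. Everything here is hypothetical: the
struct PackedIds, the flag HYP_XMIN_EQ_XMAX, and the choice of which
value goes into which slot are illustrations only and do not describe
HeapTupleHeaderData or any actual t_infomask bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;

#define InvalidTransactionId ((TransactionId) 0)
#define HYP_XMIN_EQ_XMAX 0x0001     /* hypothetical flag bit */

/* Three 32-bit slots plus a flag instead of four 32-bit fields. */
typedef struct
{
    uint32_t slot[3];
    uint16_t infomask;
} PackedIds;

static PackedIds
pack_ids(TransactionId xmin, CommandId cmin,
         TransactionId xmax, CommandId cmax)
{
    PackedIds p = {{0, 0, 0}, 0};

    assert(xmin != InvalidTransactionId);
    if (xmin == xmax)
    {
        /* Same transaction: one xid, both command ids. */
        p.infomask |= HYP_XMIN_EQ_XMAX;
        p.slot[0] = xmin;
        p.slot[1] = cmin;
        p.slot[2] = cmax;
    }
    else
    {
        /* Different transactions: both xids, one command id. */
        p.slot[0] = xmin;
        p.slot[1] = xmax;
        p.slot[2] = (xmax != InvalidTransactionId) ? cmax : cmin;
    }
    return p;
}

static bool
same_xact(const PackedIds *p)
{
    return (p->infomask & HYP_XMIN_EQ_XMAX) != 0;
}

static TransactionId
get_xmin(const PackedIds *p)
{
    return p->slot[0];
}

static TransactionId
get_xmax(const PackedIds *p)
{
    return same_xact(p) ? p->slot[0] : p->slot[1];
}

/* Returns true and sets *cmin if cmin is stored in this layout. */
static bool
get_cmin(const PackedIds *p, CommandId *cmin)
{
    if (same_xact(p))
        *cmin = p->slot[1];
    else if (p->slot[1] == InvalidTransactionId)
        *cmin = p->slot[2];
    else
        return false;           /* overwritten by cmax, assumed irrelevant */
    return true;
}

/* Returns true and sets *cmax if cmax is stored in this layout. */
static bool
get_cmax(const PackedIds *p, CommandId *cmax)
{
    if (!same_xact(p) && p->slot[1] == InvalidTransactionId)
        return false;           /* tuple not deleted, no cmax */
    *cmax = p->slot[2];
    return true;
}
```

The accessors hide the layout from callers, which is the point of the
remark about readability: code asks for xmin, xmax, cmin or cmax and
never touches the slots directly.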

>
>> (Q4) Is it really easy to change the size of HeapTupleHeaderData? Are
>> the data of this struct only accessed by field names or are there
>> dirty tricks using memcpy() and pointer arithmetic?
>

>AFAIK there are no dirty tricks there. I am hesitant to change the
>header layout without darn good reason, because it breaks any chance
>of having a working pg_upgrade process. But that's strictly a
>production-system concern, and need not discourage you from
>experimenting.

Is saving 4 bytes per tuple a "darn good reason"? Is a change
acceptable for 7.3? Do you think it's worth the effort?

> We haven't worked out exactly how nested transactions would
>work, but to the extent that they are handled as different CommandIds
>we'd have the same issue already mentioned: we should not assume that
>execution of different CommandIds can't overlap in time.

Assuming that a subtransaction is completely contained in the outer
transaction and there is no activity by the outer transaction while
the subtransaction is active, I believe this problem can be solved
...
It's late now, I'll try to think more clearly tomorrow.

Good night

Tom Lane

May 2, 2002, 9:40:42 PM

Manfred Koizar <mko...@aon.at> writes:
> Let me throw in one of my infamous wild ideas in an attempt to rescue
> my proposal: We have 4 32-bit-numbers: xmin, cmin, xmax, and cmax.
> The only case, where we need cmin *and* cmax, is, when xmin == xmax.
> So if we find a single bit to flag this case, we only need 3
> 32-bit-numbers to store this information on disk.

Hmm ... that might work. Actually, we are trying to stuff *five*
numbers into these fields: xmin, xmax, cmin, cmax, and a VACUUM FULL
transaction id (let's call it xvac just to have a name). The code
currently assumes that cmin is not interesting simultaneously with xvac.
I think it might be true that cmax is not interesting simultaneously
with xvac either, in which case this could be made to work. (Vadim,
your thoughts?)

> To keep the code readable we probably would need some accessor
> functions or macros to access these fields.

Amen. But that would be cleaner than now, at least for VACUUM;
it's just using cmin where it means xvac.

> Is saving 4 bytes per tuple a "darn good reason"? Is a change
> acceptable for 7.3? Do you think it's worth the effort?

I'm on the fence about it. My thoughts are probably colored by the
fact that I prefer platforms that have MAXALIGN=8, so half the time
(including all null-free rows) there'd be no savings at all. Now if
we could get rid of 8 bytes in the header, I'd get excited ;-)

Any other opinions out there?

regards, tom lane

PS: I did like your point about BITMAPLEN; I think that might be
a free savings. I was waiting for you to bring it up on hackers
before commenting though...


Manfred Koizar

May 3, 2002, 3:51:01 AM

On Thu, 02 May 2002 21:10:40 -0400, Tom Lane wrote:

>Hmm ... that might work. Actually, we are trying to stuff *five*
>numbers into these fields: xmin, xmax, cmin, cmax, and a VACUUM FULL
>transaction id (let's call it xvac just to have a name). The code
>currently assumes that cmin is not interesting simultaneously with xvac.
>I think it might be true that cmax is not interesting simultaneously
>with xvac either, in which case this could be made to work. (Vadim,
>your thoughts?)

Having read the sources recently I'm pretty sure you're right.

>I'm on the fence about it. My thoughts are probably colored by the
>fact that I prefer platforms that have MAXALIGN=8, so half the time
>(including all null-free rows) there'd be no savings at all.

But the other half of the time we'd save 8 bytes. So on average we
get savings of 4 bytes per tuple, don't we?
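
The alignment arithmetic behind this can be checked with a small
sketch. It assumes the header length is rounded up to a multiple of
MAXALIGN and that unaligned header lengths are spread evenly over all
residues, which is a simplification and not a measured distribution.
The functions align_up(), aligned_savings() and total_savings() are
hypothetical helpers, not PostgreSQL macros.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align (a power of two). */
static size_t
align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Bytes actually saved after alignment when "removed" bytes are dropped. */
static size_t
aligned_savings(size_t header_len, size_t removed, size_t align)
{
    assert(header_len >= removed);
    return align_up(header_len, align)
        - align_up(header_len - removed, align);
}

/* Sum of the savings over "align" consecutive header lengths from base. */
static size_t
total_savings(size_t base, size_t removed, size_t align)
{
    size_t sum = 0;

    for (size_t i = 0; i < align; i++)
        sum += aligned_savings(base + i, removed, align);
    return sum;
}
```

With MAXALIGN=8 and 4 bytes removed, each header length saves either
0 or 8 bytes, and the total over eight consecutive lengths is 32, an
average of 4 bytes. With MAXALIGN=4, every length saves exactly 4
bytes.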

> Now if
>we could get rid of 8 bytes in the header, I'd get excited ;-)

I keep trying :-)

Servus
Manfred

