
[BUGS] BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby


pe...@petrovich.kiev.ua

Dec 9, 2013, 8:47:34 AM

The following bug has been logged on the website:

Bug reference: 8673
Logged by: Serge Negodyuck
Email address: pe...@petrovich.kiev.ua
PostgreSQL version: 9.3.2
Operating system: Linux 3.2.0-57-generic amd64. Ubuntu 12.04.3
Description:

I'm using PostgreSQL 9.3.0 with WAL streaming replication, i.e.:
postgresql.conf:
hot_standby = on


recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=master port=5432 user=postgres password=password'


I've installed a new slave database on the 6th of December. Since there were no
packages for PostgreSQL 9.3.0 on apt.postgresql.org, I've set up PostgreSQL
9.3.2.


I see a lot of messages in the slave server log like:


2013-12-09 10:10:24 EET 172.18.10.45 main ERROR: could not access status of
transaction 24568845
2013-12-09 10:10:24 EET 172.18.10.45 main DETAIL: Could not open file
"pg_multixact/members/CD8F": No such file or directory.
--
2013-12-09 10:11:59 EET 172.18.10.30 main ERROR: could not access status of
transaction 31724554
2013-12-09 10:11:59 EET 172.18.10.30 main DETAIL: Could not open file
"pg_multixact/members/F615": No such file or directory.
--
2013-12-09 10:12:20 EET 172.18.10.7 main ERROR: could not access status of
transaction 25496296
2013-12-09 10:12:20 EET 172.18.10.7 main DETAIL: Could not open file
"pg_multixact/members/D31E": No such file or directory.


My next step was to upgrade to PostgreSQL 9.3.2 on the master and to do the
initial sync from scratch.
It did not help; I still have the same error.


I think it may be tied to this commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=215ac4ad6589e0f6a31cc4cd867aedba3cd42924




A second question: why were the binary packages for 9.3.0/9.3.1 removed from
apt.postgresql.org, so that I cannot downgrade?




--
Sent via pgsql-bugs mailing list (pgsql...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andres Freund

Dec 9, 2013, 9:25:25 AM

Hi,

On 2013-12-09 13:47:34 +0000, pe...@petrovich.kiev.ua wrote:
> PostgreSQL version: 9.3.2

> I've installed new slave database on 6th of December. Since there was no
> packages on apt.postgresql.org with postgresql 9.3.0 I've set up postgresql
> 9.3.2

> 2013-12-09 10:10:24 EET 172.18.10.45 main ERROR: could not access status of
> transaction 24568845
> 2013-12-09 10:10:24 EET 172.18.10.45 main DETAIL: Could not open file
> "pg_multixact/members/CD8F": No such file or directory.

> My next step was to upgrade to postgresql 9.3.2 on master and to do initial
> sync from scratch.
> It does not help. I still have the same error.

Could you post, as close as possible to the next occurrence of that
error:
* pg_controldata output from the primary
* pg_controldata output from the standby
* SELECT datfrozenxid, datminmxid FROM pg_database;
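
For reference, a slightly extended form of that last query, adding age() so the
XID horizons are easier to read, could be (just an illustrative sketch; the
plain SELECT above is enough):

SELECT datname, datfrozenxid, age(datfrozenxid) AS xid_age, datminmxid
FROM pg_database
ORDER BY age(datfrozenxid) DESC;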

Do you frequently VACUUM FREEZE on the primary? Have you modified any of
the vacuum_freeze_* parameters?
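
If it's easier, the relevant settings (including defaults) can be listed with
something like:

SELECT name, setting, source FROM pg_settings WHERE name LIKE '%freeze%';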

> I think it may be tied to this commit:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=215ac4ad6589e0f6a31cc4cd867aedba3cd42924

Only incidentally, I think - we didn't properly maintain it during
recovery before at all.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andres Freund

Dec 9, 2013, 10:03:45 AM

On 2013-12-09 16:55:21 +0200, Serge Negodyuck wrote:
> 2013/12/9 Andres Freund <and...@2ndquadrant.com>:
> > On 2013-12-09 13:47:34 +0000, pe...@petrovich.kiev.ua wrote:
> >> PostgreSQL version: 9.3.2
> >
> >> I've installed new slave database on 6th of December. Since there was no
> >> packages on apt.postgresql.org with postgresql 9.3.0 I've set up postgresql
> >> 9.3.2
> >
> >> 2013-12-09 10:10:24 EET 172.18.10.45 main ERROR: could not access status of
> >> transaction 24568845
> >> 2013-12-09 10:10:24 EET 172.18.10.45 main DETAIL: Could not open file
> >> "pg_multixact/members/CD8F": No such file or directory.
> >
> >> My next step was to upgrade to postgresql 9.3.2 on master and to do initial
> >> sync from scratch.
> >> It does not help. I still have the same error.
> >
> > Could you post, as close as possible to the next occurance of that
> > error:
> > * pg_controldata output from the primary
> > * pg_controldata output from the standby
>
> Sorry, I've just downgraded all the cluster to 9.3.0, and this error
> disappeared.
> I can provide output right now, if it make any sence.

Yes, that'd be helpful.

Could you also provide ls -l pg_multixact/ on both primary and standby?

Serge Negodyuck

Dec 9, 2013, 9:55:21 AM

2013/12/9 Andres Freund <and...@2ndquadrant.com>:
> Hi,
>
> On 2013-12-09 13:47:34 +0000, pe...@petrovich.kiev.ua wrote:
>> PostgreSQL version: 9.3.2
>
>> I've installed new slave database on 6th of December. Since there was no
>> packages on apt.postgresql.org with postgresql 9.3.0 I've set up postgresql
>> 9.3.2
>
>> 2013-12-09 10:10:24 EET 172.18.10.45 main ERROR: could not access status of
>> transaction 24568845
>> 2013-12-09 10:10:24 EET 172.18.10.45 main DETAIL: Could not open file
>> "pg_multixact/members/CD8F": No such file or directory.
>
>> My next step was to upgrade to postgresql 9.3.2 on master and to do initial
>> sync from scratch.
>> It does not help. I still have the same error.
>
> Could you post, as close as possible to the next occurance of that
> error:
> * pg_controldata output from the primary
> * pg_controldata output from the standby

Sorry, I've just downgraded the whole cluster to 9.3.0, and this error
disappeared.
I can provide the output right now, if it makes any sense.

But the same query with the same id *always* gave the same error, regardless
of the replication state.
One more thing: I had a look at the previous day's logs and found the following:

2013-12-08 21:17:15 EET LOG: could not truncate directory
"pg_multixact/members": apparent wraparound
2013-12-08 21:22:20 EET LOG: could not truncate directory
"pg_multixact/members": apparent wraparound
2013-12-08 21:27:26 EET LOG: could not truncate directory
"pg_multixact/members": apparent wraparound


At some point the "apparent wraparound" messages stopped appearing, and only
"Could not open file" remained.


> * SELECT datfrozenxid, datminmxid FROM pg_database;

 datfrozenxid | datminmxid
--------------+------------
          710 |          1
          710 |          1
          710 |          1
          710 |          1


>
> Do you frequently VACUUM FREEZE on the primary? Have you modified any of
> the vacuum_freeze_* parameters?
I never run VACUUM FREEZE manually

The only changed vacuum_* parameter is
vacuum_cost_delay = 10

>
>> I think it may be tied to this commit:
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=215ac4ad6589e0f6a31cc4cd867aedba3cd42924
>
> Only incidentally I think - we didn't properly maintain it during
> recovery before at all.
>


Andres Freund

Dec 9, 2013, 11:28:49 AM

On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> Latest checkpoint's NextXID: 0/90546484
> Latest checkpoint's NextOID: 6079185
> Latest checkpoint's NextMultiXactId: 42049949
> Latest checkpoint's NextMultiOffset: 55384024
> Latest checkpoint's oldestXID: 710
> Latest checkpoint's oldestXID's DB: 1
> Latest checkpoint's oldestActiveXID: 90546475
> Latest checkpoint's oldestMultiXid: 1
> Latest checkpoint's oldestMulti's DB: 1

So, the oldest multi is 1, thus the truncation code wouldn't have been
able to remove any. So I think this really is an independent problem
from the truncation patch. But a problem nonetheless.

> > Could you also provide ls -l pg_multixact/ on both primary and standby?
>
> Did you mean pg_multixact/members/ ?

From both members and offsets. It'd be great if you could attach an ls -lR
of pg_multixact/.

> I't not possible on slave right now. Since I had to re-sync these
> files from master. May be that was not a good idea but it helped.
>
> On master there are files from 0000 to 14078
>
> On slave there were absent files from A1xx to FFFF
> They were the oldest ones. (October, November)

Hm. Were the files from before A1xx still present? I am not sure how to parse
your answer.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Serge Negodyuck

Dec 9, 2013, 10:49:34 AM

2013/12/9 Andres Freund <and...@2ndquadrant.com>:
> On 2013-12-09 16:55:21 +0200, Serge Negodyuck wrote:
>> 2013/12/9 Andres Freund <and...@2ndquadrant.com>:
>> > On 2013-12-09 13:47:34 +0000, pe...@petrovich.kiev.ua wrote:
>> >> PostgreSQL version: 9.3.2
>> >
>> >> I've installed new slave database on 6th of December. Since there was no
>> >> packages on apt.postgresql.org with postgresql 9.3.0 I've set up postgresql
>> >> 9.3.2
>> >
>> >> 2013-12-09 10:10:24 EET 172.18.10.45 main ERROR: could not access status of
>> >> transaction 24568845
>> >> 2013-12-09 10:10:24 EET 172.18.10.45 main DETAIL: Could not open file
>> >> "pg_multixact/members/CD8F": No such file or directory.
>> >
>> >> My next step was to upgrade to postgresql 9.3.2 on master and to do initial
>> >> sync from scratch.
>> >> It does not help. I still have the same error.
>> >
>> > Could you post, as close as possible to the next occurance of that
>> > error:
>> > * pg_controldata output from the primary
>> > * pg_controldata output from the standby
>>
>> Sorry, I've just downgraded all the cluster to 9.3.0, and this error
>> disappeared.
>> I can provide output right now, if it make any sence.
>
> Yes, that'd be helpful.


master:
pg_control version number: 937
Catalog version number: 201306121
Database system identifier: 5928279687159054327
Database cluster state: in production
pg_control last modified: Mon 09 Dec 2013 05:29:53 PM EET
Latest checkpoint location: 3D4/76E97DA0
Prior checkpoint location: 3D4/6E768638
Latest checkpoint's REDO location: 3D4/76925C18
Latest checkpoint's REDO WAL file: 00000001000003D400000076
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/90546484
Latest checkpoint's NextOID: 6079185
Latest checkpoint's NextMultiXactId: 42049949
Latest checkpoint's NextMultiOffset: 55384024
Latest checkpoint's oldestXID: 710
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 90546475
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Mon 09 Dec 2013 05:29:44 PM EET
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 1000
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0



slave:
pg_control version number: 937
Catalog version number: 201306121
Database system identifier: 5928279687159054327
Database cluster state: in archive recovery
pg_control last modified: Mon 09 Dec 2013 05:25:22 PM EET
Latest checkpoint location: 3D4/6E768638
Prior checkpoint location: 3D4/66F14C60
Latest checkpoint's REDO location: 3D4/6E39F9E8
Latest checkpoint's REDO WAL file: 00000001000003D40000006E
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/90546484
Latest checkpoint's NextOID: 6079185
Latest checkpoint's NextMultiXactId: 42046170
Latest checkpoint's NextMultiOffset: 55058098
Latest checkpoint's oldestXID: 710
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 90541410
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Mon 09 Dec 2013 05:24:44 PM EET
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 3D4/7884BB68
Min recovery ending loc's timeline: 1
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 1000
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0




>
> Could you also provide ls -l pg_multixact/ on both primary and standby?

Did you mean pg_multixact/members/ ?
It's not possible on the slave right now, since I had to re-sync these
files from the master. Maybe that was not a good idea, but it helped.

On master there are files from 0000 to 14078

On the slave, files from A1xx to FFFF were absent.
They were the oldest ones (October, November).


Andres Freund

Dec 9, 2013, 1:27:01 PM

Hi,

On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> On master there are files from 0000 to 14078
>
> On slave there were absent files from A1xx to FFFF
> They were the oldest ones. (October, November)

Some analysis later, I am pretty sure that the origin is a longstanding
problem and not connected to 9.3.[01] vs 9.3.2.

The above referenced 14078 file is exactly the last page before a
members wraparound:
(gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
$10 = 0x14078
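
The same arithmetic can be reproduced without gdb; using the 9.3 values of 1636
members per page and 32 pages per segment, a quick sanity check is:

-- last members segment before wraparound, in hex
SELECT to_hex(4294967296 / (1636 * 32));   -- 14078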

So, what happened is that enough multixacts were created that the members SLRU
wrapped around. It's not unreasonable for the members SLRU to wrap around
faster than the offsets one - after all, we create at least two entries in
members for every offsets entry. Also, in 9.3+ more entries fit on an offsets
page than on a members page.
When truncating, we first read the offset, to know where we currently
are in members, and then truncate both from their respective
points. Since we've wrapped around in members, we very well might remove
content we actually need.

I've recently remarked that I find it dangerous that we only do
anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
here we have the proof that that's bad.

This is an issue in <9.3 as well. It might, in some sense, even be worse
there, because we never vacuum old multis away. But on the other hand,
the growth of multis is slower there and we look into old multis less
frequently.

The only reason that you saw the issue on the standby first is that the
truncation code is called more frequently there. Afaics it will happen,
sometime in the future, on the master as well.

I think problems should be preventable if you issue a systemwide VACUUM
FREEZE, but please let others chime in before you execute it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Alvaro Herrera

Dec 9, 2013, 2:00:32 PM

Andres Freund wrote:
> Hi,
>
> On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> > On master there are files from 0000 to 14078
> >
> > On slave there were absent files from A1xx to FFFF
> > They were the oldest ones. (October, November)
>
> Some analysis later, I am pretty sure that the origin is a longstanding
> problem and not connected to 9.3.[01] vs 9.3.2.
>
> The above referenced 14078 file is exactly the last page before a
> members wraparound:
> (gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
> $10 = 0x14078

As a note, the SlruScanDirectory code has a flaw because it only looks
at four-digit files; the reason only files up to 0xFFFF are missing and
not the following ones is because those got ignored. This needs a fix
as well.

> So, what happened is that enough multixacts where created, that the
> members slru wrapped around. It's not unreasonable for the members slru
> to wrap around faster then the offsets one - after all we create at
> least two entries into members for every offset entry. Also in 9.3+
> there fit more xids on a offset than a members page.
> When truncating, we first read the offset, to know where we currently
> are in members, and then truncate both from their respective
> point. Since we've wrapped around in members we very well might remove
> content we actually need.

Yeah, on 9.3 each member Xid occupies five bytes in
pg_multixact/members, whereas each offset only occupies four bytes in
pg_multixact/offsets. It's rare that a multixact only contains one
member; typically they will have at least two (so for each multixact we
would have 4 bytes in offsets and a minimum of 10 bytes in members).
So wrapping around is easy for members, even with the protections we
have in place for offsets.
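
As a quick illustration of that footprint (pure arithmetic, nothing that needs
to touch the damaged cluster):

-- bytes consumed per multixact with n members: 4 in offsets, 5*n in members
SELECT n AS members_per_multi, 4 AS offsets_bytes, 5 * n AS members_bytes
FROM generate_series(2, 4) AS n;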

> I've recently remarked that I find it dangerous that we only do
> anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> here we have the proof that that's bad.

It's hard to see how to add this post-facto, though. I mean, I am
thinking we would need some additional pg_control info etc. We'd better
figure out a way to add such controls without having to add that.

> This is an issue in <9.3 as well. It might, in some sense, even be worse
> there, because we never vacuum old multis away. But on the other hand,
> the growths of multis is slower there and we look into old multis less
> frequently.
>
> The only reason that you saw the issue on the standby first is that the
> truncation code is called more frequently there. Afaics it will happen,
> sometime in the future, on the master as well.
>
> I think problems should be preventable if you issue a systemwide VACUUM
> FREEZE, but please let others chime in before you execute it.

I wouldn't freeze anything just yet, at least until the patch to fix
multixact freezing is in.

--
Álvaro Herrera http://www.2ndQuadrant.com/

Andres Freund

Dec 9, 2013, 8:00:54 PM

On 2013-12-09 16:00:32 -0300, Alvaro Herrera wrote:
> As a note, the SlruScanDirectory code has a flaw because it only looks
> at four-digit files; the reason only files up to 0xFFFF are missing and
> not the following ones is because those got ignored. This needs a fix
> as well.

While I agree it's a bug, I don't think it's relevant for the case at
hand. For offsets there's no following page (or exactly one, not sure
about the math offhand), and we only use SlruScanDirectory() for
offsets, not for members.

> > I've recently remarked that I find it dangerous that we only do
> > anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> > here we have the proof that that's bad.
>
> It's hard to see how to add this post-facto, though. I mean, I am
> thinking we would need some additional pg_control info etc. We'd better
> figure out a way to add such controls without having to add that.

Couldn't we just get the oldest multi, check where in offsets it points
to, and compare that with nextOffset? That should be doable without
additional data.

> > I think problems should be preventable if you issue a systemwide VACUUM
> > FREEZE, but please let others chime in before you execute it.
>
> I wouldn't freeze anything just yet, at least until the patch to fix
> multixact freezing is in.

Well, it seems better than getting errors because of multixact members
that are gone.
Maybe PGOPTIONS='-c vacuum_freeze_table_age=0 -c vacuum_freeze_min_age=1000000' vacuumdb -a
- that ought not to cause problems with current data and should freeze
enough to get rid of problematic multis?
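
For a single database, a rough SQL-level equivalent of that vacuumdb invocation
(same caveats apply) would be:

SET vacuum_freeze_table_age = 0;
SET vacuum_freeze_min_age = 1000000;
VACUUM;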

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/

Alvaro Herrera

Dec 9, 2013, 11:25:20 PM

Andres Freund wrote:
> On 2013-12-09 16:00:32 -0300, Alvaro Herrera wrote:
> > As a note, the SlruScanDirectory code has a flaw because it only looks
> > at four-digit files; the reason only files up to 0xFFFF are missing and
> > not the following ones is because those got ignored. This needs a fix
> > as well.
>
> While I agree it's a bug, I don't think it's relevant for the case at
> hand. For offset's there's no following page (or exactly 1, not sure
> about the math offhand), and we only use SlruScanDirectory() for
> offsets not for members.

Sure we do for members, through SimpleLruTruncate which calls
SlruScanDirectory underneath.

> > > I've recently remarked that I find it dangerous that we only do
> > > anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> > > here we have the proof that that's bad.
> >
> > It's hard to see how to add this post-facto, though. I mean, I am
> > thinking we would need some additional pg_control info etc. We'd better
> > figure out a way to add such controls without having to add that.
>
> Couldn't we just get the oldest multi, check where in offsets it points
> to, and compare that with nextOffset? That should be doable without
> additional data.

Hmm, that seems a sensible approach ...

> > > I think problems should be preventable if you issue a systemwide VACUUM
> > > FREEZE, but please let others chime in before you execute it.
> >
> > I wouldn't freeze anything just yet, at least until the patch to fix
> > multixact freezing is in.
>
> Well, it seems better than getting errors because of multixact members
> that are gone.
> Maybe PGOPTIONS='-c vacuum_freeze_table_age=0 -c vacuum_freeze_min_age=1000000' vacuumdb -a
> - that ought not to cause problems with current data and should freeze
> enough to get rid of problematic multis?

TBH I don't feel comfortable predicting what it will freeze with
the broken code.

--
Álvaro Herrera http://www.2ndQuadrant.com/

Serge Negodyuck

Dec 10, 2013, 8:22:49 AM

> From both members and offsets. It'd be great if you could attach an ls -lR
> of pg_multixact/.
>

I've set up another slave (not in production mode) to reproduce this
bug and provide more details if necessary.
There is no "apparent wraparound" message in the logs yet, probably
because I cannot send the same SQL queries to the non-production server.

Just to provide exact information: these are links to ls -lR
pg_multixact/ from both the master and the slave server.

https://drive.google.com/file/d/0B5sVzeuionyGVER6dnA3a0NkOU0/
https://drive.google.com/file/d/0B5sVzeuionyGUlFibC1jZEVvQzA/

Alvaro Herrera

Jan 2, 2014, 8:46:17 PM

Andres Freund wrote:

> On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> > On master there are files from 0000 to 14078
> >
> > On slave there were absent files from A1xx to FFFF
> > They were the oldest ones. (October, November)
>
> Some analysis later, I am pretty sure that the origin is a longstanding
> problem and not connected to 9.3.[01] vs 9.3.2.
>
> The above referenced 14078 file is exactly the last page before a
> members wraparound:
> (gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
> $10 = 0x14078
>
> So, what happened is that enough multixacts where created, that the
> members slru wrapped around. It's not unreasonable for the members slru
> to wrap around faster then the offsets one - after all we create at
> least two entries into members for every offset entry. Also in 9.3+
> there fit more xids on a offset than a members page.
> When truncating, we first read the offset, to know where we currently
> are in members, and then truncate both from their respective
> point. Since we've wrapped around in members we very well might remove
> content we actually need.
>
> I've recently remarked that I find it dangerous that we only do
> anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> here we have the proof that that's bad.

I have applied three patches to deal with some of the problems reported
here, and others discovered during the investigation of them. One of
them was about failing to truncate files beyond FFFF. That should work
fine now -- that is, you would lose more data. Unless we consider a
second fix, which is that files corresponding to data still in use are
no longer truncated.

The third fix (to enable the system to wrap around
sanely from file 14078 to 0000) was necessary so I could reproduce the
issues. In systems with assertions enabled, there is a crash at the
point of overflow. I didn't try, but since your system appears to have
wrapped around, I imagine it sort-of works in systems compiled without
assertions (which is the recommended configuration for production).

One thing not yet patched is overrun of the members SLRU: if you have
enough live multixacts with enough members, creating a new one might
overwrite the members area used by an older multixact. Freezing multis
earlier would help with that. With the default settings, where multis
are frozen when they are 50 million multis old and pages are 8kB long,
there is room for 85 members per multi on average without such
overrun[*]. I was able to observe this overrun by running Andres'
pg_multixact_burn with each multixact having 100 members. I doubt it's
common to have that many members in each multixact on average, but it's
certainly a possibility.

[*] There are 82040 files, having 32 pages each; each page has room for
1636 members. (82040 * 32 * 1636) / 50000000 =~ 85.
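
Spelled out as a quick check:

SELECT (82040::bigint * 32 * 1636) / 50000000 AS avg_members_per_multi;  -- 85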

One complaint somebody might rightly have about this is the space
consumption by pg_multixact/ files. Perhaps instead of using the Xid
freezing horizon verbatim, we should use ceil(min_freeze_age^0.8) or
something like that; so for the default 50000000 Xid freezing age, we
would freeze multis over 1.44 million multis old. (We could determine the
ratio of xid to multi usage so that they would both freeze when the same
amount of time has elapsed, but this seems unnecessarily complicated.)
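
For instance:

SELECT ceil(50000000 ^ 0.8) AS multi_freeze_age;  -- roughly 1.44 million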

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

