
IpcSemaphoreLock/Unlock and proc_exit on 7.2.6


Kris Jurka

Nov 13, 2004, 10:43:01 PM

I have an underpowered server running 7.2.6 that backs a website which
occasionally gets hit by a bunch of traffic and starts firing off "FATAL
1: Sorry, too many clients already" messages. This is all as expected,
but sometimes it just crashes. I had no clue what was going on until I
checked the stderr log (because I had set it up to use syslog). In there
I find a whole bunch of these:

IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreUnlock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreUnlock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument

Looking at the source I see proc_exit as the failure path for these two
functions (IpcSemaphoreLock, IpcSemaphoreUnlock). I've read the comments
around the code, but must admit that I can't really follow what's going
on.

Could anyone shed some light on what is going on? Certainly the semId of
-1 looks a little suspicious.

This is on FreeBSD 4.5.

Kris Jurka


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Tom Lane

Nov 14, 2004, 7:18:22 PM

Kris Jurka <bo...@ejurka.com> writes:
> I have an underpowered server running 7.2.6 that backs a website which
> occasionally gets hit by a bunch of traffic and starts firing off "FATAL
> 1: Sorry, too many clients already" messages. This is all as expected,
> but sometimes it just crashes. I had no clue what was going on until I
> checked the stderr log (because I had set it up to use syslog). In there
> I find a whole bunch of these:

> IpcSemaphoreLock: semop(id=-1) failed: Invalid argument

[ eyeballs code... ] It looks like this could happen in 7.2 during exit
from a backend that failed to acquire a semaphore --- ProcKill does
things like LockReleaseAll, which needs to acquire the lockmanager LWLock,
which could try to block using the process semaphore if there's
contention for the LWLock. The problem should be gone in 7.3 and later
due to reorganization of the semaphore management code. I'm not sure
it's worth trying to fix in 7.2.* --- the odds of introducing new
problems seem too high, and we're not really maintaining 7.2 anymore
anyway.
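
To illustrate the symptom only (a minimal sketch, not PostgreSQL code): System V semop() rejects a negative semaphore set id with EINVAL, which strerror() renders as "Invalid argument", the same text seen in the log lines above. The helper name try_sem_lock is hypothetical and exists only for this illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical helper: attempt to decrement semaphore 0 of the set
 * identified by sem_id. Returns 0 on success, or the errno value
 * reported by semop() on failure.
 */
int try_sem_lock(int sem_id)
{
    struct sembuf op;

    op.sem_num = 0;            /* first semaphore in the set */
    op.sem_op = -1;            /* "lock": decrement by one */
    op.sem_flg = IPC_NOWAIT;   /* never block in this sketch */

    if (semop(sem_id, &op, 1) < 0)
        return errno;          /* EINVAL when sem_id is -1 */
    return 0;
}
```

Calling try_sem_lock(-1) returns EINVAL without touching any real semaphore set, which is consistent with a backend that never obtained a semaphore but still tries to block on one during exit.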

The comment in ProcGetNewSemIdAndNum suggests that you might be able to
suppress the problem in 7.2 by using a different max_connections value.
Is your current value one less than a multiple of 16, by any chance?

regards, tom lane


Kris Jurka

Nov 14, 2004, 8:04:45 PM

On Sun, 14 Nov 2004, Tom Lane wrote:

> The comment in ProcGetNewSemIdAndNum suggests that you might be able to
> suppress the problem in 7.2 by using a different max_connections value.
> Is your current value one less than a multiple of 16, by any chance?
>

Currently 32. It is unclear whether 31 is the failure case you're
thinking of or whether 31 might help.

Kris Jurka


Tom Lane

Nov 14, 2004, 8:10:06 PM

Kris Jurka <bo...@ejurka.com> writes:
> On Sun, 14 Nov 2004, Tom Lane wrote:
>> Is your current value one less than a multiple of 16, by any chance?

> Currently 32. It is unclear whether 31 is the failure case you're
> thinking of or whether 31 might help.

No, 32 is actually the best case (most slop) if I'm reading the code
correctly.

I'd suggest an update to 7.3 or later ...

regards, tom lane

