backend died

Brusser, Michael

Nov 15, 2004, 2:30:39 PM

Our customer, running Postgres v. 7.3.2, reported a problem that occurs
a couple of times a week on three different servers, all on Solaris 9.
We enabled debugging in postgresql.conf, and now it has happened again;
here's the excerpt from the database log:

2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  terminating any other active server processes
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10876
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10482
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10481
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10478
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10472
2004-11-13 10:01:06 [10876]  WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2004-11-13 10:01:06 [10478]  WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2004-11-13 10:01:06 [10482]  WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        ... ...

There are no other references to process 19285 in the log file.
If it helps, the servers are configured to use UDS (Unix domain sockets).
The socket files are placed in different directories (each db's PGDATA).

Would it be helpful to change the debug level from DEBUG1 to a higher value?
What else should I look at?

Thank you,
Mike


Tom Lane

Nov 15, 2004, 2:48:18 PM

"Brusser, Michael" <Michael...@matrixone.com> writes:
> Our customer running Postgres v. 7.3.2 reported a problem, occurring
> couple times a week on three different servers, all on Solaris 9.

> 2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was
> terminated by signal 10

SIGBUS iirc.
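
Signal numbers are platform-specific, so the number in the log has to be read against the Solaris numbering, not that of whatever machine the log is being read on. A minimal sketch of that lookup follows. The table is a hand-copied excerpt of the usual Solaris numbering and should be checked against `<sys/signal.h>` or `kill -l` on the affected host; the helper name `solaris_signal_name` is made up for this example.

```python
# Excerpt of signal numbers as usually defined on Solaris.
# Assumption: verify against <sys/signal.h> on the host in question.
SOLARIS_SIGNALS = {
    6: "SIGABRT",
    9: "SIGKILL",
    10: "SIGBUS",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def solaris_signal_name(number):
    """Return the Solaris name for a signal number, or a placeholder if not in the excerpt."""
    return SOLARIS_SIGNALS.get(number, "unknown (%d)" % number)


print(solaris_signal_name(10))
```

On Linux/x86, by contrast, signal 10 is SIGUSR1, which is why the platform matters when reading "terminated by signal 10".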

> What else should I look at?

Find out what query is causing the crash --- enable query logging if you
have no other way. And get a debugger stack trace from the core file
that the crashed backend left behind.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Brusser, Michael

Nov 15, 2004, 3:01:44 PM

> "Brusser, Michael" <Michael...@matrixone.com> writes:
> > Our customer running Postgres v. 7.3.2 reported a problem, occurring
> > couple times a week on three different servers, all on Solaris 9.
>
> > 2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was
> > terminated by signal 10
>
> SIGBUS iirc.

> > What else should I look at?
>
> Find out what query is causing the crash --- enable query
> logging if you have no other way.  And get a debugger stack trace from the core file
> that the crashed backend left behind.
>                       regards, tom lane

==================================================
The log-statements option was already enabled,
here's what I see prior to the crash:

2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876]  LOG:  query: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  LOG:  query: commit
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  LOG:  query: begin
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  LOG:  query: commit
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  LOG:  query: begin
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  LOG:  query: commit
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:54:21 [10456]  DEBUG:  child process (pid 19282) exited with exit code 0

2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10

2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  terminating any other active server processes
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10876

... ... ...

The same application is running for other customers, apparently without problems.
I will ask for the core file.
This is a brand new machine. Could a bad memory chip cause this?
Thank you,
Mike

Tom Lane

Nov 15, 2004, 3:33:36 PM

"Brusser, Michael" <Michael...@matrixone.com> writes:
> 2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was
> terminated by signal 10

> The log-statements option was already enabled,
> here's what I see prior to the crash:

That's no help. What were the last few lines from process 19285?

> This is a brand new machine. Is it likely that a bad memory chip may cause
> this?

Possibly, but it would not do to point fingers at the hardware when
you're running an obsolete version of PG ;-). At least get it updated
to 7.3.8.

regards, tom lane

Brusser, Michael

Nov 15, 2004, 4:02:32 PM

> -----Original Message-----
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Monday, November 15, 2004 3:34 PM
> To: Brusser, Michael
> Cc: Pgsql-Hackers (E-mail)
> Subject: Re: [HACKERS] backend died
>
>
> "Brusser, Michael" <Michael...@matrixone.com> writes:
> > 2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was
> > terminated by signal 10
>
> > The log-statements option was already enabled,
> > here's what I see prior to the crash:
>
> That's no help.  What were the last few lines from process 19285?

That's the strangest thing. The log file begins with

2004-11-12 11:49:14 [10456]  DEBUG:  FindExec: found "/lsi/soft/synchronicity/latest/syncinc/bin.sol2/postgres" using argv[0]

and continues with all the SQL statements up to the crash, but the only
references to pid 19285 are:

2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10

2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10

there are no prior occurrences of token 19285 in the file.
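
The check described above, looking for every line that was either written by a given backend or merely mentions its pid, can be sketched in Python. This is only an illustration: the function name `lines_for_pid` is made up here, and the regular expression assumes the `timestamp [pid]  LEVEL:  message` prefix seen in the excerpts in this thread.

```python
import re

# Matches the prefix used in the excerpts above, e.g.
# "2004-11-13 10:01:06 [10456]  DEBUG:  child process ..."
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\d+)\]\s+(\w+):\s+(.*)$"
)


def lines_for_pid(log_lines, pid):
    """Split log lines into those written by `pid` and those that only mention it."""
    written_by = []
    mentions = []
    pid_token = re.compile(r"\b%d\b" % pid)
    for line in log_lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # continuation lines of a wrapped message carry no pid prefix
        if int(m.group(1)) == pid:
            written_by.append(line)
        elif pid_token.search(m.group(3)):
            mentions.append(line)
    return written_by, mentions


# Sample lines copied from the excerpts in this thread.
sample = [
    "2004-11-13 09:49:46 [10876]  LOG:  statement: commit",
    "2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10",
    "2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10",
]

written_by, mentions = lines_for_pid(sample, 19285)
print(len(written_by), "lines written by 19285;", len(mentions), "lines mentioning it")
```

For the sample lines the first list is empty and the second holds the two postmaster lines, which matches the situation described: the backend itself never logged anything.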

> ... you're running an obsolete version of PG ;-).
> At least get it updated to 7.3.8.

I'd love to, but this is not something I can do. Have to live with that,
as well as with the fact that many of our customers are running on NFS
(yes, I know...)

Tom Lane

Nov 15, 2004, 4:12:18 PM

"Brusser, Michael" <Michael...@matrixone.com> writes:
>> That's no help. What were the last few lines from process 19285?

> there are no prior occurrences of token 19285 in the file.

Hmm, so it seems 19285 died during startup. That does make a hardware
problem seem a bit plausible --- the backend start sequence is pretty
well tested ;-).

regards, tom lane
