GPDB Master node is in recovery mode


Rohit Tanwar

Apr 2, 2024, 6:55:35 PM
to Greenplum Users, Leela Krishna Alapati
Hi Team,

We have been running GPDB 6.15.0 on a 24-segment cluster (12 primaries and 12 mirrors) for the last 2 years, and it has been stable with our applications.

All of a sudden we are now seeing the error below, on the master node only. The worker logs look fine, with nothing much to track on that side.

We checked the memory stats: nothing is spilling over and everything is within limits.

2024-04-02 22:30:36.902130 UTC,"prasingh","cloud",p9062,th1724984256,"IP","53058",2024-04-02 22:30:36 UTC,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at 0/0",,,,,,0,,"postmaster.c",2553,
2024-04-02 22:30:37.216802 UTC,,,p9031,th1724984256,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",6849,
2024-04-02 22:30:37.231248 UTC,"calixcloud","cloud",p9069,th1724984256,"IP2","38144",2024-04-02 22:30:37 UTC,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at E0/B07C03F0",,,,,,0,,"postmaster.c",2553,
2024-04-02 22:30:37.237586IP10.2.182.212","45608",2024-04-02 22:30:37 UTC,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at E0/B07C03F0",,,,,,0,,"postmaster.c",2553,
2024-04-02 22:30:37.240795 UTC,"calixcloud","cloud",p9071,th1724984256,"IP","37902",2024-04-02 22:30:37 UTC,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at E0/B07C03F0",,,,,,0,,"postmaster.c",2553,
2024-04-02 22:30:37.242660 UTC,"calixcloud","cloud",p9072,th1724984256,"IP","40564",2024-04-02 22:30:37 UTC,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at E0/B07C03F0",,,,,,0,,"postmaster.c",2553,


Please add the new user to the community - LKALAP...@gmail.com

Thanks,
Rohit Tanwar

Ivan Novick

Apr 2, 2024, 7:06:48 PM
to Rohit Tanwar, Greenplum Users, Leela Krishna Alapati

Is this log from the MASTER node?  Did you check for any failures in the segment logs?

Ivan



Rohit Tanwar

Apr 2, 2024, 7:37:58 PM
to Greenplum Users, Ivan Novick, Greenplum Users, Leela Krishna Alapati, Rohit Tanwar
2024-04-02 16:25:02.643399 UTC,"gpadmin","gpadmin",p13553,th-569070656,"IP","22223",2024-04-02 16:25:02 UTC,0,,,seg0,,,,sx1,"FATAL","3D000","database ""gpadmin"" does not exist",,,,,,,0,,"postinit.c",957,

This is the only statement logged during the same window.

Ashwin Agrawal

Apr 2, 2024, 8:21:34 PM
to Rohit Tanwar, Greenplum Users, Ivan Novick, Leela Krishna Alapati
Those messages indicate that the node is performing crash recovery.

Two things are possible:
- Are you sure you are connecting to the master and not the standby master node? (check the host and port settings for the connection)
- If it's the master, then check the logs from before these recovery messages were printed. Is there some kind of PANIC? Or did someone restart the master node, and it's taking a long time to recover?
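
As a rough illustration of the second check, the sketch below scans master log text for PANIC entries that appear before the first recovery message. It assumes the CSV layout seen in the lines quoted in this thread, where the severity is the 17th field and the message is the 19th; the column positions and the helper name `severe_before_recovery` are assumptions for illustration, not a Greenplum utility.

```python
import csv
import io

# Assumed column positions, inferred from the log lines quoted in this thread.
SEVERITY_COL = 16
MESSAGE_COL = 18

def severe_before_recovery(log_text, levels=("PANIC",)):
    """Collect (timestamp, severity, message) for entries at the given
    severity levels that occur before the first recovery-related message."""
    hits = []
    # csv.reader copes with quoted fields, including multi-line stack traces.
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) <= MESSAGE_COL:
            continue  # skip lines too short to hold the expected fields
        severity, message = row[SEVERITY_COL], row[MESSAGE_COL]
        if "recovery" in message:
            break  # everything after this point is the recovery itself
        if severity in levels:
            hits.append((row[0], severity, message))
    return hits

# One line copied from the first post; it is the recovery message itself,
# so nothing precedes it and the result is an empty list.
line = ('2024-04-02 22:30:37.216802 UTC,,,p9031,th1724984256,,,,0,,,seg-1,,,,,'
        '"LOG","00000","database system was not properly shut down; '
        'automatic recovery in progress",,,,,,,0,,"xlog.c",6849,')
print(severe_before_recovery(line))
```

In practice the input would be the full contents of the master log files covering the period before the first "recovery mode" message, and passing `levels=("PANIC", "FATAL")` widens the search.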

- Ashwin

Rohit Tanwar

Apr 2, 2024, 8:24:51 PM
to Greenplum Users, Ashwin Agrawal, Greenplum Users, Ivan Novick, Leela Krishna Alapati, Rohit Tanwar
1. We are not using a standby master.
2. No one else restarted it. We restarted it ourselves to see if that would fix the issue.

2024-04-02 23:57:00.076546 UTC,"gpadmin","template1",p20264,th1724984256,"127.0.0.1","61893",2024-04-02 23:57:00 UTC,0,con447,,seg-1,,,,,"FATAL","XX000","semctl(3178585, 10, SETVAL, 0) failed: Invalid argument (pg_sema.c:151)",,,,,,,0,,"pg_sema.c",151,"Stack trace:
1    0x55f5144c9eb1 postgres errstart + 0x1f1
2    0x55f513f4f57a postgres <symbol not found> + 0x13f4f57a
3    0x55f5142b7bf3 postgres <symbol not found> + 0x142b7bf3
4    0x55f514335582 postgres InitProcess + 0x2c2
5    0x55f51434bdb9 postgres PostgresMain + 0x159
6    0x55f513f5301f postgres <symbol not found> + 0x13f5301f
7    0x55f5142cbe48 postgres PostmasterMain + 0x11b8
8    0x55f513f582da postgres main + 0x4aa
9    0x7f3b64315bf7 libc.so.6 __libc_start_main + 0xe7
10   0x55f513f6423a postgres _start + 0x2a

Ashwin Agrawal

Apr 2, 2024, 8:55:34 PM
to Rohit Tanwar, Greenplum Users, Ivan Novick, Leela Krishna Alapati
On Tue, Apr 2, 2024 at 5:24 PM Rohit Tanwar <rttr...@gmail.com> wrote:
> 1. We are not using standby master
> 2. No one restarted it we did it to see if it fix the issue.

You are saying you restarted and those messages correspond to the master node being in recovery.
How long has it been in recovery now?

Does the number in the messages "last replayed record at <number here>" keep increasing?
If so, recovery is making progress; it just has a lot to recover.
Recovery is generally not expected to take more than 5-10 minutes, unless CHECKPOINTs were failing on the master node and it accumulated a huge backlog to replay on restart.
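
One way to check this from the log is to pull the "last replayed record at" values out of the messages and compare them. The sketch below assumes the value is a PostgreSQL-style LSN written as two hexadecimal numbers separated by a slash, as in the lines posted earlier; the function names are made up for illustration.

```python
import re

# Matches the detail field seen in the log lines quoted in this thread.
LSN_RE = re.compile(r"last replayed record at ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")

def lsn_to_int(hi, lo):
    """Convert the two hex halves of an LSN (hi/lo) into one 64-bit integer."""
    return (int(hi, 16) << 32) | int(lo, 16)

def replay_positions(lines):
    """Return replay positions, in log order, for every line that reports one."""
    positions = []
    for line in lines:
        match = LSN_RE.search(line)
        if match:
            positions.append(lsn_to_int(match.group(1), match.group(2)))
    return positions

def is_advancing(positions):
    """True if the newest reported position is past the oldest one."""
    return len(positions) >= 2 and positions[-1] > positions[0]

# Fields abbreviated from the messages posted earlier in this thread.
sample = [
    '"the database system is in recovery mode","last replayed record at 0/0"',
    '"the database system is in recovery mode","last replayed record at E0/B07C03F0"',
    '"the database system is in recovery mode","last replayed record at E0/B07C03F0"',
]
print(is_advancing(replay_positions(sample)))
```

Feeding it log lines collected a minute or more apart gives a better signal than lines from the same second, since several rejected connections can report the same position.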

Frankly, I am unclear on the problem you are seeking an answer to.
Is it that you can't connect because the DB is reporting it is in recovery mode? Or
are all connections failing with "semctl(3178585, 10, SETVAL, 0) failed: Invalid argument (pg_sema.c:151)" and you wish to understand the reason for it?


- Ashwin

Rohit Tanwar

Apr 2, 2024, 9:02:16 PM
to Greenplum Users, Ashwin Agrawal, Greenplum Users, Ivan Novick, Leela Krishna Alapati, Rohit Tanwar
On Tuesday 2 April 2024 at 17:55:34 UTC-7 Ashwin Agrawal wrote:
> On Tue, Apr 2, 2024 at 5:24 PM Rohit Tanwar <rttr...@gmail.com> wrote:
>> 1. We are not using standby master
>> 2. No one restarted it we did it to see if it fix the issue.
>
> You are saying you restarted and those messages correspond to the master node being in recovery.
> How long has it been in recovery now?

The recovery messages keep coming for 1-2 minutes.

> Do the number in messages "last replayed record at <number here>" keep increasing?
> That tells recovery is making progress though it just has a lot to recover.
> Generally not expected for recovery to take more than 5-10 mins. Unless CHECKPOINTS were failing on the master node and it collected a huge backlog to replay on restart.

The last replayed records completed in 10-15 minutes.

> Frankly, I am unclear on the problem you are seeking an answer to.
> Is it can't connect as DB is reporting it is in recovery mode? or
> All connections are failing with "semctl(3178585, 10, SETVAL, 0) failed: Invalid argument (pg_sema.c:151)" and wish to understand the reason for it?

All connections fail during the very first minute when it starts happening.