SSH session limit


Alric Kriel

Mar 3, 2021, 9:08:57 AM
to Greenplum Users
Good Day,

I have a question and hope someone can give me some guidance. We have a Greenplum installation running on version 6.11.1 on 4 DB nodes, 1 master node and 1 master standby node. On each database node we have 5 primary and 5 mirror segments.

The ETL process is quite aggressive on connections and causes us to run out of connections. I have set the SSH limit to 500 on the Linux host and also set the database max_connections to 500.

If I try to increase the SSH limit above 500, the segments fail when I attempt to start the database. This forces me to stop, reduce the SSH limits, restart the database and recover the failing segments.

It appears as if there is a hard limit of 500 SSH sessions. This is running on CentOS 7.

Has anyone encountered this before, and is there a way to overcome this issue?

Any advice or guidance will be greatly appreciated.

Kind Regards
Alric Kriel

ludi

Mar 3, 2021, 9:35:40 AM
to Alric Kriel, Greenplum Users
Could you please post the values of "MaxStartups" and "MaxSessions" from /etc/ssh/sshd_config?

Thanks,
Dillon

Alric Kriel <alric...@gmail.com> wrote on Wed, Mar 3, 2021 at 10:09 PM:

Alric Kriel

Mar 3, 2021, 11:10:40 AM
to Greenplum Users, Dillon_lu, Greenplum Users, Alric Kriel
Hi there Dillon,

Herewith as requested
MaxSessions 500
MaxStartups 10:30:500

Kind Regards
Alric Kriel
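
For readers following the thread: per sshd_config(5), MaxStartups limits concurrent unauthenticated connections, and MaxSessions limits sessions per network connection, so neither is a global cap on logged-in SSH sessions. The three-part value 10:30:500 above is a start:rate:full setting for random early drop. A minimal Python sketch of that rule follows; the function names are illustrative, and the integer arithmetic is an assumption modelled on the documented behaviour rather than taken from this thread.

```python
from typing import Tuple


def parse_max_startups(value: str) -> Tuple[int, int, int]:
    """Parse 'start:rate:full'. A bare number N is modelled here as N:100:N
    (assumption: nothing is dropped below N, everything at or above N)."""
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 1:
        return parts[0], 100, parts[0]
    if len(parts) != 3:
        raise ValueError("expected N or start:rate:full, got %r" % value)
    return parts[0], parts[1], parts[2]


def drop_percent(unauthenticated: int, start: int, rate: int, full: int) -> int:
    """Percent chance that a new connection attempt is refused, given the
    current number of unauthenticated connections."""
    if unauthenticated < start:
        return 0
    if unauthenticated >= full or rate == 100:
        return 100
    # Linear ramp from `rate` percent at `start` up to 100 percent at `full`.
    return rate + (100 - rate) * (unauthenticated - start) // (full - start)


if __name__ == "__main__":
    start, rate, full = parse_max_startups("10:30:500")
    for pending in (5, 10, 255, 500):
        print(pending, "pending ->", drop_percent(pending, start, rate, full), "% refused")
```

With 10:30:500, refusals can therefore begin once 10 connections are still authenticating, well before 500 is reached.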

ludi

Mar 3, 2021, 9:00:19 PM
to Alric Kriel, Greenplum Users
Hi Alric,

I'm not sure why changing max_connections to fix the ETL issue would cause an SSH issue.

But if you have a problem starting Greenplum over SSH, please try these settings:
MaxSessions 1000
MaxStartups 1000


Since you didn't post the error message, I suggest you double-check the documentation: https://greenplum.docs.pivotal.io/6-14/install_guide/prep_os.html

Please search for "SSH connection threshold".

Thanks,
Dillon
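
Before and after changing these settings, it can help to confirm what each host actually has configured. The following is a minimal Python sketch, assuming the file lives at /etc/ssh/sshd_config and that no Match block overrides the values; the helper name read_ssh_limits is illustrative. It relies on the sshd_config(5) rule that the first value obtained for a keyword is the one used.

```python
import sys
from typing import Dict

KEYWORDS = ("maxsessions", "maxstartups")


def read_ssh_limits(config_text: str) -> Dict[str, str]:
    """Return the first MaxSessions / MaxStartups values found in the text."""
    found: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # sshd keywords are case-insensitive
        if keyword == "match":
            break  # settings after a Match line are conditional; stop here
        if keyword in KEYWORDS and len(parts) == 2 and keyword not in found:
            found[keyword] = parts[1].strip()
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_limits.py /etc/ssh/sshd_config
    with open(sys.argv[1]) as handle:
        for key, value in read_ssh_limits(handle.read()).items():
            print(key, "=", value)
```

Running `sshd -T` as root prints the effective configuration and is the more reliable check; sshd also needs to be restarted or reloaded on every host before a change takes effect.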



Alric Kriel <alric...@gmail.com> wrote on Thu, Mar 4, 2021 at 00:10:

Alric Kriel

Mar 3, 2021, 11:26:32 PM
to Greenplum Users, Dillon_lu, Greenplum Users, Alric Kriel
Thank you very much Dillon,

I will give it a try and also look at the documentation.

Kind Regards

Alric Kriel

Mar 4, 2021, 8:21:47 AM
to Greenplum Users, Alric Kriel, Dillon_lu, Greenplum Users
Hi Dillon,

I managed to find the error in the log of one of the segments that failed the moment I raised the connection limit too high. Please see below.

Kind Regards
2020-11-17 15:05:33.341127 GMT,,,p25189,th1927645312,,,,0,,,seg10,,,,,"ERROR","XX000","could not send end-of-streaming message to primary: no COPY in progress
",,,,,,,0,,"libpqwalreceiver.c",226,"Stack trace:
1    0xbf81ac postgres errstart (elog.c:557)
2    0xa4ef20 postgres <symbol not found> (libpqwalreceiver.c:224)
3    0xa43118 postgres WalReceiverMain (walreceiver.c:540)
4    0x79089a postgres AuxiliaryProcessMain (bootstrap.c:438)
5    0xa1504c postgres <symbol not found> (postmaster.c:5832)
6    0xa17087 postgres <symbol not found> (postmaster.c:2138)
7    0x7f597045b630 libpthread.so.0 <symbol not found> + 0x7045b630
8    0x7f596f8d4983 libc.so.6 __select + 0x13
9    0x6b9238 postgres <symbol not found> (postmaster.c:1894)
10   0xa187c2 postgres PostmasterMain (postmaster.c:1523)
11   0x6bdc01 postgres main (main.c:205)
12   0x7f596f801555 libc.so.6 __libc_start_main + 0xf5
13   0x6c986c postgres <symbol not found> + 0x6c986c
"
2020-11-17 15:05:33.532564 GMT,,,p25805,th1927645312,,,,0,,,seg10,,,,,"ERROR","XX000","could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host ""ca-edw-db04.togocom.tg"" (10.228.11.34) and accepting
        TCP/IP connections on port 7000?
",,,,,,,0,,"libpqwalreceiver.c",113,"Stack trace:
1    0xbf81ac postgres errstart (elog.c:557)
2    0xa4e703 postgres <symbol not found> (libpqwalreceiver.c:111)
3    0xa42b78 postgres WalReceiverMain (walreceiver.c:329)
4    0x79089a postgres AuxiliaryProcessMain (bootstrap.c:438)
5    0x6b96cf postgres <symbol not found> (postmaster.c:5837)
6    0xa187c2 postgres PostmasterMain (postmaster.c:1523)
7    0x6bdc01 postgres main (main.c:205)
8    0x7f596f801555 libc.so.6 __libc_start_main + 0xf5
9    0x6c986c postgres <symbol not found> + 0x6c986c
"
2020-11-17 15:05:38.216949 GMT,,,p25818,th1927645312,,,,0,,,seg10,,,,,"ERROR","XX000","could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host ""ca-edw-db04.togocom.tg"" (10.228.11.34) and accepting
        TCP/IP connections on port 7000?
",,,,,,,0,,"libpqwalreceiver.c",113,"Stack trace:
1    0xbf81ac postgres errstart (elog.c:557)
2    0xa4e703 postgres <symbol not found> (libpqwalreceiver.c:111)
3    0xa42b78 postgres WalReceiverMain (walreceiver.c:329)
4    0x79089a postgres AuxiliaryProcessMain (bootstrap.c:438)
5    0xa1504c postgres <symbol not found> (postmaster.c:5832)
6    0xa17087 postgres <symbol not found> (postmaster.c:2138)
7    0x7f597045b630 libpthread.so.0 <symbol not found> + 0x7045b630
8    0x7f5970458a2a libpthread.so.0 pthread_sigmask + 0x2a
9    0xa13be1 postgres <symbol not found> (postmaster.c:2972)
10   0x7f597045b630 libpthread.so.0 <symbol not found> + 0x7045b630
11   0x7f596f8d4983 libc.so.6 __select + 0x13
12   0x6b9238 postgres <symbol not found> (postmaster.c:1894)
13   0xa187c2 postgres PostmasterMain (postmaster.c:1523)
14   0x6bdc01 postgres main (main.c:205)
15   0x7f596f801555 libc.so.6 __libc_start_main + 0xf5
16   0x6c986c postgres <symbol not found> + 0x6c986c
"
2020-11-17 15:05:40.109946 GMT,"gpadmin","template1",p25831,th1927645312,"10.228.52.11","25064",2020-11-17 15:05:40 GMT,0,con53,,seg10,,,,,"FATAL","57M02","the database system is in recovery mode","last replayed record at 1/19ADA2A


ludi

Mar 5, 2021, 12:52:23 AM
to Alric Kriel, Greenplum Users
It looks like this instance can't connect to its primary or mirror instance.

Did you check the related primary or mirror instance?

Thanks,
Dillon
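
The "Connection refused" entries in the log are TCP-level refusals on the primary's port, which usually means nothing was listening there at that moment. One quick way to check the related instance is to test that port from the failing segment's host. This is a minimal Python sketch; the helper name can_connect and the 3-second timeout are illustrative choices, and the host and port would be the ones shown in the log (port 7000 in this case).

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_port.py <primary-host> <primary-port>
    host, port = sys.argv[1], int(sys.argv[2])
    print(host, port, "reachable" if can_connect(host, port) else "NOT reachable")
```

A False result only shows that the port is not accepting connections; the primary's own log is still needed to see why.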

Alric Kriel <alric...@gmail.com> wrote on Thu, Mar 4, 2021 at 21:21:

Alric Kriel

Mar 8, 2021, 10:14:00 AM
to Greenplum Users, Dillon_lu, Greenplum Users, Alric Kriel
Hi Dillon,

Yes, I did check the specific instance but could not see anything strange. What bugs me is that this occurs when I raise the SSH count above 500 in sshd_config. When starting up the database, all is fine. However, the moment I raise the max session count above 500, even with the SSH count at 1000, the segments start failing. The moment I bring it back down to 500, all works fine again.

Very strange.

Kind Regards
