"ERROR","58M01","failed to acquire resources on one or more segments"


Pilar de Teodoro

Feb 9, 2023, 2:50:49 PM
to gpdb-...@greenplum.org, Sara Nieto, virgini...@ext.esa.int

Dear all,


We are running a Greenplum 6.19.0 cluster on RHEL 7 with a 40G interconnect.

The cluster has 1 master, 1 standby and 4 data nodes hosting 20 primary segments (5 per node) and 20 mirror segments. The master and standby each have 256 GB RAM and 80 cores; the data nodes each have 512 GB RAM and 80 cores.


From time to time our queries hang and we find errors of this type:


2023-02-08 14:20:50.697116 CET,"postgres","pgeuclid_ops",p220860,th876542080,"192.168.118.46","29252",2023-02-07 20:55:09 CET,0,con66,cmd1412,seg-1,,,,sx1,"ERROR","58M01","failed to acquire resources on one or more segments","could not connect to server: Connection timed out
        Is the server running on host ""192.168.118.21"" and accepting
        TCP/IP connections on port 45000?
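
The log entry above reports a TCP connection timeout to 192.168.118.21 on port 45000. A minimal sketch, assuming Python 3 is available on the host that logged the error, for probing that address several times and recording how long each attempt takes. The helper names `probe_tcp` and `probe_repeatedly` are hypothetical, and because the failures are intermittent, a run of successful probes does not rule out a network problem.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals and unreachable hosts
        return False, time.monotonic() - start, str(exc)


def probe_repeatedly(host, port, attempts=10, pause=1.0, timeout=5.0):
    """Probe several times so that intermittent failures can show up."""
    results = []
    for _ in range(attempts):
        results.append(probe_tcp(host, port, timeout))
        time.sleep(pause)
    return results


# Example call, using the host and port from the log entry:
#   for ok, elapsed, err in probe_repeatedly("192.168.118.21", 45000):
#       print(ok, round(elapsed, 3), err)
```

Attempts that fail only after the full timeout has elapsed would match the "Connection timed out" text in the log, whereas an immediate refusal would point at the segment process rather than the network.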


When we ran gpcheckperf on the master and data nodes, it reported:


/home/gpadmin/software/bin/gpcheckperf -f hostfile_all -r N -d /tmp


-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT 2023-02-09T12:21:42.182903
====================
Netperf bisection bandwidth test
easgpopsmaster01.scieuc.lan -> easgpopsnode01.scieuc.lan = 2767.440000
easgpopsnode02.scieuc.lan -> easgpopsnode03.scieuc.lan = 2194.770000
easgpopsnode04.scieuc.lan -> easgpopsmaster01.scieuc.lan = 2401.130000
easgpopsnode01.scieuc.lan -> easgpopsmaster01.scieuc.lan = 2196.900000
easgpopsnode03.scieuc.lan -> easgpopsnode02.scieuc.lan = 2200.570000
easgpopsmaster01.scieuc.lan -> easgpopsnode04.scieuc.lan = 2796.790000

Summary:
sum = 14557.60 MB/sec
min = 2194.77 MB/sec
max = 2796.79 MB/sec
avg = 2426.27 MB/sec
median = 2401.13 MB/sec

[Warning] connection between easgpopsnode02.scieuc.lan and easgpopsnode03.scieuc.lan is no good
[Warning] connection between easgpopsnode04.scieuc.lan and easgpopsmaster01.scieuc.lan is no good
[Warning] connection between easgpopsnode01.scieuc.lan and easgpopsmaster01.scieuc.lan is no good
[Warning] connection between easgpopsnode03.scieuc.lan and easgpopsnode02.scieuc.lan is no good



What exactly is not good?
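
The thread does not say what rule gpcheckperf applies when it prints "is no good". The sketch below tests one hypothesis: that a link is flagged when its rate falls below 90% of the fastest link in the same run. That 0.9 factor is an assumption, not something confirmed here, although it does reproduce exactly the four warnings in this report. The function names are hypothetical.

```python
import re

# Result lines copied from the gpcheckperf report above.
REPORT = """\
easgpopsmaster01.scieuc.lan -> easgpopsnode01.scieuc.lan = 2767.440000
easgpopsnode02.scieuc.lan -> easgpopsnode03.scieuc.lan = 2194.770000
easgpopsnode04.scieuc.lan -> easgpopsmaster01.scieuc.lan = 2401.130000
easgpopsnode01.scieuc.lan -> easgpopsmaster01.scieuc.lan = 2196.900000
easgpopsnode03.scieuc.lan -> easgpopsnode02.scieuc.lan = 2200.570000
easgpopsmaster01.scieuc.lan -> easgpopsnode04.scieuc.lan = 2796.790000
"""

LINE = re.compile(r"^(\S+) -> (\S+) = ([0-9.]+)$")


def parse_results(text):
    """Return a list of (source, destination, rate_in_MB_per_sec)."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((match.group(1), match.group(2), float(match.group(3))))
    return rows


def flag_slow_links(rows, fraction_of_max=0.9):
    """ASSUMPTION: a link is 'no good' if it is below fraction_of_max * fastest."""
    fastest = max(rate for _, _, rate in rows)
    return [row for row in rows if row[2] < fraction_of_max * fastest]


rows = parse_results(REPORT)
rates = [rate for _, _, rate in rows]
print("sum = %.2f  min = %.2f  max = %.2f  avg = %.2f"
      % (sum(rates), min(rates), max(rates), sum(rates) / len(rates)))
for src, dst, rate in flag_slow_links(rows):
    print("below threshold: %s -> %s (%.2f MB/sec)" % (src, dst, rate))
```

If this reading is right, the warnings are relative: they say that some links are noticeably slower than the best one in this run, not that any link is below an absolute limit.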





We are using default parameters in the configuration except for:


On the master:

 

log_statement=all
gp_contentid=-1
gp_vmem_protect_limit=84760
gp_interconnect_type=tcp
gp_interconnect_tcp_listener_backlog=4096
#max_connections=500
max_connections=500
max_prepared_transactions=500
#gp_fts_probe_timeout=500
gp_fts_probe_timeout=500
shared_buffers=64GB
effective_cache_size=192GB
maintenance_work_mem=2GB
checkpoint_completion_target=0.9
wal_buffers=16MB
random_page_cost=1.1
effective_io_concurrency=200
max_worker_processes=80

 

On the data nodes:

 

gp_contentid=0
gp_vmem_protect_limit=84760
gp_interconnect_type=tcp
gp_interconnect_tcp_listener_backlog=4096
max_connections=2500
max_prepared_transactions=2500
log_statement=all
gp_fts_probe_timeout=500
shared_buffers=8GB
effective_cache_size=24GB
maintenance_work_mem=256MB
checkpoint_completion_target=0.9
wal_buffers=16MB
random_page_cost=1.1
effective_io_concurrency=25
max_worker_processes=10


Our application connects to the Greenplum cluster using JDBC 42.2.


How can we know what is really happening? What should we check on the network?



Thank you very much for any suggestions,


Pilar





Kevin Huang

Feb 10, 2023, 5:34:18 PM
to Greenplum Users, Pilar de Teodoro, Sara Nieto, virgini...@ext.esa.int
Hi Pilar,

Is this a new cluster? Is there any particular reason for using gp_interconnect_type=tcp? In my experience, udpifc is the most commonly used interconnect type for Greenplum clusters. You could also check for packet drops on the master and segment hosts.
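
One way to check for packet drops on a Linux host is to read the per-interface counters in /proc/net/dev. A minimal sketch, assuming Python 3 on each host; the function names are hypothetical, and the counters are cumulative since boot, so it is the change between two readings that matters rather than the absolute value.

```python
def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {interface: error and drop counters}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:  # 8 receive columns followed by 8 transmit columns
            continue
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def read_drop_counters(path="/proc/net/dev"):
    """Read the live counters from the local host."""
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())


# Example call on a Linux host:
#   for iface, counters in read_drop_counters().items():
#       print(iface, counters)
```

Taking one reading before and one after a period in which queries hang, on the master and on every segment host, would show whether the drop counters on the interconnect interfaces are increasing.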

Kevin 

Danilo Fortunato

Feb 23, 2023, 4:44:17 PM
to gpdb-...@greenplum.org, pilar.d...@gmail.com
Pilar,
I have also had the error "failed to acquire resources on one or more segments".
In my case it was caused by two Linux parameters that were wrongly set to very low values on new servers added to an existing cluster:
- soft nproc (it should be set as described in the Greenplum installation guide)
- kernel.pid_max (its value was different from, and much lower than, the value on the other servers in the cluster).
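
The two settings above can be read on each host and compared across the cluster. A minimal sketch, assuming Linux and Python 3, run as the user that starts Greenplum (the soft nproc limit is per user session); the function names are hypothetical, and the correct target values are those in the Greenplum installation guide, which this sketch does not encode.

```python
import resource


def read_process_limits(pid_max_path="/proc/sys/kernel/pid_max"):
    """Return the soft/hard nproc limits of this process and kernel.pid_max."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    try:
        with open(pid_max_path) as handle:
            pid_max = int(handle.read().strip())
    except (OSError, ValueError):
        pid_max = None  # file missing or unreadable on this system
    return {"nproc_soft": soft, "nproc_hard": hard, "pid_max": pid_max}


def describe(limit):
    """Render a resource limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# Example call on one host:
#   limits = read_process_limits()
#   print("soft nproc:", describe(limits["nproc_soft"]))
#   print("kernel.pid_max:", limits["pid_max"])
```

A host whose values differ from the rest of the cluster, as in the case described above, is the one to look at first.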

Hope it helps.

Regards,
Danilo Fortunato



From: Kevin Huang <kchua...@gmail.com> (kchua...@gmail.com)
Sent: Friday, 10 February 2023 23:34
To: Greenplum Users (gpdb-...@greenplum.org)
Cc: Pilar De Teodoro, Sara Nieto, Virgini .. (pilar.d...@gmail.com, sara....@esa.int, virgini...@ext.esa.int)
Subject: [gpdb-users] Re: "ERROR","58M01","failed to acquire resources on one or more segments"

 
