ERROR: failed to acquire resources on one or more segments


Maciej Wawrzyniak

Nov 24, 2021, 11:28:50 AM
to Greenplum Users
Hi,

We've installed Greenplum Database 6.18.1 and occasionally experience two errors :

First:
2021-11-20 12:18:29.254286 CET,"dwh_cdr_loop","dwh",p56296,th-22648704,"10.10.114.132","30718",2021-11-20 11:55:17 CET,0,con379,cmd3279,seg-1,,dx2051,,sx1,"ERROR","58M01","failed to acquire resources on one or more segments","FATAL:  writer segworker group shared snapshot collision on id 379. Slot array dump: Local SharedSnapshot Slot Dump: currSlots: 53 maxSlots: 1836 (SLOT index: 0 slotid: 550 QDxid: 695 pid: 287678)(SLOT index: 1 slotid: 379 QDxid: 2033 pid: 357272)(SLOT index: 2 slotid: 504 QDxid: 563 pid: 284117)(SLOT index: 3 slotid: 378 QDxid: 252 pid: 268515)(SLOT index: 4 slotid: 632 QDxid: 2054 pid: 358053)(SLOT index: 5 slotid: 1218 QDxid: 1982 pid: 352685)(SLOT index: 6 slotid: 508 QDxid: 571 pid: 284311)(SLOT index: 7 slotid: 596 QDxid: 816 pid: 291547)(SLOT index: 8 slotid: 511 QDxid: 578 pid: 284552)(SLOT index: 9 slotid: 39 QDxid: 2003 pid: 354276)(SLOT index: 10 slotid: 514 QDxid: 593 pid: 284760)(SLOT index: 11 slotid: 517 QDxid: 605 pid: 285048)(SLOT index: 12 slotid: 585 QDxid: 786 pid: 290436)(SLOT index: 13 slotid: 523 QDxid: 620 pid: 285482)(SLOT index: 14 slotid: 586 QDxid: 787 pid: 290483)(SLOT index: 15 slotid: 623 QDxid: 898 pid: 294186)(SLOT index: 16 slotid: 1268 QDxid: 2017 pid: 356007)(SLOT index: 17 slotid: 784 QDxid: 1352 pid: 310594)(SLOT index: 18 slotid: 530 QDxid: 642 pid: 286162)(SLOT index: 19 slotid: 1241 QDxid: 2005 pid: 354372)(SLOT index: 21 slotid: 1292 QDxid: 0 pid: 357592)(SLOT index: 22 slotid: 1066 QDxid: 1767 pid: 341048)(SLOT index: 23 slotid: 539 QDxid: 669 pid: 286807)(SLOT index: 24 slotid: 537 QDxid: 666 pid: 286605)(SLOT index: 25 slotid: 630 QDxid: 932 pid: 295235)(SLOT index: 26 slotid: 603 QDxid: 840 pid: 292248)(SLOT index: 27 slotid: 660 QDxid: 1014 pid: 298198)(SLOT index: 28 slotid: 1294 QDxid: 0 pid: 357694)(SLOT index: 29 slotid: 547 QDxid: 692 pid: 287401)(SLOT index: 30 slotid: 1077 QDxid: 1816 pid: 342724)(SLOT index: 31 slotid: 670 QDxid: 1034 pid: 299305)(SLOT index: 32 slotid: 577 QDxid: 764 pid: 
289409)(SLOT index: 33 slotid: 516 QDxid: 2053 pid: 356694)(SLOT index: 34 slotid: 1260 QDxid: 2015 pid: 355458)(SLOT index: 36 slotid: 575 QDxid: 0 pid: 289468)(SLOT index: 37 slotid: 1261 QDxid: 2012 pid: 355480)(SLOT index: 38 slotid: 619 QDxid: 886 pid: 294039)(SLOT index: 39 slotid: 1262 QDxid: 2024 pid: 355497)(SLOT index: 40 slotid: 1155 QDxid: 2040 pid: 357445)(SLOT index: 41 slotid: 607 QDxid: 851 pid: 292402)(SLOT index: 42 slotid: 1263 QDxid: 2020 pid: 355523)(SLOT index: 44 slotid: 1296 QDxid: 2044 pid: 357739)(SLOT index: 45 slotid: 1295 QDxid: 2045 pid: 357781)(SLOT index: 47 slotid: 641 QDxid: 967 pid: 296851)(SLOT index: 48 slotid: 611 QDxid: 865 pid: 292623)(SLOT index: 49 slotid: 613 QDxid: 868 pid: 293094)(SLOT index: 50 slotid: 1297 QDxid: 0 pid: 357858)(SLOT index: 51 slotid: 1300 QDxid: 2052 pid: 357906)(SLOT index: 52 slotid: 653 QDxid: 993 pid: 297617)(SLOT index: 55 slotid: 1067 QDxid: 2048 pid: 358242)(SLOT index: 57 slotid: 656 QDxid: 1001 pid: 297815)(SLOT index: 58 slotid: 737 QDxid: 1226 pid: 305607)(SLOT index: 64 slotid: 685 QDxid: 1104 pid: 300845) (sharedsnapshot.c:394)(seg23 10.10.114.137:6023)",,,,,"set statement_mem = '5MB'; INSERT INTO XXXX (A, B, C, D, E) VALUES (0, now()::timestamp, 1, 1, 'XYZ');",0,,"cdbgang_async.c",241,
2021-11-20 12:18:29.257502 CET,"dwh_cdr_loop","dwh",p56296,th-22648704,"10.10.114.132","30718",2021-11-20 11:55:17 CET,0,con1311,,seg-1,,dx2051,,sx1,"LOG","00000","The previous session was reset because its gang was disconnected (session id = 379). The new session id = 1311",,,,,,,0,,"cdbgang.c",795,
2021-11-20 12:18:29.258096 CET,"dwh_cdr_loop","dwh",p56282,th-22648704,"10.10.114.132","30524",2021-11-20 11:55:17 CET,0,con378,cmd2485,seg-1,,dx252,,sx2,"WARNING","01000","DBD::Pg::st execute failed: ERROR:  failed to acquire resources on one or more segment

Second:
WARNING: WARNING: WARNING: Transaction aborted because DBD::Pg::st execute failed: ERROR: failed to acquire resources on one or more segments
DETAIL: could not connect to server: Connection timed out
Is the server running on host "10.10.114.137" and accepting
TCP/IP connections on port 6018?
(seg18 10.10.114.137:6018) at line 20.

Errors occur randomly in our ETL processes and in users' SQL queries.
Is there a known solution, and what is the best way to debug this?

I'll be happy to provide any additional information that helps with troubleshooting.
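
As a starting point for reading the first error, here is a minimal Python sketch that pulls the slot entries out of the "Slot array dump" text and returns the ones whose slotid equals the session id named in the collision message. The function name and the regular expressions are illustrative assumptions based only on the log text quoted above; they are not part of Greenplum.

```python
import re

# One "(SLOT index: N slotid: N QDxid: N pid: N)" entry from the dump.
# \s* after "pid:" tolerates a line break inside the entry.
SLOT_RE = re.compile(
    r"\(SLOT index: (\d+) slotid: (\d+) QDxid: (\d+) pid:\s*(\d+)\)"
)
COLLISION_RE = re.compile(r"shared snapshot collision on id (\d+)")


def find_colliding_slot(detail):
    """Return (session_id, matching_slots) for a collision detail string.

    matching_slots lists every dumped slot whose slotid equals the
    session id named in the collision message. Returns (None, []) when
    the text contains no collision message.
    """
    match = COLLISION_RE.search(detail)
    if match is None:
        return None, []
    session_id = int(match.group(1))
    slots = [
        {"index": int(i), "slotid": int(s), "qdxid": int(x), "pid": int(p)}
        for i, s, x, p in SLOT_RE.findall(detail)
    ]
    return session_id, [s for s in slots if s["slotid"] == session_id]
```

Applied to the dump above, the only entry with slotid 379 is slot index 1 (QDxid 2033, pid 357272), while the failing statement was logged as dx2051. That may point to an earlier writer for the same session still holding the slot on seg23, but this is only a reading of the log, not a confirmed diagnosis.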

Regards


Luis Filipe de Macedo

Nov 24, 2021, 12:43:31 PM
to Maciej Wawrzyniak, Greenplum Users
Looks like you are saturating your resources.

Do you have resource groups properly configured? Did you limit your queues?

https://linuxpolska.pl/

Linux Polska Sp. z o.o.

Al. Jerozolimskie 100, 00-807 Warszawa

tel. +48 22 213 95 71, fax +48 22 213 96 71

KRS 00000326158, Sąd Rejonowy dla M. St. Warszawy w Warszawie, XII Wydział Gospodarczy KRS

Kapitał zakładowy 1 000 500 PLN wpłacony w całości, NIP 7010181018, REGON 141791601

 

www.linuxpolska.pl |  https://www.linkedin.com/company/linux-polska/   https://www.facebook.com/linuxpolskapl   https://twitter.com/linuxpolska

_________________________________________________________________________

  

This message may contain confidential information that is covered by legal privilege. If you are not the intended recipient or if you have received this message by mistake, please notify the sender immediately and delete this e-mail and its attachments from your system. Any unauthorized copying, disclosure or distribution of the material in this e-mail and its attachments is strictly forbidden.



--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+...@greenplum.org.
To view this discussion on the web visit https://groups.google.com/a/greenplum.org/d/msgid/gpdb-users/a1e07b05-0e26-4521-a44e-6051921814e4n%40greenplum.org.

Ashwin Agrawal

Nov 24, 2021, 1:53:08 PM
to Maciej Wawrzyniak, Greenplum Users
On Wed, Nov 24, 2021 at 8:28 AM Maciej Wawrzyniak <maciej.w...@linuxpolska.pl> wrote:
First:

I had started a discussion about this problem on the gpdb-dev list [1]. I would like to understand better in which scenarios you are seeing this snapshot collision error.
Is the GUC gp_vmem_idle_resource_timeout coming into play for these sessions?
Do you have a workload where a session stays idle for longer than gp_vmem_idle_resource_timeout and then runs queries again on the same session?
If so, increasing the value of gp_vmem_idle_resource_timeout will most likely resolve the situation.
In the past, increasing the value of the GUC gp_snapshotadd_timeout has also provided relief from this error (as a developer I personally don't like this solution, although it has no downsides).
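
One rough way to check the idle-session scenario is to look for sessions that go quiet for longer than the timeout and then log again. The Python sketch below does that from master CSV log lines. The column positions (timestamp in column 0, session id such as con379 in column 9) are inferred from the log lines quoted in this thread and are an assumption, as is the function name. It only sees activity that was actually logged, so a gap in the log is a hint rather than proof of an idle session. Pass the timeout in seconds as reported by `SHOW gp_vmem_idle_resource_timeout;` on your system.

```python
import csv
from collections import defaultdict
from datetime import datetime

# Column positions inferred from the log lines quoted in this thread.
COL_TIME, COL_SESSION = 0, 9


def parse_time(field):
    """Parse '2021-11-20 12:18:29.254286 CET', ignoring the zone name."""
    date_part, time_part = field.split()[:2]
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in time_part else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(f"{date_part} {time_part}", fmt)


def idle_gaps(lines, timeout_seconds):
    """Yield (session, gap_start, gap_end) for every pair of consecutive
    log entries of one session that are more than timeout_seconds apart."""
    times = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) > COL_SESSION and row[COL_SESSION].startswith("con"):
            times[row[COL_SESSION]].append(parse_time(row[COL_TIME]))
    for session, stamps in times.items():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            if (later - earlier).total_seconds() > timeout_seconds:
                yield session, earlier, later
```

If the collision errors line up with the end of such gaps, that would support the idle-timeout explanation; if they do not, the cause is probably elsewhere.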

Second:
WARNING: WARNING: WARNING: Transaction aborted because DBD::Pg::st execute failed: ERROR: failed to acquire resources on one or more segments
DETAIL: could not connect to server: Connection timed out
Is the server running on host "10.10.114.137" and accepting
TCP/IP connections on port 6018?
(seg18 10.10.114.137:6018) at line 20.

It seems seg18 crashed, based on the message. Please look at the database log files in pg_log for seg18.
They should contain a stack trace or PANIC string that identifies the reason for the failure, the query involved, and so on.
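
To narrow down a large segment log, a small filter like the following Python sketch can list PANIC, FATAL and ERROR entries close to the time of the incident. It assumes the segment's CSV log uses the same column layout as the master log lines quoted in this thread (severity in column 16, message in column 18); the function name and the default 60-second window are illustrative choices.

```python
import csv
from datetime import datetime, timedelta

# Column positions inferred from the log lines quoted in this thread.
COL_TIME, COL_SEVERITY, COL_MESSAGE = 0, 16, 18
SEVERE = {"PANIC", "FATAL", "ERROR"}


def parse_time(field):
    """Parse '2021-11-20 12:18:29.254286 CET', ignoring the zone name."""
    date_part, time_part = field.split()[:2]
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in time_part else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(f"{date_part} {time_part}", fmt)


def severe_entries(lines, incident, window_seconds=60):
    """Return (time, severity, message) for severe log entries that fall
    within window_seconds before or after the incident time."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for row in csv.reader(lines):
        if len(row) <= COL_MESSAGE or row[COL_SEVERITY] not in SEVERE:
            continue
        try:
            when = parse_time(row[COL_TIME])
        except ValueError:
            continue  # first column is not a timestamp; skip the row
        if abs(when - incident) <= window:
            hits.append((when, row[COL_SEVERITY], row[COL_MESSAGE]))
    return hits
```

For example, open the seg18 log file with `newline=""` and call `severe_entries(log_file, datetime(2021, 11, 20, 12, 18, 29))`, replacing the timestamp with the time of the connection error you are investigating. An empty result does not rule out a problem; it only means nothing severe was logged in that window.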



-- 
Ashwin Agrawal (VMware)

Maciej Wawrzyniak

Nov 25, 2021, 10:33:13 AM
to Ashwin Agrawal, mac...@vmware.com, Greenplum Users
@Luis
We migrated from GP5 to GP6 last week and haven't had time to look at resource groups yet (it is in our backlog).
Resource queues from GP5:

ALTER RESOURCE QUEUE pg_default ACTIVE THRESHOLD 3 NOOVERCOMMIT WITH (priority = max, memory_limit = '375MB');
CREATE RESOURCE QUEUE dr_1 ACTIVE THRESHOLD 20 NOOVERCOMMIT WITH (priority = MEDIUM, memory_limit = '4000MB');
CREATE RESOURCE QUEUE pg_2 ACTIVE THRESHOLD 15 NOOVERCOMMIT WITH (priority = MEDIUM, memory_limit = '2GB');
CREATE RESOURCE QUEUE pg_3 ACTIVE THRESHOLD 15 NOOVERCOMMIT WITH (priority = MEDIUM, memory_limit = '2000MB');
CREATE RESOURCE QUEUE pg_4 ACTIVE THRESHOLD 4 NOOVERCOMMIT IGNORE THRESHOLD 500.00 WITH (priority = low, memory_limit = '500MB');
CREATE RESOURCE QUEUE pg_5 ACTIVE THRESHOLD 6 NOOVERCOMMIT WITH (priority = MEDIUM, memory_limit = '1GB');
CREATE RESOURCE QUEUE pg_6 ACTIVE THRESHOLD 12 NOOVERCOMMIT IGNORE THRESHOLD 500.00 WITH (priority = MEDIUM, memory_limit = '4GB');
CREATE RESOURCE QUEUE pg_7 ACTIVE THRESHOLD 15 NOOVERCOMMIT WITH (priority = MEDIUM, memory_limit = '2GB');
CREATE RESOURCE QUEUE pg_8 ACTIVE THRESHOLD 9 NOOVERCOMMIT IGNORE THRESHOLD 1000.00 WITH (priority = high, memory_limit = '2GB');
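
To put numbers on these limits, the following Python sketch parses statements in the form shown above and totals the ACTIVE THRESHOLD and memory_limit values. The regular expression is written only for this dump format, the function name is made up for the example, and 1 GB is treated as 1024 MB, which is an assumption about how the values should be compared.

```python
import re

# Matches one queue definition per line in the dump format shown above.
QUEUE_RE = re.compile(
    r"(?:CREATE|ALTER) RESOURCE QUEUE (\w+) ACTIVE THRESHOLD (\d+).*?"
    r"memory_limit = '(\d+)(MB|GB)'",
    re.IGNORECASE,
)


def summarize_queues(ddl):
    """Return (queues, total_active, total_memory_mb).

    queues maps each queue name to (active_threshold, memory_limit_mb).
    """
    queues = {}
    for name, threshold, amount, unit in QUEUE_RE.findall(ddl):
        megabytes = int(amount) * (1024 if unit.upper() == "GB" else 1)
        queues[name] = (int(threshold), megabytes)
    total_active = sum(active for active, _ in queues.values())
    total_memory_mb = sum(mb for _, mb in queues.values())
    return queues, total_active, total_memory_mb
```

For the nine queues listed here, the active thresholds add up to 3 + 20 + 15 + 15 + 4 + 6 + 12 + 15 + 9 = 99, so up to 99 statements could be active at once if every queue were full. The memory limits add up to 18139 MB under the 1 GB = 1024 MB assumption.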

@Ashwin
It's strange, but there is no error-like message in pg_log on seg18 around that time. It looks like seg18 was running the whole time.
--

Pozdrawiam/Regards,

 

Maciej Wawrzyniak

Senior Solutions Architect

tel. +48 600 050 583

Luis Filipe de Macedo

Nov 25, 2021, 11:18:47 AM
to Maciej Wawrzyniak, Ashwin Agrawal, Greenplum Users
The issue appeared after the upgrade?

Can you tell if the issue happens when you have many concurrent queries?

Is your environment virtual or physical?

Rgds

Maciej Wawrzyniak

Nov 29, 2021, 5:08:58 AM
to Greenplum Users, Luis Macedo, Greenplum Users, Ashwin Agrawal
Yes, the issue appeared after we migrated from 5 to 6 (we removed GP5 and installed GP6). The environment is exactly the same: 6 physical servers (master, standby master, 4 x workers).
I will try to correlate the errors with multiple simultaneous queries and come back with feedback.

Luis Filipe de Macedo

Nov 29, 2021, 8:13:19 AM
to Maciej Wawrzyniak, Greenplum Users, Greenplum Users, Ashwin Agrawal

What is your network setup?

 

I see this type of error when, for one reason or another, one segment can't talk to another. Maybe GP6 is more chatty and you are hitting some limit on the number of connections.

 

Can you say what the CPU usage on the servers looked like at the time of the incident?
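
To test whether a segment port is reachable from another host, independently of the database, a plain TCP probe can help. The Python sketch below makes one connection attempt with a timeout and reports how long it took. The function name and the 5-second default timeout are arbitrary choices for illustration; it checks only that the port accepts TCP connections, not that the segment is healthy.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connection.

    Returns (ok, seconds_elapsed, error_text); error_text is empty
    when the connection succeeded.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refused connections and timeouts
        return False, time.monotonic() - start, str(exc)
```

Calling `probe("10.10.114.137", 6018)` repeatedly from the master and from the other segment hosts while the ETL load is running would show whether connection attempts to seg18 time out intermittently, which would fit a network or connection-limit problem better than a crashed segment.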

 

Rgds,

 

Luis F R Macedo

Advisory Data Engineer & Business Development for Latam

VMware Tanzu Data

Call Me @ +55 11 98860 8596 (new)

Take care of the customers and the rest takes care of itself
