Master process received signal SIGSEGV

lambgong

Mar 21, 2019, 4:39:11 AM
to Greenplum Users
Hi,

   My GP cluster encountered the problem in the title. The master log is as follows:

2019-03-19 18:06:04.589008 CST,,,p25193,th0,,,2019-03-19 17:09:14 CST,0,con4032051,cmd99,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Master process received signal SIGSEGV",,,,,,,0,,,,"1    0x92a063 postgres StandardHandlerForSigillSigsegvSigbus_OnMainThread + 0x163
2    0x7fa325c8b370 libpthread.so.0 <symbol not found> + 0x25c8b370
3    0x957854 postgres MemoryContextFreeImpl + 0x4
4    0x9e686c postgres add_second_stage_agg + 0x3ac
5    0x9e711a postgres <symbol not found> + 0x9e711a
6    0x9ebd41 postgres cdb_grouping_planner + 0x2891
7    0x9ed0c0 postgres <symbol not found> + 0x9ed0c0
8    0x9ee2da postgres <symbol not found> + 0x9ee2da
9    0x9efa3f postgres within_agg_planner + 0xc5f
10   0x766add postgres <symbol not found> + 0x766add
11   0x768720 postgres subquery_planner + 0x710
12   0x768c37 postgres standard_planner + 0x167
13   0x769385 postgres planner + 0x145
14   0x80d92c postgres <symbol not found> + 0x80d92c
15   0x80ee82 postgres <symbol not found> + 0x80ee82
16   0x811254 postgres PostgresMain + 0x1a84
17   0x7b3fe6 postgres <symbol not found> + 0x7b3fe6
18   0x7b59e2 postgres PostmasterMain + 0xc52
19   0x4c833d postgres main + 0x39d
20   0x7fa3250bbb35 libc.so.6 __libc_start_main + 0xf5
21   0x4c88d1 postgres <symbol not found> + 0x4c88d1

2019-03-19 18:06:04.602972 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 25193) was terminated by signal 11: Segmentation fault",,,,,,,0,,"postmaster.c",5574,
2019-03-19 18:06:04.603018 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","terminating any other active server processes",,,,,,,0,,"postmaster.c",5253,
2019-03-19 18:06:04.612948 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","ftsprobe process (PID 3874) exited with exit code 2",,,,,,,0,,"postmaster.c",5554,
2019-03-19 18:06:04.612994 CST,"dw3d","etl3d",p13195,th686254144,"172.19.0.8","50478",2019-03-19 18:06:04 CST,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode",,,,,,,0,,"postmaster.c",2924,
2019-03-19 18:06:04.618214 CST,,,p3878,th686254144,,,,0,con3736419,,seg-1,,,,,"FATAL","57P01","terminating connection due to administrator command",,,,,,,0,,"postgres.c",3650,
2019-03-19 18:06:04.618276 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","sweeper process (PID 3876) exited with exit code 2",,,,,,,0,,"postmaster.c",5554,
2019-03-19 18:06:04.618393 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","seqserver process (PID 3873) exited with exit code 2",,,,,,,0,,"postmaster.c",5554,
2019-03-19 18:06:04.618407 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","stats sender process (PID 3878) exited with exit code 1",,,,,,,0,,"postmaster.c",5554,
2019-03-19 18:06:04.627729 CST,,,p8857,th686254144,,,,0,,,seg-1,,,,,"LOG","00000","BeginResetOfPostmasterAfterChildrenAreShutDown: counter 66",,,,,,,0,,"postmaster.c",2147,

So my question is: what may have caused this? System resources seem to be sufficient.

Heikki Linnakangas

Mar 21, 2019, 5:05:57 AM
to lambgong, Greenplum Users
Hi!

On 21/03/2019 10:39, lambgong wrote:
>    My GP cluster encountered the problem in the title. The master log is
> as follows:

Looks like a bug.

What version of Greenplum are you running? Have you updated to the
latest stable version? What is the query that caused this?
This sounds similar to the bugs that were fixed here:

https://github.com/greenplum-db/gpdb/commit/0668b582ceb9cf6ab8775bfd1db8b23949e5fa16

and here:

https://github.com/greenplum-db/gpdb/commit/19ad4bdedb120929d70806b11470cf131ceb21be

If you're running the latest 5.X version, you should have those fixes
already, though. But perhaps there's another similar bug still lurking
there?

If you can create a self-contained test case to trigger the bug, with
CREATE TABLE and INSERTs to reconstruct the tables, and the SELECT that
caused the crash, that would help a lot in finding the bug.

- Heikki

lambgong

Mar 21, 2019, 5:41:17 AM
to Greenplum Users, gongga...@gmail.com
What version of Greenplum are you running?
-- My version is a 5X_STABLE build published in 2018-05.

What is the query that caused this?
-- Create external table; insert into inner table select from external table; drop external table


Heikki Linnakangas

Mar 21, 2019, 9:05:58 AM
to lambgong, Greenplum Users
On 21/03/2019 11:41, lambgong wrote:
>> What version of Greenplum are you running?
>
> My version is a 5X_STABLE build published in 2018-05.

Ok. You really need to upgrade. The latest minor version is 5.17.0.
There have been a lot of bug fixes between your checkout and 5.17.0.
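
For readers following along, one way to confirm which build a cluster is actually running, before and after an upgrade, is the standard version() function that Greenplum inherits from PostgreSQL. This is only a quick sketch; the exact output string varies by build, so none is shown here.

```sql
-- Reports the PostgreSQL base version together with the Greenplum build string.
SELECT version();
```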

>> What is the query that caused this?
>
> Create external table; insert into inner table select from external
> table; drop external table

That can't be the whole story. Based on the stack trace, there must be
WITHIN GROUP somewhere in the query.

If you can still reproduce it after upgrading, please try to write a
self-contained script, to reproduce the issue, and post it here.
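
As a rough illustration of what such a self-contained script could look like, the sketch below rebuilds a table with the same name and column names as the statement that appears in the log later in this thread. Everything else is an assumption made for illustration: the column types, the distribution key, and the sample rows are invented, and nothing here is known to trigger the crash. It only shows the shape of a script that others can run as-is.

```sql
-- Hypothetical schema: only the table and column names come from the thread.
-- Column types and the distribution key are guesses.
CREATE TABLE m_login (
    fdate       integer,
    platform    text,
    channel     text,
    startup_cnt integer
) DISTRIBUTED BY (channel);

-- Invented sample rows, just enough to give each channel a few values.
INSERT INTO m_login VALUES
    (20190318, 'islandegg', 'channel_a', 1),
    (20190318, 'islandegg', 'channel_a', 5),
    (20190318, 'islandegg', 'channel_b', 3),
    (20190318, 'other',     'channel_b', 7);

-- Same shape as the statement in the log: a median per channel
-- computed with an ordered-set aggregate (WITHIN GROUP).
SELECT channel,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY startup_cnt) AS med
FROM m_login
WHERE fdate = 20190318 AND platform IN ('islandegg')
GROUP BY channel;

-- Clean up so the script can be rerun.
DROP TABLE m_login;
```

If the crash depends on the real data or table definition, the actual DDL and a small data sample would need to replace the invented parts above.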

- Heikki

lambgong

Mar 21, 2019, 10:37:22 AM
to Greenplum Users, gongga...@gmail.com
That can't be the whole story. Based on the stack trace, there must be
WITHIN GROUP somewhere in the query.
-- Yes, you are right. I found the full log, as follows:

2019-03-19 18:06:04.552088 CST,"dw3d","etl3d",p25193,th686254144,"172.19.0.8","43972",2019-03-19 17:09:14 CST,0,con4032051,cmd99,seg-1,,dx1885228,,sx1,"LOG","00000","Planner produced plan :0",,,,,,"select

      channel,
      PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY STARTUP_CNT)    AS MED
from m_login
where fdate = 20190318 and platform in ('islandegg')
group by channel;",0,,"orca.c",60,
2019-03-19 18:06:04.589008 CST,,,p25193,th0,,,2019-03-19 17:09:14 CST,0,con4032051,cmd99,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Master process received signal SIGSEGV",,,,,,,0,,,,"1    0x92a063 postgres StandardHandlerForSigillSigsegvSigbus_OnMainThread + 0x163
2    0x7fa325c8b370 libpthread.so.0 <symbol not found> + 0x25c8b370
3    0x957854 postgres MemoryContextFreeImpl + 0x4
4    0x9e686c postgres add_second_stage_agg + 0x3ac
5    0x9e711a postgres <symbol not found> + 0x9e711a
6    0x9ebd41 postgres cdb_grouping_planner + 0x2891
7    0x9ed0c0 postgres <symbol not found> + 0x9ed0c0
8    0x9ee2da postgres <symbol not found> + 0x9ee2da
9    0x9efa3f postgres within_agg_planner + 0xc5f
10   0x766add postgres <symbol not found> + 0x766add
11   0x768720 postgres subquery_planner + 0x710
12   0x768c37 postgres standard_planner + 0x167
13   0x769385 postgres planner + 0x145
14   0x80d92c postgres <symbol not found> + 0x80d92c
15   0x80ee82 postgres <symbol not found> + 0x80ee82
16   0x811254 postgres PostgresMain + 0x1a84
17   0x7b3fe6 postgres <symbol not found> + 0x7b3fe6
18   0x7b59e2 postgres PostmasterMain + 0xc52
19   0x4c833d postgres main + 0x39d
20   0x7fa3250bbb35 libc.so.6 __libc_start_main + 0xf5
21   0x4c88d1 postgres <symbol not found> + 0x4c88d1


lambgong

Mar 21, 2019, 10:46:30 AM
to Greenplum Users, gongga...@gmail.com

So the problem is caused by the WITHIN GROUP query? And under what conditions does it occur?
Thanks!


Heikki Linnakangas

Mar 22, 2019, 3:15:00 AM
to lambgong, Greenplum Users
On 21/03/2019 16:46, lambgong wrote:
> So the problem is caused by the WITHIN GROUP query? And under what
> conditions does it occur?

I'm not sure. But you need to upgrade. That will probably fix the problem.

- Heikki

lambgong

Mar 22, 2019, 4:12:32 AM
to Greenplum Users, gongga...@gmail.com
Thanks, I will upgrade and try the query!
