Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#993848: postgresql-13: A SELECT query with cursor can cause segmentation fault

906 views
Skip to first unread message

Tomas Barton

unread,
Sep 7, 2021, 4:40:04 AM9/7/21
to
Package: postgresql-13
Version: 13.4-1.pgdg100+1
Severity: normal

Dear Maintainer,

a slightly sophisticated SELECT query with a CURSOR can lead to
postgresql server segmentation fault.

LOG:  server process (PID 10722) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: COMMIT
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Here's a backtrace retrieved from core dump:

(gdb) bt
#0  slot_deform_heap_tuple (natts=5, offp=0x557cc2e8ffd0, tuple=<optimized out>, slot=0x557cc2e8ff88) at ./build/../src/backend/executor/execTuples.c:930
#1  tts_buffer_heap_getsomeattrs (slot=0x557cc2e8ff88, natts=5) at ./build/../src/backend/executor/execTuples.c:695
#2  0x0000557cc1d3998c in slot_getsomeattrs_int (slot=slot@entry=0x557cc2e8ff88, attnum=5) at ./build/../src/backend/executor/execTuples.c:1912
#3  0x0000557cc1d28fba in slot_getsomeattrs (attnum=<optimized out>, slot=0x557cc2e8ff88) at ./build/../src/include/executor/tuptable.h:344
#4  ExecInterpExpr (state=0x557cc303c538, econtext=0x557cc2d08178, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:482
#5  0x0000557cc1d5548d in ExecEvalExprSwitchContext (isNull=0x7ffdd2599507, econtext=0x557cc2d08178, state=0x557cc303c538) at ./build/../src/include/executor/executor.h:322
#6  ExecQual (econtext=0x557cc2d08178, state=0x557cc303c538) at ./build/../src/include/executor/executor.h:391
#7  MJFillInner (node=0x557cc2d07f68) at ./build/../src/backend/executor/nodeMergejoin.c:494
#8  0x0000557cc1d55ce8 in ExecMergeJoin (pstate=0x557cc2d07f68) at ./build/../src/backend/executor/nodeMergejoin.c:1353
#9  0x0000557cc1d2cc83 in ExecProcNode (node=0x557cc2d07f68) at ./build/../src/include/executor/executor.h:248
#10 ExecutePlan (execute_once=<optimized out>, dest=0x557cc2ed8f60, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x557cc2d07f68,
   estate=0x557cc2d07d08) at ./build/../src/backend/executor/execMain.c:1632
#11 standard_ExecutorRun (queryDesc=0x557cc2ed8ed0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:350
#12 0x00007f0ec05ae09d in pgss_ExecutorRun (queryDesc=0x557cc2ed8ed0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1045
#13 0x0000557cc1cdbcd4 in PersistHoldablePortal (portal=portal@entry=0x557cc2d489f8) at ./build/../src/backend/commands/portalcmds.c:407
#14 0x0000557cc1ff95f9 in HoldPortal (portal=portal@entry=0x557cc2d489f8) at ./build/../src/backend/utils/mmgr/portalmem.c:642
#15 0x0000557cc1ff9e7d in PreCommit_Portals (isPrepare=isPrepare@entry=false) at ./build/../src/backend/utils/mmgr/portalmem.c:738
#16 0x0000557cc1c001c4 in CommitTransaction () at ./build/../src/backend/access/transam/xact.c:2087
#17 0x0000557cc1c015d5 in CommitTransactionCommand () at ./build/../src/backend/access/transam/xact.c:3085
#18 0x0000557cc1ea211d in finish_xact_command () at ./build/../src/backend/tcop/postgres.c:2662
#19 0x0000557cc1ea4703 in exec_simple_query (query_string=0x557cc2c9cd28 "COMMIT") at ./build/../src/backend/tcop/postgres.c:1264
#20 0x0000557cc1ea6143 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x557cc2cf7af8, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4339
#21 0x0000557cc1e25bcd in BackendRun (port=0x557cc2ce9650, port=0x557cc2ce9650) at ./build/../src/backend/postmaster/postmaster.c:4526
#22 BackendStartup (port=0x557cc2ce9650) at ./build/../src/backend/postmaster/postmaster.c:4210
#23 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1739
#24 0x0000557cc1e26b41 in PostmasterMain (argc=5, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1412
#25 0x0000557cc1b70f4f in main (argc=5, argv=0x557cc2c96c30) at ./build/../src/backend/main/main.c:210

(gdb) frame 11
#11 standard_ExecutorRun (queryDesc=0x557cc2ed8ed0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:350
350     ./build/../src/backend/executor/execMain.c: No such file or directory.
(gdb) p queryDesc
$1 = (QueryDesc *) 0x557cc2ed8ed0
(gdb) p *(QueryDesc *) 0x557cc2ed8ed0
$2 = {operation = CMD_SELECT, plannedstmt = 0x557cc2faef08,  
 sourceText = 0x557cc2ed8a70 "\n", ' ' <repeats 12 times>, "DECLARE \"categoryPagePhotoUrl_image_urls\" CURSOR WITH HOLD FOR\n", ' ' <repeats 12 times>, "SELECT di.itemId, image_number, filename FROM (SELECT *\n", ' ' <repeats 12
times>, "FROM downloaded_images"..., snapshot = 0x557cc2edf328, crosscheck_snapshot = 0x0, dest = 0x557cc2ed8f60, params = 0x0, queryEnv = 0x0, instrument_options = 0, tupDesc = 0x557cc303bc58,  
 estate = 0x557cc2d07d08, planstate = 0x557cc2d07f68, already_executed = true, totaltime = 0x0

I'll try to make an reproducable code, let me known if you need more
information.

The query might be a bit nasty, but it shouldn't crash whole server.

Best Regards,
Tomas Barton


-- System Information:
Debian Release: 10.10
 APT prefers oldstable-updates
 APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-17-amd64 (SMP w/12 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages postgresql-13 depends on:
ii  debconf [debconf-2.0]  1.5.71
ii  libc6                  2.28-10
ii  libgcc1                1:8.3.0-6
ii  libgssapi-krb5-2       1.17-3+deb10u2
ii  libicu63               63.1-6+deb10u1
ii  libldap-2.4-2          2.4.47+dfsg-3+deb10u6
ii  libllvm7               1:7.0.1-8+deb10u2
ii  libpam0g               1.3.1-5
ii  libpq5                 13.4-1.pgdg100+1
ii  libselinux1            2.8-1+b1
ii  libssl1.1              1.1.1d-0+deb10u7
ii  libstdc++6             8.3.0-6
ii  libsystemd0            241-7~deb10u8
ii  libuuid1               2.33.1-0.1
ii  libxml2                2.9.4+dfsg1-7+deb10u2
ii  libxslt1.1             1.1.32-2.2~deb10u1
ii  locales                2.28-10
ii  locales-all            2.28-10
ii  postgresql-client-13   13.4-1.pgdg100+1
ii  postgresql-common      226.pgdg100+1
ii  ssl-cert               1.0.39
ii  tzdata                 2021a-0+deb10u1
ii  zlib1g                 1:1.2.11.dfsg-1

Versions of packages postgresql-13 recommends:
pn  sysstat  <none>

postgresql-13 suggests no packages.

-- debconf information excluded

Christoph Berg

unread,
Sep 7, 2021, 8:20:04 AM9/7/21
to
Re: Tomas Barton
> a slightly sophisticated SELECT query with a CURSOR can lead to
> postgresql server segmentation fault.
>
> LOG: server process (PID 10722) was terminated by signal 11: Segmentation
> fault
> DETAIL: Failed process was running: COMMIT

> I'll try to make an reproducable code, let me known if you need more
> information.
>
> The query might be a bit nasty, but it shouldn't crash whole server.

Can you share the query and the schema?

Christoph

Tomas Barton

unread,
Sep 7, 2021, 12:10:03 PM9/7/21
to
Here's an attempt to make the bug reproducible. Unfortunately I'm not able to reproduce the issue with generated data.

dropdb testdb || true
createdb -E UTF8 testdb
cat <<EOF > stress.sql
CREATE TABLE "public".downloaded_images (
   itemid text NOT NULL,
   property text NOT NULL,
   image_number integer DEFAULT 0 NOT NULL,
   filename text,
   download_time timestamp without time zone DEFAULT CURRENT_TIMESTAMP,
   failures_count integer DEFAULT 0
);


INSERT INTO "public".downloaded_images(itemid, property, filename, download_time)
SELECT md5(RANDOM()::TEXT), 'categoryPagePhotoUrl', md5(RANDOM()::TEXT), NOW()
FROM generate_series(1, 100000);
EOF

cat <<EOF > evil.sql
BEGIN;
  
CREATE TABLE IF NOT EXISTS "vgg16_fc1"
         (itemId TEXT,
          embedding_number INT DEFAULT 0,
          embedding JSONB,
          weight NUMERIC DEFAULT 1,
          last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
          additional_data JSON,
          PRIMARY KEY(itemId, embedding_number)
          );

          CREATE INDEX IF NOT EXISTS "last_update_vgg16_fc1" ON "vgg16_fc1"
          USING btree ("last_update");


DECLARE "test-cursor-vgg16_fc1" CURSOR WITH HOLD FOR
           SELECT di.itemId, image_number, filename FROM (SELECT *
           FROM "public".downloaded_images
           WHERE property='categoryPagePhotoUrl' AND filename IS NOT NULL) di
           LEFT JOIN (SELECT itemId, MIN(last_update) as last_update FROM "vgg16_fc1" GROUP BY itemId) computed ON di.itemId=computed.itemId
           WHERE COALESCE(last_update, '1970-01-01') < download_time;


FETCH 10000 IN "test-cursor-vgg16_fc1";


COMMIT;
EOF
psql -d testdb -f stress.sql
psql -d testdb -f evil.sql

Anytime I run the evil.sql, it crashes the server. 

BEGIN
CREATE TABLE
CREATE INDEX
DECLARE CURSOR
psql:evil.sql:28: server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
psql:evil.sql:28: fatal: connection to server was lost

Tomas Barton

unread,
Sep 8, 2021, 5:50:03 AM9/8/21
to
I've tried to rule out a possible data corruption issue, server is using ECC RAM modules, checksums are enabled on database. I've tried rebuilding indexes, running VACUUM FULL, switching to replica etc.

Anyway the issues was probably caused by a faulty NVMe adapter:

kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
kernel: pcieport 0000:00:01.1:   device [8086:1905] error status/mask=00001000/00002000
kernel: pcieport 0000:00:01.1:    [12] Timeout              
kernel: pcieport 0000:00:01.1:   Error of this Agent is reported first
kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
kernel: nvme 0000:02:00.0:   device [144d:a808] error status/mask=000000c1/0000e000
kernel: nvme 0000:02:00.0:    [ 0] RxErr                  (First)

somehow the database on disk got corrupted and it can't be fixed without losing some data.

You can close the ticket, sorry for any inconvenience caused.

Christoph Berg

unread,
Sep 9, 2021, 8:30:03 AM9/9/21
to
Control: reopen -1

Re: Tomas Barton
> Sorry to bring this up again. I've tried to reproduce the issue on a
> postgresql-13 cluster that wasn't affected by this bug.

Oh, that's a real issue then, reopening. I can confirm the crash here:

2021-09-09 14:18:33.448 CEST [1538480] LOG: server process (PID 1540244) was terminated by signal 11: Segmentation fault
2021-09-09 14:18:33.448 CEST [1538480] DETAIL: Failed process was running: COMMIT;

> The bug appears to be introduced in 13.4-1.pgdg100+1 while
> 13.3-1.pgdg100+1 works
> fine.
> Also affects 12.8-1.pgdg100+1
>
> Steps to reproduce:
>
> pg_createcluster 13 main
> createdb testdb
> psql -d testdb -f postgresql-segfault.sql
>
> where the SQL is here:
> https://gist.github.com/deric/b3e82a8ace7fb92935984c2f30361f5c
>
> Should I cross post this to pgsql-pk...@postgresql.org? I don't wanna
> spam too many mailing lists.

Post it to pgsql...@postgresql.org, either directly or via the bug
submission form on the website. (Bonus points for inlining the gist.)

(I can forward it for you, if you like.)

Christoph
0 new messages